Human voice recognition systems (VRSs) are a prerequisite for voice-controlled human-machine interfaces (HMIs). To avoid interference from background noise, skin-attachable VRSs have been proposed that directly detect the physiological mechanoacoustic signals produced by vocal-cord vibrations. However, the limited sensitivity and slow response of existing VRSs remain bottlenecks for efficient HMIs. In addition, water-based contaminants encountered in daily life, such as skin moisture and raindrops, commonly degrade the performance of VRSs or even cause functional failure. Herein, we present a skin-attachable, self-cleaning, ultrasensitive, and ultrafast acoustic sensor based on a reduced graphene oxide/polydimethylsiloxane composite film with bioinspired microcracks and hierarchical surface textures. Benefiting from the synergistic effect of multiscale jagged microcracks inspired by the spider slit organ and hierarchical structures inspired by the lotus leaf, our superhydrophobic VRS exhibits ultrahigh sensitivity (gauge factor, GF = 8699), an ultralow detection limit (ε = 0.000 064%), ultrafast response and recovery, excellent durability (>10 000 cycles), and reliable detection of acoustic vibrations across the audible frequency range (20–20 000 Hz) with high signal-to-noise ratios. This performance enables our skin-attachable VRS to perceive human voices with high precision and strong resistance to interference, even in noisy environments, which should expedite the development of voice-controlled HMIs.
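For readers outside strain sensing, the gauge factor is conventionally defined as GF = (ΔR/R0)/ε, the relative change in electrical resistance per unit strain. The minimal Python sketch below computes GF from resistance readings and, under the assumption that GF stays constant at 8699 all the way down to the detection limit, estimates the relative resistance change at ε = 0.000 064%. That assumption is illustrative only: crack-based sensors are typically nonlinear, and the abstract does not state the strain range over which GF = 8699 was measured. The function names are hypothetical and do not come from the paper.

```python
def gauge_factor(r0_ohm: float, r_ohm: float, strain: float) -> float:
    """Return GF = (ΔR / R0) / ε, with strain ε given as a dimensionless fraction."""
    if r0_ohm <= 0 or strain == 0:
        raise ValueError("R0 must be positive and strain must be non-zero")
    return ((r_ohm - r0_ohm) / r0_ohm) / strain


def relative_resistance_change(gf: float, strain: float) -> float:
    """Return ΔR/R0 predicted by a constant gauge factor (assumes a linear response)."""
    return gf * strain


# The detection limit is quoted as a percentage: 0.000 064 % -> fraction 6.4e-7.
eps_min = 0.000064 / 100
dr_over_r0 = relative_resistance_change(8699, eps_min)
print(f"ΔR/R0 ≈ {dr_over_r0:.3e}")  # prints ΔR/R0 ≈ 5.567e-03
```

Under this linear assumption, a strain of only 6.4 × 10⁻⁷ would still produce a resistance change of about 0.56%, which shows why a very large gauge factor lowers the detectable strain.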