Working of Speech Production

Below is the diagram of the human vocal organs:

The larynx provides short pulses of air at a fundamental frequency. The spectrum of its output consists of impulses at integer multiples of the fundamental frequency, with decreasing amplitude. The larynx takes air from the lungs as its input and gives pulses of air as its output. This is done with the help of the vocal folds. The subglottal pressure forces the vocal folds open and moves them outward. When the glottis is open and the pressure between the folds drops, they reverse and move inward. The momentum of the inward movement and, finally, the increased suction of the Bernoulli effect cause the folds to close abruptly. The subglottal pressure and the elastic restoring forces during closure cause the cycle to begin again.

This is called the source. The variation of the frequency cannot be explained well by the changes in the larynx alone, so we represent the larynx and vocal folds by an artificial vocal source. The vibration of the folds in the larynx makes the vocal sound. The tension in the folds may vary, causing a change in the fundamental frequency. The spectrum of the output of the larynx is as shown below. This is a typical output characteristic.

The amplitude decreases for higher frequencies.
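The shape of the source spectrum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `source_harmonics` is hypothetical, and the roll-off of -12 dB per octave is an assumed, commonly quoted figure for the glottal source, not a value given in this text.

```python
import math

def source_harmonics(f0_hz, n_harmonics, rolloff_db_per_octave=-12.0):
    """Return (frequency in Hz, relative level in dB) for the first harmonics.

    The harmonics sit at integer multiples of the fundamental f0_hz.
    The roll-off value is an assumption made for illustration only.
    """
    harmonics = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        # log2(k) is the number of octaves harmonic k lies above the fundamental
        level_db = rolloff_db_per_octave * math.log2(k)
        harmonics.append((freq, level_db))
    return harmonics

# Example: a source with a 100 Hz fundamental
for freq, level in source_harmonics(100.0, 5):
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```

Raising the tension of the folds corresponds to a larger `f0_hz`, which spreads the harmonics further apart without changing the overall downward slope.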

The vocal tract is the most important part in the production of the sound. The output of the source is nearly the same for a person speaking at a constant pitch, so it is the vocal tract that shapes the different sounds. The vocal tract acts as a resonating tube of varying cross-section. The resonances of this tube are called the formants.

The frequency of resonant oscillation of the air in the vocal tract is similar to that in a tube such as a bottle. When we uncork a bottle, a pulse starts and oscillates between the two ends of the bottle with a frequency inversely proportional to the length. In vowel production, pulses of airflow are emitted by the glottis into the pharynx. Each glottal pulse is propagated upward to the open mouth as a pressure wave. At the mouth, the pressure pulse produces an outward pulse of airflow, the escaping air particles of which now appear as a rarefaction (negative) pressure pulse that propagates back towards the vocal fold surfaces. The vocal folds act like the bottom of a bottle and reflect the rarefaction pulse, which is then propagated upward again, and so on. Thus the glottal pulse is repeatedly reflected back and forth between the vocal folds and the mouth. This round-trip propagation is so fast that typically about 10 round trips occur between successive glottal pulses. These repeated reflections set up the resonances of the vocal tract, the formants.
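The figure of roughly 10 round trips per glottal pulse can be checked with simple arithmetic. The sketch below assumes a tract length of 17.5 cm, a speed of sound of 350 m/s and a fundamental frequency of 100 Hz; none of these values is stated in the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, moist air in the tract

def round_trips_per_glottal_period(tract_length_m, f0_hz, c=SPEED_OF_SOUND_M_S):
    """Number of glottis-to-mouth round trips that fit into one glottal period."""
    round_trip_time_s = 2.0 * tract_length_m / c  # up to the mouth and back
    glottal_period_s = 1.0 / f0_hz                # time between glottal pulses
    return glottal_period_s / round_trip_time_s

# Assumed example: 17.5 cm tract, 100 Hz fundamental
print(round_trips_per_glottal_period(0.175, 100.0))  # approximately 10
```

With a higher pitch the glottal period shortens, so fewer round trips fit between pulses.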
The frequency response of the tract (the filter) is typically as shown below.
Typical response for a tract with no constriction:

The tract with no constriction is also said to be in the neutral position; the vowel produced is [ə].
The first formant in this position is about 500 Hz, which is typical for a male voice. The formants are acoustic properties of the vocal tract that shape the spectrum it produces. The formant frequency locations for vowels are affected by three factors:
· Length of the oral tract
· Location of the constriction in the tract
· Narrowness of the constrictions
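The 500 Hz figure for the neutral tract, and the effect of the first factor (tract length), can be illustrated with a quarter-wavelength resonator: a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1) c / (4 L). This is a minimal sketch, not a full vocal tract model. The tract lengths, the speed of sound and the function name are assumptions, and the effects of constriction location and narrowness are not modelled.

```python
def neutral_tract_formants(tract_length_m, n_formants=3, c=350.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L). The speed of sound c (m/s) is an assumed value.
    """
    return [(2 * n - 1) * c / (4.0 * tract_length_m)
            for n in range(1, n_formants + 1)]

# Assumed 17.5 cm tract: resonances near 500, 1500 and 2500 Hz
print([round(f) for f in neutral_tract_formants(0.175)])

# A shorter tract (assumed 14.5 cm) moves every formant upward
print([round(f) for f in neutral_tract_formants(0.145)])
```

Because the frequencies are inversely proportional to the length, a shorter tract gives higher formants, in line with the bottle analogy used earlier.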

Working of the Ear:

The ear is divided into three parts: the outer, middle and inner ear.

The outer ear includes the “pinna”. It has quite a significant effect on incoming sound, particularly at high frequencies, and contributes to our ability to localize sounds, that is, to tell where they are coming from. The incoming sound waves travel down the “auditory meatus” and cause the eardrum to vibrate. The eardrum is the interface between the outer and middle ear. The vibrations of the eardrum are transmitted through the middle ear by three small bones, called the “auditory ossicles”, to a membrane-covered opening in the bony wall of the inner ear.

The inner ear consists of a fluid-filled, spiral-shaped structure called the “cochlea”. The middle ear ensures that the vibrations of the eardrum are transferred efficiently to the fluids inside the cochlea. The middle ear also increases the sound pressure by a large factor. This increase in pressure ensures that a much larger fraction of the sound energy is transmitted to the inner ear.

The cochlea is a spiral-shaped tube with bony walls, filled with incompressible fluid. It is the most important part of the ear, because it is the part responsible for signal processing. The main component of the cochlea is the “basilar membrane”, which is not rigid. The displacement of the basilar membrane caused by the inward motion of the stapes (the last of the ossicles) is transmitted along the membrane, just as an impulse is transmitted along a string that has been given a sharp flick. The basilar membrane does not have a homogeneous structure. Instead, it is rather narrow and stiff at the basal end and gets wider and more flexible towards the apical end. So, when the cochlea responds to a sinusoidal stimulus, the wavelength and amplitude of the wave on the basilar membrane change as it travels along. The wavelength gets shorter and the amplitude gradually increases until the wave reaches a certain point on the basilar membrane, after which the amplitude rapidly decreases. Whatever the stimulus frequency, the disturbance does not travel much further once the amplitude has peaked. Since the point on the basilar membrane at which maximum displacement occurs varies with frequency, the basilar membrane effectively separates out the sinusoidal components of the stimulus. In other words, it performs a crude form of Fourier analysis.
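To make the Fourier-analysis analogy concrete, the sketch below applies a plain discrete Fourier transform to a stimulus made of two sinusoids and recovers their frequencies. It illustrates the mathematical operation only and is not a model of cochlear mechanics. The sample rate, the tone frequencies and all names are assumptions chosen so that each tone falls exactly on an analysis bin.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 Hz up to just below half the sample rate."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc) / n)
    return mags

SAMPLE_RATE = 8000  # Hz, assumed
N = 400             # samples, giving a bin spacing of 8000 / 400 = 20 Hz

# Stimulus: a 500 Hz tone plus a weaker 1500 Hz tone (assumed values)
signal = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
# The two strongest bins correspond to the two sinusoidal components
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * SAMPLE_RATE / N for k in peaks)
print(peak_freqs)  # [500.0, 1500.0]
```

In the analogy, each DFT bin plays the role of one place along the basilar membrane, and the bin with the largest magnitude corresponds to the point of maximum displacement.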

We now need to know the frequency resolution of the basilar membrane, that is, how close in frequency two simple tones must be before the basilar membrane is unable to distinguish them. We can measure this by assuming that each point on the membrane behaves as a band-pass filter with a particular centre frequency and bandwidth, and with sloping transition bands at the upper and lower cutoff frequencies.
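One way to put numbers on the band-pass filter picture is the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore, ERB(f) = 24.7 (4.37 f / 1000 + 1) Hz. This formula is brought in here as an assumption and is not part of the text above. The rule that two tones are resolved when their separation exceeds one ERB is a deliberately crude criterion for illustration, and the function names are hypothetical.

```python
def erb_hz(centre_freq_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter.

    Glasberg and Moore approximation; assumed here, not taken from the text.
    """
    return 24.7 * (4.37 * centre_freq_hz / 1000.0 + 1.0)

def tones_resolved(f1_hz, f2_hz):
    """Crude criterion: tones count as resolved if they are more than one ERB apart."""
    centre = (f1_hz + f2_hz) / 2.0
    return abs(f1_hz - f2_hz) > erb_hz(centre)

print(round(erb_hz(1000.0), 1))        # filter bandwidth near 1 kHz, in Hz
print(tones_resolved(1000.0, 1050.0))  # close tones fall inside one filter
print(tones_resolved(1000.0, 1300.0))  # widely spaced tones fall in separate filters
```

The bandwidth grows with centre frequency, so under this model a fixed separation in hertz is easier to resolve at low frequencies than at high ones.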

The basilar membrane is non-linear. This means that one cannot predict how it will respond to quiet sounds by examining how it responds to loud ones. After processing by the basilar membrane, the spectral information is transferred to the auditory nerve. This happens through the “organ of Corti”, which is a combination of the “tectorial membrane” and the “hair cells”. The hairs on the outer hair cells actually make contact with the tectorial membrane, but those on the inner hair cells probably do not. The tectorial membrane is effectively hinged at its edge, so that when the basilar membrane moves up and down, the tectorial membrane slides over it with a shearing motion, causing the hairs on the hair cells to be displaced. This causes the inner hair cells to fire and send signals up the auditory nerve to the brain.