Speech Coding Tutorial: Basics and Techniques

This tutorial describes speech coding basics and covers various speech codec techniques such as PCM, ADPCM, CELP, and EVRC employed in today’s wireless networks such as GSM, CDMA, and more.

Speech Coding Basics

Speech coding is the process of converting a speech signal into a digital bit stream that uses as few bits per second as possible while keeping speech quality acceptable. The analog speech signal is sampled and quantized, and each quantized sample (or group of samples) is then mapped to a binary code word. The oldest technique, used in all telephone exchanges, is PCM, which produces a 64 kbps stream (8,000 samples per second × 8 bits per sample). Later techniques were designed to lower this bit rate because of the limited bandwidth of the air interfaces of wireless technologies such as GSM, CDMA, and LTE. Reducing the bit rate must degrade speech quality as little as possible; this is the top priority for every speech codec.
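To make the sampling, quantization, and code-word steps concrete, here is a minimal Python sketch. It samples a test tone at 8 kHz and maps each sample to one of 256 uniform 8-bit codes, then computes the resulting bit rate. The function names and the 1 kHz test tone are illustrative assumptions, and real G.711 PCM uses companded (non-uniform) levels rather than the uniform mapping shown here.

```python
import math

SAMPLE_RATE_HZ = 8000  # telephone speech is sampled 8,000 times per second
BITS_PER_SAMPLE = 8    # each PCM code word is 8 bits


def sample_and_quantize(freq_hz, duration_ms, fs=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Sample a unit-amplitude tone and map each sample to a uniform integer code.

    Hypothetical helper for illustration only: G.711 uses companded, not uniform, levels.
    """
    levels = 1 << bits                      # 256 code words for 8-bit samples
    n_samples = fs * duration_ms // 1000
    codes = []
    for n in range(n_samples):
        x = math.sin(2 * math.pi * freq_hz * n / fs)   # sampling: x lies in [-1, 1]
        code = round((x + 1.0) / 2.0 * (levels - 1))  # quantization: 0 .. levels-1
        codes.append(code)
    return codes


def bit_rate_bps(fs=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bit rate of a PCM stream: samples per second times bits per sample."""
    return fs * bits


codes = sample_and_quantize(1000, 10)  # 10 ms of a 1 kHz tone
print(len(codes), "samples, codes range", min(codes), "to", max(codes))
print(bit_rate_bps(), "bit/s")        # 8000 * 8 = 64000
```

Halving the bits per sample or the sample rate halves the bit rate, which is why later codecs focus on spending fewer bits per sample (or per frame) instead of simply sampling less often.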

Speech Codec Techniques

Various techniques are used for speech coding (compression and decompression) in telephone and wireless mobile networks, including PCM, ADPCM, CELP, and EVRC. In GSM, the original Full Rate (FR) codec runs at 13 kbps using RPE-LTP (Regular Pulse Excitation with Long-Term Prediction). The other GSM speech codecs are HR (Half Rate), EFR (Enhanced Full Rate), and AMR (Adaptive Multi-Rate): HR runs at 5.6 kbps using VSELP, EFR at 12.2 kbps, and AMR at eight rates from 4.75 to 12.2 kbps. EFR and AMR both use ACELP, a CELP variant.
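Because GSM speech codecs work on 20 ms frames, a codec's bit rate translates directly into a fixed number of bits per frame. The short Python sketch below does that arithmetic for a subset of the GSM rates quoted above; the dictionary and the `bits_per_frame` helper are illustrative names, not part of any standard API.

```python
FRAME_MS = 20  # GSM speech codecs work on 20 ms frames

# Subset of the GSM codec bit rates quoted in this section, in bit/s
GSM_CODEC_RATES_BPS = {
    "FR": 13_000,
    "EFR": 12_200,
    "AMR (lowest mode)": 4_750,
    "AMR (highest mode)": 12_200,
}


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of speech bits produced per frame at a constant bit rate."""
    return rate_bps * frame_ms // 1000


for name, rate in GSM_CODEC_RATES_BPS.items():
    print(f"{name}: {rate / 1000} kbps -> {bits_per_frame(rate)} bits per 20 ms frame")
```

This gives 260 bits per frame for FR, 244 for EFR and AMR 12.2, and 95 for AMR 4.75.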

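CDMA systems use CELP-based codecs such as QCELP and EVRC. Full-rate speech is coded at 8.55 kbps (carried on a 9.6 kbps traffic channel) or 13.3 kbps (carried on a 14.4 kbps traffic channel). The sections below describe each speech coding technique in turn.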

PCM Speech Codec

PCM (Pulse Code Modulation) is the technique used in telephone exchanges to digitize speech from subscriber lines. It operates at 64 kbps (8,000 samples per second × 8 bits per sample). The following diagram depicts a typical PCM encoding and decoding process.

Fig. 1: Speech coding PCM encoder and decoder

As shown in Fig. 1, the speech signal is first band-limited by a 300 Hz to 3.4 kHz band-pass filter, sampled at 8 kHz, and then compressed (companded) before being encoded as 8-bit PCM words. Two companding laws, A-law and μ-law, are defined in ITU-T G.711, the standard for PCM audio companding. A-law is used in Europe and most of the rest of the world, while μ-law is used in North America and Japan. A-law converts a 13-bit signed linear sample to an 8-bit value. μ-law converts a 14-bit signed linear sample to an 8-bit value; before the segment is determined, a bias of 33 is added to the sample magnitude.
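The μ-law conversion can be sketched in Python as below. G.711 approximates the continuous μ = 255 companding curve with 8 segments of 16 steps each, so every 8-bit code word holds a sign bit, a 3-bit segment number, and a 4-bit mantissa, and the whole byte is inverted before transmission. This sketch follows the structure of the widely circulated reference C implementation but works directly on 14-bit samples; the function names are hypothetical and A-law (not shown) follows the same segment idea with different constants.

```python
BIAS = 33    # added to the 14-bit magnitude; 16-bit implementations use 0x84 = 132 = 33 << 2
CLIP = 8159  # largest magnitude that still fits in 13 bits after biasing


def linear14_to_ulaw(sample: int) -> int:
    """Encode a 14-bit signed linear sample (-8192..8191) as an 8-bit mu-law byte."""
    if sample < 0:
        magnitude, mask = -sample, 0x7F   # negative: sign bit ends up 0
    else:
        magnitude, mask = sample, 0xFF    # positive: sign bit ends up 1
    magnitude = min(magnitude, CLIP) + BIAS
    segment = magnitude.bit_length() - 6  # 0..7 for biased magnitudes 33..8191
    if segment > 7:                       # only the clipped maximum lands here
        return 0x7F ^ mask
    mantissa = (magnitude >> (segment + 1)) & 0x0F
    return ((segment << 4) | mantissa) ^ mask  # the code word is sent inverted


def ulaw_to_linear14(code: int) -> int:
    """Decode an 8-bit mu-law byte back to a 14-bit signed linear sample."""
    code = ~code & 0xFF                   # undo the inversion
    segment = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 1) + BIAS) << segment) - BIAS
    return -magnitude if code & 0x80 else magnitude


for x in (0, 10, 100, 1000, -1000, 8000):
    code = linear14_to_ulaw(x)
    print(x, hex(code), ulaw_to_linear14(code))
```

Small samples are reproduced within 1 unit while the largest ones may be off by up to 128 units, so the relative error stays roughly constant across levels. This is what lets 8 companded bits match the perceived quality of about 13 to 14 uniform bits.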

ADPCM Speech Codec

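ADPCM (Adaptive Differential Pulse Code Modulation) reduces the bit rate below that of PCM. Instead of coding each sample directly, it codes the difference between the sample and a prediction of it, using a quantizer whose step size adapts to the signal. ADPCM typically operates at 16 to 32 kbps (ITU-T G.726 defines 16, 24, 32, and 40 kbps rates). Because the prediction difference has a smaller range than the signal itself, ADPCM achieves a better signal-to-noise ratio than PCM at the same bit rate. PCM and ADPCM are both waveform coders that operate in the time domain.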
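The two ideas behind ADPCM, coding a prediction difference and adapting the quantizer step, can be shown with a deliberately simplified Python sketch. It is not G.726: the predictor is just the previous reconstructed sample, and the step-size rule (multiply by 1.5 after large codes, by 0.9 after small ones, in the spirit of Jayant's adaptive quantizer) together with the initial step, the step bounds, and the function names are assumptions chosen for illustration.

```python
import math

STEP_MIN, STEP_MAX = 1.0, 4096.0  # hypothetical bounds on the quantizer step size


def _adapt_step(step, code, levels):
    """Hypothetical Jayant-style rule shared by encoder and decoder."""
    factor = 1.5 if abs(code) >= levels // 2 else 0.9  # large code: step was too small
    return min(STEP_MAX, max(STEP_MIN, step * factor))


def adpcm_encode(samples, bits=4, step=16.0):
    levels = 1 << (bits - 1)        # 8 for 4-bit codes, so codes lie in -8..7
    prediction = 0.0                # predictor: the previous reconstructed sample
    codes = []
    for x in samples:
        residual = x - prediction   # differential: code only the prediction error
        code = max(-levels, min(levels - 1, round(residual / step)))
        codes.append(code)
        prediction += code * step   # mirror the decoder so both stay in sync
        step = _adapt_step(step, code, levels)
    return codes


def adpcm_decode(codes, bits=4, step=16.0):
    levels = 1 << (bits - 1)
    prediction = 0.0
    decoded = []
    for code in codes:
        prediction += code * step   # rebuild the same prediction as the encoder
        decoded.append(prediction)
        step = _adapt_step(step, code, levels)
    return decoded


fs = 8000
tone = [1000 * math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
codes = adpcm_encode(tone)          # 4 bits per sample * 8000 samples/s = 32 kbps
decoded = adpcm_decode(codes)
```

No step sizes are transmitted: the decoder recomputes them from the codes it receives, so encoder and decoder adapt in lockstep.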

CELP Speech Codec

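CELP (Code Excited Linear Prediction) codecs achieve bit rates as low as 8 kbps or 4.8 kbps while maintaining acceptable speech quality. A CELP encoder models the vocal tract with a linear prediction (LPC) filter and, for each short block of speech, searches a codebook for the excitation signal that best matches the input speech after passing through that filter (analysis by synthesis). Only the filter parameters, codebook indices, and gains are transmitted, not the waveform itself.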
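The linear prediction step of a CELP encoder finds coefficients a[1..p] such that each sample is approximated by a weighted sum of the previous p samples. A common way to compute them is the Levinson-Durbin recursion on the frame's autocorrelation, sketched below in Python. The function names are hypothetical, and a real encoder would also window the frame and quantize the coefficients before transmission.

```python
def autocorrelation(x, max_lag):
    """Short-term autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients, residual_energy). Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)   # a[0] is unused; kept so indices match the math
    error = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k  # prediction error can only shrink at each stage
    return a[1:], error


# Autocorrelation of a first-order process with coefficient 0.9: the
# recursion recovers a[1] = 0.9 and a second coefficient of about 0.
coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], 2)
print(coeffs, residual)
```

Narrowband CELP codecs such as G.729 use a 10th-order predictor, i.e. `levinson_durbin(autocorrelation(frame, 10), 10)` in the terms of this sketch.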

The most popular standards for CELP-based codec design are:

G.728 - Codes speech at 16 kbps using LD-CELP (Low-Delay Code Excited Linear Prediction), which encodes short 5-sample (0.625 ms) vectors to keep the coding delay very low.
G.729 - Codes speech at 8 kbps using CS-ACELP (Conjugate-Structure Algebraic Code Excited Linear Prediction) on 10 ms frames.
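In CELP, the encoder chooses the excitation by analysis by synthesis: it passes each candidate codevector through the LPC synthesis filter, scales it by the best gain, and keeps the one closest to the target speech. The toy Python search below shows that loop under heavy simplifying assumptions: a random Gaussian codebook stands in for the algebraic pulse codebook of ACELP, there is no adaptive (pitch) codebook and no perceptual weighting filter, and the filter starts from zero memory. The 40-sample vector length mirrors G.729's 5 ms subframes; the function names and the second-order predictor are hypothetical.

```python
import random


def lpc_synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] + sum_k a[k] * y[n - k].

    Starts from zero filter memory (a simplification; real codecs carry state across frames).
    """
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector whose synthesized output best matches target."""
    best = None
    for index, codevector in enumerate(codebook):
        y = lpc_synthesize(codevector, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares optimal gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical setup: 64 random 40-sample codevectors and a stable 2nd-order predictor
rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
a = [1.2, -0.5]
target = [2.5 * v for v in lpc_synthesize(codebook[17], a)]
index, gain, error = search_codebook(target, codebook, a)
print(index, round(gain, 3))  # 17 2.5
```

Only `index` (6 bits for 64 entries) and a quantized `gain` would be transmitted for this vector, which is where CELP's large saving over waveform coders comes from.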

Enhanced Variable Rate Codec (EVRC)

EVRC delivers high voice quality at low bit rates by using the RCELP (Relaxed Code Excited Linear Prediction) algorithm, which relaxes the requirement to match the original waveform exactly so that fewer bits are needed for the pitch (long-term predictor) parameters. EVRC also suppresses background noise, which further improves voice quality.

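EVRC is used in CDMA systems and codes each 20 ms frame (the same frame length as GSM) at one of three rates: full rate (8.55 kbps), half rate (4 kbps), or 1/8 rate (0.8 kbps). The rate is chosen frame by frame according to speech activity, so pauses and background noise are sent at the lowest rate.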
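Because EVRC switches rates frame by frame, its average bit rate depends on how much of the conversation is active speech. The Python sketch below converts the three EVRC rates into bits per 20 ms frame and averages them over a frame mix. The 40/10/50 mix is a hypothetical example, not measured traffic, and the helper names are illustrative.

```python
FRAME_MS = 20  # EVRC frames are 20 ms long

EVRC_RATES_BPS = {"full": 8550, "half": 4000, "eighth": 800}


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Bits carried by one frame at the given rate."""
    return rate_bps * frame_ms // 1000


def average_rate_bps(frame_counts):
    """Average bit rate for a given number of frames sent at each EVRC rate."""
    total_bits = sum(bits_per_frame(EVRC_RATES_BPS[r]) * n for r, n in frame_counts.items())
    total_frames = sum(frame_counts.values())
    return total_bits * 1000 / (total_frames * FRAME_MS)


# Hypothetical activity mix over 2 s: 40 full-rate, 10 half-rate, 50 eighth-rate frames
mix = {"full": 40, "half": 10, "eighth": 50}
print({r: bits_per_frame(b) for r, b in EVRC_RATES_BPS.items()})  # {'full': 171, 'half': 80, 'eighth': 16}
print(average_rate_bps(mix), "bit/s")                              # 4220.0 bit/s
```

With half of the frames at 1/8 rate, the average drops to roughly half of the 8.55 kbps full rate, which frees air-interface capacity for other users.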