Speech coding Tutorial
This tutorial describes speech coding basics and covers speech codec techniques such as PCM, ADPCM, CELP and EVRC, which are employed in today's wireless networks such as GSM and CDMA.
Speech coding basics
Speech coding is the process of converting speech into digital data using as few bits as possible for each digitized voice sample while maintaining acceptable speech quality.
Speech is sampled and quantized, and each quantized sample is then mapped to a binary code. The oldest technique, used in telephone exchanges, is PCM, which delivers speech data at 64 kbps. Later techniques were designed to lower the speech data rate because bandwidth is limited on the air interfaces of wireless technologies such as GSM, CDMA and LTE. The reduction in data rate should not noticeably degrade speech quality. This is the top priority of every speech codec.
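The sampling and quantization step, and the resulting bit rate, can be sketched as follows. This is a minimal illustration and not a standard implementation: it assumes the usual telephony values of 8000 samples per second and 8 bits per sample (which give the 64 kbps PCM rate), uses plain uniform quantization, and the function names are made up for this example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample signal(t), expected in [-1, 1], and uniformly quantize each
    sample to a signed integer code that fits in `bits` bits."""
    max_code = 2 ** (bits - 1) - 1            # 127 for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        x = signal(n / sample_rate_hz)        # sampling
        x = max(-1.0, min(1.0, x))            # clip to the valid range
        codes.append(round(x * max_code))     # quantization
    return codes

def bit_rate_bps(sample_rate_hz, bits):
    """Bit rate is the number of samples per second times bits per sample."""
    return sample_rate_hz * bits

def tone(t):
    """Illustrative 1 kHz test tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

codes = sample_and_quantize(tone, duration_s=0.02)   # 20 ms of speech-band signal
print(len(codes), "samples,", bit_rate_bps(8000, 8), "bit/s")
```

Under these assumptions, 20 ms of signal yields 160 samples, and 8000 samples/s at 8 bits each gives 64000 bit/s.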
Speech codec techniques
Various speech codec (compression and decompression) techniques are used in wireless mobile phones. These include PCM, ADPCM, CELP and EVRC.
The original GSM Full Rate (FR) codec runs at 13 kbps and uses RPE-LTP (Regular Pulse Excitation with Long Term Prediction) rather than CELP. The other speech codecs available in GSM are HR (Half Rate), EFR (Enhanced Full Rate) and AMR (Adaptive Multi Rate). HR provides 5.6 kbps, EFR provides 12.2 kbps and AMR provides 4.75 to 12.2 kbps; EFR and AMR are based on ACELP, a CELP variant. CDMA uses CELP-based speech codecs at rates such as 8.55 kbps, 9.6 kbps and 13.3 kbps. The individual speech codec techniques are described below.
PCM speech codec:
PCM stands for Pulse Code Modulation and is the technique used in telephone exchanges on subscriber wirelines. It has a line rate of 64 kbps. The following diagram depicts the typical PCM encoding and decoding process.
Fig.1 speech coding PCM encoder decoder
As shown in the figure, the speech signal is first passed through a band-pass filter of about 300 Hz to 3.4 kHz and then the compression operation is applied.
There are two compression laws, known as A-law and μ-law. Both are defined in G.711, the ITU-T standard for audio companding. A-law is used in Europe and most of the world, while μ-law is used in North America and Japan. A-law converts a 13-bit signed linear speech sample to an 8-bit value. μ-law converts a 14-bit signed linear sample to an 8-bit value after a bias of 33 is added to its magnitude.
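The idea behind the two compression laws can be sketched with their continuous formulas. Note the assumptions: G.711 itself specifies a piecewise-linear (segmented) approximation that works on integer samples, whereas the sketch below uses the smooth textbook curves on samples normalized to [-1, 1], with the commonly quoted parameters μ = 255 and A = 87.6. The function names are illustrative only.

```python
import math

MU = 255.0   # mu-law parameter (assumed textbook value)
A = 87.6     # A-law parameter (assumed textbook value)

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to y in [-1, 1], boosting small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def a_law_compress(x, a=A):
    """A-law: linear near zero, logarithmic for larger amplitudes."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom
    else:
        y = (1.0 + math.log(a * ax)) / denom
    return math.copysign(y, x)

def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)

for x in (0.01, 0.1, 1.0):
    print(f"x={x:<5} mu-law={mu_law_compress(x):.3f} A-law={a_law_compress(x):.3f}")
```

Because quiet samples are stretched over a larger share of the output range before the 8-bit quantizer, they keep better relative precision than they would with uniform 8-bit quantization.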
ADPCM speech codec:
Adaptive Differential Pulse Code Modulation (ADPCM) reduces the speech coding rate below that of PCM. Instead of coding each sample directly, it codes the difference between the sample and a prediction of it, using a quantizer whose step size adapts to the signal. With ADPCM one can achieve speech rates of about 16 to 32 kbps. At a given bit rate it also achieves a better S/N than PCM. Both PCM and ADPCM operate in the time domain.
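The principle can be illustrated with a toy coder. This is a deliberately simplified sketch and not G.726 or any other standardized ADPCM: the predictor is just the previous reconstructed sample, each difference is coded with 4 bits, and the step-size adaptation factors (1.5 and 0.9) are arbitrary values chosen for the example. The class name is hypothetical.

```python
import math

def _clamp(v, lo, hi):
    return max(lo, min(hi, v))

class ToyAdpcm:
    """Toy 4-bit ADPCM. Encoder and decoder run the same state update,
    so they stay in step as long as no codes are lost."""
    MIN_STEP, MAX_STEP = 1.0, 2048.0

    def __init__(self):
        self.predicted = 0.0   # prediction = previous reconstructed sample
        self.step = 16.0       # current quantizer step size

    def _update(self, code):
        # Reconstruct the sample from the 4-bit code (range -8..7).
        self.predicted = _clamp(self.predicted + code * self.step,
                                -32768.0, 32767.0)
        # Adapt: grow the step when outer levels are hit, shrink otherwise.
        factor = 1.5 if abs(code) >= 4 else 0.9
        self.step = _clamp(self.step * factor, self.MIN_STEP, self.MAX_STEP)
        return self.predicted

    def encode(self, sample):
        diff = sample - self.predicted                   # differential part
        code = _clamp(round(diff / self.step), -8, 7)    # 4-bit quantizer
        self._update(code)
        return code

    def decode(self, code):
        return self._update(code)

# Illustrative input: a 400 Hz tone sampled at 8 kHz with 16-bit amplitudes.
samples = [round(8000 * math.sin(2 * math.pi * 400 * n / 8000)) for n in range(400)]
encoder, decoder = ToyAdpcm(), ToyAdpcm()
codes = [encoder.encode(s) for s in samples]
decoded = [decoder.decode(c) for c in codes]
print("bit rate:", 4 * 8000, "bit/s")
```

With 4 bits per sample at an assumed 8 kHz sampling rate, this toy runs at 32 kbps, the upper end of the range quoted above; fewer bits per difference would lower the rate further.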
CELP speech codec:
CELP (Code Excited Linear Prediction) algorithms are designed to compress speech to about 8 kbps or 4.8 kbps while maintaining acceptable speech quality. The following are the most popular standards for CELP-based codec design.
G.728 - This standard performs speech coding at 16 kbps using the LD-CELP (Low Delay Code Excited Linear Prediction) speech codec.
G.729 - This standard performs audio compression at a speech rate of 8 kbps using the CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction) technique.
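The core of a CELP encoder is an analysis-by-synthesis search: each candidate excitation vector from a codebook is passed through a linear-prediction synthesis filter, and the index and gain that best match the target speech are transmitted. The sketch below shows that search in its simplest form. It is not taken from G.728 or G.729; the random codebook, the 40-sample subframe, the two filter coefficients and all function names are assumptions made for illustration, and real codecs add perceptual weighting, an adaptive (pitch) codebook and structured codebooks.

```python
import random

def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def search_codebook(target, codebook, lpc):
    """Return (index, gain) minimizing the squared error between the target
    and gain * synthesize(codebook[index])."""
    best_index, best_gain, best_score = None, 0.0, float("-inf")
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        score = corr * corr / energy      # error reduction for the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

random.seed(0)
SUBFRAME = 40                             # assumed: 5 ms at 8 kHz
codebook = [[random.gauss(0.0, 1.0) for _ in range(SUBFRAME)] for _ in range(64)]
lpc = [1.2, -0.5]                         # assumed stable 2nd-order predictor

# Build a target that the codebook can represent exactly, then search for it.
target = [0.7 * v for v in synthesize(codebook[17], lpc)]
index, gain = search_codebook(target, codebook, lpc)
print(index, round(gain, 3))
```

Only the codebook index (6 bits for 64 entries in this example), the gain and the filter coefficients need to be sent per subframe, which is how CELP reaches rates far below those of PCM.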
Enhanced Variable Rate Codec (EVRC):
EVRC offers high voice quality while reducing the number of bits needed for the linear predictor coefficients. It also suppresses background noise, which further enhances voice quality. EVRC uses the RCELP (Relaxed Code Excited Linear Prediction) algorithm. EVRC, which is used in CDMA, has three frame rates: full rate (8.55 kbps), half rate (4 kbps) and 1/8 rate (0.8 kbps). Frames are formed every 20 ms, similar to GSM.
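The relationship between these rates and the 20 ms frame can be worked out directly. The short sketch below computes the bits carried per frame for each EVRC rate and the average bit rate for a given mix of frame types. The frame mix used in the example is purely hypothetical, not measured data, and the function names are illustrative.

```python
FRAME_MS = 20
EVRC_RATES_BPS = {"full": 8550, "half": 4000, "eighth": 800}

def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Bits carried by one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000

def average_rate_bps(frame_mix):
    """Average bit rate for a mix of frame types.
    frame_mix maps a rate name to the fraction of frames sent at that rate."""
    return sum(EVRC_RATES_BPS[name] * share for name, share in frame_mix.items())

for name, rate in EVRC_RATES_BPS.items():
    print(name, bits_per_frame(rate))     # full 171, half 80, eighth 16

# Hypothetical conversation: 40% active speech, 10% transitions, 50% silence.
mix = {"full": 0.4, "half": 0.1, "eighth": 0.5}
print(round(average_rate_bps(mix)), "bit/s on average")
```

Because silence and background noise can be sent at the 1/8 rate, the average rate of a variable rate codec sits well below its full rate, which is the main capacity benefit in CDMA.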
The G.711 recommendation is available at https://www.itu.int/rec/T-REC-G.711