GSM Speech Processing: An Overview of Codecs and Compression Techniques
Speech processing in GSM involves converting voice signals into digital form, using compression and encoding techniques to optimize bandwidth usage while maintaining voice quality. GSM employs codecs like Full Rate (FR), Half Rate (HR), and Enhanced Full Rate (EFR) to balance between bandwidth efficiency and sound clarity. This article explores the core speech processing technologies that enable efficient voice transmission in GSM networks.
Speech processing is what lets the GSM mobile communication system carry voice over a narrow radio channel at acceptable quality. The chain has three stages: the voice is converted into a digital signal, the digital stream is compressed by a speech codec, and the compressed bits are encoded for transmission. GSM defines several codecs and compression techniques for this purpose.
Key Concepts in GSM Speech Processing:
• Voice Digitization: In GSM, the first step is converting analog voice signals into digital data
through a process known as Pulse Code Modulation (PCM). The voice is sampled at a rate of 8 kHz,
and each sample is quantized to a fixed number of bits. The Full Rate encoder takes 13-bit uniform
PCM at its input, which corresponds to 104 kbps before compression.
• Speech Compression: After digitization, speech compression is applied to reduce the
data rate for efficient transmission. GSM uses compression techniques that minimize the bandwidth
required for voice transmission while maintaining intelligible speech quality.
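The digitization step and the rate it produces can be sketched as follows. This is a minimal illustration, not a codec implementation: the 13-bit uniform sample width is an assumption taken from the input format of the Full Rate encoder, and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
BITS_PER_SAMPLE = 13    # assumption: 13-bit uniform PCM at the Full Rate encoder input


def quantize(x, bits=BITS_PER_SAMPLE):
    """Map a sample in [-1.0, 1.0] to a signed integer code of the given width."""
    levels = 1 << (bits - 1)                     # 4096 steps per polarity for 13 bits
    code = int(math.floor(x * levels))
    return max(-levels, min(levels - 1, code))   # clip to the representable range


def pcm_bit_rate(sample_rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Raw PCM bit rate in bits per second."""
    return sample_rate * bits


# 20 ms of a 1 kHz test tone at half of full scale: 160 samples at 8 kHz.
samples = [
    quantize(0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ))
    for n in range(160)
]

raw_rate = pcm_bit_rate()               # bits per second before compression
compression_ratio = raw_rate / 13000    # relative to the 13 kbps Full Rate codec
```

Under these assumptions the raw stream is 104 kbps, so the 13 kbps Full Rate codec compresses speech by a factor of 8.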
GSM Codecs and Their Role
1. Full Rate (FR) Codec:
Bit Rate: 13 kbps
The Full Rate codec was the original codec used in GSM, designed to
strike a balance between voice quality and data rate. It uses Regular Pulse Excitation
with Long Term Prediction (RPE-LTP) coding to compress speech data.
Pros: Offers reasonable voice quality for the available bandwidth.
Cons: Requires more bandwidth compared to later codecs.
2. Half Rate (HR) Codec:
Bit Rate: 5.6 kbps
The Half Rate codec compresses speech further so that a call fits in a half-rate traffic channel,
which lets two calls share the timeslot that one Full Rate call would occupy.
It uses Vector-Sum Excited Linear Prediction (VSELP) to achieve this compression.
Pros: Can roughly double voice capacity, since each call needs about half the channel resources.
Cons: Voice quality is reduced compared to the Full Rate codec.
3. Enhanced Full Rate (EFR) Codec:
Bit Rate: 12.2 kbps
The Enhanced Full Rate codec was introduced to improve voice quality over the Full
Rate codec without significantly increasing bandwidth usage. It uses Algebraic
Code-Excited Linear Prediction (ACELP), a more efficient compression algorithm.
Pros: Provides noticeably better speech quality than Full Rate, generally rated close to fixed-line telephone quality in good radio conditions.
Cons: Slightly more complex processing than the Full Rate codec.
4. Adaptive Multi-Rate (AMR) Codec:
Bit Rate: Varies between 4.75 kbps and 12.2 kbps
AMR is a flexible codec that dynamically adjusts its data rate based on network conditions.
It defines eight modes (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, and 12.2 kbps) and switches between them as needed to balance voice quality and bandwidth efficiency.
Pros: Adapts to network congestion and radio conditions, providing optimal performance in various scenarios.
Cons: Increased complexity due to the adaptive nature of the codec.
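Since GSM speech is carried in 20 ms frames, each bit rate above translates directly into a number of speech bits per frame. The sketch below works through that arithmetic. The Half Rate codec is taken as 5.6 kbps, the AMR list holds the eight narrowband modes, and `highest_mode_within` is a hypothetical helper that only illustrates picking a mode for a given bit budget. It is not the link adaptation procedure used in real networks.

```python
FRAME_MS = 20  # speech frame duration in GSM

# Fixed-rate codecs, in bits per second.
CODEC_RATES_BPS = {"FR": 13000, "HR": 5600, "EFR": 12200}

# AMR narrowband modes, in bits per second.
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of speech bits produced in one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


def highest_mode_within(budget_bps, modes=AMR_MODES_BPS):
    """Hypothetical helper: highest AMR mode that fits a speech bit-rate budget.

    Falls back to the lowest mode when nothing fits.
    """
    fitting = [m for m in modes if m <= budget_bps]
    return max(fitting) if fitting else min(modes)


frame_sizes = {name: bits_per_frame(rate) for name, rate in CODEC_RATES_BPS.items()}
amr_frame_sizes = [bits_per_frame(rate) for rate in AMR_MODES_BPS]
```

With these figures a Full Rate frame carries 260 bits, a Half Rate frame 112 bits, and an Enhanced Full Rate frame 244 bits. The bits freed by a lower AMR mode can be spent on stronger channel coding when radio conditions are poor.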
GSM Compression Techniques
• Linear Predictive Coding (LPC):
GSM uses LPC-based algorithms to predict the next sample of a speech
signal based on previous samples, reducing the amount of data needed for transmission.
It models the human vocal tract to efficiently represent speech.
• Speech Frames:
In GSM, compressed speech data is organized into 20 ms frames.
Each frame is transmitted over the network, with error detection and correction mechanisms
in place to maintain the integrity of the data.
• Error Resilience:
GSM employs techniques like Forward Error Correction (FEC) and error concealment
to enhance the robustness of speech transmission. This ensures that minor errors
due to poor signal conditions do not significantly degrade voice quality.
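The linear prediction idea described above can be illustrated with the autocorrelation method and the Levinson-Durbin recursion. This is a generic textbook sketch, not the analysis routine of any GSM codec: the function names are hypothetical, the test signal is synthetic, and the order is kept at 2 for readability (the Full Rate codec uses an 8th-order short-term predictor).

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [
        sum(x[n] * x[n - k] for n in range(k, len(x)))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with x[n] ~ sum a[k] * x[n - k].

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break             # signal is perfectly predicted already
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def prediction_residual(x, coeffs):
    """Residual e[n] = x[n] - sum coeffs[k] * x[n - 1 - k]."""
    res = []
    for n in range(len(x)):
        pred = sum(
            coeffs[k] * x[n - 1 - k]
            for k in range(len(coeffs))
            if n - 1 - k >= 0
        )
        res.append(x[n] - pred)
    return res


# Synthetic signal: each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
residual = prediction_residual(signal, coeffs)
```

For this signal the first coefficient comes out close to 0.9 and the residual is nearly zero after the first sample, which is the point of the technique: the decoder needs the coefficients and a compact description of the residual rather than every sample.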
GSM Physical Layer
The GSM physical layer processes speech through a chain of modules: speech coding, channel coding, interleaving, ciphering, burst assembly, and modulation. The speech coding block uses the 13 kbps RPE-LTP coder, a form of residual-excited linear prediction (RELP). The channel coding block applies a rate-1/2 convolutional code with a constraint length of 5 to the protected speech bits. The interleaving block performs diagonal interleaving: the 456 coded bits produced for each 20 ms frame are split into 57-bit sub-blocks.
This gives exactly 8 sub-blocks of 57 bits each (8 × 57 = 456). The ciphering block uses the A5 algorithm; A3 handles subscriber authentication and A8 derives the cipher key. The cipher key is typically renewed from call to call to enhance privacy. The burst assembly block builds each burst as required by the GSM frame structure. The burst is then Gaussian filtered and modulated: the modulation block uses GMSK with a bandwidth-time product (BT) of 0.3 to minimize the occupied bandwidth.
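The channel coding and sub-block arithmetic can be sketched for a Full Rate frame as follows. Several details here are assumptions rather than statements from the text: the generator polynomials (G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4), the split of the 260 speech bits into 182 protected and 78 unprotected bits, and the 3 parity plus 4 tail bits. The parity bits are passed in as placeholders, and the sub-block mapping is simplified to "bit k goes to sub-block k mod 8", without the permutation inside each sub-block.

```python
def conv_encode_half_rate(bits):
    """Rate-1/2, constraint-length-5 convolutional encoder.

    Assumed generators: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
    """
    state = [0, 0, 0, 0]   # state[0] = u(k-1), ..., state[3] = u(k-4)
    out = []
    for u in bits:
        out.append(u ^ state[2] ^ state[3])              # G0 output
        out.append(u ^ state[0] ^ state[2] ^ state[3])   # G1 output
        state = [u] + state[:3]
    return out


def encode_frame(class1_with_parity, class2_bits):
    """Build the 456 coded bits of one 20 ms frame.

    class1_with_parity: 185 bits (182 protected speech bits + 3 parity bits,
    assumed to be computed elsewhere). class2_bits: 78 unprotected bits.
    """
    protected = list(class1_with_parity) + [0, 0, 0, 0]   # 4 tail bits flush the encoder
    return conv_encode_half_rate(protected) + list(class2_bits)


def split_into_sub_blocks(coded_bits, n_blocks=8):
    """Simplified mapping: coded bit k goes to sub-block k mod n_blocks."""
    blocks = [[] for _ in range(n_blocks)]
    for k, bit in enumerate(coded_bits):
        blocks[k % n_blocks].append(bit)
    return blocks


impulse_response = conv_encode_half_rate([1, 0, 0, 0, 0])
coded = encode_frame([0] * 185, [0] * 78)
sub_blocks = split_into_sub_blocks(coded)
```

The bit count works out as (182 + 3 + 4) × 2 + 78 = 456, which divides into 8 sub-blocks of 57 bits. Spreading consecutive coded bits over different sub-blocks means that a burst lost on the radio path damages scattered bits rather than a contiguous run, which is what the convolutional decoder handles best.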
Benefits of GSM Speech Processing
• GSM’s speech processing techniques, especially with
adaptive codecs like AMR, allow for the efficient use of limited radio spectrum.
• Despite compression, GSM’s codecs maintain voice clarity, providing a
reliable communication experience even in challenging network conditions.
• The ability to use half-rate channels and adaptive codecs allows GSM to
support more users within the same bandwidth.
Conclusion
GSM speech processing combines linear-prediction codecs, channel coding, and error concealment to carry good-quality voice over limited bandwidth. With the introduction of adaptive codecs like AMR, GSM can dynamically trade voice quality against network capacity, which helped make it one of the most successful mobile standards globally.