DSP Tutorial - Architecture for complex DSP algorithm porting
This DSP tutorial page covers the factors to consider when implementing a DSP algorithm so that it suits the DSP architecture. It also covers the concepts of the multiplier, barrel shifter, MAC unit, ALU, on-chip memory, parallelism and pipelining. DSPs are widely used in baseband development for wireless technologies such as WiMAX, LTE and WLAN (802.11ac, 802.11ad).
A typical DSP has on-chip registers to store variables, data and intermediate results. It has on-chip memory, or external memory interfaced to the DSP, to store input and output signal vectors, and on-chip program memory, or external memory, to store code and constant data.
In today's complex communication systems such as LTE, WiMAX and CDMA, DSP algorithms usually need to run at high speed and produce accurate results to meet system requirements. To achieve this, the DSP architecture includes the following specialized features so that the DSP works efficiently.
Multiplier - Parallel and array multipliers are usually designed for DSP applications, with speed, accuracy and dynamic range as the main design considerations.
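To make the array multiplier idea concrete, the Python sketch below models an unsigned array multiplier as a set of partial products (one row per multiplier bit) that the adder array sums, plus a Q15 fractional multiply showing how a wider product register and rounding help preserve accuracy. It is an illustrative software model, not a hardware description; the function names, the 8-bit width and the Q15 format are assumptions chosen for the example.

```python
def array_multiply(a: int, b: int, width: int = 8) -> int:
    """Model of an unsigned array multiplier.

    Every partial product is formed at once (a gated by one bit of b and
    shifted into position); the adder array then sums all the rows.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Row i of the array: a if bit i of b is set, else 0, shifted left by i.
    partial_products = [(a if (b >> i) & 1 else 0) << i for i in range(width)]
    # In hardware an adder array/tree sums the rows; here we simply add them.
    return sum(partial_products)


def q15_multiply(x: int, y: int) -> int:
    """Multiply two Q15 fractions (int16 range) and round back to Q15."""
    product = x * y                      # Q30 result held in a wider register
    rounded = (product + (1 << 14)) >> 15  # round to nearest, back to Q15
    # Only (-1.0) * (-1.0) exceeds the Q15 range; clamp it to the maximum.
    return min(rounded, 32767)


print(array_multiply(13, 11))          # 143
print(q15_multiply(16384, 16384))      # 8192, i.e. 0.5 * 0.5 = 0.25 in Q15
```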
Barrel Shifter - A simple shifter needs one clock cycle to shift one bit left or right, so a multi-bit shift consumes many cycles. DSPs therefore include a special shifter, popularly known as a barrel shifter, which shifts by multiple bits in a single instruction cycle and so drastically reduces the cycle count.
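The cycle difference can be illustrated with a small Python model. The serial shifter spends one cycle per bit, while the barrel shifter is modelled as one common design, a logarithmic shifter with log2(width) multiplexer stages where stage k shifts by 2^k when bit k of the shift amount is set. Because the stages are combinational, the whole shift is counted as one cycle. The 32-bit width and function names are assumptions for illustration.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def serial_shift_left(value: int, amount: int) -> tuple[int, int]:
    """One-bit-per-cycle shifter. Returns (result, cycles used)."""
    cycles = 0
    for _ in range(amount):
        value = (value << 1) & MASK
        cycles += 1
    return value, cycles


def barrel_shift_left(value: int, amount: int) -> tuple[int, int]:
    """Logarithmic barrel shifter model. Returns (result, cycles used).

    Stage k shifts by 2**k if bit k of `amount` is set; for 32 bits there
    are 5 stages (shifts of 1, 2, 4, 8, 16), all within one cycle.
    """
    assert 0 <= amount < WIDTH, "shift amount must be 0..WIDTH-1"
    for stage in range(WIDTH.bit_length() - 1):
        if (amount >> stage) & 1:
            value = (value << (1 << stage)) & MASK
    return value, 1


print(serial_shift_left(1, 13))   # (8192, 13)
print(barrel_shift_left(1, 13))   # (8192, 1)
```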
MAC Unit - The MAC (multiply-accumulate) unit performs a multiplication and an accumulation in a single instruction cycle, because the multiplier and the accumulator operate in parallel. Carrying out 512 MAC operations requires 513 execution cycles. If one instruction cycle takes 100 ns, the total time required is about 513 × 100 × 10^-9 s = 51.3 µs.
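A minimal Python sketch of the MAC arithmetic: a dot product computed as one multiply-accumulate per coefficient, with a cycle count of N + 1 taken from the 512 MACs / 513 cycles figure above. The extra cycle is treated here as a fixed overhead; the page does not say what it is used for. The function names are hypothetical.

```python
def mac_dot_product(coeffs, samples):
    """Compute sum(c * x) with one MAC per term.

    Returns (accumulator, cycles). The cycle model is an assumption:
    one cycle per MAC plus one overhead cycle (512 MACs -> 513 cycles).
    """
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # multiply and accumulate: one MAC step
    cycles = len(coeffs) + 1
    return acc, cycles


def execution_time_s(cycles, cycle_time_s):
    """Total execution time for a cycle count and a per-cycle time."""
    return cycles * cycle_time_s


acc, cycles = mac_dot_product([1] * 512, [2] * 512)
print(cycles)                              # 513
print(execution_time_s(cycles, 100e-9))    # roughly 51.3 microseconds
```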
ALU - The Arithmetic Logic Unit is designed specifically for DSP operations, taking overflow, underflow and sign handling into consideration.
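One common way a DSP ALU handles overflow is saturation arithmetic: instead of wrapping around, a result that leaves the number range is clamped to the largest (or smallest) representable value, so its sign stays correct. The Python sketch below compares wrap-around and saturating 16-bit addition; the 16-bit width and the function names are assumptions for illustration.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def wrap_add16(a: int, b: int) -> int:
    """Plain two's-complement 16-bit add: overflow wraps around."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


def saturating_add16(a: int, b: int) -> int:
    """Saturating add: clamp to the int16 range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(wrap_add16(30000, 10000))        # -25536 (sign flipped by overflow)
print(saturating_add16(30000, 10000))  # 32767
```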
DSP algorithms also use special addressing modes such as circular addressing and bit-reversed addressing. Circular addressing handles a continuous stream of time-domain samples held in a circular buffer, as in a baseband receiver chain. Bit-reversed addressing is used to implement FFT/IFFT algorithms in complex communication baseband transmitter/receiver designs.
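Both addressing modes can be modelled in software to show what the address-generation hardware does: circular addressing wraps an index modulo the buffer length, and bit-reversed addressing produces the index order that an in-place radix-2 FFT needs for its input or output. A Python sketch under those assumptions; the class and function names are hypothetical.

```python
class CircularBuffer:
    """Fixed-size sample buffer; the write index wraps modulo the size,
    as circular addressing does in hardware."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.index = 0  # position of the next write

    def push(self, sample: int) -> None:
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)

    def latest(self, k: int) -> int:
        """Sample written k steps ago (k = 0 is the newest)."""
        return self.data[(self.index - 1 - k) % len(self.data)]


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bit_reversed_order(n: int) -> list[int]:
    """Bit-reversed index order for an n-point FFT (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```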
DSP architecture for Bus and memory
For example, suppose we need to execute the following instruction in a single cycle:
ADD A, B
To execute this in a single instruction cycle, the DSP needs a separate program memory and two data memories, each with its own address and data buses. The instruction fetch and both operand reads can then happen simultaneously, so the DSP fetches and executes the instruction in one instruction cycle.
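A toy bus model makes this argument concrete. It assumes each bus moves one word per cycle and that separate buses work in parallel, and it counts only the instruction fetch and the two operand reads of ADD A, B (the result write is ignored). The names are illustrative, not taken from any real DSP.

```python
def cycles_for_transfers(transfers, bus_of):
    """Cycles needed when each bus completes one transfer per cycle and
    different buses run in parallel: the busiest bus sets the count."""
    per_bus = {}
    for t in transfers:
        bus = bus_of[t]
        per_bus[bus] = per_bus.get(bus, 0) + 1
    return max(per_bus.values())


transfers = ["fetch ADD", "read A", "read B"]

# Single shared bus: all three transfers queue on the same bus.
shared_bus = {t: "shared" for t in transfers}

# Separate program bus and two data buses, as described above.
separate_buses = {"fetch ADD": "program", "read A": "data1", "read B": "data2"}

print(cycles_for_transfers(transfers, shared_bus))      # 3
print(cycles_for_transfers(transfers, separate_buses))  # 1
```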
On-chip memory - On-chip program memory is faster than off-chip memory, because accessing code or data in external memory requires de-multiplexing the shared address/data bus.
Parallelism - Parallelism means having multiple functional (arithmetic) units, so that computations on addresses and on data are done by separate units in parallel.
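Another example is designing the MAC hardware from a speed perspective: using several MAC units in parallel reduces the cycle count for code execution, although the number of MAC units that can be used is limited.
Pipelining - Pipelining overlaps the execution of successive instructions: while one instruction is being executed, the next one can be decoded and the one after it fetched. This speeds up the execution of the program.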
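The speed-up from pipelining can be quantified with the standard ideal-pipeline count: with k stages and n instructions and no stalls, a non-pipelined processor needs n·k cycles while a pipelined one needs k + n - 1 (k cycles to fill the pipeline, then one instruction completes per cycle). The Python sketch below computes both and builds a cycle-by-cycle occupancy table; the stage names are assumptions for illustration. As an aside, a 2-stage pipeline running 512 instructions gives 512 + 2 - 1 = 513 cycles, which is one plausible reading of the MAC figure earlier on this page, though the page does not state it.

```python
def non_pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through every stage before the next starts.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Ideal pipeline with no stalls: fill once, then one result per cycle.
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_timeline(n_instructions: int, stages: list[str]) -> list[dict[str, int]]:
    """Per cycle, which instruction occupies each stage.
    Instruction i is in stage s during cycle i + s."""
    timeline = []
    for cycle in range(pipelined_cycles(n_instructions, len(stages))):
        row = {}
        for s, name in enumerate(stages):
            i = cycle - s
            if 0 <= i < n_instructions:
                row[name] = i
        timeline.append(row)
    return timeline


for cycle, row in enumerate(pipeline_timeline(4, ["fetch", "decode", "execute"])):
    print(cycle, row)
print(non_pipelined_cycles(4, 3), pipelined_cycles(4, 3))  # 12 6
```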
The following parameters should be determined for each algorithm in order to decide which DSP architecture is best suited to the algorithm under development:
1. SNR range within which the algorithm works best.
2. Input and output data rate.
3. Memory size for data/variables.
4. Processing time or latency.
5. Code size or program size.
6. Power consumption of the module in the ASIC flow.
7. Type of operations, viz. arithmetic and logical.
8. Multiply and accumulate operations.
9. Scaling of the signal, i.e. upsampling or downsampling.
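Parameters 2, 4 and 8 combine into a simple real-time check: the cycles available per input sample are the clock rate divided by the sample rate, and the algorithm's MAC count per sample must fit in that budget. The Python sketch below assumes a hypothetical 600 MHz DSP clock and a hypothetical 64-tap FIR filter with one MAC per tap per sample; 30.72 Msps is the sampling rate used for a 20 MHz LTE channel. The function names are illustrative.

```python
import math

# Hypothetical figures for illustration only.
CLOCK_HZ = 600e6          # assumed DSP core clock
SAMPLE_RATE_HZ = 30.72e6  # sampling rate of a 20 MHz LTE channel


def cycles_per_sample_budget(clock_hz, sample_rate_hz):
    """Cycles available to process each input sample in real time."""
    return clock_hz / sample_rate_hz


def fits_real_time(cycles_needed_per_sample, clock_hz, sample_rate_hz):
    """True if the algorithm keeps up with the input data rate."""
    return cycles_needed_per_sample <= cycles_per_sample_budget(clock_hz, sample_rate_hz)


def mac_units_needed(macs_per_sample, clock_hz, sample_rate_hz):
    """Parallel MAC units needed if each unit completes one MAC per cycle."""
    return math.ceil(macs_per_sample / cycles_per_sample_budget(clock_hz, sample_rate_hz))


taps = 64  # hypothetical FIR length: 64 MACs per output sample
print(cycles_per_sample_budget(CLOCK_HZ, SAMPLE_RATE_HZ))  # 19.53125
print(fits_real_time(taps, CLOCK_HZ, SAMPLE_RATE_HZ))      # False with one MAC unit
print(mac_units_needed(taps, CLOCK_HZ, SAMPLE_RATE_HZ))    # 4
```

Under these assumptions a single MAC unit cannot keep up, which is where the parallel MAC units described above come in.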