1525 μm × 1525 μm
Turbo codes are used in various communication standards due to their excellent error-rate performance. One of the throughput bottlenecks of turbo decoders is the M-BCJR algorithm, which forms the core processing unit.
In this project, an 8-state radix-2 max-log M-BCJR architecture designed in a previous semester project has been optimized for high throughput. A radix-4 architecture has been chosen, which processes two data items per clock cycle (compared to one data item per clock cycle for the corresponding radix-2 architecture). Additionally, all memories have been realized as arrays of latches, which require less circuit area than SRAM macro-cells without limiting the maximum clock frequency.
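The radix-4 idea can be illustrated with a small software sketch of a max-log forward state-metric recursion: one radix-4 update merges two consecutive radix-2 updates, so each state compares four candidate paths instead of two. This is only an illustration under stated assumptions, not the architecture implemented in the project: the shift-register trellis in `next_state`, the integer branch metrics, and all function names are hypothetical.

```python
import random

NUM_STATES = 8  # 8-state trellis, as in the text


def next_state(state, bit):
    # Hypothetical shift-register trellis (assumption for illustration only;
    # the trellis of the actual decoder is not given in the text).
    return ((state << 1) | bit) & (NUM_STATES - 1)


def radix2_step(alpha, gamma):
    """One max-log update: processes one data item.

    alpha[s]    -- state metric of state s
    gamma[s][u] -- branch metric for leaving state s with input bit u
    """
    new_alpha = [float("-inf")] * NUM_STATES
    for s in range(NUM_STATES):
        for u in (0, 1):
            t = next_state(s, u)
            new_alpha[t] = max(new_alpha[t], alpha[s] + gamma[s][u])
    return new_alpha


def radix4_step(alpha, gamma_first, gamma_second):
    """One merged max-log update: processes two data items at once.

    Each target state now selects the best of four two-branch paths.
    """
    new_alpha = [float("-inf")] * NUM_STATES
    for s in range(NUM_STATES):
        for u0 in (0, 1):
            mid = next_state(s, u0)
            for u1 in (0, 1):
                t = next_state(mid, u1)
                metric = alpha[s] + gamma_first[s][u0] + gamma_second[mid][u1]
                new_alpha[t] = max(new_alpha[t], metric)
    return new_alpha


def random_branch_metrics(rng):
    # Integer metrics stand in for fixed-point hardware values (assumption).
    return [[rng.randint(-20, 20) for _ in (0, 1)] for _ in range(NUM_STATES)]


rng = random.Random(0)
alpha0 = [rng.randint(-50, 50) for _ in range(NUM_STATES)]
g0, g1 = random_branch_metrics(rng), random_branch_metrics(rng)

two_radix2_steps = radix2_step(radix2_step(alpha0, g0), g1)
one_radix4_step = radix4_step(alpha0, g0, g1)
print(two_radix2_steps == one_radix4_step)
```

Both computations yield the same state metrics; the radix-4 form simply does the work of two trellis steps in a single update, which is what allows two data items per clock cycle at the cost of wider compare-select logic.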
The resulting throughput-optimized 8-state radix-4 max-log M-BCJR decoder has been implemented in 180 nm 1P/6M CMOS technology. It achieves a sustained decoding throughput of 700 Mbit/s at 350 MHz and requires a circuit area of 0.79 mm². Compared to the reference radix-2 M-BCJR implementation (area 0.36 mm² at 375 MHz), the radix-4 implementation is slightly less efficient in terms of area per throughput, but achieves twice the throughput of the reference implementation at the same clock frequency.
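The area-per-throughput comparison can be checked with a few lines of arithmetic. The sketch below uses the figures quoted above; the radix-2 throughput is not stated in the text and is assumed here to be one bit per clock cycle (375 Mbit/s at 375 MHz). The function and variable names are illustrative.

```python
def throughput_mbps(clock_mhz, bits_per_cycle):
    # Sustained throughput in Mbit/s for a given clock and bits per cycle.
    return clock_mhz * bits_per_cycle


def area_per_throughput(area_mm2, mbps):
    # Area cost in mm^2 per Mbit/s (lower is better).
    return area_mm2 / mbps


# Radix-4: figures from the text (0.79 mm^2, 700 Mbit/s at 350 MHz).
r4_throughput = throughput_mbps(350.0, 2)
r4_cost = area_per_throughput(0.79, r4_throughput)

# Radix-2 reference: 0.36 mm^2 at 375 MHz from the text.
# Assumption: one bit per clock cycle, i.e. 375 Mbit/s.
r2_throughput = throughput_mbps(375.0, 1)
r2_cost = area_per_throughput(0.36, r2_throughput)

print(f"radix-4: {r4_cost * 1000:.2f} mm^2 per Gbit/s")
print(f"radix-2: {r2_cost * 1000:.2f} mm^2 per Gbit/s")
print(f"radix-4 / radix-2 area cost: {r4_cost / r2_cost:.2f}")

# At the same clock frequency, radix-4 doubles the throughput.
same_clock_gain = throughput_mbps(350.0, 2) / throughput_mbps(350.0, 1)
print(f"throughput gain at equal clock: {same_clock_gain:.1f}x")
```

Under this assumption the radix-4 design needs roughly 18% more area per unit of throughput than the radix-2 reference, which is consistent with the "slightly less efficient" statement above.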