| Parameter | Value |
|---|---|
| Application | Communication |
| Technology | 180 nm |
| Manufacturer | UMC |
| Type | Research Project |
| Package | LCC84 |
| Dimensions | 2500 μm × 2500 μm |
| Gates | 5e+19 |
| Voltage | 1.8 V |
| Clock | 152 MHz |
The QR decomposition is a key preprocessing step in multiple-input multiple-output (MIMO) communication systems: numerous advanced MIMO detection algorithms require the QR decomposition of the channel matrix as a starting point. This chip contains several modules related to QR decomposition:
VLSI Architectures for MMSE Sorted QR Decomposition
The aim of this project was to evaluate and implement different VLSI architectures for minimum mean squared error (MMSE) sorted QR decomposition (SQRD). Three of these architectures were selected for integration: a high-speed dual-core variant and two low-speed, iteratively decomposed variants that require little silicon area. The architectures were compared in terms of hardware complexity, numerical precision, unified throughput, and power consumption.
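To make the term concrete, here is a minimal Python sketch of the commonly described MMSE-SQRD algorithm. It runs a sorted modified Gram-Schmidt process on the extended matrix [H; σI]. At each step, the remaining column with the smallest norm is processed next. It is an assumption that the architectures on this chip compute this same algorithm. The function name `mmse_sqrd`, the list-based matrix layout, and the noise parameter `sigma` are hypothetical choices for illustration and are not taken from the design.

```python
import math


def mmse_sqrd(H, sigma):
    """Sorted MMSE QR decomposition via modified Gram-Schmidt (illustrative sketch).

    H: channel matrix as a list of rows (n_rx x n_tx), real or complex entries.
    sigma: noise standard deviation used to build the MMSE-extended matrix.
    Returns (Q, R, perm): Q as a list of columns of the extended (n_rx + n_tx)-row
    matrix, R as a list of rows (upper triangular), perm the column ordering.
    """
    n_rx, n_tx = len(H), len(H[0])
    # Columns of the extended matrix [H; sigma * I]
    Q = []
    for j in range(n_tx):
        col = [complex(H[i][j]) for i in range(n_rx)]
        col += [complex(sigma) if k == j else 0j for k in range(n_tx)]
        Q.append(col)
    R = [[0j] * n_tx for _ in range(n_tx)]
    perm = list(range(n_tx))
    norms = [sum(abs(x) ** 2 for x in col) for col in Q]

    for i in range(n_tx):
        # Sorting step: process the remaining column with the smallest norm next
        k = min(range(i, n_tx), key=lambda l: norms[l])
        Q[i], Q[k] = Q[k], Q[i]
        perm[i], perm[k] = perm[k], perm[i]
        norms[i], norms[k] = norms[k], norms[i]
        for row in R:  # only rows < i are non-zero at this point
            row[i], row[k] = row[k], row[i]

        # Normalize the selected column
        R[i][i] = complex(math.sqrt(norms[i]))
        Q[i] = [x / R[i][i] for x in Q[i]]
        for l in range(i + 1, n_tx):
            # Remove the q_i component from each remaining column
            R[i][l] = sum(a.conjugate() * b for a, b in zip(Q[i], Q[l]))
            Q[l] = [b - R[i][l] * a for a, b in zip(Q[i], Q[l])]
            norms[l] -= abs(R[i][l]) ** 2
    return Q, R, perm
```

In this formulation, the columns of the extended matrix, permuted by `perm`, equal the product of the extended Q and R. The weakest streams end up in the first columns, so they are detected last in a subsequent successive-interference-cancellation step.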
In addition, the dual-core MMSE-SQRD architecture was designed and optimized with system-level aspects in mind, for future deployment in the MASCOT testbed. It is designed to run at a clock frequency of 152 MHz and delivers a sustained throughput of 3.8 million decompositions of complex-valued 4×4 matrices per second. In an IEEE 802.11n scenario with 52 data subcarriers, this corresponds to a total preprocessing latency of 13.7 μs. One core of the dual-core architecture requires 50 k gate equivalents (GE), while the two small iterative cores need 17 k GE and 13 k GE, respectively.
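The timing figures for the dual-core variant follow from simple arithmetic on the stated clock frequency and throughput. The sketch below assumes that all 52 subcarrier matrices are decomposed back-to-back at the sustained rate. The constant names are illustrative.

```python
CLOCK_HZ = 152e6      # dual-core clock frequency (from the text)
THROUGHPUT = 3.8e6    # complex 4x4 matrix decompositions per second (from the text)
SUBCARRIERS = 52      # IEEE 802.11n data subcarriers (from the text)

# Average number of clock cycles between two completed decompositions
cycles_per_qrd = CLOCK_HZ / THROUGHPUT
# Time to decompose one channel matrix per data subcarrier at the sustained rate
latency_s = SUBCARRIERS / THROUGHPUT

print(f"{cycles_per_qrd:.0f} cycles per decomposition")
print(f"{latency_s * 1e6:.1f} us for {SUBCARRIERS} subcarriers")
```

This gives 40 clock cycles per decomposition on average and about 13.7 μs for all subcarriers. If the two cores work on independent matrices in parallel, which is an assumption not stated in the text, each core would spend roughly 80 cycles per decomposition.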
Iterative Decomposed MMSE Sorted QR Decomposition
The task of this semester project was to evaluate and implement an iteratively decomposed MMSE sorted QR decomposition (MMSE-SQRD), optimized primarily for small area and secondarily for reasonable throughput. The circuit is an alternative to the existing VLSI implementation of the high-speed MMSE-SQRD architecture [D11], which allows the two architectures to be compared with respect to unified throughput and hardware complexity.
The architecture uses a single memory to store the matrix to be decomposed and needs 3052 clock cycles to perform a complete matrix decomposition.
The design was integrated in UMC 0.18 μm 1P/6M CMOS technology and runs at a clock frequency of 188 MHz. It requires 10.2 k gate equivalents, corresponding to a core area of 0.16 mm². The achieved throughput of this MMSE-SQRD solution is about 62,000 matrix decompositions per second.
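The stated throughput of the iterative design can be checked against its cycle count. It is simply the clock frequency divided by the number of cycles per decomposition, assuming that decompositions are processed one after another with no overlap. The constant names are illustrative.

```python
CLOCK_HZ = 188e6       # clock frequency of the iterative design (from the text)
CYCLES_PER_QRD = 3052  # clock cycles per complete matrix decomposition (from the text)

# Decompositions per second when one matrix is processed at a time
throughput = CLOCK_HZ / CYCLES_PER_QRD
print(f"{throughput:,.0f} decompositions per second")
```

The result, 61,599 decompositions per second, matches the rounded figure of 62,000 given above.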