
| Property | Value |
|---|---|
| Application | Communication |
| Technology | 180 nm |
| Manufacturer | UMC |
| Type | Research Project |
| Package | LCC84 |
| Dimensions | 2500 μm × 2500 μm |
| Gates | 5e+19 |
| Voltage | 1.8 V |
| Clock | 152 MHz |

The QR decomposition is a key preprocessing algorithm for multiple-input multiple-output (MIMO) communication systems. Numerous advanced MIMO detection algorithms require the QR decomposition of the channel matrix as a starting point. This chip contains several modules related to QR decomposition:

**VLSI Architectures for MMSE Sorted QR Decomposition**

The aim of this project was the evaluation and implementation of different VLSI architectures for minimum mean squared error (MMSE) sorted QR decomposition (SQRD). A subset of three different architectures has been chosen for integration: a high-speed dual-core variant, and two low-speed iterative decomposed variants that require small silicon area. These architectures have been compared in terms of hardware complexity, numeric precision, unified throughput, and power consumption.
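As a software point of reference, the MMSE-SQRD operation can be sketched with a modified Gram-Schmidt procedure applied to the channel matrix extended by a scaled identity, where each step processes the remaining column with the smallest norm first. This is a common textbook formulation and not a description of the chip's datapath; the function name `mmse_sqrd`, the noise parameter `sigma`, and the column-wise storage are assumptions made for illustration.

```python
import math


def mmse_sqrd(H, sigma):
    """Sorted QR decomposition of the extended matrix [H; sigma*I].

    H is a list of rows (n_rx x n_tx) of complex numbers and sigma > 0 is
    the noise standard deviation (illustrative parameter name).
    Returns (Q, R, perm): Q as a list of n_tx columns of length n_rx + n_tx,
    R as an n_tx x n_tx upper-triangular list of rows, and perm such that
    Q * R equals the extended matrix with its columns reordered by perm.
    """
    n_rx, n_tx = len(H), len(H[0])

    # Build the extended matrix column by column: H on top, sigma*I below.
    cols = [
        [complex(H[r][c]) for r in range(n_rx)]
        + [complex(sigma) if r == c else 0j for r in range(n_tx)]
        for c in range(n_tx)
    ]
    R = [[0j] * n_tx for _ in range(n_tx)]
    perm = list(range(n_tx))
    # Squared column norms, tracked only to decide the processing order.
    norms = [sum(abs(x) ** 2 for x in col) for col in cols]

    for i in range(n_tx):
        # Sorting step: pick the remaining column with the smallest norm.
        k = min(range(i, n_tx), key=lambda l: norms[l])
        cols[i], cols[k] = cols[k], cols[i]
        norms[i], norms[k] = norms[k], norms[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R:
            row[i], row[k] = row[k], row[i]

        # Normalize column i; sigma > 0 keeps the extended matrix full rank,
        # so the norm is strictly positive.
        r_ii = math.sqrt(sum(abs(x) ** 2 for x in cols[i]))
        R[i][i] = complex(r_ii)
        cols[i] = [x / r_ii for x in cols[i]]

        # Remove the component along column i from all remaining columns.
        for l in range(i + 1, n_tx):
            r_il = sum(a.conjugate() * b for a, b in zip(cols[i], cols[l]))
            R[i][l] = r_il
            cols[l] = [b - r_il * a for a, b in zip(cols[i], cols[l])]
            norms[l] -= abs(r_il) ** 2

    return cols, R, perm
```

For a 4x4 complex channel matrix, `mmse_sqrd(H, sigma)` returns four orthonormal columns of length 8, a 4x4 upper-triangular `R` with a real positive diagonal, and the column order chosen by the sorting step.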

Moreover, the dual-core MMSE-SQRD architecture has been designed and optimized with system-level aspects in mind for future deployment in the MASCOT testbed. The high-speed dual-core variant is designed to run at a clock frequency of 152 MHz, delivering a sustained throughput of 3.8 M complex-valued 4×4 matrix decompositions per second. In an IEEE 802.11n scenario employing 52 data subcarriers, this would lead to a total preprocessing latency of 13.7 μs. One core of the dual-core architecture requires 50 k gate equivalents (GE); the two small cores need 17 k GE and 13 k GE, respectively.
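The latency figure follows directly from the quoted throughput. The short calculation below reproduces it, assuming one decomposition per data subcarrier and that the 3.8 M decompositions per second is the aggregate rate of both cores; the constant and function names are illustrative only.

```python
CLOCK_HZ = 152e6          # clock frequency of the dual-core variant
DECOMPS_PER_S = 3.8e6     # sustained 4x4 decompositions per second
DATA_SUBCARRIERS = 52     # IEEE 802.11n data subcarriers in this scenario


def cycles_per_decomposition(clock_hz, decomps_per_s):
    """Average clock cycles spent per matrix at the sustained rate."""
    return clock_hz / decomps_per_s


def preprocessing_latency_s(n_subcarriers, decomps_per_s):
    """Time to decompose one channel matrix per subcarrier, in seconds."""
    return n_subcarriers / decomps_per_s


cycles = cycles_per_decomposition(CLOCK_HZ, DECOMPS_PER_S)
latency = preprocessing_latency_s(DATA_SUBCARRIERS, DECOMPS_PER_S)
print(f"about {cycles:.0f} cycles per decomposition (both cores combined)")
print(f"about {latency * 1e6:.1f} microseconds for {DATA_SUBCARRIERS} subcarriers")
```

Under these assumptions the result is roughly 40 clock cycles per decomposition and a preprocessing latency in the range of microseconds, not milliseconds.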

**Iterative Decomposed MMSE Sorted QR Decomposition**

The task of this semester project was the evaluation and implementation of an iteratively decomposed MMSE sorted QR decomposition (MMSE-SQRD), with small area as the primary objective and reasonable throughput as the secondary one. The circuit is an alternative to the existing VLSI implementation of the high-speed MMSE-SQRD architecture [D11], which allows the two architectures to be compared with respect to unified throughput and hardware complexity.

The architecture has a single memory for storing the matrix to be decomposed. It needs 3052 cycles to perform the complete matrix decomposition.

The design was integrated in UMC 0.18 μm 1P/6M CMOS technology and runs at a clock frequency of 188 MHz. It requires 10.2 k gate equivalents, corresponding to 0.16 mm² core area in 180 nm technology. The achieved throughput of this MMSE-SQRD solution amounts to 62,000 matrix decompositions per second.
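The throughput of the iterative core can be checked from the cycle count and the clock frequency. The sketch below assumes decompositions run back to back with no idle cycles in between; the names are illustrative.

```python
CYCLES_PER_DECOMPOSITION = 3052   # cycles for one complete decomposition
CLOCK_HZ = 188e6                  # clock frequency of the iterative core


def decompositions_per_second(clock_hz, cycles):
    """Throughput when matrices are processed back to back."""
    return clock_hz / cycles


def time_per_decomposition_s(clock_hz, cycles):
    """Duration of a single decomposition in seconds."""
    return cycles / clock_hz


rate = decompositions_per_second(CLOCK_HZ, CYCLES_PER_DECOMPOSITION)
duration = time_per_decomposition_s(CLOCK_HZ, CYCLES_PER_DECOMPOSITION)
print(f"about {rate / 1e3:.1f} k decompositions per second")
print(f"about {duration * 1e6:.1f} microseconds per decomposition")
```

The computed rate is slightly below 62 k decompositions per second, consistent with the rounded figure quoted above.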

- Peter Luethi, Andreas Burg, Simon Haene, David Perels, Norbert Felber, Wolfgang Fichtner, "VLSI Implementation of a High-Speed Iterative Sorted MMSE QR Decomposition", *IEEE International Symposium on Circuits and Systems, ISCAS 2007, pp. 1421-1424*, **DOI:** 10.1109/ISCAS.2007.378495

Created by make_cg.pl on Wed Apr 9 08:48:30 2014