The IIS Chip Gallery

Occamy (2022)


Gianna Paulin, Florian Zaruba, Stefan Mach, Manuel Eggimann, Matheus Cavalcante, Paul Scheffler, Yichao Zhang, Tim Fischer, Nils Wistoff, Luca Bertaccini, Thomas Benz, Luca Colagrande, Alfio Di Mauro, Andreas Kurth, Samuel Riedel, Noah Huetter, Gianmarco Ottavi, Zerun Jiang, Beat Muheim, Frank K. Gurkaynak, Davide Rossi, Luca Benini

Main Details

Application	Pulp
Technology	12 nm
Manufacturer	GF
Type	Research
Package	Custom
Dimensions	10500μm x 6950μm
Gates	600 MGE
Voltage	1.2 V
Power	10 W @ 1 GHz
Clock	1 GHz

Description

Occamy is a research prototype to demonstrate and explore the scalability, performance, and efficiency of our RISC-V-based architecture in a 2.5D integrated chiplet system showcasing GlobalFoundries' technologies and its IP ecosystem, as well as Rambus' and Micron's IP ecosystem.

The Occamy project started as a serendipitous outcome of the Manticore high-performance architecture concept we presented at the Hot Chips conference in 2020. After Hot Chips 2020, GlobalFoundries approached the PULP Platform team with an exciting proposal to turn the concept architecture into a real silicon design. The project was made possible by the generous contribution and strong support of GlobalFoundries (technology access, expert advice, ecosystem enablement, and silicon budget), Rambus (HBM2e controller IP and integration support), Micron (HBM2e DRAM supply and integration support), Synopsys (EDA tool licenses and support), and Avery (HBM2e DRAM verification model). We kick-started the Occamy project on the 20th of April 2021 and taped out the Occamy compute chiplet in GlobalFoundries 12nm FinFET technology in July 2022, after less than 15 months of hard work with a team of fewer than 25 people, mostly doctoral students.

In this work, we combine a small and super-efficient, in-order, 32-bit RISC-V integer core called Snitch with a large multi-precision floating-point unit (FPU) enhanced with single instruction multiple data (SIMD) capabilities for the following FP formats, given as (exponent bits, mantissa bits): FP64 (11,52), FP32 (8,23), FP16 (5,10), FP16alt (8,7), FP8 (5,2), FP8alt (4,3). In addition to the standard RISC-V fused multiply-accumulate (FMA) instructions, the two 8-bit and two 16-bit FP formats support new expanding sum-dot-product and three-addend summation instructions (exsdotp, exvsum, and vsum).
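The (exponent, mantissa) pairs above determine each format's storage width and, in a packed-SIMD unit, how many elements fit in one register. The following is a minimal Python sketch, assuming one sign bit per format and a 64-bit FP register width (implied by FP64 support but not stated explicitly on this page). The names `FORMATS`, `total_bits`, and `simd_lanes` are illustrative only.

```python
# (exponent bits, mantissa bits) for each format, as listed in the text.
FORMATS = {
    "FP64": (11, 52),
    "FP32": (8, 23),
    "FP16": (5, 10),
    "FP16alt": (8, 7),
    "FP8": (5, 2),
    "FP8alt": (4, 3),
}

# Assumption: the FP register is as wide as the widest supported format.
REGISTER_BITS = 64


def total_bits(exp_bits, man_bits):
    """Storage width: one sign bit plus exponent and mantissa fields."""
    return 1 + exp_bits + man_bits


def simd_lanes(fmt):
    """Number of elements of a format that pack into one register."""
    exp_bits, man_bits = FORMATS[fmt]
    return REGISTER_BITS // total_bits(exp_bits, man_bits)


for name, (e, m) in FORMATS.items():
    print(f"{name:8s} width={total_bits(e, m):2d} bits  lanes={simd_lanes(name)}")
```

Under these assumptions, the "alt" variants have the same width as their standard counterparts and differ only in how the bits are split between exponent (range) and mantissa (precision).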

To achieve ultra-efficient computation on data-parallel FP workloads, two custom architectural extensions are exploited: data-prefetchable register file entries and repetition buffers. The corresponding RISC-V ISA extensions, stream semantic registers (SSRs) and FP repetition instructions (FREP), enable the Snitch core to achieve FPU utilization higher than 90% for compute-bound kernels.
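To see why removing explicit loads and loop bookkeeping matters, consider a toy issue-slot model of a dot-product inner loop on a single-issue, in-order core. The per-element instruction counts below (two loads, one FMA, one address update, one branch) are illustrative assumptions, not measurements from Occamy, and the model ignores stalls and setup cost.

```python
def fpu_utilization(fp_ops, loads, loop_overhead):
    """Fraction of issue slots spent on FP compute in a single-issue core."""
    return fp_ops / (fp_ops + loads + loop_overhead)


# Hypothetical per-element instruction mix for a dot product.
FP_OPS = 1         # one fused multiply-add
LOADS = 2          # one explicit load per operand
LOOP_OVERHEAD = 2  # address update + branch

# Baseline: every instruction occupies an issue slot.
baseline = fpu_utilization(FP_OPS, LOADS, LOOP_OVERHEAD)

# With SSRs: operands stream in through registers, so explicit loads vanish.
with_ssr = fpu_utilization(FP_OPS, 0, LOOP_OVERHEAD)

# With SSRs and FREP: the FMA is replayed from a repetition buffer,
# so the loop bookkeeping no longer occupies issue slots either.
with_ssr_frep = fpu_utilization(FP_OPS, 0, 0)

print(f"baseline:   {baseline:.0%}")
print(f"SSR only:   {with_ssr:.0%}")
print(f"SSR + FREP: {with_ssr_frep:.0%}")
```

The idealized model reaches full utilization once both overheads are removed; in a real kernel, effects the model leaves out (such as stream setup and memory contention) would pull the figure below that bound.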

Each Occamy chiplet contains more than 216 Snitch cores organized in groups of four compute clusters. Each cluster shares a tightly-coupled memory among eight compute cores and a high-bandwidth (512-bit) DMA-enhanced core orchestrating the data flow. An AXI-based wide, multi-stage interconnect and dedicated DMA engines help manage the massive on-chip bandwidth. A CVA6 Linux-capable RISC-V core manages all compute clusters and system peripherals. Each chiplet has a private 16GB high-bandwidth memory (HBM2e) and can communicate with a neighboring chiplet over a wide, 19.5 GB/s, source-synchronous, technology-independent die-to-die DDR link. The dual-chiplet Occamy system achieves an estimated peak performance of 0.768 TFLOp/s for FP64, 1.536 TFLOp/s for FP32, 3.072 TFLOp/s for FP16/FP16alt, and 6.144 TFLOp/s for FP8/FP8alt.
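The peak figures can be cross-checked with back-of-the-envelope arithmetic. The sketch below assumes exactly 216 Snitch cores per chiplet in nine-core clusters (eight compute cores plus one DMA core, which gives 24 clusters), that only the compute cores contribute FLOPs, and that each compute core issues one FMA (2 FLOPs) per SIMD lane per cycle on a 64-bit datapath at 1 GHz. These are inferences from the numbers on this page, not stated design parameters.

```python
CHIPLETS = 2
CORES_PER_CHIPLET = 216          # assumption: exact Snitch core count per chiplet
CORES_PER_CLUSTER = 9            # 8 compute cores + 1 DMA core
COMPUTE_CORES_PER_CLUSTER = 8
CLOCK_HZ = 1_000_000_000         # 1 GHz, from the details table
FLOPS_PER_FMA = 2                # one multiply + one add
DATAPATH_BITS = 64               # assumption: 64-bit SIMD FP datapath

clusters_per_chiplet = CORES_PER_CHIPLET // CORES_PER_CLUSTER
compute_cores = CHIPLETS * clusters_per_chiplet * COMPUTE_CORES_PER_CLUSTER


def peak_tflops(element_bits):
    """Peak throughput in TFLOP/s for a given FP element width."""
    lanes = DATAPATH_BITS // element_bits
    return compute_cores * lanes * FLOPS_PER_FMA * CLOCK_HZ / 1e12


print(f"clusters per chiplet: {clusters_per_chiplet}")
print(f"compute cores (dual-chiplet): {compute_cores}")
for bits in (64, 32, 16, 8):
    print(f"FP{bits}: {peak_tflops(bits):.3f} TFLOP/s")
```

With these assumptions the model gives 384 compute cores across both chiplets and reproduces the four quoted peak values, each halving of the element width doubling the throughput.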

Related Publication

Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gurkaynak, Davide Rossi, Luca Benini, "Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-Based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET", In Proc. IEEE Symposium on VLSI Technology and Circuits 2024, Honolulu, HI, USA, 2024, pp. 1-2, DOI: 10.1109/VLSITechnologyandCir46783.2024.10631529
