Using CFD simulations to remove 17.6 kW of heat from the world’s largest chip

By Katie Tormala

Ever wonder how smart assistants like Siri or Alexa can understand what you’re saying and come up with an answer so quickly? Like self-driving cars, these assistants rely on powerful machine learning systems and natural language processing. Engineers train a computer to accomplish specific tasks by having it process large amounts of data and recognize patterns in that data. As consumer adoption of these technologies continues, demand for high-level artificial intelligence (AI) will also grow.

High-level artificial intelligence requires powerful computing, and powerful computing generates heat. So, how do you evaluate options to optimize the removal of 17.6 kW of heat from a chip over 20 cm on a side containing 84 integrated circuit (IC) sites? In the webinar Thermal management for AI hardware and electronics cooling for a deep learning machine, Guy Wagner from Electronics Cooling Solutions discusses how Simcenter Flotherm XT computational fluid dynamics (CFD) simulations helped the team evaluate cooling options and determine the best solution.

Evaluating traditional options for air cooling a large, hot chip

First, the team developed quick simulations to assess whether an air-cooled solution could stay within the model constraints. Assuming a reasonable temperature rise, the team simulated a heat sink with 2,000 CFM (cubic feet per minute) of airflow.
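To get a feel for the scale of that airflow, here is a minimal back-of-the-envelope sketch of the bulk energy balance, ΔT = Q / (ρ · V̇ · cp). It is not the team's CFD model. The air density and specific heat are assumed textbook values near room temperature, and the function name is hypothetical.

```python
# Rough energy balance for air cooling: dT = Q / (rho * V_dot * cp).
# Air properties are assumed textbook values near room temperature,
# not figures from the webinar.

CFM_TO_M3_PER_S = 0.3048**3 / 60.0  # 1 ft^3/min expressed in m^3/s


def air_temperature_rise(heat_w, airflow_cfm, rho=1.2, cp=1005.0):
    """Return the bulk air temperature rise in kelvin.

    heat_w      -- heat load in watts
    airflow_cfm -- volumetric airflow in cubic feet per minute
    rho         -- assumed air density in kg/m^3
    cp          -- assumed specific heat of air in J/(kg*K)
    """
    mass_flow = rho * airflow_cfm * CFM_TO_M3_PER_S  # kg/s
    return heat_w / (mass_flow * cp)


rise = air_temperature_rise(17_600, 2_000)
print(f"Bulk air temperature rise: {rise:.1f} K")
```

Under these assumptions the air warms by roughly 15 K on average. This estimate says nothing about the heat sink's own thermal resistance or about temperature uniformity across the IC sites, which is where a full CFD simulation is needed.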

Why air cooling wouldn’t work:

  • Deafening noise levels
  • Unable to fit the fan configuration in a standard 19” rack
  • Unable to keep the IC sites within a narrow temperature range (not even close)

So, the team at Electronics Cooling Solutions looked at the next viable option – liquid cooling.

Using CFD simulations to review options for liquid cooling a large, hot chip

The team quickly ruled out air cooling and started building liquid-cooled simulations. In this example, a coolant picks up heat at the cold plate and flows to the heat exchanger (HEX), where the heat is removed. The coolant then flows through the pump and back to the cold plate at the heat source. The loop has three main components:

  1. Cold plate: removes the heat load from the electronics at the required flow rate
  2. Pump: provides the proper pressure and flow rate for the system
  3. Air-to-liquid HEX: removes heat from the coolant at the available airflow and liquid coolant flow rate
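The "required flow rate" for the cold plate can be estimated from the same energy balance, Q = ṁ · cp · ΔT. The sketch below assumes plain water as the coolant and tries a few illustrative coolant temperature rises; neither the coolant properties nor the ΔT values come from the webinar, and the function name is hypothetical.

```python
# Energy balance on the coolant loop: Q = m_dot * cp * dT.
# Water properties and the candidate temperature rises are
# illustrative assumptions, not values from the webinar.


def coolant_flow_l_per_min(heat_w, delta_t_k, rho=997.0, cp=4180.0):
    """Volumetric coolant flow (L/min) needed to absorb heat_w
    while the coolant warms by delta_t_k kelvin."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow = heat_w / (cp * delta_t_k)   # kg/s
    return mass_flow / rho * 1000.0 * 60.0  # m^3/s -> L/s -> L/min


for dt in (5.0, 10.0, 15.0):
    flow = coolant_flow_l_per_min(17_600, dt)
    print(f"dT = {dt:4.1f} K -> {flow:.1f} L/min")
```

Halving the allowed coolant temperature rise doubles the required flow, which in turn raises the pressure the pump has to deliver through the cold plate's micro-channels. That trade-off is one reason the pump, cold plate, and HEX have to be sized together.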

Once the team set up the potential solution, they ran simulations to test it against the required constraints. Simcenter Flotherm XT modeled the fluid flow through the entire module (including all the micro-channels).

Using CFD simulations to cool the largest chip ever built

What was the team trying to cool? Here are a few stats about the Cerebras Wafer Scale Engine:

  • 46,225 mm² silicon
  • 1.2 trillion transistors
  • 400,000 AI-optimized cores
  • 18 Gigabytes of on-chip memory
  • 9 Petabytes/s of memory bandwidth
  • 100 Petabits/s of fabric bandwidth
  • TSMC 16 nm process
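Combining the silicon area with the 17.6 kW heat load and the 84 IC sites mentioned earlier gives a sense of the thermal challenge. The short sketch below is simple arithmetic on the article's numbers; it assumes a square die and an even split of power across sites, neither of which the article states.

```python
import math

DIE_AREA_MM2 = 46_225  # silicon area from the stats above
HEAT_W = 17_600        # 17.6 kW heat load from the article
IC_SITES = 84          # number of IC sites from the article

# Assumption: the die is square, so the side is the square root of the area.
side_mm = math.sqrt(DIE_AREA_MM2)

# Average heat flux over the silicon (100 mm^2 = 1 cm^2).
flux_w_per_cm2 = HEAT_W / (DIE_AREA_MM2 / 100.0)

# Assumption: power is spread evenly across all sites.
power_per_site_w = HEAT_W / IC_SITES

print(f"Die side:         {side_mm:.0f} mm")
print(f"Average flux:     {flux_w_per_cm2:.1f} W/cm^2")
print(f"Power per site:   {power_per_site_w:.0f} W")
```

The side works out to 215 mm, consistent with a chip "over 20 cm on a side", with an average flux of about 38 W/cm² and roughly 210 W per site if the load were uniform.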

Are you interested in the details of the simulations? Check out the on-demand webinar: Thermal management for AI hardware and electronics cooling for a deep learning machine.


This article first appeared on the Siemens Digital Industries Software blog at