{"id":50006,"date":"2023-06-13T13:58:53","date_gmt":"2023-06-13T17:58:53","guid":{"rendered":"https:\/\/blogs.sw.siemens.com\/simcenter\/?p=50006"},"modified":"2026-03-26T06:32:40","modified_gmt":"2026-03-26T10:32:40","slug":"cfd-benchmark-4th-gen-amd-eypc-v-cache","status":"publish","type":"post","link":"https:\/\/blogs.sw.siemens.com\/simcenter\/cfd-benchmark-4th-gen-amd-eypc-v-cache\/","title":{"rendered":"CPU Cache and CFD \u2013 a Core Friendship"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Why don\u2019t we have a <strong>single number<\/strong> that tells us how fast the system will do the simulation?<\/p>\n<cite>Jean Doe, Head of Simulation, Engineering Ltd.<\/cite><\/blockquote>\n\n\n\n<p>It is just not that easy! As we tried to point out in earlier blog posts, there is no single criterion for measuring the computing speed of a processor. Nor is there a synthetic benchmark that can tell you how processors will really perform in your daily tasks.<\/p>\n\n\n\n<p>If your tasks are Computational Fluid Dynamics simulations, published results from hardware and software vendors can give you an idea of <a href=\"https:\/\/blogs.sw.siemens.com\/simcenter\/an-engineers-guide-to-the-cfd-hardware-galaxy\/\">which hardware<\/a> is suited for your application. But how do processors actually solve your CFD simulation? Let us just imagine your job was not a CFD simulation but renovating your house&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video controls src=\"https:\/\/videos.mentor-cdn.com\/mgc\/videos\/5400\/ec75ca2b-9d92-4d0f-bd11-edbef4ab5e26-en-US-video.mp4\"><\/video><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">The easy part: the more, the merrier!<\/h1>\n\n\n\n<p>If you renovate your house, just invite all your friends* and share the work among them. The job will be done in no time if you find, say, 95 extra pairs of helping hands. That\u2019s the same thing processors do. 
The latest 4<sup>th<\/sup> Gen AMD EPYC\u2122 processors can share the work among up to 96 cores. This way you will not only finish earlier, but altogether you will also <a href=\"https:\/\/blogs.sw.siemens.com\/simcenter\/amd-epyc-4th-generation-9004-series-cpu-cfd-benchmark\/\">save effort \u2013 or energy<\/a>!<\/p>\n\n\n\n<p>However, not all friends are the same. We all have that one friend that we ask for help \u2013 and in the end doing the job alone would have been faster. But there are also those who put in mad hustle to get more work done. Processor cores are just the same. Some are faster; some are slower. That\u2019s due largely to clock speed, measured in gigahertz. But beware! Some people just look extremely stressed and busy \u2013 and don\u2019t really finish much in all that hustle. For processors, this is why it\u2019s important to look at \u201cinstructions per cycle\u201d \u2013 how much work a core actually gets done in each clock cycle.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Fast friends finish faster?<\/h1>\n\n\n\n<p>The issue is: even if you invite the most productive bunch of friends to your construction site, there is no guarantee that you will finish fast! It is also a matter of tools and where you store them. Let\u2019s say one of your friends has a workshop with all the tools humans have ever invented. In a computer, this workshop corresponds to the hard drive, the main storage. For your CFD simulation, you pick all the relevant tools and machines and load them into your van before you go to the construction site. This van represents Random Access Memory, RAM for short, or just memory.<\/p>\n\n\n\n<p>Once you arrive at your renovation site, you put a limited selection of the most important tools for this job into your toolbox, which you share with your 95 friends. 
In a processor, this toolbox is called Level 3 CPU cache, L3 for short. Everybody also has a toolbelt where they put a few favorite tools, but these aren\u2019t shared; they are just for one worker. (Ask your friends, do they like others to reach into their pocket??) This represents our L2 CPU cache, where there\u2019s only enough space for some tools.<\/p>\n\n\n\n<p>While doing the actual renovation, everybody has a tool in one hand and, hopefully, a set of instructions in the other. This is the processor\u2019s closest cache, with Level 1 data (our tool) plus Level 1 instructions. As it was with L2, this is only accessible for one friend &#8212; don\u2019t rip the screwdriver out of their hands! Put it into the shared toolbox. Likewise, processor cores are polite; they wait for the tool to be laid into the L3 CPU cache before they get their hands on it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"814\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker-1024x814.png\" alt=\"\" class=\"wp-image-50175\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker-1024x814.png 1024w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker-600x477.png 600w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker-768x611.png 768w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker-900x716.png 900w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/1-Worker.png 1377w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>L1 instructions and data are stored in hand. If you need them, they are right there. But there is only space for one tool and a small instruction set. 
More can be stored in L2, but to get to it, our friends need to put away the stuff in hand and reach into their pocket.<\/strong><\/p>\n\n\n\n<p>Over the renovation project, there will be countless occasions where the tools you need are not in your hand or in your pocket, so you have to grab them from the toolbox, our L3 CPU cache in the processor world. <strong>There is far more space here, but it also takes more time to reach it.<\/strong><\/p>\n\n\n\n<p>Things take even more time if you must go to the van because you forgot to put everything in your toolbox. Walking to the van takes a few minutes, but this is still business as usual, on any construction site or in any CFD simulation. Most of the data used for CFD is already present in memory.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"718\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/2-Arrow.png\" alt=\"\" class=\"wp-image-50169\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/2-Arrow.png 650w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/2-Arrow-543x600.png 543w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><\/figure>\n\n\n\n<p>What really would ruin your day is if you forgot something at the workshop and had to drive back and forth\u2026 In computing this is called \u201cswapping\u201d, and it should never happen in the world of CFD simulation.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Making your friends more productive<\/h1>\n\n\n\n<p>It\u2019s clear that the ideal scenario would be for your toolbox to have space for more tools. That way the renovation would finish faster, because your most productive friends would waste less time walking to the van. And this is exactly what AMD has done with their AMD 3D V-Cache\u2122 technology. They added extra L3 CPU cache, which now exceeds one gigabyte! 
I can still recall the days when computers had whole hard drives of that size! Also, bigger toolboxes are more practical than fancy working trousers with 17 more pockets \u2013 or a third hand for each of your 95 friends!<\/p>\n\n\n\n<p>Since AMD brought this huge leap forward to the market in 2022, by introducing their AMD EPYC\u2122 processors with AMD 3D V-Cache technology, our faster friends have also upgraded their vans. The latest generation is able to use more and faster DDR5 memory than its predecessor.<\/p>\n\n\n\n<p>Furthermore, the introduction of the AMD EPYC\u2122 9654 processor set a new high-water mark for total core count in a single x86 processor. But now, the AMD EPYC\u2122 9684X has 96 cores AND comes equipped with a total of 1,150 MB of L3 cache thanks to AMD 3D V-Cache. Depending on the job at hand, this means you can fit significantly more tools into your toolbox.<\/p>\n\n\n\n<p>Our benchmark charts show that in bigger, industry-standard cases with large meshes, the additional CPU cache directly contributes to a speedup in Computational Fluid Dynamics simulations:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"725\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-1024x725.png\" alt=\"\" class=\"wp-image-50183\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-1024x725.png 1024w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-600x425.png 600w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-768x544.png 768w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-1536x1088.png 1536w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1-900x637.png 900w, 
https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/3-RelativePerformanceAverage-1.png 1946w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The diagram above shows the relative performance of all CFD benchmarks performed in Simcenter STAR-CCM+. Green shades represent the standard 4<sup>th<\/sup> Gen AMD EPYC\u2122 processors, whereas the blue shaded bars show the performance of the 3D V-Cache enabled counterparts.<\/p>\n\n\n\n<p>Let\u2019s look at the performance of the previously fastest processor for CFD \u2013 the AMD EPYC\u2122 7773X. This 3<sup>rd<\/sup> Gen AMD EPYC processor has 64 cores and 768 MB L3 CPU cache. Thanks to AMD 3D V-Cache, it outperformed the AMD EPYC 7763 significantly for CFD applications \u2013 with the same number of cores. That speedup came from the tripled L3 CPU cache size.<\/p>\n\n\n\n<p>Where previous generation processors needed 64 cores to run that fast, the 4<sup>th<\/sup> Gen AMD EPYC processors are even faster with half the core count! The 9384X with 32 cores outperforms the 7773X with 64 cores by 27%.<\/p>\n\n\n\n<p>If 32 cores are enough to beat former 64-core champions \u2013 what happens if we take the core count to the max? The recently introduced AMD EPYC 9654 features 96 cores, delivering a speedup of 93% compared to the 3D V-Cache enabled 3<sup>rd<\/sup> Gen leader, despite only having half the CPU cache. What happens if you triple that cache?<\/p>\n\n\n\n<p>Then it is time to crown a new king, as the AMD EPYC 9684X squeezes a whopping 118% speedup out of the 1,150 MB V-Cache.<\/p>\n\n\n\n<p>Just like the necessary tools differ for each renovation project, some simulations in Simcenter STAR-CCM+ benefit more from AMD 3D V-Cache than others. 
Because of that, we looked at benchmarks covering several typical use cases, from a small 3 million cell mesh around a ship hull combined with multiphase physics to vehicle thermal management simulations with nearly 180 million cells &#8211; as shown in the video above!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"756\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-1024x756.png\" alt=\"\" class=\"wp-image-50179\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-1024x756.png 1024w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-600x443.png 600w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-768x567.png 768w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-1536x1134.png 1536w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-2048x1512.png 2048w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/4-AllCases-900x665.png 900w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The use cases with bigger meshes benefit especially from the 4<sup>th<\/sup> Gen AMD EPYC processors. In all use cases AMD 3D V-Cache is faster than any frequency-optimized counterpart.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Using family-scale effects<\/h1>\n\n\n\n<p>If you renovate your house, you should not rely only on your friends. What about your partner, your family, your kids? Everybody can bring their bunch of friends and share the work among them! This is what typically happens on CFD clusters, where several computational nodes with lots of cores work on one simulation.<\/p>\n\n\n\n<p>If it were your renovation project, the big crowd of people might help&#8230; 
but the more crowded the workplace is, the more everyone gets in each other\u2019s way. You might expect processors to behave this way too, right?<\/p>\n\n\n\n<p>Similarly, if you distribute small simulations with a few million cells over several nodes, you will not see a linear speedup. And if you spread a 3 million cell mesh onto eight nodes with 2&#215;96 cores each, these 1,536 cores are only twice as fast as the initial 2&#215;96. Not even 3D V-Cache can change this. But as every skilled CFD engineer knows, 3 million cells are no match for modern processors and for efficient solvers like Simcenter STAR-CCM+.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"719\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-1024x719.png\" alt=\"\" class=\"wp-image-50172\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-1024x719.png 1024w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-600x421.png 600w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-768x539.png 768w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-1536x1078.png 1536w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability-900x632.png 900w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/5-Scalability.png 1960w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>But if you have a CFD simulation with 100 million cells or more, Simcenter STAR-CCM+ and AMD 3D V-Cache just roll up their sleeves and get the job done. Now the CPU cache size pays off even more than before!<\/p>\n\n\n\n<p>Now, if you were renovating not your home but a castle, there would be plenty of room for everyone. In this case, short paths to tools pay off even more! 
After all, anyone working in a distant room would otherwise have to walk across the whole castle to reach the toolbox. Likewise, when you add nodes to the cluster, you see even better performance with 3D V-Cache technology.<\/p>\n\n\n\n<p>With superlinear scaling from AMD EPYC processors, eight computational nodes are not just eight times faster but a whopping 11.12x faster! That is possible because, with more nodes, each core\u2019s share of the mesh shrinks and a larger fraction of it fits into the CPU cache. It is phenomenal how new generations of processors make efficient solvers run even faster than should logically be possible.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"605\" height=\"279\" src=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/6-Joke.png\" alt=\"\" class=\"wp-image-50173\" srcset=\"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/6-Joke.png 605w, https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/6-Joke-600x277.png 600w\" sizes=\"auto, (max-width: 605px) 100vw, 605px\" \/><\/figure>\n\n\n\n<p>\u201cAMD, the AMD arrow logo, EPYC, AMD 3D V-Cache and combinations thereof are trademarks of Advanced Micro Devices, Inc.\u201d<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>* This analogy is meant for non-engineers. We engineers do know how CPU cache works; we don\u2019t need silly metaphors. And also, we don\u2019t have 95 friends.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Why don\u2019t we have a single number that tells us how fast the system will do the simulation? 
Jean Doe,&#8230;<\/p>\n","protected":false},"author":85330,"featured_media":50168,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spanish_translation":"","french_translation":"","german_translation":"","italian_translation":"","polish_translation":"","japanese_translation":"","chinese_translation":"","footnotes":""},"categories":[1],"tags":[242],"industry":[125,89,137,145,150,155,165,166,172,171],"product":[513],"coauthors":[24061],"class_list":["post-50006","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","tag-computational-fluid-dynamics-cfd","industry-aerospace-defense","industry-automotive-transportation","industry-consumer-products-retail","industry-electronics-semiconductors","industry-energy-utilities","industry-industrial-machinery-heavy-equipment","industry-media-telecommunications","industry-medical-devices-pharmaceuticals","industry-small-medium-business","industry-software-development","product-simcenter-star-ccm"],"featured_image_url":"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/6\/2023\/06\/Worker.png","_links":{"self":[{"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/posts\/50006","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/users\/85330"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/comments?post=50006"}],"version-history":[{"count":5,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/posts\/50006\/revisions"}],"predecessor-version":[{"id":50184,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/posts\/50006\/revisions\/50184"}],"wp:featuredmedia":[{"embeddab
le":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/media\/50168"}],"wp:attachment":[{"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/media?parent=50006"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/categories?post=50006"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/tags?post=50006"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/industry?post=50006"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/product?post=50006"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/simcenter\/wp-json\/wp\/v2\/coauthors?post=50006"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}