{"id":127,"date":"2020-05-28T15:49:00","date_gmt":"2020-05-28T19:49:00","guid":{"rendered":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/?p=127"},"modified":"2026-03-26T15:59:04","modified_gmt":"2026-03-26T19:59:04","slug":"semiwiki-what-a-difference-an-architecture-makes-optimizing-ai-for-iot","status":"publish","type":"post","link":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/2020\/05\/28\/semiwiki-what-a-difference-an-architecture-makes-optimizing-ai-for-iot\/","title":{"rendered":"SemiWiki: What a Difference an Architecture Makes: Optimizing AI for IoT"},"content":{"rendered":"\n<p>Excerpt from article: \u201c<a href=\"https:\/\/semiwiki.com\/eda\/286012-what-a-difference-an-architecture-makes-optimizing-ai-for-iot\/\" target=\"_blank\" rel=\"noreferrer noopener\">What a Difference an Architecture Makes: Optimizing AI for IoT<\/a>\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/semiwiki.com\/wp-content\/uploads\/2020\/05\/HLS-PPA-min.png\" alt=\"HLS PPA results\" width=\"452\" height=\"172\"\/><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Last week <a href=\"https:\/\/semiengineering.com\/entities\/mentor-a-siemens-business\/\" target=\"_blank\" rel=\"noreferrer noopener\">Siemens EDA<\/a> hosted a virtual event on designing an AI accelerator with HLS, integrating it with an Arm Corstone SSE-200 platform, and characterizing\/optimizing it for performance and power. Though the session was in some ways a recap of earlier presentations, it added some new insights, particularly in characterizing various architecture options.<\/p><p>Mike Fingeroff kicked off with the high-level design of the accelerator, showing a progression starting from a na\u00efve implementation of a 2D image convolution, with supporting functions (e.g., pooling, ReLU) in software. 
This delivered 14 seconds per inference, against a final goal of 1 second. His first step was to unroll loops and pipeline. New here (to me at least) is that Catapult generates a Gantt chart, giving a nice schedule view to guide optimization. So Mike unrolls and finds he has memory bottlenecks, also highlighted by a Silexica analysis. That is not surprising, since he\u2019s using a single-port memory, again with na\u00efve reads and writes. He switches to a shift-register and line-buffer architecture supporting a 3\u00d73 sliding window for the convolution, and the bottleneck problem is solved. He also uses Silexica analyses to decide whether and how to buffer weights. Now he\u2019s down to just over a second per inference, with bias, ReLU and pooling still in software (running on the embedded CPU).<\/p><\/blockquote>\n\n\n\n<p>Read the entire article, originally published on May 28th, 2020, on <a href=\"https:\/\/semiwiki.com\/eda\/286012-what-a-difference-an-architecture-makes-optimizing-ai-for-iot\/\" target=\"_blank\" rel=\"noreferrer noopener\">SemiWiki<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Excerpt from article: \u201cWhat a Difference an Architecture Makes: Optimizing AI for IoT\u201d Last week Siemens EDA hosted a 
virtual&#8230;<\/p>\n","protected":false},"author":77876,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spanish_translation":"","french_translation":"","german_translation":"","italian_translation":"","polish_translation":"","japanese_translation":"","chinese_translation":"","footnotes":""},"categories":[1],"tags":[368,367,366],"industry":[],"product":[84],"coauthors":[349],"class_list":["post-127","post","type-post","status-publish","format-standard","hentry","category-news","tag-architecture","tag-optimizing-ai","tag-optimizing-ai-for-iot","product-catapult"],"_links":{"self":[{"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/posts\/127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/users\/77876"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/comments?post=127"}],"version-history":[{"count":1,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/posts\/127\/revisions"}],"predecessor-version":[{"id":128,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/posts\/127\/revisions\/128"}],"wp:attachment":[{"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/media?parent=127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/categories?post=127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/tags?post=127"},{"taxonomy":"industry","embeddable":true,"href":"h
ttps:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/industry?post=127"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/product?post=127"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/hlsdesign-verification\/wp-json\/wp\/v2\/coauthors?post=127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}