Bigger isn’t always better, or why data compression is the future of AI
The model of continuous growth has been the status quo for AI over the last 20 years as the technology has risen to ever-greater prominence in everything from washing machines to state-of-the-art factories. Along the way, AI models have ballooned from thousands of parameters to millions or billions, exponentially increasing the demands for both training data and computing power. This is not a sustainable trend, however, as both meaningful training data and the power required to operate datacenters are finite. In his recent keynote at the 59th Design Automation Conference, visionary technologist, entrepreneur and CEO of Perceive Steve Teig offered a potential solution: data compression.
Data compression has long been seen as a black sheep by the AI/ML community, the idea being that doing away with “big data” by compressing it would reduce the accuracy of the models trained on that data. Steve Teig, however, makes exactly the opposite claim. To substantiate it, he first defines what compression means in this context: the ability to find structure in data. Data compression works by finding patterns in data and then creating a way to represent that data in a smaller, more efficient form. Steve gives the example of transmitting a formula for pi rather than some number of its digits: this reduces the amount of data that must be stored and transmitted while still capturing the value of pi completely. In contrast to compressible data, random data has no meaningful structure and cannot be represented in a smaller way.
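To make the pi example concrete, here is a minimal, self-contained Python sketch (not taken from the keynote; Machin’s formula is simply one convenient choice). The program is a couple of dozen characters of “formula” plus bookkeeping, yet it can reproduce as many digits of pi as you ask for, so sending the program compresses what would otherwise be an ever-growing list of digits.

```python
# Toy illustration of "send the formula, not the digits": a short program
# that regenerates arbitrarily many digits of pi on demand.
from decimal import Decimal, getcontext


def pi_digits(n_digits: int) -> str:
    """Return pi to n_digits decimal places via Machin's formula:
    pi/4 = 4*arctan(1/5) - arctan(1/239)."""
    getcontext().prec = n_digits + 10  # a few guard digits for rounding

    def arctan_inv(x: int) -> Decimal:
        # arctan(1/x) = 1/x - 1/(3*x^3) + 1/(5*x^5) - ...
        eps = Decimal(10) ** -(n_digits + 5)
        total = term = Decimal(1) / x
        k, x2, sign = 3, x * x, -1
        while abs(term) > eps:
            term /= x2
            total += sign * term / k
            sign, k = -sign, k + 2
        return total

    pi = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi)[: n_digits + 2]  # "3." followed by n_digits digits


# The same short program "decompresses" into 50, 500, or 5,000 digits.
print(pi_digits(50))
```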
Most people would likely agree that a minimal loss in accuracy is a worthwhile tradeoff for a massive decrease in power and data consumption, but the reality actually surpasses that. While the savings in power and data requirements are very real, accuracy doesn’t suffer at all; in fact, with judicious compression of training data, accuracy can actually improve to an extent. The key to this remarkable improvement lies in the signal-to-noise ratio of the training data and the fact that not all data is of equal value in training AI. Consider an AI algorithm being trained to recognize sailboats. In this context, images containing sailboats have much more value than those without, to the extent that any image not containing a sailboat is no different than noise in the data. Similarly, anything within a given picture that is not a sailboat can also be considered noise. Return now to the idea of patterns and you can see how compression can increase accuracy. Continuing the previous example, it’s intuitive that all sailboats share certain core elements, which form the basis for the definition of “sailboat.” By compressing the data and extracting the pattern for “sailboat,” not only is there less data to process, but the algorithm becomes less susceptible to the noise, or junk data, in the data set, focusing instead on the strong pattern it needs to recognize.
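One way to see this effect, without claiming to reproduce Perceive’s actual methods, is a small scikit-learn experiment: compress each training image down to its strongest patterns with PCA (a stand-in here for whatever compression a production system would use) and check how a classifier trained on the compressed data fares. On the toy digits dataset, discarding most of the raw pixel dimensions typically costs little or no accuracy, because what gets discarded is mostly noise rather than “digit-ness.”

```python
# Hedged sketch: compress inputs with PCA, then train, and compare accuracy
# against training on the raw pixels.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: train on all 64 raw pixel values per image.
baseline = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Compressed: project each image onto its 24 strongest patterns first.
compressed = make_pipeline(
    PCA(n_components=24),
    LogisticRegression(max_iter=5000),
).fit(X_train, y_train)

print("raw pixels   :", baseline.score(X_test, y_test))
print("24 components:", compressed.score(X_test, y_test))
```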
Steve also argues that a great deal of arbitrariness exists in model selection, since the process narrows a multi-million-parameter model, trained on millions of pieces of data divided into thousands of classes, down to just a single model. The problem with this approach is that there isn’t just one good solution but an entire region of equally good solutions. Current methods, however, cannot effectively distinguish between these different good solutions, so in effect “all” of them are selected at once; this is itself a form of compression, and it renders a large portion of the big data used in training useless. With that in mind, compressing the dataset and broadening the definition of what counts as correct to encompass all of these possible solutions can deliver not only superior accuracy but reduced cost as well.
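The “region of equally good solutions” point can be illustrated with another hedged sketch (again, not from the keynote): training the same small network from different random initializations produces weight matrices that differ substantially, yet the resulting models typically score essentially the same on held-out data, so choosing any single one of them is arbitrary.

```python
# Hedged sketch: several training runs land on different weights with
# near-identical accuracy, i.e. many equally good solutions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same architecture and data, three different random starting points.
models = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                  random_state=seed).fit(X_train, y_train)
    for seed in range(3)
]

for seed, model in enumerate(models):
    print(f"seed {seed}: accuracy {model.score(X_test, y_test):.3f}")

# The first-layer weights of two runs are far apart even though the
# accuracies printed above are nearly indistinguishable.
w0, w1 = models[0].coefs_[0], models[1].coefs_[0]
print("weight distance between runs 0 and 1:", np.linalg.norm(w0 - w1))
```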
At the end of his keynote, Steve took a small computer chip out of his pocket to demonstrate the impact of his research. He showed the coin-sized device running multi-million-parameter neural networks for image recognition at hundreds of frames per second while drawing less than 50 mW of power, a feat impossible on the general-purpose processors current AI runs on. While it will take time for the AI industry to adopt these new principles and next-generation AI chips, what we see today is an inspiring look at what will be possible with AI in the near future. With such small and power-efficient chips, integrating AI into edge devices will no longer require offloading to distant datacenters, which may themselves become obsolete as the AI processing power they once housed shrinks to fit in the palm of your hand.
Check out the full recording of Steve’s keynote here.
Siemens Digital Industries Software helps organizations of all sizes digitally transform using software, hardware and services from the Siemens Xcelerator business platform. Siemens’ software and the comprehensive digital twin enable companies to optimize their design, engineering and manufacturing processes to turn today’s ideas into the sustainable products of the future. From chips to entire systems, from product to process, across all industries. Siemens Digital Industries Software – Accelerating transformation.