Day 2: AI in the lab @ the MIT Technology Review EmTech Digital Conference
A wrap-up of Day 2 at the EmTech Digital Conference, hosted by the MIT Technology Review. Day 2 covered real AI projects in the lab, algorithms, and chips. The first segment covered “Inside the Leading Labs”:
- Dario Gil, IBM Research, talked about their latest AI advancements. He discussed the urgency of today’s science problems, like the search for a Covid-19 vaccine, and how AI fits into the scientific method to accelerate solving them. He believes we are entering an era of accelerated scientific discovery.
His lab uses a combination of supercomputers, quantum computers, and AI-assisted programs. He gave an example of accelerating the discovery of new semiconductor materials by 90%. They applied deep search, AI-enriched simulation, generative models, and their autonomous lab to the problem. Deep search uses natural language processing and graph processing to turn all the literature on materials into structured graphs and information. Then, multi-physics simulation calculates new candidate materials from input parameters. Generative models analyze those parameters and output a set of chemical molecules for a new material. Finally, the autonomous lab synthesizes and creates the new material. This process can be used in any business to discover new things in a compressed amount of time, so the scientific method can be deployed within the business community without needing a staff of scientists. See research.ibm.com for all the details; the workflow will be open sourced for you to try in the future.
- Bratin Saha, Amazon Web Services (AWS), talked about making their cloud smarter by providing ML to AI developers at any business. He walked through use cases: Domino’s used ML to predict pizza orders before they came in, to speed up delivery. The National Football League (NFL) used ML to generate statistics that give viewers information in real time. Lyft used ML to accelerate training of their predictive ride models from days to hours. Their vision is to make ML accessible to all customers: not just scientists, but developers, researchers, and business analysts. To do this they provide ML infrastructure (based on custom silicon), ML techniques, and ML industrialization (moving data to executable models, similar to today’s software workflow). Like IBM, AWS is looking to support long-term science with ML solutions that solve business problems.
- Ilya Sutskever, OpenAI, talked about their huge neural networks. The goal of the company is to make AI advances and deploy them in an open-source manner. He covered GPT-3, a large neural network trained on a large corpus of text to predict the next word in a sentence, which solves many natural language tasks. The next step is to apply this technology to images and video. The first result is DALL-E, which generates an image from a text prompt, like “a painting of an owl in a realist style”; the system learns from paired images and text. Next, he discussed CLIP, which maps visual concepts to text and learns on a set of unusual images in order to make recognition more robust. Standard image recognition systems drastically lose the ability to recognize stylized or blurred images, while CLIP retains a high recognition rate (a sketch of the idea follows below).
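To make the CLIP idea concrete, here is a minimal zero-shot recognition sketch. It assumes the open-source CLIP weights OpenAI published, loaded through the Hugging Face transformers library; the label prompts and image file are hypothetical stand-ins, not anything shown in the talk.

```python
# pip install transformers torch pillow
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Recognition works by comparing the image against text descriptions,
# so stylized or blurred variants degrade it less than a fixed-label classifier.
labels = ["a photo of a heart", "a sketch of a heart", "a photo of a dog"]
image = Image.open("example.jpg")  # hypothetical local image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```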
Next, he covered Clarity analysis, which tries to understand what the CLIP neural network is doing. This technology is used to understand why a mistake was made during learning, to tune algorithms, and to detect bias. Clarity has shown that individual neurons in their network are tuned to specific things; for example, one neuron is activated by images of hearts. The fascinating aspect of OpenAI’s work is that they might be discovering what the human brain could be doing to recognize images. And, given a set of parameters, the brain can also conjure up a new image in the mind. But there are still aspects of their neural network’s results that they do not yet understand; the results they get are often startling, but correct. (A generic sketch of this kind of neuron analysis follows.)
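The kind of neuron inspection described above can be approximated with a generic technique called activation maximization: optimize an input image so it excites one chosen unit, then look at what that unit responds to. The sketch below is not OpenAI’s actual Clarity tooling; the tiny untrained network is just a stand-in for a real trained model.

```python
import torch
import torch.nn as nn

# Stand-in network; in practice you would load a trained model instead.
net = nn.Sequential(nn.Conv2d(3, 8, 5), nn.ReLU(),
                    nn.Conv2d(8, 16, 5), nn.ReLU())
net.eval()

# Start from noise and ascend the activation of one channel (unit 3).
img = torch.randn(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    act = net(img)[0, 3]   # feature map of the chosen unit
    (-act.mean()).backward()  # negative loss = gradient ascent
    opt.step()

print("final mean activation:", float(net(img)[0, 3].mean()))
# Visualizing `img` shows what this unit "wants to see" (hearts, in their example).
```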
The next segment covered AI algorithms:
- Anima Anandkumar, NVIDIA, covered the mathematics of AI. Her example was climate prediction, which today is both inaccurate and very expensive to compute. She explained her work with the Clima Project at Caltech. The ability to predict cloud cover is the key to prediction, so the project tries to map global measurements to simulations of local cloud cover to predict weather events. They employ operator learning, a function-to-function (not vector-to-vector) mapping over partial differential equations (PDEs) in infinite-dimensional space. That is different from current neural networks, so they created their own. The physical world is governed by PDEs, but you want to reduce their number for learning. Motivated by the Green’s function for linear PDEs, NVIDIA builds a Fourier-transform filter into their neural network that performs global convolution, learning the weights in frequency space (sketched below). So far, their predicted local cloud cover matches the actual cloud cover at the time very accurately. To learn more, check out this article.
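A minimal PyTorch sketch of that Fourier-filter idea, assuming the published Fourier neural operator formulation: transform the input to frequency space, multiply a limited number of low modes by learned complex weights (which acts as a global convolution), and transform back. Channel counts, mode counts, and grid sizes here are illustrative.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """One Fourier layer: FFT -> learned filter on low modes -> inverse FFT.
    The learned complex weights play the role of a Green's-function-like kernel."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes  # number of low-frequency modes kept
        scale = 1.0 / (channels * channels)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                      # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)               # to frequency space
        out_ft = torch.zeros_like(x_ft)
        # global convolution = pointwise multiply in Fourier space
        out_ft[:, :, :self.modes] = torch.einsum(
            "bcm,com->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))  # back to the grid

x = torch.randn(8, 4, 128)            # 8 samples, 4 channels, 128 grid points
layer = SpectralConv1d(channels=4, modes=16)
print(layer(x).shape)                 # torch.Size([8, 4, 128])
```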
- Alexander D’Amour, Google, discussed why the way we train AI is flawed, covering failure modes he has tested. A common issue is that the model has been trained to solve the wrong problem due to data issues. Another is that the model is asked to solve a fragment of an overall problem but, when deployed out in the world, fails on the overall problem because teams did not test hidden conditions; he likened this to shipping untested software. He then expanded on the second problem, called “underspecification,” which he has found to be very common in the ML world by examining open-source algorithms in action on open data like ImageNet. The way they detected the issue in existing neural networks: create stress tests that probe a particular requirement, construct a set of models that all pass the standard evaluations and vary only in incidental parameters (like the random seed), and then measure stress-test performance. The key indicator was wide variation in performance between the models; changing the random seed used to start training, for example, had a big effect on the results even though it should be irrelevant. To fix this, training needs to account for requirements and parameters to fully specify the model (see the sketch below).
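A hedged sketch of that detection recipe: train otherwise-identical models that differ only in the random seed, confirm they look equivalent on the i.i.d. test set, then compare them on a stress test. Here simple feature noise stands in for a real distribution shift, and the scikit-learn pipeline is an illustrative stand-in, not the paper’s setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Stress set: the same test points under a simulated distribution shift.
X_stress = X_te + np.random.RandomState(1).normal(0, 1.0, X_te.shape)

for seed in range(5):  # identical pipelines; only the seed varies
    m = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                      random_state=seed).fit(X_tr, y_tr)
    print(f"seed {seed}: iid acc {m.score(X_te, y_te):.3f}, "
          f"stress acc {m.score(X_stress, y_te):.3f}")
```

When the stress-test column spreads much wider than the i.i.d. column, the training pipeline has not pinned down which of the many equally good models you actually get, which is the underspecification signal he described.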
- Xuedong Huang, Azure Cognitive Services at Microsoft, spoke about creating cognitive AI. Image and language processing are very accurate today, but that is not enough, because the brain does so much more using all our senses. Microsoft is working on the XYZ Code Project: X is text, Y is sensory input (currently vision and speech), and Z is multilingual speech. At the intersection of these three areas is the solution they are building now and into the future. He relayed the example of the European Parliament, whose attendees speak 24 different languages and who wanted live, real-time translation of their sessions into English, French, and German. He demoed the system showing the speech in all three languages in real time from an actual session at the Parliament; the Azure system now supports all 24 languages. Their model is unique in taking multiple languages at the same time as training input. (A hedged sketch of calling a speech-translation service this way follows below.)
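A minimal sketch of one-utterance speech translation with the Azure Speech SDK (azure-cognitiveservices-speech), roughly the service family behind the demo. The key, region, and language codes are placeholders, and the live multi-target session setup in the actual demo is more involved than this.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute a real Azure Speech key and region.
config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_KEY", region="westeurope")
config.speech_recognition_language = "en-US"   # source language
for lang in ("fr", "de"):                      # illustrative live targets
    config.add_target_language(lang)

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=config)
result = recognizer.recognize_once()           # one utterance from the default mic
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    for lang, text in result.translations.items():
        print(f"{lang}: {text}")
```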
The final segment of the day covered the actual implementation of AI in chips:
- Bike Xie, Kneron Inc., talked about trends and challenges of AI at the edge. AI today mostly lives in the cloud and is very power-hungry; we need to move AI to the edge to enable mass adoption by targeting cost, speed, and privacy. Kneron currently markets to companies in smartphones, smart factories, and automotive edge applications, but any company could find an application. They currently offer two chips for image/video and audio inferencing, the KL520 and KL720. The key to using these chips is small AI models, plus tools for compressing and compiling those models with minimal accuracy loss (less than 1%) and the lowest power consumption. The technical challenges are: the versatility-versus-efficiency trade-off, knowledge gaps in quantization (reducing bit size; sketched below) and compression, and sorting requirements into “must-have” versus “nice-to-have.” Over time some of the nice-to-have requirements find their way into the chip, like built-in video encoders. Currently these chips are targeted at particular applications, and the company works closely with customers on their requirements. There is no chip yet that provides a generic AI platform usable for any purpose at the edge, and edge chips are not currently able to do on-the-fly training to improve the application. Current edge AI solutions are typically connected to powerful devices that can offload compute activities, like smartphones, the backbone of autonomous driving systems, or factory systems.
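Quantization here means storing and computing weights with fewer bits. A generic sketch of symmetric post-training int8 quantization (not Kneron’s actual toolchain) shows where the small accuracy loss comes from: rounding floats onto a coarse integer grid.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 grid."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs quantization error: {err:.6f}")  # the rounding cost, 4x smaller storage
```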
- Nicholas Harris, Lightmatter (an MIT spinoff), covered photonic computing platforms purpose-built for AI. Transistors keep shrinking, but the bottleneck is now the ability to cool the chip: power consumption, not transistor size, controls speed. That is why people are working on quantum computing and photonics. Commercial AI neural nets are growing their power consumption at a rate 5x Moore’s Law, consuming up to 15% of the world’s power, and we cannot cover the world with data centers, so Nicholas is working on a photonic chip for AI. Light can represent data in different colors, sending multiple datasets through a single AI (multiply/accumulate) array at the same time for big throughput (high clock rates with huge bandwidth) at low power consumption; a toy model below illustrates the idea. They offer a dual-core, 3D-stacked chip called Envise that contains the AI compute array and onboard memory (500 MB of SRAM); the stack is a photonic processing unit layer and a memory layer. They also offer the Envise Blade, a board of these chips with onboard laser generators, and the Envise Rack, which houses multiple Blade boards. Nicholas is shooting for a general AI inference platform with these products. Another benefit of photonics is its impact on power consumption around the world versus the data centers we have today. Photonic chips deal with many domains (digital, analog, physics, light), so the team has faced many challenges. It is very clear that Lightmatter sees NVIDIA as their competition: the benchmarks they presented show orders-of-magnitude improvement over NVIDIA GPUs.
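A toy numerical model of the multiple-colors idea: each wavelength carries its own input vector through the same multiply/accumulate array, so one traversal of the array performs several matrix-vector products at once. In plain NumPy this is just a batched matmul; the optical win (which this sketch cannot show) is that the passes overlap in time and cost very little power.

```python
import numpy as np

# One analog MAC array holds the weight matrix W; each of 8 "colors"
# (wavelengths) carries an independent 64-element input vector through it.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))      # the in-array weights
colors = rng.standard_normal((8, 64))  # 8 wavelengths, one input vector each

out = colors @ W.T                     # all 8 MAC passes in one array traversal
print(out.shape)                       # (8, 64): one output vector per wavelength
```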
The themes of the day: base solutions on the scientific method to support AI for scientific discovery, and then map them to any business. On the chips side, the edge currently targets inferencing for specific applications, while training remains in the lab. But to transform the world of AI you also need a new platform that is faster and more power-efficient than the current transistor-based one, and that platform could be photonics-based.