Tesla's AI Day 2022 in review
Tesla hosted their second AI Day on 30th September 2022, both as a recruitment drive to attract talented new engineers and, for the rest of us, as an update on the progress of their three biggest technologies: Optimus, Full Self Driving and Dojo. As with last year's event, the news was mind-blowing.
If you missed it, the full stream is available on their official channel. They demonstrated and explained in great detail the leaps and bounds made in all three domains over the last 12 months, which I'll summarise below.
Optimus
Within just 6 months of the last event in 2021, a functioning prototype was built and tested using off-the-shelf parts. This time it walked out on stage unassisted, dancing and waving to the crowd to showcase its mobility. Sure, Boston Dynamics have robots that dance, do backflips and run obstacle courses, but that took years of work and very careful video editing. Tesla's robot, however, is already able to navigate the world with bipedal locomotion and perform simple tasks, like picking up and moving boxes, thanks to their existing Full Self Driving software. With a successful framework in place, they set about building their own custom actuators and a self-contained battery system, using the knowledge and parts available from their vehicle production. The design takes inspiration from the same biological mechanisms that make our own movement possible, because our world is tailored specifically to human ergonomics.
This new generation is being heavily refined and optimised: the team are using cutting-edge simulation models both to tune the component design for the job and to reduce the number of distinct parts, combining functionality where possible to ensure the easiest production process (leaning on tough lessons learned in the automotive industry over the years). Because of this, deliveries are estimated to begin in the next 3-5 years at an estimated cost of less than $20,000. That is incredible news; getting from project inception to this point in such a short time is truly exceptional, and it cannot be overstated just how revolutionary this product will become once it starts disrupting the labour market.
Highlights:
52V, 2.3 kWh battery pack with fully-integrated electronics - good for a full day's work
Brain based on FSD computer with custom SoC, Wi-Fi, LTE, Audio and hardware security
28 Structural actuators, 11 Degrees-of-Freedom hands with opposable thumbs
Capable of carrying a 9kg bag and operating tools, along with fine precision grip
Real-world navigation and task comprehension capabilities provided by FSD software
Adding interaction layers (personality etc.) will be trivial in future, along with physical customisations
Expected in 3-5 years for less than 20k USD
Especially now with Optimus, all of their work in AI is bleeding heavily into the Artificial General Intelligence space, which makes sense. It was a clear focus of the event and was mentioned several times. That's the direction this is going, and we should not be surprised to hear one day that Tesla are the first to pioneer true, non-biological intelligence.
Full Self Driving
Now running in over 160k cars in the US and Canada, the system continues to receive incremental updates. Although there wasn't much new to showcase in terms of features, a lot of the discussion revolved around how they're addressing the hardest challenges and the insanely high level of attention paid to every detail of the training process, from top to bottom. They have developed a bespoke file format, '.smol', to be as efficient as possible when searching for data, as well as a custom network transport protocol, TTP (Tesla Transport Protocol), to facilitate the massively high-bandwidth, low-latency connections required by their training hardware.
Speaking of hardware, the team have levelled up their auto-labelling software, in large part thanks to Dojo, which we'll discuss shortly. Their 'conventional' supercomputer currently uses 14k GPUs (10k for training and 4k for clip auto-labelling), drawing from a 30 Petabyte distributed video cache containing 160 Billion frames and cycling 500k video clips through it per day. A new multi-trip reconstruction system is able to seamlessly splice together related clips from the same geographical location, amalgamating the data and allowing much easier automated 3D labelling; for 10k trips it reduces 5 Million hours of manual work down to just 12 hours of cluster compute time.
They explained how the system runs through and weighs all the interaction scenarios it faces in a split second, something we take very much for granted, and how it's constantly learning from the 'shadow' autopilot model that's running in all cars, predicting what should be done at any moment and reporting back to Tesla when the human driver does something different. With over 3 million cars on the road, their dataset is unparalleled.
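To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above; the function and constant names are my own, the petabyte is assumed to be decimal, and the speed-up compares person-hours with cluster hours, so treat it as a rough ratio rather than a like-for-like benchmark.

```python
# Figures quoted in the summary above; names are illustrative only.
CACHE_BYTES = 30 * 10**15      # 30 petabytes (decimal PB assumed)
FRAMES = 160 * 10**9           # 160 billion frames
CLIPS_PER_DAY = 500_000        # clips cycled through the cache per day
MANUAL_HOURS = 5_000_000       # manual labelling effort for 10k trips
CLUSTER_HOURS = 12             # cluster compute time for the same trips
TRIPS = 10_000
SECONDS_PER_DAY = 86_400


def pipeline_stats():
    """Derive a few rough ratios from the quoted figures."""
    return {
        # Average storage per cached frame, in kilobytes.
        "kb_per_frame": CACHE_BYTES / FRAMES / 1_000,
        # Average rate at which clips pass through the cache.
        "clips_per_second": CLIPS_PER_DAY / SECONDS_PER_DAY,
        # Manual labelling effort implied for a single trip.
        "manual_hours_per_trip": MANUAL_HOURS / TRIPS,
        # Person-hours replaced per hour of cluster time.
        "labelling_speedup": MANUAL_HOURS / CLUSTER_HOURS,
    }


if __name__ == "__main__":
    for name, value in pipeline_stats().items():
        print(f"{name}: {value:,.1f}")
```

That works out at roughly 190 KB per frame, just under 6 clips per second, 500 hours of manual labelling per trip, and over 400,000 person-hours replaced by each hour of cluster time.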
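The 'shadow' idea can be sketched in a few lines. This is purely my own illustration and not Tesla's implementation: the `Controls` fields, the tolerances and the function names are all hypothetical, chosen only to show the principle of comparing what the model would have done with what the driver actually did.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """Hypothetical control sample; field names are assumptions."""
    steering_deg: float
    accel_mps2: float


def disagrees(predicted, actual, steer_tol_deg=5.0, accel_tol_mps2=1.0):
    """True when the driver's action differs from the model's prediction
    by more than the (made-up) tolerances."""
    return (
        abs(predicted.steering_deg - actual.steering_deg) > steer_tol_deg
        or abs(predicted.accel_mps2 - actual.accel_mps2) > accel_tol_mps2
    )


def flag_disagreements(frames):
    """Return the timestamps worth reporting back for review.

    `frames` is an iterable of (timestamp, predicted, actual) tuples.
    """
    return [t for t, predicted, actual in frames if disagrees(predicted, actual)]


if __name__ == "__main__":
    frames = [
        (0.0, Controls(0.0, 0.0), Controls(1.0, 0.2)),   # close enough
        (0.1, Controls(0.0, 0.0), Controls(12.0, 0.0)),  # driver swerved
        (0.2, Controls(2.0, 0.5), Controls(2.0, -2.0)),  # driver braked
    ]
    print(flag_disagreements(frames))
```

Only the moments where human and model part ways get flagged, which is what makes a fleet of millions of cars such an efficient source of interesting training data.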
Highlights:
35 releases in the last year containing 281 new data models from the 75k+ that have been trained
Current dataset of 4.8M video clips
+30% faster auto-labelling and 5 million hours of engineer time saved per 10k trips
New language-based model for their lanes neural network
Infinitely configurable simulation modelling using UE5 to test behaviour models
In-car networks running 1 billion parameters, over 150k neural network layers, with > 375k connections
Expected to be ready for worldwide beta rollout by the end of 2022
Elon raised an excellent point regarding the question of when the software should be considered good enough for unlimited public use: from a logical standpoint, as soon as it's proven to be consistently even slightly safer than human drivers, it's dangerous and unethical not to deploy it.
Dojo
Tesla's own supercomputer is built on the idea of a unified processing accelerator working from a massive, shared data ingestion layer, spread across inter-connected chips in inter-connected cabinets. This creates a 'fabric' of processing tiles, each containing 25 of their custom D1 dies; 6 of these tiles make up a single system tray, and two of those trays are contained in each Dojo cabinet. They have access to 640GB of high-bandwidth DRAM and 1TB/s of Ethernet bandwidth, and use 100+ kW of power to provide 54 Petaflops of compute.
The ExaPOD is simply enough of these cabinets to provide a combined 1 Exaflop of machine-learning compute power, and is capable of running 2 full accelerators simultaneously. In terms of what this means for us everyday folk, their Autopilot machine learning is projected to see performance improvements of 3.2x in auto-labelling and 2.9x in occupancy networks. Or in simpler terms, their Dojo hardware has the capability of providing the same compute as 6 traditional GPU-based boxes, for less than the cost of one.
The problem with having a massive computer to crunch numbers is ensuring it is consistently fed with data to process, and that has proven to be the bottleneck in Tesla's training networks. On the initial prototype tile they saw just 4% usage, with the rest of the time spent waiting for input, an issue not observed with the previous GPU-based system. A multitude of small optimisations and efficiency gains have since been implemented. The complexities are far too intricate to go into here, but essentially they have worked, as usual, from the ground up to remove unnecessary bloat from every area imaginable so that the hardware performs its task as efficiently as possible. They've since managed to bring this number up to 97% and run their network at almost peak output.
Their Dojo cabinet is unbelievably good at what it was designed to do. They admitted that, due to the strict, high-performance requirements of managing the cabinets, it's unlikely anyone will be able to purchase them for other projects. However, they openly discussed the future possibility of running an IaaS business model similar to AWS to hire out their raw hardware potential.
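The packaging hierarchy is easier to follow as plain multiplication. This small sketch uses only the counts quoted above; the constant and function names are my own.

```python
# Hierarchy counts as quoted in the paragraph above.
DIES_PER_TILE = 25
TILES_PER_TRAY = 6
TRAYS_PER_CABINET = 2


def dojo_cabinet_counts():
    """Multiply the hierarchy out to per-tray and per-cabinet totals."""
    tiles_per_cabinet = TILES_PER_TRAY * TRAYS_PER_CABINET
    return {
        "dies_per_tray": DIES_PER_TILE * TILES_PER_TRAY,
        "tiles_per_cabinet": tiles_per_cabinet,
        "dies_per_cabinet": DIES_PER_TILE * tiles_per_cabinet,
    }


if __name__ == "__main__":
    print(dojo_cabinet_counts())
```

So each tray carries 150 D1 dies, and each cabinet holds 12 tiles and 300 dies.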
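As a rough sanity check on the ExaPOD figure, the sketch below works out how many 54 Petaflop units are needed to reach 1 Exaflop. My notes don't make it clear whether that 54 Petaflop figure applies to a single system tray or to a whole cabinet, so the code covers both readings as explicit assumptions; the names are mine.

```python
import math

EXAFLOP_IN_PETAFLOPS = 1_000
UNIT_PETAFLOPS = 54        # quoted figure; per tray or per cabinet is unclear
TRAYS_PER_CABINET = 2      # from the cabinet description above


def units_for_one_exaflop(unit_petaflops=UNIT_PETAFLOPS):
    """Smallest whole number of units that reaches at least 1 Exaflop."""
    return math.ceil(EXAFLOP_IN_PETAFLOPS / unit_petaflops)


def cabinets_needed(figure_is_per_tray):
    """Cabinet count under either reading of the 54 Petaflop figure."""
    units = units_for_one_exaflop()
    if figure_is_per_tray:
        return math.ceil(units / TRAYS_PER_CABINET)
    return units


if __name__ == "__main__":
    print("if 54 PF is per tray:   ", cabinets_needed(True), "cabinets")
    print("if 54 PF is per cabinet:", cabinets_needed(False), "cabinets")
```

Either way the answer is a couple of rows of cabinets at most: 10 if the figure is per tray, 19 if it is per cabinet.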
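To see why that jump matters so much, here is a small sketch of what the two utilisation figures imply. It assumes utilisation simply means the fraction of time spent computing rather than waiting for input, which is my reading rather than a definition given in the talk; the function names are mine.

```python
def wait_per_unit_of_compute(utilisation):
    """Time spent idle for every unit of time spent computing."""
    return (1.0 - utilisation) / utilisation


def throughput_gain(before, after):
    """How much more useful work the same hardware delivers."""
    return after / before


if __name__ == "__main__":
    before, after = 0.04, 0.97   # figures quoted above
    print(f"idle per unit of compute at 4%:  {wait_per_unit_of_compute(before):.2f}")
    print(f"idle per unit of compute at 97%: {wait_per_unit_of_compute(after):.2f}")
    print(f"throughput gain: {throughput_gain(before, after):.1f}x")
```

At 4% the hardware idles for about 24 units of time per unit of real work; at 97% that drops to about 0.03, which amounts to roughly 24 times more useful output from exactly the same silicon.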
Highlights:
Single scalable compute plane with globally addressable fast memory
Orders of magnitude latency advantage over GPUs
New D1 chip > 3x power improvement over initial version
54% reduction in coefficient of thermal expansion (testing 14 versions in 24 months)
72 traditional GPU-based cabinets are superseded by just 4 Dojo cabinets
1st Dojo cabinet installed in late 2021, now at a build rate of 1 tile / day
1st ExaPOD will be complete by Q1 2023
Total of 7 ExaPODs planned for installation in Palo Alto facility
Q&A
They finished the session with a lengthy Q&A where some very in-depth questions were posed by the audience, and they were not shy to discuss any topic. The question of Mars did not come up, but a humanoid robot is clearly planned to be one of the first occupants during the initial construction phase and will be indispensable in creating a human-friendly habitat there.
There's certainly far more to unpack than what I've managed to skim over in this summary, including many complex details that only the most up-to-date technical minds in the field will be able to follow, but the whole event was fascinating in more ways than just raw information. The fact that Tesla are the only company willing to spend over 3 hours going into extremely fine detail, explaining every facet of their technology (and the reasoning behind it) to the general public, without fear of competitors stealing their intellectual property, speaks volumes. They know that what they are doing is necessary to achieve their spectacular goals, and they know just how to attract the right kind of people to help them. They also know that anybody attempting to copy it will be barely scratching the surface by the time Tesla are years further ahead.
Elon is likely correct that an age of abundance is dawning and right now is the best time in history to be alive. Market analysts are unlikely to agree since the presentation lacked a certain level of polish and the usual marketing speak, but this event was the greatest display of honesty, confidence and technical ability from any company around today. I have no doubt Tesla will become the largest company on the planet because they so consistently break the mould and exceed expectations.