• IBM Consulting

    DBA Consulting can help you with IBM BI and web-related work. IBM Linux is also part of our portfolio.

  • Oracle Consulting

    For Oracle-related consulting, database work, support, and migration, call DBA Consulting.

  • Novell/RedHat Consulting

    For all Novell SUSE Linux and SAP on SUSE Linux questions related to OS and BI solutions, and of course also for Red Hat products such as Red Hat Enterprise Linux, JBoss middleware, and BI on Red Hat.

  • Microsoft Consulting

    For consulting services related to Microsoft Windows Server 2012 onwards, Windows 7 and higher on the client, and Microsoft cloud services (Azure, Office 365, etc.).

  • Citrix Consulting

    Citrix VDI-in-a-Box, desktop virtualization, and Citrix NetScaler security.

  • Web Development

    Web development: static websites, CMS websites (Drupal 7/8, WordPress, Joomla), responsive websites, and adaptive websites.

20 October 2021

NeuroMorphic Photonic Computing and Better AI

 

Taking Neuromorphic Computing to the Next Level with Loihi 2

Intel Labs’ new Loihi 2 research chip outperforms its predecessor by up to 10x and comes with an open-source, community-driven neuromorphic computing framework

Today, Intel introduced Loihi 2, its second-generation neuromorphic research chip, and Lava, an open-source software framework for developing neuro-inspired applications. Their introduction signals Intel’s ongoing progress in advancing neuromorphic technology.

“Loihi 2 and Lava harvest insights from several years of collaborative research using Loihi. Our second-generation chip greatly improves the speed, programmability, and capacity of neuromorphic processing, broadening its usages in power and latency constrained intelligent computing applications. We are open sourcing Lava to address the need for software convergence, benchmarking, and cross-platform collaboration in the field, and to accelerate our progress toward commercial viability.”

–Mike Davies, director of Intel’s Neuromorphic Computing Lab

Why It Matters: Neuromorphic computing, which draws insights from neuroscience to create chips that function more like the biological brain, aspires to deliver orders of magnitude improvements in energy efficiency, speed of computation and efficiency of learning across a range of edge applications: from vision, voice and gesture recognition to search retrieval, robotics, and constrained optimization problems.

Neuromorphic Chipsets - Industry Adoption Analysis


Applications Intel and its partners have demonstrated to date include robotic arms, neuromorphic skins and olfactory sensing.

About Loihi 2: The research chip incorporates learnings from three years of use with the first-generation research chip and leverages progress in Intel’s process technology and asynchronous design methods.

Advances in Loihi 2 allow the architecture to support new classes of neuro-inspired algorithms and applications, while providing up to 10 times faster processing [1], up to 15 times greater resource density [2] with up to 1 million neurons per chip, and improved energy efficiency. Benefitting from a close collaboration with Intel's Technology Development Group, Loihi 2 has been fabricated with a pre-production version of the Intel 4 process, which underscores the health and progress of Intel 4. The use of extreme ultraviolet (EUV) lithography in Intel 4 has simplified the layout design rules compared to past process technologies. This has made it possible to rapidly develop Loihi 2.

The Lava software framework addresses the need for a common software framework in the neuromorphic research community. As an open, modular, and extensible framework, Lava will allow researchers and application developers to build on each other’s progress and converge on a common set of tools, methods, and libraries. Lava runs seamlessly on heterogeneous architectures across conventional and neuromorphic processors, enabling cross-platform execution and interoperability with a variety of artificial intelligence, neuromorphic and robotics frameworks. Developers can begin building neuromorphic applications without access to specialized neuromorphic hardware and can contribute to the Lava code base, including porting it to run on other platforms.

Architectures for Accelerating Deep Neural Nets

"Investigators at Los Alamos National Laboratory have been using the Loihi neuromorphic platform to investigate the trade-offs between quantum and neuromorphic computing, as well as implementing learning processes on-chip,” said Dr. Gerd J. Kunde, staff scientist, Los Alamos National Laboratory. “This research has shown some exciting equivalences between spiking neural networks and quantum annealing approaches for solving hard optimization problems. We have also demonstrated that the backpropagation algorithm, a foundational building block for training neural networks and previously believed not to be implementable on neuromorphic architectures, can be realized efficiently on Loihi. Our team is excited to continue this research with the second generation Loihi 2 chip."

About Key Breakthroughs: Loihi 2 and Lava provide tools for researchers to develop and characterize new neuro-inspired applications for real-time processing, problem-solving, adaptation and learning. Notable highlights include:

  • Faster and more general optimization: Loihi 2’s greater programmability will allow a wider class of difficult optimization problems to be supported, including real-time optimization, planning, and decision-making from edge to datacenter systems.
  • New approaches for continual and associative learning: Loihi 2 improves support for advanced learning methods, including variations of backpropagation, the workhorse algorithm of deep learning. This expands the scope of adaptation and data-efficient learning algorithms that can be supported by low-power form factors operating in online settings.
  • Novel neural networks trainable by deep learning: Fully programmable neuron models and generalized spike messaging in Loihi 2 open the door to a wide range of new neural network models that can be trained in deep learning. Early evaluations suggest over 60 times fewer ops per inference on Loihi 2 compared to standard deep networks running on the original Loihi, without loss in accuracy [3].
  • Seamless integration with real-world robotics systems, conventional processors, and novel sensors: Loihi 2 addresses a practical limitation of Loihi by incorporating faster, more flexible, and more standard input/output interfaces. Loihi 2 chips will support Ethernet interfaces, glueless integration with a wider range of event-based vision sensors, and larger meshed networks of Loihi 2 chips.

More details may be found in the Loihi 2/Lava technical brief.

About the Intel Neuromorphic Research Community: The Intel Neuromorphic Research Community (INRC) has grown to nearly 150 members, with several new additions this year, including Ford, Georgia Institute of Technology, Southwest Research Institute (SwRI) and Teledyne-FLIR. New partners join a robust community of academic, government and industry partners that are working with Intel to drive advances in real-world commercial usages of neuromorphic computing. (Read what our partners are saying about Loihi technology.)

“Advances like the new Loihi 2 chip and the Lava API are important steps forward in neuromorphic computing,” said Edy Liongosari, chief research scientist and managing director at Accenture Labs. “Next-generation neuromorphic architecture will be crucial for Accenture Labs’ research on brain-inspired computer vision algorithms for intelligent edge computing that could power future extended-reality headsets or intelligent mobile robots. The new chip provides features that will make it more efficient for hyper-dimensional computing and can enable more advanced on-chip learning, while the Lava API provides developers with a simpler and more streamlined interface to build neuromorphic systems.”

Deep learning: Hardware Landscape

About the Path to Commercialization: Advancing neuromorphic computing from laboratory research to commercially viable technology is a three-pronged effort. It requires continual iterative improvement of neuromorphic hardware in response to the results of algorithmic and application research; development of a common cross-platform software framework so developers can benchmark, integrate, and improve on the best algorithmic ideas from different groups; and deep collaborations across industry, academia and governments to build a rich, productive neuromorphic ecosystem for exploring commercial use cases that offer near-term business value.

Today’s announcements from Intel span all these areas, putting new tools into the hands of an expanding ecosystem of neuromorphic researchers engaged in re-thinking computing from its foundations to deliver breakthroughs in intelligent information processing.

What's Next: Intel currently offers two Loihi 2-based neuromorphic systems through the Neuromorphic Research Cloud to engaged members of the INRC: Oheo Gulch, a single-chip system for early evaluation, and Kapoho Point, an eight-chip system that will be available soon.

Introduction

Recent breakthroughs in AI have swelled our appetite for intelligence in computing devices at all scales and form factors. This new intelligence ranges from recommendation systems, automated call centers, and gaming systems in the data center to autonomous vehicles and robots to more intuitive and predictive interfacing with our personal computing devices to smart city and road infrastructure that immediately responds to emergencies.

Meanwhile, as today's AI technology matures, a clear view of its limitations is emerging. While deep neural networks (DNNs) demonstrate a near limitless capacity to scale to solve large problems, these gains come at a very high price in computational power and pre-collected data. Many emerging AI applications—especially those that must operate in unpredictable real-world environments with power, latency, and data constraints—require fundamentally new approaches.

Neuromorphic computing represents a fundamental rethinking of computer architecture at the transistor level, inspired by the form and function of the brain's biological neural networks. Despite many decades of progress in computing, biological neural circuits remain unrivaled in their ability to intelligently process, respond to, and learn from real-world data at microwatt power levels and millisecond response times. Guided by the principles of biological neural computation, neuromorphic computing intentionally departs from the familiar algorithms and programming abstractions of conventional computing so it can unlock orders of magnitude gains in efficiency and performance compared to conventional architectures. The goal is to discover a computer architecture that is inherently suited for the full breadth of intelligent information processing that living brains effortlessly support.

Advances in neuromorphic computing technology

Three Years of Loihi Research

Intel Labs is pioneering research that drives the evolution of compute and algorithms toward next-generation AI. In 2018, Intel Labs launched the Intel Neuromorphic Research Community (Intel NRC) and released the Loihi research processor for external use. The Loihi chip represented a milestone in the neuromorphic research field. It incorporated self-learning capabilities, novel neuron models, asynchronous spike-based communication, and many other properties inspired from neuroscience modeling, with leading silicon integration scale and circuit speeds. Over the past three years, Intel NRC members have evaluated Loihi in a wide range of application demonstrations. Some examples include:

  • Adaptive robot arm control
  • Visual-tactile sensory perception
  • Learning and recognizing new odors and gestures
  • Drone motor control with state-of-the-art latency in response to visual input
  • Fast database similarity search
  • Modeling diffusion processes for scientific computing applications
  • Solving hard optimization problems such as railway scheduling

In most of these demonstrations, Loihi consumes far less than 1 watt of power, compared to the tens to hundreds of watts that standard CPU and GPU solutions consume.

With relative gains often reaching several orders of magnitude, these Loihi demonstrations represent breakthroughs in energy efficiency [1]. Furthermore, for the best applications, Loihi simultaneously demonstrates state-of-the-art response times to arriving data samples, while also adapting and learning from incoming data streams.

This combination of low power and low latency, with continuous adaptation, has the potential to bring new intelligent functionality to power- and latency-constrained systems at a scale and versatility beyond what any other programmable architecture supports today. Loihi has also exposed limitations and weaknesses found in today's neuromorphic computing approaches.

While Loihi has one of the most flexible feature sets of any neuromorphic chip, many of the more promising applications stretch the range of its capabilities, such as its supported neuron models and learning rules. Interfacing with conventional sensors, processors, and data formats proved to be a challenge and often a bottleneck for performance. 

While Loihi applications show good scalability in large-scale systems such as the 768-chip Pohoiki Springs system, with gains often increasing relative to conventional solutions at larger scales, congestion in inter-chip links limited application performance. Loihi’s integrated compute-and-memory architecture foregoes off-chip DRAM memory, so scaling up workloads requires increasing the number of Loihi chips in an application. This means the economic viability of the technology depends on achieving significant improvements in the resource density of neuromorphic chips to minimize the number of required chips in commercial deployments. 
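To put those scaling economics in rough numbers, here is an illustrative back-of-the-envelope calculation in Python (based on Loihi 1's publicly stated 128 cores with up to 1,024 neurons each, not an official Intel breakdown):

    # Loihi 1 provides up to 1,024 neurons in each of its 128 cores.
    loihi1_neurons_per_chip = 128 * 1024          # ~131k neurons per chip
    pohoiki_springs_chips = 768
    total = pohoiki_springs_chips * loihi1_neurons_per_chip
    print(f"{total:,} neurons in the 768-chip system")   # ~100 million

This is why per-chip resource density matters commercially: the more neurons and synapses each chip holds, the fewer chips a given workload needs.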

Wei Lu (U Mich) Neuromorphic Computing Based on Memristive Materials and Devices

Photonics for Computing: from Optical Interconnects to Neuromorphic Architectures

One of the biggest challenges holding back the commercialization of neuromorphic technology is the lack of software maturity and convergence. Since neuromorphic architecture is fundamentally incompatible with standard programming models, including today’s machine-learning and AI frameworks in wide use, neuromorphic software and application development is often fragmented across research teams, with different groups taking different approaches and often reinventing common functionality. 

Yet to emerge is a single, common software framework for neuromorphic computing that supports the full range of approaches pursued by the research community and that presents compelling and productive abstractions to application developers.

The Nx SDK software developed by Intel Labs for programming Loihi focused on low-level programming abstractions and did not attempt to address the larger community’s need for a more comprehensive and open neuromorphic software framework that runs on a wide range of platforms and allows contributions from throughout the community. This changes with the release of Lava.

 Intel Labs is pioneering research that drives the evolution of compute and algorithms toward next-generation AI.


Loihi 2: A New Generation of Neuromorphic Computing Architecture 

Building on the insights gained from the research performed on the Loihi chip, Intel Labs introduces Loihi 2. A complete tour of the new features, optimizations, and innovations of this chip is provided in the final section. Here are some highlights:

 • Generalized event-based messaging. Loihi originally supported only binary-valued spike messages. Loihi 2 permits spikes to carry integer-valued payloads with little extra cost in either performance or energy. These generalized spike messages support event-based messaging, preserving the desirable sparse and time-coded communication properties of spiking neural networks (SNNs), while also providing greater numerical precision.

 • Greater neuron model programmability. Loihi was specialized for a specific SNN model. Loihi 2 now implements its neuron models with a programmable pipeline in each neuromorphic core to support common arithmetic, comparison, and program control flow instructions. Loihi 2's programmability greatly expands its range of neuron models without compromising performance or efficiency compared to Loihi, thereby enabling a richer space of use cases and applications.

 • Enhanced learning capabilities. Loihi primarily supported two-factor learning rules on its synapses, with a third modulatory term available from nonlocalized "reward" broadcasts. Loihi 2 allows networks to map localized "third factors" to specific synapses. This provides support for many of the latest neuro-inspired learning algorithms under study, including approximations of the error backpropagation algorithm, the workhorse of deep learning. While Loihi was able to prototype some of these algorithms in proof-of-concept demonstrations, Loihi 2 will be able to scale these examples up, for example, so new gestures can be learned faster with a greater range of presented hand motions.

 • Numerous capacity optimizations to improve resource density. Loihi 2 has been fabricated with a pre-production version of the Intel 4 process to address the need to achieve greater application scales within a single neuromorphic chip. Loihi 2 also incorporates numerous architectural optimizations to compress and maximize the efficiency of each chip's neural memory resources. Together, these innovations improve the overall resource density of Intel's neuromorphic silicon architecture from 2x to over 160x, depending on properties of the programmed networks.

 • Faster circuit speeds. Loihi 2's asynchronous circuits have been fully redesigned and optimized, improving on Loihi down to the lowest levels of pipeline sequencing. This has provided gains in processing speeds from 2x for simple neuron state updates to 5x for synaptic operations to 10x for spike generation [2]. Loihi 2 supports minimum chip-wide time steps under 200ns; it can now process neuromorphic networks up to 5000x faster than biological neurons.

 • Interface improvements. Loihi 2 offers more standard chip interfaces than Loihi. These interfaces are both faster and higher-radix. Loihi 2 chips support 4x faster asynchronous chip-to-chip signaling bandwidths [3], a destination spike broadcast feature that reduces inter-chip bandwidth utilization by 10x or more in common networks [4], and three-dimensional mesh network topologies with six scalability ports per chip. Loihi 2 supports glueless integration with a wider range of both standard chips, over its new Ethernet interface, and emerging event-based vision (and other) sensor devices.

Photonic reservoir computing for high-speed neuromorphic computing applications - A.Lugnan

Using these enhancements, Loihi 2 now supports a new deep neural network (DNN) implementation known as the Sigma-Delta Neural Network (SDNN) that provides great gains in speed and efficiency compared to the rate-coded spiking neural network approach commonly used on Loihi. SDNNs compute graded activation values in the same way that conventional DNNs do, but they only communicate significant changes as they happen in a sparse, event-driven manner. Simulation characterizations show that SDNNs on Loihi 2 can improve on Loihi's rate-coded SNNs for DNN inference workloads by over 10x in both inference speed and energy efficiency.
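To make the sigma-delta idea concrete, here is a minimal, illustrative Python sketch (a toy model, not Intel's implementation): each unit remembers the last value it transmitted and emits an event only when its activation has drifted by more than a threshold, so slowly changing signals generate very little traffic.

    import numpy as np

    def sigma_delta_encode(activations, threshold=0.25):
        """Emit (index, delta) events only when a unit's activation has drifted
        from its last transmitted value by more than `threshold`. A receiver
        reconstructs the activation by accumulating the deltas it gets."""
        last_sent = np.zeros_like(activations[0])
        events_per_step = []
        for act in activations:                # one activation vector per time step
            delta = act - last_sent
            changed = np.abs(delta) > threshold
            events = [(int(i), float(delta[i])) for i in np.flatnonzero(changed)]
            last_sent[changed] = act[changed]
            events_per_step.append(events)
        return events_per_step

    # Slowly varying input: most steps produce few or no events,
    # which is where the sparsity (and energy) win comes from.
    t = np.linspace(0, 1, 50)
    acts = np.stack([np.sin(2 * np.pi * t + phase) for phase in (0.0, 0.3, 0.6)], axis=1)
    print([len(e) for e in sigma_delta_encode(acts)])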

A First Tour of Loihi 2 

 Loihi 2 has the same base architecture as its predecessor Loihi, but comes with several improvements to extend its functionality, improve its flexibility, increase its capacity, accelerate its performance, and make it easier to both scale and integrate into a larger system (see Figure 1). 

Base Architecture

Building on the strengths of its predecessor, each Loihi 2 chip consists of microprocessor cores and up to 128 fully asynchronous neuron cores connected by a network-on-chip (NoC). The neuron cores are optimized for neuromorphic workloads, each implementing a group of spiking neurons, including all synapses connecting to the neurons. All communication between neuron cores is in the form of spike messages. The number of embedded microprocessor cores has doubled from three in Loihi to six in Loihi 2. Microprocessor cores are optimized for spike-based communication and execute standard C code to assist with data I/O as well as network configuration, management, and monitoring. Parallel I/O interfaces extend the on-chip mesh across multiple chips—up to 16,384—with direct pin-to-pin wiring between neighbors.

Programmable Photonic Integrated Circuits for Quantum Information Processing and Machine Learning

New Functionality

Loihi 2 supports fully programmable neuron models with graded spikes. Each neuron model takes the form of a program, which is a short sequence of microcode instructions describing the behavior of a single neuron. The microcode instruction set supports bitwise and basic math operations in addition to conditional branching, memory access, and specialized instructions for spike generation and probing.
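As a rough Python analogue of the neuron-as-a-program idea (purely illustrative, not actual Loihi 2 microcode), the per-time-step behavior of a simple leaky integrate-and-fire neuron that emits a graded, integer-valued spike can be written as a short routine of arithmetic, a comparison, and a conditional spike instruction:

    def lif_neuron_step(state, weighted_input, decay=0.9, threshold=100):
        """One time step of a hypothetical neuron 'program': integrate input,
        apply leak, compare against threshold, and emit a graded spike if it fires."""
        state = int(state * decay) + weighted_input   # arithmetic instructions
        if state >= threshold:                        # comparison / branch
            payload = state // threshold              # graded (integer) spike magnitude
            return 0, payload                         # reset state, emit spike payload
        return state, None                            # no spike this step

    state, spikes = 0, []
    for x in [30, 40, 50, 10, 5, 80, 90]:
        state, payload = lif_neuron_step(state, x)
        spikes.append(payload)
    print(spikes)   # [None, None, 1, None, None, None, 1]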

The second-generation “Loihi” processor from Intel has been made available to advance research into neuromorphic computing approaches that more closely mimic the behavior of biological cognitive processes. Loihi 2 outperforms the previous chip version in terms of density, energy efficiency, and other factors. This is part of an effort to create semiconductors that are more like a biological brain, which might lead to significant improvements in computer performance and efficiency.

Intel Announces Loihi 2, Lava Software Framework For Advancing Neuromorphic Computing - Phoronix

The first generation of artificial intelligence was built on the foundation of defining rules and emulating classical logic to arrive at rational conclusions within a narrowly defined problem domain. It was ideal for monitoring and optimizing operations. The second generation is dominated by the use of deep learning networks to examine the contents and data that were mostly concerned with sensing and perception. The third generation of AI focuses on drawing similarities to human cognitive processes, like interpretation and autonomous adaptation. 

This is achieved by simulating neurons firing in the same way as humans’ nervous systems do, a method known as neuromorphic computing.

Neuromorphic computing is not a new concept. It was initially suggested in the 1980s by Carver Mead, who coined the phrase "neuromorphic engineering." Mead spent more than four decades building analog systems that simulated human senses and processing mechanisms, including sensation, seeing, hearing, and thinking. Neuromorphic computing is a subset of neuromorphic engineering that focuses on the "thinking" and "processing" capabilities of such human-like systems. Today, neuromorphic computing is gaining traction as the next milestone in artificial intelligence technology.

Intel Rolls Out New Loihi 2 Neuromorphic Chip: Built on Early Intel 4 Process

In 2017, Intel released the first-generation Loihi chip, a 14-nanometer chipset with a 60 mm² die. It has more than 2 billion transistors and three Lakemont orchestration cores. It also features 128 neuromorphic cores and a configurable microcode engine for on-chip training of asynchronous spiking neural networks. Spiking neural networks allowed Loihi to be entirely asynchronous and event-driven, rather than being active and updating on a synchronized clock signal. When charge builds up in a neuron, "spikes" are sent along active synapses. These spikes are largely time-based, with time being recorded as part of the data. A core fires out its own spikes to its linked neurons when the spikes accumulating in a neuron over a particular window of time reach a certain threshold.
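The accumulate-and-fire behavior described above can be sketched in a few lines of illustrative Python (a toy model with made-up weights and thresholds, not Loihi's actual implementation): neurons sit idle until spike events arrive, accumulate charge, and fire their own spikes onward once a threshold is crossed.

    from collections import deque

    # Toy event-driven network: neuron -> list of (target, weight) synapses.
    synapses = {0: [(1, 6), (2, 4)], 1: [(2, 7)], 2: []}
    potential = {n: 0 for n in synapses}
    THRESHOLD = 10

    events = deque([(0, 12)])        # an input spike drives neuron 0 above threshold
    while events:                    # computation happens only while events exist
        neuron, charge = events.popleft()
        potential[neuron] += charge
        if potential[neuron] >= THRESHOLD:
            potential[neuron] = 0    # reset after firing
            for target, weight in synapses[neuron]:
                events.append((target, weight))   # spike travels along active synapses
    print(potential)                 # {0: 0, 1: 6, 2: 4}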

Loihi 2 still has 128 neuromorphic cores, but each core now has 8 times the number of neurons and synapses. Each of the 128 cores has 192 KB of flexible memory. Each neuron may now be assigned up to 4,096 states depending on the model, compared to the previous limit of 24. The neuron model is now fully programmable, similar to an FPGA, which gives it more versatility and allows for new sorts of neuromorphic applications.

One of the drawbacks of Loihi was that spike signals were not programmable and had no context or range of values. Loihi 2 addresses all of these issues while also providing 2-10x (2X for neuron state updates, up to 10X for spike generation) faster circuits, eight times more neurons, and four times more link bandwidth for increased scalability.

Loihi 2 was created using the Intel 4 pre-production process and benefited from the use of EUV technology in that node. The Intel 4 process allowed Intel to halve the size of the chip from 60 mm² to 31 mm², while the number of transistors rose to 2.3 billion. In comparison to previous process technologies, the use of extreme ultraviolet (EUV) lithography in Intel 4 has simplified the layout design guidelines. This has allowed Loihi 2 to be developed quickly.

Programmable Photonic Circuits: a flexible way of manipulating light on chips

Support for three-factor learning rules has been added to the Loihi 2 architecture, as well as improved synaptic (internal interconnections) compression for quicker internal data transmission. Loihi 2 also features parallel off-chip connections (that enable the same types of compression as internal synapses) that may be utilized to extend an on-chip mesh network across many physical chips to create a very powerful neuromorphic computer system. Loihi 2 also features new approaches for continual and associative learning. Furthermore, the chip features 10GbE, GPIO, and SPI interfaces to make it easier to integrate Loihi 2 with traditional systems.

Loihi 2 further improves flexibility by integrating faster, standardized I/O interfaces that support Ethernet connections, vision sensors, and bigger mesh networks. These improvements are intended to improve the chip’s compatibility with robots and sensors, which have long been a part of Loihi’s use cases.

Another significant change is in the portion of the processor that assesses the condition of the neuron before deciding whether or not to transmit a spike. In the original processor, users could only express that decision with a simple bit of arithmetic; in Loihi 2, a simpler programmable pipeline also lets them perform comparisons and control the flow of instructions.

ESA+ Colloquium - Programmable Photonics - Wim Bogaerts - 3 May 2021

Intel claims Loihi 2's enhanced architecture allows it to support back-propagation, a key component of many AI models, which may help accelerate the commercialization of neuromorphic chips. Loihi 2 has also been shown to execute inference calculations, the calculations AI models use to interpret given data, with up to 60 times fewer operations per inference compared to Loihi, without any loss in accuracy.

The Neuromorphic Research Cloud is presently offering two Loihi 2-based neuromorphic devices to researchers. These are:

Oheo Gulch is a single-chip add-in card that comes with an Intel Arria 10 FPGA for interfacing with Loihi 2; it will be used for early assessment.

Kapoho Point, an eight-chip system board that mounts its Loihi 2 chips in a 4×4-inch form factor, will be available shortly. It will have GPIO pins along with "standard synchronous and asynchronous interfaces" that will allow it to be used with sensors and actuators for embedded robotics applications.

These systems will be available via a cloud service to members of the Intel Neuromorphic Research Community (INRC), while Lava is freely available via GitHub.

Intel has also created Lava to address the requirement for software convergence, benchmarking, and cross-platform collaboration in the realm of neuromorphic computing. As an open, modular, and extensible framework, it will enable academics and application developers to build on one another's efforts and eventually converge on a common set of tools, techniques, and libraries.

Intel Announces Loihi 2, Lava Software Framework For Advancing Neuromorphic Computing - Phoronix

Lava operates on a range of conventional and neuromorphic processor architectures, allowing for cross-platform execution and compatibility with a variety of artificial intelligence, neuromorphic, and robotics frameworks. Users can get the Lava Software Framework for free on GitHub.

Edy Liongosari, chief research scientist and managing director for Accenture Labs believes that advances like the new Loihi-2 chip and the Lava API will be crucial to the future of neuromorphic computing. “Next-generation neuromorphic architecture will be crucial for Accenture Labs’ research on brain-inspired computer vision algorithms for intelligent edge computing that could power future extended-reality headsets or intelligent mobile robots,” says Edy.

For now, Loihi 2 has piqued the interest of the Queensland University of Technology. The institute is looking to work on more sophisticated neural modules to aid in the implementation of biologically inspired navigation and map-formation algorithms. The first-generation Loihi is already being used at Los Alamos National Laboratory to study trade-offs between quantum and neuromorphic computing. It has also been used to implement the backpropagation algorithm, which is used to train neural networks, on neuromorphic hardware.

Intel has unveiled its second-generation neuromorphic computing chip, Loihi 2, the first chip to be built on its Intel 4 process technology. Designed for research into cutting-edge neuromorphic neural networks, Loihi 2 brings a range of improvements. They include a new instruction set for neurons that provides more programmability, allowing spikes to have integer values beyond just 1 and 0, and the ability to scale into three-dimensional meshes of chips for larger systems.

The chipmaker also unveiled Lava, an open-source software framework for developing neuro-inspired applications. Intel hopes to engage neuromorphic researchers in development of Lava, which when up and running will allow research teams to build on each other’s work.

Loihi is Intel’s version of what neuromorphic hardware, designed for brain-inspired spiking neural networks (SNNs), should look like. SNNs are used in event-based computing, in which the timing of input spikes encodes the information. In general, spikes that arrive sooner have more computational effect than those arriving later.
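A simple way to see how "earlier spikes matter more" can work is latency coding, sketched below in illustrative Python (one possible coding scheme, used here only as an example): stronger inputs are encoded as earlier spike times, and a receiver weights early spikes more heavily than late ones.

    def latency_encode(value, t_max=10.0):
        """Stronger inputs (closer to 1.0) spike earlier; weak inputs spike late."""
        return t_max * (1.0 - value)

    def decode_contribution(spike_time, t_max=10.0):
        """Earlier spikes contribute more to the receiving neuron."""
        return (t_max - spike_time) / t_max

    for value in (0.9, 0.5, 0.1):
        t = latency_encode(value)
        print(f"input={value:.1f} -> spike at t={t:.1f}, contribution={decode_contribution(t):.1f}")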

Karlheinz Meier - How neuromorphic computing may affect our future life (HBP)

Intel’s Loihi 2 second-generation neuromorphic processor. (Source: Intel)

Among the key differences between neuromorphic hardware and standard CPUs is fine-grained distribution of memory, meaning Loihi’s memory is embedded into individual cores. Since Loihi’s spikes rely on timing, the architecture is asynchronous.

“In neuromorphic computing, the computation is emerging through the interaction between these dynamical elements,” explained Mike Davies, director of Intel’s Neuromorphic Computing Lab. “In this case, it’s neurons that have this dynamical property of adapting online to the input it receives, and the programmer may not know the precise trajectory of steps that the chip will go through to arrive at an answer.

“It goes through a dynamical process of self-organizing its states and it settles into some new condition. That final fixed point as we call it, or equilibrium state, is what is encoding the answer to the problem that you want to solve,” Davies added. “So it’s very fundamentally different from how we even think about computing in other architectures.”

First-generation Loihi chips have thus far been demonstrated in a variety of research applications, including adaptive robot arm control, where the motion adapts to changes in the system, reducing friction and wear on the arm. Loihi is able to adapt its control algorithm to compensate for errors or unpredictable behavior, enabling robots to operate with the desired accuracy. Loihi has also been used in a system that recognizes different smells. In this scenario, it can learn and detect new odors much more efficiently than a deep learning-based equivalent. A project with Deutsche Bahn also used Loihi for train scheduling. The system reacted quickly to changes such as track closures or stalled trains.

Second-gen features

Built on a pre-production version of the Intel 4 process, Loihi 2 aims to increase programmability and performance without compromising energy efficiency. Like its predecessor, it typically consumes around 100 mW (up to 1 W).

An increase in resource density is one of the most important changes; while the chip still incorporates 128 cores, the neuron count jumps by a factor of eight.

“Getting to a higher amount of storage, neurons and synapses in a single chip is essential for the commercial viability… and commercializing them in a way that makes sense for customer applications,” said Davies.

Loihi 2 features. (Source: Intel)

With Loihi 1, workloads would often map onto the architecture in non-optimal ways. For example, the neuron count would often max out while free memory was still available. The amount of memory in Loihi 2 is similar in total, but has been broken up into memory banks that are more flexible. Additional compression has been added to network parameters to minimize the amount of memory required for larger models. This frees up memory that can be reallocated for neurons.

The upshot is that Loihi 2 can tackle larger problems with the same amount of memory, delivering a roughly 15-fold increase in neural network capacity per mm² of chip area, bearing in mind that die area is halved overall by the new process technology.
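Those numbers are roughly consistent with one another, as a quick back-of-the-envelope check in Python shows (figures taken from this article, with Loihi 1's ~131k neurons per chip; the 15x is Intel's "up to" value, so the match is only approximate):

    loihi1_neurons, loihi2_neurons = 128 * 1024, 1_000_000   # neurons per chip
    loihi1_area, loihi2_area = 60.0, 31.0                    # die area in mm^2

    per_chip_gain = loihi2_neurons / loihi1_neurons          # ~7.6x capacity per chip
    per_mm2_gain = per_chip_gain * (loihi1_area / loihi2_area)
    print(round(per_chip_gain, 1), round(per_mm2_gain, 1))   # ~7.6 per chip, ~14.8 per mm^2

That lands in the same ballpark as the quoted "up to 15x" figure.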

Neuron programmability

Programmability is another important architectural modification. Neurons that were previously fixed-function, though configurable, in Loihi 1 gain a full instruction set in Loihi 2. The instruction set includes common arithmetic, comparison and program control flow instructions. That level of programmability would allow varied SNN types to be run more efficiently.

“This is a kind of microcode that allows us to program almost arbitrary neuron models,” Davies said. “This covers the limits of Loihi [1], and where generally we’re finding more application value could be unlocked with even more complex and richer neuron models, which is not what we were expecting at the beginning of Loihi. But now we can actually encompass that full extent of neuron models that our partners are trying to investigate, and what the computational neuroscience domain [is] proposing and characterizing.”

The Loihi 2 die is the first to be fabricated on a pre-production version of Intel 4 process technology. (Source: Intel)

Programmable Photonic Circuits

For Loihi 2, the idea of spikes has also been generalized. Loihi 1 employed strict binary spikes to mirror what is seen in biology, where spikes have no magnitude. All information is represented by spike timing, and earlier spikes would have greater computational effect than later spikes. In Loihi 2, spikes carry a configurable integer payload available to the programmable neuron model. While biological brains don’t do this, Davies said it was relatively easy for Intel to add to the silicon architecture without compromising performance.

“This is an instance where we’re departing from the strict biological fidelity, specifically because we understand what the importance is, the time-coding aspect of it,” he said. “But [we realized] that we can do better, and we can solve the same problems with fewer resources if we have this extra magnitude that can be sent alongside with this spike.”

Generalized event-based messaging is key to Loihi 2's support of a deep neural network called the sigma-delta neural network (SDNN), which is much faster than the timing approach used on Loihi 1. SDNNs compute graded activation values in the same way that conventional DNNs do, but only communicate significant changes as they happen in a sparse, event-driven manner.

3D Scaling

Loihi 2 is billed as up to 10 times faster than its predecessor at the circuit level. Combined with functional improvements, the design can deliver up to 10X speed gains, Davies claimed. Loihi 2 supports minimum chip-wide time steps under 200ns; it can also process neuromorphic networks up to 5,000 times faster than biological neurons.
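The 5,000x figure follows directly from the time-step size if one assumes the roughly millisecond timescale on which biological neurons operate (an assumption stated here only for illustration):

    biological_time_step = 1e-3   # ~1 ms, typical timescale of biological neurons (assumption)
    loihi2_time_step = 200e-9     # 200 ns minimum chip-wide time step cited above
    print(biological_time_step / loihi2_time_step)   # 5000.0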

Programmable Photonics - Wim Bogaerts - Stanford

The new chip also features scalability ports which allow Intel to scale neural networks into the third dimension. Without external memory on which to run larger neural networks, Loihi 1 required multiple devices (such as in Intel’s 768-Loihi chip system, Pohoiki Springs). Planar meshes of Loihi 1 chips become 3D meshes in Loihi 2. Meanwhile, chip-to-chip bandwidth has been improved by a factor of four, with compression and new protocols providing one-tenth the redundant spike traffic sent between chips. Davies said the combined capacity boost is around 60-fold for most workloads, avoiding bottlenecks caused by inter-chip links.

Also supported is three-factor learning, which is popular in cutting-edge neuromorphic algorithm research. The same modification, which maps third factors to specific synapses, can be used to approximate back-propagation, the training method used in deep learning. That creates new ways of learning via Loihi.
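As a generic illustration of what a three-factor rule looks like (a simple reward-modulated Hebbian update in Python, not Intel's specific learning engine), the change to each synapse depends on pre-synaptic activity, post-synaptic activity, and a third modulatory signal mapped onto that synapse:

    import numpy as np

    def three_factor_update(weights, pre, post, third_factor, lr=0.01):
        """Generic three-factor rule: dW = lr * third * (pre outer post).
        `third_factor` is a per-synapse modulatory term (e.g. a reward or error
        signal mapped onto specific synapses), not a single global broadcast."""
        eligibility = np.outer(pre, post)            # local two-factor (Hebbian) term
        return weights + lr * third_factor * eligibility

    pre = np.array([1.0, 0.0, 1.0])                  # pre-synaptic spike activity
    post = np.array([0.0, 1.0])                      # post-synaptic spike activity
    third = np.array([[0.0, 1.0], [0.0, 0.0], [0.0, -0.5]])  # per-synapse modulation
    W = np.zeros((3, 2))
    print(three_factor_update(W, pre, post, third))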

Loihi 2 will be available to researchers as a single-chip board for developing edge applications (Oheo Gulch). It will also be offered as an eight-chip board intended to scale for more demanding applications. (Source: Intel)

Lava

The Lava software framework rounds out the Loihi enhancements. The open-source project is available to the neuromorphic research community.

“Software continues to hold back the field,” Davies said. “There hasn’t been a lot of progress, not at the same pace as the hardware over the past several years. And there hasn’t been an emergence of a single software framework, as we’ve seen in the deep learning world where we have TensorFlow and PyTorch gathering huge momentum and a user base.”

While Intel has a portfolio of applications demonstrated for Loihi, code sharing among development teams has been limited. That makes it harder for developers to build on progress made elsewhere.

Promoted as a new project, not a product, Davies said Lava is intended as a way to build a framework that supports Loihi researchers working on a range of algorithms. While Lava is aimed at event-based asynchronous message passing, it will also support heterogeneous execution. That allows researchers to develop applications that initially run on CPUs. With access to Loihi hardware, researchers can then map parts of the workload onto the neuromorphic chip. The hope is that approach would help lower the barrier to entry.
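A minimal sketch of that workflow, based on Lava's publicly documented tutorial API around its initial release (module paths, parameter defaults, and class names may differ in current versions, so treat this as illustrative rather than definitive):

    import numpy as np
    from lava.proc.lif.process import LIF
    from lava.proc.dense.process import Dense
    from lava.magma.core.run_conditions import RunSteps
    from lava.magma.core.run_configs import Loihi1SimCfg

    lif_in = LIF(shape=(3,))                   # a small population of LIF neurons
    dense = Dense(weights=np.eye(3) * 20)      # fixed synaptic weights (identity here)
    lif_out = LIF(shape=(3,))

    lif_in.s_out.connect(dense.s_in)           # spikes out of one process...
    dense.a_out.connect(lif_out.a_in)          # ...become synaptic input to the next

    # Runs entirely on a CPU simulation backend; in a real application the
    # input population would also receive bias or sensor input, and the same
    # process graph could later be retargeted to Loihi hardware by swapping
    # the run configuration.
    lif_out.run(condition=RunSteps(num_steps=100), run_cfg=Loihi1SimCfg())
    lif_out.stop()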

“We see a need for convergence and a communal development here towards this greater goal which is going to be necessary for commercializing neuromorphic technology,” Davies said.

Loihi 2 will be used by researchers developing advanced neuromorphic algorithms. Oheo Gulch, a single-chip system for lab testing, will initially be available to researchers, followed by Kapoho Point, an eight-chip Loihi 2 version of Kapoho Bay. Kapoho Point includes an Ethernet interface designed to allow boards to be stacked for applications such as robotics requiring more computing power.

More Information:

https://www.youtube.com/c/PhotonicsResearchGroupUGentimec/videos

https://ecosystem.photonhub.eu/trainings/product/?action=view&id_form=7&id_form_data=14

https://aip.scitation.org/doi/10.1063/5.0047946

https://www.intel.com/content/www/us/en/research/neuromorphic-computing.html

https://www.intel.com/content/www/us/en/newsroom/resources/press-kits-neuromorphic-computing.html

https://www.photonics.com/Articles/Neuromorphic_Processing_Set_to_Propel_Growth_in_AI/a66821

https://www.embedded.com/intel-offers-loihi-2-neuromorphic-chip-and-software-framework/

https://github.com/Linaro/lava




19 September 2021

Tesla Dojo and Hydranet and AI and Deep Learning with New Super Computer Dojo and D1 Chip

 

Tesla Dojo and Hydranet and AI and Deep Learning

Tesla Has Done Something No Other Automaker Has: Assumed The Mantle Of Moore’s Law

Steve Jurvetson shared on Twitter that Tesla now holds the mantle of Moore's Law in the same manner NVIDIA took leadership from Intel a decade ago. He noted that the substrates have shifted several times, but humanity's capacity to compute has compounded for 122 years. He shared a log-scale chart with details.

https://www.flickr.com/photos/jurvetson/51391518506/

The link Jurvetson shared included a detailed article explaining how Tesla holds the mantle of Moore's Law. Tesla introduced its D1 chip for the DOJO Supercomputer, and Jurvetson said:


“This should not be a surprise, as Intel ceded leadership to NVIDIA a decade ago, and further handoffs were inevitable. The computational frontier has shifted across many technology substrates over the past 120 years, most recently from the CPU to the GPU to ASICs optimized for neural networks (the majority of new compute cycles).”


“Of all of the depictions of Moore’s Law, this is the one I find to be most useful, as it captures what customers actually value — computation per $ spent (note: on a log scale, so a straight line is an exponential; each y-axis tick is 100x).”

“Humanity’s capacity to compute has compounded for as long as we can measure it, exogenous to the economy, and starting long before Intel co-founder Gordon Moore noticed a refraction of the longer-term trend in the belly of the fledgling semiconductor industry in 1965.”

Project Dojo: Check out Tesla Bot AI chip! (full presentation)


“In the modern era of accelerating change, it is hard to find even five-year trends with any predictive value, let alone trends that span the centuries. I would go further and assert that this is the most important graph ever conceived (my earlier blog post on its origins and importance).”

“Why the transition within the integrated circuit era? Intel lost to NVIDIA for neural networks because the fine-grained parallel compute architecture of a GPU maps better to the needs of deep learning. There is a poetic beauty to the computational similarity of a processor optimized for graphics processing and the computational needs of a sensory cortex, as commonly seen in neural networks today. A custom chip (like the Tesla D1 ASIC) optimized for neural networks extends that trend to its inevitable future in the digital domain. Further advances are possible in analog in-memory compute, an even closer biomimicry of the human cortex. The best business planning assumption is that Moore’s Law, as depicted here, will continue for the next 20 years as it has for the past 120.”

In the detailed description of the chart, Jurvetson pointed out that in the popular perception of Moore's Law, computer chips are compounding in their complexity at near-constant per-unit cost. He explained that this is one of many abstractions of the law: Moore's Law is both a prediction and an abstraction, and this particular abstraction relates to the compounding of transistor density in two dimensions. He explained that others relate to speed or computational power.

He also added:

“What Moore observed in the belly of the early IC industry was a derivative metric, a refracted signal, from a longer-term trend, a trend that begs various philosophical questions and predicts mind-bending futures.”

“Ray Kurzweil’s abstraction of Moore’s Law shows computational power on a logarithmic scale, and finds a double exponential curve that holds over 120 years! A straight line would represent a geometrically compounding curve of progress.”



He explained that, through five paradigm shifts, the computational power that $1,000 buys has doubled every two years, and for the past 35 years it has been doubling every year. Each dot in the graph, he explained, represents the frontier of computational price-performance of its day. He gave these examples: one machine was used in the 1890 Census, one cracked the Nazi Enigma cipher in WW2, and one predicted Eisenhower's win in the 1956 presidential election.

He also pointed out that each dot represents a human drama and that before Moore's first paper in 1965, none of them realized that they were on a predictive curve. Each dot represents an attempt to build the best computer with the tools of the day, he explained, and we use those creations to make better design software and manufacturing control algorithms.

“Notice that the pace of innovation is exogenous to the economy. The Great Depression and the World Wars and various recessions do not introduce a meaningful change in the long-term trajectory of Moore’s Law. Certainly, the adoption rates, revenue, profits, and economic fates of the computer companies behind the various dots on the graph may go through wild oscillations, but the long-term trend emerges nevertheless.”

Tesla now holds the mantle of Moore’s Law, with the D1 chip introduced last night for the DOJO supercomputer (video, news summary).

Tesla’s BREAKTHROUGH DOJO Supercomputer Hardware Explained

This should not be a surprise, as Intel ceded leadership to NVIDIA a decade ago, and further handoffs were inevitable. The computational frontier has shifted across many technology substrates over the past 120 years, most recently from the CPU to the GPU to ASICs optimized for neural networks (the majority of new compute cycles). The ASIC approach is being pursued by scores of new companies; Google TPUs have now been added to the chart by popular request (see note below for methodology), as has the Mythic analog M.2.

By taking on the mantle of Moore’s Law, Tesla is achieving something that no other automaker has achieved. I used the term “automaker” since Tesla is often referred to as such by the media, friends, family, and those who don’t really follow the company closely. Tesla started out as an automaker and that’s what people remember most about it: “a car for rich people,” one of my close friends told me. (She was shocked when I told her how much a Model 3 cost. She thought it was over $100K for the base model.)

Jurvetson's post is very technical, but it reflects the truth: Tesla has done something unique for the auto industry. Tesla has pushed forward an industry that was outdated and challenged the legacy OEMs to evolve. This is a hard thing for them to do, as there hasn't been any new revolutionary technology introduced to this industry since Henry Ford moved humanity from the horse and buggy to automobiles.

Sure, over the years, vehicle designs changed along with pricing, specs, and other details, but until Tesla, none of these changes affected the industry as a whole. None of these changes made the industry so uncomfortable that it laughed at the idea before later getting scared of being left behind. The only company to have done this is Tesla, and now new companies are trying to be the next Tesla or create competing cars, doing whatever they can to keep up with Tesla's lead.

Teaching a Car to Drive Itself by Imitation and Imagination (Google I/O'19)

For the auto industry, Tesla represents a jump in evolution, and not many people understand this. I think most automakers have figured this out, though. Ford and VW especially.

Of all of the depictions of Moore’s Law, this is the one I find to be most useful, as it captures what customers actually value — computation per $ spent (note: on a log scale, so a straight line is an exponential; each y-axis tick is 100x).

Humanity’s capacity to compute has compounded for as long as we can measure it, exogenous to the economy, and starting long before Intel co-founder Gordon Moore noticed a refraction of the longer-term trend in the belly of the fledgling semiconductor industry in 1965.

Why the transition within the integrated circuit era? Intel lost to NVIDIA for neural networks because the fine-grained parallel compute architecture of a GPU maps better to the needs of deep learning. There is a poetic beauty to the computational similarity of a processor optimized for graphics processing and the computational needs of a sensory cortex, as commonly seen in neural networks today. A custom chip (like the Tesla D1 ASIC) optimized for neural networks extends that trend to its inevitable future in the digital domain. Further advances are possible in analog in-memory compute, an even closer biomimicry of the human cortex. The best business planning assumption is that Moore’s Law, as depicted here, will continue for the next 20 years as it has for the past 120.

For those unfamiliar with this chart, here is a more detailed description:

Moore's Law is both a prediction and an abstraction

Moore's Law is commonly reported as a doubling of transistor density every 18 months. But this is not something the co-founder of Intel, Gordon Moore, has ever said. It is a nice blending of his two predictions; in 1965, he predicted an annual doubling of transistor counts in the most cost-effective chip and revised it in 1975 to every 24 months. With a little hand-waving, most reports attribute 18 months to Moore's Law, but there is quite a bit of variability. The popular perception of Moore's Law is that computer chips are compounding in their complexity at near-constant per-unit cost. This is one of the many abstractions of Moore's Law, and it relates to the compounding of transistor density in two dimensions. Others relate to speed (the signals have less distance to travel) and computational power (speed x density).
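The difference between those doubling periods compounds quickly; a short, illustrative Python calculation makes the gap concrete:

    def growth(years, doubling_months):
        """Growth factor after `years` at one doubling per `doubling_months`."""
        return 2 ** (years * 12 / doubling_months)

    for months in (12, 18, 24):   # Moore 1965, the popular "18 months", Moore 1975
        print(f"doubling every {months} months -> ~{growth(10, months):,.0f}x in a decade")
    # 12 months -> ~1,024x; 18 months -> ~102x; 24 months -> ~32x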

Unless you work for a chip company and focus on fab-yield optimization, you do not care about transistor counts. Integrated circuit customers do not buy transistors. Consumers of technology purchase computational speed and data storage density. When recast in these terms, Moore’s Law is no longer a transistor-centric metric, and this abstraction allows for longer-term analysis.

Tesla’s MIND BLOWING Dojo AI Chip (changes everything)

What Moore observed in the belly of the early IC industry was a derivative metric, a refracted signal, from a longer-term trend, a trend that begs various philosophical questions and predicts mind-bending futures.

Ray Kurzweil’s abstraction of Moore’s Law shows computational power on a logarithmic scale, and finds a double exponential curve that holds over 120 years! A straight line would represent a geometrically compounding curve of progress. 

Through five paradigm shifts – such as electro-mechanical calculators and vacuum tube computers – the computational power that $1000 buys has doubled every two years. For the past 35 years, it has been doubling every year. 

Each dot is the frontier of computational price performance of the day. One machine was used in the 1890 Census; one cracked the Nazi Enigma cipher in World War II; one predicted Eisenhower’s win in the 1956 Presidential election. Many of them can be seen in the Computer History Museum. 

Each dot represents a human drama. Prior to Moore’s first paper in 1965, none of them even knew they were on a predictive curve. Each dot represents an attempt to build the best computer with the tools of the day. Of course, we use these computers to make better design software and manufacturing control algorithms. And so the progress continues.

Notice that the pace of innovation is exogenous to the economy. The Great Depression and the World Wars and various recessions do not introduce a meaningful change in the long-term trajectory of Moore's Law. Certainly, the adoption rates, revenue, profits and economic fates of the computer companies behind the various dots on the graph may go through wild oscillations, but the long-term trend emerges nevertheless.

Any one technology, such as the CMOS transistor, follows an elongated S-shaped curve of slow progress during initial development, upward progress during a rapid adoption phase, and then slower growth from market saturation over time. But a more generalized capability, such as computation, storage, or bandwidth, tends to follow a pure exponential – bridging across a variety of technologies and their cascade of S-curves.

In the modern era of accelerating change in the tech industry, it is hard to find even five-year trends with any predictive value, let alone trends that span the centuries. I would go further and assert that this is the most important graph ever conceived.

Why is this the most important graph in human history?

A large and growing set of industries depends on continued exponential cost declines in computational power and storage density. Moore's Law drives electronics, communications and computers and has become a primary driver in drug discovery, biotech and bioinformatics, medical imaging and diagnostics. As Moore's Law crosses critical thresholds, a formerly lab science of trial-and-error experimentation becomes a simulation science, and the pace of progress accelerates dramatically, creating opportunities for new entrants in new industries. Boeing used to rely on wind tunnels to test novel aircraft design performance. Ever since CFD modeling became powerful enough, design has moved to the rapid pace of iterative simulations, and the nearby wind tunnels of NASA Ames lie fallow. The engineer can iterate at a rapid rate while simply sitting at their desk.

Tesla unveils "Dojo" Computer Chip | Tesla AI Day 

Every industry on our planet is going to become an information business. Consider agriculture. If you ask a farmer in 20 years’ time about how they compete, it will depend on how they use information, from satellite imagery driving robotic field optimization to the code in their seeds. It will have nothing to do with workmanship or labor. That will eventually percolate through every industry as IT innervates the economy.

Non-linear shifts in the marketplace are also essential for entrepreneurship and meaningful change. Technology’s exponential pace of progress has been the primary juggernaut of perpetual market disruption, spawning wave after wave of opportunities for new companies. Without disruption, entrepreneurs would not exist.

Moore’s Law is not just exogenous to the economy; it is why we have economic growth and an accelerating pace of progress. At Future Ventures, we see that in the growing diversity and global impact of the entrepreneurial ideas that we see each year. The industries impacted by the current wave of tech entrepreneurs are more diverse, and an order of magnitude larger than those of the 90’s — from automobiles and aerospace to energy and chemicals.

At the cutting edge of computational capture is biology; we are actively reengineering the information systems of biology and creating synthetic microbes whose DNA is manufactured from bare computer code and an organic chemistry printer. But what to build? So far, we largely copy large tracts of code from nature. But the question spans across all the complex systems that we might wish to build, from cities to designer microbes, to computer intelligence.

Reengineering engineering

As these systems transcend human comprehension, we will shift from traditional engineering to evolutionary algorithms and iterative learning algorithms like deep learning and machine learning. As we design for evolvability, the locus of learning shifts from the artifacts themselves to the process that created them. There is no mathematical shortcut for the decomposition of a neural network or genetic program, no way to "reverse evolve" with the ease that we can reverse engineer the artifacts of purposeful design. The beauty of compounding iterative algorithms (evolution, fractals, organic growth, art) derives from their irreducibility. And it empowers us to design complex systems that exceed human understanding.

Tesla AI Day

Why does progress perpetually accelerate?

All new technologies are combinations of technologies that already exist. Innovation does not occur in a vacuum; it is a combination of ideas from before. In any academic field, the advances today are built on a large edifice of history. This is why major innovations tend to be 'ripe' and tend to be discovered at nearly the same time by multiple people. The compounding of ideas is the foundation of progress, something that was not so evident to the casual observer before the age of science. Science tuned the process parameters for innovation, and became the best method for a culture to learn.

From this conceptual base come the origins of economic growth and accelerating technological change, as the combinatorial explosion of possible idea pairings grows exponentially as new ideas come into the mix (on the order of 2^n possible groupings, per Reed’s Law). It explains the innovative power of urbanization and networked globalization. And it explains why interdisciplinary ideas are so powerfully disruptive; it is like the differential immunity of epidemiology, whereby islands of cognitive isolation (e.g., academic disciplines) are vulnerable to disruptive memes hopping across, much as South America was to smallpox from Cortés and the Conquistadors. If disruption is what you seek, cognitive island-hopping is a good place to start, mining the interstices between academic disciplines.
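
As a toy illustration of that combinatorial growth (my own sketch, not from the original essay), assuming each non-empty grouping of n existing ideas is a candidate for a new combination:

    # Toy illustration: the number of possible groupings of n ideas (all non-empty
    # subsets), the ~2^n order of magnitude cited above per Reed's Law.
    for n in (5, 10, 20, 40):
        print(n, 2**n - 1)
    # 5 -> 31, 10 -> 1,023, 20 -> 1,048,575, 40 -> ~1.1 trillion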

Predicting cut-ins (Andrej Karpathy)

It is the combinatorial explosion of possible innovation-pairings that creates economic growth, and it’s about to go into overdrive. In recent years, we have begun to see the global innovation effects of a new factor: the internet. People can exchange ideas like never before. Long ago, people did not communicate across continents; ideas were partitioned, and so the success of nations and regions pivoted on their own innovations. Richard Dawkins argues that in biology it is genes that really matter, and that we as people are just vessels for the conveyance of genes. It is the same with ideas, or “memes”. We are the vessels that hold and communicate ideas, and now that pool of ideas percolates on a global basis more rapidly than ever before.

In the next six years, three billion minds will come online for the first time to join this global conversation (via inexpensive smartphones in the developing world). This rapid influx of three billion people into the global economy is unprecedented in human history, and so too will be the pace of idea-pairings and progress.

We live in interesting times, at the cusp of the frontiers of the unknown and breathtaking advances. But, it should always feel that way, engendering a perpetual sense of future shock.


Is the ‘D1’ AI chip speeding Tesla towards full autonomy?

The company has designed a super powerful and efficient chip for self-driving, but it can be used for many other things

Tesla, on its AI Day, unveiled a custom chip for training artificial intelligence networks in data centers

The D1 chip, part of Tesla’s Dojo supercomputer system, uses a 7 nm manufacturing process and delivers 362 teraflops of processing power

The chips can help train models to recognize items from camera feeds inside Tesla vehicles

Will the just-announced Tesla Bot make future working optional for humans - or obsolete?

Elon Musk says Tesla robot will make physical work a ‘choice’

Back at Tesla’s 2019 Autonomy Day, CEO Elon Musk unveiled the company’s first custom artificial intelligence (AI) chip, which promised to propel the company toward its goal of full autonomy. The automaker started producing cars with its custom AI hardware within the same year. This year, as the world grapples with a chip shortage, the company presented its in-house D1 chip, the processor that will power its Dojo supercomputer.

Tesla's Dojo Supercomputer, Full Analysis (Part 1/2)

Tesla's Dojo Supercomputer, Full Analysis (Part 2/2)


The D1 is the second semiconductor designed internally by Tesla, following the in-car supercomputer chip released in 2019. According to Tesla, each D1 packs 362 teraflops (TFLOPS) of processing power, meaning it can perform 362 trillion floating-point operations per second.

Tesla combines 25 chips into a training tile and links 120 training tiles together across several server cabinets. In simple terms, each training tile clocks in at roughly 9 petaflops, meaning Dojo should boast over 1 exaflop of computing power. By that measure, Dojo could be among the most powerful AI training machines in the world.
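
For readers who want to sanity-check those figures, here is a minimal back-of-the-envelope sketch using only the numbers quoted above (these are BF16/CFP8 peak figures, not measured training throughput):

    # Dojo arithmetic from the figures quoted in the text above.
    D1_TFLOPS = 362          # peak per D1 chip
    CHIPS_PER_TILE = 25
    TILES = 120

    tile_pflops = D1_TFLOPS * CHIPS_PER_TILE / 1_000    # ~9.05 PFLOPS per training tile
    total_eflops = tile_pflops * TILES / 1_000          # ~1.09 EFLOPS for the full system
    print(f"{tile_pflops:.2f} PFLOPS per tile, {total_eflops:.2f} EFLOPS total")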

The company believes that AI has vast possibilities and that its systems are becoming more capable than an average human at the tasks they are trained on. To speed up its AI workloads, Tesla announced the D1 Dojo, a custom application-specific integrated circuit (ASIC) for AI training, presented during this year’s AI Day held last week.

Although many companies, including tech giants like Amazon, Baidu, Intel and NVIDIA, are building ASICs for AI workloads, no single design fits every workload perfectly. Experts reckon this is why Tesla opted to develop its own ASIC for AI training.

Tesla and its foray into AI

The D1 forms part of the Dojo supercomputer used to train AI models at Tesla’s headquarters. The chip is manufactured by Taiwan’s TSMC on a 7 nm node, reportedly packs over 50 billion transistors, and has a large die size of 645 mm².

AI training requires two things: massive amounts of data, and a powerful supercomputer that can use that data to train deep neural nets. With over one million Autopilot-enabled EVs on the road, Tesla already has a vast dataset edge over other automakers. Now, with the introduction of an exascale supercomputer that management says will be operational next year, Tesla has reinforced that advantage.

All this work comes two years after Tesla began producing vehicles containing AI chips it built in-house. Those chips help the car’s onboard software make decisions very quickly in response to what’s happening on the road. This time, Musk noted that the latest supercomputer technology can be used for many other things and that Tesla is willing to open it up to other automakers and tech companies that are interested.


“At first, it seems improbable. How could it be that Tesla, which has never designed a chip before, would design the best chip in the world? But that is objectively what has occurred. Not best by a small margin, best by a huge margin. It’s in the cars right now,” Musk said. With that, his newest big prediction is that Tesla will have self-driving cars on the road next year, without humans inside, operating in a so-called robo-taxi fleet.


Tesla introduced the D1, a new chip designed specifically for artificial intelligence that is capable of delivering 362 TFLOPS of BF16/CFP8 compute. This was announced at Tesla’s recent AI Day event.

The Tesla D1 contains a total of 354 training nodes that form a network of functional units, interconnected to create a massive chip. Each functional unit comes with a quad-core, 64-bit ISA CPU with a specialized, custom design for transpositions, gathers, broadcasts, and link traversal. The CPU adopts a superscalar implementation (4-wide scalar and 2-wide vector pipelines).

This new Tesla silicon is manufactured on a 7 nm process, has a total of 50 billion transistors, and occupies an area of 645 mm², making it smaller than the GA100 GPU used in the NVIDIA A100 accelerator, which measures 826 mm².

Each functional unit has 1.25 MB of SRAM and 512 GB/s of bandwidth in each direction on the on-chip network. The D1 chips are in turn joined in multichip configurations of 25 units per training tile, and the tiles connect to host systems through what Tesla calls “Dojo Interface Processors” (DIPs).
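
Those per-node numbers also imply a chip-level total; a quick derivation from the figures above (my arithmetic, not an official specification):

    # Aggregate on-chip SRAM implied by 354 functional units x 1.25 MB each.
    NODES_PER_D1 = 354
    SRAM_PER_NODE_MB = 1.25
    print(NODES_PER_D1 * SRAM_PER_NODE_MB, "MB of SRAM per D1 die")   # 442.5 MB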



Tesla claims its Dojo chip will process computer vision data four times faster than existing systems, helping the company bring its self-driving system to full autonomy. However, the two most difficult technological feats, the tile-to-tile interconnect and the software stack, have not yet been demonstrated. Each tile has more external bandwidth than the highest-end networking switches, and to achieve this Tesla developed custom interconnects. Tesla says the first Dojo cluster will be running by next year.

The same technology that undergirds Tesla’s cars will drive the forthcoming Tesla Bot, which is intended to perform mundane tasks like grocery shopping or assembly-line work. Its design spec calls for 45-pound carrying capacity, “human-level hands,” and a top speed of 5 miles per hour (so humans can outrun it).

IBM’s Telum processor is another newly announced AI chip and a competitor to the Tesla D1. Telum is IBM’s first commercial processor with on-chip acceleration for AI inference, which IBM says lets clients run deep-learning inference at scale and at high speed.

IBM positions Telum for fraud detection during the early stages of transaction processing, while Tesla’s Dojo is aimed mainly at computer vision for camera-based self-driving. Where Telum is a conventionally packaged processor, Dojo goes against industry norms: its chips are designed to connect to one another directly, without any “glue” in between.

The most powerful supercomputer in the world, Fugaku, lives at the RIKEN Center for Computational Science in Japan. At its tested limit it is capable of 442,010 teraflops, and theoretically it could reach 537,212 teraflops. Dojo, Tesla said, could end up breaking the exaflop barrier, something that no supercomputing company, university or government has yet managed.
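
To put those numbers in the same units, here is a small conversion sketch; note that Fugaku’s figures are double-precision (FP64) LINPACK results, while Tesla’s exaflop target refers to the lower-precision BF16/CFP8 formats quoted earlier, so the two are not directly comparable:

    # Unit conversion only; the figures come from the surrounding text and use
    # different number formats (FP64 for Fugaku, BF16/CFP8 for Dojo).
    FUGAKU_RMAX_TFLOPS = 442_010      # measured
    FUGAKU_RPEAK_TFLOPS = 537_212     # theoretical peak
    DOJO_TARGET_EFLOPS = 1.09         # derived earlier from 120 tiles x ~9.05 PFLOPS

    print(f"Fugaku measured: {FUGAKU_RMAX_TFLOPS / 1_000_000:.3f} EFLOPS (FP64)")
    print(f"Fugaku peak:     {FUGAKU_RPEAK_TFLOPS / 1_000_000:.3f} EFLOPS (FP64)")
    print(f"Dojo target:     {DOJO_TARGET_EFLOPS:.2f} EFLOPS (BF16/CFP8)")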

Tesla unveils "Dojo" Computer Chip | Tesla AI Day

Dojo is made up of a mere 10 cabinets, making it also among the most compact supercomputers in the world in physical size. Fugaku, by contrast, occupies 256 cabinets. If Tesla were to add 54 cabinets to Dojo V1, for a total of 64 cabinets, Dojo would surpass Fugaku in raw compute performance.

All along, Tesla has seemed positioned to gain an edge in artificial intelligence. Elon Musk’s Neuralink, along with SpaceX and The Boring Company, are held separately from Tesla, but some cross-pollination among the companies certainly occurs. So, at the Tesla AI event last month, when the company announced it would be designing its own silicon chips, Tesla’s advantage seemed clearer than ever.

The AI event culminated with a dancing human posing as a humanoid robot, previewing the Tesla Bot the company intends to build. But the more immediate and important reveal was the custom AI chip “D1,” which would be used for training the machine-learning algorithm behind Tesla’s Autopilot self-driving system. Tesla has a keen focus on this technology, with a single giant neural network known as a “transformer” receiving input from 8 cameras at once.

“We are effectively building a synthetic animal from the ground up,” Tesla’s AI chief, Andrej Karpathy, said during the August, 2021 event. “The car can be thought of as an animal. It moves around autonomously, senses the environment, and acts autonomously.”

CleanTechnica‘s Johnna Crider, who attended the AI event, shared that, “At the very beginning of the event, Tesla CEO Musk said that Tesla is much more than an electric car company, and that it has ‘deep AI activity in hardware on the inference level and on the training level.’” She concluded that, “by unveiling the Dojo supercomputer plans and getting into the details of how it is solving computer vision problems, Tesla showed the world another side to its identity.”

Tesla’s Foray into Silicon Chips

Tesla is the latest nontraditional chipmaker, as described in a recent Wired analysis. Intel Corporation is the world’s largest semiconductor chip maker, based on its 2020 sales. It is the inventor of the x86 series of microprocessors found in most personal computers today. Yet, as AI gains prominence and silicon chips become essential ingredients in technology-integrated manufacturing, many others, including Google, Amazon, and Microsoft, are now designing their own chips.

Tesla FSD chip explained! Tesla vs Nvidia vs Intel chips

For Tesla, the key to silicon chip success will be deriving optimal performance out of the computer system used to train the company’s neural network. “If it takes a couple of days for a model to train versus a couple of hours,” CEO Elon Musk said at the AI event, “it’s a big deal.”

Initially, Tesla relied on Nvidia hardware for the computing in its cars. That changed in 2019, when Tesla moved chip design in-house to build the processors that interpret sensor input in its vehicles. However, building the chips needed to train AI algorithms, moving the creative process from vision to execution, is a far more sophisticated, costly, and demanding endeavor.

The D1 chip, part of Tesla’s Dojo supercomputer system, uses a 7-nanometer manufacturing process, with 362 teraflops of processing power, said Ganesh Venkataramanan, senior director of Autopilot hardware. Tesla places 25 of these chips onto a single “training tile,” and 120 of these tiles come together across several server cabinets, amounting to over an exaflop of power. “We are assembling our first cabinets pretty soon,” Venkataramanan disclosed.

CleanTechnica‘s Chanan Bos deconstructed the D1 chip in an intricate series of articles (in case you missed them) and noted that, per its specifications, the D1 boasts 50 billion transistors. Among processors, that beats the current record held by AMD’s Epyc Rome chip, at 39.54 billion transistors.


Tesla says on its website that the company believes “that an approach based on advanced AI for vision and planning, supported by efficient use of inference hardware, is the only way to achieve a general solution for full self-driving and beyond.” To do so, the company will:

  • Build silicon chips that power the full self-driving software from the ground up, taking every small architectural and micro-architectural improvement into account while pushing hard to squeeze maximum silicon performance-per-watt;
  • Perform floor-planning, timing, and power analyses on the design;
  • Write robust, randomized tests and scoreboards to verify functionality and performance;
  • Implement compilers and drivers to program and communicate with the chip, with a strong focus on performance optimization and power savings; and,
  • Validate the silicon chip and bring it to mass production.

“We should have Dojo operational next year,” CEO Elon Musk affirmed.

Keynote - Andrej Karpathy, Tesla


The Tesla Neural Network & Data Training

Tesla’s approach to full self-driving is grounded in its neural network. Most companies developing self-driving technology look to lidar, an acronym for “Light Detection and Ranging.” It is a remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to objects. These light pulses are combined with other recorded data to generate precise, three-dimensional information about the shape and surface characteristics of the surrounding environment.

PyTorch at Tesla - Andrej Karpathy, Tesla

Tesla, however, rejected lidar, partly due to its high cost and the amount of hardware required per vehicle. Instead, it interprets scenes by using its neural network algorithms to dissect input from cameras and radar. Chris Gerdes, director of the Center for Automotive Research at Stanford, says this approach is “computationally formidable. The algorithm has to reconstruct a map of its surroundings from the camera feeds rather than relying on sensors that can capture that picture directly.”

Tesla explains on its website the protocols it has embraced to develop its neural networks (a rough illustrative sketch of the multi-camera, multi-head structure follows the list):

  • Apply cutting-edge research to train deep neural networks on problems ranging from perception to control;
  • Per-camera networks analyze raw images to perform semantic segmentation, object detection, and monocular depth estimation;
  • Bird’s-eye-view networks take video from all cameras to output the road layout, static infrastructure, and 3D objects directly in the top-down view;
  • Networks learn from the most complicated and diverse scenarios in the world, iteratively sourced from a fleet of nearly 1M vehicles in real time; and,
  • A full build of Autopilot neural networks involves 48 networks that take 70,000 GPU hours to train and, together, output 1,000 distinct tensors (predictions) at each timestep.
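
To make that last point concrete, that many networks share work and emit many output tensors per timestep, here is a minimal, self-contained PyTorch sketch of a shared-backbone, multi-head model that consumes all camera streams at once. It is an illustrative toy under stated assumptions, not Tesla’s actual architecture: the backbone, head names, and output sizes are invented placeholders.

    # A minimal, illustrative PyTorch sketch of a shared-backbone, multi-head network.
    # This is NOT Tesla's architecture; layer sizes, head names, and output shapes are
    # placeholder assumptions chosen only to make the structure runnable end to end.
    import torch
    import torch.nn as nn

    NUM_CAMERAS = 8  # Tesla vehicles stream video from 8 cameras


    class TinyBackbone(nn.Module):
        """Shared per-camera feature extractor (a stand-in for a real CNN backbone)."""

        def __init__(self, out_channels: int = 64):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((8, 8)),
            )

        def forward(self, x):
            return self.features(x)


    class MultiCameraMultiHead(nn.Module):
        """Fuses per-camera features and emits several task-specific output tensors."""

        def __init__(self, feat_channels: int = 64):
            super().__init__()
            self.backbone = TinyBackbone(feat_channels)
            fused_dim = NUM_CAMERAS * feat_channels * 8 * 8
            self.fuse = nn.Sequential(nn.Linear(fused_dim, 256), nn.ReLU())
            # Placeholder heads loosely mirroring the task list above.
            self.heads = nn.ModuleDict({
                "semantic_segmentation": nn.Linear(256, 20 * 16 * 16),  # coarse class map
                "object_detection": nn.Linear(256, 10 * 5),             # 10 boxes x (4 coords + score)
                "monocular_depth": nn.Linear(256, 16 * 16),             # coarse depth grid
                "road_layout_bev": nn.Linear(256, 32 * 32),             # top-down layout grid
            })

        def forward(self, cameras):
            # cameras: (batch, NUM_CAMERAS, 3, height, width)
            b, n, c, h, w = cameras.shape
            feats = self.backbone(cameras.reshape(b * n, c, h, w))  # weights shared across cameras
            fused = self.fuse(feats.reshape(b, -1))                 # concatenate camera features
            return {name: head(fused) for name, head in self.heads.items()}


    if __name__ == "__main__":
        model = MultiCameraMultiHead()
        frames = torch.randn(2, NUM_CAMERAS, 3, 128, 256)  # a fake batch of 8-camera frames
        for name, tensor in model(frames).items():
            print(name, tuple(tensor.shape))

The design point it illustrates is the one in the list above: one feature extractor shared across cameras, with several lightweight task heads producing distinct output tensors at each timestep.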

Training Teslas via Videofeeds

Tesla gathers more training data than other car companies. Each of the more than 1 million Teslas on the road sends the video feeds from its 8 cameras back to the company. The Hardware 3 onboard computer processes more than 40 times as much data as Tesla’s previous-generation system. The company employs 1,000 people who label those images, noting cars, trucks, traffic signs, lane markings, and other features, to help train the large transformer.


At the August event, Tesla also said it can automatically select which images to prioritize in labeling to make the process more efficient. This is one of the many pieces that sets Tesla apart from its competitors.

Conclusion

Tesla has an advantage over Waymo (and other competitors) in three key areas thanks to its large consumer fleet (roughly 500,000 vehicles when this analysis was first made, and now over a million):

  • Computer vision
  • Prediction
  • Path planning/driving policy

Concerns about collecting the right data, paying people to label it, or paying for bandwidth and storage don’t obviate these advantages. These concerns are addressed by designing good triggers, using data that doesn’t need human labelling, and using abstracted representations (replays) instead of raw video.
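
As a concrete, purely hypothetical illustration of what such a “trigger” might look like, here is a small sketch; the field names, conditions, and thresholds are invented for this example and are not Tesla’s actual criteria:

    # A purely hypothetical sketch of an on-vehicle data "trigger": a cheap predicate
    # that decides whether a clip is worth uploading for training. Field names,
    # conditions, and thresholds are invented for illustration, not Tesla's criteria.
    from dataclasses import dataclass


    @dataclass
    class ClipSummary:
        driver_intervened: bool     # the human overrode the driver-assistance system
        planner_confidence: float   # 0..1, the planner's own confidence estimate
        rare_object_score: float    # 0..1, "have we seen something like this before?"


    def should_upload(clip: ClipSummary,
                      confidence_floor: float = 0.6,
                      rarity_threshold: float = 0.8) -> bool:
        """Return True when a clip likely contains a valuable edge case."""
        if clip.driver_intervened:
            return True                                   # human/model disagreement
        if clip.planner_confidence < confidence_floor:
            return True                                   # the model itself was unsure
        return clip.rare_object_score > rarity_threshold  # likely long-tail object


    if __name__ == "__main__":
        print(should_upload(ClipSummary(False, 0.9, 0.95)))  # True: rare object
        print(should_upload(ClipSummary(False, 0.9, 0.10)))  # False: routine clip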

The majority view among business analysts, journalists, and the general public appears to be that Waymo is far in the lead with autonomous driving, and Tesla isn’t close. This view doesn’t make sense when you look at the first principles of neural networks.

Wafer-Scale Hardware for ML and Beyond

What’s more, AlphaStar is a proof of concept of large-scale imitation learning for complex tasks. If you are skeptical that Tesla’s approach is the right one, or that path planning/driving policy is a tractable problem, you have to explain why imitation learning worked for StarCraft but won’t work for driving.

I predict that – barring a radical move by Waymo to increase the size of its fleet – in the next 1-3 years, the view that Waymo is far in the lead and Tesla is far behind will be widely abandoned. People have been focusing too much on demos that don’t inform us about system robustness, deeply limited disengagement metrics, and Google/Waymo’s access to top machine learning engineers and researchers. They have been focusing too little on training data, particularly for rare objects and behaviours where Waymo doesn’t have enough data to do machine learning well, or at all.

Wafer-scale AI for science and HPC (Cerebras)

Simulation isn’t an advantage for Waymo because Tesla (like all autonomous vehicle companies) also uses simulation. More importantly, a simulation can’t generate rare objects and rare behaviours that the simulation’s creators can’t anticipate or don’t know how to model accurately.

Pure reinforcement learning didn’t work for AlphaStar because the action space of StarCraft is too large for random exploration to hit upon good strategies. So, DeepMind had to bootstrap with imitation learning. This shows a weakness in the supposition that, as with AlphaGo Zero, pure simulated experience will solve any problem. Especially when it comes to a problem like driving where anticipating the behaviour of humans is a key component. Anticipating human behaviour requires empirical information about the real world.

Compiler Construction for Hardware Acceleration: Challenges and Opportunities

Observers of the autonomous vehicles space may be underestimating Tesla’s ability to attract top machine learning talent. A survey of tech workers found that Tesla is the 2nd most sought-after company in the Bay Area, one rank behind Google. It also found Tesla is the 4th most sought-after company globally, two ranks behind Google at 2nd place. (Shopify is in 3rd place globally, and SpaceX is in 1st.) It also bears noting that fundamental advances in machine learning are often shared openly by academia, OpenAI, and corporate labs at Google, Facebook, and DeepMind. The difference between what Tesla can do and what Waymo can do may not be that big.

2020 LLVM in HPC Workshop: Keynote: MLIR: an Agile Infrastructure for Building a Compiler Ecosystem

The big difference between the two companies is data. As Tesla’s fleet grows to 1 million vehicles, its monthly mileage will be about 1 billion miles, 1000x more than Waymo’s monthly rate of about 1 million miles. What that 1000x difference implies for Tesla is superior detection for rare objects, superior prediction for rare behaviours, and superior path planning/driving policy for rare situations. The self-driving challenge is more about handling the 0.001% of miles that contain rare edge cases than the 99.999% of miles that are unremarkable. So, it stands to reason that the company that can collect a large number of training examples from this 0.001% of miles will do better than the companies that can’t.
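
The arithmetic behind that 1000x figure is straightforward; a quick sketch using the round numbers quoted above (the per-vehicle mileage it implies is an assumption, not a reported figure):

    # Implied by the fleet-mileage comparison above.
    TESLA_FLEET_VEHICLES = 1_000_000
    TESLA_MILES_PER_MONTH = 1_000_000_000
    WAYMO_MILES_PER_MONTH = 1_000_000

    print(TESLA_MILES_PER_MONTH / TESLA_FLEET_VEHICLES, "miles per Tesla per month")   # ~1,000
    print(TESLA_MILES_PER_MONTH / WAYMO_MILES_PER_MONTH, "x Waymo's monthly mileage")  # 1000x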

More Information:

https://www.datacenterdynamics.com/en/news/tesla-detail-pre-dojo-supercomputer-could-be-up-to-80-petaflops/

https://www.allaboutcircuits.com/news/a-circuit-level-assessment-teslas-proposed-supercomputer-dojo/

https://heartbeat.fritz.ai/computer-vision-at-tesla-cd5e88074376

https://towardsdatascience.com/teslas-deep-learning-at-scale-7eed85b235d3

https://www.autopilotreview.com/teslas-andrej-karpathy-details-autopilot-inner-workings/

https://phucnsp.github.io/blog/self-taught/2020/04/30/tesla-nn-in-production.html

https://asiliconvalleyinsider.com/2020/03/08/waymo-chauffeurnet-versus-telsa-hydranet/

https://www.infoworld.com/article/3597904/why-enterprises-are-turning-from-tensorflow-to-pytorch.html

https://cleantechnica.com/2021/08/22/teslas-dojo-supercomputer-breaks-all-established-industry-standards-cleantechnica-deep-dive-part-3/

https://semianalysis.com/the-tesla-dojo-chip-is-impressive-but-there-are-some-major-technical-issues/

https://www.tweaktown.com/news/81229/teslas-insane-new-dojo-d1-ai-chip-full-transcript-of-its-unveiling/index.html

https://www.inverse.com/innovation/tesla-full-self-driving-release-date-ai-day

https://videocardz.com/newz/tesla-d1-chip-features-50-billion-transistors-scales-up-to-1-1-exaflops-with-exapod

https://cleantechnica.com/2021/09/15/what-advantage-will-tesla-gain-by-making-its-own-silicon-chips/