In terms of the Computer Information Processing hardware used to perform AI computation and run models of different types of Neural Networks, there are essentially three primary approaches in use today.
The approach of using CPUs for AI processing, and nearly all other types of computing, has been the main method for at least the past 40 years, but CPUs are not really used for large scale AI processing work today, except, that is, to manage the operation of other types of AI processors. CPUs are simply not well suited to AI processing, and the following will help explain why.
Today there are at least several thousand different types of CPUs available in the commercial market, and these are found in everything from mobile phones and basic desktop Computers to servers and the thousands of datacenters around the world.
Reference: [102] Inside a Google data center
Even most cars contain a few simple CPUs to keep them running. At the most extreme end, some CPUs are manufactured exclusively for Defense and National Security agencies; these are extremely specialized and their specifications are classified.
Essentially all typical CPUs have one or more processing cores, each containing what is called a serial Arithmetic Logic Unit (ALU) used to perform operations on data held in volatile memory or in permanent storage such as a disk. Each of these CPU cores operates in a purely serial and sequential way, which means it can only perform one operation at a time. This timing is based on a digitally controlled clock signal that precisely times and controls every component of the entire CPU, and this clock often ticks billions of times per second, a rate measured in gigahertz (GHz). This whole design is commonly referred to as a ‘von Neumann architecture’, after the genius Computer scientist who conceived this structure in the 1940s. CPU cores are also often cleverly structured so that they can overlap several calculations in what is known as an instruction pipeline, which provides each core with a rudimentary level of parallelism on each clock tick. Additionally, some CPUs have 2, 4, 8, 12, or even hundreds of cores, yet every single one of them processes information serially.
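To make the idea of serial processing concrete, here is a minimal Python sketch (purely illustrative, not modelled on any particular CPU) of a single core stepping through a calculation one multiply-and-add at a time, much as a lone ALU works through its instructions tick by tick:

```python
# Illustrative sketch: one serial "core" computing a dot product step by step.
# Each loop iteration stands in for a single ALU operation on a single clock tick.

def serial_dot_product(a, b):
    total = 0.0
    for i in range(len(a)):       # one pair of numbers per "clock tick"
        total += a[i] * b[i]      # a single multiply-accumulate operation
    return total

x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.5, 0.5, 0.5]
print(serial_dot_product(x, w))   # 5.0
```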
The issue with performing the calculations needed to simulate Neural Networks for AI processing of information is that these require massively parallel calculations to work effectively. That is, billions, trillions, and more calculations, at exactly the same time. To put this in perspective, each Human Brain is estimated to perform 20 quadrillion calculations each second [34], a rate that totally dwarfs the processing rate of any single CPU in the world today.
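To put some rough numbers on that gap, the back-of-the-envelope Python below compares the 20 quadrillion calculations per second figure cited above with a hypothetical modern CPU; the core count, clock rate, and operations-per-cycle figures are illustrative assumptions rather than the specifications of any real product.

```python
# Back-of-the-envelope comparison; all CPU figures are illustrative assumptions.
brain_ops_per_sec = 20e15            # 20 quadrillion calculations per second [34]

cores = 32                           # hypothetical high-end CPU core count
clock_hz = 4e9                       # assumed 4 GHz clock
ops_per_core_per_cycle = 8           # assumed SIMD width (operations per clock tick)

cpu_ops_per_sec = cores * clock_hz * ops_per_core_per_cycle
print(f"CPU estimate:       {cpu_ops_per_sec:.2e} ops/sec")
print(f"Brain-to-CPU ratio: {brain_ops_per_sec / cpu_ops_per_sec:,.0f}x")
```

Even under these generous assumptions, a single CPU comes up short by roughly four orders of magnitude, which is why the gap has to be closed with massive parallelism rather than with faster serial clocks.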
So, the only real way to perform AI processing using CPUs is to use more and more of them, in racks and racks of Computers housed in datacenters. Unfortunately, this becomes ridiculously expensive very quickly because of the number of CPUs required and the total power consumption needed to keep them all running. The bottom line is that CPUs are really not very good for extremely large scale AI processing. Not very good at all.
This limitation of CPUs to serial processing is precisely why many Computers, and particularly those used for Computer gaming, use something called a GPU.
CPUs are built using specially designed layers of semiconductor materials and metals formed into billions of tiny digital switches called transistors. One of the biggest goals of all CPU manufacturers is to reduce the size of these transistors.
This size refers to the physical dimensions of the individual transistors constructed on a CPU chip, and it is typically measured in nanometers (nm). Over time, advancements in semiconductor manufacturing technology by companies such as Intel Corporation and TSMC have enabled the production of progressively smaller transistors. Smaller transistors allow more transistors to be packed onto a single CPU chip, which leads to increased transistor density. Smaller transistors also require less time for an electrical signal to travel through them, resulting in faster overall processing speed. This reduced delay allows for higher clock speeds, enabling the CPU to execute instructions more rapidly.
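For a rough feel of why distances and delays matter at these speeds, the short Python sketch below works out how far a signal can travel in one clock cycle; the 3 GHz clock and the two-thirds-of-light-speed signal velocity are illustrative assumptions.

```python
# Rough illustration of why shorter distances matter at gigahertz clock rates.
clock_hz = 3e9                 # assumed 3 GHz clock
signal_speed_m_per_s = 2e8     # assumed ~2/3 the speed of light in on-chip interconnects

clock_period_s = 1 / clock_hz
travel_per_cycle_cm = signal_speed_m_per_s * clock_period_s * 100
print(f"Clock period:            {clock_period_s * 1e9:.2f} ns")   # ~0.33 ns
print(f"Signal travel per cycle: ~{travel_per_cycle_cm:.1f} cm")   # ~6.7 cm
```

In other words, within a single tick of a 3 GHz clock a signal can only cover a few centimetres, so shrinking the distances inside the chip directly buys speed.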
Additionally, smaller transistors and higher densities contribute to improved power efficiency. Smaller transistors require less power to switch on and off, resulting in reduced total energy consumption. This energy efficiency allows for higher clock frequencies without excessive heat generation, which can limit the maximum operating speed of a CPU.
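A standard first-order way to see this is the classic CMOS dynamic power relationship, in which switching power scales with capacitance, the square of the supply voltage, and the clock frequency (P ≈ C·V²·f). The sketch below plugs in purely illustrative numbers to show how a shrink that lowers capacitance and voltage cuts power at the same clock rate.

```python
# First-order CMOS dynamic power model: P ≈ C * V^2 * f (all numbers illustrative only).
def dynamic_power_watts(capacitance_f, voltage_v, frequency_hz):
    return capacitance_f * voltage_v ** 2 * frequency_hz

baseline = dynamic_power_watts(1.0e-9, 1.2, 3e9)   # older, larger transistors
shrunk   = dynamic_power_watts(0.5e-9, 0.9, 3e9)   # smaller transistors: less capacitance, lower voltage
print(f"Relative power after the shrink: {shrunk / baseline:.0%}")   # ~28%
```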
Transistor size and density play a significant role in CPU speed, but they are not the sole determining factors. Other elements, such as the CPU's architecture, cache sizes, memory access times, and electrical propagation delays across interconnect technologies, also impact the overall performance of a CPU. Optimized software and algorithms can further enhance a CPU's total efficiency and speed, so both hardware and software affect CPU performance.
Most recently, a Belgian research organization called imec announced a breakthrough in size reduction by producing a 1nm transistor that could potentially be used in CPUs, GPUs and TPUs. In addition to increasing transistor density in the usual 2 dimensions, semiconductor chip designs are also becoming more sophisticated by adding a 3rd dimension, where multiple layers of interconnected transistors are stacked on top of each other as well.
Anastasi explains these technology advancements in the following video.
Reference: [130] New CPU Technology just Arrived - IBM, Imec, Intel - Anastasi In Tech
The GPU is a specialized type of processor designed to process large volumes of information in parallel, making it highly efficient for graphics-intensive applications that require extremely large numbers of matrix mathematics calculations. If you’ve ever played a modern Computer game on a desktop Computer or gaming console, then you’ve used a GPU of some type, and possibly didn’t even know it.
These matrix calculations can be performed either on simple whole numbers called ‘Integers’ or on numbers with fractional parts and much wider ranges called ‘Floating Point’ numbers, but either way, GPUs can perform matrix calculations amazingly fast; it’s just that Integers usually take less time to process than Floating Points. GPUs were originally developed to perform tasks like processing images, videos, and 3D graphics, with huge amounts of information being processed in parallel, at exactly the same time, on each digital clock pulse coming from a clocking circuit inside the GPU.
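To make the Integer versus Floating Point distinction concrete, the short sketch below runs the same small matrix multiplication in both number formats; numpy on a CPU is used purely for illustration, whereas a GPU would carry out the same arithmetic spread across thousands of cores at once.

```python
import numpy as np

# The same 2x2 matrix product computed in integer and floating point formats.
a_int = np.array([[1, 2], [3, 4]], dtype=np.int32)
b_int = np.array([[5, 6], [7, 8]], dtype=np.int32)
print(a_int @ b_int)     # exact whole-number result: [[19 22] [43 50]]

a_fp = a_int.astype(np.float32)
b_fp = b_int.astype(np.float32)
print(a_fp @ b_fp)       # the same values, carried in 32-bit floating point
```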
The main advantage of GPUs is their ability to perform parallel processing on large sets of information, which enables them to perform complex calculations and operations massively faster than CPUs. This is because GPUs have many more task-specific processing cores than most CPUs, allowing GPUs to process multiple sets of information simultaneously. As a result, GPUs are especially well-suited for tasks such as rendering 3D graphics in real time for Computer games, scientific computing and information visualization, and of course, AI information processing. In the Information Technology (IT) industry, processing very large volumes of information is typically referred to as High Performance Computing (HPC), and there are several major companies providing extremely powerful Computers for commercial and often classified applications. By contrast, CPUs are extremely flexible general-purpose processors that can perform information processing on a wide range of tasks, but this is done in serial, one operation after another, more sloowwwly.
The GPU's ability to perform specialized matrix operations and vector arithmetic is essentially what is needed for AI information processing. That is, building and using Neural Networks for AI really just requires lots of matrix mathematics operations performed in cleverly structured and sequenced ways. GPUs provide significant performance advantages over CPUs for a wide range of applications, making them among the most essential components in most of today’s AI systems.
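As a concrete illustration of that ‘just matrix mathematics’ point, here is a minimal sketch of a single fully connected Neural Network layer expressed as one matrix multiplication plus a bias and an activation; numpy on a CPU is used here for clarity, and the layer sizes are arbitrary example choices, but a GPU performs exactly the same operation spread across thousands of cores.

```python
import numpy as np

# One fully connected layer: outputs = activation(inputs . weights + bias)
rng = np.random.default_rng(0)
inputs  = rng.standard_normal((32, 128))   # a batch of 32 input vectors, 128 features each
weights = rng.standard_normal((128, 64))   # 128 inputs feeding 64 neurons
bias    = np.zeros(64)

outputs = np.maximum(inputs @ weights + bias, 0.0)   # ReLU activation
print(outputs.shape)                                  # (32, 64)
```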
GPUs also have specialized and extremely fast memory storage and retrieval abilities, such as VRAM (Video Random Access Memory), which is an optimized system for the fast and efficient processing of large arrays of information in parallel.
Arguably, by far the most advanced and successful GPU company in the world is NVIDIA, which is a technology company that has been designing, building, and supplying GPUs for Computer gaming, professional visualization, datacenters, and automotive markets for many years. NVIDIA is a market leader for HPC applications. NVIDIA was founded by Jensen Huang, Chris Malachowsky, and Curtis Priem in April 1993.
It would be fair to rank Jensen Huang among the most impressive company CEOs in the world due to his rare mix of thought leadership in both high technology and global business development. NVIDIA raised $20 million at its Initial Public Offering on the NASDAQ in January 1999 at US $12/share, and while its total investments are not publicly disclosed, today it has a market capitalization of over USD $600 billion. It is headquartered in Santa Clara, California, USA. https://www.nvidia.com/
Right now, using a system that NVIDIA calls DGX, this company is behind nearly every major AI system used for world leading research and in commercial operation. Nearly Every... Single... One! And the company keeps on releasing ever more impressive GPU technology, year after year.
“DGX has become the essential instrument of AI” – Jensen Huang – GTC 2023 Keynote – [Appendix 5]
Over the past couple of years, the heart of NVIDIA’s Capabilities in AI was built on the foundation of an impressive GPU called the A100 that is used in their DGX system, and this was recently superseded by an even more impressive GPU called the H100. Quite seriously, NVIDIA just gets better and better at what they do. https://www.nvidia.com/en-au/data-center/h100/
Here’s their pitch:
“Tap into unprecedented performance, scalability, and security for every workload with the NVIDIA® H100 Tensor Core GPU. With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models. The H100’s combined technology innovations can speed up large language models (LLMs) by an incredible 30X over the previous generation to deliver industry-leading conversational AI.”
So, translating this AI techno-geek sales pitch into something that more normal Humans can understand, the H100 GPU is designed to accelerate extremely large-scale AI Computer information processing, and it technically and functionally blows away its predecessor, the A100 GPU, and essentially every other CPU and GPU on the market today, by far.
The functional specifications of the H100 GPU provide a completely unmatched level of Computer information processing Capabilities. But that’s not enough for NVIDIA, or the infinitely insatiable Compute intensive demands of its AI customer applications. So what NVIDIA does is put up to 8 of these H100 GPUs into a single hardware rack they call an NVIDIA HGX AI system, which is custom designed so the H100 GPUs all work together in the rack as a beautifully orchestrated single GPU.
Then up to 32 of these racks can be connected via a custom designed, extremely high speed network communications backplane interface that allows them all to communicate and work on AI applications in unison. Overall, it’s an insanely powerful Computer system that can run 256 H100 GPUs together at the same time. https://www.nvidia.com/en-au/data-center/hgx/ [42]
If you have the money and are wanting the absolute ultimate virtual reality gaming Computer available on Earth, then there is no question this is it.
A conceivably totally wild idea is this: the H100 HGX AI system is so computationally advanced and powerful at this point in time that, if it were also technologically possible to create a Brain Computer Interface (BCI) with an extremely high information bandwidth, and to neuro-digitally interface it with the neural channel bundles of the Human spinal cord, optic nerves, and olfactory nerves extending from a Human Brain, then one day it might even be theoretically possible to create a totally realistic virtual environment for the Human Brain.
It is conceivable that the Human Brain might not even be consciously self-aware that it was completely disconnected from the actual reality of a Human body. It would, for all intents and purposes, be inserted into ‘The Matrix’. All that would be needed is the BCI and a fully contained biological life support system for the Human Brain.
So, if one day in the future NVIDIA releases a ‘Matrix GPU Cluster' with 'Blue Pill' BCI Technology, we Humans might all know what’s in store for us ;)
Now, coming back to the techno-sales pitch above, notice the specific use of the term ‘Tensor’. Well this is explained next.
Bottom line is this: GPUs are technically ingenious and immensely computationally powerful.
The TPU is a specialized type of Computer information processor designed to perform ‘tensor’ calculations, which are used in AI information processing and require high-speed operations on very large matrices that themselves contain further matrices.
In the field of mathematics, a ‘tensor’ is a computational object that generalizes scalars, vectors, and matrices to larger N-dimensional arrays. A tensor is characterized by its rank or order, which represents the number of indices needed to specify each element of the tensor. One way to think of them is as matrices... of matrices... of matrices, etc. Tensors are used in HPC systems to represent complex physical quantities, multidimensional geometric transformations, and dynamic models of the world around us in physics, engineering, chemical, biological, and AI applications.
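In code, that ‘matrices of matrices’ picture simply becomes an array with more indices. The short numpy sketch below (illustrative only) steps through ranks 0 to 3:

```python
import numpy as np

scalar = np.array(3.14)               # rank 0: no indices needed
vector = np.array([1.0, 2.0, 3.0])    # rank 1: one index
matrix = np.ones((3, 3))              # rank 2: two indices (row, column)
tensor = np.ones((2, 3, 3))           # rank 3: a stack of two 3x3 matrices

for t in (scalar, vector, matrix, tensor):
    print(t.ndim, t.shape)            # the rank is the number of indices (ndim)
```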
In 2013, “Google started to see the real potential for deploying neural networks to support a large number of new services. During that time it was also clear that, given the existing hardware, if people did voice searches for three minutes per day or dictated to their phone for short periods, Google would have to double the number of datacenters just to run machine learning models.” [16.1]
Google recognized a rapidly growing need for AI applications at the same time as a deceleration in Moore’s Law, which had previously driven the development of ever faster CPUs. This spurred investment in the development of a highly specialized parallel-processor semiconductor design called the TPU.
Reference: [16.1] First In-Depth look at Google's TPU Architecture - The Next Platform
The TPU processor was originally developed within Google and then introduced to the general public in 2016. TPUs can run a wide range of Neural Network models and are optimized for running a pre-built software library of AI information processing functions called ‘TensorFlow’, also referred to as a Machine Learning toolkit.
TensorFlow makes a programmer’s job of developing AI software applications much simpler, because the programmer does not need to get involved in the deeper mathematics, neural network architectures, and more complex Computer algorithms that are actually needed to perform Neural Network information processing with large matrices, etc. TensorFlow enables complex AI applications to be built amazingly quickly with just a few lines of software programming code. Most of the Neural Network complexity involved in AI systems is abstracted away so that basically anybody can use TensorFlow.
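As a hedged illustration of that ‘just a few lines of code’ claim, the sketch below defines and compiles a small image classifier using TensorFlow’s Keras interface; the layer sizes and the 28x28 input shape are arbitrary example choices, not requirements of TensorFlow itself.

```python
import tensorflow as tf

# A small image classifier in a few lines; layer sizes are arbitrary example choices.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),        # e.g. 28x28 greyscale images
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),       # 10 output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=5)   # training would be one more line
```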
“The TPU is programmable like a CPU or GPU. It isn’t designed for just one neural network model; it executes CISC [complex instruction set computer] instructions on many [neural] networks (convolutional, [Long Short-Term Memory models], and large, fully connected [neural network] models). So it is still programmable, but uses a matrix [data register] as a primitive instead of a vector or scalar [data register].” [16.1]
TPUs are especially useful for large-scale AI tasks because these require massive amounts of information processing. TPUs are typically made available to AI researchers via racks of TPUs housed within cloud-based datacenter environments, such as Google Cloud Platform (GCP), where they can be purchased for use on-demand.
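For researchers renting TPUs in the cloud, pointing a TensorFlow program at a rack-hosted TPU typically looks something like the sketch below; the exact connection details vary by environment and account, so treat the resolver argument as a placeholder rather than a working address.

```python
import tensorflow as tf

# Hedged sketch of attaching to a cloud-hosted TPU (connection details are environment specific).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")   # "" auto-detects in many setups
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():                     # variables and compute are placed on the TPU cores
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
```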
Since the TPU's original introduction, several major companies are now building TPUs and related designs of specialized AI processors, including:
Google - the company that created TPUs, continues to be the leading provider of these specialized processors. Google Cloud Platform offers TPU rental services to customers.
NVIDIA - is a seriously large player in the AI hardware space and has developed its own AI accelerator GPU chips such as the H100 which includes tensor matrix operations.
Intel - is a leading provider of CPUs and recently entered the AI hardware space with the Nervana Neural Network Processor (NNP). The NNP is a specialized AI accelerator chip designed to compete with Google’s TPUs and NVIDIA’s GPUs. Intel is also working with another line of AI processors originally developed by Habana Labs, which it acquired for USD $2 billion [16.13],[16.14].
Microsoft - has developed its own AI hardware accelerator, called the Brainwave, which is optimized for running deep neural networks.
Amazon - has developed its own AI hardware accelerator, called the Inferentia, which is designed to accelerate AI machine learning workloads in the cloud. After all, AWS is the largest datacenter provider in the world, so why not?
The business of providing AI hardware TPU accelerators is becoming increasingly competitive, with several major companies vying for market share. The demands of AI will continue to grow in complexity and scale, so the demand for specialized hardware accelerators such as TPUs will continue to increase.
That is, of course, until something better comes along.