Coming to the end of 1988, and preparing for the next and final year of an undergraduate Applied Science degree in Robotics and Digital Technology, the whole class had already learned the internal workings of industrial six-axis welding robots right down to the semiconductor circuitry, so the idea of building a serpentine robot seemed a much more operationally useful and interesting challenge to take on as a year-long project. When this was discussed with the digital electronic design tutor, the advice was that such a project would be far too expensive for the University to support. However, it was suggested that it could be a better idea to explore another emerging digital technology called Neural Networks, part of the nascent scientific field of Artificial Intelligence (AI).
At the time, the class had already started to learn how to code in early AI programming languages such as Prolog and LISP (((((..(think of software code written in infinitely nested brackets that led directly to the depths of software debugging Hell)..))))) so we all had a basic understanding of symbolic AI logic and some quite amazing AI programming tools. The Neural Network technology sounded potentially more functionally useful, and fortunately the University approved this project to proceed.
Understanding Neural Networks started with what felt like an Everest-scale mountain of foundational knowledge to be climbed, through research into weirdly esoteric subjects such as Parallel Distributed Processing [6] [30], Cognitive Science and Neurocomputing [19] [20]. It turned out that the true foundations of AI come from the biological architecture of neural information processing in Brains, and the closed-loop neurological control of things like arms and legs. Eventually it became possible to wrap this feeble Brain around the bulk of the available research on how biological neurons essentially work and interface with the rest of a bio-neurologically controlled body. It then became possible to design and build a similar, but extremely limited, digital processing equivalent of this system called an artificial Neural Network [48].
This Neural Network used an error correction method called 'backpropagation' with a 'supervised learning' algorithm and ordinary gradient descent, which works much like most closed-loop control systems. It was first built in C software on the University's big central Computers, and then in faster digital hardware [24] [26] [27], with optimized processor assembly code written to electrically programmable memory chips. Much of this project was founded on the work of two pioneering AI researchers, David E. Rumelhart and James L. McClelland, who had written some excellent books on Parallel Distributed Processing and the functional implementation of backpropagation [6] [30].
If you have never seen what a basic Neural Network looks like, jump to the section What is AI and watch the video, as it's a quick and simple explainer. Then come back here.
To enable a backpropagating Neural Network to learn, it just required some training Input information that told it what it needed to learn. The training also included all the known answers for that training Input information, provided as corresponding Output information. So for any given Input information, if the Neural Network's Computed Output was wrong by any amount, it just needed to compare its Computed Output to the known correct Output information. This provided the ability to Compute the amount of error between the Computed Output and the known Output. Simple. The Neural Network would then use this error amount to slowly modify and adapt its internal network structure by adjusting the strength of the connections between the individual neurons, called weights, and a special additional Input connection to each Neuron, called a bias.
This error correction and gradual training of the Neural Network was done over thousands to millions of tediously slow repetitions. But the surprising feature of the Neural Network was that, over the span of providing lots and lots of training information, it would increasingly learn to recognize the patterns of relationships between all of the training Input information and Output information. Eventually, once the Neural Network reduced its errors across the training information to some low value, say around 1%, it was ready for use in the real world without the need to provide known Output information. You could now give the Neural Network loads of entirely new Input information that it had never been trained on, and it would Compute the Output information correctly almost all of the time. So, a Neural Network is a special type of Computer that learns. This was wild.
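To make that train-compare-correct cycle concrete, here is a minimal sketch in C (the language the original project started in) of a single hypothetical neuron learning the logical AND function. Everything here, the network, the values and the names, is an illustrative assumption, not the original project code:

```c
#include <stdio.h>
#include <math.h>

/* A minimal sketch of the train-compare-correct cycle described above:
 * one hypothetical neuron learning the logical AND function. */

double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void) {
    double in[4][2]  = { {0,0}, {0,1}, {1,0}, {1,1} };  /* training Inputs */
    double target[4] = {  0,     0,     0,     1    };  /* known Outputs   */
    double w[2] = { 0.1, -0.1 };                        /* weights         */
    double b = 0.0;                                     /* bias            */
    double eta = 0.5;                                   /* learning rate   */
    double total_error = 1.0;

    int epoch;
    for (epoch = 0; epoch < 100000 && total_error > 0.01; epoch++) {
        total_error = 0.0;
        for (int p = 0; p < 4; p++) {
            /* Computed Output for this training pattern */
            double out = sigmoid(w[0]*in[p][0] + w[1]*in[p][1] + b);
            double err = target[p] - out;    /* known Output - Computed Output */
            total_error += err * err;

            /* use the error to nudge the weights and the bias */
            double delta = err * out * (1.0 - out);   /* sigmoid slope term */
            w[0] += eta * delta * in[p][0];
            w[1] += eta * delta * in[p][1];
            b    += eta * delta;             /* the bias input is fixed at 1 */
        }
    }
    printf("stopped after %d epochs, total squared error %.4f\n",
           epoch, total_error);
    return 0;
}
```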
The following two equations essentially enable a backpropagating Neural Network to learn, and although the mathematical symbology makes them appear somewhat complex, when their meaning is decoded and converted to software, what they do inside a Neural Network program is actually ridiculously simple.
Equation for change of Weights: $\Delta_p W_{yz}(n) = \eta \, \delta_{pz} \, \mathrm{OUT}_{py} + \alpha \, \Delta_p W_{yz}(n-1)$

Equation for change of Biases: $\Delta_p B_{z}(n) = \eta \, \delta_{pz} \cdot 1 + \alpha \, \Delta_p B_{z}(n-1)$

Here $p$ is the training pattern, $n$ is the training step, $\eta$ is the learning rate, $\alpha$ is the momentum factor, $\delta_{pz}$ is the error signal at neuron $z$, and $\mathrm{OUT}_{py}$ is the output of the connected neuron $y$; the explicit $\cdot\,1$ shows that a bias is simply a weight whose input is permanently fixed at 1.
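Decoded into C, the two equations might look something like this sketch for a single connection between a neuron y and an output neuron z; the struct and variable names are illustrative assumptions:

```c
/* The two update equations decoded into C for a single connection. */

typedef struct {
    double w;        /* Wyz : the weight from neuron y to neuron z */
    double b;        /* Bz  : the bias of neuron z                 */
    double dw_prev;  /* ΔpWyz(n-1) : previous weight change        */
    double db_prev;  /* ΔpBz(n-1)  : previous bias change          */
} Connection;

void update_connection(Connection *c, double eta, double alpha,
                       double delta_pz, double out_py) {
    /* Equation for change of Weights */
    double dw = eta * delta_pz * out_py + alpha * c->dw_prev;

    /* Equation for change of Biases: the bias "input" is fixed at 1 */
    double db = eta * delta_pz * 1.0 + alpha * c->db_prev;

    c->w += dw;  c->dw_prev = dw;   /* apply, and remember for momentum */
    c->b += db;  c->db_prev = db;
}
```

The α momentum term simply re-applies a fraction of the previous change, which helps the error roll through small bumps during gradient descent.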
This points to an important and potentially very dangerous societal issue with AI and Neural Networks: AI science is not even close to the vast complexity of rocket science. So don't be fooled by the 'smoke and mirrors' presented by some AI developers, because it is important to understand that in reality, anybody can build a Neural Network that creates AI.
An extremely interesting feature that provided some insight into how Neural Networks actually work inside emerged when one was trained on binary information originally designed for the configuration of a high-performance Application Specific Integrated Circuit (ASIC). As expected, the Neural Network eventually learned all the relationships between the binary Input information and the desired binary Output information of the ASIC.
Now, when the Neural Network and all its associated weighted connections were very thoroughly analyzed, it became apparent the Neural Network had automatically built what is known in ASIC electronics design as a Karnaugh Map (K-Map) for the entire digital circuit [24]. A K-Map defines all the boolean algebra and logic gates needed to design the connections of all the digital circuitry of an ASIC. The Neural Network figured out how to build this K-Map entirely by itself. Until this time, K-Maps had always required Human Intelligence to build, often combined with very advanced design tools for very large and complex ASICs. Therefore, Neural Networks are unbelievably Intelligent at building K-Maps. A toy example of a K-Map is sketched just below.
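For readers who have never met a K-Map, here is an invented toy illustration (not from the original ASIC project) of the K-Map for a simple two-input AND gate, written as a C comment:

```c
/* K-Map for a two-input AND gate: each cell holds the desired Output
 * for one pair of Inputs.
 *
 *            b=0   b=1
 *     a=0     0     0
 *     a=1     0     1
 *
 * The lone '1' cell cannot be grouped with any neighbour, so the minimized
 * boolean logic is simply: out = a AND b. A trained Neural Network's weights
 * effectively encoded groupings like this across the whole circuit. */
int and_gate(int a, int b) { return a & b; }
```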
So even though many people say Neural Networks are “black boxes that cannot be understood”, this is not entirely true if you are able to deeply analyze sections of them. Notably, however, as Neural Networks have become increasingly larger and learned more complex relationships between sets of information, it has become exponentially more difficult to understand the intricacies of the circuit designs a Neural Network builds by itself as it learns. It's reasonable to say the Neural Networks of today are so incredibly large, both wide and deep, that they cannot be understood by Humans. Simply, Humans don't have the Intelligence this requires.
Notably, the K-Map is a purely binary construction with only 0s and 1s, whereas Neural Networks often operate over a continuous analog range, usually from 0 through to 1, and so they can form significantly more complex multivariate, non-linear, multidimensional algebraic functions [28].
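That continuous analog range typically comes from each neuron's squashing function, such as the classic sigmoid used in the training sketch earlier, which maps any input smoothly into the range between 0 and 1:

```c
#include <math.h>

/* The classic sigmoid squashing function: any real input is mapped smoothly
 * into the continuous range (0, 1), rather than to a hard binary 0 or 1.
 * This is what lets networks of neurons build smooth, non-linear,
 * multidimensional functions instead of purely boolean K-Map logic. */
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}
```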
As part of this particular degree, the class was also learning analog and digital control theory, analog and digital signal processing (DSP), and what felt like every kind of mathematical and multi-dimensional transform ever devised... all these years later there’s still some traumatic Brain injury. However, these theories were really useful for creatively designing different kinds of Neural Networks using multilayer feedback loops, which today are used in some Recurrent Neural Networks (RNNs). These were very hard to reliably train.
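As a rough sketch of the looped-feedback idea behind RNNs, assume a hypothetical four-neuron layer whose previous outputs are fed back in as extra inputs at the next time step; all names and sizes here are illustrative:

```c
#include <math.h>

/* One time step of a tiny recurrent layer: the layer's previous outputs
 * (h_prev) are fed back through feedback weights (w_fb) alongside the
 * new external input. */

#define N 4  /* neurons in the recurrent layer */

void rnn_step(const double w_in[N], const double w_fb[N][N],
              const double bias[N], double input,
              const double h_prev[N], double h_next[N]) {
    for (int z = 0; z < N; z++) {
        double sum = w_in[z] * input + bias[z];
        for (int y = 0; y < N; y++)
            sum += w_fb[z][y] * h_prev[y];       /* looped-feedback connections */
        h_next[z] = 1.0 / (1.0 + exp(-sum));     /* sigmoid activation */
    }
}
```

Part of what made these so hard to reliably train is that the errors have to be propagated back through time steps as well as through the layers.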
Overall, it turned out these Neural Networks were really quite a simple and very different type of computing system that worked in a massively parallel way. It became immediately apparent that it is possible to build a Computer that learns, almost by itself, by building multiple layers of these Neural Network components and creating different kinds of looped-feedback connections between the deeper and earlier layers.
The backpropagation of errors with gradient descent was really just an error correction and optimization algorithm that was very computationally expensive, and actually not very good at global optimization. Getting the Neural Network error to below 1% could sometimes be really difficult because of what is called local minima.
Local minima are conceptually akin to small valleys across a vast mountain range (built in mathematics). You are trying to find the lowest possible point in the lowest of all valleys within the entire mountain range (the lowest possible mathematical error), but you are only able to see what's around you from where you are standing (your current error value), so much of your view is obscured by the mountains of different sizes around you. Depending on where you happen to be standing at the time, you can't always see the lowest point, so you don't know which direction to head (mathematically) to get to the absolute lowest possible point in the entire mountain range.
Local minima are arguably one of the biggest problems with Neural Networks. The problem is generically called Neural Network optimization, and backpropagation is one approach, but it gets stuck in local minima. This can be EXTREMELY annoying.
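Here is a tiny demonstration of the problem, using a made-up one-dimensional "mountain range" with two valleys; plain gradient descent only ever sees the local slope, so it settles in whichever valley it starts nearest to:

```c
#include <stdio.h>

/* A toy "mountain range" with two valleys: f(x) = x^4 - 3x^2 + x.
 * The deeper (global) valley is near x = -1.30; a shallower (local)
 * valley sits near x = +1.13. */

double grad(double x) { return 4*x*x*x - 6*x + 1; }   /* f'(x) */

double descend(double x) {
    for (int i = 0; i < 10000; i++)
        x -= 0.01 * grad(x);   /* step downhill from where we stand */
    return x;
}

int main(void) {
    printf("start at +2.0 -> settles near x = %+.3f\n", descend( 2.0));
    printf("start at -2.0 -> settles near x = %+.3f\n", descend(-2.0));
    /* Two different answers for the same landscape: the first run is
     * stuck in the shallower valley and never finds the global minimum. */
    return 0;
}
```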
Today, there are other optimization algorithms and Neural Network designs that improve these issues by orders of magnitude, but it seems most Neural Networks today still use some variant of the good ol' backpropagation gradient descent algorithm, which is really quite sloooooow, and computationally intensive.
Frustratingly, over lots of experiments, no matter how many different Neural Network designs were developed, there was always this one big damn problem! Through years of ongoing experimentation with Neural Networks, it became very obvious that there is a definite Hard Limit on AI that is essentially defined by just one thing: Computer processing power.
There was just not enough computing power to do the things that looked really interesting, like getting a Neural Network to learn to model the entire world's stock exchange system using the last 5 or so years of stock price back-test information, and then predict future stock prices at any exchange [23] in realtime. Well, you can guess why that's interesting. :) After all, the Neural Network only needs to be a bit more right than wrong most of the time, and it would then be performing statistically better than potentially every investment fund manager and day trader in the world.
It became necessary to begin exploring a range of Neural Network types and early parallel computing technologies such as Vector Processors, Transputers and various Harvard Architecture processors, then clustered Linux machines, through to various generations of GPUs made by NVIDIA. Despite the fact these were all very computationally powerful, and expensive at useful scale, even they have Hard Limits on Computer processing power.
Until the introduction of datacentre server infrastructure offered as 'cloud computing' from around 2010, this Hard Limit was really damn annoying, i.e. basically unworkable for really serious AI applications with Neural Networks.
[79] Figure: Computation used to train notable artificial intelligence systems, measured in total petaFLOPs. Source: Our World in Data.
Immediately after graduating with an Applied Science degree in Digital Technology majoring in AI, the first job out of the gate was with a large multinational software company called Logica, headquartered in the UK. Even before the job officially started, Logica provided the opportunity to visit its Cambridge laboratories, where all of the company's pioneering AI work was being researched and developed. It was very surprising to learn that many of the largest banks in the world used Logica software for nearly everything they did. Logica already had early generation AI systems in place at major global banks for credit card fraud detection using an early AI technology called Expert Systems, which is just a special database with thousands of conditional Human rules turned into software code. In continuing its commercial AI software R&D, Logica Cambridge had started building more advanced Neural Networks for some enterprise grade commercial software applications.
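For a flavour of what an Expert System rule looks like when "turned into software code", here is a hypothetical fraud-detection sketch; the rules, fields and thresholds are invented for illustration and are certainly not Logica's actual logic:

```c
#include <stdio.h>

/* Expert Systems in miniature: hand-written Human rules expressed
 * directly as conditional code. All rules here are invented examples. */

typedef struct {
    double amount;              /* transaction value                     */
    int country_changed_today;  /* card used in two countries same day   */
    int tries_last_hour;        /* transactions attempted in past hour   */
} Transaction;

int looks_fraudulent(const Transaction *t) {
    if (t->amount > 5000.0 && t->country_changed_today) return 1;
    if (t->tries_last_hour > 10) return 1;
    /* ...real systems encoded thousands of such expert rules... */
    return 0;
}

int main(void) {
    Transaction t = { 7500.0, 1, 2 };
    printf("flag: %d\n", looks_fraudulent(&t));  /* prints: flag: 1 */
    return 0;
}
```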
The subject of Neural Network error optimization for faster training became the key point of discussion during the time at Logica Cambridge, where eventually the head of the AI lab handed over a book and simply said: "Read this and see what you can do with it, and then let's talk further". The book was titled 'Genetic Algorithms in Search, Optimization, and Machine Learning', written by David E. Goldberg [7].
Well, after reading the book and converting the strange new ideas it contained into software, it was staggering to realize that genetic algorithms are extremely powerful at solving seemingly intractable problems. It seems that life uses an absolutely ingenious system of countless numbers of variable genes to encode and distribute the programmed information necessary for intelligent life to survive and grow. This genetic algorithm is shockingly brilliant and totally terrifying at the same time.
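For a flavour of how simple the core mechanism in Goldberg's book is, here is a minimal toy genetic algorithm in C that evolves random bit strings toward all 1s (the classic "OneMax" demonstration problem); the population size, mutation rate and fitness function are all illustrative assumptions:

```c
#include <stdio.h>
#include <stdlib.h>

/* A toy genetic algorithm: selection keeps the fitter genes, crossover
 * recombines them, and rare mutations keep exploring new variations. */

#define POP   40
#define GENES 32
#define GENERATIONS 200

int fitness(const int *g) {            /* fitness = count of 1 bits */
    int f = 0;
    for (int i = 0; i < GENES; i++) f += g[i];
    return f;
}

int pick_parent(int pop[POP][GENES]) { /* 2-way tournament selection */
    int a = rand() % POP, b = rand() % POP;
    return fitness(pop[a]) > fitness(pop[b]) ? a : b;
}

int main(void) {
    int pop[POP][GENES], next[POP][GENES];

    for (int i = 0; i < POP; i++)                 /* random initial genes */
        for (int j = 0; j < GENES; j++) pop[i][j] = rand() % 2;

    for (int gen = 0; gen < GENERATIONS; gen++) {
        for (int i = 0; i < POP; i++) {
            int ma = pick_parent(pop), pa = pick_parent(pop);
            int cut = rand() % GENES;             /* one-point crossover */
            for (int j = 0; j < GENES; j++) {
                next[i][j] = (j < cut) ? pop[ma][j] : pop[pa][j];
                if (rand() % 100 == 0)            /* 1% mutation rate */
                    next[i][j] ^= 1;
            }
        }
        for (int i = 0; i < POP; i++)
            for (int j = 0; j < GENES; j++) pop[i][j] = next[i][j];
    }

    int best = 0;
    for (int i = 1; i < POP; i++)
        if (fitness(pop[i]) > fitness(pop[best])) best = i;
    printf("best fitness after %d generations: %d / %d\n",
           GENERATIONS, fitness(pop[best]), GENES);
    return 0;
}
```

Notice there is no gradient anywhere in this loop, which is exactly why it was handed over as a possible way around backpropagation's optimization problems.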
Intelligence automatically develops from a special type of program contained within and across all of genetics.
The importance of this point and how it relates to Hyperselfish Intelligence and AI will become clearer as you read further.
Clearly, there have been some stunning breakthroughs in Neural Network designs and learning algorithms, particularly in the past 10 years, but quite seriously, these are all basically variations built from the same foundational ideas going back more than 50 years. AI researchers are building idea upon idea, but are really creating variations on a theme and progressive fractional improvements, with occasional small but very important discontinuous jumps. So, apologies if this offends some heavily dedicated Neural Network researchers and cognitive scientists, etc.
Fundamentally, it is only through raw increases in Computer processing power that AI has really been able to learn the relationships within larger sets of information and truly become incrementally better over the years. That’s it. Again, sorry.
Now, if you know anything about something called “Moore's Law”, named after the late genius and Intel co-founder Gordon Moore, you'll know that the power of Computer information processing has been doubling roughly every 1.5 to 2 years for around the same cost. Give or take. Although in recent years this rate of doubling has been slowing.
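As a quick back-of-envelope check of what that doubling rate compounds to over a decade (a worked example, not a precise claim):

```c
#include <stdio.h>
#include <math.h>

/* Moore's Law arithmetic: processing power grows by 2^(years / doubling_period). */
int main(void) {
    printf("doubling every 2.0 years: %.0fx in 10 years\n", pow(2.0, 10.0/2.0)); /* ~32x  */
    printf("doubling every 1.5 years: %.0fx in 10 years\n", pow(2.0, 10.0/1.5)); /* ~102x */
    return 0;
}
```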
So in relation to AI’s Capabilities, this leads to the following obvious logical outcome:
The Hard Limit of AI Capabilities is actually rising with time as the power of Computer information processing continually increases.