From machine learning to artificial intelligence

June 30, 2024

What is intelligence?

Intelligence is a complex concept that has captivated the minds of philosophers, psychologists and scientists for centuries. From the ancient Greeks to modern-day researchers, the quest to understand and define intelligence has been a central theme in the study of the human mind. The ancient Greek philosophers Plato and Aristotle laid the foundation for later theories of intelligence, with Plato emphasizing the concept of innate knowledge and the ability to understand universal truths through reason, while Aristotle focused on the acquisition of knowledge through experience and observation.

In recent years, the field of artificial intelligence has taken centre-stage in the discussion of intelligence, drawing heavily from and influencing the study of biological intelligence. However, the lack of a clear definition of intelligence has posed a significant challenge to research progress in the field. In 2007, Shane Legg, co-founder of DeepMind, sought to address this issue by analyzing ten informal definitions of intelligence from leading psychologists and philosophers. He distilled the common features and proposed a definition that captures the essence of intelligence in its most general form:

Intelligence measures an agent’s ability to achieve goals in a wide range of environments
Shane Legg, Co-founder, DeepMind

This breaks down intelligence into three key components: an agent, environment, and goals. But in Legg's definition, the most important characteristic of intelligence is generality – the ability to achieve goals in a wide range of environments. Until recently, this property of generality has been the missing piece in artificial intelligence systems.

Machine learning before deep learning

In 1943, the neurophysiologist Warren McCulloch and the logician Walter Pitts made a groundbreaking observation: biological neurons behave much like logic gates. This led them to propose the first mathematical model of a neuron. Their model was elegantly simple, but it had no ability to learn from data. Rosenblatt's perceptron, introduced in 1957, significantly improved on the McCulloch–Pitts neuron, most notably by adding the first learning algorithm for artificial neurons, the perceptron learning rule. This was the first realization, in a machine, of Aristotle's view of intelligence as learning from experience, and it sparked a period of intense interest in artificial intelligence during the 1960s. Leading researchers made bold predictions, with H. A. Simon stating in 1965 that "machines will be capable, within twenty years, of doing any work a man can do."
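To make the learning rule concrete, here is a minimal sketch in Python (NumPy): whenever an example is misclassified, the weights are nudged towards it. The toy AND task and the variable names are purely illustrative, not taken from any historical implementation.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Rosenblatt-style perceptron learning rule, labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Hard-threshold prediction; nudge the weights towards
            # any example that is misclassified.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy example: learn the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [-1. -1. -1.  1.]
```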

Frank Rosenblatt with the Mark I Perceptron

However, the optimism of the 1960s did not last. The high expectations were not met, leading to the first "AI winter" of the 1970s, characterized by significant cuts in research funding. Interest in artificial intelligence returned in the 1980s with the rise of "expert systems," which encoded the knowledge of domain experts as logical rules that could be applied to data to solve specific problems. This approach aligned more closely with Plato's view of intelligence as reasoning over universal truths. Expert systems proved genuinely useful and led to the development of dedicated hardware such as Lisp machines and to the adoption of logic-based programming languages such as Prolog. However, this renewed interest was short-lived: personal computers and general-purpose programming languages offered more capability at a lower cost than dedicated expert-system hardware, leading to a second AI winter from the late 1980s into the 1990s.

Despite the setbacks, the 1990s and early 2000s witnessed a transformation in the field of artificial intelligence through the application of rigorous mathematical and statistical techniques. This led to powerful new algorithms with weird and wonderful names such as support vector machines (1992), AdaBoost (1995), and random forests (2001). Because computational power was limited, these algorithms were usually paired with a technique known as feature engineering: selecting and transforming raw data into useful inputs for a machine learning model. Much as a skilled chef must choose and prepare the right ingredients for a dish to turn out well, feature engineering required skilled data scientists to get the best results. This all changed with the advent of the deep learning era.
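As an illustration of that classical workflow, the sketch below pairs hand-crafted features with a random forest in scikit-learn. The toy dataset, column names, and chosen features are hypothetical; the point is that a person decides which transformations of the raw data the model is allowed to see.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative raw data: the columns and the task are made up.
raw = pd.DataFrame({
    "amount":   [120.0, 5400.0, 80.0, 9100.0],
    "currency": ["USD", "EUR", "USD", "GBP"],
    "date":     pd.to_datetime(["2024-01-05", "2024-02-17",
                                "2024-03-02", "2024-03-29"]),
    "label":    [0, 1, 0, 1],
})

# Hand-crafted feature engineering: a human chooses which
# transformations of the raw columns might help the model.
features = pd.DataFrame({
    "log_amount": np.log1p(raw["amount"]),                     # compress a skewed scale
    "is_weekend": (raw["date"].dt.dayofweek >= 5).astype(int), # calendar signal
    "is_usd":     (raw["currency"] == "USD").astype(int),      # one simple flag
})

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(features, raw["label"])
```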

The deep learning era

The rise of cloud computing, the internet, and the availability of large amounts of data gave birth to a new paradigm in artificial intelligence. Deep neural networks, consisting of multiple layers of artificial neurons, are fed raw data annotated with the ground truth for the task at hand. Unlike previous approaches that relied on hand-crafted feature engineering, deep neural networks can develop their own representations from raw inputs, given sufficient data and enough model parameters. Within deep neural networks, intermediate numerical representations are repeatedly combined with model parameters to generate the model output. This approach, where a single operation is repeatedly applied to many data points, aligns well with a form of computing that was maturing at the same time: general-purpose GPU computing.
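As a sketch of what "multiple layers of artificial neurons" means in practice, here is a tiny PyTorch-style network; the layer sizes, batch, and ten-class task are illustrative only, not a reference architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A small deep network: raw inputs in, class scores out. No hand-crafted
# features; each layer learns its own intermediate representation.
model = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.ReLU(),   # first learned representation
    nn.Linear(256, 128), nn.ReLU(),       # deeper, more abstract representation
    nn.Linear(128, 10),                   # scores for 10 classes
)

# One training step: combine representations with parameters (dense matrix
# multiplies), compare against the annotated ground truth, and compute
# gradients for every parameter. These matrix multiplies are exactly the
# kind of workload GPUs accelerate.
x = torch.randn(64, 28 * 28)              # a batch of raw inputs
y = torch.randint(0, 10, (64,))           # their ground-truth labels
loss = F.cross_entropy(model(x), y)
loss.backward()
```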

Deep learning for image classification

Throughout the 2010s, deep learning, big data, and GPU computing brought about numerous successes across domains. Machine translation, object detection, speech recognition, and drug discovery all saw dramatic advances. The raw ingredients of the deep learning era were annotated data and computational power, and performance scaled with both: the bigger the model and the larger the dataset, the better the results. However, this approach also revealed a problem: one of these ingredients was rapidly outpacing the other.

Compute scales faster than people: The large language models

Since the invention of the integrated circuit, the number of transistors on a chip has roughly doubled every two years, a phenomenon known as Moore's Law, named after Gordon Moore, the co-founder of Intel. Recently, Nvidia claimed a 1000x single-GPU performance increase over eight years, far surpassing the 16x increase that Moore's Law alone would predict. Although Moore's Law continues to hold at the transistor level, this remarkable performance increase has been achieved through a relentless focus on the entire GPU computing stack: architecture, software, memory technology, and algorithms. These improvements in single-GPU performance have been accompanied by a shift to distributed training on clusters of thousands of GPUs, scaling GPU compute both vertically and horizontally.
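The 16x figure is simply Moore's Law compounded over the same eight-year window, as this back-of-the-envelope calculation shows:

```python
# Moore's Law: transistor counts roughly double every two years.
years = 8
moores_law_gain = 2 ** (years / 2)         # 2^4 = 16x over eight years
claimed_gpu_gain = 1000                    # Nvidia's claimed single-GPU gain
print(moores_law_gain)                     # 16.0
print(claimed_gpu_gain / moores_law_gain)  # 62.5x beyond transistor scaling alone
```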

The size of annotated datasets has been unable to keep pace with the growth in GPU compute, despite better annotation tools and crowd-sourced annotations from distributed teams of annotators around the globe. The solution has been self-supervised learning: a technique in which the training signal is derived from the data itself, so no human-labelled examples are required. In language modelling, this is typically achieved by training a model to predict the next word in a sentence. In this way, terabytes of text scraped from the internet can be used to train models without any human annotation.
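A minimal sketch of how next-word prediction turns raw, unlabelled text into training examples is shown below; the whitespace tokenization is deliberately naive and the sentence is arbitrary, but the principle carries over to models trained on web-scale corpora.

```python
# Turn raw, unlabelled text into (context, next-word) training pairs.
# Tokenization here is naive whitespace splitting, for illustration only.
text = "the quick brown fox jumps over the lazy dog"
tokens = text.split()

training_pairs = [
    (tokens[:i], tokens[i])          # context -> word to predict
    for i in range(1, len(tokens))
]

for context, target in training_pairs[:3]:
    print(context, "->", target)
# ['the'] -> quick
# ['the', 'quick'] -> brown
# ['the', 'quick', 'brown'] -> fox
```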

Scaling compute, model size, and data in this manner has ushered in the era of Large Language Models (LLMs). The small language models of the deep learning era, with millions of parameters, were effectively limited to performing single tasks, such as machine translation. However, as models were scaled out to hundreds of billions of parameters, the resulting LLMs could complete a wide range of tasks without explicit training, and accept new task instructions in natural language. Model training can be viewed as a form of data compression: the training process distils knowledge into the model weights.

LLM training as internet compression

Shane Legg’s definition of intelligence has now been partially met – LLMs are able to achieve goals in a wide range of tasks, if not yet environments. And it is an Aristotelian intelligence that has prevailed – LLMs have learned their skills through experience and observation of huge amounts of data. Although the next-word prediction task used to train these models appears deceptively simple, solving it across petabytes of data requires the formation of abstract concepts and basic reasoning skills. Despite their impressive abilities, LLMs remain prone to hallucinations – plausible responses that are not grounded in fact. They also struggle with tasks that require complex reasoning and planning. The Platonic ideal of intelligence – understanding universal truth through reasoning and contemplation – remains at the research frontier.

A future fuelled by exponential growth

The past two years have seen an explosion of activity, sparked by the release of ChatGPT in late 2022. Investment in hardware, software, data and algorithms is growing daily, driving exponential improvements in model capabilities. Humans are notoriously poor at reasoning about exponential growth. As the renowned futurist and computer scientist Ray Kurzweil has observed:

If you take 30 steps linearly, you get to 30. If you take 30 steps exponentially, you are at a billion.
Ray Kurzweil, Computer Scientist and Futurist

The race is now on to train ever-larger models with increasingly impressive capabilities. This year, we have seen the mainstream adoption of multi-modal models, capable of understanding not just text, but also images, video, and audio data. The ability of models to reason, plan, and act on their environment will move from the research frontier to reality, finally bringing Plato’s intelligence to machines and fully realizing Legg's definition of intelligence.

Predicting the path of exponential growth may be difficult, but one thing is clear: the narrow models of the deep learning era will continue to play a role where speed and efficiency are paramount, while the future will be increasingly shaped by Generative AI (GenAI) – models that exhibit broad, general intelligence and can adapt to a wide range of tasks and environments. As the field advances, the potential applications of this technology will only continue to grow, shaping the world in ways we have yet to imagine.

Accelex and GenAI

At Accelex, we have always sought to utilize the best available technology to serve our customers. As we move into a new era, we are fully embracing a GenAI future and have been rapidly integrating this technology into our product lineup. Accelex was founded with the mission to bring automation and structure to private markets data. We recognize GenAI as a powerful tool to further that mission.

By bridging the gap between AI models and practical AI products, Accelex empowers our customers with the capabilities of GenAI. Integrating GenAI with our best-in-class document acquisition, workflow tools, and analytics provides a comprehensive platform that enhances efficiency and insight into private markets investments.

Interested in learning more about Accelex? Then get in touch for a demo.

About Accelex

Accelex provides data acquisition, analytics and reporting solutions for investors and asset servicers enabling firms to access the full potential of their investment performance and transaction data. Powered by proprietary artificial intelligence and machine learning techniques, Accelex automates processes for the extraction, analysis and sharing of difficult-to-access unstructured data. Founded by senior alternative investment executives, former BCG partners and successful fintech entrepreneurs, Accelex is headquartered in London with offices in Paris, Luxembourg, New York and Toronto. For more information, please visit accelextech.com
