Machine Learning vs Artificial Intelligence
I occasionally get asked what the difference is between machine learning and artificial intelligence. There are plenty of definitions out there, but many seem too general to be really useful. Let’s also move beyond the marketing definition of AI, because to a marketer anything with some math in it is AI. So let’s come up with a practical definition.
A good definition should leave no ambiguity: when you see an algorithm or model, you should be able to say definitively whether it is ML or AI.
An example of a vague, overly general definition is this one from Google’s website:
“Artificial intelligence is a broad field, which refers to the use of technologies to build machines and computers that have the ability to mimic cognitive functions associated with human intelligence, such as being able to see, understand, and respond to spoken or written language, analyze data, make recommendations, and more.
Although artificial intelligence is often thought of as a system in itself, it is a set of technologies implemented in a system to enable it to reason, learn, and act to solve a complex problem.”
Okay, this sounds fine as a textbook definition, but could someone use it to look at, say, a convolutional graph neural network and decide whether that is ML or AI? Not really.
Google does go on to give some more detail:
Artificial intelligence
- AI allows a machine to simulate human intelligence to solve problems
- The goal is to develop an intelligent system that can perform complex tasks
- We build systems that can solve complex tasks like a human
- AI has a wide scope of applications
- AI uses technologies in a system so that it mimics human decision-making
- AI works with all types of data: structured, semi-structured, and unstructured
- AI systems use logic and decision trees to learn, reason, and self-correct
Machine learning
- ML allows a machine to learn autonomously from past data
- The goal is to build machines that can learn from data to increase the accuracy of the output
- We train machines with data to perform specific tasks and deliver accurate results
- Machine learning has a limited scope of applications
- ML uses self-learning algorithms to produce predictive models
- ML can only use structured and semi-structured data
- ML systems rely on statistical models to learn and can self-correct when provided with new data
This is much more detail from Google, but I don’t think it helps a person looking at an application decide whether to categorize it as ML or AI. Take, for example, “The goal is to develop an intelligent system that can perform complex tasks.” That’s pretty vague and can be applied to either AI or ML. Or take the point meant to define ML: “The goal is to build machines that can learn from data to increase the accuracy of the output.” That can just as easily be said of AI. I could make similar points about the rest of the bullets. And this is not to pick on Google; most definitions of ML vs AI are like this. Here is the definition from Coursera:
“In simplest terms, AI is computer software that mimics the ways that humans think in order to perform complex tasks, such as analyzing, reasoning, and learning. Machine learning, meanwhile, is a subset of AI that uses algorithms trained on data to produce models that can perform such complex tasks.”
That’s not really any different or better.
And often, as in the Coursera definition, part of the definition is simply that ML is a subset of AI, but that isn’t much help in thinking about ML vs AI either.
So let’s come up with a practical way to categorize ML and AI (as they stand in 2024) such that when we see a model or an application we can say it’s either ML or AI.
Let’s start by considering the intelligence part of AI. If something is intelligent, whether it’s the narrow, purposeful AI we have now or AGI, that intelligence requires a level of complexity such that examining the parts of the model individually does not help one comprehend what the model is doing. There is a level of inscrutability in an AI model, the black-box effect: a complete understanding of how the model works lies beyond human comprehension, and we are left with, at best, an intuitive grasp of how it works. For example, before OpenAI and others released their most useful models, there was only an intuition, held by some but certainly not everyone, that building larger and larger language models trained on more and more text would work; it was never a certainty. And when transformers with attention worked as well as they did, there was genuine excitement, because even those who hoped the approach would work weren’t certain it would. So we can call LLMs, built on the transformer architecture with attention, AI.
In contrast, take any of the typical machine learning algorithms packaged in scikit-learn, from multiple regression to random forests: we can do the math. We know why they work, and we can prove why they work. Generative models and other modern architectures have a level of complexity that traditional ML models do not.
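To make “we can do the math” concrete, here is a minimal sketch using synthetic data and made-up coefficients: the parameters scikit-learn learns for an ordinary least-squares regression can be reproduced, to floating-point precision, from the closed-form normal equation.

```python
# A minimal sketch: an ordinary least-squares fit is fully transparent.
# The coefficients scikit-learn learns can be reproduced from the
# closed-form normal equation, beta = (X^T X)^{-1} X^T y.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))              # synthetic inputs
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0   # made-up linear relationship

model = LinearRegression().fit(X, y)

# Derive the same parameters by hand: add an intercept column and solve.
X1 = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.solve(X1.T @ X1, X1.T @ y)

print(model.intercept_, model.coef_)  # learned by scikit-learn
print(beta)                           # [intercept, coefs] derived by hand; they match
```

Nothing in that model is a black box; every number can be traced back to a formula.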
That’s my first point of differentiation: the complexity, size, and inscrutability of the model.
My second point of differentiation concerns the inter-connectedness of the inputs to the model. Models I consider to be AI take into account the relationships and dependencies between inputs. One of the defining characteristics of graph neural networks, convolutional networks, Stable Diffusion, and large language models is that they specifically look at relationships within the inputs: nodes and edges in a graph, the sliding windows in a CNN that capture relationships between neighboring pixels, or the attention and positional-encoding mechanisms in large language models that capture the contextual meaning of text.
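To illustrate, here is a minimal sketch of scaled dot-product attention, the core mechanism of the transformer, run on random vectors standing in for token embeddings (the shapes and values are illustrative, not taken from any real model):

```python
# A minimal sketch of scaled dot-product attention. Each output row is a
# weighted blend of ALL input rows, so no "token" is processed in
# isolation: the relationships between inputs are the computation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise affinities between all tokens
    weights = softmax(scores)        # row i: how much token i attends to each token j
    return weights @ V               # outputs mix information from every input

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # 4 stand-in tokens, 8-dim embeddings
out = attention(tokens, tokens, tokens)  # self-attention
print(out.shape)                         # (4, 8): each row depends on all four inputs
```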
My last point of differentiation is that in machine learning, the information packed into a single input is often enough to understand its relationship to a prediction. I’m calling this “single input significance.” For example, in a model trying to predict diabetes from predictor variables like level of exercise, BMI, age, and gender, I can look at any one of those variables and understand how it could help predict diabetes. In fact, just one of those variables by itself would have some predictive value.
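Here is a minimal sketch of that idea with a single, simulated BMI-like feature (the data and the BMI-risk curve below are assumptions made up for illustration, not a real diabetes dataset): even one input, on its own, beats the majority-class baseline.

```python
# A minimal sketch of "single input significance": one feature alone
# carries predictive value. The data is simulated; the BMI-risk
# relationship is an assumption made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
bmi = rng.normal(28, 6, size=1000)       # synthetic BMI values
p = 1 / (1 + np.exp(-(bmi - 30) / 3))    # assumed risk curve: risk rises with BMI
diabetic = rng.random(1000) < p          # simulated labels

X_train, X_test, y_train, y_test = train_test_split(
    bmi.reshape(-1, 1), diabetic, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("one-feature model accuracy:", clf.score(X_test, y_test))
print("majority-class baseline:   ", max(y_test.mean(), 1 - y_test.mean()))
```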
The same cannot be said of models like large language models, CNNs, or graph networks. You can’t look at one pixel of a picture of a cat and know it is predictive of cats, or at one word of a sentence and know the sentence’s meaning without the context, or at one node of a graph and know what it predicts. Because of that, I consider these types of models to be AI.
To sum it all up, I have three characteristics that distinguish ML from AI models:
1) Inscrutability
2) Inter-connectedness
3) Amount of single input significance
So hopefully that helps if someone asks you what the difference is between ML and AI, or if you are looking at some models and want to categorize them.
However, this is my definition as it stands in 2024. Moving forward, as we head towards AGI, I think the lines will shift, and many ML techniques will come to be thought of as just some of the many tools an AGI can use. The definition of AGI is burdened with some of the same vagueness as ML vs AI, but it shares the characteristics I’m outlining here: inscrutability, inter-connectedness, and single input significance.
There are additional characteristics of AGI as well, and I will lay out my opinion of what those are in an upcoming post.