What Is Machine Learning, Simply
How models learn from data rather than following explicit rules, what training and inference mean, and where machine learning genuinely works — and where it doesn't.
Machine learning is one of those terms that sounds more mysterious than it is. The underlying idea is genuinely elegant: instead of a programmer writing explicit rules for a program to follow, a machine learning system discovers its own rules by examining large amounts of data. Understanding what that means in practice — and what it does not mean — cuts through a lot of the hype and confusion that surrounds the topic.
The Conventional Programming Approach
In traditional software, a programmer specifies the rules explicitly. To write a spam filter the traditional way, you might instruct the program: “If the email contains the phrase ‘Nigerian prince,’ mark it as spam. If it comes from a domain ending in .gov, probably not spam. If it contains ‘free money’ and no company name in the headers, lean toward spam.” You continue writing rules, testing them, finding exceptions, and adding more rules.
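The hand-written rules above might look something like the following sketch. The scoring weights, the threshold, and the `has_company_header` flag are assumptions made for illustration; a real rule-based filter would have far more rules and exceptions.

```python
# A hand-written, rule-based spam filter. Every rule here was chosen by a
# programmer; nothing is learned from data. Weights and threshold are
# illustrative assumptions.

def spam_score(body: str, sender_domain: str, has_company_header: bool) -> int:
    """Return a score; higher means more spam-like."""
    text = body.lower()
    score = 0
    if "nigerian prince" in text:
        score += 3
    if sender_domain.lower().endswith(".gov"):
        score -= 2
    if "free money" in text and not has_company_header:
        score += 2
    return score


def is_spam(body: str, sender_domain: str, has_company_header: bool,
            threshold: int = 2) -> bool:
    """Mark the email as spam when the rule score reaches the threshold."""
    return spam_score(body, sender_domain, has_company_header) >= threshold
```

Each new exception found in testing means another `if` statement, which is why this approach becomes hard to maintain as the problem grows.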
This approach works for many problems. But consider recognising handwritten digits — telling the difference between a handwritten “3” and an “8.” What is the rule that distinguishes them? You could try to describe the curves involved, but handwriting varies enormously: different pen pressures, different sizes, different styles, some people close their 3s in unusual ways. Writing explicit rules that correctly classify millions of varied handwriting samples is extremely difficult.
Machine learning sidesteps this by asking: what if we could show the program a million examples of handwritten digits, each labelled with the correct answer, and let the program figure out the rules itself?
Training: Learning from Examples
The core of machine learning is training: feeding a system a large dataset of examples (in the most common setup, input-output pairs) and adjusting the system’s internal parameters until its outputs for those inputs are as close to correct as possible.
Consider a spam filter trained on email. The training set might contain 100,000 emails, each labelled “spam” or “not spam.” The model begins with essentially random internal parameters. It processes the first email, produces a prediction (spam or not), compares the prediction to the correct label, and nudges its parameters so that the same email would be scored slightly better next time. It repeats this for all 100,000 emails, many times over, gradually converging on parameter values that classify most of the training emails correctly.
The result is a model: a mathematical function whose internal parameters have been tuned by training. When you show it a new email it has never seen, it applies those parameters to classify it.
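The predict, compare, update loop can be sketched with logistic regression trained by stochastic gradient descent. That is one common choice, picked here for illustration rather than implied by the text. The two features, the six-email toy dataset, the learning rate, and the epoch count are all assumptions.

```python
import math
import random


def predict_proba(weights, bias, features):
    """Estimated probability that an email is spam (the model's prediction)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(weights, bias, features):
    """Turn the probability into a hard label: 1 = spam, 0 = not spam."""
    return 1 if predict_proba(weights, bias, features) >= 0.5 else 0


def train(examples, epochs=200, learning_rate=0.5, seed=0):
    """Adjust the parameters a little after every example, many times over."""
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    # Start from small, essentially random parameters.
    weights = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Compare the prediction to the correct label.
            error = predict_proba(weights, bias, features) - label
            # Nudge each parameter in the direction that reduces the error.
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias


# Toy training set. Hypothetical features:
# [mentions "free money", sender is a known contact]; label 1 = spam.
TRAINING_EMAILS = [
    ([1.0, 0.0], 1),
    ([1.0, 0.0], 1),
    ([0.0, 1.0], 0),
    ([0.0, 1.0], 0),
    ([0.0, 0.0], 0),
    ([1.0, 1.0], 0),
]
```

Calling `train(TRAINING_EMAILS)` returns the tuned `weights` and `bias`; passing them to `classify` along with the features of an unseen email is the "applies those parameters" step.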
Features are the measurable properties of the input that the model uses to make predictions. For the spam filter, features might include: the presence of specific words, the ratio of HTML to text, the sender’s domain reputation, whether any links point to recently registered domains. The model learns which features are predictive of spam and weights them accordingly.
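A feature extractor for such a filter might be sketched as follows. The phrase list, the feature names, and the idea of passing in a set of reputable domains are hypothetical, chosen only to mirror the kinds of features listed above.

```python
import re

# Illustrative phrase list; a real filter would not hard-code this.
SUSPICIOUS_PHRASES = ("free money", "nigerian prince", "act now")


def extract_features(body_text: str, body_html: str, sender_domain: str,
                     reputable_domains: frozenset) -> dict:
    """Turn one raw email into named numeric features a model can weight."""
    text = body_text.lower()
    # Characters that belong to HTML tags, as a share of the whole HTML body.
    tag_chars = sum(len(tag) for tag in re.findall(r"<[^>]+>", body_html))
    return {
        "suspicious_phrase_count": float(
            sum(text.count(phrase) for phrase in SUSPICIOUS_PHRASES)
        ),
        "markup_ratio": tag_chars / max(len(body_html), 1),
        "sender_is_reputable": (
            1.0 if sender_domain.lower() in reputable_domains else 0.0
        ),
        "link_count": float(len(re.findall(r"https?://", body_html))),
    }
```

The extractor only measures; deciding how much each measurement matters is left to training.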
In many modern systems, particularly deep neural networks, the features themselves are part of what the model learns: it works out useful representations directly from raw input such as pixels or text, rather than relying on features defined explicitly by the programmer.
Supervised, Unsupervised, and Reinforcement Learning
Machine learning is commonly divided into three broad categories based on what kind of data is used during training.
Supervised learning uses labelled examples: each training input comes with the correct output. The spam filter described above is supervised learning. So is training an image classifier on photographs labelled with their contents, or training a translation model on parallel texts in two languages. The “supervision” is the label that tells the model whether it got it right.
Unsupervised learning uses unlabelled data. The model is given inputs but no correct answers. Instead of learning to predict a specific output, it discovers structure in the data on its own — grouping similar items, identifying which features tend to vary together, detecting unusual patterns that stand out from the rest. Clustering algorithms that group customers into segments based on their behaviour, or anomaly detection systems that identify unusual network traffic, are examples of unsupervised learning.
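Clustering can be illustrated with a small k-means sketch on one-dimensional data. K-means is one of several clustering algorithms, and the spread-out initialisation below is an assumption chosen to keep the example deterministic. Note that no labels are passed in anywhere.

```python
def k_means_1d(values, k=2, iterations=20):
    """Group numbers into k clusters. Returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [
        float(ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)])
        for i in range(k)
    ]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical monthly spend per customer.
MONTHLY_SPEND = [12, 15, 14, 210, 198, 205]
```

Running `k_means_1d(MONTHLY_SPEND)` separates the low spenders from the high spenders without ever being told that those two groups exist.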
Reinforcement learning is different in structure from both. An agent learns through trial and error by taking actions in an environment and receiving feedback in the form of rewards or penalties. It develops a policy — a mapping from situations to actions — that maximises cumulative reward over time. This is the approach used to train AI systems that play games, where the reward is winning. It is also used in robotics, where the reward might be successfully completing a physical task.
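A minimal sketch of trial-and-error learning, using tabular Q-learning on a five-cell corridor. Both the algorithm and the environment are assumptions chosen for brevity, as are the learning rate, discount factor, and exploration rate. The agent starts at cell 0 and receives a reward only on reaching cell 4.

```python
import random

N_STATES = 5        # cells 0..4 in a corridor; cell 4 is the goal
ACTIONS = (-1, 1)   # step left, step right


def step(state, move):
    """Environment: apply a move, return (next_state, reward, done)."""
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_table, state, epsilon, rng):
    """Mostly pick the best-known action, sometimes explore at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q_table[state])
    return rng.choice([a for a, q in enumerate(q_table[state]) if q == best])


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each episode
            action = choose_action(q_table, state, epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[action])
            # Reward now, plus discounted value of the best follow-up action.
            future = 0.0 if done else gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (
                reward + future - q_table[state][action]
            )
            state = next_state
            if done:
                break
    return q_table


def greedy_policy(q_table):
    """The learned policy: the preferred move in each non-goal cell."""
    policy = []
    for state in range(N_STATES - 1):
        best_action = max(range(len(ACTIONS)), key=lambda a: q_table[state][a])
        policy.append(ACTIONS[best_action])
    return policy
```

Nobody tells the agent that moving right is correct. The policy returned by `greedy_policy` emerges from the rewards alone.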
Neural Networks: A Powerful Architecture
Many state-of-the-art machine learning systems use neural networks — computational structures loosely inspired by the architecture of biological brains, though the analogy should not be taken too literally.
A neural network consists of layers of interconnected nodes (neurons). Each connection has a weight — a numerical value that scales how much influence one neuron’s output has on the next. Training adjusts these weights. In a network with many layers — a deep neural network — each layer learns increasingly abstract representations of the input. Early layers in an image-classifying network might detect edges and simple shapes; later layers recognise combinations of shapes that form objects.
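The layered structure can be made concrete with a forward pass through a tiny two-layer network. The weights below are hand-picked for illustration rather than learned, and the ReLU and sigmoid activation functions are assumptions; training would consist of adjusting the numbers in `LAYERS`.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """weights[j][i] scales how much input i influences neuron j."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input through each layer in turn."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hand-picked (not trained) weights: two hidden neurons, one output neuron.
LAYERS = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-0.5], sigmoid),
]
```

With these weights each hidden neuron fires when one input exceeds the other, and the output neuron combines them, so the network scores above 0.5 when the two inputs differ and below 0.5 when they match.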
The “deep” in deep learning refers to networks with many layers. The term has become associated with a revolution in capability that began around 2012, when deep neural networks trained on large datasets began dramatically outperforming other approaches on image recognition tasks. This progress extended to language (leading to large language models), audio, video, and many other domains.
Training vs Inference
Training — the process of adjusting a model’s parameters on a dataset — is computationally intensive. Training a large model may require many GPUs (graphics processing units, which are well-suited to the parallel matrix arithmetic neural networks require) running for days or weeks, consuming substantial amounts of electricity.
Inference is the process of using a trained model to make predictions on new inputs. A single inference is typically far cheaper computationally than training, although the total cost of serving many requests can add up. When you ask a language model a question, or when a recommendation system suggests something to watch, or when a fraud detection system evaluates a transaction in milliseconds, that is inference. The model was trained ahead of time (with great effort and cost); inference is the ongoing use of the result.
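The split between the two phases can be sketched with a deliberately tiny model. The single-threshold "fraud detector" and its data are hypothetical; the point is only that training reads the whole dataset to produce parameters, while inference reuses the stored parameters for one cheap calculation.

```python
import json


def train_threshold_model(amounts, labels):
    """Training: scan the whole labelled dataset to fit one parameter."""
    fraud = [a for a, y in zip(amounts, labels) if y == 1]
    normal = [a for a, y in zip(amounts, labels) if y == 0]
    # Place the threshold halfway between the two class averages.
    threshold = (sum(fraud) / len(fraud) + sum(normal) / len(normal)) / 2
    return {"threshold": threshold}


def predict(model, amount):
    """Inference: one comparison against the stored parameter."""
    return 1 if amount > model["threshold"] else 0


# Train once, then store the result (here as a JSON string).
saved = json.dumps(
    train_threshold_model([20, 35, 50, 900, 1200], [0, 0, 0, 1, 1])
)

# Later, possibly on another machine: load the parameters and reuse them.
model = json.loads(saved)
```

Each call to `predict(model, amount)` reuses the same stored parameters; nothing about the model changes during inference.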
What Machine Learning Is Good At — and What It Isn’t
Machine learning is highly effective at tasks where:
- There are many examples to learn from
- The relationship between inputs and correct outputs is too complex to specify by hand
- The distribution of real-world examples resembles the training data
- Approximate correctness is acceptable (since ML models make mistakes)
It is less well-suited to tasks that require explicit, verifiable reasoning chains; that involve operating correctly in situations very different from the training data; that require understanding causation rather than just correlation; or that require a guarantee of correct behaviour rather than good average-case performance.
A spam filter trained mostly on English-language spam may perform poorly on spam in other languages. An autonomous vehicle system trained in sunny California may behave unexpectedly in a snowstorm. The dependence on training data distribution is one of the most important practical constraints of machine learning systems.
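The effect of a shifted distribution can be reproduced in a few lines. The nearest-centroid classifier and the numbers below are assumptions chosen to make the failure easy to see; the same model is evaluated once on data that resembles its training set and once on data where every value has drifted upward.

```python
def fit_nearest_centroid(samples):
    """samples: list of (value, label). Returns {label: mean value}."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    correct = sum(classify(centroids, v) == y for v, y in samples)
    return correct / len(samples)


TRAIN = [(0.8, "a"), (1.0, "a"), (1.2, "a"), (2.8, "b"), (3.0, "b"), (3.2, "b")]
SIMILAR = [(0.9, "a"), (1.1, "a"), (2.9, "b"), (3.1, "b")]
# Same labels, but every measurement has drifted upward by 2.0.
SHIFTED = [(v + 2.0, y) for v, y in SIMILAR]

CENTROIDS = fit_nearest_centroid(TRAIN)
```

Here `accuracy(CENTROIDS, SIMILAR)` is 1.0, while `accuracy(CENTROIDS, SHIFTED)` falls to 0.5: every drifted "a" example now sits closer to the "b" centroid. Nothing in the model is broken; the world it was fitted to has simply moved.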
The Limits of the “Learning” Metaphor
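Machine learning systems do not learn in the sense of building up a general model of the world through experience. They learn patterns in a specific dataset for a specific task. A model trained to classify images of cats and dogs cannot, as it stands, translate a sentence or read a medical scan. Some of what it has learned, such as the edge and shape detectors in its early layers, can be reused as a starting point for related image tasks, but its parameters still encode statistical regularities of its training data rather than a general understanding of animals or anything else.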
This distinction between pattern-matching and genuine understanding is part of an active and unresolved debate in AI research. What is clear is that the engineering achievements of machine learning — systems that translate languages, generate images, answer questions, detect diseases in medical scans — are real and significant, regardless of how they compare to human cognition. Understanding what these systems actually do, rather than anthropomorphising them, leads to more realistic expectations about when they are useful and when they will fail.