Decision trees and KNN are powerful, but the technology behind today's most impressive AI - from ChatGPT to self-driving cars - is the neural network. Inspired by the human brain, neural networks can learn incredibly complex patterns that simpler algorithms cannot.
Let us peel back the layers and see how they work.
Your brain contains roughly 86 billion neurons connected by trillions of synapses. When you learn something new, certain neurons fire together and the connections between them strengthen. This is often summarised as: neurons that fire together, wire together.
Artificial neural networks borrow this idea. They use artificial neurons (small mathematical functions) connected in a network. When the network practises on data, the connections that lead to correct answers get strengthened, and the ones that lead to wrong answers get weakened.
The first artificial neuron - the Perceptron - was invented in 1958 by Frank Rosenblatt. It could only learn simple patterns, but it laid the groundwork for everything we have today.
Every neural network has three types of layers:
This is where data enters the network. Each neuron in this layer receives one feature from the dataset. For a 28×28 pixel image, the input layer would have 784 neurons - one for each pixel.
These are the layers between input and output where the real learning happens. Each neuron takes inputs, processes them, and passes the result forward. A network can have one hidden layer or hundreds - the more layers, the "deeper" the network.
This layer produces the final answer. For a digit classifier (0–9), the output layer has 10 neurons, each representing the probability of a different digit.
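To get a feel for the scale involved, we can count the adjustable numbers in a network with these layer sizes. The hidden-layer size of 128 here is an arbitrary choice for illustration:

```python
# Layer sizes for the digit-classifier example:
# 784 input neurons, one hidden layer (128 is an assumed,
# arbitrary size), and 10 output neurons.
layer_sizes = [784, 128, 10]

# Each connection between adjacent layers carries one weight.
n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
# Every non-input neuron has its own bias.
n_biases = sum(layer_sizes[1:])

print(n_weights + n_biases)  # 101770 adjustable parameters
```

Even this small network has over a hundred thousand numbers to tune - and modern networks have millions or billions.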
What is the role of hidden layers in a neural network?
Every connection between two neurons has a weight - a number that controls how much influence one neuron has on the next. Think of weights as volume knobs: turning one up makes that connection louder; turning it down makes it quieter.
Each neuron also has a bias - a number that shifts the output up or down, like adjusting the baseline volume before any signal arrives.
When a neural network learns, it is really just adjusting thousands or millions of these weights and biases until it finds the combination that gives the best predictions.
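Here is what a single artificial neuron computes, using made-up numbers for the weights and bias:

```python
import numpy as np

# One neuron with three inputs. The weights and bias are
# hypothetical values chosen for illustration.
inputs = np.array([0.5, 0.8, 0.2])    # incoming signals
weights = np.array([0.9, -0.3, 0.4])  # the "volume knobs"
bias = 0.1                            # baseline shift

# Weighted sum: multiply each input by its weight, add them up,
# then add the bias.
z = np.dot(inputs, weights) + bias
print(round(z, 2))  # 0.39
```

Training adjusts `weights` and `bias` until this kind of computation, repeated across the whole network, produces good predictions.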
Imagine you are mixing music and you have hundreds of volume knobs - one for each instrument and microphone. Getting the perfect mix means carefully adjusting every knob. That is what training a neural network is like, except with millions of knobs adjusted automatically.
Learning happens in a cycle with four steps:
Data flows through the network from input to output. Each neuron multiplies its inputs by its weights, adds its bias, and passes the result through an activation function (which decides whether the neuron should "fire" or stay quiet). The network produces a prediction.
The prediction is compared to the correct answer. The difference is the error (also called loss). A prediction of "7" when the answer is "3" produces a large error; a correct prediction of "3" produces little or none.
The error is sent backwards through the network. For each weight, the network works out how much it contributed to the mistake. This is the mathematical magic of backpropagation - it figures out which knobs to turn and by how much.
The weights and biases are adjusted slightly to reduce the error. Then the cycle repeats with the next piece of data.
A neural network does not learn in one go. It repeats this cycle thousands or millions of times, gradually getting better with each pass - much like how you improve at a skill through practice.
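The four-step cycle can be sketched for the smallest possible network - a single neuron with one weight and one bias - learning a toy rule (y = 2x + 1, assumed here purely for illustration):

```python
import numpy as np

# A minimal sketch of the learning cycle: one weight, one bias,
# learning the toy rule y = 2x + 1 from four examples.
rng = np.random.default_rng(0)
w, b = rng.random(), rng.random()  # start with random knob settings
lr = 0.05                          # learning rate: how far to turn the knobs

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1                    # the correct answers

for _ in range(2000):              # repeat the cycle many times
    for x, y in zip(xs, ys):
        pred = w * x + b           # 1. forward pass: make a prediction
        error = pred - y           # 2. measure the error
        grad_w = error * x         # 3. backward pass: blame each knob
        grad_b = error
        w -= lr * grad_w           # 4. adjust slightly, then repeat
        b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

After thousands of passes, the weight and bias settle near the true values - the same idea that plays out across millions of weights in a real network.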
What does backpropagation do in a neural network?
Let us trace how a neural network classifies a handwritten "5" from the MNIST dataset:
Input: The 28×28 pixel image is flattened into 784 numbers (pixel brightness values from 0 to 255). These enter the 784 input neurons.
Hidden layers: The first hidden layer might detect simple edges and curves. The second hidden layer combines those into shapes like loops and strokes. Deeper layers recognise digit-like patterns.
Output: The 10 output neurons produce probabilities. The network might output something like 92% for "5", 3% for "6", and small values for the other digits.
Decision: The network picks the digit with the highest probability - 5. Correct!
If wrong: Backpropagation adjusts the weights so next time, the correct digit gets a higher score.
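The whole walkthrough can be sketched with a fake image and an untrained (randomly initialised) network - the wiring is real even though the prediction is meaningless before training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input: a pretend 28x28 handwritten digit, flattened to 784 values
# and scaled into the 0-1 range.
image = rng.integers(0, 256, size=(28, 28))
x = image.reshape(-1) / 255.0

# Hidden and output layers with random weights (hidden size of 64
# is an arbitrary choice for illustration).
W1, b1 = rng.standard_normal((784, 64)) * 0.01, np.zeros(64)
W2, b2 = rng.standard_normal((64, 10)) * 0.01, np.zeros(10)

h = np.maximum(0, x @ W1 + b1)                 # hidden layer (ReLU)
scores = h @ W2 + b2                           # 10 raw scores
probs = np.exp(scores) / np.exp(scores).sum()  # softmax -> probabilities

# Decision: pick the digit with the highest probability.
digit = int(np.argmax(probs))
print(digit)
```

With random weights the ten probabilities all hover near 10%; training is what sharpens them towards the correct digit.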
Modern neural networks can classify handwritten digits with over 99.7% accuracy - better than most humans. The MNIST dataset has become so "easy" for AI that researchers now use harder benchmarks to test new models.
When you look at a handwritten "5", your brain does not analyse individual pixels. You recognise the overall shape instantly. Neural networks learn to do something similar - but they build that understanding one layer at a time, starting from pixels and working up to shapes.
Not every signal should pass through a neuron at full strength. Activation functions act as gatekeepers that decide whether and how much a neuron should fire.
The most common activation function today is ReLU (Rectified Linear Unit). It has a simple rule: if the input is positive, let it through unchanged; if negative, output zero. This simplicity makes it fast to compute while still allowing the network to learn complex patterns.
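ReLU's rule is short enough to write in one line:

```python
import numpy as np

def relu(z):
    """ReLU: pass positive inputs through unchanged, clamp negatives to zero."""
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```

Negative inputs become 0, positive inputs pass straight through - that single kink is what lets stacked layers bend their decision boundaries.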
Without activation functions, no matter how many layers you stack, the network could only learn simple linear relationships - like drawing straight lines through data. Activation functions give neural networks the ability to learn curves, boundaries, and intricate patterns.
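We can check this collapse numerically: two linear layers with no activation in between are exactly equivalent to one linear layer, because matrix multiplication composes into a single matrix.

```python
import numpy as np

# Two stacked linear layers without an activation function...
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 5))
W2 = rng.standard_normal((5, 3))
x = rng.standard_normal(4)

two_layers = (x @ W1) @ W2
# ...are the same as one linear layer with the combined matrix.
one_layer = x @ (W1 @ W2)

print(np.allclose(two_layers, one_layer))  # True
```

This is why depth alone buys nothing without activation functions: the stack flattens back into a single straight-line transformation.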
When a neural network has many hidden layers, it is called a deep neural network, and training it is called deep learning. Depth allows the network to learn hierarchies of features: early layers detect simple elements like edges, middle layers combine them into shapes, and the deepest layers recognise whole objects or concepts.
This layered approach is why deep learning excels at complex tasks like image recognition, language understanding, and game playing.
Why are neural networks with many hidden layers called 'deep' learning?
In the next lesson, we will zoom in on the training process - how you actually teach a neural network to get smarter over time.