In the last lesson, you learned about decision trees, KNN, and linear regression. They're powerful, but they struggle with truly complex tasks like recognising faces, understanding language, or generating images.
For those tasks, we need something more powerful: neural networks, algorithms loosely inspired by how your brain works.
Your brain has about 86 billion neurons. Each one receives signals from other neurons, combines them, and fires if the combined signal is strong enough.
An artificial neuron does the same thing, but with numbers:
Inputs        Weights        Sum        Activation      Output
──────        ───────        ───        ──────────      ──────
x₁ = 0.5 ──→  w₁ = 0.8 ──┐
                         ├──→ 0.85 ──→  f(0.85)  ──→    0.70
x₂ = 0.3 ──→  w₂ = 1.5 ──┘

Sum = (0.5 × 0.8) + (0.3 × 1.5) = 0.40 + 0.45 = 0.85
Think of weights like volume knobs on a mixing board. Each input is a different instrument. The weights control how loud each instrument is. The neuron blends them together, and the activation function decides if the combined sound is loud enough to pass through to the speakers.
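The mixing-board idea can be sketched in a few lines of Python. The inputs and weights below are the illustrative values from the diagram above, not from any trained network, and sigmoid stands in for the "loud enough" decision:

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of inputs, passed through a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # squash the "volume" into 0..1

# Two inputs (instruments), two weights (volume knobs)
out = neuron([0.5, 0.3], [0.8, 1.5])
print(round(out, 2))  # 0.7
```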
A single neuron can't do much, just like a single brain cell can't think. The power comes from organising neurons into layers.
1. Input Layer: the eyes and ears 👀
2. Hidden Layer(s): the thinking brain 🧠
3. Output Layer: the answer 💡
INPUT LAYER     HIDDEN LAYER 1    HIDDEN LAYER 2    OUTPUT LAYER
(3 features)    (4 neurons)       (4 neurons)       (2 classes)

  [x₁] ───────→ [h₁] ───────→ [h₅] ───────→ [Cat: 0.92]
  [x₂] ───────→ [h₂] ───────→ [h₆] ───────→ [Dog: 0.08]
  [x₃] ───────→ [h₃] ───────→ [h₇]
                [h₄] ───────→ [h₈]

Note: In reality, EVERY neuron in one layer connects
to EVERY neuron in the next layer (fully connected).
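One fully connected layer can be sketched directly: each neuron reads every input, exactly as the note says. The weights and biases below are made-up values for illustration:

```python
def dense_layer(inputs, weight_rows, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    outputs = []
    for weights, b in zip(weight_rows, biases):
        z = sum(x * w for x, w in zip(inputs, weights)) + b
        outputs.append(max(0.0, z))  # ReLU activation
    return outputs

x = [1.0, 2.0, 3.0]          # 3 input features
W = [[0.1, 0.2, 0.3],        # weights into h1
     [0.0, -0.5, 0.4],       # weights into h2
     [0.3, 0.3, 0.3],        # weights into h3
     [-0.2, 0.1, 0.0]]       # weights into h4
b = [0.0, 0.1, -0.1, 0.05]

print(dense_layer(x, W, b))  # 4 hidden activations, all >= 0
```

Stacking calls to `dense_layer` (each layer's output becoming the next layer's input) gives exactly the multi-layer picture above.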
The term "deep learning" simply means a neural network with many hidden layers, typically more than two. Modern large language models stack on the order of a hundred layers. More layers let the network learn increasingly abstract patterns: edges → shapes → objects → scenes.
After a neuron adds up its inputs, it needs to decide: should I fire or not? That's the job of the activation function.
Imagine a water pipe with a valve: the valve decides how much of the incoming flow passes through, just as an activation function decides how much of a neuron's sum gets passed on.
ReLU (Rectified Linear Unit): the simple gate
def relu(x):
    return max(0.0, x)

# Examples
print(relu(3.5))   # 3.5 (positive → passes through)
print(relu(-2.0))  # 0.0 (negative → blocked)
Sigmoid: the gentle squisher
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Examples
print(sigmoid(5.0))   # 0.993 (very confident "yes")
print(sigmoid(0.0))   # 0.500 (completely uncertain)
print(sigmoid(-5.0))  # 0.007 (very confident "no")
Softmax: the vote counter
Raw outputs: [2.0, 1.0, 0.5]
After softmax: [0.63, 0.23, 0.14] → Cat (63%), Dog (23%), Bird (14%)
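The vote-counting step is the standard softmax formula: exponentiate each score, then divide by the total so the results sum to 1. Here it is applied to the raw scores above:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.5])
print([round(p, 2) for p in probs])  # [0.63, 0.23, 0.14]
```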
Why do we need activation functions? Without them, a neural network, no matter how many layers, would just be fancy linear regression. Activation functions introduce non-linearity, which lets the network learn curves, edges, and complex patterns instead of only straight lines.
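A quick way to see this collapse for yourself: two linear layers with no activation in between compute exactly the same thing as one combined linear layer. The small matrices here are arbitrary illustrative values:

```python
def matvec(W, x):
    """Multiply matrix W by vector x (a linear layer with no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmat(A, B):
    """Multiply two matrices: the 'merged' single layer."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[1.0, 2.0], [0.5, -1.0]]   # layer 1 weights
W2 = [[0.3, 0.7], [2.0, 0.1]]    # layer 2 weights
x = [1.5, -0.5]

two_layers = matvec(W2, matvec(W1, x))   # layer 1, then layer 2
one_layer = matvec(matmat(W2, W1), x)    # a single combined layer

print(all(abs(a - b) < 1e-9 for a, b in zip(two_layers, one_layer)))  # True
```

Add a ReLU between the two layers and the equivalence breaks, which is exactly the non-linearity the text describes.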
Now let's see how data actually moves through a neural network. This process is called forward propagation, and it's simpler than it sounds.
Think of it as a relay race: each layer does its part of the computation, then hands its results to the next layer.
Let's classify whether a fruit is an apple or orange using two features: weight (grams) and colour intensity (0-1 scale).
Input: weight = 150g, colour = 0.8 (orange-ish)
Step 1 - Input layer passes values forward:
  x₁ = 150 (weight)
  x₂ = 0.8 (colour)
Step 2 - Hidden neuron 1 computes:
  sum = (150 × 0.01) + (0.8 × 2.0) + bias(0.1) = 1.5 + 1.6 + 0.1 = 3.2
  output = ReLU(3.2) = 3.2
Step 3 - Hidden neuron 2 computes:
  sum = (150 × -0.005) + (0.8 × 1.5) + bias(0.3) = -0.75 + 1.2 + 0.3 = 0.75
  output = ReLU(0.75) = 0.75
Step 4 - Output layer computes from hidden outputs:
  apple_score = (3.2 × -0.5) + (0.75 × 0.8) = -1.6 + 0.6 = -1.0
  orange_score = (3.2 × 0.7) + (0.75 × 0.3) = 2.24 + 0.225 = 2.465
Step 5 - Softmax converts to probabilities:
  Apple: 3%
  Orange: 97% → Prediction: Orange! 🍊
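The five steps above can be traced in code. The weights and biases are the illustrative values from the walkthrough, not a trained model:

```python
import math

# Step 1: inputs
weight_g, colour = 150, 0.8

# Steps 2-3: hidden layer (ReLU)
h1 = max(0.0, weight_g * 0.01 + colour * 2.0 + 0.1)    # 3.2
h2 = max(0.0, weight_g * -0.005 + colour * 1.5 + 0.3)  # 0.75

# Step 4: output layer (raw scores)
apple_score = h1 * -0.5 + h2 * 0.8   # -1.0
orange_score = h1 * 0.7 + h2 * 0.3   # 2.465

# Step 5: softmax to probabilities
exps = [math.exp(apple_score), math.exp(orange_score)]
total = sum(exps)
apple_p, orange_p = exps[0] / total, exps[1] / total
print(f"Apple: {apple_p:.0%}, Orange: {orange_p:.0%}")  # Apple: 3%, Orange: 97%
```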
Forward propagation gives us a prediction. But what if it's wrong? That's where backpropagation comes in: the process by which a neural network learns.
Imagine you're learning to throw darts blindfolded: you throw, a friend tells you how far off you were and in which direction, and you adjust your next throw accordingly.
That's backpropagation in a nutshell:
Forward propagation: Data → Network → Prediction
        ↓
Compare: Prediction vs Actual Answer = Error
        ↓
Backpropagation: Error flows BACKWARD through the network
        ↓
Update: Adjust weights to reduce the error
        ↓
Repeat: Do this thousands of times!
The error signal starts at the output and flows backward through the network, layer by layer. Each neuron learns: "How much did I contribute to the error? What should I adjust?"
Imagine a factory assembly line making defective products. To fix the problem, you trace backward from the final product through each station: "Was the packaging wrong? Was the painting wrong? Was the raw material wrong?" Each station adjusts its process. That's backpropagation: tracing errors backward to fix each part of the network.
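Here is a minimal sketch of that error-tracing loop for the smallest possible "network": one weight, one input, plain gradient descent. The data and learning rate are made up for illustration:

```python
# Learn w so that the prediction w * x matches the target,
# by repeatedly nudging w against the gradient of the squared error.
x, target = 2.0, 10.0   # one training example: we want w * 2 == 10
w = 0.0                 # initial guess
lr = 0.05               # learning rate (step size)

for step in range(100):
    pred = w * x              # forward pass
    error = pred - target     # how wrong were we?
    grad = 2 * error * x      # d(error^2)/dw, the signal traced backward
    w -= lr * grad            # update: adjust the weight

print(round(w, 3))  # 5.0  (since 5 * 2 == 10)
```

In a real network the same idea repeats layer by layer, with the chain rule distributing each neuron's share of the error.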
Let's trace data through a tiny network step by step. We have a 3-layer network (input → hidden → output) that classifies whether an email is spam.
Features:
  x₁ = number of exclamation marks (here, 5)
  x₂ = whether the email contains the word "free" (1 = yes)
Weights (pre-trained):
Input → Hidden:              Hidden → Output:
  h₁: w₁=0.6, w₂=0.9           out: w₁=0.7, w₂=0.5
  h₂: w₁=0.3, w₂=0.4

Biases: b₁=0.1, b₂=0.2, b_out=-0.5
Let's trace it:
import math

# Inputs
x1, x2 = 5, 1

# Hidden neuron 1
z1 = (x1 * 0.6) + (x2 * 0.9) + 0.1    # = 3.0 + 0.9 + 0.1 = 4.0
h1 = max(0, z1)                       # ReLU → 4.0

# Hidden neuron 2
z2 = (x1 * 0.3) + (x2 * 0.4) + 0.2    # = 1.5 + 0.4 + 0.2 = 2.1
h2 = max(0, z2)                       # ReLU → 2.1

# Output neuron
z_out = (h1 * 0.7) + (h2 * 0.5) - 0.5  # = 2.8 + 1.05 - 0.5 = 3.35
output = 1 / (1 + math.exp(-z_out))    # Sigmoid → 0.966

print(f"Hidden layer: h1={h1}, h2={h2}")
print(f"Raw output: {z_out:.2f}")
print(f"Spam probability: {output:.1%}")  # 96.6% → SPAM! 🚫
What happened: both hidden neurons fired strongly (lots of exclamation marks plus the word "free"), the output neuron combined their signals into a high raw score, and the sigmoid squashed that score into a 96.6% spam probability.
Try modifying the inputs: what happens with x₁ = 0 (no exclamation marks) and x₂ = 0 (no "free")? The probability drops dramatically! That's the network using what it learned.
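You can check that experiment with the same weights and biases as the walkthrough, wrapped in a function so different inputs are easy to try:

```python
import math

def spam_probability(x1, x2):
    """Forward pass of the tiny spam network from the walkthrough."""
    h1 = max(0, x1 * 0.6 + x2 * 0.9 + 0.1)
    h2 = max(0, x1 * 0.3 + x2 * 0.4 + 0.2)
    z_out = h1 * 0.7 + h2 * 0.5 - 0.5
    return 1 / (1 + math.exp(-z_out))

print(f"{spam_probability(5, 1):.1%}")  # 96.6% (spammy email)
print(f"{spam_probability(0, 0):.1%}")  # 41.8% (clean email)
```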
Here's how everything connects:
Data (Lesson 1)  →  Algorithm (Lesson 2)  →  Neural Network (Lesson 3)
      📊                    🧮                        🧠
  Ingredients          Cooking method            The master chef
   The fuel              The engine               The power plant
Neural networks are just a type of algorithm, but a remarkably powerful one. They power the tasks classical algorithms struggle with: recognising faces, understanding language, and generating images.
The basic idea of neural networks was proposed in 1943, over 80 years ago! But they only became practical in the 2010s, when we had enough data and fast-enough computers (GPUs) to train large networks. Sometimes great ideas just need to wait for technology to catch up.
Congratulations: you now understand the three pillars of AI: data, algorithms, and neural networks! In upcoming lessons, we'll explore AI tools and APIs you can use right away, and dive into responsible AI, because building AI that works is only half the job. Building AI that's fair is the other half. 🌱