You now know that neural networks learn by adjusting weights and biases. But how does the full training process actually work? How do you know when a model has learned enough - or too much? In this lesson, we will walk through the complete training journey.
Training an AI model follows a cycle that repeats over and over:

1. Predict - the model makes a guess based on its current weights.
2. Measure - a loss function scores how far the guess was from the correct answer.
3. Adjust - the weights and biases are nudged in the direction that reduces the error.
4. Repeat - with the next example.
This loop runs thousands or even millions of times. Each repetition nudges the model slightly closer to the right answers.
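The cycle above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real training system: it fits a single weight `w` so that `prediction = w * x`, using made-up data where the correct answer is `w = 2`.

```python
# Minimal sketch of the predict -> measure -> adjust loop.
# Data and learning rate are illustrative only.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs: target = 2 * x
w = 0.0             # start with an uninformed weight
learning_rate = 0.05

for step in range(200):                # the loop runs many times
    for x, target in data:
        prediction = w * x             # 1. predict
        error = prediction - target    # 2. measure how wrong we were
        gradient = 2 * error * x       # derivative of the squared error w.r.t. w
        w -= learning_rate * gradient  # 3. nudge the weight toward the answer

print(round(w, 3))  # w converges toward 2.0
```

Each repetition only changes `w` a little, but after a few hundred passes the weight settles on the right value - the same nudging, at vastly larger scale, is what happens inside a real neural network.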
Training GPT-4 reportedly cost over $100 million in computing power alone. The training loop ran across thousands of specialised chips for months.
After each prediction, we need a way to measure how wrong the model was. This measurement is called the loss (or cost), and the formula that calculates it is the loss function.
Think of it like a dartboard. The bullseye is the correct answer. The loss is the distance from where your dart landed to the bullseye. The goal of training is to minimise that distance over time.
Common loss functions include:

- Mean squared error - averages the squared distance between predictions and correct values; common when predicting numbers.
- Cross-entropy - measures how confident and correct the model's probability estimates are; common for classification tasks.
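Both of these can be computed by hand. The sketch below uses tiny made-up inputs purely to show the arithmetic:

```python
import math

def mean_squared_error(predictions, targets):
    # Average squared distance from the bullseye - for numeric predictions.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(predicted_probs, labels):
    # Penalises confident wrong answers heavily - for yes/no classification.
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(predicted_probs, labels)
    ) / len(labels)

print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))  # 0.25
print(binary_cross_entropy([0.9, 0.2], [1, 0]))     # small: both guesses were good
```

Notice that a perfect prediction gives a loss of zero in both cases - the dart landed on the bullseye.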
What does a loss function measure in AI training?
One complete pass through the entire training dataset is called an epoch. Training typically involves many epochs - the model sees the same data multiple times, getting slightly better each round.
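In code, an epoch is simply the outer loop over the whole dataset. A minimal sketch (the dataset here is a stand-in, and shuffling each pass is one common convention, not a requirement):

```python
import random

dataset = list(range(10))     # stand-in for ten training examples
num_epochs = 3

for epoch in range(num_epochs):
    random.shuffle(dataset)   # a fresh order each pass often helps learning
    for example in dataset:
        pass                  # predict, measure loss, adjust weights here
    print(f"epoch {epoch + 1}: saw all {len(dataset)} examples")
```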
Revising for an exam is like running epochs. The first read-through is confusing, but each review builds understanding. However, if you re-read the same notes a hundred times, you might memorise the exact wording without truly understanding the concepts. AI has the same problem.
Overfitting is one of the most common problems in AI training. It happens when the model learns the training data too well - including its noise and quirks - and then performs poorly on new, unseen data.
Imagine a student who memorises every past exam paper word for word. They score perfectly on old papers but struggle when the questions change even slightly. The student has not learned the subject - they have memorised the answers.
Signs of overfitting:

- Training accuracy keeps climbing while validation accuracy stalls or drops.
- Training loss keeps falling while validation loss starts rising.
- The model performs far better on data it has seen than on data it has not.
The goal of training is not to score perfectly on data the model has already seen. It is to perform well on data it has never seen before. That is the true test of learning.
The opposite problem is underfitting. This happens when the model has not learned enough from the data. It performs poorly on both training data and new data.
Causes of underfitting include:

- A model that is too simple to capture the patterns in the data.
- Too little training - stopping before the model has had a chance to learn.
- Poor or insufficient input features.
If overfitting is like memorising past papers, underfitting is like walking into the exam having barely opened the textbook.
A model scores 98% accuracy on training data but only 60% on new data. What is the most likely problem?
To detect overfitting and underfitting, we split our data into three parts:
| Set | Purpose | When Used |
|-----|---------|-----------|
| Training set | The model learns from this data | During training |
| Validation set | Used to check progress and tune settings | During training |
| Test set | Final evaluation on completely unseen data | After training |
A common split is 70% training, 15% validation, and 15% test. The model never sees the test set until the very end - it is the final exam.
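A 70/15/15 split can be done with a shuffle and two slices. The proportions come from the lesson; the dataset and the fixed random seed below are illustrative choices:

```python
import random

examples = list(range(1000))           # stand-in for a labelled dataset
random.seed(42)                        # fixed seed so the split is reproducible
random.shuffle(examples)               # shuffle first so each set is representative

n = len(examples)
train_end = int(n * 0.70)
val_end = train_end + int(n * 0.15)

train_set = examples[:train_end]       # the model learns from this
val_set = examples[train_end:val_end]  # checked during training
test_set = examples[val_end:]          # the final exam - untouched until the end

print(len(train_set), len(val_set), len(test_set))  # 700 150 150
```

Shuffling before slicing matters: if the data were sorted (say, by date or category), each set would contain a different kind of example and the evaluation would be misleading.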
The validation set is like a practice test you take between study sessions. It tells you how well you are learning without spoiling the real exam. If your practice-test scores start dropping while your scores on familiar revision questions keep rising, you know something is wrong.
Knowing when to stop is crucial. Train too little and the model underfits. Train too much and it overfits. The sweet spot is where validation loss stops improving.
A technique called early stopping automates this:

1. After each epoch, measure the loss on the validation set.
2. Keep track of the best validation loss seen so far.
3. If it has not improved for a set number of epochs (the patience), stop training and keep the best version of the model.
This prevents the model from going past the point of useful learning and slipping into memorisation.
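The mechanism is simple enough to show directly. The validation losses below are made up to illustrate the pattern - improving, then flattening out - and the patience value of 2 is an arbitrary choice:

```python
# Sketch of early stopping with a "patience" counter.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.54]

patience = 2                 # how many non-improving epochs to tolerate
best_loss = float("inf")
epochs_without_improvement = 0
stopped_at = None

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss = loss                  # remember the best model seen so far
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch            # stop before memorisation sets in
            break

print(stopped_at, best_loss)  # stops at epoch 5 with best loss 0.50
```

The run halts two epochs after the best score, even though three more epochs of (worsening) data remain - exactly the behaviour the lesson describes.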
What is 'early stopping' in AI training?
Some modern training runs use a technique called learning rate scheduling, which gradually reduces how much the weights change with each step - like taking smaller and more careful steps as you approach the summit of a mountain.
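One simple schedule is exponential decay: multiply the learning rate by a fixed factor each epoch. The starting rate and decay factor below are illustrative, not values from any real training run:

```python
# Sketch of exponential learning rate decay.
initial_lr = 0.1
decay = 0.9           # shrink the learning rate by 10% each epoch

for epoch in range(5):
    lr = initial_lr * (decay ** epoch)
    print(f"epoch {epoch}: learning rate = {lr:.4f}")
# Steps shrink each epoch: 0.1000, 0.0900, 0.0810, 0.0729, 0.0656
```

Early on, large steps cover ground quickly; near the end, small steps avoid overshooting the best weights - the careful footing near the summit.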
In the final lesson, we will explore the ethical dimensions of AI - bias, fairness, privacy, and what responsible AI looks like.