You now know that neural networks learn by adjusting weights and biases. But how does the full training process actually work? How do you know when a model has learned enough - or too much? In this lesson, we will walk through the complete training journey.
Training an AI model follows a cycle that repeats over and over:

1. Predict - the model makes a guess based on its current weights.
2. Measure - a loss function scores how far the guess was from the correct answer.
3. Adjust - the weights and biases are nudged in the direction that reduces the error.
4. Repeat - with the next example.
This loop runs thousands or even millions of times. Each repetition nudges the model slightly closer to the right answers.
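The cycle above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real training system: it fits a single weight `w` so that `prediction = w * x`, using made-up data where the correct answer is `w = 2`.

```python
# Minimal sketch of the predict -> measure -> adjust loop.
# Data and learning rate are illustrative only.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs: target = 2 * x
w = 0.0             # start with an uninformed weight
learning_rate = 0.05

for step in range(200):                # the loop runs many times
    for x, target in data:
        prediction = w * x             # 1. predict
        error = prediction - target    # 2. measure how wrong we were
        gradient = 2 * error * x       # derivative of the squared error w.r.t. w
        w -= learning_rate * gradient  # 3. nudge the weight toward the answer

print(round(w, 3))  # w converges toward 2.0
```

Each repetition only changes `w` a little, but after a few hundred passes the weight settles on the right value - the same nudging, at vastly larger scale, is what happens inside a real neural network.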
Training GPT-4 reportedly cost over $100 million in computing power alone. The training loop ran across thousands of specialised chips for months.
After each prediction, we need a way to measure how wrong the model was. This measurement is called the loss (or cost), and the formula that calculates it is the loss function.
Think of it like a dartboard. The bullseye is the correct answer. The loss is the distance from where your dart landed to the bullseye. The goal of training is to minimise that distance over time.
Common loss functions include:

- Mean squared error - averages the squared distance between predictions and correct values; common when predicting numbers.
- Cross-entropy - measures how confident and correct the model's probability estimates are; common for classification tasks.
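Both of these can be computed by hand. The sketch below uses tiny made-up inputs purely to show the arithmetic:

```python
import math

def mean_squared_error(predictions, targets):
    # Average squared distance from the bullseye - for numeric predictions.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(predicted_probs, labels):
    # Penalises confident wrong answers heavily - for yes/no classification.
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(predicted_probs, labels)
    ) / len(labels)

print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))  # 0.25
print(binary_cross_entropy([0.9, 0.2], [1, 0]))     # small: both guesses were good
```

Notice that a perfect prediction gives a loss of zero in both cases - the dart landed on the bullseye.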
What does a loss function measure in AI training?
One complete pass through the entire training dataset is called an epoch. Training typically involves many epochs - the model sees the same data multiple times, getting slightly better each round.
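In code, an epoch is simply the outer loop over the whole dataset. A minimal sketch (the dataset here is a stand-in, and shuffling each pass is one common convention, not a requirement):

```python
import random

dataset = list(range(10))     # stand-in for ten training examples
num_epochs = 3

for epoch in range(num_epochs):
    random.shuffle(dataset)   # a fresh order each pass often helps learning
    for example in dataset:
        pass                  # predict, measure loss, adjust weights here
    print(f"epoch {epoch + 1}: saw all {len(dataset)} examples")
```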
Revising for an exam is like running epochs. The first read-through is confusing, but each review builds understanding. However, if you re-read the same notes a hundred times, you might memorise the exact wording without truly understanding the concepts. AI has the same problem.
Overfitting is one of the most common problems in AI training. It happens when the model learns the training data too well - including its noise and quirks - and then performs poorly on new, unseen data.
Imagine a student who memorises every past exam paper word for word. They score perfectly on old papers but struggle when the questions change even slightly. The student has not learned the subject - they have memorised the answers.
Signs of overfitting:

- Training accuracy keeps climbing while validation accuracy stalls or drops.
- Training loss keeps falling while validation loss starts rising.
- The model performs far better on data it has seen than on data it has not.
The goal of training is not to score perfectly on data the model has already seen. It is to perform well on data it has never seen before. That is the true test of learning.
The opposite problem is underfitting. This happens when the model has not learned enough from the data. It performs poorly on both training data and new data.
Causes of underfitting include:

- A model that is too simple to capture the patterns in the data.
- Too little training - stopping before the model has had a chance to learn.
- Poor or insufficient input features.
If overfitting is like memorising past papers, underfitting is like walking into the exam having barely opened the textbook.
A model scores 98% accuracy on training data but only 60% on new data. What is the most likely problem?
To detect overfitting and underfitting, we split our data into three parts:
| Set | Purpose | When Used |
|-----|---------|-----------|
| Training set | The model learns from this data | During training |
| Validation set | Used to check progress and tune settings | During training |
| Test set | Final evaluation on completely unseen data | After training |
A common split is 70% training, 15% validation, and 15% test. The model never sees the test set until the very end - it is the final exam.
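A 70/15/15 split can be done with a shuffle and two slices. The proportions come from the lesson; the dataset and the fixed random seed below are illustrative choices:

```python
import random

examples = list(range(1000))           # stand-in for a labelled dataset
random.seed(42)                        # fixed seed so the split is reproducible
random.shuffle(examples)               # shuffle first so each set is representative

n = len(examples)
train_end = int(n * 0.70)
val_end = train_end + int(n * 0.15)

train_set = examples[:train_end]       # the model learns from this
val_set = examples[train_end:val_end]  # checked during training
test_set = examples[val_end:]          # the final exam - untouched until the end

print(len(train_set), len(val_set), len(test_set))  # 700 150 150
```

Shuffling before slicing matters: if the data were sorted (say, by date or category), each set would contain a different kind of example and the evaluation would be misleading.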
The validation set is like a practice test you take between study sessions. It tells you how well you are learning without spoiling the real exam. If your practice-test scores start dropping while your scores on familiar revision questions keep rising, you know something is wrong.
Knowing when to stop is crucial. Train too little and the model underfits. Train too much and it overfits. The sweet spot is where validation loss stops improving.
A technique called early stopping automates this:

1. After each epoch, measure the loss on the validation set.
2. Keep track of the best validation loss seen so far.
3. If it has not improved for a set number of epochs (the patience), stop training and keep the best version of the model.
This prevents the model from going past the point of useful learning and slipping into memorisation.
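The mechanism is simple enough to show directly. The validation losses below are made up to illustrate the pattern - improving, then flattening out - and the patience value of 2 is an arbitrary choice:

```python
# Sketch of early stopping with a "patience" counter.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.54]

patience = 2                 # how many non-improving epochs to tolerate
best_loss = float("inf")
epochs_without_improvement = 0
stopped_at = None

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss = loss                  # remember the best model seen so far
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch            # stop before memorisation sets in
            break

print(stopped_at, best_loss)  # stops at epoch 5 with best loss 0.50
```

The run halts two epochs after the best score, even though three more epochs of (worsening) data remain - exactly the behaviour the lesson describes.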
What is 'early stopping' in AI training?
Some modern training runs use a technique called learning rate scheduling, which gradually reduces how much the weights change with each step - like taking smaller and more careful steps as you approach the summit of a mountain.
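One simple schedule is exponential decay: multiply the learning rate by a fixed factor each epoch. The starting rate and decay factor below are illustrative, not values from any real training run:

```python
# Sketch of exponential learning rate decay.
initial_lr = 0.1
decay = 0.9           # shrink the learning rate by 10% each epoch

for epoch in range(5):
    lr = initial_lr * (decay ** epoch)
    print(f"epoch {epoch}: learning rate = {lr:.4f}")
# Steps shrink each epoch: 0.1000, 0.0900, 0.0810, 0.0729, 0.0656
```

Early on, large steps cover ground quickly; near the end, small steps avoid overshooting the best weights - the careful footing near the summit.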
In the final lesson, we will explore the ethical dimensions of AI - bias, fairness, privacy, and what responsible AI looks like.