You talk to AI every day: asking Siri for the weather, chatting with a customer service bot, or prompting ChatGPT. But have you ever wondered how a machine understands your words?
The answer is Natural Language Processing (NLP), the branch of AI that gives machines the ability to read, understand, and generate human language. Let's peel back the layers.
Natural Language Processing sits at the intersection of linguistics, computer science, and AI. Its goal is deceptively simple: make computers understand language the way humans do.
Why is this hard? Because language is messy: the same sentence can shift meaning with context, tone, idiom, or culture.
There are roughly 7,000 languages spoken on Earth, each with unique grammar, idioms, and cultural nuances. Modern NLP models like GPT-4 can handle over 100 languages, but thousands remain unsupported.
NLP has gone through three major eras: hand-written rules, statistical methods that learn probabilities from data, and today's deep learning era built on neural networks.
The 2017 paper "Attention Is All You Need" by Google researchers introduced the Transformer architecture. It's the foundation of virtually every modern language AI, from ChatGPT to Google Translate to GitHub Copilot.
Before a machine can understand a sentence, it must break it into smaller units called tokens.
```python
# Simple word-level tokenisation
sentence = "AI is transforming healthcare!"
tokens = sentence.lower().split()
print(tokens)
# Output: ['ai', 'is', 'transforming', 'healthcare!']
```
```python
# Modern subword tokenisation (like GPT uses)
# "unhappiness" → ["un", "happiness"]
# "ChatGPT" → ["Chat", "G", "PT"]
# This handles words the model has never seen before!
```
Modern models use subword tokenisation: they break words into meaningful pieces. This is why a model can handle words it has never encountered: it recognises the sub-parts.
Think of it like Lego: even if you've never seen a specific castle, you understand the individual bricks.
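The subword idea can be sketched with a toy greedy longest-match tokeniser over a made-up vocabulary. Real models learn their vocabularies from data with algorithms such as byte-pair encoding (BPE) or WordPiece; this sketch only illustrates the principle.

```python
# Toy greedy longest-match subword tokeniser over a hypothetical vocabulary.
# Real vocabularies are learned from data (e.g. BPE); this one is hand-picked.
VOCAB = {"un", "happi", "ness", "happiness", "transform", "ing", "health", "care"}

def subword_tokenise(word):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # unknown character: fall back to it
            i += 1
    return pieces

print(subword_tokenise("unhappiness"))  # ['un', 'happiness']
```

Even a word the tokeniser has never seen as a whole gets covered by the bricks it does know.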
Machines don't understand words; they understand numbers. Embeddings convert each token into a list of numbers (a vector) that captures its meaning.
```python
# Conceptual example: word embeddings
# Each word becomes a vector in high-dimensional space
embeddings = {
    "king":  [0.8, 0.2, -0.5, 0.9],
    "queen": [0.7, 0.3, -0.5, 0.8],
    "man":   [0.9, 0.1, 0.4, 0.2],
    "woman": [0.8, 0.2, 0.4, 0.1],
}
# The magic: king - man + woman ≈ queen
# Similar meanings → nearby vectors in this number space
```
The beautiful property of embeddings: words with similar meanings are close together in this number space. "Happy" and "joyful" would be neighbours; "happy" and "earthquake" would be far apart.
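"Close together" can be measured with cosine similarity: vectors pointing in the same direction score near 1.0, unrelated ones near 0. A sketch using the toy vectors above (the numbers are invented; real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy 4-dimensional embeddings with made-up values
embeddings = {
    "king":  [0.8, 0.2, -0.5, 0.9],
    "queen": [0.7, 0.3, -0.5, 0.8],
    "man":   [0.9, 0.1, 0.4, 0.2],
}

def cosine_similarity(a, b):
    """Near 1.0 = similar direction (similar meaning); near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["man"]))    # lower, ~0.54
```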
If embeddings capture meaning as numbers, what happens when a word has multiple meanings? "Bank" (river bank vs. financial bank) needs different vectors depending on context. This is exactly what the next step, attention, solves.
The attention mechanism is the secret sauce of modern NLP. It lets the model figure out which words in a sentence are most relevant to each other.
Consider: "The cat sat on the mat because it was tired."
What does "it" refer to? The cat, the mat, or the act of sitting? You know it's the cat โ and attention helps the model figure that out too.
Imagine you're at a noisy party. You can "attend" to one conversation while filtering out others. Attention in AI works similarly:
```python
# Simplified attention intuition
sentence = ["The", "cat", "sat", "because", "it", "was", "tired"]

# When processing "it", attention scores might look like:
attention_for_it = {
    "The": 0.02,      # barely relevant
    "cat": 0.65,      # highly relevant: "it" refers to "cat"
    "sat": 0.10,      # somewhat relevant
    "because": 0.03,  # barely relevant
    "it": 0.05,       # self-reference
    "was": 0.05,      # grammar link
    "tired": 0.10,    # connected to "it"
}
# The model "attends" most to "cat", correctly linking the pronoun
```
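In a real model those weights aren't hand-written: they come from applying a softmax to raw relevance scores, which squashes any set of scores into weights that sum to 1. A minimal sketch (the raw scores here are invented; real attention derives them from learned query and key vectors):

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw relevance scores for "it" against each word
words  = ["The", "cat", "sat", "because", "it", "was", "tired"]
scores = [0.1, 3.0, 1.2, 0.2, 0.8, 0.8, 1.2]

weights = softmax(scores)
for word, weight in zip(words, weights):
    print(f"{word:>8}: {weight:.2f}")
# "cat" receives by far the largest weight
```

Because softmax exaggerates differences, a moderately higher raw score for "cat" becomes a dominant share of the attention.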
GPT-4 reportedly uses on the order of 100+ layers of attention (OpenAI hasn't published the exact architecture), each looking at the text from a different angle: grammar, meaning, tone, factual relationships. It's like having a team of experts each reading the same text and sharing notes.
A chatbot combines NLP components into a conversation system. Here's the core architecture:
"What's the weather like?" โ Intent: get_weather
"What's the weather like in London tomorrow?" โ Entities: city=London, date=tomorrow
Look up the answer and format a reply.
```python
# A simple rule-based chatbot skeleton
def simple_chatbot(user_message):
    """A basic chatbot using intent and entity extraction."""
    message = user_message.lower()

    # Intent recognition (simplified)
    if any(word in message for word in ["weather", "temperature", "rain"]):
        intent = "get_weather"
    elif any(word in message for word in ["hello", "hi", "hey"]):
        intent = "greeting"
    elif any(word in message for word in ["bye", "goodbye", "see you"]):
        intent = "farewell"
    else:
        intent = "unknown"

    # Entity extraction (simplified)
    cities = ["london", "paris", "tokyo", "hyderabad", "amsterdam"]
    detected_city = next((c for c in cities if c in message), None)

    # Response generation
    responses = {
        "greeting": "Hello! How can I help you today?",
        "farewell": "Goodbye! Have a great day!",
        "get_weather": f"Checking weather for {detected_city or 'your area'}...",
        "unknown": "I'm not sure I understand. Could you rephrase?",
    }
    return responses[intent]

# Try it out
print(simple_chatbot("Hi there!"))
print(simple_chatbot("What's the weather in Tokyo?"))
print(simple_chatbot("Tell me a joke"))
```
| Year | System | Approach | Capability |
|------|--------|----------|------------|
| 1966 | ELIZA | Pattern matching | Mimicked a therapist with simple text substitution |
| 1995 | ALICE | Rule-based (AIML) | More rules, still no real understanding |
| 2011 | Siri | Statistical NLP | Voice-activated assistant with web search |
| 2016 | Alexa / Google Assistant | Deep learning NLP | Better at context and follow-up questions |
| 2018 | BERT (Google) | Transformer (bidirectional) | Understood context in both directions |
| 2022 | ChatGPT (OpenAI) | Transformer (generative) | Generates fluent, contextual multi-turn conversation |
| 2024 | GPT-4, Claude, Gemini | Multimodal transformers | Handle text, images, code, and reasoning |
ELIZA (1966) fooled some users into thinking they were talking to a real therapist, not because it was smart, but because humans naturally project understanding onto conversational partners. This is called the ELIZA effect, and it still applies today when we overestimate what chatbots truly "understand."
Modern language models are impressive, but they have real limitations:
Models sometimes generate confident but completely false information. They don't "know" facts; they predict the most likely next word.
A language model has never seen, touched, or experienced the world. It only knows language patterns from text. It can describe a sunset beautifully without ever having "seen" one.
Models can only consider a fixed amount of text at once. Very long conversations may cause them to "forget" earlier context.
If training data contains biases (stereotypes, misinformation), the model reproduces and sometimes amplifies them.
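The context-window limitation above can be sketched in a few lines: the model only "sees" the most recent N tokens, so anything earlier simply falls out of view. (The window of 8 tokens here is a toy value; real models allow thousands to millions of tokens.)

```python
# Sketch of a context window: only the most recent N tokens are visible.
CONTEXT_WINDOW = 8  # toy value; real models allow thousands+ of tokens

conversation = ("my name is Ada . nice to meet you . "
                "what is the weather today ?").split()

visible = conversation[-CONTEXT_WINDOW:]
print(visible)
# The name "Ada", mentioned early on, is no longer inside the window,
# so the model has nothing left to "remember" it by.
```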
```python
# Illustrating the hallucination concept
def language_model_intuition(prompt):
    """
    A language model doesn't retrieve facts; it predicts probable words.

    Prompt: "The capital of France is ___"

    The model has seen "Paris" follow this pattern thousands of times,
    so it predicts "Paris", not because it 'knows' geography,
    but because it's the statistically likely next word.

    For rare or ambiguous prompts, it may confidently predict
    something entirely wrong. That's a hallucination.
    """
    pass  # Real models use billions of parameters for this prediction
```
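That "predict the likely next word" idea can be made concrete with a toy bigram model: count which word follows which in a tiny corpus, then always emit the most frequent continuation. This is a drastic simplification (real models use learned parameters, not a count table), but the spirit is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; note it even contains one wrong "fact"
corpus = ("the capital of france is paris . "
          "the capital of france is paris . "
          "the capital of france is lyon .").split()

# Count next-word frequencies for every word
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

print(predict_next("is"))  # 'paris': seen more often, not "known" to be true
```

The model outputs "paris" only because it is the most frequent continuation; had the corpus been skewed the other way, it would answer "lyon" with exactly the same confidence.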
When a chatbot says "I think" or "I believe," does it actually think or believe anything? How does knowing that it's predicting the next word change how you interpret its responses? Should chatbots be required to disclose that they're AI?
You now understand how machines process language. In the next lesson, we'll tackle computer vision: teaching machines to see. You'll learn how AI recognises faces, drives cars, and powers the AR filters on your phone!