You talk to AI every day: asking Siri for the weather, chatting with a customer service bot, or prompting ChatGPT. But have you ever wondered how a machine understands your words?
The answer is Natural Language Processing (NLP), the branch of AI that gives machines the ability to read, understand, and generate human language. Let's peel back the layers.
Natural Language Processing sits at the intersection of linguistics, computer science, and AI. Its goal is deceptively simple: make computers understand language the way humans do.
Why is this hard? Because language is messy: the same sentence can shift meaning with context, tone, idiom, or culture.
There are roughly 7,000 languages spoken on Earth, each with unique grammar, idioms, and cultural nuances. Modern NLP models like GPT-4 can handle over 100 languages, but thousands remain unsupported.
NLP has gone through three major eras: hand-written rules, statistical methods that learn probabilities from data, and today's deep learning era built on neural networks.
The 2017 paper "Attention Is All You Need" by Google researchers introduced the Transformer architecture. It's the foundation of virtually every modern language AI, from ChatGPT to Google Translate to GitHub Copilot.
Before a machine can understand a sentence, it must break it into smaller units called tokens.
```python
# Simple word-level tokenisation
sentence = "AI is transforming healthcare!"
tokens = sentence.lower().split()
print(tokens)
# Output: ['ai', 'is', 'transforming', 'healthcare!']
```
```python
# Modern subword tokenisation (like GPT uses)
# "unhappiness" → ["un", "happiness"]
# "ChatGPT" → ["Chat", "G", "PT"]
# This handles words the model has never seen before!
```
Modern models use subword tokenisation: they break words into meaningful pieces. This is why a model can handle words it has never encountered: it recognises the sub-parts.
Think of it like Lego: even if you've never seen a specific castle, you understand the individual bricks.
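The subword idea can be sketched with a toy greedy longest-match tokeniser over a made-up vocabulary. Real models learn their vocabularies from data with algorithms such as byte-pair encoding (BPE) or WordPiece; this sketch only illustrates the principle.

```python
# Toy greedy longest-match subword tokeniser over a hypothetical vocabulary.
# Real vocabularies are learned from data (e.g. BPE); this one is hand-picked.
VOCAB = {"un", "happi", "ness", "happiness", "transform", "ing", "health", "care"}

def subword_tokenise(word):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # unknown character: fall back to it
            i += 1
    return pieces

print(subword_tokenise("unhappiness"))  # ['un', 'happiness']
```

Even a word the tokeniser has never seen as a whole gets covered by the bricks it does know.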
Machines don't understand words; they understand numbers. Embeddings convert each token into a list of numbers (a vector) that captures its meaning.
```python
# Conceptual example: word embeddings
# Each word becomes a vector in high-dimensional space
embeddings = {
    "king":  [0.8, 0.2, -0.5, 0.9],
    "queen": [0.7, 0.3, -0.5, 0.8],
    "man":   [0.9, 0.1, 0.4, 0.2],
    "woman": [0.8, 0.2, 0.4, 0.1],
}
# The magic: king - man + woman ≈ queen
# Similar meanings → nearby vectors in this number space
```
The beautiful property of embeddings: words with similar meanings are close together in this number space. "Happy" and "joyful" would be neighbours; "happy" and "earthquake" would be far apart.
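"Close together" can be measured with cosine similarity: vectors pointing in the same direction score near 1.0, unrelated ones near 0. A sketch using the toy vectors above (the numbers are invented; real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy 4-dimensional embeddings with made-up values
embeddings = {
    "king":  [0.8, 0.2, -0.5, 0.9],
    "queen": [0.7, 0.3, -0.5, 0.8],
    "man":   [0.9, 0.1, 0.4, 0.2],
}

def cosine_similarity(a, b):
    """Near 1.0 = similar direction (similar meaning); near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["man"]))    # lower, ~0.54
```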
If embeddings capture meaning as numbers, what happens when a word has multiple meanings? "Bank" (river bank vs. financial bank) needs different vectors depending on context. This is exactly what the next step, attention, solves.
The attention mechanism is the secret sauce of modern NLP. It lets the model figure out which words in a sentence are most relevant to each other.
Consider: "The cat sat on the mat because it was tired."
What does "it" refer to? The cat, the mat, or the act of sitting? You know it's the cat โ and attention helps the model figure that out too.
Imagine you're at a noisy party. You can "attend" to one conversation while filtering out others. Attention in AI works similarly:
```python
# Simplified attention intuition
sentence = ["The", "cat", "sat", "because", "it", "was", "tired"]

# When processing "it", attention scores might look like:
attention_for_it = {
    "The": 0.02,      # barely relevant
    "cat": 0.65,      # highly relevant: "it" refers to "cat"
    "sat": 0.10,      # somewhat relevant
    "because": 0.03,  # barely relevant
    "it": 0.05,       # self-reference
    "was": 0.05,      # grammar link
    "tired": 0.10,    # connected to "it"
}
# The model "attends" most to "cat", correctly linking the pronoun
```
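In a real model those weights aren't hand-written: they come from applying a softmax to raw relevance scores, which squashes any set of scores into weights that sum to 1. A minimal sketch (the raw scores here are invented; real attention derives them from learned query and key vectors):

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw relevance scores for "it" against each word
words  = ["The", "cat", "sat", "because", "it", "was", "tired"]
scores = [0.1, 3.0, 1.2, 0.2, 0.8, 0.8, 1.2]

weights = softmax(scores)
for word, weight in zip(words, weights):
    print(f"{word:>8}: {weight:.2f}")
# "cat" receives by far the largest weight
```

Because softmax exaggerates differences, a moderately higher raw score for "cat" becomes a dominant share of the attention.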
GPT-4 reportedly uses on the order of 100+ layers of attention (OpenAI hasn't published the exact architecture), each looking at the text from a different angle: grammar, meaning, tone, factual relationships. It's like having a team of experts each reading the same text and sharing notes.
A chatbot combines NLP components into a conversation system. Here's the core architecture:
"What's the weather like?" โ Intent: get_weather
"What's the weather like in London tomorrow?" โ Entities: city=London, date=tomorrow
Look up the answer and format a reply.
```python
# A simple rule-based chatbot skeleton
def simple_chatbot(user_message):
    """A basic chatbot using intent and entity extraction."""
    message = user_message.lower()

    # Intent recognition (simplified)
    if any(word in message for word in ["weather", "temperature", "rain"]):
        intent = "get_weather"
    elif any(word in message for word in ["hello", "hi", "hey"]):
        intent = "greeting"
    elif any(word in message for word in ["bye", "goodbye", "see you"]):
        intent = "farewell"
    else:
        intent = "unknown"

    # Entity extraction (simplified)
    cities = ["london", "paris", "tokyo", "hyderabad", "amsterdam"]
    detected_city = next((c for c in cities if c in message), None)

    # Response generation
    responses = {
        "greeting": "Hello! How can I help you today?",
        "farewell": "Goodbye! Have a great day!",
        "get_weather": f"Checking weather for {detected_city or 'your area'}...",
        "unknown": "I'm not sure I understand. Could you rephrase?",
    }
    return responses[intent]

# Try it out
print(simple_chatbot("Hi there!"))
print(simple_chatbot("What's the weather in Tokyo?"))
print(simple_chatbot("Tell me a joke"))
```
| Year | System | Approach | Capability |
|------|--------|----------|------------|
| 1966 | ELIZA | Pattern matching | Mimicked a therapist with simple text substitution |
| 1995 | ALICE | Rule-based (AIML) | More rules, still no real understanding |
| 2011 | Siri | Statistical NLP | Voice-activated assistant with web search |
| 2016 | Alexa / Google Assistant | Deep learning NLP | Better at context and follow-up questions |
| 2018 | BERT (Google) | Transformer (bidirectional) | Understood context in both directions |
| 2022 | ChatGPT (OpenAI) | Transformer (generative) | Generates fluent, contextual multi-turn conversation |
| 2024 | GPT-4, Claude, Gemini | Multimodal transformers | Handle text, images, code, and reasoning |
ELIZA (1966) fooled some users into thinking they were talking to a real therapist, not because it was smart, but because humans naturally project understanding onto conversational partners. This is called the ELIZA effect, and it still applies today when we overestimate what chatbots truly "understand."
Modern language models are impressive, but they have real limitations:
Models sometimes generate confident but completely false information. They don't "know" facts; they predict the most likely next word.
A language model has never seen, touched, or experienced the world. It only knows language patterns from text. It can describe a sunset beautifully without ever having "seen" one.
Models can only consider a fixed amount of text at once. Very long conversations may cause them to "forget" earlier context.
If training data contains biases (stereotypes, misinformation), the model reproduces and sometimes amplifies them.
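The context-window limitation above can be sketched in a few lines: the model only "sees" the most recent N tokens, so anything earlier simply falls out of view. (The window of 8 tokens here is a toy value; real models allow thousands to millions of tokens.)

```python
# Sketch of a context window: only the most recent N tokens are visible.
CONTEXT_WINDOW = 8  # toy value; real models allow thousands+ of tokens

conversation = ("my name is Ada . nice to meet you . "
                "what is the weather today ?").split()

visible = conversation[-CONTEXT_WINDOW:]
print(visible)
# The name "Ada", mentioned early on, is no longer inside the window,
# so the model has nothing left to "remember" it by.
```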
```python
# Illustrating the hallucination concept
def language_model_intuition(prompt):
    """
    A language model doesn't retrieve facts; it predicts probable words.

    Prompt: "The capital of France is ___"

    The model has seen "Paris" follow this pattern thousands of times,
    so it predicts "Paris", not because it 'knows' geography,
    but because it's the statistically likely next word.

    For rare or ambiguous prompts, it may confidently predict
    something entirely wrong. That's a hallucination.
    """
    pass  # Real models use billions of parameters for this prediction
```
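That "predict the likely next word" idea can be made concrete with a toy bigram model: count which word follows which in a tiny corpus, then always emit the most frequent continuation. This is a drastic simplification (real models use learned parameters, not a count table), but the spirit is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; note it even contains one wrong "fact"
corpus = ("the capital of france is paris . "
          "the capital of france is paris . "
          "the capital of france is lyon .").split()

# Count next-word frequencies for every word
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

print(predict_next("is"))  # 'paris': seen more often, not "known" to be true
```

The model outputs "paris" only because it is the most frequent continuation; had the corpus been skewed the other way, it would answer "lyon" with exactly the same confidence.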
When a chatbot says "I think" or "I believe," does it actually think or believe anything? How does knowing that it's predicting the next word change how you interpret its responses? Should chatbots be required to disclose that they're AI?
You now understand how machines process language. In the next lesson, we'll tackle computer vision: teaching machines to see. You'll learn how AI recognises faces, drives cars, and powers the AR filters on your phone!