This is a capstone case study: you'll combine everything from AI Sketch through AI Polish to design a complete AI-powered social media feed.
Design a system that serves personalised tweets to 500M+ users in real time, ranking content by relevance, recency, and engagement potential.
What makes this hard: the scale (500M+ users), the real-time latency budget, and the fact that no single discipline covers it.
This is the exact type of question asked in senior+ interviews at Meta, Google, Twitter/X, and LinkedIn. It tests DSA, system design, AND ML knowledge simultaneously.
```
User       { id, name, followers[], following[], interests[] }
Tweet      { id, authorId, content, media[], timestamp, metrics }
Engagement { userId, tweetId, type: view|like|retweet|reply, timestamp }
```
| Structure | Purpose | Why This One? |
|-----------|---------|---------------|
| Hash Map | User → profile lookup | O(1) access by userId |
| Sorted Set | Timeline ranking | O(log n) insert + range queries |
| Bloom Filter | "Already seen" dedup | Space-efficient set membership |
| Inverted Index | Content search | Maps tokens → tweet IDs |
| Graph (adjacency list) | Social connections | Follow/follower relationships |
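To make the sorted-set choice concrete, here is a minimal Python sketch of the timeline structure, using the standard-library `bisect` module as a stand-in for a Redis sorted set (the `TimelineSortedSet` name and its methods are illustrative, not any real API):

```python
import bisect

class TimelineSortedSet:
    """Minimal stand-in for a Redis sorted set: entries kept ordered by score."""

    def __init__(self):
        self._entries = []  # list of (score, tweet_id), kept sorted by score

    def add(self, tweet_id, score):
        # Finds the insertion point in O(log n); the list shift itself is O(n).
        bisect.insort(self._entries, (score, tweet_id))

    def latest(self, n):
        """Return the n highest-score (most recent) tweet IDs, newest first."""
        return [tid for _, tid in reversed(self._entries[-n:])]

timeline = TimelineSortedSet()
timeline.add("t1", 100)
timeline.add("t3", 300)
timeline.add("t2", 200)
# timeline.latest(2) -> ["t3", "t2"]
```

Note the honest caveat in the comment: `bisect.insort` only locates the slot in O(log n); a production sorted set (Redis uses a skip list) achieves O(log n) for the whole insert.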
Why use a Bloom Filter instead of a regular HashSet for "already seen" detection? With 500M users each seeing 100+ tweets/day, a HashSet would consume terabytes. A Bloom Filter uses ~1% of that memory with an acceptable false positive rate (~1%).
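A toy Bloom filter can be sketched in Python with stdlib hashing only; the sizes and hash count below are illustrative, whereas a production filter derives the bit-array size from the expected item count and target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Space-efficient 'probably seen' set: false positives possible,
    false negatives impossible."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k independent bit positions from salted hashes of the item
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

seen = BloomFilter()
seen.add("tweet:42")
# seen.might_contain("tweet:42") -> True (always, once added)
# seen.might_contain("tweet:99") -> usually False; a small fraction of
#                                   unseen IDs will collide and read True
```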
```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│    Tweet     │────▶│   Fan-out    │────▶│  Timeline Cache  │
│  Ingestion   │     │   Service    │     │  (Redis Sorted   │
└──────────────┘     └──────────────┘     │      Sets)       │
                                          └────────┬─────────┘
                                                   │
┌──────────────┐     ┌──────────────┐              ▼
│     User     │────▶│   Ranking    │     ┌──────────────────┐
│   Request    │     │   Service    │────▶│  Feed Response   │
└──────────────┘     └──────┬───────┘     └──────────────────┘
                            │
                     ┌──────┴──────┐
                     │  ML Model   │
                     │   Service   │
                     └─────────────┘
```
Fan-out on write for users with fewer than 10K followers: push each new tweet into every follower's cached timeline at post time.
Fan-out on read for celebrities (more than 10K followers): store the tweet once in a celebrity pool and merge it into followers' feeds at request time.
```
function handleNewTweet(tweet, author):
    if author.followerCount < CELEBRITY_THRESHOLD:
        // Fan-out on write: push to every follower's cached timeline
        for follower in author.followers:
            timelineCache.add(follower.id, tweet, score=tweet.timestamp)
    else:
        // Fan-out on read: store in the celebrity tweets pool
        celebrityTweets.add(author.id, tweet)
```
The hybrid approach is what Twitter actually uses. Pure fan-out-on-write doesn't scale for accounts with millions of followers (one tweet = millions of cache writes).
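Under the hybrid scheme, the read path has to merge two sources: the user's precomputed timeline and the celebrity pool for accounts they follow. A Python sketch, assuming both sources store `(timestamp, tweet_id)` pairs newest-first (the function and field names are illustrative):

```python
import heapq

def assemble_timeline(user, timeline_cache, celebrity_tweets, limit=50):
    """Merge the precomputed timeline with on-read celebrity tweets.

    timeline_cache: user_id -> [(timestamp, tweet_id), ...] sorted descending
    celebrity_tweets: author_id -> [(timestamp, tweet_id), ...] sorted descending
    """
    sources = [timeline_cache.get(user["id"], [])]
    for author_id in user["following"]:
        if author_id in celebrity_tweets:  # only celebrities live in this pool
            sources.append(celebrity_tweets[author_id])
    # k-way merge of already-sorted streams; negated key keeps newest first
    merged = heapq.merge(*sources, key=lambda e: -e[0])
    return [tweet_id for _, tweet_id in list(merged)[:limit]]

timeline_cache = {"u1": [(300, "t3"), (100, "t1")]}
celebrity_tweets = {"celeb": [(250, "c2"), (50, "c1")]}
user = {"id": "u1", "following": ["celeb", "friend42"]}
# assemble_timeline(user, timeline_cache, celebrity_tweets)
#   -> ["t3", "c2", "t1", "c1"]
```

`heapq.merge` is a good fit here because each source is already sorted, so the merge is O(total log k) rather than a full re-sort.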
The ranking model predicts: P(user engages with tweet)
| Feature Category | Examples |
|------------------|----------|
| User features | interests, past engagement patterns, active hours |
| Tweet features | age, media type, length, hashtags, author popularity |
| Cross features | user-author interaction history, topic overlap |
| Context features | time of day, device type, connection speed |
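One way the four categories might be flattened into a single feature dict for the model; the field names (`author_followers`, `hashtags`) and the `interactions` lookup are assumptions for illustration:

```python
import time

def build_features(user, tweet, interactions, now=None):
    """Flatten the four feature categories into one dict for the ranker.

    `interactions` is an assumed (user_id, author_id) -> count lookup.
    """
    if now is None:
        now = time.time()
    return {
        # User features
        "user_interest_count": len(user["interests"]),
        # Tweet features
        "tweet_age_hours": (now - tweet["timestamp"]) / 3600,
        "has_media": bool(tweet["media"]),
        "author_followers": tweet["author_followers"],
        # Cross features
        "past_interactions": interactions.get((user["id"], tweet["authorId"]), 0),
        "topic_overlap": len(set(user["interests"]) & set(tweet["hashtags"])),
        # Context features
        "hour_of_day": time.localtime(now).tm_hour,
    }
```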
Stage 1 - Candidate Generation (fast, broad): a lightweight model cheaply scores every candidate pulled from the timeline cache.
Stage 2 - Fine Ranking (slow, precise): a deep model re-scores only the top ~200 survivors.
```
function rankFeed(userId, candidates):
    // Stage 1: Coarse ranking with the lightweight model
    scored = candidates.map(tweet =>
        ({tweet, score: lightweightModel.predict(userId, tweet)}))
    top200 = scored.sortBy(s => -s.score).slice(0, 200)

    // Stage 2: Fine ranking with the deep model
    reranked = deepModel.batchPredict(userId, top200)

    // Stage 3: Business rules (diversity, freshness boost)
    return applyBusinessRules(reranked)
```
The ranking model alone would show only viral content. Apply post-ranking rules such as a per-author diversity cap and a freshness boost for very recent tweets.
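A sketch of what such a post-ranking pass could look like, assuming each ranked item carries `author_id`, `score`, and `timestamp` (the cap, boost, and window values are illustrative):

```python
def apply_business_rules(ranked, now, max_per_author=2,
                         freshness_boost=0.1, fresh_window=100):
    """Post-ranking pass: boost fresh tweets, then cap tweets per author.

    `ranked` items are dicts: {"tweet_id", "author_id", "score", "timestamp"}.
    """
    # Freshness boost: anything newer than `fresh_window` gets a score bump
    for item in ranked:
        if now - item["timestamp"] < fresh_window:
            item["score"] += freshness_boost
    ranked = sorted(ranked, key=lambda i: -i["score"])

    # Author diversity: keep at most max_per_author tweets per account
    per_author = {}
    feed = []
    for item in ranked:
        a = item["author_id"]
        if per_author.get(a, 0) < max_per_author:
            feed.append(item)
            per_author[a] = per_author.get(a, 0) + 1
    return feed

ranked = [
    {"tweet_id": "t1", "author_id": "a", "score": 0.9, "timestamp": 500},
    {"tweet_id": "t2", "author_id": "a", "score": 0.8, "timestamp": 500},
    {"tweet_id": "t3", "author_id": "a", "score": 0.7, "timestamp": 500},
    {"tweet_id": "t4", "author_id": "b", "score": 0.5, "timestamp": 950},
]
feed = apply_business_rules(ranked, now=1000)
# [i["tweet_id"] for i in feed] -> ["t1", "t2", "t4"]
#   (t3 hits the per-author cap; t4 is fresh, so it gets boosted and kept)
```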
New users have no engagement history (the cold-start problem). Common mitigations include asking for interests during onboarding, falling back to popularity-based ranking, and borrowing signals from similar user cohorts until enough history accumulates.
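One possible cold-start fallback, sketched with an assumed engagement-count threshold: new users get popularity-ranked content filtered by their onboarding interests, and the personalised model takes over once history accumulates (all names and fields here are illustrative):

```python
def rank_for_user(user, candidates, model, popularity, min_history=20):
    """Personalise only once enough engagement history exists.

    `model(user, tweet)` is the learned scorer; `popularity` maps a tweet id
    to a global engagement count.
    """
    if user["engagement_count"] < min_history:
        # Cold start: filter by onboarding interests, rank by global popularity;
        # if nothing matches the interests, fall back to all candidates
        pool = [t for t in candidates
                if set(t["hashtags"]) & set(user["interests"])] or candidates
        return sorted(pool, key=lambda t: -popularity[t["id"]])
    return sorted(candidates, key=lambda t: -model(user, t))

candidates = [{"id": "t1", "hashtags": ["ai"]},
              {"id": "t2", "hashtags": ["sports"]}]
popularity = {"t1": 10, "t2": 50}
new_user = {"engagement_count": 0, "interests": ["ai"]}
# rank_for_user(new_user, candidates, None, popularity)
#   -> only "t1" survives the interest filter
```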
```
// WebSocket connection for live updates
connection.onTweet(tweet => {
  if (isHighPriority(tweet)) {      // breaking news, close friend
    injectToFeed(tweet, { position: "top" });
  } else {
    showNewTweetsButton(++count);
  }
});
```
In a real interview, expect follow-up questions that push on the trade-offs above.
| Skill Area | Applied Here |
|------------|--------------|
| DSA (AI Sketch) | Hash maps, sorted sets, Bloom filters, graphs |
| Patterns (AI Chisel) | Two pointers (sliding window for time ranges), BFS (social graph) |
| System Design (AI Craft) | Distributed architecture, caching, fan-out strategies |
| Behavioural (AI Polish) | Trade-off discussion, incident handling, ethical considerations |
This is what a senior engineer's answer looks like: it connects algorithms to architecture to real-world constraints.