System Design Interview Preparation

System Design Decision Flow

1-  Clarify Requirements

   • Functional: What features are needed?

     – Example: Store, retrieve, update user profiles; search by email, etc.

   • Non-functional: Scale, performance, reliability?

     – Example: 100M users, 1000 RPS, 99.99% availability
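
The example targets can be turned into rough budgets before any design work starts. The sketch below is back-of-envelope arithmetic only; it reads 99.99 as an availability percentage, and the 1 KB average profile size is an assumption made for illustration, not a figure from this post.

```python
# Back-of-envelope estimates for the example non-functional targets.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def requests_per_day(rps: int) -> int:
    """Total requests per day at a steady request rate."""
    return rps * SECONDS_PER_DAY


def storage_gb(users: int, bytes_per_user: int) -> float:
    """Raw storage for one record per user, ignoring indexes and replicas."""
    return users * bytes_per_user / 1e9


downtime = allowed_downtime_minutes_per_year(99.99)  # about 52.56 minutes
daily_requests = requests_per_day(1000)              # 86,400,000 requests
profile_storage = storage_gb(100_000_000, 1_000)     # 100 GB at an assumed 1 KB per profile
```

Numbers like these help decide early whether a single database node is plausible or whether replication and partitioning need to be on the table.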

2-  High-Level Components

2.1- Client 

       The entry point for user interaction via web or mobile interfaces. Clients send HTTP or API requests to the backend through gateways or load balancers.

2.2- CDN (Content Delivery Network)

        A CDN is a globally distributed network of edge servers that caches and serves static content (like images, CSS, JS, fonts, and videos) from locations geographically closer to users. This reduces latency, offloads traffic from your origin server, and improves content delivery speed. Use when:

• You serve static assets to a global audience.

• You want to reduce load on your application or storage servers.

• You aim to lower latency and improve performance of media-heavy applications.

2.3- Load Balancer

        Distributes incoming network traffic across multiple backend servers to ensure high availability, reliability, and scalability. It can support health checks and failover mechanisms. Use when: You have 2 or more backend servers and need to balance load or prevent downtime.
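
As a rough illustration of the idea, the sketch below rotates requests across backends and skips any that a health check has marked down. The class and method names are hypothetical; real load balancers (NGINX, HAProxy, cloud load balancers) offer more algorithms and run the health checks themselves.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin selection over backends, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # Called when a health check fails.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a health check succeeds again.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```

With backends `a`, `b`, and `c`, marking `b` down means subsequent requests alternate between the remaining two until `b` is marked up again.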

2.4- Application Servers (Stateless Business Logic)

These servers process incoming requests, apply business rules, interact with databases or other services, and return responses. They should be designed to be stateless, meaning they don’t retain session-specific data between requests. This allows:

• Horizontal scaling: new instances can be added or removed without coordination.

• Fault tolerance: failed instances don’t impact state.

• Load balancing: any instance can serve any request.

Session data (if needed) should be stored in external systems like Redis or databases.

2.5- Authentication Module

Handles user identity, login, and session management using JWTs, OAuth tokens, or session cookies. Use when: Your system requires secure user access and permission enforcement.
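
To make the token idea concrete, here is a minimal sketch of a signed, expiring token built with the standard library. It is JWT-like but not a real JWT (no header, no standard claims validation), and `SECRET`, `issue_token`, and `verify_token` are hypothetical names. In practice a maintained JWT or OAuth library is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # assumption: in practice this comes from a secret manager


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_token(user_id: str, ttl_seconds: int, now=None) -> str:
    """Return 'payload.signature' where the payload carries user id and expiry."""
    now = time.time() if now is None else now
    payload = _b64(json.dumps({"sub": user_id, "exp": now + ttl_seconds}).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now=None):
    """Return the user id if the signature is valid and the token is unexpired."""
    now = time.time() if now is None else now
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    claims = json.loads(_unb64(payload))
    if claims["exp"] < now:
        return None
    return claims["sub"]
```

Because the server only needs the secret to verify a token, any stateless application server can authenticate a request without a shared session lookup.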

2.6- Rate Limiter

Controls the rate at which users can make requests, helping to prevent abuse, spam, or denial-of-service (DoS) attacks. Use when: You offer public APIs, login forms, or any endpoints vulnerable to brute-force attacks.
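
One common way to implement this is a token bucket, sketched below for a single process. The class name and parameters are illustrative assumptions; a production limiter usually keeps one bucket per user or API key in a shared store such as Redis so that all application servers see the same counts.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429
```

The `now` parameter exists only to make the behaviour easy to test; normal callers can omit it.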

2.7- Cache Layer (Redis/Memcached)

In-memory key-value stores used to cache frequently accessed data like user profiles or session tokens. Use when:

• Read-to-write ratio is high (e.g., 5:1 or more).

• You want to reduce latency or database load.
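
A typical way to use such a cache is the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses an in-process dictionary as a stand-in for Redis or Memcached; `TTLCache`, `get_profile`, and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis/Memcached with per-entry expiry."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now + self.ttl)


def get_profile(user_id, cache, load_from_db, now=None):
    """Cache-aside read: serve from cache, otherwise load and populate."""
    cached = cache.get(user_id, now)
    if cached is not None:
        return cached
    profile = load_from_db(user_id)
    if profile is not None:
        cache.set(user_id, profile, now)
    return profile
```

Writes need a matching policy (for example, deleting the cached key after a database update) so that readers do not see stale profiles for longer than the TTL.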

2.8- Database (SQL/NoSQL)

Stores persistent structured or semi-structured data.

• SQL (e.g., PostgreSQL, MySQL): Use for strong consistency, transactional guarantees, and structured schemas.

• NoSQL (e.g., MongoDB, Cassandra): Use for flexible schemas, large-scale writes, or horizontal sharding.

2.9- Search Engine (Elasticsearch)

A distributed search and analytics engine designed for fast, full-text search and filtering. Use when:

• You need fuzzy matching, autocomplete, or ranked search results.

• You want analytics over logs or semi-structured documents.

• SQL-based querying is too slow or inflexible for your search needs.

Elasticsearch indexes documents and provides powerful query DSLs for scoring, filtering, and faceting.
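
The core data structure behind full-text engines is an inverted index, which maps each term to the documents that contain it. The toy sketch below shows only that idea, with whitespace tokenization and AND semantics as simplifying assumptions; it has no scoring, stemming, or fuzzy matching, all of which Elasticsearch provides.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    1: "Senior backend engineer in Berlin",
    2: "Frontend engineer and designer",
    3: "Backend developer who likes databases",
}
index = build_index(docs)
matches = search(index, "backend engineer")  # only document 1 has both terms
```

Lookups touch only the posting sets for the query terms, which is why this structure stays fast where a `LIKE '%term%'` scan in SQL does not.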

2.10- Message Queue (Kafka, RabbitMQ)

        Used for asynchronous communication between services. Improves resilience and throughput by decoupling producers and consumers. Use when: You need background job processing, event-driven architecture, or real-time data pipelines.
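
The decoupling can be illustrated in a single process with the standard library's `queue.Queue` standing in for Kafka or RabbitMQ. This is a sketch of the producer/consumer shape only; `run_pipeline` and `handle` are hypothetical names, and a real broker adds durability, acknowledgements, and delivery across machines.

```python
import queue
import threading


def run_pipeline(events, handle):
    """Publish events to a queue and process them on a background worker."""
    jobs = queue.Queue()
    results = []

    def worker():
        while True:
            event = jobs.get()
            if event is None:  # sentinel: no more work
                jobs.task_done()
                break
            results.append(handle(event))
            jobs.task_done()

    consumer = threading.Thread(target=worker)
    consumer.start()

    # The producer only enqueues; it never waits for an individual job.
    for event in events:
        jobs.put(event)
    jobs.put(None)

    jobs.join()      # wait until every queued item has been processed
    consumer.join()
    return results
```

With a single worker the results come back in publish order; adding more workers raises throughput but gives up that ordering guarantee.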

2.11- Object Storage (S3, GCS)

        S3 (Amazon Simple Storage Service) is a cloud-based object storage service for unstructured binary data such as images, videos, logs, and backups. It stores data as objects in buckets, each identified by a unique key. Use when:

• You need to store large files without database overhead.

• You require scalable, durable, and low-cost media storage.

• You want to integrate with CDNs for faster asset delivery.

2.12- Monitoring and Alerts

        Includes metrics tracking (Prometheus), visualization (Grafana), log aggregation (ELK Stack), and error monitoring (Sentry). Use when: You’re deploying to production and need visibility into system health, latency, and failure rates.

2.13- Admin Dashboard

        Internal-facing tool for operators to monitor users, system metrics, logs, and debug issues. Use when: Your operations or support team needs visibility and control over the system.

3-  Database Design

• Define schema and access patterns

• Example schema: Users ( id , username , email , bio , profile_pic_url , created_at )

• Index for lookup fields (user ID, email)
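
The example schema can be tried out with SQLite from Python's standard library. In this sketch the column types, the `NOT NULL` and uniqueness constraints, and the index name `idx_users_email` are assumptions chosen for illustration; the primary key already covers lookups by user ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        username        TEXT NOT NULL UNIQUE,
        email           TEXT NOT NULL,
        bio             TEXT,
        profile_pic_url TEXT,
        created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)
# Secondary index so that search-by-email does not scan the whole table.
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")

conn.execute(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    ("ada", "ada@example.com"),
)
row = conn.execute(
    "SELECT id, username FROM users WHERE email = ?", ("ada@example.com",)
).fetchone()

# The query plan shows whether the index is actually used for the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, username FROM users WHERE email = ?",
    ("ada@example.com",),
).fetchall()
```

Checking the plan for each access pattern listed in the requirements is a quick way to confirm that every frequent lookup is backed by an index.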

4-  Choose Storage Engine

• Relational DB: Consistent reads/writes (PostgreSQL, MySQL)

• NoSQL: Flexible schema, high write throughput (MongoDB, Cassandra)

• Search/analytics: Elasticsearch, OLAP

5-  Partitioning Strategy

Decision Flow:

• If data <1M rows: Skip partitioning

• Else:

– Use horizontal partitioning (sharding)

– Choose strategy:

∗ Hash-based: Even load, no range queries

∗ Range-based: Good for time/region data

∗ Directory-based: Hot key isolation, max control

– Map logical partitions to physical nodes
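
The three strategies differ mainly in how a key is turned into a shard number. The functions below are a minimal sketch with hypothetical names. The hash variant uses SHA-256 so that results are stable across processes, unlike Python's built-in `hash()` for strings.

```python
import bisect
import hashlib


def hash_partition(key: str, num_shards: int) -> int:
    """Spread keys evenly; neighbouring keys land on unrelated shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_partition(key, upper_bounds) -> int:
    """`upper_bounds` is sorted; shard i holds keys below upper_bounds[i]."""
    return bisect.bisect_right(upper_bounds, key)


class DirectoryPartitioner:
    """Explicit key -> shard table with a fallback rule for everything else."""

    def __init__(self, default_shard_for):
        self.overrides = {}
        self.default_shard_for = default_shard_for

    def assign(self, key, shard):
        # For example, move a hot key onto its own dedicated shard.
        self.overrides[key] = shard

    def shard_for(self, key):
        return self.overrides.get(key, self.default_shard_for(key))


# Range example: ISO dates sort lexicographically, so string bounds work.
bounds = ["2024-01-01", "2024-07-01"]
shard = range_partition("2024-03-15", bounds)  # falls in shard 1

directory = DirectoryPartitioner(lambda key: hash_partition(key, 4))
directory.assign("celebrity_user", 99)  # hot key isolated on shard 99
```

Note that plain modulo hashing remaps most keys when `num_shards` changes; consistent hashing is the usual way around that and is outside this sketch.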

6-  Add Cache Layer

• Use Redis/Memcached for frequent lookups and rate limiting

• If read/write ratio >5:1, caching is valuable

7- Add Search Support

• Full-text or fuzzy matching? Use Elasticsearch or Algolia

• If not, index email/username in DB

8- Media Storage and CDN

• Store images/media on object storage (e.g., S3)

• Use CDN (e.g., CloudFront) for fast delivery

9- Fault Tolerance

• DB replication

• Load balancing across zones

• Retry queues for async processing
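
Retries for async work are usually paired with exponential backoff so that a struggling dependency is not hammered. The helper below is a small sketch of that policy; `retry` and its parameters are hypothetical. A retry queue applies the same idea by re-enqueueing a failed job with a delay instead of sleeping in-process.

```python
import time


def retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up; a queue would route the job to a dead-letter queue
            sleep(base_delay * 2 ** (attempt - 1))
```

Retried operations should be idempotent, since a request that timed out may still have succeeded on the other side.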

10- Monitoring and Alerts

• Prometheus + Grafana or ELK stack

• Alert on: high latency, DB lag, cache misses

11- Scaling Strategy

• If the cache miss rate is high: Increase TTL or cache more keys

• If DB read load is the bottleneck: Add read replicas

• If DB write load is the bottleneck: Shard or batch writes

• If storage is the bottleneck: Use object store or compression

• If network is the bottleneck: CDN, compression, batching

12- Conditional Pseudocode Summary

IF user count < 1M AND RPS < 100 -> single-node DB
ELSE:
    IF consistent writes -> use RDBMS
    ELSE -> use NoSQL
    IF data > machine size:
        IF write-heavy or hot keys -> hash partitioning
        ELSE IF range queries -> range partitioning
        ELSE IF need fine control -> directory-based
IF read-to-write ratio > 5:1 -> add Redis cache
IF storing media -> use S3 + CDN
IF search -> use Elasticsearch
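
The same flow can be written as a small runnable function. This is one possible reading of the summary: it assumes the database and partitioning choices apply only past the single-node threshold, while the cache, media, and search checks apply in every case. The `Workload` fields and `recommend` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    users: int
    rps: int
    consistent_writes: bool = True
    exceeds_one_machine: bool = False
    write_heavy_or_hot_keys: bool = False
    range_queries: bool = False
    needs_fine_control: bool = False
    read_write_ratio: float = 1.0
    stores_media: bool = False
    needs_search: bool = False


def recommend(w: Workload) -> list:
    """Return the components suggested by the decision flow, in order."""
    choices = []
    if w.users < 1_000_000 and w.rps < 100:
        choices.append("single-node DB")
    else:
        choices.append("RDBMS" if w.consistent_writes else "NoSQL")
        if w.exceeds_one_machine:
            if w.write_heavy_or_hot_keys:
                choices.append("hash partitioning")
            elif w.range_queries:
                choices.append("range partitioning")
            elif w.needs_fine_control:
                choices.append("directory-based partitioning")
    if w.read_write_ratio > 5:
        choices.append("Redis cache")
    if w.stores_media:
        choices.append("S3 + CDN")
    if w.needs_search:
        choices.append("Elasticsearch")
    return choices
```

The thresholds are rules of thumb from the flow above, so the output is a starting point for discussion rather than a final architecture.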

13- Interview Tip

1. I’d start by clarifying the functional and non-functional requirements.

2. Secondly, I’d pick a data model (relational, document-based, etc.) based on access patterns (for example, whether records are mostly looked up by ID).

3. Thirdly, I’d choose a suitable database (PostgreSQL, MongoDB, etc.).

4. Fourthly, I’d consider whether partitioning is needed.

(a) If so, I’d evaluate hash, range, or directory-based strategies depending on skew, query type, and growth.

5. Finally, I’d add caching, search, and media storage where the requirements call for them.
