Scaling a High-Traffic API with FastAPI and Kubernetes

Navigating Kubernetes in Production: Lessons from Scaling a High-Traffic API

Estimated reading time: 7 minutes.

Key takeaways

Migrating from a synchronous Django monolith to asynchronous FastAPI services can significantly reduce latency under load.
A service-oriented architecture lets each service scale independently.
Investing in observability and monitoring is essential for troubleshooting under high traffic.
Proper documentation and communication can streamline onboarding and maintenance.

The Real Problem
Why Common Solutions Failed
What We Implemented Instead
Architecture and Scaling Decisions
What Worked / What Didn’t
Lessons Learned

The Real Problem

Not long ago, I was tasked with scaling a high-traffic public API that served as a data repository for a financial services application. Our users, more than 50,000 unique clients, sent around 500,000 requests per day, with peak traffic exceeding 2,000 requests per second. A typical request pulled data from a combination of ClickHouse (analytical queries) and Couchbase (document storage).

Initially, our backend was a Django monolith. It worked adequately in our early days, but performance degraded sharply under load. Django's largely synchronous request handling meant that each worker sat blocked on its database calls, and we frequently ran into latency problems. During peak hours, latency exceeded 700 milliseconds, which pushed us to consider asynchronous frameworks such as FastAPI.
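The traffic figures above are worth a quick back-of-the-envelope check. Spread over a day, 500,000 requests average under 6 requests per second, so a 2,000 req/s peak suggests very bursty traffic. Little's law (requests in flight = arrival rate × time in system) also shows why 700 ms latency hurt so much: at peak, roughly 1,400 requests were in flight at once. Assuming each synchronous worker serves one request at a time, that is on the order of 1,400 busy workers across the fleet. The snippet only reproduces that arithmetic with the article's numbers.

```python
# Back-of-the-envelope capacity check using the traffic figures from the article.
# Little's law: requests in flight (L) = arrival rate (lambda) * time in system (W).

SECONDS_PER_DAY = 24 * 60 * 60

daily_requests = 500_000  # requests per day
peak_rps = 2_000          # peak requests per second
peak_latency_s = 0.700    # roughly 700 ms per request at peak

average_rps = daily_requests / SECONDS_PER_DAY
peak_to_average = peak_rps / average_rps
in_flight_at_peak = peak_rps * peak_latency_s

print(f"average load: {average_rps:.1f} req/s")
print(f"peak vs. average: about {peak_to_average:.0f}x")
print(f"requests in flight at peak: about {in_flight_at_peak:.0f}")
```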

Why Common Solutions Failed

As our traffic increased, we attempted a number of conventional solutions to scale our architecture:

Vertical Scaling: First, we upgraded the database server to more powerful hardware. As the dataset grew, costs grew with it, but performance improved only marginally.
Caching: We also implemented several caching strategies with Redis, but making caching effective required code changes that complicated our request handling. We learned the hard way that cache invalidation became a maintenance nightmare and introduced new points of failure.
Horizontal Scaling: We tried scaling the Django application horizontally on Kubernetes by adding more pods. Kubernetes load balancing spread requests across more instances, but latency did not improve, mainly because each instance still blocked on synchronous database calls.
Database Sharding: The final straw was an attempt to shard Couchbase. That was optimistic at best, since it required major changes to our data model and running migrations while maintaining uptime.

Taken together, these measures did not provide a sustainable solution. We still faced frequent database timeouts, especially during peak hours, which degraded the user experience.
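The caching bullet above is where much of the complexity crept in. A minimal cache-aside sketch shows why: every write path has to remember to invalidate the keys it touches, or readers keep getting stale data until the TTL expires. In this sketch a plain dict stands in for Redis, and `CacheAside` and its loader are hypothetical names, not our production code.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Minimal cache-aside helper; a dict stands in for Redis in this sketch."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float = 30.0) -> None:
        self._loader = loader  # reads from the source of truth (hypothetical)
        self._ttl_s = ttl_s
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired yet
        value = self._loader(key)  # miss or expired entry: read the database
        self._store[key] = (now + self._ttl_s, value)
        return value

    def invalidate(self, key: str) -> None:
        # Every write path that changes `key` must call this, otherwise
        # readers see stale data until the TTL runs out.
        self._store.pop(key, None)


# Usage with an in-memory "database" standing in for Couchbase.
db = {"acct:1": {"balance": 100}}
cache = CacheAside(loader=lambda k: dict(db[k]), ttl_s=60)

print(cache.get("acct:1"))       # first read goes to the database
db["acct:1"] = {"balance": 250}  # a write that forgets to invalidate...
print(cache.get("acct:1"))       # ...so the stale balance is still served
cache.invalidate("acct:1")
print(cache.get("acct:1"))       # fresh value after invalidation
```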

What We Implemented Instead

It was clear we needed a more radical architectural change. After considerable evaluation, I chose to migrate to FastAPI to take advantage of its asynchronous request handling. Running FastAPI under Gunicorn with Uvicorn workers let us handle significantly more concurrent requests.
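A Gunicorn configuration for this kind of setup can be very small. The sketch is an illustrative starting point rather than our actual settings: the bind address, timeouts, and `WEB_CONCURRENCY` fallback are assumptions. Gunicorn config files are plain Python. Newer Uvicorn releases deprecate the `uvicorn.workers` module in favor of the separate `uvicorn-worker` package, so check which worker path your versions expect.

```python
# gunicorn.conf.py: an illustrative sketch, not our production settings.
# Run with: gunicorn -c gunicorn.conf.py app.main:app  (the app module path is hypothetical)
import multiprocessing
import os

# Each Gunicorn worker process hosts a Uvicorn event loop, so a single process
# can interleave many requests while they wait on database or network I/O.
worker_class = "uvicorn.workers.UvicornWorker"

# Gunicorn's docs suggest (2 x cores) + 1 workers as a starting point. Inside a
# container, cpu_count() reports the node's cores rather than the pod's CPU limit,
# so let the deployment override it through WEB_CONCURRENCY.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

bind = "0.0.0.0:8000"
timeout = 30    # seconds a silent worker may take before it is killed and restarted
keepalive = 5   # seconds to keep idle keep-alive connections open
```

Async workers often need fewer processes than the heuristic suggests, so the worker count is best settled by load testing.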

Alongside the move to FastAPI, we made several other impactful changes:

Service-Oriented Architecture (SOA): We broke the monolithic Django app into microservices. Each service handled a specific function, so it could scale independently. For example, we created a dedicated analytics service backed by ClickHouse to handle all heavy read queries.
Database Layer Improvement: We kept Couchbase for document storage but moved time-series data to QuestDB, reducing the load on Couchbase and streamlining analytics queries.
Distributed Caching: To ease the latency and maintenance problems caused by cache invalidation, we moved to a shared caching layer and used Redis Streams to manage real-time data with less complexity.
Load Testing and Observability: We realized we had underinvested in observability and introduced more robust monitoring, using OpenSearch for log management. This let us troubleshoot effectively and track query performance.
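Logs are far easier to query in OpenSearch when each line is a JSON document with consistent fields. A minimal standard-library sketch follows. The field names (`service`, `route`, `backend`, `duration_ms`) are hypothetical, and shipping the lines from stdout into OpenSearch is left to whatever log collector the cluster runs.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line for a log shipper to index."""

    FIELDS = ("service", "route", "backend", "duration_ms")  # hypothetical schema

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed through `extra=` become attributes on the record.
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("analytics-service")
log.addHandler(handler)
log.setLevel(logging.INFO)

start = time.perf_counter()
# ... run the ClickHouse query here ...
elapsed_ms = (time.perf_counter() - start) * 1000
log.info(
    "query finished",
    extra={"service": "analytics", "route": "/reports",
           "backend": "clickhouse", "duration_ms": round(elapsed_ms, 2)},
)
```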

Architecture and Scaling Decisions

The new architecture centered on a cleaner separation of concerns:

User Service (FastAPI)
Data Analytics (FastAPI with ClickHouse)
Document Service (Couchbase)
Caching Layer (Redis)

The services communicated with each other over gRPC and REST APIs, so each one could handle requests and scale independently.
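Async handlers pay off most clearly when one request fans out to several services. In this sketch, `call_service` is a hypothetical stand-in that uses `asyncio.sleep` in place of a real HTTP or gRPC client. Both calls are awaited concurrently with `asyncio.gather`, so the handler waits roughly as long as the slowest dependency rather than the sum of both, and `asyncio.wait_for` puts an upper bound on the total wait.

```python
import asyncio
import time
from typing import Any, Dict


async def call_service(name: str, delay_s: float) -> Dict[str, Any]:
    """Hypothetical stand-in for a non-blocking HTTP/gRPC call; sleep simulates I/O."""
    await asyncio.sleep(delay_s)
    return {"service": name}


async def handle_request() -> Dict[str, Any]:
    # Both downstream calls are in flight at the same time, so the handler
    # waits about max(delays) instead of their sum.
    analytics, documents = await asyncio.wait_for(
        asyncio.gather(
            call_service("analytics", 0.20),
            call_service("documents", 0.15),
        ),
        timeout=1.0,  # fail fast instead of letting a slow dependency pile up requests
    )
    return {"analytics": analytics, "documents": documents}


if __name__ == "__main__":
    start = time.perf_counter()
    result = asyncio.run(handle_request())
    print(result, f"took {time.perf_counter() - start:.2f}s")
```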

After resizing our Kubernetes clusters, we put autoscaling to the test with Horizontal Pod Autoscalers (HPAs) driven by CPU utilization. This was a key improvement: the system could absorb traffic spikes without over-provisioning resources in advance.
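The Kubernetes documentation gives the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No scaling happens when that ratio is within a tolerance of 1.0, which defaults to 0.1. Below is a simplified Python model of the rule for a CPU-utilization target. The `min_replicas` and `max_replicas` defaults are illustrative, and the real controller also applies stabilization windows, scaling policies, and readiness checks that this sketch ignores.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_utilization: float,
    target_cpu_utilization: float,
    min_replicas: int = 2,   # illustrative bounds, not defaults from Kubernetes
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA scaling rule for a single CPU-utilization metric."""
    ratio = current_cpu_utilization / target_cpu_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave replicas alone
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))


# 4 pods averaging 90% CPU against a 60% target -> scale out to 6 pods.
print(desired_replicas(4, 90, 60))
```

A model like this is a cheap way to sanity-check a target before rollout. A low CPU target scales out earlier but keeps more idle capacity around.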

What Worked / What Didn’t

There were significant wins from this architectural shift:

Throughput: Request throughput nearly doubled, and average latency under load fell to 150 milliseconds, which we considered acceptable given our SLA commitments.
Independence: Teams could deploy features without affecting other services, which made us more agile.
Easier Scalability: Because each service scaled in isolation, we could scale individual microservices according to their own load.

However, there were drawbacks:

Increased Complexity: The shift to microservices increased operational overhead. Managing inter-service communication was intricate and introduced its own performance bottlenecks.
Onboarding Challenges: New engineers struggled to navigate the more complex architecture, especially workflows that spanned multiple layers.
Dealing with Distributed Data: Giving services their own databases created new challenges around data consistency and recovery.

Lessons Learned

Start Small, Scale Gradually: If I could give one piece of advice, it would be to start with a small, manageable system design. A well-structured monolith can serve as a robust foundation before jumping into microservices.
Invest in Observability Upfront: Comprehensive logging and monitoring aren’t optional. They are essential in understanding system health, especially in distributed environments with multiple data sources.
Test Before You Invest: Load testing should be a norm, not an afterthought. Seeing how your application behaves under pressure can prevent costly mistakes down the road.
Document Everything: As complexity grows, so does the need for proper documentation. A centralized knowledge base can aid onboarding and maintenance.
Communication is Key: Teams need a culture of open communication, especially when work spans multiple services. Keeping everyone aligned helps surface performance problems early and avoids coordination bottlenecks.
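You can start acting on the load-testing advice without heavy tooling. The asyncio harness fires a fixed number of requests with a cap on concurrency and reports latency percentiles. `fake_endpoint` is a hypothetical stand-in that you would replace with a real HTTP call against a staging environment. For sustained or distributed tests, a dedicated load-testing tool is the better choice.

```python
import asyncio
import random
import statistics
import time
from typing import Dict, List


async def fake_endpoint() -> None:
    """Hypothetical stand-in for the API under test; replace with a real client call."""
    await asyncio.sleep(random.uniform(0.01, 0.05))


async def timed_call(sem: asyncio.Semaphore, latencies: List[float]) -> None:
    async with sem:  # cap the number of requests in flight at once
        start = time.perf_counter()
        await fake_endpoint()
        latencies.append((time.perf_counter() - start) * 1000)  # milliseconds


async def run_load(total: int = 200, concurrency: int = 50) -> List[float]:
    sem = asyncio.Semaphore(concurrency)
    latencies: List[float] = []
    await asyncio.gather(*(timed_call(sem, latencies) for _ in range(total)))
    return latencies


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(latencies_ms)}


if __name__ == "__main__":
    results = asyncio.run(run_load())
    print({name: round(value, 1) for name, value in summarize(results).items()})
```

Reporting tail percentiles rather than a single average matters here, because an average can look healthy while the slowest requests drive timeouts.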

Reflecting on this experience, the lessons are as much about the processes and people behind the technology as about the technology itself. Scaling a high-traffic API in production is both a technical and a human endeavor, and the day-to-day struggles of operations often reveal the insights that guide future success.