Key Takeaways from Managing High-Traffic APIs on Kubernetes

Understand your load patterns to better anticipate scaling needs.
Use the right tools, such as FastAPI, for more efficient performance.
Establish robust monitoring practices to identify bottlenecks early.
Invest in communication and training to adapt to new technologies.

The Real Problem
Why Common Solutions Failed
What We Implemented Instead
Architecture and Scaling Decisions
What Worked / What Didn’t
Lessons Learned

The Real Problem

We launched a new feature aimed at enhancing user engagement, which I’ll refer to as “user tracking.” Given our existing user base of around a million active users, we anticipated a spike in traffic. Unfortunately, we underestimated both the load and the latency involved in our microservices architecture. What should have been a straightforward implementation turned into a disaster within hours.

At peak times, request counts skyrocketed to over 30,000 requests per minute (roughly 500 per second). Our Django application, running in Kubernetes, coped fine most of the time. However, as that spike hit, the system began to fail spectacularly. More than 30% of requests to our API endpoints returned 500 Internal Server Error, and average latency shot up from a comfortable 150ms to a horrifying 2,500ms.
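To put those numbers in perspective, here is a back-of-the-envelope sketch using Little's Law (average requests in flight = arrival rate × average latency). The function name is illustrative, and the calculation assumes traffic is spread evenly across each minute, which real spikes rarely are.

```python
def in_flight_requests(requests_per_minute: float, avg_latency_ms: float) -> float:
    """Little's Law: L = lambda * W, with lambda in requests/second and W in seconds."""
    arrival_rate_per_s = requests_per_minute / 60.0
    return arrival_rate_per_s * (avg_latency_ms / 1000.0)


PEAK_RPM = 30_000  # peak request rate quoted above

healthy = in_flight_requests(PEAK_RPM, 150)     # latency before the incident
degraded = in_flight_requests(PEAK_RPM, 2_500)  # latency during the incident

print(f"{healthy:.0f} requests in flight at 150 ms, {degraded:.0f} at 2,500 ms")
```

At the same request rate, the latency jump alone means the service has to hold far more requests open at once, which is why slow responses tend to compound rather than level off.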

Why Common Solutions Failed

In the face of this problem, we leaned on well-trodden solutions, but they ultimately fell short. Our first instinct was to scale horizontally by adding more replicas of our Django services. However, our Kubernetes Horizontal Pod Autoscaler (HPA) was set up to scale only on CPU and memory utilization. Since both remained reasonably stable during the traffic spike (thanks to asynchronous queues), the HPA didn’t add replicas as quickly as we needed.
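The effect is easier to see with the scaling rule the HPA documents: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that rule in plain Python. The replica count, CPU figures, and per-pod request target are hypothetical values chosen for illustration, not numbers from our cluster.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Apply the documented HPA rule for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)


# Hypothetical: 6 pods, CPU at 55% against a 60% target during the spike.
cpu_based = desired_replicas(6, current_metric=55, target_metric=60)

# Hypothetical: the same 6 pods each seeing 83 req/s against a 40 req/s target.
rps_based = desired_replicas(6, current_metric=83, target_metric=40)

print(cpu_based, rps_based)
```

With a flat CPU signal the rule sees nothing to do, while a request-rate signal (available to the HPA through custom or external metrics) would have asked for roughly twice as many pods.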

Additionally, common optimizations such as caching and database connection pooling improved response times only marginally. Our PostgreSQL instance wasn’t tuned for this type of load, so concurrent database connections began to throttle, and the resulting contention degraded overall performance.
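A rough connection budget shows how pooling can make contention worse once replicas multiply. This is a sketch under assumptions: the replica, worker, and pool counts are hypothetical, and 100 is PostgreSQL's default max_connections rather than a setting taken from our instance.

```python
def connection_demand(replicas: int, workers_per_replica: int,
                      pool_size_per_worker: int) -> int:
    """Worst-case connections opened if every pool fills up."""
    return replicas * workers_per_replica * pool_size_per_worker


MAX_CONNECTIONS = 100  # PostgreSQL default; the real value is an assumption

# Hypothetical deployment: 12 pods, 4 worker processes each, pool of 5 per worker.
demand = connection_demand(replicas=12, workers_per_replica=4, pool_size_per_worker=5)
shortfall = demand - MAX_CONNECTIONS

print(f"demand={demand}, limit={MAX_CONNECTIONS}, shortfall={shortfall}")
```

Each pool looks modest on its own, but the database sees the product of all three numbers, so adding replicas without a shared pooler or a higher limit pushes more clients into waiting.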

What We Implemented Instead

To address these issues, we first needed to thoroughly understand our architecture. We decided to move our user tracking endpoint out of the monolithic Django service and into FastAPI. FastAPI is asynchronous and built for speed, which makes it well suited to handling many HTTP requests concurrently without the overhead of a traditional synchronous framework.
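The concurrency benefit can be illustrated with the standard library alone. The sketch below is not FastAPI code; it uses asyncio, the same event-loop model that `async def` handlers run on, and a hypothetical handler whose sleep stands in for a database or network call. It counts how many requests are in progress at once.

```python
import asyncio


class InFlightTracker:
    """Records how many handlers are in progress at the same time."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0


async def handle_request(tracker: InFlightTracker, io_delay_s: float = 0.01) -> None:
    tracker.current += 1
    tracker.peak = max(tracker.peak, tracker.current)
    await asyncio.sleep(io_delay_s)  # stands in for awaiting a DB or HTTP call
    tracker.current -= 1


async def serve_concurrently(n: int) -> int:
    tracker = InFlightTracker()
    # All handlers start, then yield to the event loop while they wait on I/O.
    await asyncio.gather(*(handle_request(tracker) for _ in range(n)))
    return tracker.peak


async def serve_sequentially(n: int) -> int:
    tracker = InFlightTracker()
    for _ in range(n):
        await handle_request(tracker)  # next request waits for this one
    return tracker.peak


concurrent_peak = asyncio.run(serve_concurrently(50))
sequential_peak = asyncio.run(serve_sequentially(50))
print(concurrent_peak, sequential_peak)
```

A single event loop overlaps all 50 waits, whereas the sequential version processes one request at a time. The gain only applies to I/O-bound work; CPU-heavy handlers still block the loop.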

Next, we implemented a more comprehensive monitoring solution using OpenTelemetry to gain real-time insight into bottlenecks. We also moved the data behind our analytical queries from PostgreSQL to ClickHouse. ClickHouse’s columnar storage model offered the speed and scale we needed for analytics without adding extra pressure on our transactional DB.

For caching, we abandoned Redis for frequent read operations in favor of Couchbase, which offered better capabilities in terms of automatic sharding and distributed caching. Couchbase’s architecture allowed us to scale across clusters with minimal downtime.

Architecture and Scaling Decisions

Implementing FastAPI meant we also had to navigate a few architectural changes. By using Uvicorn as our ASGI server, we could handle requests asynchronously and accept connections much more efficiently than with our WSGI-based Django deployment. We deployed this new API as its own Kubernetes Deployment and Service, exposing it through an Ingress that routes traffic based on specific paths.
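Path-based routing of this kind can be sketched in a few lines. The function below mimics how an Ingress rule with prefix matching picks a backend: paths are compared element by element, and the longest matching prefix wins. The paths and service names are hypothetical, and this is an illustration of the matching idea, not any controller's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical routing table: path prefix -> backend Service name.
ROUTES: Dict[str, str] = {
    "/api/tracking": "fastapi-tracking-svc",
    "/": "django-svc",
}


def _elements(path: str) -> List[str]:
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def route(request_path: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the backend whose prefix matches the most path elements."""
    request_parts = _elements(request_path)
    best_backend: Optional[str] = None
    best_length = -1
    for prefix, backend in routes.items():
        prefix_parts = _elements(prefix)
        matches = request_parts[:len(prefix_parts)] == prefix_parts
        if matches and len(prefix_parts) > best_length:
            best_backend, best_length = backend, len(prefix_parts)
    return best_backend


print(route("/api/tracking/events", ROUTES))
print(route("/admin/login", ROUTES))
```

Matching on whole path elements means a request for `/api/trackingx` falls through to the catch-all backend instead of being sent to the tracking service by accident.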

The move to ClickHouse was not without its hurdles. We initially struggled with migrating real-time data without disruptions, but after thoroughly planning our data ingestion strategy with Kafka, we achieved seamless integration. Using ClickHouse, we reduced our analytical query times from seconds to milliseconds, which was essential for tasks like generating user reports.
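The details of our ingestion pipeline are beyond the scope of this post, but one common pattern when feeding ClickHouse from a stream is to buffer events and write them in batches, since ClickHouse generally performs better with fewer, larger inserts. The sketch below shows size-based batching only. `BatchWriter` and its `flush` callback are hypothetical names; in a real pipeline the callback would wrap a ClickHouse client insert, and a time-based flush would usually be added as well.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchWriter:
    """Buffers rows and hands them to a sink in batches of up to max_rows."""

    def __init__(self, flush: Callable[[List[Row]], None], max_rows: int = 1000) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; safe to call when the buffer is empty."""
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []


# Usage: a list stands in for the database so the batching is visible.
batches: List[List[Row]] = []
writer = BatchWriter(flush=batches.append, max_rows=3)
for i in range(7):
    writer.add({"user_id": i, "event": "page_view"})
writer.flush()  # drain the final partial batch

print([len(batch) for batch in batches])  # prints [3, 3, 1]
```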

What Worked / What Didn’t

Several key decisions bore fruit. FastAPI’s async capabilities brought average response time down from the 2,500ms we saw during the incident to around 300ms, even during peak hours. The combination of FastAPI and ClickHouse provided the scalability we desperately needed; we could now handle more than 40,000 requests per minute without so much as a blink.

However, not everything went to plan. Adjusting the caching strategy led to unforeseen complexity. Our use of Couchbase added a learning curve for the team, requiring upfront investment in training and practice. The support structure for Couchbase was not as mature as for Redis, leading us to spend time troubleshooting complex caching behaviors.

Moreover, scaling OpenTelemetry and monitoring proved to be more intricate than anticipated. We experienced excessive logging, which led to storage issues and forced us to refine what we truly needed to capture. This is an ongoing problem: finding the right balance between having enough data to make informed decisions and bogging down the system with unnecessary overhead.
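One common way to cut telemetry volume is to sample a fixed fraction of traces. The sketch below is a simplified, plain-Python illustration of trace-ID ratio sampling, the idea behind the ratio-based samplers that OpenTelemetry SDKs provide. It is not the SDK's implementation, and the 10% ratio is a hypothetical setting rather than the one we settled on.

```python
import random


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that sees
    the same trace makes the same choice without coordination.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound


# Usage: simulate 10,000 random 128-bit trace IDs and keep about 10% of them.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.10) for trace_id in trace_ids)

print(f"kept {kept} of {len(trace_ids)} traces")
```

Sampling trades completeness for cost: rare errors can be missed at low ratios, so teams often pair it with rules that always keep failed or slow requests.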

Lessons Learned

Looking back, a few key takeaways echo through this experience:

Understand Your Load Patterns: We underestimated the implications of high traffic on our existing architecture. Anticipating patterns and scaling accordingly, rather than reacting after the fact, is essential.
Use the Right Tools for the Job: FastAPI was a game-changer, and sometimes refactoring for performance can be worth the effort. Similarly, ensuring that the backend data store is optimized for the type of queries you’ll be running can save a lot of headaches.
Monitoring is Key: Having robust monitoring and observability might seem like additional overhead at first, but it’s an invaluable tool. We could have identified critical performance bottlenecks sooner had we established better monitoring practices from the outset.
Communication and Training: Adding new technologies requires not just a technical deployment but also a cultural shift. Ongoing training and resources are vital for the team to adapt and grow with emerging architectures.

In retrospect, while the path was fraught with challenges, the lessons learned have fundamentally reshaped our approach to building and scaling high-traffic APIs on Kubernetes. As I’ll always assert, the battle is only lost for those who fail to learn.