The Mystery of the Unexpected Traffic Surge: Planning Poker App

Discover how a sudden surge in Vercel Edge Requests for a Miro Planning Poker app led to significant cost increases. Learn how upgrading to SDKv2 and optimizing app requests reduced traffic, ensuring sustainable performance and cost management.


#An Unexpected Message

It was a seemingly ordinary day when I received a perplexing email from Vercel, the popular cloud platform. The message was a stark warning: my team’s Planning Poker app had generated an astronomical 30,001,783 Edge Requests, far surpassing our Pro plan's limit of 10 million.

[Image: Me looking at the bill]

#Day 1: Initial Investigation

I quickly logged into the Vercel dashboard and discovered that the surge in requests had been occurring for several months. Before this period, there were no significant requests at all. With no recent changes to the app and a declining user base, the spike in traffic was baffling.

Immediately, I reached out to Vercel’s support team:

"I think we’re under a DDoS attack. The workflow of the app implies the /sidebar endpoint to be called, which shows 51,729 requests in the last 30 days compared to an insane 8 million for the homepage."

Vercel’s support suggested that the surge could be due to a variety of reasons, including legitimate increases in web traffic. They pointed out a notable spike from a specific JA4 digest and recommended configuring a firewall rule to block the offending traffic.
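To put the two figures from my message in proportion, here is a quick calculation. It uses only the numbers quoted above; "8 million" is a rounded figure, so the result is approximate, and the function name is made up for illustration.

```typescript
// Figures quoted in the support message (last 30 days).
const sidebarRequests = 51729;
const homepageRequests = 8000000; // "8 million" is rounded in the original message

// Hypothetical helper: how many homepage loads happened per /sidebar call.
function loadsPerSidebarCall(homepage: number, sidebar: number): number {
  if (sidebar <= 0) {
    throw new RangeError("sidebar request count must be positive");
  }
  return homepage / sidebar;
}

const ratio = loadsPerSidebarCall(homepageRequests, sidebarRequests);
console.log(ratio.toFixed(1)); // prints "154.7"
```

Roughly 155 homepage loads for every sidebar call is far more than the app's normal workflow would explain, which is why the traffic looked suspicious.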

#Day 2-3: Firewall Implementation and Monitoring

Following Vercel’s advice, I set up firewall rules to block the suspicious traffic. However, I soon realized that these rules did not help with the Edge Requests metric: every request, whether blocked or not, was still counted, so the firewall did nothing to alleviate the billing issue.

[Image: Firewall]

To gain more insights, I enabled Vercel’s expensive monitoring package, hoping to track the traffic sources and patterns more effectively. Unfortunately, this added another $36 to the bill within a few days, without providing a sustainable solution.

"Since I turned on Monitoring, I am already over the limit. Will Monitoring charge me for denied requests when the firewall is on?"

Vercel support confirmed that all requests, including those denied by the firewall, still counted towards the Edge Requests metric. Frustrated, I had to disable monitoring to prevent further costs from spiraling out of control.

#Day 4-5: Reaching Out for External Help

In parallel, I contacted Miro’s support team, explaining the situation and my suspicion of a DDoS attack. I provided detailed statistics showing that the majority of requests were coming from specific user agents, mostly variations of Chrome on different operating systems.

Miro’s team responded promptly:

"I believe you're right that this is related to the number of users that have installed your app. There will be at least one request made from Miro for the main app entry point each time a user opens a board with an installed Web SDK app."

With this insight, I realized that even if users were not actively engaging with the app, their browsers were still making requests.

#Day 6-7: The Pricing Change Twist

Compounding the problem was a recent change in Vercel’s pricing model. The new model introduced more granular metrics, including charges for Edge Requests, which had previously been bundled with bandwidth costs. This change, effective from April 2024, significantly impacted our project costs.

"Before the pricing change, all requests were included in bandwidth. Now, with the new pricing, Edge Requests alone are 360% above our limit."

My frustration was echoed in exchanges on GitHub, where other users faced similar issues with Vercel’s new pricing model. One user aptly summarized the dilemma:

"Because this way we are optimizing not for users, but for Vercel, which is kind of unhealthy."

#Day 8-10: Implementing Solutions

To mitigate the issue, I implemented several optimizations:

  1. Consolidation of Endpoints: I merged multiple endpoints into a single one to reduce the number of requests.

  2. Adjusting App Logic: I adjusted the app logic to minimize unnecessary calls, such as consolidating the /init.js endpoint into the main entry point.

  3. Monitoring and Analysis: Despite the high cost, I used the monitoring data to identify and block suspicious traffic sources.

These measures resulted in a significant reduction in requests, as shown in the Vercel dashboard. Daily requests dropped by a factor of 6.4, bringing the total below the threshold of 10 million per month.
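As a sanity check on those numbers, the sketch below applies the 6.4x reduction to the 30,001,783 requests from the original Vercel email. This assumes the daily reduction carries over uniformly to the monthly total, which the dashboard figures above do not state explicitly; the function name is illustrative.

```typescript
const monthlyBefore = 30001783; // Edge Requests reported in the Vercel email
const reductionFactor = 6.4;    // drop in daily requests after the optimizations
const planLimit = 10000000;     // Pro plan limit mentioned in the email

// Assumption: the monthly total shrinks by the same factor as the daily count.
function projectedMonthly(before: number, factor: number): number {
  if (factor <= 0) {
    throw new RangeError("factor must be positive");
  }
  return Math.round(before / factor);
}

const monthlyAfter = projectedMonthly(monthlyBefore, reductionFactor);
const minimumFactorNeeded = monthlyBefore / planLimit;

console.log(monthlyAfter);                   // prints 4687779
console.log(monthlyAfter < planLimit);       // prints true
console.log(minimumFactorNeeded.toFixed(2)); // prints "3.00"
```

Under that assumption, any reduction of about 3x or more would have been enough to get back under the limit, so 6.4x leaves a comfortable margin.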

#Day 11-15: Migration to SDKv2 and App Router

As part of the long-term solution, I decided to migrate to Miro’s Web SDK v2 and the Next.js App Router. This migration was crucial because it addressed the core issue: every user who had installed the app, even if not actively using it, generated requests to my servers, and that is what produced the insane number of requests.

By upgrading to the App Router, I managed to reduce the number of files being fetched on the initial request from 10 to 4. Here’s how the optimization worked:

Using Webpack, I merged several JavaScript files, including webpack.xxx.js and framework.xxx.js, into one main chunk. This significantly reduced the number of requests needed to load the app. Additionally, Next.js 14 does not use files like _middleware, _ssgManifest, and others that were contributing to the high request count.
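The exact Webpack settings are not reproduced here, so the following is only a sketch of one way such a merge could be approached: switching off a named `splitChunks` cache group so that its modules are no longer emitted as a separate file. The helper `mergeClientChunks`, the minimal config type, and the group name `"framework"` are assumptions for illustration. The sketch does not touch `optimization.runtimeChunk`, which is the separate Webpack option governing the runtime file, and whether a given Next.js version tolerates these overrides should be verified against its build output.

```typescript
// Minimal shape of the parts of a Webpack config this sketch touches.
type CacheGroups = Record<string, unknown>;

interface WebpackLikeConfig {
  optimization?: {
    splitChunks?: false | { cacheGroups?: CacheGroups };
    runtimeChunk?: unknown;
  };
}

// Hypothetical helper: disable the given cache groups for the client build only.
function mergeClientChunks(
  config: WebpackLikeConfig,
  isServer: boolean,
  groupsToDisable: string[] = ["framework"] // assumed group name
): WebpackLikeConfig {
  if (isServer) {
    return config; // leave the server build untouched
  }
  const optimization = config.optimization ?? {};
  const splitChunks: { cacheGroups?: CacheGroups } = optimization.splitChunks
    ? optimization.splitChunks
    : {};
  const cacheGroups: CacheGroups = { ...(splitChunks.cacheGroups ?? {}) };
  for (const group of groupsToDisable) {
    cacheGroups[group] = false; // `false` switches a Webpack cache group off
  }
  config.optimization = {
    ...optimization,
    splitChunks: { ...splitChunks, cacheGroups },
  };
  return config;
}
```

In a Next.js project, a helper like this would be called from the `webpack` function in `next.config.js`, for example `webpack: (config, { isServer }) => mergeClientChunks(config, isServer)`. Comparing the emitted chunk list before and after the change shows whether the merge actually happened.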

This significant reduction in the number of requests not only improved performance but also helped in managing costs more effectively.
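To see why the file count matters so much, here is a rough model. It assumes every fetched file counts as one Edge Request and that each board open triggers a full initial load. The board-open figure is purely hypothetical; only the 10 and 4 come from the migration described above.

```typescript
const filesBefore = 10; // files fetched on the initial request before the migration
const filesAfter = 4;   // files fetched after moving to the App Router

// Assumption: one Edge Request per fetched file, one initial load per board open.
function requestsPerMonth(boardOpens: number, filesPerOpen: number): number {
  return boardOpens * filesPerOpen;
}

// Percentage by which the per-load request count shrinks.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) {
    throw new RangeError("before must be positive");
  }
  return ((before - after) * 100) / before;
}

const hypotheticalBoardOpens = 1000000; // illustrative number, not measured

console.log(requestsPerMonth(hypotheticalBoardOpens, filesBefore)); // prints 10000000
console.log(requestsPerMonth(hypotheticalBoardOpens, filesAfter));  // prints 4000000
console.log(reductionPercent(filesBefore, filesAfter));             // prints 60
```

Because the request count scales with board opens rather than with active usage, trimming the files per load is one of the few levers the app owner controls.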

#The Resolution and the Ongoing Mystery

While the immediate issue of high traffic had been mitigated, the mystery of the initial surge remained partially unresolved. Was it a coordinated attack, an unintended consequence of app installations, or something else entirely? The case highlighted the complexities of managing a popular app in a constantly evolving tech landscape.

[Image: Traffic reduction]

As I continue to monitor the app, one thing is clear: vigilance and adaptability are crucial. The Planning Poker app, once a contest-winning project, now stands as a testament to the challenges and triumphs of modern web development.

How to: Managing Unexpected Traffic Surges in a Next.js Application

Learn to manage traffic surges in Vercel. Identify spikes, investigate causes, and optimize your app with SDKv2 and Webpack. Implement firewall rules and regularly monitor metrics to maintain performance and control costs effectively.

  1. Monitor Traffic Patterns in Vercel: Regularly check your Vercel dashboard for unusual traffic spikes. Consistent monitoring helps in early detection of anomalies.

  2. Analyze Vercel Usage Metrics: Look for periods with sudden increases in Edge Requests. These metrics are crucial in identifying unexpected surges without corresponding changes in user behavior or application updates.
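If you export daily usage numbers, a small script can flag surges for you. The sketch below is one possible approach, not a Vercel feature: it compares each day against the median of the preceding days. The `DailyUsage` shape, the 7-day window, and the 3x threshold are all assumptions to adapt to your own data.

```typescript
interface DailyUsage {
  date: string;         // e.g. "2024-05-01"
  edgeRequests: number; // total Edge Requests for that day
}

function median(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("median of an empty list is undefined");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Flag days whose count exceeds `factor` times the median of the previous `windowSize` days.
function findSpikes(usage: DailyUsage[], windowSize = 7, factor = 3): DailyUsage[] {
  const spikes: DailyUsage[] = [];
  for (let i = windowSize; i < usage.length; i++) {
    const previous = usage.slice(i - windowSize, i).map((day) => day.edgeRequests);
    const baseline = Math.max(median(previous), 1); // avoid a zero baseline
    if (usage[i].edgeRequests > factor * baseline) {
      spikes.push(usage[i]);
    }
  }
  return spikes;
}

// Illustrative data only.
const sampleUsage: DailyUsage[] = [
  { date: "day-1", edgeRequests: 1000 },
  { date: "day-2", edgeRequests: 1100 },
  { date: "day-3", edgeRequests: 900 },
  { date: "day-4", edgeRequests: 1000 },
  { date: "day-5", edgeRequests: 1050 },
  { date: "day-6", edgeRequests: 950 },
  { date: "day-7", edgeRequests: 1000 },
  { date: "day-8", edgeRequests: 250000 },
  { date: "day-9", edgeRequests: 1020 },
];

console.log(findSpikes(sampleUsage).map((day) => day.date)); // prints [ 'day-8' ]
```

Using the median rather than the mean keeps a single extreme day from inflating the baseline for the days that follow it.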

  1. Enable Detailed Monitoring in Vercel: Temporarily enable Vercel’s detailed monitoring to gather comprehensive data on the traffic sources. This step is essential for pinpointing the origin of the surge.

  2. Analyze Request Data in Vercel: Focus on user agents and referrers to identify patterns. In my case, specific Chrome user agents were responsible for the majority of requests, indicating automated or background activity.

  3. Check for Background Activity in Miro: Determine if requests are coming from users who are not actively using the app but have it installed on Miro boards. This insight can reveal hidden sources of traffic.
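The analysis steps above boil down to counting requests per user agent and per path. Here is a minimal sketch, assuming you have exported request logs into records with `path` and `userAgent` fields; that shape and the sample data are hypothetical, not a Vercel export format.

```typescript
interface RequestRecord {
  path: string;
  userAgent: string;
}

// Count records by the chosen field and return the most frequent values first.
function topByKey(
  records: RequestRecord[],
  key: keyof RequestRecord,
  limit = 5
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const record of records) {
    const value = record[key];
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Illustrative records only.
const sampleRecords: RequestRecord[] = [
  { path: "/", userAgent: "Chrome/Windows" },
  { path: "/", userAgent: "Chrome/macOS" },
  { path: "/", userAgent: "Chrome/Windows" },
  { path: "/sidebar", userAgent: "Chrome/Windows" },
];

console.log(topByKey(sampleRecords, "path"));      // prints [ [ '/', 3 ], [ '/sidebar', 1 ] ]
console.log(topByKey(sampleRecords, "userAgent")); // prints [ [ 'Chrome/Windows', 3 ], [ 'Chrome/macOS', 1 ] ]
```

A large gap between entry-point loads and calls to the endpoints that real usage would hit, as in my `/sidebar` case, points to background loads from installed but idle apps rather than active users.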

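  1. Upgrade to the latest Next.js: The latest versions of Next.js are better optimized. In my case, moving to Next.js 14 with the App Router cut the number of files fetched on the initial request from 10 to 4.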

  2. Pay attention to middleware: Middleware in Next.js is a catch-all tool for altering the behavior of the app, and it is easy to overlook.

  3. Consolidate Endpoints with Webpack: Merge multiple endpoints into a single one to reduce the number of requests. Be careful: this can harm the initial load time, since parallel requests for small files are usually faster than fetching a single big chunk.
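On the middleware point, Next.js lets a `middleware.ts` file export a `config.matcher` that limits which paths the middleware runs on. The sketch below is a minimal illustration, not the configuration used in this app: scoping it to `/sidebar` is an assumption, and the function body is a placeholder that lets the request continue unchanged.

```typescript
// middleware.ts (project root)

// Run the middleware only for the sidebar route instead of every request.
// The choice of route is an assumption for illustration.
export const config = {
  matcher: ["/sidebar/:path*"],
};

// Placeholder middleware: returning nothing lets the request continue unchanged.
export function middleware(_request: Request): Response | undefined {
  return undefined;
}
```

A real project would normally import the request and response types from `next/server`. The standard `Request` and `Response` types are used here only to keep the sketch free of dependencies.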