Friday, May 16, 2014

What's wrong with Chinese downloads?

Here is an example of the download statistics for Groovy 2.3.0 from bintray.com.


Why are we receiving so many downloads from China?


Since the creation of JFrog, every time one of our users has provided popular content and downloads, we have been hit by a huge number of queries from China.
We tried desperately to find a pattern in these queries: a faulty download manager, perhaps, or an agent name that would tell us how to fix this, or a recognizable pattern of action. We more or less found a way to stop it, but every year something new appears.

For info, the pattern is extremely damaging to our servers. A robot of some kind opens thousands of requests to download something, starts to read the first few bytes, then immediately stops and keeps the connection open.

The issue is that using smart tools on our side to detect the above pattern will also block good traffic based on AJAX and long-running REST API calls. Our products (Artifactory and Bintray) need many parallel connections and long-running HTTP sockets to work correctly. So we are a lot more vulnerable than a standard website.
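
To make the trade-off concrete, here is a minimal sketch of one way such a filter could look. It is only an illustration, not a description of what JFrog actually runs: the `Conn` record, the function names, and every threshold are assumptions made up for the example. The idea is to count stalled connections per client instead of judging each connection alone, so that a few legitimately idle long-running sockets do not get a client flagged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Conn:
    """One open HTTP connection, as a hypothetical server might record it."""
    ip: str
    bytes_sent: int      # bytes of the response delivered so far
    idle_seconds: float  # time since the client last read any data


def is_stalled(conn, max_bytes=4096, min_idle=30.0):
    # Matches the described pattern: a few bytes fetched, then silence.
    # Both thresholds are illustrative guesses, not measured values.
    return conn.bytes_sent <= max_bytes and conn.idle_seconds >= min_idle


def suspicious_ips(conns, min_stalled=100):
    # Count stalled connections per client rather than judging each one
    # alone, so a handful of long-lived REST or AJAX sockets from the
    # same client stays below the threshold.
    stalled = Counter(c.ip for c in conns if is_stalled(c))
    return {ip for ip, n in stalled.items() if n >= min_stalled}


# Usage: 150 stalled downloads from one address, 3 idle sockets from another.
robot = [Conn("203.0.113.7", 512, 120.0) for _ in range(150)]
client = [Conn("198.51.100.2", 0, 600.0) for _ in range(3)]
print(suspicious_ips(robot + client))  # {'203.0.113.7'}
```

Even in this toy form the weakness is visible: a robot that spreads its requests over many addresses stays under the per-client threshold, while many real users sharing one address can cross it.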

Still, I keep thinking: why are people creating these robots? Why from China?

Then it hit me!

By running these robots, they may force us to decide to completely block Chinese traffic to our servers. And if we do this, what are the consequences?

Chinese entrepreneurs will create the equivalent of our services inside the firewall. Then the services will run on data centers controlled by the Chinese government and under Chinese law. And we will not be able to complain or fight back.

At the end of the day, we are the bad guys! We unilaterally decided to block China!

I'm wondering if this scenario actually happened to many other websites, like Twitter, Amazon, YouTube, Hulu or Facebook, and so paved the way for the equivalents to be created inside the Great Firewall.

Am I wrong?

2 comments:

FroMage said...

FTR, we have similar weird numbers coming from China for the Ceylon downloads, where they often represent 80% of traffic, and seem automated: they do not pass through the download page.

usman khatri said...

A faulty download manager, may be the agent name will tell us how to fix this, the pattern of action, and so on.