How to test a proxy API? Web Scraper Checklist

Oleg Kulyk

Co-Founder @ ScrapingAnt

A proxy masks your IP address and thus changes the apparent location of your device as seen by the website you are trying to access. To tell a good proxy from a bad one, you need to test it. You want a reliable proxy API, not a high-latency one that slows you down, because that would defeat the purpose of using a proxy API in the first place.

What is the proxy API?

Data collection specialists usually access web services through a proxy API interface. It can help them overcome country restrictions, get around rate limiting, execute a web page's JavaScript, and fetch data as it would be rendered in a real browser.

So, what is a proxy API?

A proxy API (web scraping API) is an intermediary program that sits between consumer applications and the actual backend services that handle the requests. It can be a small shim that passes requests in and out, or a more complex application that pre- and post-processes data. The proxy API consumer doesn't need to adapt to the implementation of the underlying service, because the proxy API is just an intermediary between the web scraping script and the actual data source.

Why do you need to test a proxy API?

Proxies provide anonymity and IP diversity, so anything that compromises either one hurts the user. Beyond that, several aspects of a proxy API need to be tested before you can be confident it is up to the mark. Let's discuss each of them in detail.

Proxy Latency / Speed

One of the problems that can arise with a proxy API is latency. The desired website may take a long time to respond through the proxy, which means low speed, or in other words high latency. This is not desirable at all when you are using proxies. For example, you could be using a proxy API with sneaker bots to buy limited-edition sneakers that sell out in a flash. If you have not tested the speed of your proxy API beforehand, then when the moment comes, your request may not be processed quickly enough for the purchase to succeed.

Location Determination

The proxy API presents the target website with a different IP address, which makes the request appear to come from a different country.

This is necessary when you are trying to view content from a website that is blocked in your country or to access a country-specific website. For example, suppose you want to browse the website of a shop that is in the US and delivers only in the US. Without a proxy API, you cannot do so. However, with a US proxy, you can easily browse, scrape data, or do whatever else you need. That is why you need to know what location the proxy presents to the backend service.

Blacklisting Check

Proxies usually give different users the same IP address. When a website receives repeated scraping visits from the same IP address, it becomes suspicious of that address and may tag it as a spam IP address. If the IP address provided by a proxy API has been flagged as spam, it will be banned from the website, and you will not be able to view it. So it is necessary to check whether the IP addresses behind a proxy API have been listed as spam.
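
One common way to check an IP address against a spam list is a DNS blocklist (DNSBL) lookup. This is a technique brought in here for illustration, not one the article prescribes. The sketch below assumes you already know the outgoing IP address of the proxy (for example, from an IP checker) and that it is IPv4. The zone `dnsbl.example.org` is a placeholder, not a real blocklist.

```python
import ipaddress
import socket

# Hypothetical DNSBL zone: replace it with the zone of a blocklist you are allowed to query.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone=DNSBL_ZONE):
    """Return True if the blocklist has a record for this IP (performs a DNS lookup)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # any A record means the address is listed
    except socket.gaierror:
        return False  # no record: not listed, or the lookup itself failed


# Offline demonstration with an address from the documentation range.
print(dnsbl_query_name("203.0.113.7"))  # 7.113.0.203.dnsbl.example.org
```

Only `is_listed` touches the network. A `False` result can also mean the DNS lookup failed, so treat it as "no evidence of listing" rather than proof that the address is clean.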

Concurrency Limits

Typically, proxy APIs are tuned so that not all threads are in use at any given moment while the desired latency is still delivered. However, the maximum thread limit can still be reached when multiple requests pile up, and this can overload the service. Web scraping produces a large number of requests, which can quickly lead to saturation and overloading.
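
A rough way to probe concurrency limits is to send batches of simultaneous requests and watch where failures begin. The sketch below is a minimal illustration: `fetch` is a hypothetical callable that you supply (it should request one URL through your proxy API and raise an exception on failure), and the concurrency levels are arbitrary examples.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(fetch, urls, workers):
    """Send all urls through fetch using `workers` threads; return (succeeded, failed)."""
    def attempt(url):
        try:
            fetch(url)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, urls))
    succeeded = sum(results)
    return succeeded, len(results) - succeeded


def first_failing_level(fetch, url, levels=(1, 5, 10, 20)):
    """Raise concurrency step by step; return the first level with failures, or None."""
    for level in levels:
        _, failed = run_concurrent(fetch, [url] * level, level)
        if failed:
            return level
    return None


# Offline demonstration with a stand-in fetch that rejects one URL.
def fake_fetch(url):
    if "overloaded" in url:
        raise RuntimeError("simulated rejection")
    return "<html></html>"


print(run_concurrent(fake_fetch, ["https://example.com/a", "https://example.com/overloaded"], 2))
```

Keep the levels modest and stay within your plan's terms, since a test like this consumes real requests.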

How to test a proxy API?

So, here is the million-dollar question. If we need to be cautious about all of these things before using a proxy API, how do we test it? To test the different aspects of a proxy API, we will use different services and approaches, and then judge the overall quality of the proxy API in question.

Determine geolocation

To check the location of the IP address provided by a proxy API, you need an IP checker, a service that reports the location of the IP address it sees. Access the IP checker through the proxy API, and it will show you where the IP address leads. If that matches the location you want, you can tick the first box on your checklist. You may use any checker. IP location and What Is My Address are two such services that provide accurate locations.
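
The check can be automated along these lines. Everything provider-specific here is an assumption: the endpoint `proxy-api.example.com`, the `url`, `api_key`, and `country` parameter names, and an IP checker that answers with JSON containing a `country` field. Adjust them to your provider's documentation.

```python
import json
import urllib.parse

# Hypothetical values: replace them with your provider's endpoint and a real IP checker.
PROXY_API_URL = "https://proxy-api.example.com/fetch"
IP_CHECKER_URL = "https://ip-checker.example.com/json"


def build_proxy_url(target_url, api_key, country=None):
    """Build the proxy API request URL for a target page (parameter names are assumed)."""
    params = {"url": target_url, "api_key": api_key}
    if country:
        params["country"] = country
    return PROXY_API_URL + "?" + urllib.parse.urlencode(params)


def extract_country(checker_body):
    """Read the country code from the checker's JSON response (assumed 'country' field)."""
    data = json.loads(checker_body)
    return str(data.get("country", "")).upper()


def location_matches(checker_body, expected_country):
    """Compare the country reported by the checker with the one you asked for."""
    return extract_country(checker_body) == expected_country.upper()


# Offline demonstration with a canned checker response.
request_url = build_proxy_url(IP_CHECKER_URL, "YOUR_API_KEY", country="US")
sample_body = '{"ip": "203.0.113.7", "country": "US"}'
print(request_url)
print(location_matches(sample_body, "US"))  # True
```

In a real run, you would download `request_url` (for example with `urllib.request.urlopen`) and pass the response body to `location_matches`.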

Website scraping time

When you use a proxy API to access a website, especially when scraping, you add an extra step to sending the request, processing it, and receiving the data. This can reduce speed or increase latency, which makes for a bad user experience. To measure the effect, compare how long a website takes to load in the browser with how long the proxy API takes to gather the same page's data.
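
The comparison can be sketched as a small timing helper. It assumes two callables that you provide: one that fetches the page directly and one that fetches it through the proxy API. The stand-ins below only sleep, so the snippet runs offline. A direct HTTP fetch is also only an approximation of a full browser page load.

```python
import statistics
import time


def time_call(fetch, url):
    """Return how many seconds fetch(url) takes."""
    started = time.perf_counter()
    fetch(url)
    return time.perf_counter() - started


def latency_summary(fetch, url, runs=5):
    """Time several fetches of the same URL and summarise the samples."""
    samples = [time_call(fetch, url) for _ in range(runs)]
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def overhead_ratio(direct, proxied):
    """How many times slower the proxied median is than the direct median."""
    return proxied["median"] / direct["median"]


# Offline demonstration: stand-in fetchers that only sleep.
def fake_direct(url):
    time.sleep(0.01)


def fake_proxied(url):
    time.sleep(0.03)


direct = latency_summary(fake_direct, "https://example.com", runs=3)
proxied = latency_summary(fake_proxied, "https://example.com", runs=3)
print(f"proxy overhead: x{overhead_ratio(direct, proxied):.1f}")
```

Using the median rather than a single measurement keeps one slow request from distorting the comparison.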

Use a browser response to determine the data loss

When using proxy APIs, the requested data is sometimes not fully delivered. Instead, only enough data arrives to build the page, without all of its components. A simple example is a dynamic component that fails to load because of blocked XHR requests, so the website skeleton loads without prices, product details, etc. Comparing the proxy API response with what a real browser shows for the same page reveals such gaps.
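
One simple way to spot such gaps is to look for fragments you know should be on the page. In this sketch the marker strings and both HTML samples are made up for illustration. In practice, `browser_html` would be the page saved from a real browser and `proxied_html` the proxy API response.

```python
def missing_markers(html, markers):
    """Return the expected text fragments that do not appear in the HTML."""
    return [marker for marker in markers if marker not in html]


def size_ratio(proxied_html, browser_html):
    """Proxied response size relative to the browser-saved page (1.0 means equal)."""
    if not browser_html:
        raise ValueError("browser_html must not be empty")
    return len(proxied_html) / len(browser_html)


# Offline demonstration with made-up page fragments.
browser_html = '<div id="product"><h1>Sneaker</h1><span class="price">$120</span></div>'
proxied_html = '<div id="product"><h1>Sneaker</h1></div>'

print(missing_markers(proxied_html, ["<h1>", 'class="price"']))  # ['class="price"']
print(round(size_ratio(proxied_html, browser_html), 2))
```

A size ratio well below 1.0 together with missing markers suggests that dynamic content was not rendered.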

Proxy API anti-bot system avoidance test

If your proxy API cannot get past anti-bot tools, websites will detect it as a bot, and you may be required to solve a CAPTCHA to prove that you are not one. The reason is the traffic coming from your proxy API's IP addresses. Test your proxy API by scraping websites that are protected by anti-bot tools like Akamai, CloudFront, reCAPTCHA, etc.

You can tell whether a request was detected by whether you were able to get the needed data from the web page.

Check if the proxy passes Cloudflare checks

When a request is made from the consumer's end, it goes through the proxy API to the website. For a website served through Cloudflare, the request reaches Cloudflare first, and Cloudflare decides whether to deliver the requested data. If the traffic from a proxy API seems suspicious to Cloudflare, it can deny access or start a challenge (click a checkbox, solve a CAPTCHA, or even get stuck in an endless loop).
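
A blocked or challenged response can often be recognised from its status code and body. The heuristic below is only a sketch: the status codes and text hints are assumptions, and challenge pages differ between sites and change over time, so tune both lists against responses you actually receive.

```python
# Assumed signals of a block or challenge page; adjust them to what you observe.
BLOCK_STATUS_CODES = {403, 429, 503}
CHALLENGE_HINTS = ("captcha", "just a moment", "verify you are human", "access denied")


def looks_blocked(status_code, body):
    """Guess whether a response is a block or challenge page instead of real content."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    text = body.lower()
    return any(hint in text for hint in CHALLENGE_HINTS)


def success_rate(responses):
    """Share of (status_code, body) pairs that do not look blocked."""
    responses = list(responses)
    if not responses:
        return 0.0
    passed = sum(1 for status, body in responses if not looks_blocked(status, body))
    return passed / len(responses)


# Offline demonstration with canned responses.
samples = [
    (200, "<h1>Product list</h1>"),
    (403, ""),
    (200, "<title>Just a moment...</title>"),
    (200, "<h1>Another product</h1>"),
]
print(success_rate(samples))  # 0.5
```

A text heuristic can misfire on pages that legitimately mention CAPTCHAs, so combine it with a check for the data you actually need.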

Unsuccessful requests billing check

Not all websites can be scraped using a general-purpose web scraping API (proxy API), as some highly protected websites might require a different set of proxy locations or proxy types. Still, a good proxy API service should detect unsuccessful requests and not bill for them.

You can test out such behavior by checking the billed usage in the dashboard of your web scraping provider.
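
To make the dashboard comparison concrete, you can keep your own tally while testing. This sketch assumes the provider bills one unit per successful request. Many providers use weighted credits instead, so adapt the arithmetic to your plan. The `UsageLog` class is a hypothetical helper, not part of any provider's SDK.

```python
from dataclasses import dataclass


@dataclass
class UsageLog:
    """Local tally of request outcomes to compare with the provider's dashboard."""
    succeeded: int = 0
    failed: int = 0

    def record(self, success):
        """Count one request as succeeded or failed."""
        if success:
            self.succeeded += 1
        else:
            self.failed += 1

    def overbilled_by(self, billed_requests):
        """Billed requests beyond the successful ones (assumes 1 unit per success)."""
        return max(0, billed_requests - self.succeeded)


# Offline demonstration: 3 successes and 2 failures recorded locally.
log = UsageLog()
for outcome in (True, True, False, True, False):
    log.record(outcome)

print(log.overbilled_by(billed_requests=5))  # 2
```

A non-zero result here would mean the dashboard shows more billed requests than you saw succeed, which is worth raising with the provider.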

Conclusion

A proxy API is incredibly useful when you're trying to get data from the web but don't want to manage your own proxy IP pools. Even when comparing rotating proxy services with web scraping APIs (proxy APIs), the latter are more efficient, as they provide the same or better level of IP diversity but don't bill for unsuccessful requests.

ScrapingAnt is one of the best web scraping APIs, as it covers all the aspects of web scraping by providing:

  • thousands of standard datacenter proxies
  • millions of premium residential proxies
  • unlimited concurrent requests
  • the cloud browser rendering feature, so a web page is opened as it would be in a real browser
  • a custom request success detection system, so you don't pay for unsuccessful requests

Happy Web Scraping, and don't forget to test your proxy API provider before committing to a paid subscription 🎯
