Methodology
How I test CAPTCHA services, what equipment I use, and what biases I am aware of.
Every review on this site is based on a test I actually ran. This page describes what those tests look like, what the rating numbers mean, and where my method has gaps.
The setup
Tests run from a normal home environment unless I say otherwise.
- Location. Western Europe. I will say if I test from elsewhere.
- Networks. A residential connection by default. For services that behave differently on commercial IPs, I add runs through a consumer VPN (Mullvad), a datacentre IP (a small VPS), and a mobile carrier connection (tethered).
- Browsers. Firefox and Chromium as a baseline. I add Safari when a service depends on browser quirks.
- Devices. A current Mac for desktop. A mid-range Android phone for mobile. No headless browsers in scoring (those are covered separately as their own test class).
- Account tier. I use the free tier first. Paid tiers are tested when the free tier is too limited to draw a useful conclusion. I always say which tier I used.
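The setup above amounts to a small test matrix: a baseline that always runs, plus extra networks and browsers that are added only when a service calls for them. A minimal sketch of that idea follows. The labels come from the list above, but the function name `build_test_matrix` and its two flags are hypothetical, and this is an illustration rather than the tooling behind the reviews.

```python
from itertools import product

# Labels mirror the setup list; the data layout itself is an assumption.
BASELINE_NETWORK = "residential"
EXTRA_NETWORKS = ["consumer-vpn", "datacentre", "mobile-tethered"]
BASELINE_BROWSERS = ["firefox", "chromium"]
EXTRA_BROWSERS = ["safari"]


def build_test_matrix(ip_sensitive: bool, browser_sensitive: bool) -> list:
    """Return every (network, browser) pair that a review would cover.

    ip_sensitive: the service behaves differently on commercial IPs.
    browser_sensitive: the service depends on browser quirks.
    """
    networks = [BASELINE_NETWORK] + (EXTRA_NETWORKS if ip_sensitive else [])
    browsers = BASELINE_BROWSERS + (EXTRA_BROWSERS if browser_sensitive else [])
    return list(product(networks, browsers))


if __name__ == "__main__":
    for network, browser in build_test_matrix(ip_sensitive=True, browser_sensitive=False):
        print(f"{network:16} {browser}")
```

With both flags off the matrix holds only the two baseline runs. Turning both on gives four networks times three browsers.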
What I record
For each test:
- The date I ran it.
- The browser, OS, and IP type.
- What I tried to do (sign up, post a form, integrate a widget, send a request, etc.).
- What happened, including the things that did not work and the time I spent fighting them.
Screenshots are taken at the time of the test. If I update an old review, the new screenshots get a new date in the caption.
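The fields listed above map naturally onto a small record type. The sketch below is one possible shape, assuming a Python dataclass. The class name `RunRecord`, the field names, and every value in the example are made-up placeholders, not data from a real test.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class RunRecord:
    """One test run, holding the items listed under 'What I record'."""
    run_date: date       # the date the test was run
    browser: str         # e.g. "firefox"
    os: str              # e.g. "macos" or "android"
    ip_type: str         # e.g. "residential", "datacentre"
    goal: str            # what was attempted (sign up, post a form, ...)
    outcome: str         # what actually happened
    failures: list = field(default_factory=list)  # things that did not work
    minutes_fighting: int = 0                     # time spent on those failures


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = RunRecord(
        run_date=date(2024, 1, 15),
        browser="firefox",
        os="macos",
        ip_type="residential",
        goal="integrate the widget on a test page",
        outcome="widget rendered after a config fix",
        failures=["first site key was rejected"],
        minutes_fighting=20,
    )
    print(asdict(record))
```

Keeping failures and lost time as their own fields makes it harder to leave them out of the write-up.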
What the rating means
The number at the bottom of a review is a 1–10 score. It is not a star rating in disguise. It is a summary of how I would describe the service to a friend who asked. Roughly:
- 9–10 — I would recommend this without hesitation for the use case the post is about.
- 7–8 — I would recommend this, with caveats noted in the post.
- 5–6 — There are real reasons to pick it and real reasons not to.
- 3–4 — I would not pick it for the use case the post is about.
- 1–2 — Avoid.
The score is one number summarising a few axes. The post itself is the real review; the number is the headline.
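For anyone consuming the scores programmatically, the bands above reduce to a simple lookup. This is a minimal sketch. The function name `describe_score` is hypothetical and the returned strings are shortened paraphrases of the bands, not official labels.

```python
def describe_score(score: int) -> str:
    """Map a 1-10 review score to its band, as listed above."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 9:
        return "recommend without hesitation"
    if score >= 7:
        return "recommend with caveats"
    if score >= 5:
        return "real reasons for and against"
    if score >= 3:
        return "would not pick for this use case"
    return "avoid"


if __name__ == "__main__":
    for s in (10, 7, 5, 4, 1):
        print(s, "->", describe_score(s))
```

Each band applies to the use case the post is about, so the same service can land in different bands in different reviews.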
The axes (provisional)
These are the things I weigh. I will tighten the exact weighting as more reviews go up.
- Privacy. What data the service collects, where it goes, and whether it is honest about it.
- Accessibility. Keyboard support, screen reader support, and alternatives for users who fail the primary challenge.
- Bypass resistance. How easily a CAPTCHA-solving service or a competent script can get past it. Where I can measure this with public solvers, I do.
- Integration friction. How long it took to get a working test page up, how clear the documentation was, how often I had to read the source.
- Pricing honesty. Whether the public pricing matches what you actually pay, whether there are surprise tiers, whether the free tier is real.
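The weighting is provisional and the final score is a judgement, not a formula. Still, if the weights were ever written down, one common way to fold several axes into a single number is a weighted average. The sketch below assumes each axis is scored 1 to 10 with higher meaning better (so a high `integration_friction` score means little friction). The equal default weights and the function name `combine_axes` are assumptions made for illustration.

```python
from typing import Dict, Optional

AXES = (
    "privacy",
    "accessibility",
    "bypass_resistance",
    "integration_friction",
    "pricing_honesty",
)


def combine_axes(axis_scores: Dict[str, float],
                 weights: Optional[Dict[str, float]] = None) -> int:
    """Weighted average of the five axes, rounded and clamped to 1-10.

    Equal weights are a placeholder assumption, not the site's weighting.
    """
    if weights is None:
        weights = {axis: 1.0 for axis in AXES}
    missing = [axis for axis in AXES if axis not in axis_scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    total_weight = sum(weights[axis] for axis in AXES)
    weighted = sum(axis_scores[axis] * weights[axis] for axis in AXES) / total_weight
    return max(1, min(10, round(weighted)))


if __name__ == "__main__":
    scores = {
        "privacy": 10,
        "accessibility": 4,
        "bypass_resistance": 4,
        "integration_friction": 4,
        "pricing_honesty": 4,
    }
    print(combine_axes(scores))
    heavier_privacy = {axis: 1.0 for axis in AXES}
    heavier_privacy["privacy"] = 3.0
    print(combine_axes(scores, heavier_privacy))
```

Changing a single weight can move the headline number by a band or more, which is one reason to publish the weights once they settle.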
Biases I know I have
- I prefer privacy-respecting services. I try to score this fairly, but my preference shows up in which services I cover first.
- I prefer self-hostable software. Same caveat.
- I am one person. I have one home network, one set of devices, one set of accounts. Results from a corporate IP block or a different country will differ. When I can, I run extra tests; when I cannot, I say so.
- I am writing in English about services that mostly market in English. Coverage of Chinese, Russian, and Indic-market services will be weaker until I find ways to test them properly.
If any of this is wrong or unclear, tell me.