Getting blocked is the #1 problem in web scraping. Cloudflare, Akamai, PerimeterX, and DataDome have gotten incredibly sophisticated at detecting bots. But scraping at scale is still very possible — you just need the right techniques.
Why Websites Block Scrapers
Understanding the detection signals helps you avoid them:
- IP reputation — datacenter IPs are flagged instantly; residential IPs are trusted
- Request patterns — 1,000 requests/second from one IP is obviously not human
- Browser fingerprinting — headless browsers have detectable signatures
- TLS fingerprinting — JA3/JA4 hashes identify the HTTP client library
- Behavioral analysis — no mouse movements, instant page navigation, missing cookies
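To see why burst traffic is the easiest signal to flag, here is a minimal defender-side sketch of a sliding-window rate check. The thresholds and names (`MAX_REQUESTS`, `looks_like_bot`) are illustrative assumptions, not taken from any real anti-bot product — production systems combine this with the other signals above.

```python
import time
from collections import defaultdict, deque

# Illustrative threshold: flag an IP exceeding this many requests
# in the sliding window below.
MAX_REQUESTS = 30
WINDOW_SECONDS = 10.0

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def looks_like_bot(ip, now=None):
    """Sliding-window rate check: True once `ip` requests too fast."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    window.append(now)
    # Drop timestamps that have aged out of the window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS
```

A client averaging a few requests per second with jitter never trips this check; a burst of hundreds in one window trips it immediately — which is exactly what the rate-limiting techniques later in this post are designed to avoid.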
Level 1: Proxy Rotation
The foundation of anti-detection. Rotate IPs so no single address sends too many requests.
```python
import requests

# ZentisLabs rotating residential proxy — new IP per request
proxy = "http://USER:PASS@gate.zentislabs.com:7777"
proxies = {"http": proxy, "https": proxy}

response = requests.get(
    "https://target-site.com/products",
    proxies=proxies,
    timeout=15,
)
```
Rules for effective rotation:
- Use residential proxies for sites with anti-bot protection
- Use sticky sessions for multi-page workflows (login → navigate → scrape)
- Target specific countries/cities to match expected user geography
- Never use free proxy lists — they're burned and monitored
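Sticky sessions from the list above are typically requested by tagging the proxy username. The `-session-<id>` format and the helper below are assumptions for illustration — providers differ, so check your provider's docs for the exact syntax rather than treating this as documented ZentisLabs behavior.

```python
import uuid

def sticky_proxies(user, password, host="gate.zentislabs.com",
                   port=7777, session_id=None):
    """Build a proxies dict pinned to one exit IP for a whole workflow.

    Assumes the provider keeps the same exit IP for as long as the
    session tag in the username is unchanged; `-session-` is an
    illustrative convention, not a documented format.
    """
    session_id = session_id or uuid.uuid4().hex[:8]
    proxy = f"http://{user}-session-{session_id}:{password}@{host}:{port}"
    return {"http": proxy, "https": proxy}

# One sticky identity for a login -> navigate -> scrape workflow
proxies = sticky_proxies("USER", "PASS")
```

Pass the returned dict to `requests.get(..., proxies=proxies)` for every request in the workflow; generate a fresh `session_id` when you want a new identity.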
Level 2: Request Headers and User-Agents
Mismatched or missing headers are an instant red flag.
```python
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

headers = {
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
}

response = requests.get(url, headers=headers, proxies=proxies)
```
Level 3: Rate Limiting and Human-Like Timing
Perfectly even request spacing is as suspicious as raw speed, so randomize delays and add occasional long pauses.
```python
import time
import random

def human_delay():
    """Random delay between 1-4 seconds with occasional longer pauses."""
    if random.random() < 0.1:  # 10% chance of a longer pause
        time.sleep(random.uniform(5, 15))
    else:
        time.sleep(random.uniform(1, 4))

for url in urls_to_scrape:
    response = requests.get(url, proxies=proxies, headers=headers)
    process(response)
    human_delay()
```
Level 4: Browser Automation with Stealth
For JavaScript-heavy sites, you need a real browser. Playwright with a stealth plugin defeats most common fingerprinting checks.
```javascript
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

// Wrap in an async IIFE: CommonJS doesn't allow top-level await
(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://gate.zentislabs.com:7777',
      username: 'USER',
      password: 'PASS',
    },
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/122.0.0.0',
    locale: 'en-US',
    timezoneId: 'America/New_York',
  });

  const page = await context.newPage();
  await page.goto('https://target-site.com', { waitUntil: 'networkidle' });

  // Simulate human behavior
  await page.mouse.move(100, 200);
  await page.waitForTimeout(1500);

  const data = await page.evaluate(() => {
    return document.querySelector('.product-price')?.textContent;
  });
  await browser.close();
})();
```
Level 5: Session and Cookie Management
Websites track sessions. A request without cookies after "visiting" the homepage is suspicious.
```python
session = requests.Session()
session.proxies = proxies

# First visit the homepage to get cookies
session.get("https://target-site.com/", headers=headers)
time.sleep(2)

# Now requests carry the session cookies
response = session.get(
    "https://target-site.com/api/products", headers=headers
)
```
Level 6: CAPTCHA Handling
When you do hit a CAPTCHA, solve it and continue rather than abandoning the session.
- Use ZentisLabs CAPTCHA solver API for reCAPTCHA v2/v3, hCaptcha, and Turnstile
- Implement retry logic: detect CAPTCHA → solve → replay request
- Reduce CAPTCHA frequency by using residential proxies and proper fingerprinting
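The detect → solve → replay loop above can be sketched as provider-agnostic logic. The three callables are injected so any HTTP client and any solver (the ZentisLabs API or otherwise) can plug in; their names and signatures are assumptions for illustration, not a real solver SDK.

```python
def fetch_with_captcha_retry(fetch, is_captcha, solve, url, max_attempts=3):
    """Detect CAPTCHA -> solve -> replay, instead of abandoning the session.

    fetch(url, token=None) -> response body
    is_captcha(body)       -> True if the response is a CAPTCHA challenge
    solve(body)            -> token accepted by the target site
    """
    token = None
    for attempt in range(max_attempts):
        body = fetch(url, token=token)
        if not is_captcha(body):
            return body  # real content: done
        token = solve(body)  # solve the challenge, then replay
    raise RuntimeError(f"still blocked after {max_attempts} attempts")
```

Because the loop keeps the same session (and therefore the same cookies and, with a sticky proxy, the same IP), the replayed request looks like a human who just passed the challenge.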
Anti-Detection Checklist
| Technique | Impact | Difficulty |
|---|---|---|
| Residential proxies | High | Easy |
| Rotate User-Agents | Medium | Easy |
| Rate limiting | High | Easy |
| Browser automation | Very High | Medium |
| TLS fingerprint matching | High | Hard |
| CAPTCHA solving | High | Medium |
| Behavioral simulation | High | Hard |
🛡️ The sites that are hardest to scrape aren't using better technology — they're better at detecting lazy patterns. Match human behavior, use trusted IPs, and respect rate limits. ZentisLabs residential proxies with per-request rotation handle the IP layer automatically.
