Getting blocked is the #1 problem in web scraping. Cloudflare, Akamai, PerimeterX, and DataDome have gotten incredibly sophisticated at detecting bots. But scraping at scale is still very possible — you just need the right techniques.
Why Websites Block Scrapers
Understanding the detection signals helps you avoid them:
- IP reputation — datacenter IPs are flagged instantly; residential IPs are trusted
- Request patterns — 1,000 requests/second from one IP is obviously not human
- Browser fingerprinting — headless browsers have detectable signatures
- TLS fingerprinting — JA3/JA4 hashes identify the HTTP client library
- Behavioral analysis — no mouse movements, instant page navigation, missing cookies
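To see why burst traffic is the easiest signal to flag, here is a minimal defender-side sketch of a sliding-window rate check. The thresholds and names (`MAX_REQUESTS`, `looks_like_bot`) are illustrative assumptions, not taken from any real anti-bot product — production systems combine this with the other signals above.

```python
import time
from collections import defaultdict, deque

# Illustrative threshold: flag an IP exceeding this many requests
# in the sliding window below.
MAX_REQUESTS = 30
WINDOW_SECONDS = 10.0

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def looks_like_bot(ip, now=None):
    """Sliding-window rate check: True once `ip` requests too fast."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    window.append(now)
    # Drop timestamps that have aged out of the window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS
```

A client averaging a few requests per second with jitter never trips this check; a burst of hundreds in one window trips it immediately — which is exactly what the rate-limiting techniques later in this post are designed to avoid.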
Level 1: Proxy Rotation
The foundation of anti-detection. Rotate IPs so no single address sends too many requests.
```python
import requests

# ZentisLabs rotating residential proxy — new IP per request
proxy = "http://USER:PASS@gate.zentislabs.com:7777"
proxies = {"http": proxy, "https": proxy}

response = requests.get(
    "https://target-site.com/products",
    proxies=proxies,
    timeout=15,
)
```
Rules for effective rotation:
- Use residential proxies for sites with anti-bot protection
- Use sticky sessions for multi-page workflows (login → navigate → scrape)
- Target specific countries/cities to match expected user geography
- Never use free proxy lists — they're burned and monitored
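Sticky sessions from the list above are typically requested by tagging the proxy username. The `-session-<id>` format and the helper below are assumptions for illustration — providers differ, so check your provider's docs for the exact syntax rather than treating this as documented ZentisLabs behavior.

```python
import uuid

def sticky_proxies(user, password, host="gate.zentislabs.com",
                   port=7777, session_id=None):
    """Build a proxies dict pinned to one exit IP for a whole workflow.

    Assumes the provider keeps the same exit IP for as long as the
    session tag in the username is unchanged; `-session-` is an
    illustrative convention, not a documented format.
    """
    session_id = session_id or uuid.uuid4().hex[:8]
    proxy = f"http://{user}-session-{session_id}:{password}@{host}:{port}"
    return {"http": proxy, "https": proxy}

# One sticky identity for a login -> navigate -> scrape workflow
proxies = sticky_proxies("USER", "PASS")
```

Pass the returned dict to `requests.get(..., proxies=proxies)` for every request in the workflow; generate a fresh `session_id` when you want a new identity.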
Level 2: Request Headers and User-Agents
Mismatched or missing headers are an instant red flag.
```python
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

headers = {
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
}

response = requests.get(url, headers=headers, proxies=proxies)
```
Level 3: Rate Limiting and Human-Like Timing
Perfectly even request spacing is as suspicious as raw speed, so randomize delays and add occasional long pauses.
```python
import time
import random

def human_delay():
    """Random delay between 1-4 seconds with occasional longer pauses."""
    if random.random() < 0.1:  # 10% chance of a longer pause
        time.sleep(random.uniform(5, 15))
    else:
        time.sleep(random.uniform(1, 4))

for url in urls_to_scrape:
    response = requests.get(url, proxies=proxies, headers=headers)
    process(response)
    human_delay()
```
Level 4: Browser Automation with Stealth
For JavaScript-heavy sites, you need a real browser. Playwright with a stealth plugin defeats most common fingerprinting checks.
```javascript
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

// Wrap in an async IIFE: CommonJS doesn't allow top-level await
(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://gate.zentislabs.com:7777',
      username: 'USER',
      password: 'PASS',
    },
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/122.0.0.0',
    locale: 'en-US',
    timezoneId: 'America/New_York',
  });

  const page = await context.newPage();
  await page.goto('https://target-site.com', { waitUntil: 'networkidle' });

  // Simulate human behavior
  await page.mouse.move(100, 200);
  await page.waitForTimeout(1500);

  const data = await page.evaluate(() => {
    return document.querySelector('.product-price')?.textContent;
  });
  await browser.close();
})();
```
Level 5: Session and Cookie Management
Websites track sessions. A request without cookies after "visiting" the homepage is suspicious.
```python
session = requests.Session()
session.proxies = proxies

# First visit the homepage to get cookies
session.get("https://target-site.com/", headers=headers)
time.sleep(2)

# Now requests carry the session cookies
response = session.get(
    "https://target-site.com/api/products", headers=headers
)
```
Level 6: CAPTCHA Handling
When you do hit a CAPTCHA, solve it and continue rather than abandoning the session.
- Use ZentisLabs CAPTCHA solver API for reCAPTCHA v2/v3, hCaptcha, and Turnstile
- Implement retry logic: detect CAPTCHA → solve → replay request
- Reduce CAPTCHA frequency by using residential proxies and proper fingerprinting
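The detect → solve → replay loop above can be sketched as provider-agnostic logic. The three callables are injected so any HTTP client and any solver (the ZentisLabs API or otherwise) can plug in; their names and signatures are assumptions for illustration, not a real solver SDK.

```python
def fetch_with_captcha_retry(fetch, is_captcha, solve, url, max_attempts=3):
    """Detect CAPTCHA -> solve -> replay, instead of abandoning the session.

    fetch(url, token=None) -> response body
    is_captcha(body)       -> True if the response is a CAPTCHA challenge
    solve(body)            -> token accepted by the target site
    """
    token = None
    for attempt in range(max_attempts):
        body = fetch(url, token=token)
        if not is_captcha(body):
            return body  # real content: done
        token = solve(body)  # solve the challenge, then replay
    raise RuntimeError(f"still blocked after {max_attempts} attempts")
```

Because the loop keeps the same session (and therefore the same cookies and, with a sticky proxy, the same IP), the replayed request looks like a human who just passed the challenge.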
Anti-Detection Checklist
| Technique | Impact | Difficulty |
|---|---|---|
| Residential proxies | High | Easy |
| Rotate User-Agents | Medium | Easy |
| Rate limiting | High | Easy |
| Browser automation | Very High | Medium |
| TLS fingerprint matching | High | Hard |
| CAPTCHA solving | High | Medium |
| Behavioral simulation | High | Hard |
🛡️ The sites that are hardest to scrape aren't using better technology — they're better at detecting lazy patterns. Match human behavior, use trusted IPs, and respect rate limits. ZentisLabs residential proxies with per-request rotation handle the IP layer automatically.
