Web Scraping with Playwright in 2026: The Complete Guide

Playwright has become a go-to browser automation framework for production web scraping. It drives Chromium, Firefox, and WebKit through a single API, handles dynamic content natively, and has built-in proxy support. This guide covers basic setup, stealth configuration, bandwidth savings, concurrent scraping, error handling, and Docker deployment, with ZentisLabs proxies used throughout the examples.

Why Playwright in 2026?

Multi-browser: Chromium, Firefox, and WebKit from one codebase
Auto-wait: Built-in actionability checks wait for elements to be ready, which reduces flaky selectors and manual sleeps
Network interception: Block images, fonts, and trackers to save bandwidth
Stealth-ready: Plugins such as playwright-stealth (Python) or playwright-extra (Node.js) help evade many common bot-detection checks
Native proxy support: Per-context proxy configuration with authentication

Basic Setup with Proxies

bash

# Install Playwright
pip install playwright
playwright install chromium

python

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            "server": "http://gate.zentislabs.com:7777",
            "username": "USER",
            "password": "PASS",
        }
    )
    page = browser.new_page()
    page.goto("https://httpbin.org/ip")
    print(page.text_content("body"))
    browser.close()
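
Hardcoding USER and PASS is fine for a first test, but a deployed scraper usually reads credentials from the environment. The sketch below shows one way to do that. The variable names PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD and both helper functions are assumptions made for this example, not part of Playwright or the ZentisLabs setup.

```python
import os


def build_proxy_config(env=None):
    """Build the dict that Playwright's `proxy=` option expects.

    PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD are hypothetical
    variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    server = env.get("PROXY_SERVER", "http://gate.zentislabs.com:7777")
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set PROXY_USERNAME and PROXY_PASSWORD before launching")
    return {"server": server, "username": username, "password": password}


def fetch_exit_ip():
    """Launch Chromium through the proxy and return what httpbin reports."""
    # Imported lazily so build_proxy_config() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=build_proxy_config())
        try:
            page = browser.new_page()
            page.goto("https://httpbin.org/ip")
            return page.text_content("body")
        finally:
            # Close the browser even if navigation fails.
            browser.close()
```

Calling fetch_exit_ip() twice and comparing the results is a quick way to confirm that traffic really leaves through the proxy.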

Stealth Configuration

Modern anti-bot systems like Cloudflare, DataDome, and PerimeterX check browser fingerprints. Here's how to configure Playwright to look like a real user:

python

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,  # Headed mode passes more checks
        proxy={"server": "http://gate.zentislabs.com:7777",
               "username": "USER", "password": "PASS"},
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
        ]
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
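        # Keep the Chrome version in this string close to the Chromium build you run;
        # a mismatch between the two is itself a detectable signal
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",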
        locale="en-US",
        timezone_id="America/New_York",
    )
    # Remove webdriver property
    context.add_init_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    page = context.new_page()
    page.goto("https://target-site.com")
    print(page.title())
    browser.close()
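
Fingerprint checks tend to look at whether settings agree with each other, so it can help to keep locale and timezone paired rather than setting them independently. The helper below is a minimal sketch of that idea. The user agent is the one from the example above, while the three locale/timezone pairs and the context_options name are illustrative assumptions, not a vetted fingerprint set.

```python
import random

# Same user agent as the example above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
)

# Illustrative pairs only; ideally these also match the proxy's exit country.
REGION_PROFILES = [
    {"locale": "en-US", "timezone_id": "America/New_York"},
    {"locale": "en-GB", "timezone_id": "Europe/London"},
    {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
]


def context_options(rng=None):
    """Return keyword arguments for browser.new_context() with a matched
    locale and timezone picked from REGION_PROFILES."""
    rng = rng or random
    profile = rng.choice(REGION_PROFILES)
    return {
        "viewport": {"width": 1920, "height": 1080},
        "user_agent": USER_AGENT,
        **profile,
    }
```

It would be used as context = browser.new_context(**context_options()) in place of the hand-written arguments.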

Saving Bandwidth

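Block unnecessary resources to reduce proxy bandwidth consumption, often by 40-70% depending on how media-heavy the target pages are. Blocking stylesheets can change how a page renders, so drop "stylesheet" from the list if your selectors depend on layout or visibility: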

python

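from playwright.sync_api import sync_playwright

BLOCKED_TYPES = {"image", "font", "stylesheet", "media"}
TRACKERS = ("google-analytics", "facebook", "doubleclick")

# Abort requests for images, fonts, stylesheets, media, and known trackers
def block_resources(route):
    request = route.request
    if request.resource_type in BLOCKED_TYPES or any(d in request.url for d in TRACKERS):
        route.abort()
    else:
        route.continue_()

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Register the handler before navigating so every request passes through it
    page.route("**/*", block_resources)
    page.goto("https://target-site.com")
    print(page.title())
    browser.close()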
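
Matching tracker names as substrings of the whole URL can also block unrelated pages, for example an article whose path happens to contain "facebook". One possible refinement, sketched below, is to compare against the request's hostname instead and to keep the decision in a pure function that can be unit tested without a browser. The domain list and the should_block and host_matches names are assumptions for illustration.

```python
from urllib.parse import urlparse

BLOCKED_TYPES = {"image", "font", "stylesheet", "media"}

# Illustrative tracker domains; extend or replace for your targets.
TRACKER_DOMAINS = (
    "google-analytics.com",
    "doubleclick.net",
    "facebook.com",
    "facebook.net",
)


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def should_block(resource_type, url):
    """Decide whether a request should be aborted."""
    if resource_type in BLOCKED_TYPES:
        return True
    host = (urlparse(url).hostname or "").lower()
    return any(host_matches(host, d) for d in TRACKER_DOMAINS)


def block_resources(route):
    """Playwright route handler that delegates to should_block()."""
    if should_block(route.request.resource_type, route.request.url):
        route.abort()
    else:
        route.continue_()
```

The handler is registered the same way as before, with page.route("**/*", block_resources).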

Scaling with Concurrency

Use async Playwright with multiple browser contexts for concurrent scraping:

python

import asyncio
from playwright.async_api import async_playwright

async def scrape_url(browser, url):
    # One context per URL gives each page isolated cookies and its own proxy session
    context = await browser.new_context(
        proxy={"server": "http://gate.zentislabs.com:7777",
               "username": "USER", "password": "PASS"}
    )
    try:
        page = await context.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        return {"url": url, "title": await page.title()}
    finally:
        # Close the context even if navigation fails, so failed pages do not leak
        await context.close()

async def main():
    urls = ["https://example.com/page/" + str(i) for i in range(100)]
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": "http://gate.zentislabs.com:7777",
                   "username": "USER", "password": "PASS"}
        )
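        # Process 10 pages at a time
        for batch in [urls[i:i+10] for i in range(0, len(urls), 10)]:
            # return_exceptions=True keeps one failed page from aborting the whole batch;
            # failed entries come back as exception objects instead of dicts
            results = await asyncio.gather(
                *[scrape_url(browser, u) for u in batch], return_exceptions=True
            )
            for r in results:
                print(r)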
        await browser.close()

asyncio.run(main())
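
Fixed batches wait for the slowest page in each group before starting the next one. A common alternative is to cap the number of pages in flight with asyncio.Semaphore, so a new page starts as soon as any slot frees up. The sketch below shows the pattern with a stand-in coroutine instead of a real browser. The names gather_bounded, fake_scrape, and demo are hypothetical and exist only for this example.

```python
import asyncio


async def gather_bounded(factories, limit=10):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results come back in input order; failures are returned as exception
    objects rather than raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories), return_exceptions=True)


async def demo():
    """Exercise gather_bounded with a stand-in for a page scrape."""
    in_flight = 0
    peak = 0

    async def fake_scrape(i):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for page.goto()
        in_flight -= 1
        return i

    factories = [lambda i=i: fake_scrape(i) for i in range(25)]
    results = await gather_bounded(factories, limit=5)
    return results, peak
```

With the scraper above, the factories would be built as functools.partial(scrape_url, browser, u) for each URL.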

Production Error Handling

Retry blocked or timed-out requests in a fresh context, so that each attempt goes out through a new proxy IP:

python

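# Alias Playwright's exceptions so TimeoutError does not shadow the Python builtin
from playwright.async_api import Error as PlaywrightError
from playwright.async_api import TimeoutError as PlaywrightTimeoutError

async def scrape_with_retry(browser, url, max_retries=3):
    for attempt in range(max_retries):
        # A fresh context per attempt picks up a new IP from the rotating pool
        context = await browser.new_context(
            proxy={"server": "http://gate.zentislabs.com:7777",
                   "username": "USER", "password": "PASS"}
        )
        try:
            page = await context.new_page()
            response = await page.goto(url, timeout=30000)
            if response and response.status == 403:
                print(f"Blocked on attempt {attempt + 1}")
                continue
            return await page.content()
        except PlaywrightTimeoutError:
            print(f"Timeout on attempt {attempt + 1}")
        except PlaywrightError as exc:
            # Covers proxy and network failures such as refused connections
            print(f"Navigation error on attempt {attempt + 1}: {exc}")
        finally:
            # Runs on success, failure, and continue alike
            await context.close()
    # All attempts failed
    return None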
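
Retrying immediately after a block or timeout tends to hit the same rate limit again. A widely used remedy is exponential backoff with jitter: wait a random amount of time whose upper bound doubles on every attempt. The helper below is a sketch of the "full jitter" variant. The backoff_delay name and its default values are assumptions for this example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=None):
    """Return a delay in seconds for a zero-based attempt number.

    The delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so concurrent workers do not all retry at the same moment.
    """
    rng = rng or random
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

Inside a retry loop this would be used as await asyncio.sleep(backoff_delay(attempt)) before starting the next attempt.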

Running in Docker

dockerfile

FROM mcr.microsoft.com/playwright/python:v1.42.0-jammy

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

CMD ["python", "scraper.py"]
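
The image tag pins a Playwright release (v1.42.0 here), and the playwright package installed from requirements.txt generally needs to be the same version, otherwise the library may not find the browser builds it expects inside the image. A small startup check can catch a mismatch early. The sketch below assumes the tag format shown above; the function names are hypothetical.

```python
import re


def image_playwright_version(image_ref):
    """Extract '1.42.0' from a reference such as '...python:v1.42.0-jammy'."""
    match = re.search(r":v(\d+\.\d+\.\d+)", image_ref)
    if match is None:
        raise ValueError(f"No pinned Playwright version in {image_ref!r}")
    return match.group(1)


def versions_match(image_ref, installed_version):
    """True if the image tag and the installed package agree."""
    return image_playwright_version(image_ref) == installed_version


def installed_playwright_version():
    """Read the installed package version from its metadata."""
    from importlib.metadata import version

    return version("playwright")
```

At container start, versions_match(image_ref, installed_playwright_version()) returning False is a signal to align the pin in requirements.txt with the image tag.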

ZentisLabs residential proxies rotate IPs automatically: each new browser context gets a fresh IP. For sticky sessions (multi-page flows), append _session-xyz to your password.
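
For a multi-page flow, the session suffix can be added to the proxy settings of a single context so that every page in that context shares one IP. The helper below is a minimal sketch that follows the _session-xyz pattern described above. The sticky_proxy name and the use of a random eight-character ID are assumptions, and the characters the provider accepts in a session ID are not confirmed here.

```python
import uuid


def sticky_proxy(base_proxy, session_id=None):
    """Return a copy of base_proxy whose password carries a session suffix."""
    # Assumption: any short alphanumeric ID is acceptable as a session name.
    session_id = session_id or uuid.uuid4().hex[:8]
    proxy = dict(base_proxy)
    proxy["password"] = f"{base_proxy['password']}_session-{session_id}"
    return proxy
```

It would be passed as browser.new_context(proxy=sticky_proxy(base)), with all pages of the flow opened from that one context.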
