🕷️ Advanced Website Crawler

Extract data from any website with our powerful, intelligent crawler. Custom selectors, real-time analysis, and comprehensive data extraction.

99.9% Uptime
30s Max Timeout

Powerful Features

Everything you need to extract data from websites efficiently and reliably

Custom CSS Selectors

Target specific elements with precise CSS selectors. Extract exactly the data you need from any webpage structure.

Lightning Fast

Optimized crawling engine with intelligent caching and parallel processing for maximum speed and efficiency.

Robust & Reliable

Handle dynamic content, JavaScript rendering, and anti-bot measures with advanced bypass techniques.

Rich Analytics

Comprehensive metrics including load times, response codes, content analysis, and performance insights.

Multi-Format Export

Export data in JSON, CSV, XML formats. Download or integrate directly with your applications via API.

Advanced Configuration

Custom headers, user agents, timeouts, redirects, and proxy support for maximum flexibility.
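As a rough sketch of how these options could be combined in a single request: the url, selectors, timeout, and extract* fields below match the Developer Examples further down this page, while the header, user-agent, redirect, and proxy field names are illustrative assumptions rather than documented parameters.

# Illustrative request payload; fields marked "assumption" are not documented parameters.
payload = {
    "url": "https://example.com",
    "selectors": {"title": "h1"},
    "extractHeadings": True,
    "timeout": 30000,                            # 30s, the default maximum
    "headers": {"Accept-Language": "en-US"},     # assumption: custom header support
    "userAgent": "MyCrawlerBot/1.0",             # assumption: user-agent override
    "followRedirects": True,                     # assumption: redirect handling
    "proxy": "http://proxy.example.com:8080",    # assumption: proxy support
}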

How to Use

Get started in just a few simple steps

1. Enter URL

Paste the website URL you want to crawl in the input field

2. Configure Options

Choose extraction options and add custom CSS selectors if needed

3. Start Crawling

Click the crawl button and watch the extracted data appear in real time

4. Export Data

Download your extracted data in multiple formats or use our API

Try the Crawler

Test our powerful crawler with your own URLs

🕷️ Website Crawler Interface

Extract data from any website with advanced options

🚀 Quick Test Examples

🎯 Custom CSS Selectors

Add custom CSS selectors to extract specific data from the webpage:

⚠️ Important: To use only custom selectors, go to the Basic Crawling tab and uncheck all default extraction options except "Parse HTML Structure".

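As a quick illustration (using the same selector syntax described in the FAQ below), custom selectors simply map a field name of your choice to any valid CSS selector:

# Field name of your choice -> any valid CSS selector
selectors = {
    "title": "h1",                  # all H1 headings
    "price": ".price",              # elements with class "price"
    "description": "#description",  # the element with ID "description"
}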

Open Source & Community Driven

Love our API? Star the project on GitHub, contribute to its development, report issues, or explore the source code. Join our growing community of developers!

Fork the Repository
Report Issues
Contribute to the Codebase

Developer Examples

Integrate our crawler API into your applications

PHP Request
// Endpoint and request payload: target URL, optional CSS selectors, extraction flags
$api_url = 'https://allwebcrawler.pro/api/crawler/crawl';

$data = [
    'url' => 'https://example.com',
    'selectors' => [
        'title' => 'h1',
        'description' => '.description'
    ],
    'extractHeadings' => true,
    'extractLinks' => true
];

// Send the payload as JSON via cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $api_url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json'
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Execute the request and decode the JSON response
$response = curl_exec($ch);
$result = json_decode($response, true);

curl_close($ch);

JavaScript Fetch
const crawlData = async (url, selectors = {}) => {
    const response = await fetch('/api/crawler/crawl', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            url: url,
            selectors: selectors,
            extractHeadings: true,
            extractLinks: true,
            extractImages: true,
            timeout: 30000
        })
    });
    
    const data = await response.json();
    return data;
};

// Usage
crawlData('https://example.com', {
    title: 'h1',
    price: '.price'
}).then(result => {
    console.log(result);
});

Python Requests
import requests
import json

def crawl_website(url, selectors=None):
    api_url = "https://allwebcrawler.pro/api/crawler/crawl"
    
    payload = {
        "url": url,
        "selectors": selectors or {},
        "extractHeadings": True,
        "extractLinks": True,
        "extractImages": True,
        "timeout": 30000
    }
    
    headers = {
        "Content-Type": "application/json"
    }
    
    response = requests.post(
        api_url, 
        data=json.dumps(payload), 
        headers=headers
    )
    
    return response.json()

# Usage
result = crawl_website(
    "https://example.com",
    {"title": "h1", "description": ".description"}
)
print(json.dumps(result, indent=2))

cURL Command
curl -X POST https://allwebcrawler.pro/api/crawler/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "selectors": {
      "title": "h1",
      "description": ".description",
      "price": ".price"
    },
    "extractHeadings": true,
    "extractLinks": true,
    "extractImages": true,
    "timeout": 30000
  }'

Frequently Asked Questions

Everything you need to know about our crawler

What kinds of websites can I crawl?

You can crawl virtually any public website including e-commerce sites, blogs, news websites, documentation pages, and more. Our crawler handles both static and dynamic content, including JavaScript-rendered pages.

How do custom CSS selectors work?

Custom CSS selectors allow you to target specific elements on a webpage. For example, 'h1' selects all H1 headings, '.price' selects elements with the class 'price', and '#description' selects the element with ID 'description'. You can use any valid CSS selector.

Are there any rate limits?

Currently, there are no strict rate limits for testing purposes. However, we recommend reasonable usage to ensure optimal performance. For production use, please contact us for enterprise plans with a guaranteed SLA.

Can I crawl password-protected pages?

For security reasons, we don't support crawling password-protected or authenticated pages. The crawler works with publicly accessible content only. For special use cases, custom authentication can be discussed for enterprise clients.

What format is the extracted data returned in?

The crawler returns data in JSON format by default. You can easily convert this to CSV, XML, or any other format using our client libraries or by processing the JSON response in your preferred programming language.

How long does a crawl take?

Crawling time depends on the website size and complexity. Most pages are processed within 5-15 seconds. The maximum timeout is set to 30 seconds by default, but you can configure this in the advanced options.
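
As noted above, the JSON response can be post-processed into other formats yourself. Below is a minimal Python sketch that reuses the crawl_website helper from the Developer Examples section and writes extracted rows to CSV; the location of the extracted items inside the response ("data" / "selectors") is an assumption here, so adapt the keys to the structure your crawl actually returns.

import csv

# Hypothetical response shape: adjust "data" and "selectors" to the real API response.
result = crawl_website("https://example.com", {"title": "h1", "price": ".price"})
rows = result.get("data", {}).get("selectors", [])   # assumed list of extracted records

with open("crawl_results.csv", "w", newline="") as f:
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)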