🕷️ Advanced Website Crawler

Extract data from any website with our powerful, intelligent crawler. Custom selectors, real-time analysis, and comprehensive data extraction.

99.9% Uptime
30s Max Timeout

Powerful Features

Everything you need to extract data from websites efficiently and reliably

Custom CSS Selectors

Target specific elements with precise CSS selectors. Extract exactly the data you need from any webpage structure.

Lightning Fast

Optimized crawling engine with intelligent caching and parallel processing for maximum speed and efficiency.

Robust & Reliable

Handle dynamic content, JavaScript rendering, and anti-bot measures with advanced bypass techniques.

Rich Analytics

Comprehensive metrics including load times, response codes, content analysis, and performance insights.

Multi-Format Export

Export data in JSON, CSV, XML formats. Download or integrate directly with your applications via API.

Advanced Configuration

Custom headers, user agents, timeouts, redirects, and proxy support for maximum flexibility.
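As a rough sketch of how these options could be combined in a single request: the url, selectors, timeout, and extract* fields below match the Developer Examples further down this page, while the header, user-agent, redirect, and proxy field names are illustrative assumptions rather than documented parameters.

# Illustrative request payload; fields marked "assumption" are not documented parameters.
payload = {
    "url": "https://example.com",
    "selectors": {"title": "h1"},
    "extractHeadings": True,
    "timeout": 30000,                            # 30s, the default maximum
    "headers": {"Accept-Language": "en-US"},     # assumption: custom header support
    "userAgent": "MyCrawlerBot/1.0",             # assumption: user-agent override
    "followRedirects": True,                     # assumption: redirect handling
    "proxy": "http://proxy.example.com:8080",    # assumption: proxy support
}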

How to Use

Get started in just a few simple steps

1. Enter URL

Paste the website URL you want to crawl in the input field

2. Configure Options

Choose extraction options and add custom CSS selectors if needed

3. Start Crawling

Click the crawl button and watch the extracted data appear in real time

4. Export Data

Download your extracted data in multiple formats or use our API

Try the Crawler

Test our powerful crawler with your own URLs

🕷️ Website Crawler Interface

Extract data from any website with advanced options

🚀 Quick Test Examples

🎯 Custom CSS Selectors

Add custom CSS selectors to extract specific data from the webpage:

⚠️ Important: To use only custom selectors, go to the Basic Crawling tab and uncheck all default extraction options except "Parse HTML Structure".

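As a quick illustration (using the same selector syntax described in the FAQ below), custom selectors simply map a field name of your choice to any valid CSS selector:

# Field name of your choice -> any valid CSS selector
selectors = {
    "title": "h1",                  # all H1 headings
    "price": ".price",              # elements with class "price"
    "description": "#description",  # the element with ID "description"
}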

Open Source & Community Driven

Love our API? Star the project on GitHub, contribute to its development, report issues, or explore the source code. Join our growing community of developers!

Fork the Repository
Report Issues
Contribute to the Codebase

Developer Examples

Integrate our crawler API into your applications

PHP Request
// Endpoint and request payload: target URL, optional CSS selectors, extraction flags
$api_url = 'https://allwebcrawler.pro/api/crawler/crawl';

$data = [
    'url' => 'https://example.com',
    'selectors' => [
        'title' => 'h1',
        'description' => '.description'
    ],
    'extractHeadings' => true,
    'extractLinks' => true
];

// Send the payload as JSON via cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $api_url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json'
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Execute the request and decode the JSON response
$response = curl_exec($ch);
$result = json_decode($response, true);

curl_close($ch);

JavaScript Fetch
const crawlData = async (url, selectors = {}) => {
    const response = await fetch('/api/crawler/crawl', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            url: url,
            selectors: selectors,
            extractHeadings: true,
            extractLinks: true,
            extractImages: true,
            timeout: 30000
        })
    });
    
    const data = await response.json();
    return data;
};

// Usage
crawlData('https://example.com', {
    title: 'h1',
    price: '.price'
}).then(result => {
    console.log(result);
});

Python Requests
import requests
import json

def crawl_website(url, selectors=None):
    api_url = "https://allwebcrawler.pro/api/crawler/crawl"
    
    payload = {
        "url": url,
        "selectors": selectors or {},
        "extractHeadings": True,
        "extractLinks": True,
        "extractImages": True,
        "timeout": 30000
    }
    
    headers = {
        "Content-Type": "application/json"
    }
    
    response = requests.post(
        api_url, 
        data=json.dumps(payload), 
        headers=headers
    )
    
    return response.json()

# Usage
result = crawl_website(
    "https://example.com",
    {"title": "h1", "description": ".description"}
)
print(json.dumps(result, indent=2))

cURL Command
curl -X POST https://allwebcrawler.pro/api/crawler/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "selectors": {
      "title": "h1",
      "description": ".description",
      "price": ".price"
    },
    "extractHeadings": true,
    "extractLinks": true,
    "extractImages": true,
    "timeout": 30000
  }'

Frequently Asked Questions

Everything you need to know about our crawler

What kinds of websites can I crawl?

You can crawl virtually any public website including e-commerce sites, blogs, news websites, documentation pages, and more. Our crawler handles both static and dynamic content, including JavaScript-rendered pages.

How do custom CSS selectors work?

Custom CSS selectors allow you to target specific elements on a webpage. For example, 'h1' selects all H1 headings, '.price' selects elements with the class 'price', and '#description' selects the element with ID 'description'. You can use any valid CSS selector.

Are there any rate limits?

Currently, there are no strict rate limits for testing purposes. However, we recommend reasonable usage to ensure optimal performance. For production use, please contact us for enterprise plans with a guaranteed SLA.

Can I crawl password-protected pages?

For security reasons, we don't support crawling password-protected or authenticated pages. The crawler works with publicly accessible content only. For special use cases, custom authentication can be discussed for enterprise clients.

What format is the extracted data returned in?

The crawler returns data in JSON format by default. You can easily convert this to CSV, XML, or any other format using our client libraries or by processing the JSON response in your preferred programming language.

How long does a crawl take?

Crawling time depends on the website size and complexity. Most pages are processed within 5-15 seconds. The maximum timeout is set to 30 seconds by default, but you can configure this in the advanced options.
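
As noted above, the JSON response can be post-processed into other formats yourself. Below is a minimal Python sketch that reuses the crawl_website helper from the Developer Examples section and writes extracted rows to CSV; the location of the extracted items inside the response ("data" / "selectors") is an assumption here, so adapt the keys to the structure your crawl actually returns.

import csv

# Hypothetical response shape: adjust "data" and "selectors" to the real API response.
result = crawl_website("https://example.com", {"title": "h1", "price": ".price"})
rows = result.get("data", {}).get("selectors", [])   # assumed list of extracted records

with open("crawl_results.csv", "w", newline="") as f:
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)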