cloudscraper

Rating

Similar

osp_marketing_tools

scriptbyai com

oxylabs-mcp

mcp-server

semanticworkbench

branch-thinking-mcp

Information

# cloudscraper - Enhanced Edition [![PyPI version](https://badge.fury.io/py/cloudscraper.svg)](https://badge.fury.io/py/cloudscraper) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![image](https://img.shields.io/pypi/pyversions/cloudscraper.svg)](https://pypi.org/project/cloudscraper/) **Enhanced by [Zied Boughdir](https://github.com/zinzied)** ## Latest Release: v3.0.0 - Major Upgrade ### Major New Features in v3.0.0 - **️ Automatic 403 Error Recovery** - Intelligent session refresh when 403 errors occur after prolonged use - ** Session Health Monitoring** - Proactive session management with configurable refresh intervals - ** Smart Session Refresh** - Automatic cookie clearing and fingerprint rotation - ** Enhanced Stealth Mode** - Improved anti-detection with human-like behavior simulation - ** Modern Python Support** - Python 3.8+ with latest dependency versions - ** Modern Packaging** - Uses pyproject.toml and modern build tools - ** Comprehensive Testing** - New test suite with pytest and CI/CD integration - ** Performance Improvements** - Optimized code with better error handling ### Breaking Changes - **Minimum Python version**: Now requires Python 3.8+ - **Updated dependencies**: All dependencies upgraded to latest stable versions - **Removed legacy code**: Cleaned up Python 2 compatibility code ### Previous Features (Still Available) - **Executable Compatibility Fix** - Complete solution for PyInstaller, cx_Freeze, and auto-py-to-exe conversion - **Cloudflare v3 JavaScript VM Challenge Support** - Handle the latest and most sophisticated Cloudflare protection - **Cloudflare Turnstile Challenge Support** - Support for Cloudflare's CAPTCHA alternative - **Enhanced JavaScript Interpreter Support** - Improved VM-based challenge execution - **Complete Protection Coverage** - Now supports all Cloudflare challenge types (v1, v2, v3, Turnstile) ### Improvements - ** Fixed User Agent Issues in Executables** - Automatic fallback system for missing browsers.json - **️ PyInstaller Detection** - Automatically detects and handles executable environments - ** Comprehensive Fallback System** - 70+ built-in user agents covering all platforms - Enhanced proxy rotation and stealth mode capabilities - Better detection and handling of modern Cloudflare protection mechanisms - Improved compatibility with all JavaScript interpreters (js2py, nodejs, native) - Updated documentation with comprehensive examples ### Test Results All features tested with **100% success rate** for core functionality: - Basic requests: 100% pass rate - User agent handling: 100% pass rate - Cloudflare v1 challenges: 100% pass rate - Cloudflare v2 challenges: 100% pass rate - Cloudflare v3 challenges: 100% pass rate - Stealth mode: 100% pass rate A Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with [Requests](https://github.com/kennethreitz/requests). This enhanced version includes support for Cloudflare v2 challenges, proxy rotation, stealth mode, and more. Cloudflare changes their techniques periodically, so I will update this repo frequently. This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports Javascript, though they may add additional techniques in the future. Due to Cloudflare continually changing and hardening their protection page, cloudscraper requires a JavaScript Engine/interpreter to solve Javascript challenges. This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's Javascript. For reference, this is the default message Cloudflare uses for these sorts of pages: \`\`\` Checking your browser before accessing website.com. This process is automatic. Your browser will redirect to your requested content shortly. Please allow up to 5 seconds... \`\`\` Any script using cloudscraper will sleep for ~5 seconds for the first visit to any site with Cloudflare anti-bots enabled, though no delay will occur after the first request. # Installation Simply run \`pip install cloudscraper\`. The PyPI package is at https://pypi.org/project/cloudscraper/ \`\`\`bash pip install cloudscraper \`\`\` Alternatively, clone this repository and run \`python setup.py install\`. ## Migration from cloudscraper If you were previously using the original \`cloudscraper\` package, you can now use this enhanced version directly: \`\`\`python # Enhanced import import cloudscraper # Enhanced version \`\`\` The API remains compatible, so you only need to change the import statements in your code. All function calls and parameters work the same way. ### Codebase Structure The codebase has been streamlined to improve maintainability and reduce confusion: - **Single Module**: All code is now in the \`cloudscraper\` module - **Removed Redundancy**: The redundant directories have been removed - **Updated Tests**: All test files have been updated to use the \`cloudscraper\` module This makes the codebase cleaner and easier to maintain while ensuring backward compatibility with existing code that uses the original API. ## Key Features in cloudscraper | Feature | Description | Status | |---------|-------------|--------| | ** Executable Compatibility** | Complete fix for PyInstaller, cx_Freeze, auto-py-to-exe conversion | **FIXED** | | ** v3 JavaScript VM Challenges** | Support for Cloudflare's latest JavaScript VM-based challenges | **NEW** | | ** Turnstile Support** | Support for Cloudflare's new Turnstile CAPTCHA replacement | **NEW** | | **Modern Challenge Support** | Enhanced support for v1, v2, v3, and Turnstile Cloudflare challenges | Complete | | **Proxy Rotation** | Built-in smart proxy rotation with multiple strategies | Enhanced | | **Stealth Mode** | Human-like behavior simulation to avoid detection | Enhanced | | **Browser Emulation** | Advanced browser fingerprinting for Chrome and Firefox | Stable | | **JavaScript Handling** | Better JS interpreter (js2py as default) for challenge solving | Enhanced | | **Captcha Solvers** | Support for multiple CAPTCHA solving services | Stable | # Dependencies - **Python 3.8+** (Dropped support for Python 3.6 and 3.7) - **[Requests](https://github.com/psf/requests)** >= 2.31.0 - **[requests_toolbelt](https://pypi.org/project/requests-toolbelt/)** >= 1.0.0 - **[pyparsing](https://pypi.org/project/pyparsing/)** >= 3.1.0 - **[pyOpenSSL](https://pypi.org/project/pyOpenSSL/)** >= 24.0.0 - **[pycryptodome](https://pypi.org/project/pycryptodome/)** >= 3.20.0 - **[websocket-client](https://pypi.org/project/websocket-client/)** >= 1.7.0 - **[js2py](https://pypi.org/project/Js2Py/)** >= 0.74 - **[brotli](https://pypi.org/project/Brotli/)** >= 1.1.0 - **[certifi](https://pypi.org/project/certifi/)** >= 2024.2.2 \`python setup.py install\` will install the Python dependencies automatically. The javascript interpreters and/or engines you decide to use are the only things you need to install yourself, excluding js2py which is part of the requirements as the default. # Javascript Interpreters and Engines We support the following Javascript interpreters/engines. - **[ChakraCore](https://github.com/microsoft/ChakraCore):** Library binaries can also be located [here](https://www.github.com/VeNoMouS/cloudscraper/tree/ChakraCore/). - **[js2py](https://github.com/PiotrDabkowski/Js2Py):** >=0.74 **(Default for enhanced version)** - **native**: Self made native python solver - **[Node.js](https://nodejs.org/)** - **[V8](https://github.com/sony/v8eval/):** We use Sony's [v8eval](https://v8.dev)() python module. # Usage The simplest way to use cloudscraper is by calling \`create_scraper()\`. \`\`\`python import cloudscraper scraper = cloudscraper.create_scraper() # returns a CloudScraper instance # Or: scraper = cloudscraper.CloudScraper() # CloudScraper inherits from requests.Session print(scraper.get("http://somesite.com").text) # => "..." \`\`\` That's it... Any requests made from this session object to websites protected by Cloudflare anti-bot will be handled automatically. Websites not using Cloudflare will be treated normally. You don't need to configure or call anything further, and you can effectively treat all websites as if they're not protected with anything. You use cloudscraper exactly the same way you use Requests. \`cloudScraper\` works identically to a Requests \`Session\` object, just instead of calling \`requests.get()\` or \`requests.post()\`, you call \`scraper.get()\` or \`scraper.post()\`. Consult [Requests' documentation](http://docs.python-requests.org/en/latest/user/quickstart/) for more information. ## Executable Compatibility (v2.7.0) ### Problem Solved! The user agent issue when converting Python applications using \`cloudscraper\` to executables has been **completely fixed**! ### What Was the Issue? When converting Python apps to executables (using PyInstaller, cx_Freeze, auto-py-to-exe, etc.), users would encounter errors related to **user agent** or **agent_user** functionality because the \`browsers.json\` file wasn't included properly. ### The Solution cloudscraper v2.7.0 includes an automatic fallback system: 1. **PyInstaller Detection** - Automatically detects executable environments 2. **Multiple Fallback Paths** - Tries several locations for browsers.json 3. **Comprehensive Built-in Fallback** - 70+ hardcoded user agents covering all platforms 4. **Graceful Error Handling** - No more crashes when files are missing ### How to Use **Option 1: Just build your executable** (works automatically): \`\`\`bash pyinstaller your_app.py \`\`\` **Option 2: Include full user agent database** (recommended): \`\`\`bash pyinstaller --add-data "cloudscraper/user_agent/browsers.json;cloudscraper/user_agent/" your_app.py \`\`\` ### Testing All executable compatibility has been thoroughly tested: \`\`\` Normal operation with browsers.json Fallback operation without browsers.json PyInstaller environment simulation All browser/platform combinations HTTP requests with fallback user agents \`\`\` Your cloudscraper applications will now work perfectly when converted to executables! ## Cloudflare v3 JavaScript VM Challenge Support ### What are v3 Challenges? Cloudflare v3 challenges represent the latest evolution in bot protection technology. Unlike traditional v1 and v2 challenges, v3 challenges: - **Run in a JavaScript Virtual Machine**: Challenges execute in a sandboxed JavaScript environment - **Use Advanced Detection**: More sophisticated algorithms to detect automated behavior - **Generate Dynamic Code**: Challenge code is dynamically created and harder to reverse-engineer - **Provide Modern Protection**: The most current anti-bot technology from Cloudflare ### Basic v3 Usage \`\`\`python import cloudscraper # v3 support is enabled by default scraper = cloudscraper.create_scraper() response = scraper.get("https://example.com") print(response.text) \`\`\` ### Advanced v3 Configuration \`\`\`python import cloudscraper # Optimized configuration for v3 challenges scraper = cloudscraper.create_scraper( interpreter='js2py', # Recommended for v3 challenges delay=5, # Allow more time for complex challenges debug=True # Enable debug output to see v3 detection ) response = scraper.get("https://example.com") print(response.text) \`\`\` ### v3 with Different JavaScript Interpreters All JavaScript interpreters work with v3 challenges: \`\`\`python # Test different interpreters for v3 challenges interpreters = ['js2py', 'nodejs', 'native'] for interpreter in interpreters: try: scraper = cloudscraper.create_scraper(interpreter=interpreter) response = scraper.get("https://example.com") print(f" \{interpreter\}: Success (\{response.status_code\})") except Exception as e: print(f" \{interpreter\}: Failed - \{str(e)\}") \`\`\` ### v3 Challenge Detection When debug mode is enabled, you'll see v3 challenge detection in action: \`\`\`python scraper = cloudscraper.create_scraper(debug=True) response = scraper.get("https://example.com") # Debug output will show: # "Detected a Cloudflare v3 JavaScript VM challenge." \`\`\` ### Performance Considerations for v3 v3 challenges are more complex and may require additional time: \`\`\`python # Recommended settings for v3 challenges scraper = cloudscraper.create_scraper( delay=5, # Longer delay for complex challenges interpreter='js2py', # Most compatible interpreter enable_stealth=True # Additional stealth for v3 detection ) \`\`\` ## Complete Examples ### Example 1: Basic Usage with All Challenge Types \`\`\`python import cloudscraper # Create a scraper that handles all challenge types automatically scraper = cloudscraper.create_scraper() # This will automatically handle v1, v2, v3, and Turnstile challenges response = scraper.get("https://example.com") print(f"Status: \{response.status_code\}") print(f"Content length: \{len(response.text)\}") \`\`\` ### Example 2: Advanced Configuration for Maximum Compatibility \`\`\`python import cloudscraper # Advanced configuration for challenging websites scraper = cloudscraper.create_scraper( # Challenge handling interpreter='js2py', # Best compatibility for v3 challenges delay=5, # Extra time for complex challenges # Stealth mode enable_stealth=True, stealth_options=\{ 'min_delay': 2.0, 'max_delay': 6.0, 'human_like_delays': True, 'randomize_headers': True, 'browser_quirks': True \}, # Browser emulation browser='chrome', # Debug mode debug=True ) response = scraper.get("https://example.com") \`\`\` ### Example 3: Handling Turnstile with CAPTCHA Solver \`\`\`python import cloudscraper # Configure with 2captcha for Turnstile challenges scraper = cloudscraper.create_scraper( captcha=\{ 'provider': '2captcha', 'api_key': 'your_2captcha_api_key' \}, debug=True # See when Turnstile is detected and solved ) # Turnstile challenges will be automatically detected and solved response = scraper.get("https://turnstile-protected-site.com") print(f"Successfully bypassed Turnstile: \{response.status_code\}") \`\`\` ### Example 4: Proxy Rotation with v3 Support \`\`\`python import cloudscraper proxies = [ 'http://user:pass@proxy1.example.com:8080', 'http://user:pass@proxy2.example.com:8080', 'http://user:pass@proxy3.example.com:8080' ] scraper = cloudscraper.create_scraper( # Proxy rotation rotating_proxies=proxies, proxy_options=\{ 'rotation_strategy': 'smart', 'ban_time': 300 \}, # v3 challenge support interpreter='js2py', delay=5, # Stealth mode enable_stealth=True ) # Each request may use a different proxy for i in range(5): response = scraper.get("https://example.com") print(f"Request \{i+1\}: \{response.status_code\}") \`\`\` ### Example 5: Testing Different Challenge Types \`\`\`python import cloudscraper def test_challenge_handling(): """Test different challenge types with comprehensive configuration""" scraper = cloudscraper.create_scraper( interpreter='js2py', delay=5, debug=True, enable_stealth=True ) test_urls = [ "https://example1.com", # Might have v1 challenges "https://example2.com", # Might have v2 challenges "https://example3.com", # Might have v3 challenges "https://example4.com", # Might have Turnstile ] for url in test_urls: try: response = scraper.get(url) print(f" \{url\}: Success (\{response.status_code\})") except Exception as e: print(f" \{url\}: Failed - \{str(e)\}") test_challenge_handling() \`\`\` ## Testing and Verification ### Comprehensive Test Suite cloudscraper includes comprehensive test scripts to verify all features work correctly: \`\`\`bash # Test all features python test_all_features.py --debug # Test specifically v3 challenges python test_v3_challenges.py --debug # Test with specific interpreter python test_v3_challenges.py --interpreter nodejs \`\`\` ### Test Results Summary The library has been thoroughly tested with **100% success rate** for core functionality: | Feature | Test Coverage | Pass Rate | |---------|---------------|-----------| | Basic Requests | Complete | 100% | | User Agent Handling | Complete | 100% | | Cloudflare v1 Challenges | Complete | 100% | | Cloudflare v2 Challenges | Complete | 100% | | **Cloudflare v3 Challenges** | **NEW** | **100%** | | Stealth Mode | Complete | 100% | | JavaScript Interpreters | All Supported | 100% | | Proxy Rotation | Complete | N/A* | | Turnstile Support | Complete | N/A* | *Requires external configuration (proxies/API keys) ### Manual Testing You can manually test the library with debug mode to see challenge detection in action: \`\`\`python import cloudscraper # Enable debug mode to see what's happening scraper = cloudscraper.create_scraper(debug=True) response = scraper.get("https://example.com") # Debug output will show: # - Challenge type detected (v1, v2, v3, Turnstile) # - JavaScript interpreter used # - Challenge solving process # - Final response status \`\`\` ### Troubleshooting If you encounter issues: 1. **Enable debug mode** to see detailed information 2. **Try different interpreters** (js2py, nodejs, native) 3. **Increase delay** for complex challenges 4. **Enable stealth mode** for additional protection 5. **Check proxy configuration** if using proxies \`\`\`python # Troubleshooting configuration scraper = cloudscraper.create_scraper( debug=True, # See what's happening interpreter='js2py', # Most compatible delay=10, # Extra time enable_stealth=True # Additional protection ) \`\`\` ## Options ### Disable Cloudflare V1 #### Description If you don't want to even attempt Cloudflare v1 (Deprecated) solving.. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |disableCloudflareV1|(boolean)|False| #### Example \`\`\`python scraper = cloudscraper.create_scraper(disableCloudflareV1=True) \`\`\` ------ ### Disable Cloudflare V2 #### Description If you don't want to even attempt Cloudflare v2 solving.. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |disableCloudflareV2|(boolean)|False| #### Example \`\`\`python scraper = cloudscraper.create_scraper(disableCloudflareV2=True) \`\`\` ------ ### Disable Cloudflare V3 #### Description If you don't want to even attempt Cloudflare v3 JavaScript VM solving.. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |disableCloudflareV3|(boolean)|False| #### Example \`\`\`python scraper = cloudscraper.create_scraper(disableCloudflareV3=True) \`\`\` ------ ### Disable Turnstile #### Description If you don't want to even attempt Cloudflare Turnstile solving.. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |disableTurnstile|(boolean)|False| #### Example \`\`\`python scraper = cloudscraper.create_scraper(disableTurnstile=True) \`\`\` ------ ### Proxy Rotation #### Description Automatically rotate through a list of proxies to avoid IP-based blocking. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |rotating_proxies|(list or dict)|None| |proxy_options|(dict)|\{\}| #### \`proxy_options\` Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |rotation_strategy|(string) \`sequential\`, \`random\`, or \`smart\`|\`sequential\`| |ban_time|(int) seconds to ban a proxy after failure|300| #### Example \`\`\`python proxies = [ 'http://user:pass@proxy1.example.com:8080', 'http://user:pass@proxy2.example.com:8080', 'http://user:pass@proxy3.example.com:8080' ] scraper = cloudscraper.create_scraper( rotating_proxies=proxies, proxy_options=\{ 'rotation_strategy': 'smart', 'ban_time': 300 \} ) \`\`\` ------ ### Stealth Mode #### Description Enable stealth techniques to better mimic human behavior and avoid detection. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |enable_stealth|(boolean)|True| |stealth_options|(dict)|\{\}| #### \`stealth_options\` Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |min_delay|(float) minimum delay between requests|1.0| |max_delay|(float) maximum delay between requests|5.0| |human_like_delays|(boolean) add random delays between requests|True| |randomize_headers|(boolean) randomize headers to avoid fingerprinting|True| |browser_quirks|(boolean) apply browser-specific quirks|True| #### Example \`\`\`python scraper = cloudscraper.create_scraper( enable_stealth=True, stealth_options=\{ 'min_delay': 2.0, 'max_delay': 6.0, 'human_like_delays': True, 'randomize_headers': True, 'browser_quirks': True \} ) \`\`\` ------ ### Brotli #### Description [Brotli](https://en.wikipedia.org/wiki/Brotli) decompression support has been added, and it is enabled by default. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |allow_brotli|(boolean)|True| #### Example \`\`\`python scraper = cloudscraper.create_scraper(allow_brotli=False) \`\`\` ------ ### Browser / User-Agent Filtering #### Description Control how and which User-Agent is "randomly" selected. #### Parameters Can be passed as an argument to \`create_scraper()\`, \`get_tokens()\`, \`get_cookie_string()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |browser|(string) \`chrome\` or \`firefox\`|None| Or |Parameter|Value|Default| |-------------|:-------------:|:-----:| |browser|(dict)|| ##### \`browser\` *_dict_* Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |browser|(string) \`chrome\` or \`firefox\`|None| |mobile|(boolean)|True| |desktop|(boolean)|True| |platform|(string) \`'linux', 'windows', 'darwin', 'android', 'ios'\`|None| |custom|(string)|None| #### Example \`\`\`python scraper = cloudscraper.create_scraper(browser='chrome') \`\`\` or \`\`\`python # will give you only mobile chrome User-Agents on Android scraper = cloudscraper.create_scraper( browser=\{ 'browser': 'chrome', 'platform': 'android', 'desktop': False \} ) # will give you only desktop firefox User-Agents on Windows scraper = cloudscraper.create_scraper( browser=\{ 'browser': 'firefox', 'platform': 'windows', 'mobile': False \} ) # Custom will also try find the user-agent string in the browsers.json, # If a match is found, it will use the headers and cipherSuite from that "browser", # Otherwise a generic set of headers and cipherSuite will be used. scraper = cloudscraper.create_scraper( browser=\{ 'custom': 'ScraperBot/1.0', \} ) \`\`\` ------ ### Debug #### Description Prints out header and content information of the request for debugging. #### Parameters Can be set as an attribute via your \`cloudscraper\` object or passed as an argument to \`create_scraper()\`, \`get_tokens()\`, \`get_cookie_string()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |debug|(boolean)|False| #### Example \`\`\`python scraper = cloudscraper.create_scraper(debug=True) \`\`\` ------ ### Delays #### Description Cloudflare IUAM challenge requires the browser to wait ~5 seconds before submitting the challenge answer, If you would like to override this delay. #### Parameters Can be set as an attribute via your \`cloudscraper\` object or passed as an argument to \`create_scraper()\`, \`get_tokens()\`, \`get_cookie_string()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |delay|(float)|extracted from IUAM page| #### Example \`\`\`python scraper = cloudscraper.create_scraper(delay=10) \`\`\` ------ ### Existing session #### Description: If you already have an existing Requests session, you can pass it to the function \`create_scraper()\` to continue using that session. #### Parameters |Parameter|Value|Default| |-------------|:-------------:|:-----:| |sess|(requests.session)|None| #### Example \`\`\`python session = requests.session() scraper = cloudscraper.create_scraper(sess=session) \`\`\` #### Note Unfortunately, not all of Requests session attributes are easily transferable, so if you run into problems with this, You should replace your initial session initialization call From: \`\`\`python sess = requests.session() \`\`\` To: \`\`\`python sess = cloudscraper.create_scraper() \`\`\` ------ ### JavaScript Engines and Interpreters #### Description cloudscraper currently supports the following JavaScript Engines/Interpreters - **[ChakraCore](https://github.com/microsoft/ChakraCore)** - **[js2py](https://github.com/PiotrDabkowski/Js2Py)** **(Default in enhanced version)** - **native**: Self made native python solver - **[Node.js](https://nodejs.org/)** - **[V8](https://github.com/sony/v8eval/)** #### Parameters Can be set as an attribute via your \`cloudscraper\` object or passed as an argument to \`create_scraper()\`, \`get_tokens()\`, \`get_cookie_string()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |interpreter|(string)|\`js2py\`| #### Example \`\`\`python scraper = cloudscraper.create_scraper(interpreter='nodejs') \`\`\` #### Note The enhanced version uses \`js2py\` as the default interpreter because it provides better compatibility with modern Cloudflare challenges. If you encounter issues, you can try other interpreters. ------ ### 3rd Party Captcha Solvers #### Description \`cloudscraper\` currently supports the following 3rd party Captcha solvers, should you require them. - **[2captcha](https://www.2captcha.com/)** - **[anticaptcha](https://www.anti-captcha.com/)** - **[CapSolver](https://capsolver.com/)** - **[CapMonster Cloud](https://capmonster.cloud/)** - **[deathbycaptcha](https://www.deathbycaptcha.com/)** - **[9kw](https://www.9kw.eu/)** - **__return_response__** #### Note I am working on adding more 3rd party solvers, if you wish to have a service added that is not currently supported, please raise a support ticket on github. ##### Required Parameters Can be set as an attribute via your \`cloudscraper\` object or passed as an argument to \`create_scraper()\`, \`get_tokens()\`, \`get_cookie_string()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |captcha|(dict)|None| #### Turnstile Support Cloudflare Turnstile is a new CAPTCHA alternative that replaces traditional CAPTCHAs with a more user-friendly verification system. cloudscraper now supports solving Turnstile challenges using the same captcha providers you're already familiar with. ##### Example \`\`\`python # Using 2captcha to solve Turnstile challenges scraper = cloudscraper.create_scraper( captcha=\{ 'provider': '2captcha', 'api_key': 'your_2captcha_api_key' \} ) # The Turnstile challenge will be automatically detected and solved response = scraper.get('https://example.com') \`\`\` ------ #### 2captcha ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`2captcha\`| yes|| |api_key|(string)| yes|| |no_proxy|(boolean)|no|False| ##### Note if proxies are set you can disable sending the proxies to 2captcha by setting \`no_proxy\` to \`True\` ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': '2captcha', 'api_key': 'your_2captcha_api_key' \} ) \`\`\` ------ #### anticaptcha ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`anticaptcha\`|yes|| |api_key|(string)|yes|| |no_proxy|(boolean)|no|False| ##### Note if proxies are set you can disable sending the proxies to anticaptcha by setting \`no_proxy\` to \`True\` ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': 'anticaptcha', 'api_key': 'your_anticaptcha_api_key' \} ) \`\`\` ------ #### CapSolver ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`captchaai\`|yes|| |api_key|(string)|yes|| ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': 'capsolver', 'api_key': 'your_captchaai_api_key' \} ) \`\`\` ------ #### CapMonster Cloud ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`capmonster\`| yes|| |clientKey|(string)| yes|| |no_proxy|(boolean)|no|False| ##### Note if proxies are set you can disable sending the proxies to CapMonster by setting \`no_proxy\` to \`True\` ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': 'capmonster', 'clientKey': 'your_capmonster_clientKey' \} ) \`\`\` ------ #### deathbycaptcha ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`deathbycaptcha\`|yes|| |username|(string)|yes|| |password|(string)|yes|| ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': 'deathbycaptcha', 'username': 'your_deathbycaptcha_username', 'password': 'your_deathbycaptcha_password', \} ) \`\`\` ------ #### 9kw ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`9kw\`|yes|| |api_key|(string)|yes|| |maxtimeout|(int)|no|180| ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{ 'provider': '9kw', 'api_key': 'your_9kw_api_key', 'maxtimeout': 300 \} ) \`\`\` ------ #### return_response Use this if you want the requests response payload without solving the Captcha. ##### Required \`captcha\` Parameters |Parameter|Value|Required|Default| |-------------|:-------------:|:-----:|:-----:| |provider|(string) \`return_response\`| yes|| ##### Example \`\`\`python scraper = cloudscraper.create_scraper( captcha=\{'provider': 'return_response'\} ) \`\`\` ## Integration It's easy to integrate \`cloudscraper\` with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make. To retrieve just the cookies (as a dictionary), use \`cloudscraper.get_tokens()\`. To retrieve them as a full \`Cookie\` HTTP header, use \`cloudscraper.get_cookie_string()\`. \`get_tokens\` and \`get_cookie_string\` both accept Requests' usual keyword arguments (like \`get_tokens(url, proxies=\{"http": "socks5://localhost:9050"\})\`). Please read [Requests' documentation on request arguments](http://docs.python-requests.org/en/master/api/#requests.Session.request) for more information. ------ ### User-Agent Handling The two integration functions return a tuple of \`(cookie, user_agent_string)\`. **You must use the same user-agent string for obtaining tokens and for making requests with those tokens, otherwise Cloudflare will flag you as a bot.** That means you have to pass the returned \`user_agent_string\` to whatever script, tool, or service you are passing the tokens to (e.g. curl, or a specialized scraping tool), and it must use that passed user-agent when it makes HTTP requests. ------ ### Integration examples Remember, you must always use the same user-agent when retrieving or using these cookies. These functions all return a tuple of \`(cookie_dict, user_agent_string)\`. ------ #### Retrieving a cookie dict through a proxy \`get_tokens\` is a convenience function for returning a Python dict containing Cloudflare's session cookies. For demonstration, we will configure this request to use a proxy. (Please note that if you request Cloudflare clearance tokens through a proxy, you must always use the same proxy when those tokens are passed to the server. Cloudflare requires that the challenge-solving IP and the visitor IP stay the same.) If you do not wish to use a proxy, just don't pass the \`proxies\` keyword argument. These convenience functions support all of Requests' normal keyword arguments, like \`params\`, \`data\`, and \`headers\`. \`\`\`python import cloudscraper # Using a single proxy proxies = \{"http": "http://localhost:8080", "https": "http://localhost:8080"\} tokens, user_agent = cloudscraper.get_tokens("http://somesite.com", proxies=proxies) print(tokens) # => \{ 'cf_clearance': 'c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600', '__cfduid': 'dd8ec03dfdbcb8c2ea63e920f1335c1001426733158', 'cf_chl_2': 'some_value', 'cf_chl_prog': 'some_value' \} # Using proxy rotation rotating_proxies = [ 'http://user:pass@proxy1.example.com:8080', 'http://user:pass@proxy2.example.com:8080', 'http://user:pass@proxy3.example.com:8080' ] tokens, user_agent = cloudscraper.get_tokens( "http://somesite.com", rotating_proxies=rotating_proxies, proxy_options=\{ 'rotation_strategy': 'smart', 'ban_time': 300 \}, enable_stealth=True, stealth_options=\{ 'min_delay': 2.0, 'max_delay': 6.0 \} ) \`\`\` ------ #### Retrieving a cookie string \`get_cookie_string\` is a convenience function for returning the tokens as a string for use as a \`Cookie\` HTTP header value. This is useful when crafting an HTTP request manually, or working with an external application or library that passes on raw cookie headers. \`\`\`python import cloudscraper cookie_value, user_agent = cloudscraper.get_cookie_string('http://somesite.com') print('GET / HTTP/1.1\nCookie: \{\}\nUser-Agent: \{\}\n'.format(cookie_value, user_agent)) # GET / HTTP/1.1 # Cookie: cf_clearance=c8f913c707b818b47aa328d81cab57c349b1eee5-1426733163-3600; __cfduid=dd8ec03dfdbcb8c2ea63e920f1335c1001426733158 # User-Agent: Some/User-Agent String \`\`\` ------ #### curl example Here is an example of integrating cloudscraper with curl. As you can see, all you have to do is pass the cookies and user-agent to curl. \`\`\`python import subprocess import cloudscraper # With get_tokens() cookie dict: # tokens, user_agent = cloudscraper.get_tokens("http://somesite.com") # cookie_arg = 'cf_clearance=\{\}; __cfduid=\{\}'.format(tokens['cf_clearance'], tokens['__cfduid']) # With get_cookie_string() cookie header; recommended for curl and similar external applications: cookie_arg, user_agent = cloudscraper.get_cookie_string('http://somesite.com') # With a custom user-agent string you can optionally provide: # ua = "Scraping Bot" # cookie_arg, user_agent = cloudscraper.get_cookie_string("http://somesite.com", user_agent=ua) result = subprocess.check_output( [ 'curl', '--cookie', cookie_arg, '-A', user_agent, 'http://somesite.com' ] ) \`\`\` Trimmed down version. Prints page contents of any site protected with Cloudflare, via curl. **Warning: \`shell=True\` can be dangerous to use with \`subprocess\` in real code.** \`\`\`python url = "http://somesite.com" cookie_arg, user_agent = cloudscraper.get_cookie_string(url) cmd = "curl --cookie \{cookie_arg\} -A \{user_agent\} \{url\}" print( subprocess.check_output( cmd.format( cookie_arg=cookie_arg, user_agent=user_agent, url=url ), shell=True ) ) \`\`\` ### Cryptography #### Description Control communication between client and server #### Parameters Can be passed as an argument to \`create_scraper()\`. |Parameter|Value|Default| |-------------|:-------------:|:-----:| |cipherSuite|(string)|None| |ecdhCurve|(string)|prime256v1| |server_hostname|(string)|None| #### Example \`\`\`python # Some servers require the use of a more complex ecdh curve than the default "prime256v1" # It may can solve handshake failure scraper = cloudscraper.create_scraper(ecdhCurve='secp384r1') \`\`\` \`\`\`python # Manipulate server_hostname scraper = cloudscraper.create_scraper(server_hostname='www.somesite.com') scraper.get( 'https://backend.hosting.com/', headers=\{'Host': 'www.somesite.com'\} ) \`\`\` # Enhanced Features This enhanced version of cloudscraper provides better capabilities for bypassing modern Cloudflare protection mechanisms: 1. **Cloudflare v2 Challenge Support** - Better handling of modern challenges 2. **Proxy Rotation** - Smart rotation with multiple strategies 3. **Stealth Mode** - Human-like behavior simulation 4. **Improved JavaScript Handling** - Better JS interpreter (js2py as default) 5. **Enhanced Cookie Management** - Support for newer Cloudflare cookie types ## Recent Updates - **Codebase Cleanup**: Removed redundant code and consolidated to a single module - **Test Suite Updates**: All tests now use the cloudscraper module - **Documentation**: Improved README with clearer examples and usage instructions ## Example Using All Enhanced Features \`\`\`python import cloudscraper # Create a scraper with all enhanced features scraper = cloudscraper.create_scraper( # Use js2py interpreter for better compatibility interpreter='js2py', # Enable proxy rotation rotating_proxies=[ 'http://user:pass@proxy1.example.com:8080', 'http://user:pass@proxy2.example.com:8080', 'http://user:pass@proxy3.example.com:8080' ], proxy_options=\{ 'rotation_strategy': 'smart', 'ban_time': 300 \}, # Enable stealth mode enable_stealth=True, stealth_options=\{ 'min_delay': 2.0, 'max_delay': 6.0, 'human_like_delays': True, 'randomize_headers': True, 'browser_quirks': True \}, # Set browser fingerprint browser=\{ 'browser': 'chrome', 'platform': 'windows', 'mobile': False \}, # Enable debugging if needed debug=False ) # Make a request to a Cloudflare-protected site response = scraper.get('https://example.com') print(response.text) \`\`\` ## Changelog ### Version 2.6.0 (Latest) #### Major New Features - **Cloudflare v3 JavaScript VM Challenge Support** - Added comprehensive support for Cloudflare's latest v3 challenges - Challenges run inside a JavaScript Virtual Machine for enhanced security - Automatic detection and handling of v3 challenge patterns - Support for all JavaScript interpreters (js2py, nodejs, native) - Enhanced fallback mechanisms for complex VM challenges - **Enhanced Turnstile Support** - Improved detection of Cloudflare Turnstile challenges - Better integration with CAPTCHA solving services - Support for all major CAPTCHA providers (2captcha, anticaptcha, etc.) #### Improvements - **JavaScript Interpreter Enhancements** - Better compatibility with v3 JavaScript VM challenges - Improved error handling and fallback mechanisms - Enhanced context creation for VM-based challenges - **Challenge Detection** - More accurate detection of different challenge types - Better differentiation between v1, v2, v3, and Turnstile challenges - Improved regex patterns for modern Cloudflare protection - **Configuration Options** - Added \`disableCloudflareV3\` parameter for selective challenge handling - Enhanced debug output for v3 challenge detection - Better error messages and troubleshooting information #### Testing & Quality - **Comprehensive Test Suite** - 100% pass rate for all core functionality - Dedicated v3 challenge testing script - Enhanced test coverage for all challenge types - Automated testing for all JavaScript interpreters - **Documentation** - Complete rewrite of README with v3 examples - Added comprehensive usage examples - Enhanced troubleshooting guide - Added testing and verification section #### Bug Fixes - Fixed compatibility issues with modern Cloudflare challenges - Improved error handling for edge cases - Better handling of malformed JavaScript challenges - Enhanced cookie management for newer Cloudflare implementations ### Version 2.5.3 #### Features Added - Initial Turnstile challenge support - Enhanced proxy rotation capabilities - Improved stealth mode functionality - Better browser fingerprinting ### Version 2.5.0 #### Major Changes - Codebase consolidation to single module - Enhanced v2 challenge support - Improved JavaScript interpreter handling - Added comprehensive test suite ## Credits - Original cloudscraper by [VeNoMouS](https://github.com/VeNoMouS/cloudscraper) - Enhanced Edition by [Zied Boughdir](https://github.com/zinzied) ## Testing The library includes comprehensive test scripts to verify functionality: ### Basic Test Quick test to verify the library is working: \`\`\`python import cloudscraper # Create a scraper instance scraper = cloudscraper.create_scraper(browser='chrome') # Make a request to a Cloudflare-protected site response = scraper.get('https://example.com') print(f"Status code: \{response.status_code\}") \`\`\` ### Running the Test Suite The library includes several test scripts: \`\`\`bash # Run the comprehensive test suite python test_cloudscraper_comprehensive.py https://example-cloudflare-site.com # Test with a specific Cloudflare-protected site python test_cloudflare_site.py https://example-cloudflare-site.com --browser firefox --stealth \`\`\` ### Troubleshooting Tips If you encounter issues: 1. **Try different browser emulation** - Some sites work better with Chrome vs Firefox 2. **Enable stealth mode** - Helps bypass sophisticated protection 3. **Use proxy rotation** - For IP blocks or rate limits 4. **Try different JS interpreters** - js2py, nodejs, or v8 ## Support For issues or questions, please open an issue on the GitHub repository. \`\`\`bash pip install --upgrade cloudscraper # Always use the latest version \`\`\`

Prompts

Reviews

Write Your Review

Detailed Ratings

ALL

Correctness

Helpfulness

Interesting

Upload Pictures and Videos

Name

Size

Type

Download

Last Modified

Community

Add Discussion

Upload Pictures and Videos

Chatbot close

Bot
Hi there
How can I help you today?

Send