Request Body Parameters:
- url (string, required): The URL of the article or webpage to be scraped. This URL should point to the page whose content needs to be extracted.
- Example:
"url": "https://www.boulama.com/blog/posts/the-power-of-habit:-what-i-learned-why-you-should-read-it.html"
- proxy (optional):
- Example with automatic proxy selection:
- Example with location-based proxy selection:
- Example with user-provided proxy details:
- raw_html (boolean, optional): If set to true, the raw HTML content of the webpage is returned in the response in addition to or instead of structured article data. Defaults to false.
- Example:
"raw_html": true
Response:
The response contains the extracted content of the article, such as title, text, metadata, and/or raw HTML, depending on the request options.
- Status Code: 200 (OK) on success.
- Response Body:
- If raw_html is true, the response includes the full HTML of the page.
- If raw_html is false, the response returns the parsed article content (such as title, author, and body text) in a structured format (e.g., JSON).
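The request parameters and response rules above can be sketched in Python as follows. This is a minimal illustration, not the service's client: the helper names build_request_body and parse_scrape_response are hypothetical, the response is assumed to be a JSON object, and the keys in the sample response ("title", "text") are placeholders rather than documented field names. The proxy field is left out because its structure is not shown here.

```python
import json
from typing import Any, Dict


def build_request_body(url: str, raw_html: bool = False) -> str:
    """Serialize the documented request parameters to a JSON string."""
    if not url:
        raise ValueError("url is required")
    # Only url and raw_html are included; proxy is omitted because
    # its structure is not specified in this section.
    return json.dumps({"url": url, "raw_html": raw_html})


def parse_scrape_response(status_code: int, body_text: str) -> Dict[str, Any]:
    """Return the decoded payload, or raise if the call did not succeed."""
    if status_code != 200:
        raise RuntimeError(f"scrape request failed with status {status_code}")
    payload = json.loads(body_text)  # assumption: the body is JSON
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object in the response body")
    return payload


request_body = build_request_body("https://example.com/article.html")

# Hypothetical response text; the key names are illustrative only.
sample_response = '{"title": "Example title", "text": "Example body"}'
article = parse_scrape_response(200, sample_response)
print(article.get("title"))
```

Reading fields with .get keeps the caller from failing when a key is absent, which seems prudent while the exact response schema is unconfirmed.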
Notes:
- Proxy: If no proxy is provided or proxy: false is passed, the request is made directly. For restricted or geo-blocked content, using a proxy may be necessary.
- Rate Limiting: Frequent scraping requests might be subject to rate limiting. Ensure your application handles such errors gracefully.
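One way to handle rate limiting gracefully is to retry with exponential backoff. The sketch below assumes the service signals rate limiting with HTTP status 429 (the standard "Too Many Requests" code), which this section does not confirm. The function call_with_backoff and its send callable are hypothetical: send stands in for whatever code actually issues the request and returns a (status_code, body) pair.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:  # assumption: 429 means "rate limited"
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before retrying.
            sleep(base_delay * 2 ** attempt)
    # Every attempt was rate limited; hand the last response to the caller.
    return status, body


# Usage with a fake sender that is rate limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the retry logic testable without real waiting. If the service turns out to send a Retry-After header, honoring that value would be preferable to a fixed schedule.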