Article Extractor
Tinq.ai article extractor
This endpoint extracts the content of an article from a given URL. It can make requests with or without a proxy and can return the raw HTML of the page. It is useful for extracting structured or unstructured content from web pages such as blog posts, articles, or news content.
Request Body Parameters:
- url (string, required): The URL of the article or webpage to be scraped. This URL should point to the page whose content needs to be extracted.
  - Example: "url": "https://www.boulama.com/blog/posts/the-power-of-habit:-what-i-learned-why-you-should-read-it.html"
- proxy (optional): Controls whether the request is made through a proxy. Three modes are listed:
  - automatic proxy selection
  - location-based proxy selection
  - user-provided proxy details
- raw_html (boolean, optional): If set to true, the raw HTML content of the webpage is returned in the response in addition to or instead of structured article data. Defaults to false.
  - Example: "raw_html": true
Response:
The response will contain the extracted content of the article, such as title, text, metadata, and/or raw HTML, depending on the request options.
- Status Code: 200 (OK) on success.
- Response Body:
  - If raw_html is true, the response will include the full HTML of the page.
  - If raw_html is false, the response will return the parsed article content (like title, author, body text) in a structured format (e.g., JSON).
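Because the response may carry raw HTML, structured fields, or both, client code can read it defensively. The sketch below assumes the response has already been decoded into a dictionary. The key names ("html", "title", "author", "text") are hypothetical defaults chosen for illustration, not documented field names, so pass the real ones once you have inspected an actual response.

```python
def split_response(response, html_key="html", article_keys=("title", "author", "text")):
    """Separate raw HTML (if present) from structured article fields.

    All key names here are hypothetical defaults; override them to match
    the fields the service actually returns.
    """
    html = response.get(html_key)  # None when no raw HTML was returned
    article = {key: response[key] for key in article_keys if key in response}
    return html, article


# Usage with a made-up response dictionary:
html, article = split_response({"title": "Example", "text": "Body text"})
```

Returning both parts avoids assuming whether raw HTML replaces the structured data or accompanies it.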
Notes:
- Proxy: If no proxy is provided or proxy: false is passed, the request will be made directly. For restricted or geo-blocked content, using a proxy may be necessary.
- Rate Limiting: Frequent scraping requests might be subject to rate limiting. Ensure your application handles such errors gracefully.
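One way to handle rate-limit errors gracefully is to retry with exponential backoff. The sketch below assumes the service signals rate limiting with HTTP status 429, which this page does not confirm, and that the request is made with urllib so failures surface as urllib.error.HTTPError. The helper name call_with_retry and its parameters are illustrative.

```python
import time
import urllib.error


def call_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on HTTP 429.

    fetch is any zero-argument callable that performs the request.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Assumption: rate limiting is reported as HTTP 429.
            last_attempt = attempt == max_attempts - 1
            if err.code != 429 or last_attempt:
                raise  # other errors, or out of retries: let the caller decide
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.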