HTML to Markdown

Convert raw HTML to clean Markdown — same converter as scrape, but you supply the HTML.

POST /v1/convert/html-to-markdown

Pure HTML → CommonMark Markdown transformation. Useful for LLM/RAG pipelines that already have HTML from an upstream tool and only need Qcrawl's text cleanup, or for processing locally stored HTML without a fetch. <script> and <style> are dropped automatically; pass strip_tags to drop more. Links and images are controlled by include_links and include_images.

Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| html | string | yes | | The HTML to convert. The request body is limited to 1 MB, as on every endpoint. |
| heading_style | string | no | ATX | ATX (#), ATX_CLOSED (# ... #), or UNDERLINED (===). |
| strip_tags | array | no | | Additional tag names to drop entirely. script and style are always dropped. |
| include_links | boolean | no | true | Keep <a> as Markdown links. Set false to drop hrefs. |
| include_images | boolean | no | false | Keep <img> as Markdown images. Off by default, since most LLM ingestion pipelines drop them. |
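
As a rough illustration of how the parameters above combine into a request body, here is a minimal Python sketch. The helper name build_payload is hypothetical and not part of any Qcrawl SDK; the defaults and accepted heading_style values are copied from the table, and the client-side validation is only a convenience that approximates what the server is documented to reject.

```python
import json

# Accepted values as listed in the parameter table.
HEADING_STYLES = {"ATX", "ATX_CLOSED", "UNDERLINED"}


def build_payload(html, heading_style="ATX", strip_tags=None,
                  include_links=True, include_images=False):
    """Return the JSON request body as a string (hypothetical helper)."""
    if not html:
        raise ValueError("html is required and must be non-empty")
    if heading_style not in HEADING_STYLES:
        raise ValueError(f"invalid heading_style: {heading_style!r}")
    payload = {
        "html": html,
        "heading_style": heading_style,
        "include_links": include_links,
        "include_images": include_images,
    }
    # strip_tags is optional; only send it when the caller supplied tags.
    if strip_tags:
        payload["strip_tags"] = list(strip_tags)
    return json.dumps(payload)


body = build_payload(
    "<nav>Menu</nav><p>Hi</p>",
    heading_style="UNDERLINED",
    strip_tags=["nav", "footer"],
)
print(body)
```

Sending the defaults explicitly is a design choice for readability; omitting them should be equivalent if the documented defaults apply.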

Request

curl -X POST https://api.qcrawl.com/v1/convert/html-to-markdown \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"html": "<article><h1>Hello</h1><p>This is <a href=\"/x\">linked</a> text.</p><script>alert(1)</script></article>"}'
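
For callers who would rather not shell out to curl, the same request can be sketched with Python's standard library. The function names build_request and convert are hypothetical, and the JSON Content-Type header is an assumption, since the endpoint's content-type requirements are not stated here. The API key placeholder is kept as in the curl example.

```python
import json
import urllib.request

ENDPOINT = "https://api.qcrawl.com/v1/convert/html-to-markdown"


def build_request(html, api_key):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def convert(html, api_key, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(html, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network.
req = build_request("<h1>Hello</h1>", "osk_...")
print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the network call in one place and makes the request easy to inspect in tests.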

Response

{
  "status": "success",
  "markdown": "# Hello\n\nThis is [linked](/x) text.",
  "byte_count": 36,
  "word_count": 5
}
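
A small sketch of how a client might read this response in Python follows. The ConversionResult type and parse_response function are hypothetical names; the field names come from the sample above. What a non-success body looks like is not documented here, so the check on status is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class ConversionResult:
    markdown: str
    byte_count: int
    word_count: int


def parse_response(raw):
    """Parse a response body; raise if status is not "success"."""
    data = json.loads(raw)
    # Assumption: any status other than "success" signals a failure.
    if data.get("status") != "success":
        raise RuntimeError(f"conversion failed: {data!r}")
    return ConversionResult(
        markdown=data["markdown"],
        byte_count=data["byte_count"],
        word_count=data["word_count"],
    )


# The sample response from above, as a raw JSON string.
sample = r'''{"status": "success", "markdown": "# Hello\n\nThis is [linked](/x) text.", "byte_count": 36, "word_count": 5}'''
result = parse_response(sample)
print(result.markdown)
```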

Errors

| Code | Meaning |
| --- | --- |
| 400 | html is missing or empty, or heading_style is invalid. |
| 413 | The request body exceeds the 1 MB limit. |
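
One way a client might guard against these two errors is sketched below in Python. The names MAX_BODY_BYTES, check_body_size, and describe_error are hypothetical. The limit is read as 1,000,000 bytes, which is an assumption: the documentation says only "1 MB", and the smaller interpretation is the safer one for a pre-flight check.

```python
# Assumption: "1 MB" read conservatively as 10**6 bytes.
MAX_BODY_BYTES = 1_000_000

# Paraphrased from the error table above.
ERROR_MEANINGS = {
    400: "html missing or empty, or invalid heading_style",
    413: "request body exceeds the 1 MB limit",
}


def check_body_size(body: bytes) -> None:
    """Raise before sending if the encoded body would likely trigger a 413."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes; limit is {MAX_BODY_BYTES}"
        )


def describe_error(status_code: int) -> str:
    """Map a documented status code to its meaning, with a generic fallback."""
    return ERROR_MEANINGS.get(
        status_code, f"unexpected HTTP status {status_code}"
    )


print(describe_error(413))
```

Checking the size locally avoids uploading a payload that the server would reject anyway; the server's own limit remains authoritative.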