PDF to Markdown

Send base64-encoded PDF bytes, get clean Markdown back. Page-per-section output. Scanned PDFs are flagged.

POST /v1/convert/pdf-to-markdown

Supply the PDF as a base64 string in the JSON body. Each page becomes a Markdown section (## Page 1, ## Page 2, …). This endpoint extracts text only, not images: image-only or scanned PDFs are detected and flagged with likely_scanned_pdf: true, which signals that the file needs an OCR tool instead. The request body is capped at the platform-wide 1 MB. Because base64 adds roughly one third to the size, that allows about 750 KB of raw PDF, which covers most real documents.
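The 750 KB figure follows from base64 arithmetic: every 3 raw bytes become 4 encoded characters. Below is a minimal sketch of a pre-flight size check. It assumes "1 MB" means 1,000,000 bytes and that the body carries only the pdf_base64 field. The helper names are hypothetical and not part of the API.

```python
import json

# Assumption: the platform's "1 MB" cap is read as 10**6 bytes.
BODY_CAP_BYTES = 1_000_000


def encoded_size(raw_len: int) -> int:
    """Length of the padded base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


def body_size(raw_len: int) -> int:
    """Size of a JSON body holding only pdf_base64 for a PDF of raw_len bytes."""
    wrapper = len(json.dumps({"pdf_base64": ""}))  # braces, key, quotes
    return wrapper + encoded_size(raw_len)


def fits_body_cap(raw_len: int, cap: int = BODY_CAP_BYTES) -> bool:
    """True if the encoded request body stays within the cap."""
    return body_size(raw_len) <= cap
```

Under these assumptions a 700,000-byte PDF fits within the cap and an 800,000-byte PDF does not.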

Supported

Text extraction from any PDF whose text was originally typed/rendered (Word→PDF, LaTeX→PDF, etc.).

Not supported

OCR. Scanned/image-only PDFs return very little text; the response flags them with likely_scanned_pdf: true, and we recommend an external OCR step. Multi-column layouts may come out with their columns interleaved (a documented PDF-parser limitation).

Parameters

Name	Type	Required	Default	Description
pdf_base64	string	yes	—	Base64-encoded PDF bytes.
max_pages	integer	no	200	Cap on pages processed. Range 1-500.
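A client can check these parameters before sending anything. The sketch below builds the JSON payload and enforces the 1-500 range for max_pages locally. The function name build_payload is hypothetical, and the server remains the authority on what it accepts.

```python
import base64
from typing import Optional

MAX_PAGES_MIN, MAX_PAGES_MAX = 1, 500  # range stated in the parameter table


def build_payload(pdf_bytes: bytes, max_pages: Optional[int] = None) -> dict:
    """Return the request payload as a dict, validating inputs locally."""
    if not pdf_bytes:
        raise ValueError("pdf_bytes is empty")
    payload = {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
    if max_pages is not None:
        if not MAX_PAGES_MIN <= max_pages <= MAX_PAGES_MAX:
            raise ValueError("max_pages must be between 1 and 500")
        payload["max_pages"] = max_pages  # omitted: server default applies
    return payload
```

For example, build_payload(b"%PDF-1.4\n", max_pages=10) yields a pdf_base64 value of "JVBERi0xLjQK" alongside max_pages: 10.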

Request

curl -X POST https://api.qcrawl.com/v1/convert/pdf-to-markdown \
  -H "Authorization: Bearer osk_..." \
  -H "Content-Type: application/json" \
  -d '{"pdf_base64": "JVBERi0xLjQK..."}'
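The same request can be sketched in Python using only the standard library. The function names build_request and convert are hypothetical. Setting Content-Type: application/json is standard practice for JSON bodies rather than a documented requirement of this endpoint. The 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINT = "https://api.qcrawl.com/v1/convert/pdf-to-markdown"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


def convert(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping build_request separate from convert makes it possible to inspect or log the prepared request without touching the network.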

Response

{
  "status": "success",
  "markdown": "## Page 1\n\nQuarterly results...\n\n## Page 2\n\nFinancial summary...",
  "page_count": 12,
  "pages_processed": 12,
  "byte_count": 4831,
  "word_count": 712,
  "likely_scanned_pdf": false,
  "note": null
}
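A client will usually want the pages back as separate strings and a quick check of the flags. The sketch below splits the markdown field on the ## Page N headings and collects warnings. Both helper names are hypothetical. It also assumes that pages_processed lower than page_count means the max_pages cap cut the document short, which the field names suggest but the reference does not state.

```python
import re
from typing import Dict, List

# Matches the per-page headings the endpoint emits, e.g. "## Page 12".
PAGE_HEADING = re.compile(r"^## Page (\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> Dict[int, str]:
    """Map page number -> page text, based on the '## Page N' headings."""
    # With one capture group, split() yields [preamble, "1", text, "2", text, ...].
    parts = PAGE_HEADING.split(markdown)
    return {
        int(parts[i]): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }


def warnings_for(response: dict) -> List[str]:
    """Collect human-readable warnings from a decoded response."""
    notes = []
    if response.get("likely_scanned_pdf"):
        notes.append("likely scanned: run OCR before relying on this text")
    # Assumption: fewer processed pages than total means truncation.
    if response.get("pages_processed", 0) < response.get("page_count", 0):
        notes.append("truncated: not every page was processed")
    return notes
```

One caveat: a PDF whose own text contains a line of the form "## Page 3" would be split at that line as well, so treat the result as a convenience rather than a guarantee.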

Errors

Code	Meaning
400	invalid base64, empty body, or PDF parse failure.
413	body exceeds 1 MB.
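On the client side, these two codes can be turned into actionable messages. The sketch below is one hypothetical way to do so. ConversionError and raise_for_status are illustrative names, and any status not listed in the table is reported generically.

```python
class ConversionError(Exception):
    """Raised when the conversion endpoint returns an error status."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


# Meanings taken from the error table; hints after the semicolon are suggestions.
_MEANINGS = {
    400: "invalid base64, empty body, or PDF parse failure; check the encoding and the file",
    413: "body exceeds 1 MB; send a smaller PDF",
}


def raise_for_status(status: int) -> None:
    """Raise ConversionError for error statuses, return None otherwise."""
    if status < 400:
        return
    raise ConversionError(status, _MEANINGS.get(status, "error not listed in the table"))
```

A 413 is best avoided altogether by checking the encoded size before sending, since the upload is wasted once the server rejects it.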

Related

POST /v1/convert/html-to-markdown

HTML to Markdown

POST /v1/convert/docx-to-markdown

Word DOCX to Markdown
