PDF to Markdown
Send base64-encoded PDF bytes and get clean Markdown back, one section per page. Scanned PDFs are flagged.
POST /v1/convert/pdf-to-markdown
Supply the PDF as a base64 string in the JSON body. Each page becomes a Markdown section (## Page 1, ## Page 2, …). Image-only or scanned PDFs are detected and flagged with likely_scanned_pdf: true, which signals that an OCR tool is needed: this endpoint extracts text only, not images. The body cap is the platform-wide 1 MB. Because base64 encodes every 3 bytes as 4 characters, that leaves room for roughly 750 KB of raw PDF, which covers most real documents.
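A pre-flight check against the body cap can be sketched as follows. This is a minimal sketch, not part of the API: the helper names are hypothetical, and the cap is assumed to be 1,000,000 bytes (the text says only "1 MB", which could also mean 1,048,576).

```python
import base64
import json

# Assumption: "1 MB" is read as 10^6 bytes; adjust if the platform uses 2^20.
BODY_CAP_BYTES = 1_000_000


def encoded_body_size(pdf_bytes: bytes) -> int:
    """Return the size in bytes of the JSON body that would be sent."""
    # base64 turns every 3 raw bytes into 4 ASCII characters.
    payload = {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
    return len(json.dumps(payload).encode("utf-8"))


def fits_body_cap(pdf_bytes: bytes, cap: int = BODY_CAP_BYTES) -> bool:
    """True if the encoded JSON body stays within the assumed cap."""
    return encoded_body_size(pdf_bytes) <= cap
```

Under these assumptions a PDF of exactly 750,000 bytes lands just over the cap once the JSON wrapper is counted, so the 750 KB figure is best treated as an approximation.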
Supported: text extraction from any PDF whose text was originally typed or rendered (Word→PDF, LaTeX→PDF, etc.).
Not supported: OCR. Scanned or image-only PDFs return very little text; the response flags this, and an external OCR step is recommended. Multi-column layouts may interleave columns in the output (a documented PDF-parser limitation).
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| pdf_base64 | string | yes | — | Base64-encoded PDF bytes. |
| max_pages | integer | no | 200 | Cap on pages processed. Range 1-500. |
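The two parameters above can be assembled and validated on the client before sending. A minimal sketch, assuming a hypothetical helper named `build_payload`; the range check mirrors the table, but whether the server rejects or clamps out-of-range values is not stated here.

```python
import base64


def build_payload(pdf_bytes: bytes, max_pages: int = 200) -> dict:
    """Build the JSON-serializable request body for the endpoint."""
    if not pdf_bytes:
        raise ValueError("pdf_bytes is empty")
    # Documented range for max_pages is 1-500, default 200.
    if not 1 <= max_pages <= 500:
        raise ValueError("max_pages must be between 1 and 500")
    return {
        "pdf_base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "max_pages": max_pages,
    }
```

For instance, `build_payload(b"%PDF-1.4\n")` yields a `pdf_base64` value of `JVBERi0xLjQK`, the same prefix that appears in the request example.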
Request
curl -X POST https://api.qcrawl.com/v1/convert/pdf-to-markdown \
-H "Authorization: Bearer osk_..." \
-d '{"pdf_base64": "JVBERi0xLjQK..."}'
Response
{
"status": "success",
"markdown": "## Page 1\n\nQuarterly results...\n\n## Page 2\n\nFinancial summary...",
"page_count": 12,
"pages_processed": 12,
"byte_count": 4831,
"word_count": 712,
"likely_scanned_pdf": false,
"note": null
}
Errors
| Code | Meaning |
|---|---|
| 400 | Invalid base64, empty body, or PDF parse failure. |
| 413 | Body exceeds 1 MB. |
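Client-side handling of the response and the error codes above might look like the following. This is a sketch under assumptions: `split_pages` and `interpret` are hypothetical helpers, the HTTP call itself is left out, and any status below 400 is treated as success because the success code is not documented here.

```python
import re


def split_pages(markdown: str) -> dict[int, str]:
    """Map page number -> page text, using the '## Page N' headings."""
    parts = re.split(r"^## Page (\d+)[ \t]*$", markdown, flags=re.MULTILINE)
    # With one capture group, re.split yields
    # [preamble, number, text, number, text, ...].
    return {int(n): text.strip() for n, text in zip(parts[1::2], parts[2::2])}


def interpret(status_code: int, body: dict) -> tuple[dict[int, str], bool]:
    """Return (pages, needs_ocr) or raise on a documented error code."""
    if status_code == 400:
        raise ValueError("invalid base64, empty body, or PDF parse failure")
    if status_code == 413:
        raise ValueError("request body exceeds 1 MB")
    if status_code >= 400:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    # A true flag means little text was extracted; route to an OCR tool.
    needs_ocr = bool(body.get("likely_scanned_pdf"))
    return split_pages(body.get("markdown") or ""), needs_ocr
```

One caveat with this approach: a PDF whose own text contains a line such as `## Page 3` would be split at that line too, so the page map is only as reliable as the headings are unique.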