PDF to Markdown

The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents, whether native or scanned, into well-formatted, LLM-friendly Markdown text. The service preserves the document's structure and formatting while converting it into a format that is widely used for LLM workflows, content authoring, and documentation.
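As a rough illustration of why Markdown suits LLM workflows, the sketch below splits converted text into heading-delimited sections that could be passed to a model one at a time. It assumes the converted Markdown marks headings with `#` prefixes (ATX style), which this page does not confirm. The function name `split_by_headings` is hypothetical and not part of the API.

```python
import re
from typing import List, Tuple

# Assumption: headings in the converted Markdown use ATX syntax ("# Title").
_ATX_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
_FENCE = "`" * 3  # code-fence marker, built this way to keep the example self-contained


def split_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs in document order.

    Text that appears before the first heading is returned with an
    empty heading string. Lines inside fenced code blocks are never
    treated as headings.
    """
    sections: List[Tuple[str, str]] = []
    heading = ""
    body: List[str] = []
    in_fence = False

    for line in markdown.splitlines():
        if line.lstrip().startswith(_FENCE):
            in_fence = not in_fence
        match = None if in_fence else _ATX_HEADING.match(line)
        if match:
            # Close the section collected so far, then start a new one.
            if heading or body:
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2), []
        else:
            body.append(line)

    if heading or body:
        sections.append((heading, "\n".join(body).strip()))
    return sections
```

Splitting on headings rather than on a fixed character count keeps each chunk aligned with the document's own structure, which is the property the conversion is meant to preserve.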

Structured Information Output Format

The output of a PDF to Markdown operation includes:

Output Structure

The following is a summary of key elements in the converted Markdown:

Elements

Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:

Content Types

The API processes various content types as follows:

Text Elements
Images and Figures
Tables

Element Types and Paths

The API recognizes and converts the following structural elements:

| Category | Element Type | Description |
|---|---|---|
| Aside | Aside | Content that is not part of the regular content flow |
| Figure | Figure | Non-reflowable constructs such as graphs, images, and flowcharts |
| Footnote | Footnote | Footnote |
| Headings | H, H1, H2, etc. | Heading levels |
| List | L, Li, Lbl, Lbody | List and list item elements |
| Paragraph | P, ParagraphSpan | Paragraphs and paragraph segments |
| Reference | Reference | Links |
| Section | Sect | Logical section of the document |
| StyleSpan | StyleSpan | Styling variations within text |
| Table | Table, TD, TH, TR | Table elements |
| Title | Title | Document title |
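The element types listed above can be treated as a simple lookup from type name to category. The following is a minimal sketch of that lookup, transcribed from the list above. The names `ELEMENT_CATEGORIES` and `category_for` are hypothetical helpers for illustration, not part of the API, and the handling of numbered heading levels (`H1`, `H2`, and so on) as "H followed by digits" is an assumption.

```python
import re
from typing import Optional

# Element type -> category, transcribed from the element list in this section.
ELEMENT_CATEGORIES = {
    "Aside": "Aside",
    "Figure": "Figure",
    "Footnote": "Footnote",
    "L": "List",
    "Li": "List",
    "Lbl": "List",
    "Lbody": "List",
    "P": "Paragraph",
    "ParagraphSpan": "Paragraph",
    "Reference": "Reference",
    "Sect": "Section",
    "StyleSpan": "StyleSpan",
    "Table": "Table",
    "TD": "Table",
    "TH": "Table",
    "TR": "Table",
    "Title": "Title",
}

# Assumption: heading types are "H" alone or "H" followed by a level number.
_HEADING_TYPE = re.compile(r"H\d*")


def category_for(element_type: str) -> Optional[str]:
    """Return the category for an element type, or None if it is not listed."""
    if _HEADING_TYPE.fullmatch(element_type):
        return "Headings"
    return ELEMENT_CATEGORIES.get(element_type)
```

For example, `category_for("TD")` returns `"Table"`, `category_for("H2")` returns `"Headings"`, and an unlisted name returns `None`.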

Reading Order

The reading order in the output Markdown maintains:

Use Cases

The PDF to Markdown API is particularly valuable for:

API Limitations

For File Constraints and Processing Limits, see Licensing and Usage Limits.

Document Requirements

REST API

See our public API Reference for the PDF to Markdown API.