Edit in GitHubLog an issue

PDF to Markdown

The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents – native or scanned – into well-formatted LLM-friendly Markdown text. This service preserves the document's structure and formatting while converting it into a format that's widely used for LLM flows, content authoring and documentation.

Structured Information Output Format

The output of a PDF to Markdown operation includes:

  • A primary .md file containing the converted Markdown content

Output Structure

The following is a summary of key elements in the converted Markdown:

Elements

Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:

  • Text content with proper Markdown syntax
  • Document hierarchy and structure
  • Inline formatting and emphasis
  • Links and references
  • Images and figures
  • Tables and complex layouts

Content Types

The API processes various content types as follows:

Text Elements
  • Headings: Converted to appropriate Markdown heading levels (H1-H6)
  • Paragraphs: Preserved with proper spacing and formatting
  • Lists: Both ordered and unordered lists with proper nesting
  • Text Emphasis: Bold, italic, and other text formatting
  • Links: Preserved with proper Markdown link syntax
Images and Figures
  • Provided as base64-embedded images in the Markdown output
  • Referenced correctly in the Markdown output
  • Original quality preserved
  • Proper alt-text and captions maintained
Tables
  • Converted to Markdown table syntax
  • Column alignment preserved
  • Cell content formatting maintained
  • Complex table structures supported

Element Types and Paths

The API recognizes and converts the following structural elements:

CategoryElement TypeDescription
Aside
Aside
Content that is not part of the regular content flow
Figure
Figure
Non-reflowable constructs such as graphs, images, and flowcharts
Footnote
Footnote
Footnote
Headings
H, H1, H2, etc
Heading levels
List
L, Li, Lbl, Lbody
List and list item elements
Paragraph
P, ParagraphSpan
Paragraphs and paragraph segments
Reference
Reference
Links
Section
Sect
Logical section of the document
StyleSpan
StyleSpan
Styling variations within text
Table
Table, TD, TH, TR
Table elements
Title
Title
Document title

Reading Order

The reading order in the output Markdown maintains:

  • Natural document flow
  • Proper content hierarchy
  • Column-based layouts
  • Page transitions
  • Inline elements and references

Use Cases

The PDF to Markdown API is particularly valuable for:

  • LLM and RAG ingestion: Convert PDFs to Markdown for chunking, embeddings, and retrieval-augmented generation (RAG).
  • Prompt and context packaging: Produce Markdown that is easy to paste, structure, and cite in prompts and agent workflows.
  • Training data preparation: Create LLM fine-tuning datasets from PDF content after review, cleanup, and labeling.
  • Doc-as-code workflows: Bring PDF content into Git-based review, versioning, diffing, and static-site generators.
  • Knowledge base publishing: Migrate PDFs into documentation platforms and internal wikis as clean, editable Markdown.
  • Legacy and archive modernization: Convert historical PDFs so they become searchable, editable, and maintainable.
  • Automated document processing: Standardize PDF-to-text conversion inside ETL and document-processing pipelines.
  • Enterprise search and indexing: Feed converted Markdown into internal search systems and knowledge repositories.
  • Compliance and audit readiness: Make PDF policies, SOPs, and manuals searchable and easier to review for changes.
  • Content QA and change tracking: Compare converted Markdown across document versions to detect updates and regressions.
  • Analytics and classification: Use Markdown output for topic modeling, tagging, deduplication, and routing workflows.
  • Localization workflows: Convert to Markdown as a starting point for translation and multi-language documentation.

API Limitations

For File Constraints and Processing Limits, see Licensing and Usage Limits.

Document Requirements

  • Files must be unprotected or allow content copying
  • No support for:
    • Hidden objects (JavaScript, OCG)
    • XFA and fillable forms
    • Complex annotations
    • CAD drawings or vector art
    • Password-protected content

REST API

See our public API Reference for PDF to Markdown API.

  • Privacy
  • Terms of Use
  • Do not sell or share my personal information
  • AdChoices
Copyright © 2026 Adobe. All rights reserved.