Pipeline Architecture¶
obsidian-export converts Obsidian Markdown to PDF/DOCX through a 5-stage pipeline. Each stage is a pure function that transforms the document content.
graph LR
S1["Stage 1
Vault"] --> S2["Stage 2
Preprocess"] --> S3["Stage 3
Mermaid"] --> S3b["Stage 3b
SVG"] --> S3c["Stage 3c
Image"] --> S4["Stage 4
Pandoc"]
Stage 1: Vault Operations¶
Handles Obsidian-specific vault operations:
- Frontmatter parsing — extracts YAML frontmatter, cleans tags into keywords
- Title extraction — uses frontmatter
titleor falls back to filename stem - Embed resolution — recursively resolves
![[embed]]references with circular reference detection - Syntax stripping — converts
[[wikilinks]]to plain text, removes## Relationssections
Module: obsidian_export.pipeline.stage1_vault
Stage 2: Text Preprocessing¶
Text-level transformations on the Markdown body:
- Line ending normalization — normalizes CRLF to LF and strips trailing whitespace per line
- Variation selector stripping — removes Unicode U+FE0F (emoji variation selector) that TeX cannot render
- Dollar sign escaping —
$25/userrenders as literal text, not LaTeX math - Callout conversion —
> [!note]blocks become colored boxes (PDF) or blockquotes (DOCX) - URL handling — configurable strategies: keep, footnote long URLs, footnote all, or strip
Module: obsidian_export.pipeline.stage2_preprocess
Stage 3: Mermaid Rendering¶
Renders ```mermaid code blocks to PNG images using mermaid-cli (mmdc):
- Extracts Mermaid blocks from the Markdown
- Invokes mmdc to render each block as a PNG
- Replaces the code block with an image reference
Module: obsidian_export.pipeline.stage3_mermaid
Stage 3b: SVG Conversion¶
Converts SVG image references for format compatibility:
- Finds
image references - PDF output: converts each SVG to PDF via
rsvg-convert - DOCX output: converts each SVG to PNG via
rsvg-convert - Replaces SVG references with the converted file references
Module: obsidian_export.pipeline.stage3_svg
Stage 3c: Image Conversion¶
Converts image formats not natively supported by the target renderer to PNG using Pillow:
- PDF (tectonic/LaTeX) natively supports: PNG, JPG/JPEG, PDF
- DOCX (pandoc) natively supports: PNG, JPG/JPEG, GIF, BMP, TIFF
- Any other format (e.g., WebP, AVIF) is converted to PNG in the temporary directory
- SVG images are skipped (handled by Stage 3b)
Module: obsidian_export.pipeline.stage3_image
Stage 4: Pandoc Conversion¶
Produces the final output via pandoc:
- PDF: Uses tectonic (XeLaTeX) as the PDF engine, with a rendered LaTeX header template for styling
- DOCX: Direct pandoc conversion with GFM input format
Module: obsidian_export.pipeline.stage4_pandoc
Data Flow¶
Each stage receives the Markdown body as a string and returns a transformed string. The run() function orchestrates the stages in sequence:
from obsidian_export import run
from obsidian_export.config import load_config
config = load_config(Path("my_config.yaml"))
run(Path("input.md"), Path("output.pdf"), "pdf", config)
The pipeline uses a temporary directory for intermediate files (Mermaid PNGs, SVG-to-PDF conversions). This directory is automatically cleaned up after conversion.