Pipeline Architecture

obsidian-export converts Obsidian Markdown to PDF or DOCX through a pipeline of six stages, numbered 1, 2, 3, 3b, 3c, and 4. Each stage is a function that takes the document content and returns a transformed version for the next stage.

graph LR
    S1["Stage 1<br/>Vault"] --> S2["Stage 2<br/>Preprocess"]
    S2 --> S3["Stage 3<br/>Mermaid"]
    S3 --> S3b["Stage 3b<br/>SVG"]
    S3b --> S3c["Stage 3c<br/>Image"]
    S3c --> S4["Stage 4<br/>Pandoc"]

Stage 1: Vault Operations

Handles Obsidian-specific vault operations:

Frontmatter parsing — extracts YAML frontmatter, cleans tags into keywords
Title extraction — uses frontmatter title or falls back to filename stem
Embed resolution — recursively resolves ![[embed]] references with circular reference detection
Syntax stripping — converts [[wikilinks]] to plain text, removes ## Relations sections

Module: obsidian_export.pipeline.stage1_vault
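
The title fallback, embed resolution, and wikilink stripping described above can be sketched as follows. This is a minimal illustration, not the code in stage1_vault: the function names, the regular expressions, and the in-memory `notes` dictionary (note name to body) are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical patterns; the real stage may recognise more embed/link variants.
EMBED_RE = re.compile(r"!\[\[([^\]|#]+)[^\]]*\]\]")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_title(frontmatter, note_path):
    """Prefer the frontmatter title; fall back to the filename stem."""
    return frontmatter.get("title") or note_path.stem


def resolve_embeds(text, notes, stack=()):
    """Inline ![[name]] embeds from `notes`, refusing circular references."""
    def replace(match):
        name = match.group(1).strip()
        if name in stack:
            chain = " -> ".join(stack + (name,))
            raise ValueError(f"circular embed: {chain}")
        if name not in notes:
            return match.group(0)  # leave unknown embeds untouched
        # Recurse with the current note pushed onto the ancestry stack.
        return resolve_embeds(notes[name], notes, stack + (name,))
    return EMBED_RE.sub(replace, text)


def strip_wikilinks(text):
    """Turn [[target]] and [[target|alias]] into plain text."""
    return WIKILINK_RE.sub(lambda m: m.group(2) or m.group(1), text)
```

In this sketch embeds have to be resolved before wikilinks are stripped, because the wikilink pattern would otherwise match the inside of an ![[embed]] and leave a stray exclamation mark behind.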

Stage 2: Text Preprocessing

Text-level transformations on the Markdown body:

Line ending normalization — normalizes CRLF to LF and strips trailing whitespace per line
Variation selector stripping — removes Unicode U+FE0F (emoji variation selector) that TeX cannot render
Dollar sign escaping — escapes $ so that text such as $25/user renders literally instead of starting LaTeX math
Callout conversion — > [!note] blocks become colored boxes (PDF) or blockquotes (DOCX)
URL handling — configurable strategies: keep, footnote long URLs, footnote all, or strip

Module: obsidian_export.pipeline.stage2_preprocess
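
Three of the transformations above are simple enough to sketch directly. The function names are hypothetical, and the dollar-sign rule here is deliberately naive: it escapes every dollar sign that is not already escaped, whereas the real stage may apply a narrower rule. Callout conversion and URL handling are not covered.

```python
import re


def normalize_line_endings(text):
    """Convert CRLF to LF and strip trailing whitespace from each line."""
    text = text.replace("\r\n", "\n")
    return "\n".join(line.rstrip() for line in text.split("\n"))


def strip_variation_selectors(text):
    """Remove U+FE0F, which the TeX engine cannot render."""
    return text.replace("\ufe0f", "")


def escape_dollars(text):
    """Backslash-escape every dollar sign that is not already escaped."""
    return re.sub(r"(?<!\\)\$", r"\\$", text)
```

Each function takes a string and returns a string, so they can be applied one after another to the Markdown body.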

Stage 3: Mermaid Rendering

Renders ```mermaid code blocks to PNG images using mermaid-cli (mmdc):

Extracts Mermaid blocks from the Markdown
Invokes mmdc to render each block as a PNG
Replaces the code block with an image reference

Module: obsidian_export.pipeline.stage3_mermaid
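
The extract, render, and replace steps might look like the sketch below. The file naming scheme, the regular expression, and the injected `render` callback are assumptions for illustration; the only external detail relied on is that mmdc accepts `-i` for the input file and `-o` for the output file.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # built programmatically to keep this example easy to embed
MERMAID_RE = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)\n{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def mmdc_command(source, target):
    """Build the mermaid-cli invocation for one diagram."""
    return ["mmdc", "-i", str(source), "-o", str(target)]


def replace_mermaid_blocks(text, tmp_dir, render):
    """Write each Mermaid block to tmp_dir, render it, and swap in an image."""
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        source = tmp_dir / f"diagram-{counter}.mmd"
        target = tmp_dir / f"diagram-{counter}.png"
        source.write_text(match.group(1), encoding="utf-8")
        render(source, target)  # expected to produce `target`
        return f"![]({target})"

    return MERMAID_RE.sub(replace, text)
```

A real renderer could be passed as `lambda s, t: subprocess.run(mmdc_command(s, t), check=True)`; keeping it as a parameter lets the text transformation be exercised without mermaid-cli installed.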

Stage 3b: SVG Conversion

Converts SVG image references for format compatibility:

Finds ![](*.svg) image references
PDF output: converts each SVG to PDF via rsvg-convert
DOCX output: converts each SVG to PNG via rsvg-convert
Replaces SVG references with the converted file references

Module: obsidian_export.pipeline.stage3_svg
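
As a rough sketch of the steps above, the following rewrites SVG references and collects the rsvg-convert commands that would produce the replacement files. The function names and the flat naming of converted files are assumptions; `-f` (output format) and `-o` (output path) are standard rsvg-convert options.

```python
import re
from pathlib import Path

# Matches ![alt](path.svg); a hypothetical pattern for this example.
SVG_REF_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+\.svg)\)", re.IGNORECASE)


def rsvg_command(source, target, fmt):
    """Build the rsvg-convert invocation for one file."""
    return ["rsvg-convert", "-f", fmt, "-o", str(target), str(source)]


def rewrite_svg_refs(text, output_format, tmp_dir):
    """Point SVG references at converted files; return text and commands."""
    ext = "pdf" if output_format == "pdf" else "png"
    commands = []

    def replace(match):
        alt, src = match.group(1), Path(match.group(2))
        target = tmp_dir / f"{src.stem}.{ext}"
        commands.append(rsvg_command(src, target, ext))
        return f"![{alt}]({target})"

    return SVG_REF_RE.sub(replace, text), commands
```

Naming the converted file after the source stem is a simplification: two SVGs with the same filename in different folders would collide, so a real implementation would need a more careful scheme.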

Stage 3c: Image Conversion

Converts image formats not natively supported by the target renderer to PNG using Pillow:

PDF (tectonic/LaTeX) natively supports: PNG, JPG/JPEG, PDF
DOCX (pandoc) natively supports: PNG, JPG/JPEG, GIF, BMP, TIFF
Any other format (e.g., WebP, AVIF) is converted to PNG in the temporary directory
SVG images are skipped (handled by Stage 3b)

Module: obsidian_export.pipeline.stage3_image
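
The decision of whether an image needs converting follows directly from the two format lists. The sketch below encodes them as file-extension sets; the extension spellings and function names are assumptions, and whether Pillow can actually read a given format (AVIF in particular) depends on the installed Pillow version and plugins.

```python
from pathlib import Path

# Natively supported formats per target, keyed by lowercase file extension.
NATIVE_FORMATS = {
    "pdf": {".png", ".jpg", ".jpeg", ".pdf"},
    "docx": {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"},
}


def needs_conversion(image, output_format):
    """Return True if the image must be converted to PNG for this target."""
    suffix = image.suffix.lower()
    if suffix == ".svg":
        return False  # SVG is handled by Stage 3b
    return suffix not in NATIVE_FORMATS[output_format]


def convert_to_png(image, tmp_dir):
    """Convert one image to PNG inside tmp_dir and return the new path."""
    from PIL import Image  # Pillow, imported lazily so the check above has no dependency

    target = tmp_dir / f"{image.stem}.png"
    with Image.open(image) as img:
        img.save(target, format="PNG")
    return target
```

For example, a GIF needs conversion for PDF output but not for DOCX, while a WebP file needs conversion for both.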

Stage 4: Pandoc Conversion

Produces the final output via pandoc:

PDF: Uses tectonic (a XeTeX-based engine) as the PDF engine, with a rendered LaTeX header template for styling
DOCX: Direct pandoc conversion with GFM input format

Module: obsidian_export.pipeline.stage4_pandoc
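
For orientation, the pandoc invocations implied by the two bullets could be assembled roughly as below. The function name and argument layout are hypothetical; the flags themselves (`--from`, `--output`, `--pdf-engine`, `--include-in-header`) are standard pandoc options. Since only the DOCX bullet names an input format, the sketch sets GFM for DOCX and leaves the PDF input format at pandoc's default, which may differ from what the module actually does.

```python
from pathlib import Path


def pandoc_command(source, output, output_format, header=None):
    """Build a pandoc command line for PDF or DOCX output."""
    cmd = ["pandoc", str(source), "--output", str(output)]
    if output_format == "pdf":
        cmd.append("--pdf-engine=tectonic")
        if header is not None:
            # Rendered LaTeX header template that carries the styling.
            cmd += ["--include-in-header", str(header)]
    elif output_format == "docx":
        cmd += ["--from", "gfm"]
    else:
        raise ValueError(f"unsupported output format: {output_format}")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a pandoc failure raises instead of passing silently.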

Data Flow

Each stage receives the Markdown body as a string and returns a transformed string. The run() function orchestrates the stages in sequence:

from pathlib import Path

from obsidian_export import run
from obsidian_export.config import load_config

# Load the configuration, then convert input.md to a PDF.
config = load_config(Path("my_config.yaml"))
run(Path("input.md"), Path("output.pdf"), "pdf", config)

The pipeline uses a temporary directory for intermediate files (Mermaid PNGs, converted SVGs, and images converted to PNG). This directory is automatically cleaned up after conversion.
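
The string-in, string-out flow and the temporary directory lifetime can be illustrated with a small sketch. This is not the actual run() implementation: `run_stages` and the stage signature `(body, tmp_dir) -> body` are assumptions chosen to show the idea.

```python
import tempfile
from pathlib import Path


def run_stages(body, stages):
    """Apply each stage in order, sharing one temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for stage in stages:
            # Each stage receives the current body and returns the next one.
            body = stage(body, tmp_dir)
        return body
    # Leaving the `with` block deletes tmp_dir and everything inside it.
```

Because cleanup happens when the context manager exits, the final output file has to be written outside the temporary directory before the function returns.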