Tool under construction. Launching March 20, 2026. The code will be open-sourced under the AGPL license. The current version is for testing purposes only.
Modufile

OCR PDF

Make scanned PDFs searchable with optical character recognition


Private.

Language

Language data (~10MB) downloads on first use and is cached locally.

How it works

  1. Renders each page as an image
  2. Tesseract.js recognizes text
  3. Invisible text layer is added
  4. Original appearance is preserved

About OCR PDF

Modufile's OCR PDF tool makes scanned PDFs searchable by adding an invisible text layer to each page. It renders every page as a high-resolution image (200 DPI) using MuPDF, runs Tesseract.js optical character recognition to detect and position text, and writes the recognized words back into the PDF as an invisible overlay using pdf-lib. The original visual appearance is preserved, while the added layer enables full-text search, text selection, and copy-paste. The tool supports over 100 languages and runs entirely in your browser, so your documents are never uploaded to any server. It is ideal for making scanned contracts, archived documents, and image-based PDFs searchable.
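
To make the 200 DPI figure concrete, here is a minimal sketch of how a page's raster size could be derived from its size in PDF points. It is not Modufile's actual code: `planRender` and `RenderPlan` are hypothetical names, and the only fact relied on is that PDF user space defaults to 72 units per inch.

```typescript
// PDF user space defaults to 72 units (points) per inch.
const POINTS_PER_INCH = 72;

interface RenderPlan {
  scale: number;    // zoom factor to apply when rasterizing the page
  widthPx: number;  // raster width in pixels
  heightPx: number; // raster height in pixels
}

// Hypothetical helper: compute the raster size for a page at a target DPI.
function planRender(widthPt: number, heightPt: number, dpi: number = 200): RenderPlan {
  if (widthPt <= 0 || heightPt <= 0 || dpi <= 0) {
    throw new RangeError("page size and DPI must be positive");
  }
  return {
    scale: dpi / POINTS_PER_INCH,
    // Multiply before dividing so common page sizes land on exact integers.
    widthPx: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    heightPx: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points.
const letter = planRender(612, 792);
console.log(letter.widthPx, letter.heightPx); // 1700 2200
```

A US Letter page at 200 DPI is 1700 x 2200 pixels, roughly 15 MB as uncompressed RGBA, which is worth keeping in mind when many pages are processed in a browser tab.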

Tech Stack

mupdf: Renders each PDF page as a high-resolution image (200 DPI) for OCR input
tesseract.js: Performs optical character recognition on rendered page images with support for 100+ languages
pdf-lib: Writes the invisible text layer back into the original PDF document
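
Placing recognized words back onto the page requires converting OCR coordinates into PDF coordinates. The sketch below shows one way that mapping could work, under stated assumptions: word boxes arrive as `{ x0, y0, x1, y1 }` in image pixels with the origin at the top-left (the shape Tesseract.js uses for bounding boxes), the page is unrotated, and its lower-left corner sits at the PDF origin. `pixelBoxToPdfRect`, `PixelBox`, and `PdfRect` are hypothetical names, not part of any of the libraries listed above.

```typescript
// Word box in image pixels, origin at the top-left corner of the rendered page.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Rectangle in PDF points, origin at the bottom-left corner of the page.
interface PdfRect { x: number; y: number; width: number; height: number; }

const PT_PER_INCH = 72;

// Hypothetical helper: map an OCR word box to a rectangle in PDF user space.
function pixelBoxToPdfRect(box: PixelBox, pageHeightPt: number, dpi: number = 200): PdfRect {
  const toPt = (px: number): number => (px * PT_PER_INCH) / dpi;
  return {
    x: toPt(box.x0),
    // Flip the vertical axis: the box's bottom edge (y1) becomes the rect's y.
    y: pageHeightPt - toPt(box.y1),
    width: toPt(box.x1 - box.x0),
    height: toPt(box.y1 - box.y0),
  };
}

// A word found 1 inch from the left and 0.5 inch from the top of a Letter page.
const rect = pixelBoxToPdfRect({ x0: 200, y0: 100, x1: 400, y1: 150 }, 792);
console.log(rect); // { x: 72, y: 738, width: 72, height: 18 }
```

The resulting rectangle could then drive the position and font size of the text drawn for that word. How the overlay is made invisible (for example, zero opacity or an invisible text rendering mode) is a separate choice that this sketch does not cover.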

Frequently Asked Questions