PDF to TXT Converter
Drop a PDF, get plain text. Pages are separated by '---' breaks. Useful for searching, indexing, copy-paste, and feeding PDF content into text-only tools or scripts.
Why convert PDF to TXT?
- Extracting the text of a contract, report, or whitepaper for grep / ripgrep search.
- Pulling words out of a PDF resume to feed into an ATS or text-based filter.
- Indexing the contents of a PDF library for full-text search in a notes app or script.
- Copying a paragraph from a PDF where the built-in copy doesn't preserve line breaks cleanly.
- Feeding PDF content into an LLM, NLP tool, or text-processing pipeline.
- Generating plain-text transcripts of academic papers for citation extraction or reference management.
How our converter works
Your PDF is parsed by pdfjs-dist running in a Web Worker. We walk every page, read the text items from the page's text layer, and reconstruct line breaks heuristically from Y-coordinate jumps between consecutive text items. Pages are separated by '---' breaks. The output is a single UTF-8 encoded .txt file. Conversion runs entirely in your browser.
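The page walk described above can be sketched roughly as follows. This is an illustration, not our actual source. The interfaces cover only the pdfjs-dist members the loop touches (numPages, getPage, getTextContent), so the function also runs against a stub. The names extractAllPages, itemsToText, and PAGE_SEPARATOR are hypothetical, and the exact separator string is an assumption.

```typescript
// Minimal structural types covering only the pdfjs-dist members used below.
interface TextItemLike {
  str?: string; // absent on marked-content items
}
interface PageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Assumption: the exact separator string. The page only says pages are split by '---'.
const PAGE_SEPARATOR = "\n---\n";

// Placeholder page renderer: joins item strings with spaces.
// It does no line-break reconstruction; that step is left out of this sketch.
function itemsToText(items: TextItemLike[]): string {
  return items.map((item) => item.str ?? "").join(" ").trim();
}

// Walk every page in order (pdf.js page numbers start at 1) and join the results.
async function extractAllPages(doc: DocLike): Promise<string> {
  const pageTexts: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pageTexts.push(itemsToText(content.items));
  }
  return pageTexts.join(PAGE_SEPARATOR);
}
```

In a browser, the doc argument would typically be the document object that pdfjs-dist resolves from getDocument(...).promise. The structural types mean no import is needed for the sketch itself.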
Frequently asked questions
Will scanned PDFs work?
No. Scanned PDFs are page images, typically with no text layer, and reading text out of an image needs OCR (optical character recognition), which we don't currently run client-side. For scans, use a desktop tool with OCR (Adobe Acrobat, Tesseract) or rescan with OCR enabled in the original capture tool.
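If you script around the output, a scanned page usually shows up as a page whose text layer has no usable strings. The check below is a hypothetical pre-flight sketch, not something the converter is documented to do. It assumes each page is given as an array of items with an optional str field, the shape pdfjs-dist text items have.

```typescript
interface TextItemLike {
  str?: string;
}

// Hypothetical helper: true when a page holds at least one non-blank string.
function pageHasText(items: TextItemLike[]): boolean {
  return items.some((item) => (item.str ?? "").trim().length > 0);
}

// Hypothetical helper: 1-based numbers of the pages with no extractable text,
// which are the likely candidates for OCR.
function pagesWithoutText(pages: TextItemLike[][]): number[] {
  const missing: number[] = [];
  pages.forEach((items, index) => {
    if (!pageHasText(items)) missing.push(index + 1);
  });
  return missing;
}
```

An empty result suggests a born-digital file. A result listing every page suggests a pure scan.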
How are line breaks reconstructed?
PDFs don't store explicit line breaks in their text layer; each text item is positioned absolutely. We insert a newline whenever the Y-coordinate drops by more than 2 points from one text item to the next (PDF Y-coordinates grow upward, so a lower line has a smaller Y), which usually corresponds to a line break in the source. Tables and multi-column layouts may produce odd breaks.
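A minimal sketch of that heuristic, under stated assumptions: each item carries a str and a six-element transform whose last entry is the Y position in points (the layout pdfjs-dist uses for text items), and items on the same line are concatenated with no added spacing. The function name and the concatenation rule are illustrative choices, not our exact implementation.

```typescript
interface PositionedText {
  str: string;
  transform: number[]; // [a, b, c, d, x, y]; index 5 is the Y position in points
}

const LINE_BREAK_THRESHOLD = 2; // points, the threshold quoted in the answer above

function reconstructLines(items: PositionedText[]): string {
  let out = "";
  let prevY: number | null = null;
  for (const item of items) {
    const y = item.transform[5];
    // PDF Y grows upward, so moving down the page means Y gets smaller.
    if (prevY !== null && prevY - y > LINE_BREAK_THRESHOLD) {
      out += "\n";
    }
    out += item.str;
    prevY = y;
  }
  return out;
}
```

Only downward jumps trigger a newline here. A jump back up to the top of the next column does not, which is one way multi-column layouts can end up with odd breaks.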
What about formatting (bold, italic, colors)?
Stripped. Plain TXT output is text only — no formatting, no inline structure. For preserved structure, use the PDF to HTML converter.
What encoding is the output?
UTF-8 — handles smart quotes, em dashes, accented characters, and most non-Latin scripts. The Content-Type on the download is text/plain;charset=utf-8.
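As an illustration of the encoding step, the sketch below wraps a string in a Blob with that type and checks that its size matches the UTF-8 byte length. It assumes a runtime with the standard Blob and TextEncoder globals (current browsers, recent Node.js). The helper names are hypothetical, and producing the download through a Blob is itself an assumption.

```typescript
// MIME string taken from the Content-Type quoted above.
const TEXT_MIME = "text/plain;charset=utf-8";

// Number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// String parts passed to the Blob constructor are encoded as UTF-8.
function toTextBlob(text: string): Blob {
  return new Blob([text], { type: TEXT_MIME });
}

// Smart quotes, an accented character, and a non-Latin script, written as escapes.
const sample = "\u201Csmart quotes\u201D, caf\u00e9, \u65e5\u672c\u8a9e";
const blob = toTextBlob(sample);
console.log(blob.type, blob.size === utf8ByteLength(sample));
```

Characters outside ASCII take more than one byte, so the file size in bytes can exceed the character count.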
Are my files uploaded?
No. pdfjs-dist runs as JavaScript on this page. Sensitive documents — contracts, tax records, medical reports — stay on your device.
About the PDF format
PDF is the universal fixed-layout document format. It preserves layout exactly, but it is a poor source format for text processing. Plain TXT is the lowest-common-denominator format that every editor, search tool, and processing pipeline reads. Converting PDF → TXT is the standard extraction step for anything that needs the words without the layout: full-text search, NLP processing, ATS resume parsing, citation extraction, copy-paste workflows. The conversion strips formatting and layout but preserves the text content; for born-digital PDFs (not scanned ones), the extraction is high-fidelity.