Show HN: Pdf2csv – Convert PDF Tables to CSV with CLI and Python API
(github.com)

Hi HN,
I’m thrilled to share pdf2csv, a lightweight tool for converting tables from PDF files into CSV or XLSX format. It’s particularly handy for right-to-left (RTL) languages like Farsi, Hebrew, and Arabic: it extracts the text correctly and can reverse it when needed.
Features:
• RTL Language Support: Handles Farsi, Hebrew, and Arabic beautifully with optional text reversal.
• Flexible Output: Save tables as CSV or XLSX.
• Dual Interface: Use it as a Python library or from the CLI.
• Powered by Docling: Leverages the robust Docling library for accurate table extraction.
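To illustrate what the optional text reversal involves, here is a minimal stdlib sketch. It is not pdf2csv's actual API: `is_rtl`, `reverse_rtl_cells`, and `rows_to_csv` are hypothetical names, and it assumes the extractor returns RTL cells with their characters in reversed (visual) order. Any cell containing Hebrew or Arabic-script characters is flipped back before the table is written as CSV.

```python
import csv
import io
import unicodedata


def is_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidi class ('R' = Hebrew, 'AL' = Arabic script)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def reverse_rtl_cells(rows):
    """Hypothetical helper: reverse the character order of cells that contain RTL text.

    Cells without RTL characters (ids, Latin text) are left untouched.
    """
    return [[cell[::-1] if is_rtl(cell) else cell for cell in row] for row in rows]


def rows_to_csv(rows) -> str:
    """Serialize a list of rows to CSV text (csv.writer uses '\\r\\n' line endings by default)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up extracted table in which the Farsi cell came out reversed.
    table = [["id", "name"], ["1", "ناریا"]]
    print(rows_to_csv(reverse_rtl_cells(table)))
```

This naive per-cell reversal also flips any digits or Latin words embedded in an RTL cell. Handling mixed-direction text properly requires the Unicode Bidirectional Algorithm.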
I have a PDF file with the font mapping from GT fonts to TRON character codes. For some reason this data does not seem to be included in the fonts themselves, and it is not available as CSV or in any similar format. (The PDF has many pages, so entering the data manually would take too long.) I have tried other ways to convert the data, but none of them seemed to work. (Somehow, the characters are split across several fonts and mapped to Unicode code points that seem to bear no relation either to the character mapped there or to its TRON character code.)
Your repository says it depends on Docling but the link for Docling doesn't work.
You don't need to install Docling separately; pdf2csv installs it as a dependency. Just run it with uvx or install it with pip, and it should work. I hope it can handle your PDF as well. I also fixed the link in the README.