Use OCR to extract text from a PDF with Tesseract, OpenCV and Python
ref:
- Nanonets | How to OCR with Tesseract, OpenCV and Python
- Zoum datascience | YouTube
- Towards Data Science, archived PDF
- Level Up Coding, archived PDF
- datacorner.fr
- ProgrammingKnowledge | YouTube
- NeuralNine | YouTube
- Cherry's Project | YouTube
- Python Tutorials for Digital Humanities | How to OCR an Index in Python with PyTesseract (OCR in Python Tutorials 03.01)
- Python Tutorials for Digital Humanities | How to Preprocess Images for Text OCR in Python (OCR in Python Tutorials 02.02)
Idea:
- convert each PDF page to an image
- preprocess each image with OpenCV to improve OCR accuracy
- extract text from each image using pytesseract
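The three steps above can be sketched roughly as follows. This is a minimal sketch, not the method from any of the referenced tutorials. It assumes `pdf2image` (with Poppler installed) for the PDF-to-image step, which the note does not name, and it picks grayscale plus Otsu thresholding as one plausible preprocessing choice. The helper names (`pdf_to_images`, `preprocess`, `image_to_text`, `join_pages`, `ocr_pdf`) and the page-marker format are hypothetical.

```python
from typing import Iterable


def pdf_to_images(pdf_path: str, dpi: int = 300) -> list:
    """Step 1: render each PDF page to a PIL image.

    Assumption: uses pdf2image, which needs Poppler on the system.
    """
    from pdf2image import convert_from_path  # third-party, imported lazily
    return convert_from_path(pdf_path, dpi=dpi)


def preprocess(pil_image):
    """Step 2: grayscale + Otsu threshold (one possible preprocessing choice)."""
    import cv2  # third-party, imported lazily
    import numpy as np

    rgb = np.array(pil_image.convert("RGB"))
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    # With THRESH_OTSU the threshold value (0 here) is chosen automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def image_to_text(image, lang: str = "eng") -> str:
    """Step 3: run Tesseract on one preprocessed image."""
    import pytesseract  # third-party; needs the Tesseract binary installed
    return pytesseract.image_to_string(image, lang=lang)


def join_pages(page_texts: Iterable[str]) -> str:
    """Join per-page OCR output, labelling each page (hypothetical format)."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Run the whole pipeline: PDF -> images -> preprocessing -> text."""
    pages = pdf_to_images(pdf_path, dpi=dpi)
    return join_pages(image_to_text(preprocess(page), lang=lang) for page in pages)
```

Calling `ocr_pdf("scan.pdf")` (a placeholder file name) would return the text of every page in order. The best preprocessing depends on the scan, so the thresholding step is likely the part to tune or replace first.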