Hi,
I am developing a website that takes PDF files and extracts 8-digit numbers from them (Website link + GitHub repo).
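For context, the number extraction step can be sketched roughly as below. This is only an illustration, not the code from the repo: the function name `find_eight_digit_numbers` is hypothetical, and it assumes that "8-digit number" means a run of exactly eight digits that is not part of a longer digit sequence.

```python
import re

# Exactly eight digits, not preceded or followed by another digit
# (assumption: longer digit runs should not yield partial matches).
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")


def find_eight_digit_numbers(text: str) -> list:
    """Return all standalone 8-digit numbers found in the text, in order."""
    return EIGHT_DIGITS.findall(text)


if __name__ == "__main__":
    sample = "Order 12345678, ref 123456789 and 87654321."
    print(find_eight_digit_numbers(sample))
```

The lookbehind and lookahead keep a 9-digit value such as 123456789 from producing a false 8-digit match.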
However, when I try to process the PDF file AbenteuerGarten2021.pdf with the textract package on pythonanywhere, I get a UnicodeDecodeError.
Strangely, when I run the exact same Flask app on the development server with the exact same PDF file, I don't get an error and everything works perfectly.
I am using Python 3.8.2 in my local environment (macOS Big Sur 11.4) and Python 3.8.0 on pythonanywhere. Both environments use the following:
- Flask 2.0.1
- Werkzeug 2.0.1
- Textract 1.6.3
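Since the package versions match, one thing that may still differ between the two machines is the encoding configuration. The snippet below is a small diagnostic sketch (the function name `encoding_report` is made up for illustration) that can be run in both environments to compare them. Whether an encoding difference is actually the cause here is an assumption, not something confirmed.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encoding-related settings of the current interpreter."""
    return {
        "python": sys.version.split()[0],
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # False: report the configured locale encoding without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

If the two reports differ (for example in `preferred_encoding`), that would be a reasonable place to start looking.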
Can anyone please help me? I don't understand where the problem is coming from.
I managed to fix the problem by passing method="pdfminer" to the textract.process call. I have no idea why it works, but it does. I am posting this for anyone who has a similar problem.
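A minimal sketch of that workaround is shown below. The helper names `extract_pdf_text` and `decode_output` are hypothetical, and the code assumes that the bytes returned by `textract.process` are UTF-8 encoded. Decoding with `errors="replace"` is an extra precaution added here, not part of the original fix.

```python
def decode_output(raw: bytes) -> str:
    """Decode extracted bytes as UTF-8, replacing undecodable bytes instead of raising."""
    return raw.decode("utf-8", errors="replace")


def extract_pdf_text(path: str, method: str = "pdfminer") -> str:
    """Extract text from a PDF with textract, using the pdfminer method by default."""
    # Imported here so the decoding helper can be used without textract installed.
    import textract

    raw = textract.process(path, method=method)  # returns bytes
    return decode_output(raw)


if __name__ == "__main__":
    import sys

    # Pass the path of the PDF to process as the first command-line argument.
    if len(sys.argv) > 1:
        print(extract_pdf_text(sys.argv[1]))
```

For PDF files, textract uses the external pdftotext tool unless another method is given, so switching to pdfminer changes which extractor reads the file.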