
Converting Word documents to PDF not working

Hi! I run this command to convert doc and docx files to PDF:

/usr/bin/soffice --headless --convert-to pdf:writer_pdf_Export /home/facturaspfis/prueba.docx --outdir /home/facturaspfis

This works fine on my bash console (after manually adding Liberation fonts). But when the same command is run from the app, like this:

/usr/bin/soffice --headless --convert-to pdf:writer_pdf_Export /home/facturaspfis/facturacionpfis/facturacionpfis/static/uploads/fcc17166e1b302c38f41f805fb029f76617cf14a.docx --outdir /home/facturaspfis/facturacionpfis/facturacionpfis/static/uploads

I get the following error:

2023-06-09 11:45:00 LibreOffice 6.4.7.2 40(Build:2)

Error in option: --convert-to pdf:writer_pdf_Export

Usage: soffice [argument...]
       argument - switches, switch parameters and document URIs (filenames).

Using without special arguments:
Opens the start center, if it is used without any arguments.
   {file}              Tries to open the file (files) in the components
                       suitable for them.
   {file} {macro:///Library.Module.MacroName}
                       Opens the file and runs specified macros from
                       the file.

Getting help and information:
   --help | -h | -?    Shows this help and quits.
   --helpwriter        Op

(Sorry, it appears to be broken so I just copied and pasted it)

Any ideas what I could be missing?

Thanks in advance!

Did you try the first command (which works) in the app? Maybe it's an issue with the paths/files not present?

Both the paths and files are present. The input file in the app command is from a file.save done in the same function prior to the conversion and the output dir is the same os.path.join without the filename part.

What code are you using to run that in the web app?

sfilename = secure_filename(file.filename) 
random_hex = secrets.token_hex(20)
_, f_ext = os.path.splitext(sfilename)
filename = random_hex + f_ext
file.save(os.path.join(current_app.root_path, 'static/uploads', filename))
sp = subprocess.run(['/usr/bin/soffice', 
'-env:UserInstallation=file:///tmp/LibreOffice_Conversion_${USER}', 
'--headless',
'--convert-to pdf:writer_pdf_Export', 
os.path.join(current_app.root_path, "static/uploads", filename), 
'--outdir', 
os.path.join(current_app.root_path, "static/uploads")], 
capture_output=True, 
text=True)

I would expect that --convert-to line to be split into two, like this:

'--convert-to',
'pdf:writer_pdf_Export',
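For anyone landing here later: when subprocess.run is given a list, each item is passed to the program as exactly one argument, with no shell-style splitting. So '--convert-to pdf:writer_pdf_Export' reaches soffice as a single option containing a space, which is what the "Error in option" message is complaining about. Here is a minimal sketch of building the list; the helper name build_convert_command and the /tmp paths are made up for illustration:

```python
import shlex


def build_convert_command(input_path, out_dir, soffice="/usr/bin/soffice"):
    """Return the argv list for a headless docx -> PDF conversion.

    Hypothetical helper: every option and every option value is its own
    list item, because subprocess.run() does not split items on spaces
    when it is given a list.
    """
    return [
        soffice,
        "--headless",
        "--convert-to",
        "pdf:writer_pdf_Export",
        input_path,
        "--outdir",
        out_dir,
    ]


cmd = build_convert_command("/tmp/example.docx", "/tmp")
# shlex.join renders the list the way it would be typed in a shell,
# which makes it easy to compare with the command that works in bash.
print(shlex.join(cmd))
```

The resulting list can be passed straight to subprocess.run(cmd, capture_output=True, text=True).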

That was it! Such a stupid mistake! Thanks! I'm now dealing with some character encoding issues, but the command works fine!

Now that the command is working, I'm facing another issue.

Given some simple docx file (for example: this one I found online), if I run the command as in my first post, I get the expected PDF (correct_output.pdf).

But when I run the conversion from the app (with the fixed command), I get weird_output.pdf.

I think I fixed this issue by manually installing Liberation fonts in my account (in ~/.fonts and doing fc-cache -f -v), but why doesn't that get picked up from the app?

I'd appreciate any input.

Cheers

EDIT (to add more info):

Running fc-list : family style spacing from my bash console I can see the Liberation fonts installed. Running the same command from the app lists significantly fewer fonts (and no Liberation). Reloading the app didn't do the trick. Is there some import or change I need to include in my wsgi.py to let the app "see" the newly added fonts?
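One way to make that comparison repeatable is to run the check from inside the app process and log what it sees. This is a rough sketch, assuming fc-list is on the PATH of the process that runs it; the helper names parse_families and visible_families are made up for illustration:

```python
import shutil
import subprocess


def parse_families(fc_list_output):
    """Collect family names from `fc-list : family` output.

    One output line may hold several comma-separated names for one font.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def visible_families():
    """Run fc-list as the current process and return the families it reports."""
    if shutil.which("fc-list") is None:
        return set()  # fontconfig tools are not available in this environment
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=False
    )
    return parse_families(result.stdout)


families = visible_families()
print(len(families), "font families visible")
print("Liberation present:", any(f.startswith("Liberation") for f in families))
```

Running it once from the console and once from a view in the app shows which families each environment can see.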

Update: it is NOT an issue with fonts or LibreOffice. For some reason, Flask 2.3.1 corrupts the uploaded docx file when doing file.save. If I use python-docx and replace that line with:

document = Document(file)
document.save(os.path.join(current_app.root_path, 'static', 'uploads', filename))

the file gets saved without data corruption and can be converted to PDF just fine like in my bash console.
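A quick way to tell whether a saved upload is damaged before handing it to soffice: a .docx file is a ZIP archive, so the standard zipfile module can check it. This is only a sketch of a sanity check, not a full validation of the document; the function name looks_like_valid_docx is made up for illustration:

```python
import zipfile


def looks_like_valid_docx(path):
    """Return True if path is a readable ZIP with the package content-types part.

    Hypothetical helper: it only checks the container, not the Word content.
    """
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first member with a bad
            # CRC or header, or None if every member reads cleanly.
            if archive.testzip() is not None:
                return False
            # Office Open XML packages carry this part at the archive root.
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

Calling it on the path right after the save step, and before the conversion, shows whether the file was already broken at that point.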

Thanks for sharing the solution!