File uploads blank

Text files, written on my computer in gedit, upload fine. Files saved from a colleague's machine, written in Windows and saved as '.txt' simply upload as blank. As in, file appears in the right directory, but it's empty.

I've disabled any messing with the file, so the only code that affects it looks like this:

    :::python
    if request.method == 'POST':

        form = BookForm(request.POST, request.FILES)
        if form.is_valid():

            newbook = Book()
            newbook.owner = request.user
            newbook.title = request.POST['title']
            newbook.synopsis = request.POST['synopsis']
            newbook.save()

            newdoc = BookFile()
            newdoc.document = request.FILES['docfile']
            newdoc.book = newbook
            newdoc.save()

I can't see anything relevant in settings.py. Not sure why this is happening.

Okay, they're now reporting that they can save out from Word and see the result, as long as there isn't an apostrophe ' in the text. But the file comes in blank before anything tries to read it. This is madness. THIS IS PYTHON

Sorry about that.

Actually: THIS IS PROGRAMMING. Or the tedious/wonderful/rewarding task of finding where something is going wrong...

Rather than saving the incoming request.FILES and then inspecting the result, try directly printing a repr of the incoming data to a log file, or just dump it to disk. If the data is arriving but cannot be saved properly due to some inherent weirdness in the stream, then you will have to work out how to massage it into something that can be saved successfully. Writing unit tests can help shorten the test/develop cycle in a situation like this.
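Something along these lines might do for the "print a repr" step. It's only a sketch in Python 3: `describe_upload` and the `upload_debug` logger name are made up for illustration, and it assumes the upload behaves like a binary file with `read()` and `seek()` (which Django's uploaded files generally do). A `BytesIO` stands in for `request.FILES['docfile']` here.

```python
import io
import logging

logger = logging.getLogger("upload_debug")  # hypothetical logger name


def describe_upload(uploaded, preview_bytes=64):
    """Log and return the size and the repr of the first bytes of an upload.

    The read position is rewound afterwards so the file can still be saved.
    """
    uploaded.seek(0)
    data = uploaded.read()
    uploaded.seek(0)
    summary = "size=%d head=%r" % (len(data), data[:preview_bytes])
    logger.warning(summary)
    return summary


# Stand-in for the real upload: text containing a Windows-style curly apostrophe.
fake_upload = io.BytesIO(b"It\x92s a test\r\n")
print(describe_upload(fake_upload))
```

If the size is already 0 at this point, the problem is upstream of the model save; if the bytes are there, the repr should show exactly which byte values the apostrophe arrives as.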

Perhaps the text files saved under Windows are in Unicode and the apostrophe they're adding is actually a non-ASCII one, such as the curly apostrophe (U+2019) that Word likes to substitute? If the Python code is receiving a unicode object and something somewhere is attempting to save it out as ASCII then that would cause problems, although I'd expect an exception to be generated - unless some extremely naughty code somewhere is catching the exception and failing to log it.
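To illustrate the theory (and it is only a theory about what's happening here), this Python 3 sketch shows a curly apostrophe refusing to encode as ASCII, and how a swallowed exception could turn that into an empty result. `save_swallowing_errors` is a hypothetical example of the "naughty code" pattern, not anything known to be in Django or in the code above.

```python
# U+2019 is the curly apostrophe that word processors often substitute for '.
text = "It\u2019s madness"

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ASCII encode failed:", exc)


def save_swallowing_errors(value):
    """Hypothetical bad pattern: hide the encoding error and return nothing."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        return b""  # the caller ends up writing an empty file, with no log entry


print(save_swallowing_errors("It's fine"))  # plain ASCII apostrophe survives
print(save_swallowing_errors(text))         # curly apostrophe yields empty bytes
```

That would match the symptom of files being fine until an apostrophe appears, but only inspecting the raw bytes will confirm it.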

Also, it's possible Windows is adding a BOM or something to the start of the file which may well trigger some unicode detection in Python. This post has some useful details - the rest of the post is worth reading as well if you haven't played with unicode in Python before.
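A quick way to check the BOM idea is to look at the first few bytes of the upload. This is a rough Python 3 sketch using the BOM constants from the standard `codecs` module; `sniff_bom` is a made-up helper name, and it deliberately ignores UTF-32 (whose little-endian BOM begins with the same two bytes as UTF-16 LE).

```python
import codecs

# (BOM bytes, codec name that understands and strips that BOM)
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return codec_name
    return None


# Example: bytes of the kind a Windows editor might write for a "Unicode" save.
sample = codecs.BOM_UTF16_LE + "It\u2019s a test".encode("utf-16-le")
codec_name = sniff_bom(sample)
print(codec_name, repr(sample.decode(codec_name)))
```

If `sniff_bom` returns `None`, the file has no BOM and the encoding has to be guessed or agreed some other way.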

That's right, @cartroo, there are a number of ways that something that claims to be a text file may not be useable without extra out-of-band information. Windows (and particularly Office-type software) can make assumptions that other software has no way of understanding.

@evilkillerfiggin, when you save from Word, I think there is an option for saving as Unicode or UTF-8. Another possibility is to convert the uploaded file from the Windows encoding (Windows-1252, which is close to but not the same as ISO 8859-1, if I remember correctly) to UTF-8 before you put it into the database.
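The conversion could look roughly like this in Python 3. It assumes the colleague's files are either already UTF-8 or in Windows-1252 (Python's `cp1252` codec), the usual Western European Windows code page; that is a guess, so check the raw bytes first. `to_utf8` is just an illustrative name.

```python
def to_utf8(raw, fallback="cp1252"):
    """Return raw bytes re-encoded as UTF-8.

    Tries UTF-8 first; if that fails, assumes the fallback Windows encoding.
    A few byte values are undefined in cp1252, so those are replaced rather
    than raising.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


# 0x92 is the curly apostrophe in Windows-1252 and is not valid UTF-8 on its own.
windows_bytes = b"It\x92s a test"
print(repr(to_utf8(windows_bytes)))
```

Trying UTF-8 first is a deliberate choice: valid UTF-8 is rarely produced by accident, whereas almost any byte string decodes "successfully" as a single-byte Windows encoding.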

Have poked about with little success: eventually I want to accept .doc files without asking users to save as plaintext, so it doesn't seem worth spending much time fixing this bug when I'm only going to replace it anyway.

But thanks for everyone's help, and I now know a little more about file encoding.