Forums

File uploading/editing in Django

I have this snippet in one of the functions in views.py:

    # Handle file upload
if request.method == 'POST':
    form = BookForm(request.POST, request.FILES)
    if form.is_valid():
        newdoc = Book()
        request.FILES['docfile'].open('w+')
        text = request.FILES['docfile'].read()
        text = text.replace('\n', '<br>')
        request.FILES['docfile'].seek(0)
        request.FILES['docfile'].write(text)
        newdoc.document = request.FILES['docfile']

        newdoc.owner = request.user
        newdoc.title = request.POST['title']
        newdoc.save()

This works on my computer, allowing me to upload a text file, replace all the newlines with html breaks, and save it.

It works on PythonAnywhere without any complaints at all. But for some reason it forgets to do the string replace. But the file saves fine. I guess I must not have write access to the tmp uploaded file or something.

Now, I have no clear understanding of what I'm doing and I'm sure there's a cleaner, better practise way to handle files - so I welcome any advice or better examples in that area.

But most importantly, how can I get this working in a hurry?

I'm not Django expert, but judging from a skim through this page it looks like your approach of modifying the file is really not particularly safe - if the file is small and it's in-memory, Django's implementation might feasibly provide a file-like object which only supports read operations. Even if this works in some places, there's a risk that it's not portable (as you've probably just discovered!).

Also, you're reading the whole file into memory, which is definitely not a safe thing to do - if the file ends up being large, you'll cause yourself all sorts of problems. Even if you don't expect the file to be large you really should handle it, because somebody will upload an invalid file at some point (either accidentally or maliciously). You should handle the file in chunks via the chunks() method.

The safer approach is probably something like:

import tempfile
# ...
if request.method == 'POST':
    form = BookForm(request.POST, request.FILES)
    if form.is_valid():
        newdoc = Book()
        with tempfile.TemporaryFile() as tmp:
            for chunk in request.FILES['docfile'].chunks():
                tmp.write(chunk.replace('\n', '<br>'))
            tmp.seek(0)
            newdoc.document = tmp
            newdoc.owner = request.user
            newdoc.title = request.POST['title']
            newdoc.save()

However, if you can change Book so the API accepts an iterator of file data instead of an actual file-like object then you can easily just write a filter like this:

def replace_newline(in_data):
    for item in in_data:
        yield item.replace("\n", "<br>")

And then you can use it like this:

if request.method == 'POST':
    form = BookForm(request.POST, request.FILES)
    if form.is_valid():
        newdoc = Book()
        newdoc.document_iter = replace_newline(request.FILES['docfile'].chunks())
        newdoc.owner = request.user
        newdoc.title = request.POST['title']
        newdoc.save()

This will be significantly faster because it only processes the file data once - as the Book code requests chunks they're converted on-demand from the source file.

Of course, all this assumes that Book.save() makes its own copy of all the data before it exits - if not, you've got bigger problems because once the request is ended you really can't rely on the file still existing.

Thanks for your help, Cartroo. I went with:

        newdoc = Book()
        newdoc.document = request.FILES['docfile']

        newdoc.save()

        #save file as html

        text = ''
        for chunk in newdoc.document.chunks():
            text += chunk.replace('\n','<br>')
        f = open( newdoc.document.path, 'r+' )
        f.seek(0)
        f.write(text)
        f.close

because I wanted the file to end up being saved in the same location as I'd specified in my models.py, and I wasn't sure how else to do it. I'm assuming this method is safe - although I don't know where to specify the max uploaded file size.

EDIT: My previous reply was done in a bit of a rush as I had a busy few days - I've updated it having had a little closer look at the code.

I don't think that approach will necessarily work, because it depends on Django writing its uploaded files in a way which is persistent. For example, on Unix systems it's entirely possible to create a temporary file by opening it and then immediately deleting it, but holding the filehandle open - this keeps the data on disk while the handle is open, but as soon as it's closed then the data becomes inaccessible.

Now it might be that Django writes its files in a way that this works, but personally I think it's a bad idea to rely on that remaining the case even if it's true now. The problem is I'm not sure what the behaviour of your Book class is with the file you give it - if it takes its own copy into a new file then you might be alright.

Also, you mentioned the maximum file size - the webserver (nginx currently) will impose its own limit on uploaded file size. This limit was recently raised to 100MB, which is probably large enough for most normal purposes. However, you can easily impose your own limit should you want to - I would suggest this as long as it's safe (i.e. you're confident you shouldn't get large files).

ddd