Forums

Using multiprocessing.Process in webapp

Hello world! I'm trying to help a friend with a problem that I couldn't figure out. If the following code is run locally, Flask returns the response immediately and a few seconds later "Task done" appears in the logs. This makes sense for, say, a quick file IO or database action that doesn't change the Flask response to the user: we don't want to keep the user waiting. However, on PythonAnywhere this example code blocks, which defeats the purpose of the background task. The local test also used only 1 worker and no threading. Any ideas? Is this a bug or an intended limitation of PythonAnywhere? Thanks in advance!

from flask import Flask
from multiprocessing import Process
from time import sleep

app = Flask(__name__)

def task():
    # Write some data, takes a few seconds
    sleep(5)
    print("Task done")

@app.route("/")
def hello():
    # A quick task that we don't need to wait for
    Process(target=task).start()
    # The response can be sent right away
    return "Hello World"

See https://help.pythonanywhere.com/pages/AsyncInWebApps/

Thank you very much for your reply. However, I'm sorry to say that I fail to see how this resolves the question.

A rule of thumb might be: if your work is taking more than 30 seconds, it's worth thinking about a task queue.
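
For what it's worth, the pattern the linked help page describes doesn't need much machinery: the view only records that there is work to do, and a separately scheduled script does it. A rough sketch, assuming a hypothetical SQLite file and table (neither is prescribed by the help page):

# Sketch of a minimal database-backed queue. The view calls enqueue() and
# returns immediately; process_pending() runs from a scheduled task.
import sqlite3

DB = "tasks.db"  # hypothetical location

def enqueue(payload):
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks (payload TEXT, done INTEGER DEFAULT 0)"
        )
        conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))

def process_pending():
    with sqlite3.connect(DB) as conn:
        rows = conn.execute(
            "SELECT rowid, payload FROM tasks WHERE done = 0"
        ).fetchall()
        for rowid, payload in rows:
            print("processing", payload)  # the slow work would go here
            conn.execute("UPDATE tasks SET done = 1 WHERE rowid = ?", (rowid,))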

I agree and this is not the case here. A task queue would be too much overhead since the task will take way less than 30 seconds - but I still don't want the Flask response to wait for that long. Your linked article agrees with this, too:

Web apps are supposed to respond quickly to browser requests. The request/response cycle, at least in our model, is meant to be fast, a matter of a few seconds, or better yet a fraction of a second.

Furthermore, and maybe most importantly, I couldn't find an answer to why PythonAnywhere behaves differently from our local server. This is still confusing to me. Maybe I'm reading it wrong? At the end, under "FAQ", the article even explicitly says that multiprocessing is theoretically possible, albeit not recommended.

The difference on PythonAnywhere is that, when a worker is busy - whether because of a subprocess it's running or because it's handling a request - it is not available for anything else. It is neither a bug nor intended behaviour - it's just a product of the environment.

Thank you very much for replying.

If I understand you correctly, you're saying this particular webapp on PythonAnywhere runs with only 1 worker and that worker is therefore either busy serving the Flask response or busy with the subprocess and can't do both. This makes sense and it's exactly what I would expect, too.

However, that's not the case in the example above: the worker cannot possibly be busy with the subprocess, because task only contains a call to time.sleep. Maybe this detail went unnoticed so far? There is nothing for the worker to do inside task, so I expected it to finish the Flask response immediately instead. And that's exactly what happens on our local server, which is configured to use only 1 worker as well.
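
Just to make sure my expectation isn't off, here is a standalone check outside Flask (my own sketch, not from the webapp):

from multiprocessing import Process
from time import sleep, time

def task():
    sleep(5)
    print("child: task done")

if __name__ == "__main__":
    t0 = time()
    Process(target=task).start()
    # start() returns right away; the child does the sleeping
    print(f"parent: start() returned after {time() - t0:.3f} seconds")
    sleep(6)  # keep the parent alive so the child's output is visible

Run from the command line, the parent prints well before the child does.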

I hope this does a good job at explaining what my confusion is about and I didn't simply misunderstand your reply.

It still has to be running the code that is sleeping, so it's busy.

Wow, interesting! First of all, thanks for getting back to me. So you're saying the worker is "busy" with sleeping? This obviously explains the observed behavior.

However, doesn't that mean that a PythonAnywhere worker doesn't really support multiprocessing? Simply being able to spawn new processes isn't really multiprocessing if I have no way to switch between them and take advantage of a sleep (or slow file IO or database interaction in the actual app) by continuing with the original one, is it? So it seems like processes on PythonAnywhere always strictly run in series, instead of alternating when it would be advantageous to do so? (In contrast to parallel execution which is obviously impossible with only 1 worker.)

I'm curious about the background for this design decision. It's not what I would have expected from a worker for webapps. I'm not aware of any of the established Python WSGI servers designing their workers in this way. Is this to prevent abuse of some kind? In your linked article it also says that threading is disabled. Is that a related decision you made?

Okay, I just did a quick test and it seems like the response does return. So I have to retract my previous comment partially: it's "real" multiprocessing indeed!

However, the response is still not being relayed to the client until the subprocess finishes. Do you hold back the response in your outgoing proxy (or load balancer) until the worker is "back in idle"?
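
(For reference, the client-side delay is easy to reproduce with something like the following; the URL is a placeholder:)

from time import time
from urllib.request import urlopen

t0 = time()
body = urlopen("https://example.pythonanywhere.com/").read()
# With the multiprocessing example deployed, the body only arrives once the
# subprocess has finished, roughly 5 seconds later.
print(body, f"received after {time() - t0:.1f} seconds")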

It would appear that that is what is happening - probably because the server cannot tell that the response is complete before the worker is actually done running the code.

I've been investigating a little further: Flask offers a decorator, flask.Response.call_on_close, which can be used to run code after the response has been sent. (The WSGI close() call is what releases the response object, so by the time it runs the response must already have been complete and ready to send.) Here is a minimal example:

from flask import Flask
from time import sleep

app = Flask(__name__)

@app.after_request
def f(response):
    # Register g to run when the WSGI server calls close() on the response
    @response.call_on_close
    def g():
        print("task")
        sleep(5)
        print("done")
    return response

@app.route("/")
def hello():
    return "Hello World"

Unfortunately, it shows exactly the same behavior as the multiprocessing example above. In this regard, your WSGI server behaves differently from established WSGI servers:

  • uWSGI

    uwsgi --workers 1 --http 127.0.0.1:8080 --master --module test:app
    
  • Gunicorn

    python3 -m gunicorn --workers 1 test:app
    
  • Waitress

    python3 -m waitress --threads=1 test:app
    

I tried those three locally and all of them send the response first, then finish task. Surely, when close() is being called, the server should be aware that the response is complete. Why doesn't it get sent to the client at this point?
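
For reference, close() is part of the WSGI interface itself (PEP 3333): if the response iterable has a close() method, the server must call it once it has finished iterating over the body. A bare-WSGI sketch of the same experiment (my own, not from Flask):

from time import sleep

class Body:
    # Response iterable: the server sends whatever it yields, then calls close()
    def __iter__(self):
        yield b"Hello World"

    def close(self):
        # Per PEP 3333, invoked by the server once it is done iterating
        print("task")
        sleep(5)
        print("done")

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return Body()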

We use uwsgi. I'm not sure which of the settings would cause this behaviour, though.