Forums

Apscheduler - scheduler keeps getting shut down

Hi,

I have an APScheduler instance in my backend Flask code that performs a task every 5 minutes. Locally it works fine. However, when deploying to PythonAnywhere I frequently see the message 'Scheduler has been shut down' in my error logs. Is this kind of frequently running task allowed on this platform? If not, what other solutions would you recommend?

Also, I took a look at the web tasks page, but it seems you can only run tasks hourly/daily that way, which unfortunately does not fit my requirements.

See http://help.pythonanywhere.com/pages/LongRunningTasks/

Thanks Glenn,

The article mentions tasks that run once an hour or once a day. I need to run tasks every 5 minutes. Would using the 'scheduled tasks' code allow this?

I thought APScheduler would do that for you. There is no way to do that natively in PythonAnywhere.

Hi Glenn,

I have split my scheduling logic off from my main .py file into another .py file and scheduled that as a long-running task on the 'Tasks' page. However, upon doing that I received the following error: "ImportError: No module named apscheduler.schedulers.background". I have installed APScheduler via 'pip install apscheduler' in my venv, but I still can't see why this error would occur. What would you recommend?

See http://help.pythonanywhere.com/pages/VirtualEnvInScheduledTasks/
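The short version of that page: point the scheduled task at the virtualenv's own Python interpreter rather than the system one, so the venv's installed packages are visible. The paths below are hypothetical examples, not your actual ones:

```shell
# Hypothetical example paths; substitute your own username, venv name and script.
/home/yourusername/.virtualenvs/myenv/bin/python /home/yourusername/scheduler_task.py
```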

You can run APScheduler within Flask. Remember there is no threading in PAW, and APScheduler uses threads by default.

1) Your scheduler must be a blocking scheduler, otherwise it will use threads:

from apscheduler.schedulers.blocking import BlockingScheduler
from flask_apscheduler import APScheduler

apscheduler = APScheduler(BlockingScheduler())

2) Kick off your scheduler when starting Flask by kicking off a process:

import multiprocessing

scheduler_process = multiprocessing.Process(target=start_scheduler_internal, args=(this_app,))
scheduler_process.name = "APSchedulerStarter"
scheduler_process.start()

3) Change your executors from threadpool to processpool in your scheduler config:

SCHEDULER_EXECUTORS = {
    'default': {'type': 'processpool', 'max_workers': 2}
}

That's an interesting workaround! I'd suggest using the scheduler, though -- it's likely to be more reliable. Alternatively, the best long-term solution would be to use an always-on task, which is something we're currently beta-testing for paying customers. Essentially, it's a new thing on the "Tasks" page where you can specify a script. We'll start running the script, and if it exits for any reason -- for example, if it crashes or if we have to reboot the machine it's running on for a security patch -- it'll be automatically started up again.

Yes, my understanding is that always-on tasks are not available right now; if they are, I'd like to give them a shot.

I have work queues that need to be checked from time to time; the queue gets updated and needs to be processed every quarter-hour. Even doing that with the current scheduler is clumsy, because it only has hourly granularity.

You seem to have reservations about this approach -- is it because it falls outside of the measured "bins" for billing purposes? I'm not sure why it would have much impact. Whether you are doing work through a web request from a browser or from a task, it's the same work. But you are using "web workers" for web apps, not CPU?

I need to understand, because APScheduler is a pretty solid package that's been around for a while, and there's even a Flask plugin especially for it.

I've enabled always-on tasks for your account and emailed you the details. It does sound like they might work well for your use case.

My main worry with spinning off a process from the web app code isn't really to do with the billing stuff -- it's just that our process management systems on web servers expect most processes to be WSGI processes they can control easily, with potentially the occasional process being spun off to do some work during a request. So there could well be unexpected behaviour if those systems see a process that doesn't fit into those categories -- it might get misclassified as a rogue WSGI process and get killed, or something like that. I'd also worry that if (for example) we had to move your site to another server due to hardware issues, it wouldn't be started again until it received its first hit -- so for a low-traffic site, the scheduler could be down for some time.

BTW you can schedule stuff to run every quarter-hour in the normal scheduler, though it's not quite intuitive -- just schedule the same script to run at (say) 2 past the hour, 17 past the hour, 32 past, and 47 past. (I suggest not using 0/15/30/45 because lots of people schedule tasks for "round" times, so things are likely to run slower.)
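A small illustration of that staggering (a sketch; quarter_hour_slots is a hypothetical helper, and offset 2 reproduces the times suggested above):

```python
# Compute four evenly spaced quarter-hour slots, shifted off the "round" times.
def quarter_hour_slots(offset):
    return sorted((offset + 15 * i) % 60 for i in range(4))

print(quarter_hour_slots(2))  # -> [2, 17, 32, 47]
```

Any offset from 1 to 14 works; the point is just to avoid 0/15/30/45, where the scheduling servers are busiest.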

Ok sounds good, thanks!

Couldn't get APScheduler to work. I ended up creating separate scripts to do the work and then specifying tasks pointing to the same file(s), spaced out at different time intervals as suggested above. For what I need done, this suffices as a solution.

Thanks for the help.

OK -- glad you got something working :-)