If the scheduler is using something like subprocess
then it can be a bit fiddly to deal with output from the process in a safe fashion (i.e. one which doesn't risk blocking indefinitely and doesn't deadlock on lots of output). I've always thought it was disappointing that Python doesn't provide a version of communicate()
which doesn't buffer everything in memory - for example, you could specify an optional size limit and it would return early, allowing the application to handle the buffered output and then call it again.
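Lacking that, one workaround is to read the pipe yourself in bounded chunks rather than calling communicate() at all. A rough sketch (the helper name and the chunk size are my own, not anything from the standard library):

```python
import subprocess

# Read a child's stdout in bounded chunks instead of buffering it all,
# roughly the behaviour a size-limited communicate() might have.
# read1(limit) returns at most `limit` bytes per call, blocking only
# until *some* data (or EOF) is available, so memory use stays bounded
# and draining the pipe as we go avoids the classic pipe-full deadlock.
def iter_output(cmd, limit=65536):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read1(limit)
            if not chunk:  # EOF: child closed its stdout
                break
            yield chunk
    finally:
        proc.stdout.close()
        proc.wait()
```

This only watches a single pipe, of course; once stderr is in the picture you're back to needing select/poll or threads.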
I note that the latest version at least has a timeout, but no size limit. Why oh why oh why do the core Python devs not add timeouts to every single operation in the standard library that might block when it's first written? Time and again we've had to wait for later releases to add timeout parameters to various functions. It's IO API design 101! You always provide a timeout, a zero timeout should always mean "return current status, do not block at all" and a timeout of None
means "block indefinitely". It's a shame, because by and large the Python standard library is pretty good, but networking and asynchronous IO seem to be real blind spots among the core team. Sorry, rant over.
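To illustrate the convention I mean, the selectors module is one place the standard library does get it right - timeout=0 polls and returns immediately, timeout=None blocks until something is ready (the socketpair here is just a convenient self-contained file descriptor to watch):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ)

# Zero timeout: pure poll, returns immediately with whatever is ready
# right now - here nothing, since we haven't sent anything yet.
print(sel.select(timeout=0))  # []

b.send(b"ping")

# None: block indefinitely until at least one registered object is
# ready. Data is already waiting, so this returns straight away.
ready = sel.select(timeout=None)
print(len(ready))  # 1
```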
In any case, if it's any help I wrote a ProcPoller
class which attempts to deal with multiple subprocess
invocations in a non-blocking fashion. You create an instance of it, call a method to invoke as many subprocess commands as you wish and then call poll()
to watch them all for output. By default it buffers output indefinitely (not very safe), but the expectation is that you'll derive from the class and handle the output in a more sensible fashion. The poll()
method has a timeout so you can perform other background tasks, like terminating jobs after a certain amount of time, without requiring the hassle of threads.
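I don't have the code to hand, but a minimal sketch of that shape - the class name aside, every detail below is my guess at an implementation, not the original - might look like this (POSIX only, since selectors can't watch pipes on Windows):

```python
import selectors
import subprocess

class ProcPollerSketch:
    """Watch several subprocesses for output without threads."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.procs = {}
        self.output = {}

    def run(self, name, cmd):
        # Start one command and register its stdout pipe for reading.
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        self.procs[name] = proc
        self.output[name] = b""
        self.selector.register(proc.stdout, selectors.EVENT_READ, name)

    def handle_output(self, name, chunk):
        # Default: buffer indefinitely; subclasses override this to
        # stream the output somewhere safer.
        self.output[name] += chunk

    def poll(self, timeout=None):
        # Wait up to `timeout` seconds for output from any process,
        # dispatching each chunk to handle_output(). Returns True
        # while any process still has its pipe open.
        if not self.procs:
            return False
        for key, _ in self.selector.select(timeout):
            chunk = key.fileobj.read1(4096)
            if chunk:
                self.handle_output(key.data, chunk)
            else:  # EOF: that child closed its stdout
                self.selector.unregister(key.fileobj)
                key.fileobj.close()
                self.procs.pop(key.data).wait()
        return bool(self.procs)
```

The calling loop then looks like `while poller.poll(timeout=1.0): do_background_work()`, which is where the job-termination logic would go.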