Forums

Process pid not found?

Not sure where I'm going wrong here (using Python 3.3):

    import os
    import psutil

    pid = os.getpid()
    p = psutil.Process(pid)

Works OK on WinXP, but fails with 'no process found with pid 1351' (an example value) on PA.

Ta, Jim

    Traceback (most recent call last):
      File "/usr/local/lib/python3.3/dist-packages/psutil/_pslinux.py", line 430, in wrapper
        return fun(self, *args, **kwargs)
      File "/usr/local/lib/python3.3/dist-packages/psutil/_pslinux.py", line 560, in get_process_create_time
        f = open("/proc/%s/stat" % self.pid)
    FileNotFoundError: [Errno 2] No such file or directory: '/proc/1509/stat'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.3/dist-packages/psutil/__init__.py", line 158, in __init__
        self.create_time
      File "/usr/local/lib/python3.3/dist-packages/psutil/_common.py", line 80, in __get__
        ret = self.func(instance)
      File "/usr/local/lib/python3.3/dist-packages/psutil/__init__.py", line 378, in create_time
        return self._platform_impl.get_process_create_time()
      File "/usr/local/lib/python3.3/dist-packages/psutil/_pslinux.py", line 437, in wrapper
        raise NoSuchProcess(self.pid, self._process_name)
    psutil._error.NoSuchProcess: process no longer exists (pid=1509)

I don't think that's going to work on PythonAnywhere, at least as the system is right now.

Your processes can actually be running on any one of quite a large cluster of machines, so normal Linux process management doesn't work in almost any normal case. Of course, in your specific case, because you're asking about the current process, it's guaranteed to be running on the same machine as itself, but we've avoided adding support for this kind of process inspection because we think it would be a bad idea to provide something that sometimes works but usually doesn't.

Perhaps there's some other way to get the information you need -- what are you trying to get the psutil.Process object for?

Many thanks Giles.

I'm pretty sure there's an easier way to do what I want anyway - I just want to check in a Scheduled job whether or not a program is already running (and restart it if it isn't).

So I was going to look at other processes and see if there was a 'python3.3' process running the same file.

What's a better PA pattern for this please?

Jim

I assume the program you want to be running is one that you normally run from a console?

The best thing to do is probably to convert it to be run by a scheduled task, which runs once an hour or once a day, and either (re)launches the job if it's not running, or just quits if it is already running.

Then you just need some way of checking whether the process is running. We often use a socket for these cases: your process binds the socket while it runs, and if it ever exits, the socket is released automatically. That makes it easy to check, with code like this:

    import logging
    import socket
    import sys

    from my_module import my_long_running_process

    lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # The name should be unique; using your username as a prefix is a convention.
        lock_id = "my-username.my-task-name"
        lock_socket.bind('\0' + lock_id)
        logging.debug("Acquired lock %r", lock_id)
    except socket.error:
        # The socket is already bound, so the task must already be running.
        logging.info("Failed to acquire lock %r", lock_id)
        sys.exit()

    my_long_running_process()

I've just tried to write this up as a wiki help topic on long-running tasks. Comments welcome!

Looks good! A few comments, mainly to try to anticipate questions people might have when reading it...

It might be helpful to clarify that you mean a Unix domain socket rather than just saying "socket": anybody who's only used Internet sockets may assume that's what you mean, and then be terribly confused that you're not specifying an IP address and port number. It may also be worth a brief mention that the NUL-character prefix on the socket name is a Linux-specific extension (the "abstract namespace"), for anybody who's encountered Unix domain sockets on other systems but isn't aware of it.
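For anyone reading along, a minimal sketch of the difference (the name and path here are just illustrative):

    import socket

    # Linux abstract namespace: the leading NUL byte means no filesystem
    # entry is created, and the name disappears when the process exits.
    abstract = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    abstract.bind('\0my-username.my-task-name')

    # Portable Unix domain socket: bound to a real path on disk, which
    # stays behind after the process exits and must be removed manually.
    on_disk = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    on_disk.bind('/tmp/my-username.my-task-name.sock')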

I guess it might also be worth clarifying the situation regarding which machine scheduled tasks run on. For example, if a scheduled task ends up running on a different machine then the socket won't be accessible (unless there's some underlying magic which makes them available on other machines?). So, let's say scheduled tasks for a given user are running on host A. Each time the task runs, it sees the socket is still active and skips starting the process. Great so far. Now something happens which migrates scheduled tasks for that user to host B. The next time the task runs, it sees the socket is no longer open and hence starts a new instance of the process. Can we be sure that host A doesn't still have an instance of the process running?

If scheduled jobs are only migrated just prior to a machine being fully shut down (including terminating all user processes) then this is probably not a big deal. If that's not a certainty, however, it may be worth noting on the page, because some users may have tasks which will cause corruption if run concurrently, so they should be made aware if they need to implement some sort of stronger locking (probably involving the filesystem).
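For the "stronger locking" case, flock(2) is the usual filesystem approach; here's a minimal sketch, with a hypothetical lock path (though note the NFS consistency caveats discussed later in this thread):

    import fcntl

    # Hold an exclusive flock on a file for the process's lifetime;
    # the lock is released automatically when the process exits.
    lock_file = open('/home/my-username/.my-task.lock', 'w')
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        raise SystemExit("another instance already holds the lock")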

As a point of interest, I was idly wondering the other day whether it's possible to do proper locking without assumptions about OS-level lock primitives and the like, and I was trying to dredge up memories of distributed mutex algorithms from university. I had some thoughts on it, but I wouldn't want to pollute this thread with them - I'll write a blog post about it if I get the time, and people can rip it apart! (^_^)

Many thanks both - food for thought!

@harry Yes, currently I start it from a console.

I guess the original problem is in the general area of IPC (Inter-Process Communication), so there are many other approaches, e.g. using disk files, or OS features like shared memory, semaphores etc. But in the multi-machine PA situation mentioned by both giles and Cartroo, I wonder how many of those options are still available?

Your problem is that you never know what server any given piece of code is going to run on, so you can't be sure that any two processes are on the same machine.

You have two possible tools to support synchronisation: the filesystem (your /home and /tmp are shared via NFS) and a database. Of the two, the database is probably the more reliable, since NFS can be a bit iffy about consistency, unlike MySQL, whose bread and butter is the whole ACID thing. SQLite, of course, would live on NFS, so it ends up with the same disadvantages.

MySQL would work well, although you have to do a little work to check the liveness of the process, to cope with the possibility that it's been terminated ungracefully - this could happen due to a bug, a crash or hardware failure.

If you're going to go the MySQL route (and I agree this does make certain aspects easier, like atomicity) then you'd either need to do something fancy like listing active connections (which I wouldn't suggest) or periodically update the database to indicate liveness - I'll describe an idea for such a procedure below.

When a process starts up, it creates a row in a "processes" table with an AUTO_INCREMENT column to assign it a unique ID. Every N seconds it executes a DELETE statement for all rows with an ID higher than its own, and also checks whether its own row still exists - if it ever finds its own row has been removed, it exits immediately.

When the process first starts, it needs to check whether any existing process is still running. This involves a delay, so I suggest still forking away from the scheduled task itself before doing this. The process creates its own row and then waits 2N seconds. If it finds its own row has been deleted then it can be certain that there's an actively running process with a lower ID, so it can exit.

If it finds its own row still extant then any process with a lower ID must be dead, so it deletes any such rows and then goes into active running. You may wish to track the "waiting" vs. "running" state with an additional enumeration column for diagnostic purposes, and I would also suggest additional columns for the hostname and PID. Once it's running, it proceeds as above, removing rows created by competing processes trying to detect whether it's still running. If the process ever crashes or hangs, it will no longer remove competing rows and hence will eventually be removed itself.

Instead of removing rows you could instead set their state to "dead", which would allow you to manually check for tasks hanging around and terminate them appropriately.

That might seem a complex approach for a simple problem, but I can't think of anything much simpler which doesn't assume running on the same host and doesn't risk multiple instances running concurrently.
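For concreteness, here's a minimal sketch of that protocol, assuming a hypothetical table created with something like CREATE TABLE processes (id INT AUTO_INCREMENT PRIMARY KEY, hostname VARCHAR(64), pid INT); the helper names and the do_some_work() call are just placeholders, and the connect() credentials are elided:

    import os
    import socket
    import time

    import MySQLdb

    HEARTBEAT_SECS = 30  # the "N" from the description above

    def create_row(conn):
        # Register this process; AUTO_INCREMENT assigns the unique ID.
        cur = conn.cursor()
        cur.execute("INSERT INTO processes (hostname, pid) VALUES (%s, %s)",
                    (socket.gethostname(), os.getpid()))
        conn.commit()
        return cur.lastrowid

    def row_exists(conn, row_id):
        cur = conn.cursor()
        cur.execute("SELECT 1 FROM processes WHERE id = %s", (row_id,))
        return cur.fetchone() is not None

    def heartbeat(conn, row_id):
        # Remove rows created by competing processes; if our own row has
        # gone, a process with a lower ID has judged us dead.
        cur = conn.cursor()
        cur.execute("DELETE FROM processes WHERE id > %s", (row_id,))
        conn.commit()
        return row_exists(conn, row_id)

    def main():
        conn = MySQLdb.connect(...)  # fill in your own credentials
        my_id = create_row(conn)
        time.sleep(2 * HEARTBEAT_SECS)  # wait 2N seconds
        if not row_exists(conn, my_id):
            return  # an active process with a lower ID deleted our row
        # Any process with a lower ID must be dead, so remove its rows.
        cur = conn.cursor()
        cur.execute("DELETE FROM processes WHERE id < %s", (my_id,))
        conn.commit()
        while True:
            if not heartbeat(conn, my_id):
                return  # we've been declared dead, so exit immediately
            do_some_work()  # placeholder for one unit of your real task
            time.sleep(HEARTBEAT_SECS)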

@harry @Cartroo Many thanks both - great ecosystem here on PA!

I was hoping to use a simpler approach in MySQL with LOCK TABLES, e.g. http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html - do you think this may be feasible? The lock(s) are released when the process / MySQL session terminates, either normally or abnormally (or of course on UNLOCK TABLES).

But I'm not sure whether there's a non-blocking way in MySQL to test if a table is already locked, or whether it always waits?

Also, can I assume that each of my processes has a separate MySQL session?

Ta, Jim

Yes, you could use locks - that's not a bad idea.

I think you'd want to use named locks rather than LOCK TABLES - see GET_LOCK() for lock acquisition with a timeout. However, I'm not sure whether PA users have the relevant permissions to use them. Also be aware that lock names are server-wide, so prefix yours with your username.

EDIT:

So I just did a quick test and GET_LOCK() seems to work as expected. Remember what I said about prefixing lock names with your username, or some other string you're certain is unique. I would suggest something like username.appname.lockname, so if I had an application called feedscraper then my code might include this snippet:

    import sys

    import MySQLdb

    def current_instance(cur):
        # A zero timeout makes this non-blocking: GET_LOCK() returns 1 if
        # we acquired the lock, or 0 if another connection holds it.
        cur.execute("SELECT GET_LOCK('cartroo.feedscraper.running', 0)")
        row = cur.fetchone()
        return (row and row[0])

    def main(argv):
        # ...
        conn = MySQLdb.connect(...)
        cur = conn.cursor()
        if not current_instance(cur):
            return 1
        # ...

    if __name__ == "__main__":
        sys.exit(main(sys.argv))

Also remember that these named locks are connection-oriented - they're automatically released when the connection closes. They can also be explicitly released earlier with RELEASE_LOCK(), but if you want the lock held for the lifetime of your application then you shouldn't need this. So, make sure you keep your connection alive - sending MySQL pings is useful for this. If you find your connection isn't alive when you ping it, make sure you re-acquire the lock once you've re-created the connection - and if the lock has been taken by someone else in the small gap, your process will have to exit.
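As a sketch of what that might look like, assuming MySQLdb (the ensure_lock name is just illustrative, and the connect() arguments are elided as above):

    import MySQLdb

    def ensure_lock(conn, lock_name):
        # Ping the connection; if it has died, reconnect and try to take
        # the named lock again. Returns (connection, still_hold_lock).
        try:
            conn.ping()  # raises OperationalError if the connection is dead
            return conn, True
        except MySQLdb.OperationalError:
            conn = MySQLdb.connect(...)  # re-create with your credentials
            cur = conn.cursor()
            cur.execute("SELECT GET_LOCK(%s, 0)", (lock_name,))
            row = cur.fetchone()
            # If someone else grabbed the lock in the gap, the caller
            # should exit rather than carry on running.
            return conn, bool(row and row[0])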

One useful non-obvious feature is that lock acquisition is idempotent, so you can safely attempt to re-acquire a lock you already hold. This is a useful way to confirm that you still hold the lock. You can also use IS_FREE_LOCK() for this purpose, though of course you'd still have to check the result of a subsequent GET_LOCK() to avoid a race condition, so you might as well just call GET_LOCK() in the first place.

One potentially less useful non-obvious feature is that a connection can only hold a single named lock at a time - calling GET_LOCK() with a different name while an earlier lock is already held performs an implicit RELEASE_LOCK() on that earlier lock. If you're already using GET_LOCK() elsewhere in your application then you can still take an additional lock, but you'll need to make sure you use a separate MySQL connection for it (since locks are connection-specific).
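To illustrate the gotcha, re-using the hypothetical lock names from the snippet above (credentials elided again):

    import MySQLdb

    conn = MySQLdb.connect(...)
    cur = conn.cursor()
    cur.execute("SELECT GET_LOCK('cartroo.feedscraper.running', 0)")  # acquired
    cur.execute("SELECT GET_LOCK('cartroo.feedscraper.other', 0)")
    # The second call silently released 'cartroo.feedscraper.running':
    # this connection now holds only 'cartroo.feedscraper.other'.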

@Cartroo Many thanks! Sorry for the delay - 'day job' intruded...

I'll have a go at this when I can and get back to you. Has there been any talk of a PA library, e.g. including this?

Jim

You're quite welcome. I know what you mean about the day job; I don't get nearly as much time to help out here as I'd like since I changed jobs.

I'm not aware of any discussion of a "pawutils" library - I guess it might be somewhat tricky because everyone's needs differ slightly. However, there's probably value in collecting together some best practice functions and classes for convenience and making them available in the environment. At least it would give people something to base their own code on if nothing else.

I suspect the main cost would be maintenance, but perhaps if the PA admins approve of the idea but don't feel they can commit the time to maintain it themselves, they could create a public repository on Github and allow the community to share some of the burden. It's unlikely that anything in the library would be particularly commercially sensitive, given the public nature of the PA platform. If they find it easier for installation they could even stick it on PyPI, I suppose!

@PA boffins: thoughts?

That's a neat idea! We're working on putting as much as we can into the help pages at the moment, but perhaps we could create a public repo with some of the more generally useful snippets in it. What else do you think would be a good candidate to add?

Thanks both! Maybe one of you could start a new forum topic (with a more appropriate name!) and invite ideas?

My immediate thought is anything that would help with the multi-process issues discussed above - a bit like a psutil+ or something, perhaps using MySQL to store 'global' process info for a user. But I don't have a feel yet for the most common issues affecting other folk.

Excellent idea, I'll start a topic now.

Returning to this issue of detecting whether or not 'my program XX is already running', and the useful discussion above about different approaches to locking etc.: is it possible that PAW already has the answer in its own code?

Does the PAW code have a complete picture of both Bash shell activity and Scheduled job activity for each user, that could potentially be accessed via a new API?

At its simplest, a scheduled job could then say something like: only start XX if no other job is already running XX.

> Does the PAW code have a complete picture of both Bash shell activity and Scheduled job activity for each user?

Nope! Or at least, not explicitly, and not in a joined-up way. But we're planning on building one...

Ah, thanks harry!

Back to the MySQL approach for now, I guess...