My script takes a couple of seconds when run manually, but the hourly task logs show thousands of seconds and the task ends up in the tarpit

Hi, I wrote a script that scrapes web data, but only if that data isn't already in a file.

When I run it in the console manually, it immediately finds all the data and completes in just a couple of seconds. However, I have this script running every hour (at 59 minutes past), and the task logs show that it takes thousands of seconds most of the time and is killed. It is usually killed, I imagine, because it took longer than an hour. At times, though, the log shows that it ran for only a couple of hundred seconds and was still killed.

This places my task in the tarpit after a single execution, and every later run is killed for the rest of the day.

I know my code isn't very efficient, but all I do is read a JSON file and check whether the data exists. 99% of the time it already exists, so the inefficient parts don't even execute.

2019-04-10 08:25:39 -- Completed task, took 5189.00 seconds, return code was 0.

2019-04-10 08:58:46 -- Completed task, took 3578.00 seconds, return code was 0.

/bin/bash: line 1: 32229 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py
/bin/bash: line 1: 30565 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py
/bin/bash: line 1: 28926 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py

2019-04-10 11:10:47 -- Completed task, took 7899.00 seconds, return code was 137.

2019-04-10 11:10:47 -- Completed task, took 4291.00 seconds, return code was 137.

2019-04-10 11:10:48 -- Completed task, took 693.00 seconds, return code was 137.

/bin/bash: line 1: 32506 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py

2019-04-10 12:00:51 -- Completed task, took 104.00 seconds, return code was 137.

/bin/bash: line 1: 3200 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py

2019-04-10 13:00:51 -- Completed task, took 104.00 seconds, return code was 137.

/bin/bash: line 1: 4538 Killed python /bin/run_scheduled_task.py /home/brettcomardelle93/mysite/op-bot.py

2019-04-10 14:01:05 -- Completed task, took 117.00 seconds, return code was 137.

Hmm. It's hard to be sure, but one possibility that comes to mind is that it's not finding the data when it runs as a scheduled task. If you're specifying the location of the data using a relative path (for example, "my_data_file.json") then that path will be resolved relative to the working directory of the running script.

From your logs above, it looks like your script is in the directory /home/brettcomardelle93/mysite/, so if the data file is in that directory too, and to run it from a console you do something like

cd mysite/
python3.6 op-bot.py

...then in that case it will run with its working directory set to /home/brettcomardelle93/mysite/, so the file will be found relative to that directory.

However, when running as a scheduled task, with the command /home/brettcomardelle93/mysite/op-bot.py, the working directory will be the default, which is /home/brettcomardelle93/ -- so it will look for the file relative to there, won't find it, and so it will do all of the expensive work.

If that sounds like a plausible explanation, a quick fix would be to edit the scheduled command so that it is more like what happens when you run it from a console:

cd /home/brettcomardelle93/mysite/; python3.6 op-bot.py

(Of course, change the "3.6" to the version of Python that you want to use if it's different.)
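To illustrate the working-directory effect described above, here is a minimal sketch. It uses two temporary directories as hypothetical stand-ins for the home directory and mysite/ (they are not the real paths), and shows that the same relative path resolves to two different absolute paths depending on the current working directory.

```python
import os
import tempfile


def resolve(relative_path):
    """Return the absolute path that open(relative_path) would use right now."""
    return os.path.abspath(relative_path)


# Hypothetical stand-ins for the home directory and its mysite/ subdirectory.
home = os.path.realpath(tempfile.mkdtemp())
site = os.path.join(home, "mysite")
os.makedirs(os.path.join(site, "static"))

original_cwd = os.getcwd()
try:
    # Console case: "cd mysite/" first, then run the script.
    os.chdir(site)
    from_site = resolve("static/chapters.json")

    # Scheduled-task case as described above: working directory is the home directory.
    os.chdir(home)
    from_home = resolve("static/chapters.json")
finally:
    # Restore the working directory we started with.
    os.chdir(original_cwd)

print(from_site)  # ends with mysite/static/chapters.json
print(from_home)  # no mysite/ component, so the file would not be found there
```

Printing os.getcwd() at the top of the real script and comparing the console run with the scheduled run should confirm or rule out this explanation.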

The script is running. In the latest task logs, I see:

failed to get existing json objects if there were any
chapter 1 saved
chapter 2 saved
chapter 3 saved
chapter 4 saved
...
Getting images for 938
failed adding objects to json file

2019-04-10 21:55:33 -- Completed task, took 4944.00 seconds, return code was 0.

The JSON file is apparently not being found in this case: the script fails both to read the existing objects and to write the new ones. When run from the console, the file is found and the task takes seconds.

The JSON file is at /home/brettcomardelle93/mysite/static/chapters.json

with open('static/chapters.json', 'r') as apiFile:
    data = json.load(apiFile)

Could the file in this directory not be found properly when running as a task?
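One way to check this is to make the failure message say what actually went wrong. The sketch below assumes the script currently catches the error and prints a generic message such as the "failed to get existing json objects" line in the log; the function name load_existing is hypothetical and not taken from the original script. It reports the path, the current working directory, and the underlying exception.

```python
import json
import os
import sys


def load_existing(path):
    """Load JSON from path; on failure, report why and return None."""
    try:
        with open(path, "r") as api_file:
            return json.load(api_file)
    except (OSError, ValueError) as exc:
        # OSError covers a missing or unreadable file;
        # json.JSONDecodeError is a subclass of ValueError.
        print(
            "could not load {!r} (cwd={!r}): {!r}".format(path, os.getcwd(), exc),
            file=sys.stderr,
        )
        return None


data = load_existing("static/chapters.json")
```

If the scheduled run then logs a FileNotFoundError together with a working directory other than mysite/, the relative path is the cause.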

Instead of writing open('static/chapters.json', 'r'), write open('/home/brettcomardelle93/mysite/static/chapters.json', 'r').

This worked for me when my task couldn't find the file properly.
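A variation on the absolute-path approach, in case hard-coding the home directory is undesirable: build the path from the script's own location. This is only a sketch; the helper names data_path and load_chapters are hypothetical, and it assumes the static/ directory sits next to the script, as the paths in this thread suggest.

```python
import json
from pathlib import Path


def data_path(script_file, relative="static/chapters.json"):
    """Return an absolute path to `relative`, anchored at the script's directory."""
    return Path(script_file).resolve().parent / relative


def load_chapters(path):
    """Read and parse the JSON file at `path`."""
    with open(path, "r") as api_file:
        return json.load(api_file)


# Inside op-bot.py this would be used as:
#     data = load_chapters(data_path(__file__))
```

Because the path is derived from the script's location rather than the working directory, it resolves to the same file whether the script is started from a console or as a scheduled task.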

Yes, exactly :-) It looks like the problem is the one I suspected it was, and @smallbytes has posted a good solution. The one I suggested in my previous post would work too.

Thanks! I'm always happy to help on here. I love programming my website, and I would be pleased if you visited it. Search for smallbytes ciphers python2.7 on Google and you'll find it. It will be the first result.

Hello. When I run my script using the Run option in the editor, it works fine. But when I send a request to my Flask app, it runs only halfway without giving an error, and my ML model doesn't predict anything.

I think this is because TensorFlow isn't supported, though I think people have got Keras working: https://help.pythonanywhere.com/pages/MachineLearningInWebsiteCode/