Forums

*** HARAKIRI ON WORKER 5 (pid: 45, try: 1) ***

Hi there,

I have an issue with workers committing harakiri.

My website behaves very well 99% of the time. None of my pages takes more than 1 sec to load and the browsing is smooth. However, every once in a while one page takes very long to load. It happened at least once on all my pages, even the simplest ones. It will load for around a minute, sometimes more and then proceed with the original request that was sent.

When I look at the server.log file it displays the following Harakiri message:

2021-06-30 08:57:10 Wed Jun 30 08:57:09 2021 - *** HARAKIRI ON WORKER 5 (pid: 45, try: 1) *
2021-06-30 08:57:10 Wed Jun 30 08:57:09 2021 - HARAKIRI !!! worker 5 status !!!
2021-06-30 08:57:10 Wed Jun 30 08:57:09 2021 - HARAKIRI [core 0] 10.0.0.224 - GET 
2021-06-30 08:57:10 Wed Jun 30 08:57:09 2021 - HARAKIRI !!! end of worker 5 status !!!
2021-06-30 08:57:10 DAMN ! worker 5 (pid: 45) died, killed by signal 9 :( trying respawn ...
2021-06-30 08:57:10 Respawned uWSGI worker 5 (new pid: 48)
2021-06-30 08:57:10 spawned 2 offload threads for uWSGI worker 5

Looking at different response on this forum I saw that Harakiri happens when the server fails to respond after 5 mins.

This is weird because I have clocked Harakiri happening after much less than 5 minutes.

The good news is that the original request is most often successful in the end, therefore I was wondering if lowering the time before the Harakiris happen would solve at least the consequences of the problem I seem to have. The problem won't happen for a while after it has happened once.

If you have information about this issue it would really help me out.

Thank you very much!

What do you see in your other logs for the same period?

Hi fjl,

I was trying to reproduce the error to analyse the logs and give you more information but this problem has been fortunately rare recently.

I had the same problem in appearances only this morning. The logs are pretty empty of explanation that I can link to what I see. Error logs are empty and Server logs display what they should, only after a longer delay.

The webpage loads for a while (I clocked 4 mins) on a page that normally works very well. I asked my partner to try to access the same page and we both goth stuck with the same loading screen. Instead of being redirected on the right page however, this time we both got carried to a python anywhere error message page. This page displayed a "error 504 backend" error code.

While usually I see a Harakiri error message in the server logs the closer Harakiri I can find is a good 40 minutes before I had any problem.

I will try to update the symptoms as I encounter this problem again.

Thank you for your help.

Hi,

I had exactly the same problem again with nearly identical circumstances but since it was slightly longer to load it did trigger the Harakiri message.

Load time was around 5 min for my partner and I, we both got redirected to the python anywhere page with "error 504 backend", The pages work fine after this initial bug, we can browse the same pages smoothly now.

The error display in server.logs:

21-07-07 11:11:49 Wed Jul  7 11:11:48 2021 - *** HARAKIRI ON WORKER 7 (pid: 43, try: 1) ***
2021-07-07 11:11:49 Wed Jul  7 11:11:48 2021 - HARAKIRI !!! worker 7 status !!!
2021-07-07 11:11:49 Wed Jul  7 11:11:48 2021 - HARAKIRI [core 0] 10.0.0.224 - GET /Market/details/KFS/ since 1625670407
2021-07-07 11:11:49 Wed Jul  7 11:11:48 2021 - HARAKIRI !!! end of worker 7 status !!!
2021-07-07 11:11:49 DAMN ! worker 7 (pid: 43) died, killed by signal 9 :( trying respawn ...
2021-07-07 11:11:49 Respawned uWSGI worker 7 (new pid: 54)
2021-07-07 11:11:49 spawned 2 offload threads for uWSGI worker 7
2021-07-07 11:11:51 Wed Jul  7 11:11:50 2021 - *** HARAKIRI ON WORKER 8 (pid: 46, try: 1) ***
2021-07-07 11:11:51 Wed Jul  7 11:11:50 2021 - HARAKIRI !!! worker 8 status !!!
2021-07-07 11:11:51 Wed Jul  7 11:11:50 2021 - HARAKIRI [core 0] 10.0.0.224 - GET /Market/details/KFS/ since 1625670409
2021-07-07 11:11:51 Wed Jul  7 11:11:50 2021 - HARAKIRI !!! end of worker 8 status !!!
2021-07-07 11:11:51 SIGN UP VIEW from account.views line 227
2021-07-07 11:11:51 DAMN ! worker 8 (pid: 46) died, killed by signal 9 :( trying respawn ...
2021-07-07 11:11:51 Respawned uWSGI worker 8 (new pid: 57)
2021-07-07 11:11:51 spawned 2 offload threads for uWSGI worker 8
2021-07-07 11:11:51 announcing my loyalty to the Emperor...
2021-07-07 11:12:06 Wed Jul  7 11:12:05 2021 - *** HARAKIRI ON WORKER 5 (pid: 42, try: 1) ***
2021-07-07 11:12:06 Wed Jul  7 11:12:05 2021 - HARAKIRI !!! worker 5 status !!!
2021-07-07 11:12:06 Wed Jul  7 11:12:05 2021 - HARAKIRI [core 0] 10.0.0.224 - GET /Market/ since 1625670424
2021-07-07 11:12:06 Wed Jul  7 11:12:05 2021 - HARAKIRI !!! end of worker 5 status !!!
2021-07-07 11:12:06 DAMN ! worker 5 (pid: 42) died, killed by signal 9 :( trying respawn ...
2021-07-07 11:12:06 Respawned uWSGI worker 5 (new pid: 60)
2021-07-07 11:12:06 spawned 2 offload threads for uWSGI worker 5
2021-07-07 11:12:25 Wed Jul  7 11:12:25 2021 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request / (ip 10.0.0.224) !!!
2021-07-07 11:12:25 Wed Jul  7 11:12:25 2021 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306] during GET / (10.0.0.224)

It looks like there are 2 different endpoints that are having the problem: /Market/details/KFS/ and /Market/. What do those do? Are they connecting to external services, accessing a database, reading/writing files etc?

If you add timing logging to those views (prints to stderr will appear in your error log), you can perhaps work out which parts of the view are taking longer than you expect.

Hi Glenn,

All my endpoints have had that problem, not just 'Market/details/KFS/' or 'Market/'. I feel like my whole website was down when that problem occurred.

Most of my pages are hefty with queries and manipulation, but the views get done on average in < 0.300 secs in normal circumstances. I added a timer on some pages to analyse the problem but the information I gathered was not enough to pinpoint the origins of the problem. I will add more timers and try to reproduce the issue.

Thank you.

Looks like a good idea. Let us know when you find something.

I have recorded the same issue this afternoon as well.

No Harakiri this time, the website was able to load my pages after being idle for ~3m 40sec.

My logs show absolutely no sign of issues. Operations are timing roughly the same as usual. The website was simply inaccessible until my server displayed:

2021-07-08 15:11:44 announcing my loyalty to the Emperor...

And then everything was working fine.

Anytime I encounter this issue I ask my partner to try to access one of our pages on his browser. He can get to our domain at "/", but any login attempt will make his browser load indefinitely just like mine. Both our request come back at exactly the same time to display the right page.

I have noticed some Harakiris in the middle of the night. We have almost zero traffic at that time. I suspect a random crawler visits our page and triggers one of our footer views just at the time this issue happens. It takes more than 5 minutes to resolve itself and then it leaves an Harakiri message on my log.

I really appreciate your insights on this.

Have a nice day.

We reboot servers from time to time due to maintenance. That makes web apps start from cold from time to time, and slow on the first request.

I found was caused the issue.

By placing my timers better I noticed that the only operation that would make my views crash was to save() an instance to the database.

I could query just fine, but after changing an attribute the database would not let me save the changes.

Turns out that the regular backups of my database (I did one every hour) was the thing that was slowing me down.

I have slowed down the frequency of my backups now and everything is back to normal.

Thank you for your help!

Have a good one.

Glad you've figured that out and thanks for letting us know!