
Lessons learned from the outage of 2015.09.05

  1. I have set up several monitors at https://uptimerobot.com/ that check every 5 minutes whether the website is up. It is free and seems to work well. As a bonus, you get the average response time of the pages you monitor. Email and SMS alerts are included.
  2. On the PythonAnywhere dashboard, where you have the links to the logs, I'd suggest you guys put a note: "What to do if your website doesn't work properly".
  3. Say where to follow YOUR communications during an outage: Twitter, NOT email.
  4. Once the issues have been solved on your side, mention the need to RELOAD the website. I wasn't aware of this triviality, and so experienced really slow performance for 2 additional hours after the outage ended, until I reloaded it. Maybe you could automate sending an email to everyone who was affected once all is OK, telling them they need to reload their sites?
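For anyone curious what a monitor like the one in point (1) actually does, here is a minimal sketch in Python using only the standard library. It is not how uptimerobot.com works internally; the function names `check_once` and `monitor` are hypothetical, "up" is assumed to mean "answered with a 2xx/3xx status", and the default `alert` just prints where a real setup would send an email or SMS.

```python
import http.client
import time
import urllib.error
import urllib.request


def check_once(url: str, timeout: float = 10.0) -> tuple[bool, float]:
    """Fetch url once; return (is_up, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the time to download the page body
            up = 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        err.close()  # server answered with 4xx/5xx: treat the site as down
        up = False
    except (OSError, http.client.HTTPException):
        up = False  # DNS failure, refused connection, timeout, bad response
    return up, time.monotonic() - start


def monitor(url, interval=300.0, checks=None, alert=print):
    """Check url every `interval` seconds (300 s = 5 minutes).

    Runs forever if `checks` is None, otherwise stops after that many checks.
    Returns the average response time of the successful checks,
    or None if none succeeded.
    """
    times = []
    done = 0
    while checks is None or done < checks:
        up, elapsed = check_once(url)
        if up:
            times.append(elapsed)
        else:
            alert(f"DOWN: {url}")  # hypothetical hook: plug in email/SMS here
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
    return sum(times) / len(times) if times else None
```

For example, `monitor("https://example.com/", checks=3)` would check three times, five minutes apart, and return the mean response time. A hosted service is still the better choice in practice, since a monitor running on the same infrastructure as the site tends to go down with it.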

Thanks! All good suggestions.

Re: point (4), that's actually a bit surprising to us. You shouldn't need to reload your web apps after an outage. We normally restart any misbehaving servers from our side, and whether a server is working should be a binary working/not-working thing. Reloading a web app doesn't change anything that I can imagine would speed things up again.

That's not to say I have any doubt at all that you really did see your web apps performing badly for the two hours after the outage, or that reloading really did fix it. I just can't think of any way that could happen! More investigation needed...