
PA servers HTTP caching issues when using requests

Hello,

Some HTTP GET requests that I make with the Python requests library seem to be cached somewhere between the PA server and the server I am connecting to. It started happening this morning. The same HTTP GET request returns a different result on PA than on my local machine, which suggests something is cached on PA infrastructure.

  • On the server, if I do:

    import requests

    requests.get('http://www.elmundo.es/bolsa/datos/valores_indice.html?cod_indice=I.MA').json()['indice']['valores'][0]['hora']

    I get as a result: '09:10'

  • On my local Python (same environment and thus the same library versions), if I execute the exact same code, I get:

    '18:00'

I am only getting the correct response on my local machine. On PA I am getting a response from early in the morning. Please help and let me know how I can disable this caching.

Is the time changing when you run it on PythonAnywhere? That is, are you getting, say, 09:10 the first time, then 09:11 if you run it a minute later?

I'm asking because it's possible that the server you're sending those queries to might be sending different times (or different data) to different servers, perhaps based on their IP address.

The time was not changing in the PythonAnywhere console. I'm not sure what was going on, but it looks like one of those issues that are close to impossible to debug.

Anyway, it's working normally again today. Thanks.

OK, glad to hear it's working now. Just for clarity, we don't do any caching whatsoever -- from a paid account, your access to the Internet is a simple network connection. So while something strange is definitely going on, it's definitely not a cache, at least on our side.

Just in case it happens again, some useful information would be:

  • Where are you running the code? That is, is it from consoles, from inside a website's code, or in a scheduled task?
  • If it's in a console, does it happen from all console servers? Each console you start is automatically allocated to a random console server in our cluster -- you can find out which server you're on by running "hostname" if it's a bash console.
  • If it's from a website and you have several, which one is it?

I'm asking this because I'm wondering if it's related to the specific machine you're running the code from. (If it's a scheduled task, then it will always be the same machine, so that would eliminate that possibility given that the problem comes and goes.)
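If it does come back, something along these lines could capture that information in one go. This is only a minimal sketch: `summarize` and `probe` are names made up for illustration, the header list is just a set of headers that caches and proxies commonly add (the remote server may send none of them), and it uses the standard library rather than requests, since you saw the same result with curl.

```python
import socket
import urllib.request
from datetime import datetime, timezone

URL = "http://www.elmundo.es/bolsa/datos/valores_indice.html?cod_indice=I.MA"

# Headers that commonly hint at a cache or proxy, when they are present at all.
CACHE_HEADERS = ("Age", "Cache-Control", "Date", "Expires",
                 "Last-Modified", "Via", "X-Cache")


def summarize(headers, hostname, now=None):
    """Record which machine made the request, when, and any cache-related headers."""
    now = now or datetime.now(timezone.utc)
    record = {"host": hostname, "checked_at": now.isoformat()}
    for name in CACHE_HEADERS:
        if name in headers:
            record[name] = headers[name]
    return record


def probe(url=URL):
    """Fetch the URL, asking intermediaries to revalidate, and summarize the reply."""
    # A cache between the two ends is expected to honour these request headers,
    # but nothing forces the remote side to do so.
    request = urllib.request.Request(
        url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize(response.headers, socket.gethostname())
```

Running `print(probe())` from a console and from the scheduled task, then comparing the `host` values and any `Age` or `Via` headers, should show whether the stale data follows a particular machine or comes back stale from the remote side.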

Thanks Giles,

  1. I was getting the strange behaviour from my python task code, from the console using curl, and from the console opening a python shell and using requests inside it. So it was consistent.
  2. Yeah, when testing this I tried from at least 5 different consoles and saw the same thing in all of them.
  3. No website request, just scheduled tasks.

I have no idea what happened. The only explanation I can think of is that the server blocked my IP and, instead of returning the latest data, it was returning old content (or maybe it was cached on their side with a lagged response). Who knows.

Do you know how I can get the external IP from the console next time it happens? I guess in that case I could try opening multiple consoles until I get a different IP and then send the request to see if the behavior keeps happening.

Thank you

Try something like this.
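A minimal sketch, assuming the public echo service at https://api.ipify.org is reachable from your console (that service is picked here purely for illustration, and any similar "what is my IP" endpoint that replies in plain text should work the same way). The function names are hypothetical.

```python
import ipaddress
import urllib.request

# Assumption: a public "what is my IP" echo service that replies in plain text.
ECHO_URL = "https://api.ipify.org"


def parse_ip(text):
    """Validate the service's reply and return the address as a string."""
    # ip_address raises ValueError if the reply is not an IPv4/IPv6 address,
    # e.g. if an error page came back instead.
    return str(ipaddress.ip_address(text.strip()))


def external_ip(url=ECHO_URL):
    """Return the address the outside world sees for this machine."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_ip(response.read().decode("ascii"))
```

Calling `print(external_ip())` in each console you open would show whether they leave from different addresses, and you could then repeat the elmundo.es request from each one to see whether the stale result follows the IP.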