sites that block scraping from cloud hosts?

ahoy friends, is there a way to tell if a site is blocking PA IPs, or basically any cloud IPs?

I've got two government web pages that I want to open, and I'm on a paid tier of PA.

Locally, I can open them with requests, no problem. But on PA, I always get a timeout error. I've tried sending a User-Agent & other headers.

But since I can run this code locally, I'm kind of narrowing it down to an IP issue? Maybe ... ?

Fwiw, both the sites I'm trying to open are public state government websites with public notices on them. They have no reason to be hidden.

If you're getting connection timeouts from a paid account, then it's likely that the sites are blocking cloud IPs, yes. You could potentially contact the site administrators to confirm (though I guess that with government sites they might not be super-responsive). They'd probably want to know the IP address that you're using to connect to them so that they can check any whitelists -- the ipify Python package is a good way to do that.
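If it helps, here is a minimal sketch of both checks, using only the standard library. It is not the ipify package itself: it assumes the public ipify endpoint at https://api.ipify.org, which returns your outbound IP as plain text. The function names (`parse_ip`, `get_public_ip`, `can_connect`) are just illustrative, and the 10-second timeouts are arbitrary.

```python
import ipaddress
import socket
import urllib.request


def parse_ip(text):
    """Validate an IP address string; raises ValueError if it is not one."""
    return str(ipaddress.ip_address(text.strip()))


def get_public_ip(url="https://api.ipify.org", timeout=10):
    """Return the public IP that outbound requests appear to come from.

    Assumes the endpoint replies with the address as plain text.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_ip(resp.read().decode("ascii"))


def can_connect(host, port=443, timeout=10):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False
```

Running `get_public_ip()` on PA gives you the address to pass on to the site administrators. If `can_connect(...)` with the site's hostname returns False on PA but True on your own machine, that is consistent with an IP-level block, though it doesn't prove it.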

ok rad thanks!

Glad to hear that helped!