Forums

whitelist and weather websites

Hi, I'm not entirely clear about how the whitelist is selected (so I'm not sure if it's kosher to request this). Looking at the forum threads, it seems that people have requested some sites and those have been added?

I'm on the free plan and am working on a little script to gather data from some of the weather forecast websites that provide free access to their data. I see that api.openweathermap.org is already there, so one of my modules works. I would also like to access

  • open.live.bbc.co.uk
  • yr.no
  • eklima.met.no

Any chance of getting these on the whitelist as well? Thanks

The criteria for inclusion are:

  • Does it have a public API?
  • Will it be useful to a large number of other free users?

On that basis, I have added yr.no and eklima.met.no to the whitelist. I couldn't find anything about open.live.bbc.co.uk, so I have not added it. If you can point me to the API docs for that site, we'll reconsider.

Aha! Brilliant, thanks! So the BBC provides this live RSS feed for the weather forecast, e.g. http://open.live.bbc.co.uk/weather/feeds/en/2643743/3dayforecast.rss

The page describing this is http://www.bbc.com/weather/about/17543675

There isn't an API as far as I can tell, but you can get the RSS feed for any city, and the feed can be parsed to extract the data from the XML, as I'm doing in my Python module. Does that qualify? Thanks
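For what it's worth, the parsing step described above can be sketched with nothing beyond the standard library. Since the live feed may not be reachable from everywhere, this example parses an inline sample shaped like a 3-day forecast RSS document - the channel and item titles here are made up for illustration, not copied from the real BBC feed.

```python
import xml.etree.ElementTree as ET

# A minimal sample shaped like a 3-day forecast RSS feed. The real feed
# at open.live.bbc.co.uk carries more fields; these titles are invented.
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Weather - Forecast for London, GB</title>
    <item><title>Today: Sunny, Maximum Temperature: 21C</title></item>
    <item><title>Tomorrow: Light Rain, Maximum Temperature: 17C</title></item>
  </channel>
</rss>"""

def parse_forecast(rss_text):
    """Return the per-day forecast titles from an RSS document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]

forecasts = parse_forecast(SAMPLE_RSS)
# For the live feed, fetch the URL first (e.g. with
# urllib.request.urlopen(url).read()) and pass the result in instead.
```

The same `parse_forecast` function works unchanged on the fetched bytes of a real feed, since `ET.fromstring` accepts both `str` and `bytes`.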

Hmm, I'm actually still getting an error parsing the YR website. Could it be because there's a www. in front of it? Or is that handled internally by the whitelist engine?

Same for the eklima.met.no site - I get a 403 Forbidden error?

It looks like yr.no redirects to www.yr.no, even though their docs say yr.no. I've added www.yr.no to the whitelist and that's working now.

eklima.met.no is working; I have managed to get an API response from there in a free account. The 403 you're getting must be coming from the site itself, or because you're not using the proxy settings.

Yes, yr.no is working for me too now - thanks :-)

I also have a free account with eklima and the script works when I run it on my local machine. I'm using pysimplesoap to access the XML, and I'm able to view the XML when I paste the URL into a browser, so the SOAP request itself seems OK and the remote server seems to be responding correctly. I haven't set up any proxy, but... is that something PythonAnywhere needs? What proxy settings should I be using?

Here's the error I get back - maybe you can see what I'm doing wrong?

File "./voll_station_data.py", line 53, in <module>
    response = client._url_to_xml_tree ("http://eklima.met.no/metdata/MetDataService?invoke=getMetData&timeserietypeID=0&format=&from=&to=&stations=68860&elements=UM%2CPRM%2CRR%2CTAMRR%2CFFM%2CTAN%2CTAX%2CDD18%2CNNM&hours=&months=&username=", False, False)
  File "/home/omer/.local/lib/python2.7/site-packages/pysimplesoap/client.py", line 529, in _url_to_xml_tree
    xml = fetch(url, self.http, cache, force_download, self.wsdl_basedir, self.http_headers)
  File "/home/omer/.local/lib/python2.7/site-packages/pysimplesoap/helpers.py", line 76, in fetch
    response, xml = http.request(url, 'GET', None, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1570, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1317, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1252, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 901, in connect
    self.sock.connect((self.host, self.port) + sa[2:])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/socks.py", line 424, in connect
    self.__negotiatehttp(destpair[0], destpair[1])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/socks.py", line 390, in __negotiatehttp
    raise HTTPError((statuscode, statusline[2]))
httplib2.socks.HTTPError: (403, 'Forbidden')

Yup. That looks like the library doesn't use the proxy settings. In general, well-behaved libraries will notice when there's a proxy set in the environment and use it.

You'll need to set the proxy to proxy.server on port 3128.
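One way to apply that setting is through the standard proxy environment variables, which proxy-aware libraries consult automatically. This is a sketch in Python 3 syntax (the thread's code is Python 2.7, where the standard-library module names differ); the host and port are the ones given above, and the check uses the stdlib helper that reads the same variables.

```python
import os
import urllib.request

# Point the standard proxy environment variables at the proxy given
# above (proxy.server on port 3128). Libraries that honour these
# variables will route their HTTP traffic through the proxy.
os.environ["http_proxy"] = "http://proxy.server:3128"
os.environ["https_proxy"] = "http://proxy.server:3128"

# urllib's helper reads those same variables, so we can confirm the
# setting is visible to proxy-aware code without making a request.
proxies = urllib.request.getproxies_environment()
```

Setting the variables before the HTTP library is first used matters, because some libraries read the environment only once at import or client-construction time.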

Any thoughts about the BBC RSS feed? Do you think that fits the requirements for whitelisting?

Hmm, setting the proxy didn't seem to get the requests through (I get the same error - copied below). Note that I haven't set a username for the proxy - should I be doing that as well?

using proxy {u'proxy_port': 3128, u'proxy_host': u'proxy.server'}
Traceback (most recent call last):
  File "./voll_station_data.py", line 55, in <module>
    response = client._url_to_xml_tree ("http://eklima.met.no/metdata/MetDataService?invoke=getMetData&timeserietypeID=0&format=&from=&to=&stations=68860&elements=UM%2CPRM%2CRR%2CTAMRR%2CFFM%2CTAN%2CTAX%2CDD18%2CNNM&hours=&months=&username=", False, False)
  File "/home/omer/.local/lib/python2.7/site-packages/pysimplesoap/client.py", line 529, in _url_to_xml_tree
    xml = fetch(url, self.http, cache, force_download, self.wsdl_basedir, self.http_headers)
  File "/home/omer/.local/lib/python2.7/site-packages/pysimplesoap/helpers.py", line 76, in fetch
    response, xml = http.request(url, 'GET', None, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1570, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1317, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1252, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 901, in connect
    self.sock.connect((self.host, self.port) + sa[2:])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/socks.py", line 424, in connect
    self.__negotiatehttp(destpair[0], destpair[1])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/socks.py", line 390, in __negotiatehttp
    raise HTTPError((statuscode, statusline[2]))
httplib2.socks.HTTPError: (403, 'Forbidden')

No, it doesn't need a username or password. I'm not sure why it's still not using the proxy; perhaps that's something you can raise with the maintainers of the library.

Right, I think you're onto something. I can get the URL via curl in the bash shell, so it must be something to do with the library. I just came across the PythonAnywhere help page mentioning httplib2 - the pysimplesoap library code seems to indicate that proxies should work, but the comment on the PythonAnywhere website seems to indicate that httplib2 won't work at all? Note I'm using Python 2.7.

httplib2 seems more and more like the problem. I edited the pysimplesoap library code and changed it to use urllib2, and then the code went through without a problem. Any ideas why httplib2 would misbehave?
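For anyone following along, the effect of that urllib2 swap can be reproduced explicitly. This is a hedged sketch in Python 3 syntax (where urllib2 became urllib.request): an opener built with a ProxyHandler routes requests through the given proxy regardless of environment variables, which is roughly what a urllib-based transport ends up doing once the proxy is configured.

```python
import urllib.request

# An explicit ProxyHandler forces all HTTP traffic through the proxy,
# independent of whether the library reads environment variables.
proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.server:3128"}
)
opener = urllib.request.build_opener(proxy_handler)

# Installing the opener makes it the default used by
# urllib.request.urlopen for the rest of the process.
urllib.request.install_opener(opener)
```

In Python 2.7 the same pattern is spelled `urllib2.ProxyHandler`, `urllib2.build_opener`, and `urllib2.install_opener`.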

I always recommend using the requests library for HTTP stuff -- it picks up proxy settings from the environment and has a really nice API. httplib2 needs quite a lot of configuration to make it recognise proxy settings, and IIRC it has some obscure bugs.
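To illustrate the point about requests picking up proxy settings from the environment: requests exposes a helper that reports which proxies it would use for a given URL, so the behaviour can be checked without making a network call. The eklima URL here is just the one from the thread; any URL would do.

```python
import os
import requests

# With the proxy set in the environment...
os.environ["http_proxy"] = "http://proxy.server:3128"

# ...requests will route matching requests through it automatically.
# This helper shows the proxies requests would pick for a given URL.
proxies = requests.utils.get_environ_proxies("http://eklima.met.no/")
```

A plain `requests.get(url)` made after this point would go through the proxy with no further configuration, which is why it tends to "just work" on proxied hosting environments.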

Yes, I think that's something to suggest to the pysimplesoap contributors.

Just going back to the open.live.bbc.co.uk RSS feed question: I do think the RSS feed is something other users will benefit from, since it provides weather for cities around the world, and the feed's XML is easy to parse in Python scripts. The server address is different from the main news website, so I would have thought it would be OK. But since it hasn't been added yet, should I take that to mean it doesn't fit the whitelist requirements?

I think the silence re: the BBC thing was an oversight on our part. Sorry!

You're right, it does look like that domain is entirely designed for machine-readable stuff like RSS feeds. I've whitelisted it. (It won't appear on the list of whitelisted sites until we next do a system update, but it should work now.)

no worries - and yes everything works now - so thanks a lot!

No problem! Glad to help :-)