
Are urllib requests forbidden?

Right now I am a free user. I tried this code:

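import urllib.parse
import urllib.request

def fetchpage(url, values=None, header={"Referer": "http://www.google.com/"}):
    data = None
    if values is not None:
        # POST bodies must be URL-encoded and then converted to bytes
        data = urllib.parse.urlencode(values).encode('utf-8')
    req = urllib.request.Request(url, data, header)
    response = urllib.request.urlopen(req)
    # Decode the bytes; str() on bytes would return the "b'...'" repr instead
    # (this assumes the page is UTF-8)
    html = response.read().decode('utf-8')
    return html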

But I get the exception urllib.error.HTTPError: HTTP Error 403: Forbidden.

So are free accounts forbidden from making outgoing connections?

Yup. We have a whitelist of sites that free users can make http(s) requests to. We did this to combat widespread abuse from free accounts. Have a look at the help page for more details.

I'd like to use this thread to ask about another problem I am having. I am new to web programming.

I have a main Flask web app which calls another module to process some tasks. That module tries to open a SQLite database, but I was getting a "no such table" error even though the table exists!

So I tried what was suggested here: https://www.pythonanywhere.com/forums/topic/230/

import sqlite3
from flask import Flask

DATABASE = '/home/zerop/mysite/maindb.db'
app = Flask(__name__)
app.config.from_object(__name__)  # loads the uppercase names in this module, e.g. DATABASE
app.config.from_envvar('FLASKR_SETTINGS', silent=True)
sqlite3.connect(app.config['DATABASE'])

Ok, so this works now.

But why is it necessary to use such a complex method to open the database? On my own computer, when running Flask, I simply pass the location of the database to sqlite3.connect and it works.

I am clueless.

I don't see anything complicated there. That's just Flask's way of allowing you to configure things more easily. It means that you can keep things like what database to use in one place instead of having magic strings scattered around your code.
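One possible explanation for the original "no such table" error, which nobody in this thread confirmed, is that a relative database path is resolved against the current working directory, and a web app's working directory is often not the project folder. In that case sqlite3.connect quietly opens a brand-new, empty database. The sketch below reproduces that effect with temporary directories; the table name `entries` and the helper `table_names` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def table_names(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


# Hypothetical layout: the real database lives in project_dir,
# but the process is started from other_dir.
project_dir = tempfile.mkdtemp()
other_dir = tempfile.mkdtemp()
db_path = os.path.join(project_dir, "maindb.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()

original_cwd = os.getcwd()
os.chdir(other_dir)
try:
    # The relative path points at other_dir/maindb.db, which has no tables.
    relative_tables = table_names("maindb.db")
    # The absolute path always points at the real database.
    absolute_tables = table_names(db_path)
finally:
    os.chdir(original_cwd)

print(relative_tables)  # []
print(absolute_tables)  # ['entries']
```

Keeping a single absolute path in the configuration, as in the snippet above, avoids this regardless of where the process is started.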

I'm getting an error similar to the one in the original post above, even though I'm using a whitelisted site. When the data is encoded as bytes, I receive the 403; when it is a str, the connection is successful. I need it to be encoded for the POST to work properly in the app. Does anyone have any guidance? Thank you!

try: import simplejson as json
except ImportError: import json
import urllib.parse, urllib.request, urllib.error

MIKOMOS_BASE_URL = 'http://mikomos.com/w/api.php'

def makomQuery(action, makomquery):
    '''Queries Mikomos and returns makom/category info or list as a dict.'''
    try:
        # Figure out which type of query we want to use
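        if action == 'browsebysubject': # for makom property lookup
            subject_or_query = 'subject'
        elif action == 'ask': # for the actual Mikomos search
            subject_or_query = 'query'
        else: # otherwise subject_or_query would be undefined below
            raise ValueError("Unsupported action: {}".format(action))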
        # Now make the actual request
        args = {
            'action' : action,
            subject_or_query : makomquery,
            'format' : 'json',
        }
        data = urllib.parse.urlencode(args)
        binary_data = data.encode('utf8')
        req = urllib.request.Request(MIKOMOS_BASE_URL, binary_data)
        response = urllib.request.urlopen(req)
        mikomos_response = json.load(response)

        return mikomos_response
    except urllib.error.URLError as err:
        print("URLError: {}".format(err))
        return dict()

Hm. Maybe try using the requests library to do your POST? It tends to abstract away some of the encoding complexity for you, and hopefully will "just work"....
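For reference, the encoding step that requests hides can be inspected with urllib alone, without sending anything over the network. This is only a sketch: `build_post_request` is a hypothetical helper name, and the field values are sample data.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    # urlencode produces a str; the request body has to be bytes.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body)


req = build_post_request(
    "http://mikomos.com/w/api.php",
    {"action": "ask", "format": "json"},
)

# Supplying data is what turns the request into a POST.
print(req.get_method())  # POST
print(req.data)          # b'action=ask&format=json'
```

With requests, passing the same dict as `data=` performs the equivalent encoding internally.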

I tried with the Requests library also but I'm still getting the 403:

import urllib.error, requests
[...]
req = requests.post(MIKOMOS_BASE_URL, data=args)
return req.json()

What does the text field of the req object have in it? My guess is that the 403 error isn't coming from our proxy to say "this site isn't on the whitelist" -- it's actually coming from mikomos.com to say "there's something wrong with your request". The text should tell us whether or not that's true.

I opened up Python consoles on both PythonAnywhere and my local machine and ran the following:

payload = {'action':'browsebysubject', 'subject':'Property:Dairy_or_meat', 'format':'json'}
req = requests.post('http://mikomos.com/w/api.php', data=payload)

On PythonAnywhere req.text outputs:

'<HTML>\n  <HEAD>\n     Access Denied\n  </HEAD>\n<BODY>\n\n<h1>Access Denied</h1>\n\n<p>\nAccess to arbitrary websites is not available from free accounts;\nyou can only access sites that are on our\n<a href="http://www.pythonanywhere.com/whitelist">whitelist</a>.\nIf you want to suggest something to add to our whitelist\ndrop us a line at support@pythonanywhere.com.  It will have\nto have an official public API.\n</p>\n\n\n<p>\nAlternatively, you can sign up for a paid account at\n<a href="http://www.pythonanywhere.com/account/">http://www.pythonanywhere.com/account/</a>\n</p>\n<p>\nIf you have already got a paid account and you\'re still getting this messge,\nyou may need to reload your web app (from the "Web" tab) or restart\nyour consoles.  If that doesn\'t help, drop us a line at support@pythonanywhere.com.\n</p>\n\n</BODY>'

On my local machine, however, the output is what I expect:

'{"query":{"subject":"Dairy_or_meat#102#","data":[{"property":"_MDAT","dataitem":[{"type":6,"item":"1/2013/8/4/21/23/4"}]},{"property":"_PVAL","dataitem":[{"type":2,"item":"Dairy"},{"type":2,"item":"Pareve"},{"type":2,"item":"Meat"}]},{"property":"_SKEY","dataitem":[{"type":2,"item":"Dairy or meat"}]},{"property":"_TYPE","dataitem":[{"type":5,"item":"http://semantic-mediawiki.org/swivt/1.0#_txt"}]}],"serializer":"SMW\\\\Serializers\\\\SemanticDataSerializer","version":0.1}}'

Any other ideas? Thank you!
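A quick way to tell the two kinds of 403 apart in code is to look for the wording of the "Access Denied" page quoted above. This is a rough sketch that assumes the proxy keeps using that exact sentence; `blocked_by_proxy` and `PROXY_MARKER` are made-up names, not part of any PythonAnywhere API.

```python
# Sentence copied from the "Access Denied" page quoted in this thread.
# Assumption: the proxy's wording does not change.
PROXY_MARKER = "Access to arbitrary websites is not available from free accounts"


def blocked_by_proxy(status_code, body):
    """Return True if a response looks like the whitelist proxy's denial page."""
    # Collapse whitespace so line breaks in the HTML do not hide the marker.
    normalized = " ".join(body.split())
    return status_code == 403 and PROXY_MARKER in normalized


sample_denial = (
    "<h1>Access Denied</h1>\n<p>\nAccess to arbitrary websites is not "
    "available from free accounts;\nyou can only access sites that are on our "
    "whitelist.\n</p>"
)

print(blocked_by_proxy(403, sample_denial))         # True
print(blocked_by_proxy(403, "Invalid API action"))  # False
```

With requests, the call would be `blocked_by_proxy(req.status_code, req.text)`.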

Ah, hang on! mikomos.com isn't whitelisted, where did you read that it was?

I'm happy to add it to the whitelist if it has an official public API -- just give me a link to the documentation.

I had contacted PythonAnywhere support via the online feedback form to request it be whitelisted and Conrad emailed me back five days ago letting me know that, "We have added www.mikomos.com to the whitelist for you." I have been able to confirm the site is now whitelisted in two ways:

  1. Opening up a Bash console and visiting http://www.mikomos.com/w/api.php with the Lynx browser is successful, whereas visiting non-whitelisted pages is not successful.
  2. When a request is sent without being encoded to bytes, the request is successful. (The data is encoded automatically in requests, but with the urllib method above it can be skipped by omitting binary_data = data.encode('utf8').)

Ah, now I see the problem. There's a difference between mikomos.com and www.mikomos.com. The latter was whitelisted, but not the former. It sounds like the API is using different ones at different times (which isn't ideal).

I've whitelisted the non-www version as well. Let's see if that helps.
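To illustrate why the two names behave differently, here is a small sketch of an exact-match hostname check. It assumes the proxy compares hostnames exactly, as this thread suggests; the `WHITELIST` set and `is_whitelisted` are hypothetical stand-ins, not the real implementation.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist with only the www name, as it was before the fix.
WHITELIST = {"www.mikomos.com"}


def is_whitelisted(url, whitelist=WHITELIST):
    """Return True if the URL's hostname is exactly one of the whitelisted names."""
    host = urlsplit(url).hostname  # lowercased hostname, or None if missing
    return host is not None and host in whitelist


print(is_whitelisted("http://www.mikomos.com/w/api.php"))  # True
print(is_whitelisted("http://mikomos.com/w/api.php"))      # False
```

Under an exact-match rule, each hostname that a script or API redirect might use has to be listed separately.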

Hi Giles,

It works now. Thank you very much for your help looking into and fixing this for me, even though I am a free user. I appreciate your perseverance through all the back-and-forth on the issue. You guys really have top-notch support and customer service.

Thank you,

Avi

No problem, glad to help!