PythonAnywhere editor only recognises ASCII or UTF 8-encoded text?

I have a piece of Python code that extracts some data from a website. That information contains some characters that are used in Swedish, like 'Å', 'Ä' and 'Ö'. I use pickle to dump that information to a text file, and the information in that file is used later for something else, so I have two related problems. I will explain both so that you can understand it better:

1) When I try to edit the text file through PythonAnywhere I get the error message:

PythonAnywhere editor only recognises ASCII or UTF 8-encoded text

2) When I use the information from the text file later in the Python code I get this error message:

KeyError: u'Div 3 Mellersta G\xcb\x86taland, herrar'

It's that extra "u" at the start that messes things up. How could I possibly fix this?

Thank You

Hmm. OK, the problem here is all to do with character encodings, which is a kind of complicated topic. Just to make sure I don't waste your time by explaining stuff that you already know, do you already know all about UTF-8, character encodings, and the use of the Python encode and decode functions?

I have never used them, but I can guess what they do. This is the part of the code where I get the error:

@app.route('/addEvent/', methods=['POST'])
def addEvent():

    data = request.get_json()
    dict = pickle.load( open( "gameFile.txt", "rb" ) )

    selectedTeam = data["selectedTeam"]
    eventTypeAndName = data["eventType"]

    eventType, name = eventTypeAndName.split("-")   # split the eventTypeAndName string to get each part separately

    homeScore = dict[data['section']][int(data['gameID'])]['homeScore']

and the last line is where I get the error:

KeyError: u'Div 3 Mellersta G\xcb\x86taland, herrar'

and the dict does contain that key. What would a fix for this look like? I have many lines that look like that.

Here's a good overview of Unicode encoding.

Thanks for that link, Conrad :-) @Timocin, I really recommend you read that article, then read my extra information below...

The problem is that the data you're getting from your request are Python unicode strings -- that is, each character isn't a byte, it's some kind of clever internal representation of a character. Python's error message is showing you a UTF-8 representation of those characters.

The data you have in your pickled dictionary is using some different kind of encoding scheme. The strings that you're using as keys are sequences of bytes, which represent the characters in some manner.
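To make the mismatch concrete, here is a minimal sketch of a text key failing to match a byte-string key in a dictionary. The section name comes from your error message with the "ö" written out, and UTF-8 is only an assumed encoding for the stored key; your pickle may well use a different one. It is written so that it also runs on Python 3.

```python
# Sketch only: the UTF-8 encoding of the stored key is an assumption.
section = u"Div 3 Mellersta G\u00f6taland, herrar"  # text, like the value from request.get_json()
stored_key = section.encode("utf-8")                 # bytes, like a key that was saved in the pickle

games = {stored_key: {"homeScore": 0}}

found_with_text = section in games        # the text key does not match the byte key
found_with_bytes = stored_key in games    # the identical byte sequence does match
print(found_with_text, found_with_bytes)
```

The two keys look the same when displayed, but only the lookup with the byte sequence finds the entry, which is the situation behind the KeyError.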

What you need to do is find out what encoding was used for the data in your pickled dictionary, and then convert the key to that encoding, like this: data['section'].encode('representation-name'). Unfortunately I can't say what encoding your dictionary uses; that's something you'll have to work out from the original source of the data. You might have some luck with the chardet Python module, which tries to guess for you -- it's installed on PythonAnywhere for Python 2.7, and you can install it for other versions by running (for example) pip3.3 install --user chardet in a bash console.
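If you can't find out the encoding straight away, one rough way to narrow it down is to try a few likely candidates for the key. The sketch below does that with a hypothetical helper, lookup_section, and a guessed list of encodings; neither is from your code. Note that this is plain trial and error, not what chardet does, and the sample dictionary is made up for the demonstration.

```python
def lookup_section(games, section, encodings=("utf-8", "latin-1", "cp1252")):
    """Return the entry for `section`, trying the text key and then encoded variants.

    `encodings` is a guessed list of candidates, not a known fact about the data.
    """
    if section in games:
        return games[section]
    for enc in encodings:
        try:
            key = section.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if key in games:
            return games[key]
    raise KeyError(section)


# Made-up data: keys stored as Latin-1 byte strings, game 7 with a home score of 2.
name = u"Div 3 Mellersta G\u00f6taland, herrar"
stored = {name.encode("latin-1"): {7: {"homeScore": 2}}}

home_score = lookup_section(stored, name)[7]["homeScore"]
print(home_score)
```

Once you know which encoding matches, it is probably cleaner to call .encode() with that one encoding directly, or to convert the dictionary keys to unicode once when loading the pickle, rather than guessing on every request.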

Thanks for sharing, it's working.

:)