
Routing request with Shift-JIS encoded query string in Flask

I'm trying to create a server that's compatible with an old Japanese game, by handling the hardcoded endpoints the client sends with a Flask application. Some of the requests send a Shift-JIS encoded message within the query string, like www.example.com/post?text=こんにちは. Flask returns a 500 status code when it receives a string encoded in this way. If the query string only contains ASCII the request succeeds.

From a packet analysis in Wireshark, the query string contains a sequence of bytes encoded in Shift-JIS. In the access log the Shift-JIS encoded query shows up as a sequence of backslash-encoded characters, like "?text=\x8E\xD7\x82\xC8 ... HTTP/1.1".

How can I route Shift-JIS encoded queries in Flask?
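
As a quick sanity check on the bytes shown in the access log excerpt above, the first four escaped bytes can be decoded in isolation. This is only a minimal standard-library sketch; the variable names are illustrative and do not come from the original server code.

```python
# First four bytes of the query value, copied from the access log excerpt.
raw = b"\x8e\xd7\x82\xc8"

# Decoding as Shift-JIS succeeds and yields two characters (two bytes each).
text = raw.decode("shift_jis")
print(len(text), ascii(text))

# The same bytes are not valid UTF-8, so a UTF-8 decode raises an error.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("valid UTF-8:", utf8_ok)
```

Printing with ascii() keeps the output safe on consoles or logs that cannot display Japanese text.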

Hmmm. I don't know for sure, but it may be that Flask is built on the assumption, at a very low level, that everything is Unicode: docs here.

Where is the 500 error coming from?

To reproduce the error, I basically have something like

@app.route("/chat")
def add_comment():
    comment = request.args.get('comment')
    print(comment)
    return "OK"

This gives the following error:

print(comment)
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

Can you share more of the traceback? I want to know where the error is coming from...

Sure, here it is:

2016-02-01 21:33:38,844 :Exception on /cgi-bin/wtalk/wtalk2.cgi [GET]
Traceback (most recent call last):
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ruin0x11/elona_server/elona.py", line 108, in add_chat
    print(comment)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

try

print(request.query_string.decode('shift-jis'))

http://werkzeug.pocoo.org/docs/0.11/wrappers/#werkzeug.wrappers.BaseRequest

or maybe even just

print(comment.decode('shift-jis'))

It still gives me this:

  File "/home/ruin0x11/elona_server/elona.py", line 108, in add_chat
    print(request.query_string.decode('shift-jis'))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

I wonder why it's decided latin-1 should be the default encoding? Try forcing it to utf8:

comment.decode('shift-jis').encode('utf8')

or

request.query_string.decode('shift-jis').encode('utf8')

I got the same error for both statements.

OK so the problem is the latin-1 encoding I guess. Do you have any idea where that's coming from? Our system default encoding should be UTF-8 or plain ASCII. latin-1 is totally unexpected...

One more thing -- rather than just printing to stdout (which might not go anywhere) try printing to stderr, which will definitely go to the error log:

print(comment.decode('shift-jis').encode('utf8'), file=sys.stderr)
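
On the question of where latin-1 might come from, here is one possibility, offered as an assumption rather than a confirmed diagnosis. Under PEP 3333, a Python 3 WSGI server hands values such as QUERY_STRING to the application as str objects whose code points stand in for the raw bytes (effectively a latin-1 decode), and a framework can recover the bytes by encoding with latin-1 again. If the value was decoded some other way before that step, the latin-1 encode can fail with this kind of message. The sketch below simulates both cases with plain strings; no Flask or server is involved, and all names are made up for illustration.

```python
# Sample Shift-JIS bytes standing in for the client's query value.
raw = "こんにちは".encode("shift_jis")

# Case 1: the PEP 3333 convention. Each byte becomes one code point <= U+00FF,
# so encoding with latin-1 recovers the original bytes exactly.
environ_value = raw.decode("latin-1")
recovered = environ_value.encode("latin-1")

# Case 2 (assumed misbehaviour): the value already holds real Japanese
# characters, for example because something decoded it with another codec.
already_decoded = raw.decode("shift_jis")
try:
    already_decoded.encode("latin-1")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(recovered == raw)
print(message)
```

If this is what is happening, the exception would be raised while the raw query string is being fetched, before any decode call in the view gets a chance to run.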

This is from the error log:

2016-02-03 00:36:49,525 :Exception on /cgi-bin/wtalk/wtalk2.cgi [GET]
Traceback (most recent call last):
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ruin0x11/elona_server/elona.py", line 108, in add_chat
    print(comment.decode('shift-jis').encode('utf-8'), file=sys.stderr)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

It doesn't look like anything else was printed.

Also, this is the exact request being made. The escaped characters are the actual bytes being sent.

76.10.47.75 - - [03/Feb/2016:00:36:49 +0000] "GET /cgi-bin/wtalk/wtalk2.cgi?mode=regist&comment=chat\x8E\xD7\x82\xC8\x82\xE9\x94\x92\x8C\xD5Len\x81u\x82\xA0\x82\x93\x82\x84ZXC\x81v HTTP/1.1" 500 291 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)" "76.10.47.75"

latin-1 again! Any ideas why it's defaulting to latin-1? That's not the default we specify...

I tried setting up a minimal flask app of my own and sending it shift-jis encoded text, and it was able to handle it just fine.

In the meantime, rather than trying to print the actual strings, you can print their representation, if the prints are just for debugging:

print(repr(comment.decode('shift-jis').encode('utf-8')), file=sys.stderr)
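
A small standalone sketch of what repr gives in Python 3, for context: repr() of a bytes object contains only ASCII characters, while repr() of a str keeps printable non-ASCII characters as they are, and the built-in ascii() escapes them. The sample text here is illustrative only.

```python
text = "こんにちは"

# repr() of bytes is ASCII-only: non-ASCII bytes appear as \x.. escapes.
bytes_repr = repr(text.encode("utf-8"))

# repr() of str keeps the Japanese characters; ascii() escapes them as \u....
str_repr = repr(text)
ascii_repr = ascii(text)

print(bytes_repr)
print(ascii_repr)
```

So for debugging output that must survive any log encoding, ascii() is the safer choice when the value is a str.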

I got the same error using repr.

I'm not sure why it would try to decode the string as latin-1 either. This is the only part of the script that tries to deal with encodings, and the entire server consists of the single python script.

Here is a link to the repository of this code: elona_server. The endpoint I was trying to fix was the one that routes to "/cgi-bin/wtalk/wtalk2.cgi".

Thank you for your patience.

I don't have a solution for you, but I do note that you're using a Windows client. Latin-1 is heavily used by Windows, so it may be that your client is reporting that it's sending everything in latin-1 and that is what's confusing Flask when it comes to try to interpret the input.

Perhaps you can investigate sending an Accept-Charset header from your app to ask the client to send you utf-8 or something like that.

I got the same error using repr.

Really? Now that is truly strange... repr should only return ascii characters. Can you show me the traceback for that?

The client is using Windows, but unfortunately the requests are made from a binary to which there is no source.

This is the traceback:

2016-02-04 00:29:56,820 :Exception on /cgi-bin/wtalk/wtalk2.cgi [GET]
Traceback (most recent call last):
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ruin0x11/elona_server/elona.py", line 108, in add_chat
    print(repr(comment.decode('shift-jis').encode('utf-8'), file=sys.stderr))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

Ah, I think that explains it. The brackets for the repr are in the wrong place:

print(repr(comment.decode('shift-jis').encode('utf-8')), file=sys.stderr)

I'm sure we'll figure it out, one way or another. Encodings are tricky, but once we can figure out where to get the raw bytes from, we'll be able to decode them correctly, and use unicode from there on...

With the above:

2016-02-08 05:33:01,842 :Exception on /cgi-bin/wtalk/wtalk2.cgi [GET]
Traceback (most recent call last):
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ruin0x11/.virtualenvs/eternal_league/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ruin0x11/elona_server/elona.py", line 108, in add_chat
    print(repr(comment.decode('shift-jis').encode('utf-8')), file=sys.stderr)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-9: ordinal not in range(256)

I'll try and see if I can reproduce the error by running a local instance.

I strongly suspect that your client is the problem here. It seems to be sending the bytes of the shift-jis string but saying that they are encoded as latin-1 and that is confusing Flask when it comes to extracting the strings from the request. It's possible that you can get the raw bytes from Flask and mangle the encoding from there.
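
Once the raw query bytes are available, decoding them by hand could look roughly like the sketch below. It uses only the standard library and does not touch Flask. The helper name parse_query_bytes is hypothetical, and how to obtain the raw bytes in this particular deployment is not established in this thread; the sample input is built locally for illustration.

```python
from urllib.parse import unquote_to_bytes


def parse_query_bytes(raw, encoding="shift_jis"):
    """Split a raw query string (bytes) and decode each part with `encoding`.

    Hypothetical helper: returns a dict mapping each key to a list of values.
    """
    params = {}
    for pair in raw.split(b"&"):
        if not pair:
            continue
        key, _, value = pair.partition(b"=")
        # Undo form encoding ('+' for space, %XX escapes) at the byte level,
        # then decode; undecodable bytes are replaced rather than raising.
        key = unquote_to_bytes(key.replace(b"+", b" ")).decode(encoding, "replace")
        value = unquote_to_bytes(value.replace(b"+", b" ")).decode(encoding, "replace")
        params.setdefault(key, []).append(value)
    return params


# Illustrative input: ASCII parameters plus an unescaped Shift-JIS value.
sample = b"mode=regist&comment=chat" + "こんにちは".encode("shift_jis")
parsed = parse_query_bytes(sample)
print(ascii(parsed))
```

Working on bytes until the very last step avoids relying on whatever codec the framework or server applied to the query string.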

I know we're in danger of having too many chefs here, but this encoding error doesn't make any sense. The only way that could be happening, that I can think of, is if you failed to restart your webapp. But let's try to do some step-by-step debugging. Add something like this:

print('stderr encoding', sys.stderr.encoding, file=sys.stderr)
print('sys default encoding', sys.getdefaultencoding(), file=sys.stderr)

print('type of raw querystring', type(request.query_string), file=sys.stderr)

comment = request.args.get('comment')
print('type of comment', type(comment), file=sys.stderr)

print('repr of querystring', repr(request.query_string), file=sys.stderr)

print('repr of comment', repr(comment), file=sys.stderr)

print('attempting to decode querystring', file=sys.stderr)

print('repr of decoded querystring', repr(request.query_string.decode('shift-jis')), file=sys.stderr)