
url encode in PATH_INFO

System image: fishnchips

Python version: 3.8

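def application(wsgienv, starter):
    # WSGI apps must not set hop-by-hop headers such as Transfer-Encoding;
    # the server decides how to frame the streamed body.
    starter('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    for k, v in wsgienv.items():
        # repr() shows exactly what the server stored in the environ
        yield f'{k}: {v!r}\r\n'.encode('utf8')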

It is a simple app that echoes the WSGI environ it receives.

Make a request:

curl "https://<web app host>/test/%E4%B8%AD%E6%96%87?query=%E6%97%A5%E6%9C%AC%E8%AA%9E"

I get this response:

QUERY_STRING: 'query=%E6%97%A5%E6%9C%AC%E8%AA%9E'
REQUEST_URI: '/test/%E4%B8%AD%E6%96%87?query=%E6%97%A5%E6%9C%AC%E8%AA%9E'
PATH_INFO: '/test/ä¸\xadæ\x96\x87'
...
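The PATH_INFO value above can be reproduced offline. A minimal sketch, assuming the server percent-decodes the path and then, as PEP 3333 requires for environ strings, exposes the raw bytes as a str decoded with latin-1:

```python
from urllib.parse import unquote_to_bytes

# Path part of REQUEST_URI from the response above
encoded_path = '/test/%E4%B8%AD%E6%96%87'

# Step 1: percent-decode to raw bytes (the UTF-8 bytes of '/test/中文')
raw = unquote_to_bytes(encoded_path)

# Step 2: WSGI "bytes as str" convention: one latin-1 character per byte
path_info = raw.decode('latin-1')

print(repr(path_info))  # '/test/ä¸\xadæ\x96\x87', matching the output above
```

Each of the six UTF-8 bytes of 中文 became one latin-1 character, which is why the value looks garbled.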

Here is the question:

Why does 'PATH_INFO' have a different value from the path part of 'REQUEST_URI'?

Is there a way to configure my account so that 'PATH_INFO' shows the percent-encoded path, or should I just use

urllib.parse.urlsplit(wsgienv['REQUEST_URI'])

and forget about 'PATH_INFO'?
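For the REQUEST_URI route, a minimal sketch using only the standard library. It assumes the client percent-encoded the path and query as UTF-8, which is what unquote and parse_qs decode by default:

```python
from urllib.parse import urlsplit, unquote, parse_qs

# Value taken from the response above
request_uri = '/test/%E4%B8%AD%E6%96%87?query=%E6%97%A5%E6%9C%AC%E8%AA%9E'

parts = urlsplit(request_uri)   # no scheme or host, so only path and query are set
path = unquote(parts.path)      # percent-decode, then decode the bytes as UTF-8
params = parse_qs(parts.query)  # dict mapping each name to a list of decoded values

print(path)    # /test/中文
print(params)  # {'query': ['日本語']}
```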

The query is part of the URI but not part of the path. Are you looking for https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode?

Yes, I know the query is not part of the path.

I mean, 'PATH_INFO' is not a percent-encoded string but a string in some other encoding, and it differs from the path I requested with curl.

My question is: to correctly recognize the requested path and query, which should I use? 'REQUEST_URI' split with urllib.parse.urlsplit, or 'PATH_INFO' + 'QUERY_STRING'?
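The PATH_INFO + QUERY_STRING route gives the same result once PATH_INFO is turned back into bytes. A sketch, assuming a PEP 3333 server (environ strings carry raw bytes as latin-1) and a client that encodes URLs as UTF-8; decoded_path_and_query is a hypothetical helper name:

```python
from urllib.parse import parse_qs

def decoded_path_and_query(environ):
    """Hypothetical helper: decode the request path and query from a WSGI environ."""
    # PATH_INFO is already percent-decoded; its str holds the raw bytes as latin-1,
    # so round-trip through latin-1 to get the bytes back, then decode as UTF-8.
    path = environ.get('PATH_INFO', '').encode('latin-1').decode('utf-8')
    # QUERY_STRING is still percent-encoded; parse_qs decodes it (UTF-8 by default).
    query = parse_qs(environ.get('QUERY_STRING', ''))
    return path, query

# Same values as in the response above, written with escapes
environ = {
    'PATH_INFO': '/test/\xe4\xb8\xad\xe6\x96\x87',
    'QUERY_STRING': 'query=%E6%97%A5%E6%9C%AC%E8%AA%9E',
}
print(decoded_path_and_query(environ))  # ('/test/中文', {'query': ['日本語']})
```

If a client sends bytes that are not valid UTF-8, decode('utf-8') raises UnicodeDecodeError; passing errors='replace' is one way to tolerate that.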

I am not sure what you want to get. What is the end goal?

The goal is to access the requested path in the code.

In the code below, the 'path' variable is not the requested path if the path is percent-encoded (as in the example in the first post):

path = wsgienv['PATH_INFO']

  1. Why is wsgienv['PATH_INFO'] wrong?

  2. Why does wsgienv['REQUEST_URI'] have the correct path when 'REQUEST_URI' is not defined in https://wsgi.readthedocs.io/en/latest/definitions.html ?

  3. If 'REQUEST_URI' is only available on PythonAnywhere, is there a way to get the correct path everywhere, not only on PythonAnywhere?

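  1. PATH_INFO is not wrong, it just isn't in UTF-8. The server has already percent-decoded it, and PEP 3333 requires the resulting bytes to be passed as a str decoded with latin-1 (ISO-8859-1), so you are seeing the UTF-8 bytes of the path displayed as latin-1 characters. Encode it back to latin-1 and decode it with the encoding the client used (UTF-8 here); HTTP itself does not declare an encoding for the URL path.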
  2. The WSGI docs that you point to are not an exhaustive list. PEP 3333 lets servers provide other CGI-style variables as well, and REQUEST_URI is a common one.
  3. REQUEST_URI may not be available everywhere, but it is certainly not something that is only available on PythonAnywhere. For fully portable code, rely on SCRIPT_NAME, PATH_INFO and QUERY_STRING.
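On the third point, a percent-encoded path can be rebuilt from the standard keys, along the lines of the URL reconstruction section of PEP 3333 but encoding to latin-1 first so non-ASCII characters are quoted byte for byte. A sketch; rebuild_request_uri is a hypothetical name. The result is re-encoded, so it can differ from the literal request line (for example, an encoded %2F is indistinguishable from '/' after the server decodes it).

```python
from urllib.parse import quote

def rebuild_request_uri(environ):
    """Hypothetical helper: rebuild a percent-encoded path + query from standard WSGI keys."""
    # Turn the latin-1 environ strings back into the raw bytes the server decoded.
    script_name = environ.get('SCRIPT_NAME', '').encode('latin-1')
    path_info = environ.get('PATH_INFO', '').encode('latin-1')
    # quote() percent-encodes the bytes as-is; '/' is kept because safe='/' by default.
    uri = quote(script_name) + quote(path_info)
    query = environ.get('QUERY_STRING', '')  # left percent-encoded by the server
    if query:
        uri += '?' + query
    return uri

environ = {
    'SCRIPT_NAME': '',
    'PATH_INFO': '/test/\xe4\xb8\xad\xe6\x96\x87',
    'QUERY_STRING': 'query=%E6%97%A5%E6%9C%AC%E8%AA%9E',
}
print(rebuild_request_uri(environ))
# /test/%E4%B8%AD%E6%96%87?query=%E6%97%A5%E6%9C%AC%E8%AA%9E
```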