
string splicing not working correctly

Please ignore the title; I did more digging and edited the post below:

I have a very long string (51616 chars) that is passed to my app (using web.py) from an HTML POST form with enctype="multipart/form-data". When I run my code locally with Python 2.7.11, I get a result that works perfectly. When I run it in my app on PythonAnywhere, the result is perfect except for characters 16261-16293. Everything before and after that range is right on.

Interestingly, Chrome makes the same mistakes as Firefox, but slightly out of alignment.

Chrome, then Firefox, then local (correct):

...LCJzb3VuZHNk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcz9kUmFp...
...LCJzb3VIXk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcjdGVkUmFp...
...LCJzb3VuZHNFbmFibGVkIjpmYWxzZSwiY29sbGVjdGVkUmFp...
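To pin down a damaged range like characters 16261-16293, you can compare the received string against the known-good one from both ends. The sketch below uses a hypothetical helper, `diff_span`, which is not from this thread. It assumes both full strings are available and that only one contiguous region differs. Here it runs on the three 48-character fragments quoted above.

```python
def diff_span(expected, received):
    """Return (start, expected_end, received_end) bounding the region where
    two strings differ, or None if they are identical.

    The differing region is expected[start:expected_end] versus
    received[start:received_end]; everything outside it matches.
    """
    if expected == received:
        return None
    shortest = min(len(expected), len(received))

    # Length of the common prefix.
    start = 0
    while start < shortest and expected[start] == received[start]:
        start += 1

    # Length of the common suffix, kept from overlapping the prefix.
    end = 0
    while end < shortest - start and expected[-1 - end] == received[-1 - end]:
        end += 1

    return start, len(expected) - end, len(received) - end


local = "LCJzb3VuZHNFbmFibGVkIjpmYWxzZSwiY29sbGVjdGVkUmFp"
chrome = "LCJzb3VuZHNk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcz9kUmFp"
firefox = "LCJzb3VIXk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcjdGVkUmFp"

for name, received in (("chrome", chrome), ("firefox", firefox)):
    start, exp_end, rec_end = diff_span(local, received)
    print(name, start, local[start:exp_end], received[start:rec_end])
```

Because the helper only trims the common prefix and suffix, a string with several separate damaged regions is reported as one span covering all of them.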

The next part of my program calls base64.b64decode(string). If I do that on the incorrectly spliced string, I get Unicode continuation/start byte errors. Here's an example of decoding the incorrectly spliced string: http://pastebin.com/1WdDbNfT. If you Ctrl+F for 'purchaseti^' you'll see what's happening.
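Those errors are consistent with the corruption itself, not with base64.b64decode. The damaged characters still decode as base64, but they produce bytes that are not valid UTF-8. The sketch below is written for Python 3 and decodes the three 48-character fragments from above on their own. It assumes the decoded payload is meant to be UTF-8 text, which seems likely because the correct fragment decodes to readable JSON-like text. `try_decode` is a hypothetical helper name.

```python
import base64

# Fragments quoted in the post; each is 48 characters, i.e. 12 whole
# base64 groups, so they can be decoded independently.
FRAGMENTS = {
    "chrome": "LCJzb3VuZHNk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcz9kUmFp",
    "firefox": "LCJzb3VIXk2kb3RFp5JtAeUx2tHXjjBpgm4UHmcjdGVkUmFp",
    "local": "LCJzb3VuZHNFbmFibGVkIjpmYWxzZSwiY29sbGVjdGVkUmFp",
}


def try_decode(fragment):
    """Base64-decode a fragment, then try to read the bytes as UTF-8.

    Returns (text, None) on success or (None, error_message) on failure.
    """
    raw = base64.b64decode(fragment)
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


for name, fragment in FRAGMENTS.items():
    text, error = try_decode(fragment)
    print(name, repr(text) if error is None else error)
```

The local fragment decodes to `,"soundsEnabled":false,"collectedRai`. The Chrome and Firefox fragments fail in the UTF-8 step, not in the base64 step.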

My guess is that the string exceeds some limit (string length, byte size, etc.) and the web server is somehow splicing two parts of it together before passing it to the program. However, it doesn't look like a Unicode problem, and I've experimented with .encode() and .decode() with no luck. Also, if I post only part of this 51k-character string, say 500 characters, it works just fine. Any help with this would be greatly appreciated. Let me know if you need more info.

Edit: relevant? http://stackoverflow.com/questions/2276759/php-whats-the-total-length-of-a-post-global-variable

Are you still having troubles?

You mentioned you are using web.py. Is that still supported? I can imagine it would have unfixed bugs if not...

Yes, it's still not working. I don't know if it's still actively supported; the most recent change on GitHub was 2 years ago. However, everything works just fine on PythonAnywhere unless I paste in a long string.

If I hard-code the full string into the program itself and run it, my PA site works perfectly. But if I tell it to use the submitted string and paste it into the POST form (which is what I need it to do), that small portion of the string arrives in the program differently from how it was entered in the browser.

More info: I created a ~57,000-character string of 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' repeated, sent it to my program, and had it return exactly what I put in. I copied and pasted the output into a new local program like this:

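testresult = '(long string here)'
# Collapse every intact copy of the alphabet to 'x' so the damaged spots stand out
fixed = testresult.replace('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'x')
with open('temp.txt', 'w') as f:  # closes the file on exit (f.close without () never ran)
    f.write(fixed)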

I got this result:

...xxxxxABCDEFGHIJKLMNOPQRSTIJKLMNOPQRSTUVWXYZxABCDEFGHIJKLMNOPQRSTGHIJKLMNOPQRSTUVWXYZxxxxxx...

So it dropped 14 characters (UVWXYZ plus ABCDEFGH), was fine for the next 64, then dropped 12 more (UVWXYZ plus ABCDEF), and was fine after that.
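Counting dropped characters by eye is error-prone, so you can check the echoed string mechanically instead. This is a minimal sketch with two assumptions: characters were only dropped, never inserted or altered, and each gap is shorter than 26 characters, because a gap can only be measured modulo the alphabet length. `find_gaps` is a hypothetical name. The sample is an excerpt rebuilt from the result above by expanding each x back into a full alphabet.

```python
import string

ALPHABET = string.ascii_uppercase  # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def find_gaps(received, pattern=ALPHABET):
    """Return (position, dropped) pairs for each index where `received`
    stops following the repeating `pattern`.

    Assumes characters were only dropped, so `dropped` is only known modulo
    len(pattern). Raises ValueError if a character outside `pattern` appears.
    """
    gaps = []
    expected = pattern.index(received[0])  # pattern index of the next expected char
    for pos, ch in enumerate(received):
        if ch != pattern[expected]:
            actual = pattern.index(ch)
            gaps.append((pos, (actual - expected) % len(pattern)))
            expected = actual  # resynchronise on the character actually received
        expected = (expected + 1) % len(pattern)
    return gaps


# Excerpt of the echoed result, with each 'x' expanded back into a full alphabet.
sample = (
    ALPHABET * 5
    + "ABCDEFGHIJKLMNOPQRST" + "IJKLMNOPQRSTUVWXYZ"
    + ALPHABET
    + "ABCDEFGHIJKLMNOPQRST" + "GHIJKLMNOPQRSTUVWXYZ"
    + ALPHABET * 6
)

for position, dropped in find_gaps(sample):
    print("gap at index %d: %d characters missing" % (position, dropped))
```

Positions are indices into this excerpt, not into the original 57,000-character string. Run it on the full echoed string to get absolute positions.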

I think there's some disagreement between your browser's encoding of the form data and web.py's decoding of it. I wrote the same test in a Flask app and it works as expected, even after bumping the input up to 5000 copies of the alphabet. I suspect that debugging this may involve digging deep into how your browser is sending the request and how web.py is interpreting it.
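One way to take the browser out of the picture is to send the same payload from a script, so you know the exact request body. The sketch below builds a multipart/form-data body by hand using only the Python 3 standard library. The helper names, the field name `data`, and the endpoint URL are hypothetical placeholders. The sketch also assumes the app echoes the field back unchanged, as in the alphabet test above. If the script's round trip comes back intact but the browser's does not, the browser side is the likely culprit. If both come back damaged, look at web.py or the server instead.

```python
import urllib.request
import uuid


def build_multipart(fields, boundary=None):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex  # random, so it won't occur in the data
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


def echo_roundtrip(url, payload, field="data"):
    """POST `payload` to a hypothetical echo endpoint at `url` and report
    whether the response body matches it exactly."""
    body, content_type = build_multipart({field: payload})
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        echoed = response.read().decode("utf-8")
    return echoed == payload
```

For example, calling `echo_roundtrip(your_app_url, "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 2200)` would repeat the ~57,000-character alphabet test with no browser involved.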