Accessing my files via requests.session()

I've written a script on my local machine to log in to my PythonAnywhere account and fetch my access log files. To do this, I first log in to the site like this:

session_requests = requests.session()
result = session_requests.get(BASE_URL)
token = result.cookies['csrftoken']

payload = {'auth-username': 'pjones',
           'auth-password': '[my pwd]',
           'csrfmiddlewaretoken': token}

result = session_requests.post(BASE_URL, data=payload, headers=dict(referer=BASE_URL))

Here BASE_URL is just https://www.pythonanywhere.com. This works fine in that it comes back with a 200 status code and the HTML for a page that is something other than the login page. But when immediately after I do this:

url = 'https://www.pythonanywhere.com/user/pjones/'
page = session_requests.get(url, headers=dict(referer=url))

print(page.status_code)
print(page.text)

I get another 200 response, but the text content of it is the HTML for the login page rather than my dashboard page as I'd expect. It's almost as if the login has a (very short) lifetime but I can't find anything in the requests documentation indicating that it implements such a thing.

Is there something specific to PythonAnywhere that is making this happen? Thanks!
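A 200 together with login-page HTML is what requests reports after it has quietly followed a redirect, so the final status code alone can't tell "logged in" from "bounced to the login page". Here is a minimal sketch for checking where a response actually ended up, using the `url` and `history` attributes that requests responses carry. The helper names are made up for illustration, and it assumes the site sends unauthenticated requests to a path starting with `/login/`.

```python
from urllib.parse import urlsplit


def landed_on_login(response, login_path="/login/"):
    """Return True if the final URL of `response` is under `login_path`.

    `response` is anything with a `.url` attribute, such as a
    requests.Response. `login_path` is an assumption about the site.
    """
    return urlsplit(response.url).path.startswith(login_path)


def redirect_chain(response):
    """List (status_code, url) for each hop that was followed, then the final one."""
    hops = [(r.status_code, r.url) for r in getattr(response, "history", [])]
    hops.append((response.status_code, response.url))
    return hops
```

Calling `redirect_chain(page)` right after the dashboard request should show whether a 302 came before the final 200.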

Perhaps you are not including all the cookie values in the final request?

Hi, thanks for the reply. True that I was not submitting cookies in my request. But if I do that, say like this:

BASE_URL = 'https://www.pythonanywhere.com/user/pjones'
ACCESS_FILE_PATH = '/files/var/log/www.atweather.org.access.log'

...and then after my session and login requests:

jar = requests.cookies.RequestsCookieJar()

jar.set('csrftoken', token, domain=BASE_URL, path=ACCESS_FILE_PATH)
jar.set('cookie_warning_seen', True, domain=BASE_URL, path=ACCESS_FILE_PATH)
jar.set('sessionid', session_id, domain=BASE_URL, path=ACCESS_FILE_PATH)

url = '{}{}'.format(BASE_URL, ACCESS_FILE_PATH)
r = requests.get(url, headers=dict(referer=url), cookies=jar)

print(r.status_code)
print(r.text)

Then I still end up with a 200 status on the login page. Here token and session_id are what the login process gave me. This should give me HTML that's mostly the text of my access log. I don't think I'm that far from what I'm looking for?
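One thing that may be going wrong here: a cookie's domain is a host name, and its path is matched as a prefix of the request path. Passing the full BASE_URL as the domain and the file path as the path therefore probably produces cookies that never match the request. Below is a small sketch of deriving usable values from a URL. `cookie_scope` is a hypothetical helper, and using `/` as the path assumes the cookies are meant to be site-wide.

```python
from urllib.parse import urlsplit


def cookie_scope(url):
    """Return (domain, path) to pass as jar.set(..., domain=domain, path=path).

    The domain is only the host part of the URL; the path "/" is an
    assumption that the cookie should apply to the whole site.
    """
    host = urlsplit(url).hostname
    return host, "/"


domain, path = cookie_scope("https://www.pythonanywhere.com/user/pjones")
print(domain, path)
```

These values would then go into calls like jar.set('sessionid', session_id, domain=domain, path=path). Note also that a requests session keeps its own cookie jar, so calling session_requests.get(url) instead of requests.get(url, cookies=jar) avoids the manual step altogether.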

What are you getting in the status_code and text?

The status code is just 200, and the text is the HTML for the PythonAnywhere login page. I get 200s all through the session request, the login POST, and this last bit, and yet I never seem to get past the login page.

Stab in the dark, but when you do the second request, page = session_requests.get(url, headers=dict(referer=url)), doesn't that start a new session?

Found this: https://stackoverflow.com/questions/12737740/python-requests-and-persistent-sessions

If you're getting the login page, that suggests that you're not using a valid sessionid.

Is there a reason you're not using the API?

@jamesdavies000 - I think that the line session_requests = requests.session() ensures that whenever I use session_requests downstream, I'm using the same session.

In fact, when I print(result.cookies) after all three of my calls (session establishment, login, and HTML retrieval attempt) I get the same token and session id throughout.

@glenn - yes I could use the files endpoint on the API. I'm just tinkering around with requests and Beautiful Soup, and wanted to see for fun if I could scrape the file, parse it, and insert into a Postgres table on my Raspberry Pi. The API will ultimately work just fine for this purpose.

+1 to using the API :-)

If I'm reading your code correctly, you don't seem to be posting to the login view at any point -- this line from your first post:

result = session_requests.post(BASE_URL, data=payload, headers=dict(referer=BASE_URL))

...is posting to the front page, not to the login view. The login view is at https://www.pythonanywhere.com/login/
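A rough sketch of what posting to the login view could look like, reusing the form field names from the first post. This is untested against the real site. The function names are made up for illustration, and sending a Referer header is an assumption (CSRF checks over HTTPS commonly want one).

```python
LOGIN_URL = "https://www.pythonanywhere.com/login/"  # from the reply above


def build_login_payload(username, password, csrf_token):
    """Form fields as used in the first post of this thread."""
    return {
        "auth-username": username,
        "auth-password": password,
        "csrfmiddlewaretoken": csrf_token,
    }


def login(session, username, password, login_url=LOGIN_URL):
    """GET the login page for a CSRF cookie, then POST credentials to it.

    `session` is expected to be a requests.Session (or anything with the
    same get/post/cookies interface). Returns the POST response.
    """
    session.get(login_url)  # stores the csrftoken cookie on the session
    token = session.cookies.get("csrftoken")
    if token is None:
        raise RuntimeError("no csrftoken cookie after GET " + login_url)
    payload = build_login_payload(username, password, token)
    # Assumption: the server wants a Referer pointing at the same site.
    return session.post(login_url, data=payload, headers={"Referer": login_url})
```

With requests installed this would be called as login(requests.Session(), 'pjones', '[my pwd]'), and later requests would go through that same session object.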

Coming back to this again.

@giles thank you for your response. After looking at the API further, it seems that there's no endpoint for getting files. Am I not looking at the correct thing? If I want my access logs then do I really have to copy and paste them?

So I come back around to my original problem of not being able to programmatically log in to my web dashboard and get them. If I try to POST to the login view as you suggested:

payload = {'auth-username': 'pjones', 'auth-password': '[my pwd]', 'csrfmiddlewaretoken': token}
r2 = session_requests.post('https://www.pythonanywhere.com/login/', data=payload)

where token is the CSRF token retrieved from first doing a GET on the login page, then I receive a 403 status code. But I am for sure using my correct credentials in the payload. If I include the headers argument in the post like I did before then the 403 becomes a 400. Something's way off either way.

Hi there, as a paying user, you can use SFTP.

Hi again. Just wanted to loop around and report that I used Python's awesome paramiko module (see here) to programmatically create a secure SFTP client and pull in files from /var/log, where the access logs are located. I'm putting it out there in case anyone else needs or wants to do this. Thanks for all the help.

Thanks, @pjones!

@pjones can you please help me with how to use the paramiko module to log in to PythonAnywhere and download a file from my folder?

@Hoip -- you'd need a paid account for that, since it requires an SSH connection.
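For anyone who does have SSH access, here is a minimal sketch of fetching one log file over SFTP with paramiko. This is not pjones's actual script. The host name `ssh.pythonanywhere.com`, password authentication, and the helper names are all assumptions, and the log file name follows the pattern shown earlier in the thread.

```python
import posixpath

SSH_HOST = "ssh.pythonanywhere.com"  # assumption; check the SSH host for your account


def access_log_path(domain, log_dir="/var/log"):
    """Remote path of the access log for `domain`, e.g. www.example.com."""
    return posixpath.join(log_dir, domain + ".access.log")


def download_access_log(username, password, domain, local_path, host=SSH_HOST):
    """Copy the access log for `domain` to `local_path` over SFTP."""
    # Imported here so the path helper above works without paramiko installed.
    import paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    # Auto-accepting unknown host keys is convenient for a sketch but weakens security.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(host, username=username, password=password)
        sftp = client.open_sftp()
        try:
            sftp.get(access_log_path(domain), local_path)
        finally:
            sftp.close()
    finally:
        client.close()
```

A call would look something like download_access_log('pjones', '[my pwd]', 'www.atweather.org', 'access.log'), which writes the remote log to access.log in the current directory.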