
Python 3 / scraping / ascii error when run under "schedule" tab

Hi y'all,

I'm running a little web scraper in Python 3, using BeautifulSoup and csv. Once a day, I hit a website, grab what's there and put it in a .csv.

When I run it under the "files" tab, it runs fine. It places the data I grab in a .csv.

But if I schedule it, it fails with a UnicodeEncodeError:

 csvwriter.writerow(all_documents[i]) 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 18: ordinal not in range(128)

all_documents[i] is a list of str

I'm not clear on how encoding works.

How can I make this work as a scheduled script?

I think it might be a coincidence that the scheduled task hit that error - have you tried both ways of running it at the -same- time? Presumably the web page that you're scraping changes occasionally, and I suspect that sometimes the page content includes foreign / accented / other characters that your code isn't handling well.

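The str 'encode' method takes an 'errors' argument (for example 'ignore' or 'replace') that controls what happens to characters the target codec can't represent.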
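To make that concrete, here's a minimal sketch of the error handlers on str.encode. The sample string is made up; it just contains the same U+2019 character as the traceback.

```python
# Made-up sample text containing U+2019 (right single quotation mark),
# the character named in the traceback.
text = "it\u2019s"

# The default handler is 'strict', which raises UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)

# Alternative handlers trade the exception for some loss of information.
replaced = text.encode("ascii", errors="replace")          # b'it?s'
ignored = text.encode("ascii", errors="ignore")            # b'its'
escaped = text.encode("ascii", errors="backslashreplace")  # b'it\\u2019s'

# Encoding to UTF-8 needs no handler, since it can represent every character.
utf8 = text.encode("utf-8")                                # b'it\xe2\x80\x99s'

print(replaced, ignored, escaped, utf8)
```

Note that 'replace' and 'ignore' silently change the data, so they're more of a last resort than a fix.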

HTH

Jim

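There are some slight differences in the environment for scheduled tasks and for consoles, which could explain why they have different defaults. If you don't pass an encoding, open() falls back to the locale's preferred encoding, and that is picked up from the environment, so a task running with a bare-bones locale can end up with ASCII. (I know how infuriating this is. Python 3 might be better than Python 2, but I still can't, for the life of me, figure out why it doesn't just default to UTF-8 everywhere.)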
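If you want to see the difference for yourself, a quick diagnostic along these lines, run once from a console and once as a scheduled task, should show which defaults each environment gets. This is only a sketch; the values printed depend entirely on the environment.

```python
import locale
import sys

# The encoding open() uses when no encoding argument is given.
default_file_encoding = locale.getpreferredencoding(False)

print("default encoding for open():", default_file_encoding)
print("stdout encoding:", sys.stdout.encoding)
print("filesystem encoding:", sys.getfilesystemencoding())
```

If the scheduled run reports something like ANSI_X3.4-1968 or US-ASCII for the first line, that would account for the error.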

In any case, to fix the problem, you'll want to just be explicit about your encodings and decodings. Use the encoding argument when you're opening your file for the csv.writer:

with open('/path/to/my.csv', 'w', newline='', encoding='utf-8') as f:

https://docs.python.org/3/library/csv.html
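For completeness, here's a small self-contained sketch of the whole write step with an explicit encoding. The rows and the temporary output path are stand-ins I made up, not the original scraper's data; only the open() arguments matter.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the scraped rows (a list of lists of str).
all_documents = [
    ["title", "summary"],
    ["Today\u2019s report", "caf\u00e9 prices rose"],
]

# Hypothetical output path; a temporary directory keeps the sketch self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "my.csv")

# Explicit encoding, so the result no longer depends on the locale
# of whatever environment runs the script.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csvwriter = csv.writer(f)
    for row in all_documents:
        csvwriter.writerow(row)

# Read the file back with the same encoding to check the round trip.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows_read = list(csv.reader(f))

print(rows_read == all_documents)
```

Whatever reads the file later needs to use the same encoding. Some spreadsheet programs only detect UTF-8 reliably when the file starts with a byte order mark, in which case encoding='utf-8-sig' is an option.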