Forums

Program works locally, doesn't work on PythonAnywhere

I have this Python script that checks a webpage for live updates and posts a message to a group when there is one. I have it running as an always-on task on my paid account.

When I run it locally it works fine, I get a message with the update, like so:

Arutz Sheva update: Tuesday: 9:00 p.m. The death toll from the Hamas massacre on Saturday has surpassed 1,000.

However, when I run it as an always-on task on PythonAnywhere, I just get an empty message, like so:

Arutz Sheva update:

I noticed that in the log I am having this error, which I do not have when running the program locally:

Oct 10 20:38:49 /home/Kovy/.local/lib/python3.10/site-packages/urllib3/connectionpool.py:842: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Oct 10 20:38:49 warnings.warn((

My main question is: why do I get normal messages when running the script locally, but an empty message when running it on PythonAnywhere? And as an aside, does it have anything to do with this warning?
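The warning itself just means the urllib3 pool was created without HTTPS certificate verification; it is separate from the empty-message problem, but it is easy to silence properly. A minimal sketch of turning verification on, assuming urllib3 is installed and the system CA bundle is available:

```python
import urllib3

# Create a pool manager that verifies HTTPS certificates instead of
# skipping verification (skipping is what triggers InsecureRequestWarning).
http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED")

# The rest of the request code stays the same, e.g.:
# response = http.request("GET", "https://www.israelnationalnews.com/news/378017")
```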

Code:

from bs4 import BeautifulSoup
from groupy.client import Client
import time
import urllib3

updates = []

def get_wepage():
    http = urllib3.PoolManager()

    url = 'https://www.israelnationalnews.com/news/378017'  # Replace with the URL of the web page you want to scrape
    response = http.request('GET', url)

    # Step 2: Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.data, 'html.parser')

    # Step 3: Find all <p> tags and extract their text
    paragraph = soup.find_all('p')

    # Step 4: Extract the text from each <p> tag
    text_content = '\n'.join([p.get_text() for p in paragraph])

    return text_content

def get_latest_update(text):
    # Day names as they appear at the start of an update line, with or
    # without trailing punctuation.
    days = [d + s for d in ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'] for s in (':', ',', '')]
    live_update = False
    update = []
    lines = text.splitlines()
    old_updates = []

    for line in lines:
        if line == 'Live Updates:':
            live_update = True
        if live_update:
            words = line.split()
            if not words:
                continue  # skip blank lines; words[0] would raise IndexError
            if words[0] in days and update != []:
                live_update = False
                break
            elif line != 'Live Updates:':
                update.append(line)

    update = '\n'.join(update)
    return update

def check_if_new(update):
    # Context managers make sure the file handles are closed.
    with open('updates.txt', 'r') as file:
        updates_file = file.read().splitlines()

    if update in updates_file:
        print(f"\n\nLatest update: {update}\nUpdate is not new.")
    else:
        print(f"\n\nLatest update: {update}\nUpdate is new!")

        with open('updates.txt', 'a') as file:
            file.write(f"{update}\n")

        client = Client.from_token(token)  # token is defined elsewhere
        chat = client.groups.get(id)       # id is the GroupMe group ID

        post = chat.post(text=f"\nArutz Sheva update:\n{update}")




while True:
    text = get_wepage()
    update = get_latest_update(text)
    check_if_new(update)
    time.sleep(60)
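One way to diagnose the empty message would be to log the raw HTTP status and the first part of the body before parsing; Cloudflare block pages typically come back as 403 or 503 with "cloudflare" somewhere in the HTML. A sketch, using a hypothetical helper named `looks_blocked`:

```python
def looks_blocked(status, body):
    """Heuristic (hypothetical helper): Cloudflare challenge/block pages
    usually return 403 or 503 and mention 'cloudflare' in the body."""
    return status in (403, 503) and b"cloudflare" in body.lower()

# Inside get_wepage(), after the request, one could log what came back:
# response = http.request('GET', url)
# print(response.status, response.data[:200])
# if looks_blocked(response.status, response.data):
#     print("Request appears to be blocked by Cloudflare")
```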

Update: Turns out it's because of Cloudflare... How do I avoid this?

How did you establish that it's a Cloudflare issue? Are you getting any errors (warnings are usually not errors)? Also -- I see you upgraded your account yesterday morning. Did you try your code in a freshly created console after the upgrade?

I'm running it as a PythonAnywhere task. I know it's a Cloudflare issue because the response I was getting (I worked out some logging) said that my IP had been blocked. But the same code running on my computer worked perfectly.

Making a new task didn't help; the new IP just got banned too.

Right now I'm using Selenium, but it eats up CPU seconds.

If you're doing stuff that gets the IP banned, that suggests that you should not be doing that. If they have set up CloudFlare to prevent the scraping of their site, we are not going to help you to circumvent it.
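For what it's worth, a polite first step before scraping any site is to check its robots.txt. Python's standard library can parse one; the rules below are made up purely for illustration (a real check would call rp.set_url(...) and rp.read() against the actual site):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Hypothetical robots.txt content, fed in directly for illustration.
rp.parse("User-agent: *\nDisallow: /news/".splitlines())

print(rp.can_fetch("MyBot", "https://example.com/news/378017"))  # → False
print(rp.can_fetch("MyBot", "https://example.com/about"))        # → True
```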