
Cannot Figure Out Why Selenium + Scraper + Pandas Script Does Not Run as a Task

Dear All,

I am trying to run the following code:

import time

import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--ignore-certificate-errors")

def scroll_page(driver, scroll_pause_time=1.5, max_scrolls=10):
    # Scroll to the bottom repeatedly so lazily loaded rows get rendered
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # implicitly_wait() only sets the element-lookup timeout and does not pause,
        # so sleep explicitly to give new content time to load
        time.sleep(scroll_pause_time)

driver = webdriver.Chrome(options=chrome_options)
url = 'https://bettingtips1x2.com/'
driver.get(url)

# Scroll through the page to load all content
scroll_page(driver)

# Parse the page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Find the table with class "results"
table = soup.find('table', class_='results')

# Find all the rows (tr) inside the table and extract data
table_data = []
for tr in table.find_all('tr'):
    row_data = [td.text for td in tr.find_all('td')]
    table_data.append(row_data)

# Quit the browser: closes Chrome and stops the chromedriver service
driver.quit()

# Create a DataFrame from the scraped data
df = pd.DataFrame(table_data)

# Drop rows with missing values
df = df.dropna()

# Split column 3 ("Home - Away") on the " - " separator
# and column 7 (the "H:A" score) on the colon
split_columns = df[3].str.split(' - ', expand=True)
split_score = df[7].str.split(':', expand=True)

# Assign the split columns to the desired column names
df['Home Team'] = split_columns[0]      # part before " - "
df['Away Team'] = split_columns[1]      # part after " - "
df['Home Score'] = split_score[0]       # goals before ":"
df['Away Score'] = split_score[1]       # goals after ":"

# Stamp every row with today's date as YYYY-MM-DD
df['Date'] = datetime.now().strftime("%Y-%m-%d")
df["Home Score"] = df["Home Score"].astype(int)
df["Away Score"] = df["Away Score"].astype(int)

# Create the "Predicted Result" column based on the logic
def get_predicted_result(row):
    if row["Home Score"] > row["Away Score"]:
        return row["Home Team"]
    elif row["Away Score"] > row["Home Score"]:
        return row["Away Team"]
    else:
        return "Draw"

df["Predicted Result"] = df.apply(get_predicted_result, axis=1)

# Drop the unwanted columns
df = df.drop(columns=[3, 7, 'Home Score', 'Away Score'])

# Reorder the columns as 'Date', 'Home Team', 'Away Team', 'Predicted Result'
df = df[['Date', 'Home Team', 'Away Team', 'Predicted Result']]

bet1x2_results = df
print("Success")
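`scroll_page()` always performs `max_scrolls` scrolls, whether or not new rows appear. A common variant stops as soon as `document.body.scrollHeight` stops growing. This is a minimal sketch that assumes only Selenium's `driver.execute_script()`, which returns whatever the JavaScript snippet `return`s; the name `scroll_until_stable` and the injectable `sleep` parameter are hypothetical.

```python
import time


def scroll_until_stable(driver, pause=1.5, max_scrolls=10, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    `driver` is anything with Selenium's execute_script(); `sleep` is
    injectable (hypothetical parameter) so the loop can be tried without waiting.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        sleep(pause)  # give lazily loaded rows time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return scrolls
```

With a real driver it would be called as `scroll_until_stable(driver)` in place of `scroll_page(driver)`; `max_scrolls` still caps the loop for pages that keep loading forever.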

It works fine in my local notebook, but not here when I run it as a task. Please help.
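Since the failure only shows up on the server, it can help to keep the browser-free processing separate so it can be checked on its own, without starting Chrome. Below is a minimal plain-Python sketch of the per-row logic, assuming (as the script does) that column 3 holds "Home - Away" and column 7 holds an "H:A" score; `predicted_result` is a hypothetical name.

```python
def predicted_result(event, score):
    """Return (home, away, prediction) from 'Home - Away' and 'H:A' strings."""
    # Split the fixture on the first " - " and trim stray whitespace
    home, away = (part.strip() for part in event.split(" - ", 1))
    # Split the score on the first ":" and convert both sides to integers
    home_goals, away_goals = (int(goals) for goals in score.split(":", 1))
    if home_goals > away_goals:
        winner = home
    elif away_goals > home_goals:
        winner = away
    else:
        winner = "Draw"
    return home, away, winner
```

This mirrors `get_predicted_result()` row by row; a malformed cell raises `ValueError` here instead of surfacing later inside `df.apply`.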

What happens when you run it?


driver = webdriver.Chrome(options=chrome_options)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
    super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 275, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 365, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 430, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created
from tab crashed
  (Session info: headless chrome=90.0.4430.212)
Stacktrace:
#0 0x557d0a7b4e89 <unknown>


It sounds like Chrome itself has crashed. If you look at the process table at the bottom of the "Consoles" page, do you have a lot of processes running?
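A script that exits or crashes without quitting the driver can leave Chrome and chromedriver processes behind, and for a task that runs repeatedly those can add up. A minimal sketch of always releasing the browser, assuming Selenium's `driver.quit()`; the helper name `managed_driver` and the `make_driver` factory argument are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield the driver returned by make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the scraping code raised an exception

# Hypothetical usage with the options from the original script:
# with managed_driver(lambda: webdriver.Chrome(options=chrome_options)) as driver:
#     driver.get(url)
#     html = driver.page_source
```

Copying `page_source` out inside the `with` block means the BeautifulSoup and pandas steps can run after Chrome has already been shut down.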

Hi @Giles, it is about this:

WARNING:root:Can not find chromedriver for currently installed chrome version.
WARNING:selenium.webdriver.common.selenium_manager:The chromedriver version (90.0.4430.24) detected in PATH at /usr/local/bin/chromedriver might not be compatible with the detected chrome version (115.0.5790.170); currently, chromedriver 115.0.5790.170 is recommended for chrome 115.*, so it is advised to delete the driver in PATH and retry
Traceback (most recent call last):
  File "/home/EscadeSupremo/mysite/whoscored.py", line 11, in <module>
    driver = webdriver.Chrome()
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 56, in __init__
    super().__init__(
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 206, in __init__
    self.start_session(capabilities)
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 290, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 345, in execute
    self.error_handler.check_response(response)
  File "/home/EscadeSupremo/.local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /home/EscadeSupremo/.cache/selenium/chrome/linux64/115.0.5790.170/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x561148de9e89 <unknown>

Any idea how to resolve this?
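The first warning carries the key detail: the chromedriver on PATH is 90.0.4430.24, while the Chrome that Selenium Manager fetched into ~/.cache/selenium is 115.0.5790.170, and the warning itself recommends a chromedriver whose major version matches Chrome's. Below is a small sketch of that comparison using the two version strings from the log; the helper names are hypothetical.

```python
def major(version):
    """Return the major component of a dotted version string, e.g. '90.0.4430.24' -> 90."""
    return int(version.strip().split(".", 1)[0])


def driver_matches_browser(driver_version, browser_version):
    """True when chromedriver and Chrome share the same major version."""
    return major(driver_version) == major(browser_version)


# Versions copied from the warning above
print(driver_matches_browser("90.0.4430.24", "115.0.5790.170"))  # False
```

When a Chrome session does start, the same two strings should be available from `driver.capabilities["browserVersion"]` and `driver.capabilities["chrome"]["chromedriverVersion"]`. The latter carries a trailing build hash, which `major()` ignores.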


When you installed Selenium, did you include the --user flag? See https://help.pythonanywhere.com/pages/selenium/ for details.

No. How do I revert this?

pip uninstall selenium

pip install --user selenium
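After reinstalling, it is worth confirming which copy of Selenium Python actually imports. The first traceback loads it from /usr/local/lib/python3.10/site-packages and the second from /home/EscadeSupremo/.local/lib/python3.10/site-packages. Below is a stdlib-only sketch (the helper name `locate` is hypothetical); run it with the same Python version the task uses.

```python
import importlib.util
from importlib import metadata


def locate(package):
    """Return (version, file) for an importable package, or None if it is missing."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # importable, but no installed distribution metadata
    return version, spec.origin


print(locate("selenium"))
```

If the printed path still points somewhere other than the installation you expect, the task is picking up a different Selenium than the one you just installed.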