
Selenium and Chrome Headless Browser

This test code worked for about a week while I was using it to search Amazon:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
browser = webdriver.Chrome(options=chrome_options)
# The implicit wait is a session-wide setting, so it only needs to be set once.
browser.implicitly_wait(5)

try:
    browser.get("https://www.amazon.com")
    print("Page title was '{}'".format(browser.title))

    try:
        # Type the query into the search box and submit it.
        browser.find_element(By.ID, "twotabsearchtextbox").send_keys("Pasta")
        browser.find_element(By.ID, "nav-search-submit-text").click()
        # The sixth pagination item is read as the total number of pages.
        num_page = browser.find_element(By.XPATH, '//*[@class="a-pagination"]/li[6]')

        url_list = []

        for i in range(int(num_page.text)):
            page_ = i + 1
            url_list.append(browser.current_url)
            # Move on to the next page of results.
            browser.find_element(By.CLASS_NAME, "a-last").click()
            print("Page " + str(page_) + " grabbed")

    except NoSuchElementException:
        print("ERROR")
        browser.find_element(By.CLASS_NAME, "a-last").click()

finally:
    # Always close the browser, even if something above failed.
    browser.quit()

Then it stopped working. I assume Amazon is blocking the requests, but I'm not sure. I guess it's not a total waste, since I learned some XPath and Selenium in the process.

I guess I could use requests and rotate headers, or use Scrapy?

It's possible that they are, yes; they tend not to like being scraped. You could check by calling browser.get_screenshot_as_file(filename) at the point where it fails, then looking at the screenshot to see what page the browser is actually showing and whether there's anything interesting on it.
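
As a rough sketch of that check, something like the helper below could be called from the except block before the browser quits. The function name save_debug_snapshot and its directory and label parameters are made up for illustration. get_screenshot_as_file and page_source are the standard WebDriver members. Saving the HTML alongside the screenshot is an extra I've added, on the assumption that the raw page may be easier to search for a block or CAPTCHA message than the image.

```python
import os


def save_debug_snapshot(browser, directory, label):
    """Save a screenshot and the page HTML so a failure can be inspected later.

    `browser` is any Selenium WebDriver instance. `directory` and `label`
    are hypothetical parameters chosen for this sketch.
    Returns the paths of the two files.
    """
    os.makedirs(directory, exist_ok=True)
    png_path = os.path.join(directory, label + ".png")
    html_path = os.path.join(directory, label + ".html")

    # get_screenshot_as_file returns False if the image could not be written.
    saved = browser.get_screenshot_as_file(png_path)
    if not saved:
        print("Could not write screenshot to " + png_path)

    # Keep the HTML too, so the page text can be searched afterwards.
    with open(html_path, "w", encoding="utf-8") as handle:
        handle.write(browser.page_source)

    return png_path, html_path
```

For example, calling save_debug_snapshot(browser, "debug", "search_failure") inside the NoSuchElementException handler would leave debug/search_failure.png and debug/search_failure.html to look at after the run.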