This code worked for about a week while I was using it to test searching Amazon:
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    # Run Chrome headless
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    browser = webdriver.Chrome(chrome_options=chrome_options)

    try:
        browser.get("https://www.amazon.com")
        print("Page title was '{}'".format(browser.title))
        try:
            # Type the query into the search box and submit it
            browser.find_element_by_id('twotabsearchtextbox').send_keys("Pasta")
            browser.find_element_by_id("nav-search-submit-text").click()
            browser.implicitly_wait(5)
            # The sixth pagination item holds the total number of result pages
            num_page = browser.find_element_by_xpath('//*[@class="a-pagination"]/li[6]')
            browser.implicitly_wait(5)
            url_list = []
            for i in range(int(num_page.text)):
                page_ = i + 1
                # Record the current results page, then move on to the next one
                url_list.append(browser.current_url)
                browser.implicitly_wait(4)
                browser.find_element_by_class_name('a-last').click()
                print("Page " + str(page_) + " grabbed")
        except NoSuchElementException:
            print("ERROR")
            browser.find_element_by_class_name('a-last').click()
    finally:
        # Always close the browser, even if something above failed
        browser.quit()
Then it stopped working. I assume Amazon is blocking the requests, but I'm not sure. It did work for a while, and I guess it's not a total waste, since I learned some XPath and Selenium in the process.
I guess I could use requests and rotate headers, or use Scrapy?
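For what it's worth, this is roughly what I mean by rotating headers with requests. It is only a minimal sketch and I have not tested it against Amazon. The USER_AGENTS list and the build_headers and fetch names are things I made up for illustration, the User-Agent strings are just example values, and I don't know whether rotating them is enough to avoid being blocked.

```python
import random

# Hypothetical pool of User-Agent strings (example values, not a vetted list).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def fetch(url, timeout=10):
    """Fetch a page with a freshly picked set of headers and return its HTML."""
    # Imported here so build_headers() can be used without requests installed.
    import requests

    response = requests.get(url, headers=build_headers(), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text
```

Each call to fetch() would pick a new User-Agent, so something like fetch("https://www.amazon.com") could replace the browser.get() call if the page doesn't need JavaScript.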