starting a browser in a function and using it elsewhere... : Forums : PythonAnywhere

What I am trying to do is create a function that sets up and starts my headless browser with a new proxy, then call that function and go to a website. Sometimes I get a connection error, and while I have figured out how to retry, the connection error will not go away unless I restart the browser. That is why I was looking to create a startbrowser() function to call in the exception case, which restarts the browser with a new proxy and tries again to pull the URL.

See code below:

#!/usr/bin/env python2.7
import datetime
from selenium import webdriver
#import sys
from pyvirtualdisplay import Display
#import csv
from selenium.common.exceptions import NoSuchElementException
import time
import requests

def startbrowser():
    browser = None
    display = None
    display = Display(visible=0, size=(800, 600))

    ### using proxicity to get the proxy values ###

    for attempt in range(1,20):
        try:
            API = "value"
            url = "https://www.proxicity.io/api/v1/{}/proxy?isAnonymous=true&country=US".format(API)
            url_proxicity = "https://www.proxicity.io"
            p = requests.get(url_proxicity)

            if p.status_code == 200:
                r = requests.get(url).json()
                proxy_ip = r["ip"]
                proxy_port = r["port"]
                print(proxy_ip)

                profile = webdriver.FirefoxProfile()
                profile.set_preference("network.proxy.type", 1)
                profile.set_preference("network.proxy.http", proxy_ip)
                profile.set_preference("network.proxy.http_port", proxy_port)
                profile.update_preferences()
                display.start()
                browser = webdriver.Firefox(firefox_profile=profile)
                break
            else:
                display.start()
                browser = webdriver.Firefox()
                print("browser is set")
                break
        except Exception as e:
            print(e," Attempt:" + str(attempt))
            time.sleep(10)
            pass
        ### Proxicity Complete ###
    return display
    return browser

startbrowser()


try:
    for i in range(1,3):

        url = "http://www.google.com"

        for attempt in range(1,5):
            try:
                browser.get(url)
                print("browser success",i)
                break
            except Exception as e:
                print(e," Attempt:" + str(attempt))
                time.sleep(8)
                startbrowser()
                pass

finally:
    if browser is not None:
        browser.quit()
    if display is not None:
        display.stop()

deleted-user-1632074 | 7 posts | Sept. 16, 2016, 10:10 p.m.

There are a couple of problems with that code. I'd actually be very surprised if it worked exactly as you posted it. The browser and display variables you define in the startbrowser function are local to that function, so the code outside it cannot see them. You do return display at the end of the function, but that means the return browser line is skipped (return jumps right out of the function, so the following line is ignored), and you're not storing the result of your call to startbrowser anyway. It also looks like you're not quitting the browser and stopping the display in cases where, for some reason, you're not able to get the URL in your main loop.
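
To illustrate the point about return, here is a minimal sketch. The strings stand in for the real display and browser objects, and the function names broken and fixed are made up for this example:

```python
def broken():
    display = "display"
    browser = "browser"
    return display
    return browser  # unreachable: the first return already left the function


def fixed():
    display = "display"
    browser = "browser"
    # A single return hands back both values, packed into a tuple.
    return display, browser


# The caller has to store the result, otherwise it is thrown away.
only_display = broken()
display, browser = fixed()
print(only_display)
print(display, browser)
```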

The following code fixes those problems:

#!/usr/bin/env python2.7
import datetime
from selenium import webdriver
#import sys
from pyvirtualdisplay import Display
#import csv
from selenium.common.exceptions import NoSuchElementException
import time
import requests

def startbrowser():
    browser = None
    display = Display(visible=0, size=(800, 600))

    ### using proxicity to get the proxy values ###

    for attempt in range(1,20):
        try:
            API = "value"
            url = "https://www.proxicity.io/api/v1/{}/proxy?isAnonymous=true&country=US".format(API)
            url_proxicity = "https://www.proxicity.io"
            p = requests.get(url_proxicity)

            if p.status_code == 200:
                r = requests.get(url).json()
                proxy_ip = r["ip"]
                proxy_port = r["port"]
                print(proxy_ip)

                profile = webdriver.FirefoxProfile()
                profile.set_preference("network.proxy.type", 1)
                profile.set_preference("network.proxy.http", proxy_ip)
                profile.set_preference("network.proxy.http_port", proxy_port)
                profile.update_preferences()

                display.start()
                browser = webdriver.Firefox(firefox_profile=profile)

                return display, browser
            else:
                display.start()
                browser = webdriver.Firefox()
                print("browser is set")
                return display, browser

        except Exception as e:
            print(e," Attempt:" + str(attempt))
            time.sleep(10)
            pass
        ### Proxicity Complete ###

    return None, None



for i in range(1,3):

    url = "http://www.google.com"
    display, browser = startbrowser()

    if display is not None and browser is not None:
        try:
            for attempt in range(1,5):
                try:
                    browser.get(url)
                    print("browser success",i)
                    break
                except Exception as e:
                    print(e," Attempt:" + str(attempt))
                    time.sleep(8)
        finally:
            if browser is not None:
                browser.quit()
            if display is not None:
                display.stop()
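
If the goal is specifically to restart the browser (and so pick up a new proxy) after a failed request, the retry loop can own the restart. The following is only a sketch under assumptions: start is any callable that returns a (display, browser) pair the way startbrowser() does above, and stop_session and get_with_restart are hypothetical helper names, not part of Selenium or pyvirtualdisplay.

```python
import time


def stop_session(display, browser):
    """Quit the browser and stop the display, skipping whichever is None."""
    if browser is not None:
        browser.quit()
    if display is not None:
        display.stop()


def get_with_restart(start, url, attempts=4, delay=8):
    """Load url, starting a fresh session for every attempt.

    start is a callable returning (display, browser).
    Returns the working (display, browser) pair, or (None, None)
    if every attempt failed. The caller must stop the returned session.
    """
    for attempt in range(1, attempts + 1):
        display, browser = start()
        if browser is None:
            # start() could not bring up a browser; clean up whatever exists.
            stop_session(display, browser)
        else:
            try:
                browser.get(url)
                return display, browser
            except Exception as e:
                print("{} Attempt: {}".format(e, attempt))
                # Throw the failed session away so the next attempt starts clean.
                stop_session(display, browser)
        if attempt < attempts:
            time.sleep(delay)
    return None, None
```

With this, the main loop would call display, browser = get_with_restart(startbrowser, url), use the browser inside a try block, and call stop_session(display, browser) in the finally clause.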

giles | 11788 posts | PythonAnywhere staff | Sept. 17, 2016, 1:04 p.m.

Thank you, giles. I was looking around elsewhere and found the idea of writing startbrowser() as a class rather than just a function. Do you think that would make sense too?

#!/usr/bin/env python2.7
import datetime
from selenium import webdriver
#import sys
from pyvirtualdisplay import Display
#import csv
from selenium.common.exceptions import NoSuchElementException
import time
import requests
#import socket


class BrowserStart():
    """docstring for ClassName"""
    def __init__(self):
        self.browser = None
        self.display = None
        self.display = Display(visible=0, size=(800, 600))

        ### Proxicity Call###

        for attempt in range(1,20):
            try:
                API = "value"
                url = "https://www.proxicity.io/api/v1/{}/proxy?isAnonymous=true&country=US".format(API)
                url_proxicity = "https://www.proxicity.io"
                p = requests.get(url_proxicity)

                if p.status_code == 200:
                    r = requests.get(url).json()
                    proxy_ip = r["ip"]
                    proxy_port = r["port"]
                    print(proxy_ip)

                    profile = webdriver.FirefoxProfile()
                    profile.set_preference("network.proxy.type", 1)
                    profile.set_preference("network.proxy.http", proxy_ip)
                    profile.set_preference("network.proxy.http_port", proxy_port)
                    profile.update_preferences()
                    self.display.start()
                    self.browser = webdriver.Firefox(firefox_profile=profile)
                    break
                else:
                    self.display.start()
                    self.browser = webdriver.Firefox()
                    print("browser is set")
                    break
            except Exception as e:
                print(e," Attempt:" + str(attempt))
                time.sleep(10)
                pass
            ### Proxicity Complete ###



def maincode():
    driver = BrowserStart()


    ### start code ###
    try:
        for i in range(7500,7514):

            url = "https://www.pythonanywhere/forums/topic/={}".format(i)

            for attempt in range(1,20):
                try:
                    driver.browser.get(url)
                    print("browser success",i)

deleted-user-1632074 | 7 posts | Sept. 18, 2016, 7:44 p.m.

It doesn't make sense to put your scraping code inside of __init__. I would suggest just running your code and seeing if it errors.
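
One way to act on that: keep __init__ cheap and move the slow, failure-prone startup into its own method, so it can be called again after a connection error. A minimal sketch, assuming a startbrowser()-style callable that returns (display, browser) as in the function-based version earlier in the thread. BrowserSession and its methods are hypothetical names, not from any library:

```python
class BrowserSession(object):
    """Holds a display and a browser; startup lives in start(), not __init__."""

    def __init__(self, start):
        # start is any callable returning (display, browser).
        # Nothing is launched here, so creating the object cannot fail slowly.
        self._start = start
        self.display = None
        self.browser = None

    def start(self):
        self.display, self.browser = self._start()
        return self

    def stop(self):
        if self.browser is not None:
            self.browser.quit()
        if self.display is not None:
            self.display.stop()
        self.display = None
        self.browser = None

    def restart(self):
        # Tear down the old session, then bring up a fresh one.
        self.stop()
        return self.start()

    def __enter__(self):
        return self.start()

    def __exit__(self, exc_type, exc_value, traceback):
        # Always clean up; returning False lets any exception propagate.
        self.stop()
        return False
```

Used as "with BrowserSession(startbrowser) as session:", the browser and display are cleaned up even if the scraping code raises, and session.restart() can be called from the except branch of a retry loop.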

conrad | 4232 posts | PythonAnywhere staff | Sept. 19, 2016, 12:36 p.m.