Failing Requests to Fetch Source from a Facebook Page

I have a simple Flask app that extracts the post and profile IDs from the source code of a Facebook page. The user enters the FB post's URL into an input box, and the application makes a request to Facebook to fetch the page source, then processes it with BeautifulSoup. The rest is handled locally with regex.

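import re

import requests
from bs4 import BeautifulSoup


def extract_post_link_from_posts(self, link):
    # Rewrite mobile links (m.facebook.com) to the desktop site
    link = link.replace("m.f", "www.f")
    res = requests.get(link, headers=self.headers)
    soup_data = BeautifulSoup(res.text, 'html.parser')

    # self.link_pattern is defined elsewhere in the class
    match = re.search(self.link_pattern, soup_data.prettify())
    if match:
        link = match.group()

        # Note: str.strip() removes any of the listed characters from both ends,
        # not the literal prefix 'link href='
        link2 = link.strip('link href=').strip('"').split('/')

        # Profile and post segments are taken from indices 3 and 6 of the split URL
        link3 = 'https://www.facebook.com/{}/posts/{}/'.format(link2[3], link2[6])
        result = link3
    # If there is no match, result is never assigned and this line raises UnboundLocalError
    return result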

This is the function. When I run it locally on my PC, it works just fine. However, when I run it on my PythonAnywhere website, the request fails: requests.get only returns the page source of the login page. I do send headers, as you can see. I've tried several things but can't get it to work.
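Separating the parsing from the request makes it easier to tell which half is failing: the parser can be checked offline against saved HTML, and the request can be checked on its own. The sketch below is hypothetical. The `CANONICAL_RE` pattern and the `/<profile>/posts/<post_id>/` URL shape are assumptions (the real `self.link_pattern` isn't shown), and it splits the URL with `urllib.parse.urlsplit` instead of `str.strip`. It returns None when nothing matches.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: assumes the page source contains a URL of the form
# https://www.facebook.com/<profile>/posts/<post_id>/ inside an href attribute.
CANONICAL_RE = re.compile(r'href="(https://www\.facebook\.com/[^"]+/posts/[^"]+)"')


def extract_ids(html):
    """Return (profile, post_id) found in the page source, or None if absent."""
    match = CANONICAL_RE.search(html)
    if match is None:
        return None  # e.g. Facebook served a login page instead of the post
    # Split the URL path into segments rather than stripping characters off the tag
    segments = [s for s in urlsplit(match.group(1)).path.split("/") if s]
    if len(segments) < 3 or segments[1] != "posts":
        return None
    return segments[0], segments[2]


# Offline check against a hand-written snippet:
sample = '<link rel="canonical" href="https://www.facebook.com/somepage/posts/12345/" />'
print(extract_ids(sample))  # ('somepage', '12345')
```

In the Flask app this would be called as `extract_ids(res.text)` after the request, so a None result points at the fetch rather than the regex.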

What do you see in your logs?

I see nothing in the logs because no error is raised. I have try-except blocks, and the function returns "local variable 'result' referenced before assignment". There is nothing in the error log either. I tried writing each response to a JSON file, and every response turns out to be the HTML of the login page. The request fails even when the post is public.
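Since the failure shows up as a login page rather than an exception, one option is to detect it explicitly and report it instead of falling through to the regex. This is a minimal heuristic sketch: `looks_like_login_page` and `LOGIN_MARKERS` are hypothetical names, and the marker strings are guesses about Facebook's login markup, not documented values. With requests, `res.url` is the final URL after redirects, so it can be passed in directly.

```python
from urllib.parse import urlsplit

# Hypothetical markers: assumptions about what a login page contains.
# Facebook can change its markup at any time.
LOGIN_MARKERS = ('id="login_form"', 'name="login"')


def looks_like_login_page(final_url, html):
    """Heuristically decide whether a fetched page is a login wall."""
    # A redirect to a /login/ path is the clearest signal
    if "login" in urlsplit(final_url).path.lower():
        return True
    # Otherwise look for login-form markup in the body
    return any(marker in html for marker in LOGIN_MARKERS)


# Example with requests (not executed here):
#   if looks_like_login_page(res.url, res.text):
#       return None  # or report "Facebook served a login page"
```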

Have you tried using the Facebook API, for example https://facebook-sdk.readthedocs.io/en/latest/api.html, rather than scraping the page?

I tried a workaround with it; however, the API needs the post_id to fetch the information, and the whole aim of my program is to find the post and profile IDs. So it doesn't work for me.

There is also a permission problem with the API: you cannot get a user's information without their permission.

What error message do you see when the request fails?

The exception block returns "local variable 'result' referenced before assignment". I tried writing the HTML source of the response to a file, and what I saw was the HTML source of the login page.
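Saving only the body hides some useful context. A hypothetical helper such as `response_snapshot` below also records the status code, the final URL and the redirect chain, which shows whether Facebook redirected the request to the login page or served the login page directly. The field names and the 2000-character cutoff are arbitrary choices, not anything requests requires.

```python
import json


def response_snapshot(status_code, final_url, redirect_urls, body, max_body=2000):
    """Return a JSON string summarising a response for later inspection."""
    record = {
        "status_code": status_code,
        "final_url": final_url,
        # URLs of any redirects that were followed before the final response
        "redirects": list(redirect_urls),
        "body_length": len(body),
        "body_start": body[:max_body],
    }
    return json.dumps(record, indent=2)


# Example with requests (not executed here):
#   with open("response.json", "w", encoding="utf-8") as f:
#       f.write(response_snapshot(res.status_code, res.url,
#                                 [r.url for r in res.history], res.text))
```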

You're returning result, which is only assigned inside your if statement, so it will fail whenever there is no match.
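The failure mode described here can be reproduced with a toy regex search. `find_number_buggy` and `find_number` are hypothetical examples, not the poster's code. The second one returns an explicit None on the no-match path, which is what an else branch (or a default assigned before the if) achieves.

```python
import re


def find_number_buggy(text):
    match = re.search(r"\d+", text)
    if match:
        result = match.group()
    # If there was no match, result was never assigned: UnboundLocalError here
    return result


def find_number(text):
    match = re.search(r"\d+", text)
    if match is None:
        return None  # explicit "not found" value instead of an unbound name
    return match.group()
```

Calling `find_number_buggy("no digits")` raises UnboundLocalError, which is the "referenced before assignment" message the try-except block caught.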

I know; there is an "else" branch for this situation. The problem is that when I run exactly the same code on my PC (locally), it works and result contains the requested information. The deployed app just cannot fetch the page, because Facebook refuses the request, and that is what I am trying to solve. I tried the mechanize library, proxies, custom headers, etc. Nothing seems to work.

It looks like this is not something we can help with, since it is working the way Facebook intends it to.

Well, thanks anyway. I assume it's something to do with Facebook's policy; they think I'm a bot. Still, thank you so much ^^

Let us know if you find something.