Failing Requests to Fetch Source from a Facebook Page : Forums : PythonAnywhere

I have a simple Flask app that extracts the post ID and profile ID from the source code of a Facebook page. The user enters the Facebook post's URL into an input box, the application makes a request to Facebook to fetch the source, and then processes it with BeautifulSoup. The rest is handled locally with regex.

def extract_post_link_from_posts(self, link):
    link = link.replace("m.f", "www.f")
    res = requests.get(link, headers=self.headers)
    soup_data = BeautifulSoup(res.text, 'html.parser')

    match = re.search(self.link_pattern, soup_data.prettify())
    if match:
        link = match.group()

        link2 = link.strip('link href=').strip('"').split('/')

        link3 = 'https://www.facebook.com/{}/posts/{}/'.format(link2[3], link2[6])
        result = link3
    return result

This is the function. When I run it locally on my PC it works just fine. However, when I run it on my PythonAnywhere website, the request fails: requests.get only returns the page source of the login page. I send headers, as you can see. I have tried several things but can't get it to work.

ulassahillioglu | 7 posts | July 9, 2023, 8:50 p.m. | permalink

What do you see in your logs?

nkahr | 219 posts | PythonAnywhere staff | July 10, 2023, 2:39 p.m. | permalink

I see nothing in the logs, because no error is raised there. I have try-except blocks, and the function returns "local variable 'result' referenced before assignment". There is nothing in the error log. I tried saving each response to a JSON file, and I can see that the response is the HTML of the login page. The request does not get the post even if the post is public.
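
For reference, the "response is the login page" case can be detected explicitly instead of showing up later as a failed regex match. The following is only a minimal sketch: the helper name looks_like_login_page is hypothetical, and both heuristics (a final URL path starting with /login, or a password field in the HTML) are assumptions about what a login redirect looks like, not documented Facebook behaviour.

```python
from urllib.parse import urlparse


def looks_like_login_page(final_url, html):
    """Heuristically decide whether a response is a login page.

    final_url: the URL after redirects (with requests, this is res.url).
    html: the response body (with requests, this is res.text).
    """
    # Assumption: a login redirect ends up on a path such as /login/ or /login.php.
    path = urlparse(final_url).path.lower()
    if path.startswith("/login"):
        return True
    # Assumption: a login form contains a password input field.
    return 'type="password"' in html.lower()


# Typical use inside the scraping function (not executed here):
#     res = requests.get(link, headers=headers)
#     if looks_like_login_page(res.url, res.text):
#         ...  # report "blocked / login required" instead of parsing
```

Checking res.url as well as the body helps because requests follows redirects by default, so the status code alone may still be 200.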

ulassahillioglu | 7 posts | July 14, 2023, 6:36 p.m. | permalink

Have you tried using the Facebook API, such as https://facebook-sdk.readthedocs.io/en/latest/api.html, rather than scraping the page?

sboyd | 279 posts | PythonAnywhere staff | July 15, 2023, 10:34 a.m. | permalink

I tried a workaround with it; however, the API needs the post_id to fetch the information. The aim of my program is to find the post and profile IDs, so it doesn't work for me.

ulassahillioglu | 7 posts | July 15, 2023, 4:02 p.m. | permalink

There is also a permission problem with the API: you cannot get a user's information without their permission.

ulassahillioglu | 7 posts | July 15, 2023, 5:05 p.m. | permalink

What error message do you see when the request fails?

nkahr | 219 posts | PythonAnywhere staff | July 16, 2023, 8:14 a.m. | permalink

The exception block returns "local variable 'result' referenced before assignment". I tried writing the HTML source of the response to a file, and what I saw was the HTML source of the login page.

ulassahillioglu | 7 posts | July 16, 2023, 1:55 p.m. | permalink

You're returning result, which is only defined inside your if statement, so it will fail if match doesn't exist.
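
A minimal sketch of one way to avoid that error: return None explicitly when nothing matches, so the caller can tell "no link found" apart from a crash. The name extract_post_link and the regex LINK_PATTERN are hypothetical stand-ins, since the real self.link_pattern is not shown in this thread.

```python
import re

# Hypothetical pattern for illustration only; the original self.link_pattern
# is not shown, so this is an assumption about the URL shape.
LINK_PATTERN = re.compile(r'https://www\.facebook\.com/([^/"]+)/posts/([^/"?]+)')


def extract_post_link(html):
    """Return the canonical post URL found in html, or None if there is no match."""
    match = LINK_PATTERN.search(html)
    if match is None:
        # No match (for example, the response was a login page):
        # return a defined value instead of an unassigned local.
        return None
    profile, post_id = match.groups()
    return "https://www.facebook.com/{}/posts/{}/".format(profile, post_id)


sample = '<link href="https://www.facebook.com/someuser/posts/12345/" rel="canonical">'
print(extract_post_link(sample))
print(extract_post_link("<html>login form</html>"))
```

Using capture groups also avoids relying on str.strip('link href='), which removes any of those characters from both ends rather than the literal prefix.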

nkahr | 219 posts | PythonAnywhere staff | July 17, 2023, 4:10 p.m. | permalink

I know that; there is an "else" statement for this kind of situation. The problem is that when I do the exact same thing locally on my PC, it works and result contains the requested information. The deployed app just cannot fetch the response because Facebook refuses the request, and that is what I am trying to solve. I tried the mechanize library, proxies, headers, etc. Nothing seems to work.

ulassahillioglu | 7 posts | July 17, 2023, 5:38 p.m. | permalink

It looks like this is not something we can help with, since it is working the way Facebook intended.

fjl | 4348 posts | PythonAnywhere staff | July 18, 2023, 9:24 a.m. | permalink

Well, thanks anyway. I assume it's something to do with Facebook's policy. They think I'm a bot. Still, thank you so much ^^

ulassahillioglu | 7 posts | July 18, 2023, 10:01 a.m. | permalink

Let us know if you find something.

fjl | 4348 posts | PythonAnywhere staff | July 18, 2023, 10:18 a.m. | permalink