Python: flags=re.DOTALL regex search and replace doesn't work for parsing data

I want to do some simple parsing on two HTML tags:

<title>Love Stars</title>

<meta name="Subject" content="MERCY"/>

should become:

<title>Love Stars</title>

<meta name="Subject" content="Love Stars"/>

My code almost works and doesn't raise any errors, but the replacement is not made. I believe something is wrong in the flags=re.DOTALL part. Can anyone help me?

import requests
import re

english_folder1 = r"c:\test\test\2\1"

extension_file = ".html"

use_parse_folder = True

import os

en1_directory = os.fsencode(english_folder1)

print('Going through english folder')
for file in os.listdir(en1_directory):
    filename = os.fsdecode(file)
    print(filename)
    if filename == 'y_key_e479323ce281e459.html' or filename == 'TS_4fg4_tr78.html':  # ignore these files
        continue
    if filename.endswith(extension_file):
        with open(os.path.join(english_folder1, filename), encoding='utf-8') as html:
            html = html.read()

            try:
                with open(os.path.join(english_folder1, filename), encoding='utf-8') as en_html:
                    en_html = en_html.read()


                try:
                    parse_1 = re.search('<title>.+</title>', html, re.DOTALL)[0]
                    en_html = re.sub('<meta name="Subject" content=".+"', parse_1, html, re.DOTALL)

                except:
                        pass



            except FileNotFoundError:
                continue

        print(f'{filename} parsed')
        if use_parse_folder:
            try:
                with open(os.path.join(english_folder1+r'\parsed', 'parsed_'+filename), 'w', encoding='utf-8') as new_html:
                    new_html.write(en_html)
            except:
                os.mkdir(english_folder1+r'\parsed')
                with open(os.path.join(english_folder1+r'\parsed', 'parsed_'+filename), 'w', encoding='utf-8') as new_html:
                    new_html.write(en_html)
        else:
            with open(os.path.join(english_folder1, 'parsed_'+filename), 'w', encoding='utf-8') as html:
                html.write(en_html)

c:\test\test\2\1 does not look like anything on PythonAnywhere. We can help you with our services, but general questions like this are better asked on more general forums.

c:\test\test\2\1 is just a folder containing an HTML file with these 2 lines:

<title>Love Stars</title>

<meta name="Subject" content="MERCY"/>

You can change it to C:\Test, but that is not where the problem is...

There is no C: on PythonAnywhere
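
Setting the path question aside, a few things in the posted code look like probable causes, though this is a guess from reading it rather than running it on the real files. The fourth positional parameter of re.sub is count, not flags, so re.DOTALL in that position is read as a replacement limit and no flag is applied. The replacement string is also the whole <title>...</title> match rather than just its text, and the bare except: pass would hide any error raised along the way. Below is a minimal sketch of the substitution on its own. The function name copy_title_to_subject and the sample string are made up for illustration, and the patterns assume the tags look exactly like the two lines in the question.

```python
import re


def copy_title_to_subject(html: str) -> str:
    """Copy the text of <title> into the content attribute of the Subject meta tag."""
    # Capture only the text between the tags; flags is passed by keyword
    # so it cannot be mistaken for the count argument.
    title_match = re.search(r"<title>(.+?)</title>", html, flags=re.DOTALL)
    if title_match is None:
        return html  # no title found, leave the page unchanged
    title = title_match.group(1).strip()

    # [^"]* stops at the closing quote, so only the old attribute value is replaced.
    # A function is used as the replacement so that backslashes in the title
    # are inserted literally instead of being read as escape sequences.
    return re.sub(
        r'(<meta name="Subject" content=")[^"]*(")',
        lambda m: m.group(1) + title + m.group(2),
        html,
    )


# Hypothetical sample matching the two lines from the question
sample = '<title>Love Stars</title>\n\n<meta name="Subject" content="MERCY"/>'
print(copy_title_to_subject(sample))
```

In the file loop, the result of this function would be what gets written to the parsed_ file. For HTML that varies more than this (different attribute order, single quotes), an HTML parser would probably be more robust than a regex.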