Forums

Extremely slow INSERT query

I'm adding data to a database on PythonAnywhere. When I add the data it takes approximately 50 seconds to add 275 rows, data in json is 750kb so total likely a bit less than that.

When I use the same script to add the identical data to a database on my local machine it takes 0.2 seconds. Playing with batch sized between commits reduced the total time from 75 seconds to 50 seconds, but that's the best I've got. It's for a web scraper, when the full thing is running it could have a few hundred to a few thousand rows to add/remove, so it's not remotely tenable to leave it like this. With performance this poor I must be doing something wrong, but I can't see what it is. The timer is wrapped around the insert query, it doesn't include the ssh connection time etc.

What's going on here?

The table columns are as follows, so far now indexes being used:

"listingID int PRIMARY KEY AUTO_INCREMENT, types VARCHAR(11), town VARCHAR(255), postcode CHAR(5), price INT UNSIGNED, agent VARCHAR(50), ref VARCHAR(30), bedrooms SMALLINT UNSIGNED, rooms SMALLINT UNSIGNED, plot MEDIUMINT UNSIGNED, size MEDIUMINT UNSIGNED, link_url VARCHAR(1024), description VARCHAR(14000), photos TEXT, photos_hosted TEXT, gps POINT, id VARCHAR(80), types_original VARCHAR(30)"

The functions adding the data are below. As I said it completes in 0.2 seconds locally so I don't think I'm doing anything too bad there, even if it's not perfect. listings is a list of dictionaries. The weird use of dictionaries for the inserts is because I was using named placeholders with mysql.connector, but I had to switch to MySQLdb due to another problem with accessing the database, and it doesn't permit named placeholders.

def insert_data_to_table(cursor, table_name, columns_list, values_dict, gps_string):
# Creates a string of csv %s placeholders for the values to be inserted
placeholders = ", ".join(f"%s" for _ in values_dict.keys())

# Creates a string of columns to be inserted
columns = ", ".join(columns_list)

# If GPS data is provided, insert specific string into query. Else build query without GPS sub string.
if gps_string:
    insert_query = f"INSERT INTO {table_name} (gps, {columns}) VALUES ({gps_string}, {placeholders})"
else:
    insert_query = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"

cursor.execute(insert_query, tuple(values_dict.values()))
# db.commit()


def add_listings(cursor, listings):
    print("Adding data...")
    counter = 0
    for listing in listings:
    # Committing every 50 rows rather than after each row reduces time by approx 33%, minimal differences 
     between 20, 50, 100 etc.
        if counter > 50:
            print("Commit", counter)
            db.commit()
            counter = 0
        columns_list = []

        values_dict = {key: None for key in listing if key != "gps"}

        if listing.get("gps") is None:
            gps_string = None
        elif isinstance(listing.get("gps"), list):
            gps_string = f"ST_GeomFromText('POINT({round(listing['gps'][0], 6)} {round(listing['gps'][1], 6)})', 4326)"

        for key in values_dict:
            if isinstance(listing.get(key), list):
                values_dict[key] = ":;:".join([str(x) for x in listing[key]])
            else:
                values_dict[key] = listing.get(key)
            columns_list.append(key)

        insert_data_to_table(cursor, table_name, columns_list, values_dict, gps_string)
        counter += 1
    db.commit()

Your database is hosted on a shared infrastructure and the server your database is on is quite busy recently. Is the issue constant or do you experience it sporadically?

The issue is constant, I've run it at several different points during the day including weekends. Each time it completes in 49.5s - 51.5s.

I've added logging for each row and the the time for each one is pretty consistent, they don't come through in bursts. Always between 4 and 6 rows added per second.

Below is the logs for adding the data, showing each row added, each time a commit is performed, and the time taken for the whole transfer. The timer starts right after "We're in!" and before "Adding data...", after the SSH tunnel is established and connection is made to the database.

Thanks for looking into this.

DEBUG 2023-08-26 13:48:16,710 - Secsh channel 0 opened.
We're in!
Adding data...
INFO 2023-08-26 13:48:17,373 - --------------- Row added
INFO 2023-08-26 13:48:17,597 - --------------- Row added
INFO 2023-08-26 13:48:17,866 - --------------- Row added
INFO 2023-08-26 13:48:17,980 - --------------- Row added
INFO 2023-08-26 13:48:18,098 - --------------- Row added
INFO 2023-08-26 13:48:18,215 - --------------- Row added
INFO 2023-08-26 13:48:18,480 - --------------- Row added
INFO 2023-08-26 13:48:18,593 - --------------- Row added
INFO 2023-08-26 13:48:18,861 - --------------- Row added
INFO 2023-08-26 13:48:19,128 - --------------- Row added
INFO 2023-08-26 13:48:19,246 - --------------- Row added
INFO 2023-08-26 13:48:19,520 - --------------- Row added
INFO 2023-08-26 13:48:19,788 - --------------- Row added
INFO 2023-08-26 13:48:20,057 - --------------- Row added
INFO 2023-08-26 13:48:20,160 - --------------- Row added
INFO 2023-08-26 13:48:20,429 - --------------- Row added
INFO 2023-08-26 13:48:20,662 - --------------- Row added
INFO 2023-08-26 13:48:20,931 - --------------- Row added
INFO 2023-08-26 13:48:21,191 - --------------- Row added
INFO 2023-08-26 13:48:21,464 - --------------- Row added
INFO 2023-08-26 13:48:21,588 - --------------- Row added
INFO 2023-08-26 13:48:21,851 - --------------- Row added
INFO 2023-08-26 13:48:22,113 - --------------- Row added
INFO 2023-08-26 13:48:22,390 - --------------- Row added
INFO 2023-08-26 13:48:22,644 - --------------- Row added
INFO 2023-08-26 13:48:22,762 - --------------- Row added
INFO 2023-08-26 13:48:22,991 - --------------- Row added
INFO 2023-08-26 13:48:23,222 - --------------- Row added
INFO 2023-08-26 13:48:23,490 - --------------- Row added
INFO 2023-08-26 13:48:23,768 - --------------- Row added
INFO 2023-08-26 13:48:24,041 - --------------- Row added
INFO 2023-08-26 13:48:24,302 - --------------- Row added
INFO 2023-08-26 13:48:24,582 - --------------- Row added
INFO 2023-08-26 13:48:24,695 - --------------- Row added
INFO 2023-08-26 13:48:24,970 - --------------- Row added
INFO 2023-08-26 13:48:25,086 - --------------- Row added
INFO 2023-08-26 13:48:25,196 - --------------- Row added
INFO 2023-08-26 13:48:25,312 - --------------- Row added
INFO 2023-08-26 13:48:25,428 - --------------- Row added
INFO 2023-08-26 13:48:25,696 - --------------- Row added
INFO 2023-08-26 13:48:25,969 - --------------- Row added
INFO 2023-08-26 13:48:26,083 - --------------- Row added
INFO 2023-08-26 13:48:26,348 - --------------- Row added
INFO 2023-08-26 13:48:26,611 - --------------- Row added
INFO 2023-08-26 13:48:26,888 - --------------- Row added
INFO 2023-08-26 13:48:26,995 - --------------- Row added
INFO 2023-08-26 13:48:27,273 - --------------- Row added
INFO 2023-08-26 13:48:27,657 - --------------- Row added
INFO 2023-08-26 13:48:27,925 - --------------- Row added
INFO 2023-08-26 13:48:28,144 - --------------- Row added
INFO 2023-08-26 13:48:28,424 - --------------- Row added
INFO 2023-08-26 13:48:28,424 - Commit
INFO 2023-08-26 13:48:28,659 - --------------- Row added
INFO 2023-08-26 13:48:28,926 - --------------- Row added
INFO 2023-08-26 13:48:29,189 - --------------- Row added
INFO 2023-08-26 13:48:29,311 - --------------- Row added
INFO 2023-08-26 13:48:29,582 - --------------- Row added
INFO 2023-08-26 13:48:29,835 - --------------- Row added
INFO 2023-08-26 13:48:30,075 - --------------- Row added
INFO 2023-08-26 13:48:30,194 - --------------- Row added
INFO 2023-08-26 13:48:30,419 - --------------- Row added
INFO 2023-08-26 13:48:30,689 - --------------- Row added
INFO 2023-08-26 13:48:30,903 - --------------- Row added
INFO 2023-08-26 13:48:31,027 - --------------- Row added
INFO 2023-08-26 13:48:31,145 - --------------- Row added
INFO 2023-08-26 13:48:31,260 - --------------- Row added
INFO 2023-08-26 13:48:31,529 - --------------- Row added
INFO 2023-08-26 13:48:31,756 - --------------- Row added
INFO 2023-08-26 13:48:32,034 - --------------- Row added
INFO 2023-08-26 13:48:32,143 - --------------- Row added
INFO 2023-08-26 13:48:32,369 - --------------- Row added
INFO 2023-08-26 13:48:32,602 - --------------- Row added
INFO 2023-08-26 13:48:32,866 - --------------- Row added
INFO 2023-08-26 13:48:33,103 - --------------- Row added
INFO 2023-08-26 13:48:33,209 - --------------- Row added
INFO 2023-08-26 13:48:33,327 - --------------- Row added
INFO 2023-08-26 13:48:33,601 - --------------- Row added
INFO 2023-08-26 13:48:33,717 - --------------- Row added
INFO 2023-08-26 13:48:33,942 - --------------- Row added
INFO 2023-08-26 13:48:34,211 - --------------- Row added
INFO 2023-08-26 13:48:34,427 - --------------- Row added
INFO 2023-08-26 13:48:34,867 - --------------- Row added
INFO 2023-08-26 13:48:34,977 - --------------- Row added
INFO 2023-08-26 13:48:35,094 - --------------- Row added
INFO 2023-08-26 13:48:35,355 - --------------- Row added
INFO 2023-08-26 13:48:35,625 - --------------- Row added
INFO 2023-08-26 13:48:35,887 - --------------- Row added
INFO 2023-08-26 13:48:36,125 - --------------- Row added
INFO 2023-08-26 13:48:36,391 - --------------- Row added
INFO 2023-08-26 13:48:36,657 - --------------- Row added
INFO 2023-08-26 13:48:36,891 - --------------- Row added
INFO 2023-08-26 13:48:37,157 - --------------- Row added
INFO 2023-08-26 13:48:37,276 - --------------- Row added
INFO 2023-08-26 13:48:37,503 - --------------- Row added
INFO 2023-08-26 13:48:37,720 - --------------- Row added
INFO 2023-08-26 13:48:37,826 - --------------- Row added
INFO 2023-08-26 13:48:38,065 - --------------- Row added
INFO 2023-08-26 13:48:38,284 - --------------- Row added
INFO 2023-08-26 13:48:38,520 - --------------- Row added
INFO 2023-08-26 13:48:38,782 - --------------- Row added
INFO 2023-08-26 13:48:39,058 - --------------- Row added
INFO 2023-08-26 13:48:39,174 - --------------- Row added
INFO 2023-08-26 13:48:39,400 - --------------- Row added
INFO 2023-08-26 13:48:39,400 - Commit
INFO 2023-08-26 13:48:39,765 - --------------- Row added
INFO 2023-08-26 13:48:40,002 - --------------- Row added
INFO 2023-08-26 13:48:40,217 - --------------- Row added
INFO 2023-08-26 13:48:40,451 - --------------- Row added
INFO 2023-08-26 13:48:40,719 - --------------- Row added
INFO 2023-08-26 13:48:40,982 - --------------- Row added
INFO 2023-08-26 13:48:41,249 - --------------- Row added
INFO 2023-08-26 13:48:41,515 - --------------- Row added
INFO 2023-08-26 13:48:41,751 - --------------- Row added
INFO 2023-08-26 13:48:41,858 - --------------- Row added
INFO 2023-08-26 13:48:41,974 - --------------- Row added
INFO 2023-08-26 13:48:42,205 - --------------- Row added
INFO 2023-08-26 13:48:42,466 - --------------- Row added
INFO 2023-08-26 13:48:42,740 - --------------- Row added
INFO 2023-08-26 13:48:42,966 - --------------- Row added
INFO 2023-08-26 13:48:43,092 - --------------- Row added
INFO 2023-08-26 13:48:43,318 - --------------- Row added
INFO 2023-08-26 13:48:43,550 - --------------- Row added
INFO 2023-08-26 13:48:43,805 - --------------- Row added
INFO 2023-08-26 13:48:44,047 - --------------- Row added
INFO 2023-08-26 13:48:44,274 - --------------- Row added
INFO 2023-08-26 13:48:44,499 - --------------- Row added
INFO 2023-08-26 13:48:44,769 - --------------- Row added
INFO 2023-08-26 13:48:45,030 - --------------- Row added
INFO 2023-08-26 13:48:45,254 - --------------- Row added
INFO 2023-08-26 13:48:45,484 - --------------- Row added
INFO 2023-08-26 13:48:45,716 - --------------- Row added
INFO 2023-08-26 13:48:45,983 - --------------- Row added
INFO 2023-08-26 13:48:46,201 - --------------- Row added
INFO 2023-08-26 13:48:46,472 - --------------- Row added
INFO 2023-08-26 13:48:46,591 - --------------- Row added
INFO 2023-08-26 13:48:46,866 - --------------- Row added
INFO 2023-08-26 13:48:47,122 - --------------- Row added
INFO 2023-08-26 13:48:47,401 - --------------- Row added
INFO 2023-08-26 13:48:47,623 - --------------- Row added
INFO 2023-08-26 13:48:47,740 - --------------- Row added
INFO 2023-08-26 13:48:48,004 - --------------- Row added
INFO 2023-08-26 13:48:48,233 - --------------- Row added
INFO 2023-08-26 13:48:48,358 - --------------- Row added
INFO 2023-08-26 13:48:48,586 - --------------- Row added
INFO 2023-08-26 13:48:48,814 - --------------- Row added
INFO 2023-08-26 13:48:49,081 - --------------- Row added
INFO 2023-08-26 13:48:49,299 - --------------- Row added
INFO 2023-08-26 13:48:49,536 - --------------- Row added
INFO 2023-08-26 13:48:49,749 - --------------- Row added
INFO 2023-08-26 13:48:49,982 - --------------- Row added
INFO 2023-08-26 13:48:50,247 - --------------- Row added
INFO 2023-08-26 13:48:50,483 - --------------- Row added
INFO 2023-08-26 13:48:50,703 - --------------- Row added
INFO 2023-08-26 13:48:50,932 - --------------- Row added
INFO 2023-08-26 13:48:51,170 - --------------- Row added
INFO 2023-08-26 13:48:51,171 - Commit
INFO 2023-08-26 13:48:51,414 - --------------- Row added
INFO 2023-08-26 13:48:51,640 - --------------- Row added
INFO 2023-08-26 13:48:51,865 - --------------- Row added
INFO 2023-08-26 13:48:52,121 - --------------- Row added
INFO 2023-08-26 13:48:52,348 - --------------- Row added
INFO 2023-08-26 13:48:52,621 - --------------- Row added
INFO 2023-08-26 13:48:52,740 - --------------- Row added
INFO 2023-08-26 13:48:52,965 - --------------- Row added
INFO 2023-08-26 13:48:53,183 - --------------- Row added
INFO 2023-08-26 13:48:53,414 - --------------- Row added
INFO 2023-08-26 13:48:53,537 - --------------- Row added
INFO 2023-08-26 13:48:53,640 - --------------- Row added
INFO 2023-08-26 13:48:53,765 - --------------- Row added
INFO 2023-08-26 13:48:53,982 - --------------- Row added
INFO 2023-08-26 13:48:54,247 - --------------- Row added
INFO 2023-08-26 13:48:54,372 - --------------- Row added
INFO 2023-08-26 13:48:54,591 - --------------- Row added
INFO 2023-08-26 13:48:54,704 - --------------- Row added
INFO 2023-08-26 13:48:54,934 - --------------- Row added
INFO 2023-08-26 13:48:55,214 - --------------- Row added
INFO 2023-08-26 13:48:55,436 - --------------- Row added
INFO 2023-08-26 13:48:55,668 - --------------- Row added
INFO 2023-08-26 13:48:55,937 - --------------- Row added
INFO 2023-08-26 13:48:56,152 - --------------- Row added
INFO 2023-08-26 13:48:56,380 - --------------- Row added
INFO 2023-08-26 13:48:56,651 - --------------- Row added
INFO 2023-08-26 13:48:56,881 - --------------- Row added
INFO 2023-08-26 13:48:57,104 - --------------- Row added
INFO 2023-08-26 13:48:57,330 - --------------- Row added
INFO 2023-08-26 13:48:57,454 - --------------- Row added
INFO 2023-08-26 13:48:57,574 - --------------- Row added
INFO 2023-08-26 13:48:57,795 - --------------- Row added
INFO 2023-08-26 13:48:58,026 - --------------- Row added
INFO 2023-08-26 13:48:58,137 - --------------- Row added
INFO 2023-08-26 13:48:58,254 - --------------- Row added
INFO 2023-08-26 13:48:58,481 - --------------- Row added
INFO 2023-08-26 13:48:58,713 - --------------- Row added
INFO 2023-08-26 13:48:58,976 - --------------- Row added
INFO 2023-08-26 13:48:59,095 - --------------- Row added
INFO 2023-08-26 13:48:59,313 - --------------- Row added
INFO 2023-08-26 13:48:59,530 - --------------- Row added
INFO 2023-08-26 13:48:59,763 - --------------- Row added
INFO 2023-08-26 13:48:59,980 - --------------- Row added
INFO 2023-08-26 13:49:00,088 - --------------- Row added
INFO 2023-08-26 13:49:00,313 - --------------- Row added
INFO 2023-08-26 13:49:00,550 - --------------- Row added
INFO 2023-08-26 13:49:00,813 - --------------- Row added
INFO 2023-08-26 13:49:01,032 - --------------- Row added
INFO 2023-08-26 13:49:01,267 - --------------- Row added
INFO 2023-08-26 13:49:01,495 - --------------- Row added
INFO 2023-08-26 13:49:01,723 - --------------- Row added
INFO 2023-08-26 13:49:01,724 - Commit
INFO 2023-08-26 13:49:02,079 - --------------- Row added
INFO 2023-08-26 13:49:02,186 - --------------- Row added
INFO 2023-08-26 13:49:02,304 - --------------- Row added
INFO 2023-08-26 13:49:02,529 - --------------- Row added
INFO 2023-08-26 13:49:02,769 - --------------- Row added
INFO 2023-08-26 13:49:02,994 - --------------- Row added
INFO 2023-08-26 13:49:03,216 - --------------- Row added
INFO 2023-08-26 13:49:03,446 - --------------- Row added
INFO 2023-08-26 13:49:03,567 - --------------- Row added
INFO 2023-08-26 13:49:03,787 - --------------- Row added
INFO 2023-08-26 13:49:03,903 - --------------- Row added
INFO 2023-08-26 13:49:04,133 - --------------- Row added
INFO 2023-08-26 13:49:04,387 - --------------- Row added
INFO 2023-08-26 13:49:04,501 - --------------- Row added
INFO 2023-08-26 13:49:04,620 - --------------- Row added
INFO 2023-08-26 13:49:04,845 - --------------- Row added
INFO 2023-08-26 13:49:04,969 - --------------- Row added
INFO 2023-08-26 13:49:05,091 - --------------- Row added
INFO 2023-08-26 13:49:05,202 - --------------- Row added
INFO 2023-08-26 13:49:05,319 - --------------- Row added
INFO 2023-08-26 13:49:05,551 - --------------- Row added
INFO 2023-08-26 13:49:05,809 - --------------- Row added
INFO 2023-08-26 13:49:06,245 - --------------- Row added
INFO 2023-08-26 13:49:06,352 - --------------- Row added
INFO 2023-08-26 13:49:06,633 - --------------- Row added
INFO 2023-08-26 13:49:06,859 - --------------- Row added
INFO 2023-08-26 13:49:07,102 - --------------- Row added
INFO 2023-08-26 13:49:07,328 - --------------- Row added
INFO 2023-08-26 13:49:07,617 - --------------- Row added
INFO 2023-08-26 13:49:07,890 - --------------- Row added
INFO 2023-08-26 13:49:08,017 - --------------- Row added
INFO 2023-08-26 13:49:08,266 - --------------- Row added
Data added successfully
Time taken: 51.36s
DEBUG 2023-08-26 13:49:08,400 - EOF in transport thread

Contact us on support@pythonanywhere.com so we can discuss options for you.

Email sent, thank you.

Thanks! We'll pick up the conversation there.