Forums

Does the number of web workers affect database speed?

I'm updating a database; at the moment it's adding around 5 thousand rows every second or two, up to around 2 million rows in total - this is testing against a new table. When I try to view sections of this data within my app, it takes 30+ seconds to load.

Is this due to web workers? i.e. should I be adding more to my account?

I'm not a professional developer, but my table is fairly simple - only 9 columns.

My query is...

def get_profile_info(self, profile):
    try:
        self.cursor.execute('''select keyword, landing_page, 
                          position, impressions, clicks, ctr
                          from search_data where profile_id={}'''.format(profile))
        return self.cursor.fetchall()
    except:
        pass
    finally:
        self.db.close()

I've added an index to the profile_id column - anything obvious I'm doing wrong, or is it a web worker thing?

Any help is much appreciated.

No, web workers won't help (unless another request happens to be slow and blocking while you run this one).

I would try:

  1. open a mysql console and try the same query, and see how long that takes (comparing it to your app's view - there's a rough timing sketch for the Python side after this list)
  2. if there is a sizeable difference, then you know it's probably the Python code causing the problem
  3. if there is no sizeable difference, maybe there is something weird happening, like you are writing rows such that it puts a lock on the db to the extent that you aren't even allowed to read (select) from it.
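
For the Python side of that comparison, a rough timing wrapper along these lines would do (just a sketch - I've parameterised the query rather than using .format, and you'd pass in whatever cursor your Db class already holds):

    import time

    def timed_profile_query(cursor, profile_id):
        # time just the execute + fetch, so the number is directly comparable
        # with the "N rows in set (X.XX sec)" figure the mysql console prints
        start = time.time()
        cursor.execute(
            'select keyword, landing_page, position, impressions, clicks, ctr '
            'from search_data where profile_id=%s',
            (profile_id,))
        rows = cursor.fetchall()
        print('%d rows in %.2f sec' % (len(rows), time.time() - start))
        return rows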

The other thing I would check is how many mysql connections you have open. Maybe you have some other queries that are just holding the SQL connections open and, say, only closing every 30s, meaning that anybody else wanting to run a query needs to wait for the opportunity to connect. You should be able to check this with show processlist; from your mysql console.

Thanks Conrad.

I've followed your advice and still nothing - no open processes. I've made a slight improvement where I was doing a look-up on the user's session vs what they can access, but the route is literally just...

@app.route('/profile/<profile_id>')
def profile_main(profile_id):

    # check if user has access to this profile
    db = Db()
    if db.does_user_have_access(session['user_id'], profile_id):
        results = db.get_profile_info(profile_id)

    return render_template('profile_main.html', results=results)

With the database methods being...

def does_user_have_access(self, user_id, profile_id):
    '''Check to see if a profile is accessible to a user
    '''

    self.cursor.execute('select * from profiles where user_id={} and id={}'.format(user_id, profile_id))
    return self.cursor.fetchone()


def get_profile_info(self, profile):
    '''Gets all info for a profile
    '''
    try:
        self.cursor.execute('select keyword, landing_page, position, impressions, clicks, ctr from search_data where profile_id={} group by keyword'.format(profile))
        return self.cursor.fetchall()
    except:
        pass
    finally:
        self.db.close()

I just can't understand why 2 million rows would be so slow - maybe that's my lack of experience (hobbyist) but still. meh. :)

Agreed - 2 million rows should not be so slow. So the raw SQL query in the mysql console is also slow?

I've reduced the size to 300K for testing and it's still hellishly slow. The SQL output is saying 5.56 seconds. That doesn't sound right to you, does it?

I've just run a more accurate query in the SQL console - the one above was a select *.

I've just done...

mysql> select keyword, landing_page, position, ctr, impressions from search_data where profile_id=21 group by keyword;

Output....

31701 rows in set (13.01 sec)

What happens if you get rid of the "group by"? I imagine you'll get a lot of data back, but the time printed out at the end should still be an accurate reflection of the time spent on the query.

Hey Giles,

I get the following...

293416 rows in set (15.54 sec)

Hmm, interesting. So it's not the group by that's causing the issue. I can't see anything obvious in the MySQL slow query log.

One thing -- is the table you're querying here the one you're writing lots of rows to? If it's gaining ~1,000 rows a second then I can imagine a query would be slow just because of all of the file locking involved.

This is on a static table, I'm not adding and updating anything - only querying it.

I'm kind of at a loss. I've double-checked the database server you're running on, and it's using about 44% CPU. So I don't think it's system load. Very puzzling indeed! Perhaps one for Stack Overflow? Maybe there are MySQL gurus out there who'd know the right questions to ask...

Ok, thanks for your help anyway :(

Can the consoles have anything to do with it?

I've been using the same console for over a week, and sometimes when running a query it will reconnect. I have just killed it, and the above queries now come in at...

select keyword, landing_page, position, ctr, impressions from search_data where profile_id=21;

293416 rows in set (0.60 sec)

and...

select keyword, landing_page, position, ctr, impressions from search_data where profile_id=21 group by keyword;

30678 rows in set (2.70 sec)

Now that's odd. But it sounds like a promising angle of attack. Maybe somehow the old connection wasn't using the index that you added, and you needed to create a new one? It would be really weird, but perhaps... Let's see if the new console slows down over time; if it doesn't, then the problem is fixed, and if it does, it sounds like a usable workaround would be to disconnect DB connections after some time period.
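
For what it's worth, one simple shape for that workaround is to open a short-lived connection per query rather than holding one open indefinitely - roughly like this (pymysql and the connection details are just placeholders here; adapt it to whatever driver your Db class actually uses):

    import pymysql

    def get_profile_info(profile_id):
        # open a fresh connection for this query only, so a stale connection
        # can never hang around long enough to start behaving oddly
        db = pymysql.connect(host='localhost', user='your_user',
                             password='your_password', database='your_db')
        try:
            with db.cursor() as cursor:
                cursor.execute(
                    'select keyword, landing_page, position, impressions, '
                    'clicks, ctr from search_data '
                    'where profile_id=%s group by keyword',
                    (profile_id,))
                return cursor.fetchall()
        finally:
            db.close()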