Using Selenium for webscraping and I have serious doubts about CPU-time

Hello. I was going to upgrade my account to one of the paid plans in order to configure an always-on task that would be responsible for scraping. Then I read some topics.

As far as I understand, the sleep() function does not consume CPU-time, but even while the program is sleeping, the open browser can still consume my CPU-time. Is that correct? My code keeps Chrome open all the time. Would it help to change the code so that Chrome is opened only when sleep() has finished and a scrape is about to start?

And I don't want to use scheduled tasks because scraping will be done very often, every minute or so.

What else do you recommend to me?

A browser will consume CPU while it is open, since it's never truly "idle". You can experiment and see which is more efficient CPU-wise: closing the browser and starting a new one each time, or keeping one open. It probably depends on the interval between scrapes.
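Here is a minimal sketch of how that experiment could be set up. The name run_trial and its parameters are made up for illustration, and the driver is passed in as a factory so the same function works with any object that has get() and quit(). One caveat: time.process_time() only counts the Python process itself, not the browser processes, so the returned CPU figure should be compared against the CPU-time shown on your dashboard rather than trusted on its own.

```python
import time


def run_trial(make_driver, url, rounds, pause, keep_open):
    """Run `rounds` page loads and return (cpu_seconds, wall_seconds).

    make_driver: hypothetical zero-argument factory returning an object
                 with get(url) and quit() methods (e.g. a Selenium driver).
    keep_open:   True  -> one browser for the whole trial
                 False -> a fresh browser for every round
    The CPU figure covers this Python process only, not the browser.
    """
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    driver = make_driver() if keep_open else None
    try:
        for _ in range(rounds):
            if not keep_open:
                driver = make_driver()
            driver.get(url)
            if not keep_open:
                # Close the browser before sleeping so it cannot use CPU.
                driver.quit()
                driver = None
            time.sleep(pause)
    finally:
        # Make sure a browser that is still open gets closed.
        if driver is not None:
            driver.quit()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used, wall_used
```

With Selenium installed, something like run_trial(webdriver.Chrome, "https://example.com", 5, 60, keep_open=True) followed by the same call with keep_open=False would give the two cases to compare, noting the dashboard CPU-time before and after each run.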

I see. I'm going to do some experiments on this.

Before I do that, can you shed some light on roughly how much CPU a browser will consume? Hypothetically speaking, does every second count, even if the browser uses only a very small amount of CPU?

For example, if I open a web page using Selenium and put my program to sleep for a long period of time, does every second of that sleep time count as consumed? Or is there a formula to convert CPU usage to CPU-time?
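On the formula part: CPU-time is generally average CPU utilization multiplied by wall time, so a process that averages 5% of one core for 60 seconds has used about 3 CPU-seconds. A small sketch of that, plus a check that sleep() itself costs the Python process almost nothing. The function names are made up for illustration, and whether the dashboard counts CPU-time exactly this way is an assumption.

```python
import time


def measure_sleep(seconds):
    """Return (cpu_seconds, wall_seconds) spent by this process while sleeping."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    time.sleep(seconds)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used, wall_used


def cpu_seconds(avg_utilization, wall_seconds):
    """Convert average utilization (fraction of one core) to CPU-seconds."""
    return avg_utilization * wall_seconds


if __name__ == "__main__":
    cpu, wall = measure_sleep(1.0)
    print(f"sleeping: CPU {cpu:.4f}s, wall {wall:.4f}s")
    # 5% of one core for one minute
    print(f"5% for 60s: {cpu_seconds(0.05, 60):.1f} CPU-seconds")
```

So not every second of sleep counts in full: what counts is whatever fraction of each second the browser actually spends on the CPU.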

Thanks for your help!

Okay, I've done the experiment and got some interesting results. Anyone who is having second thoughts may benefit from this:

I ran a piece of code that yielded:

CPU-Time: 0.21535476300000006
Wall Time: 14.220614433288574

I didn't quit the driver until the very end of the script, so the browser was open for at least 14 seconds. Even so, the CPU-time on my dashboard went up by only 2.61 seconds. That's approximately 18% of wall time.

I ran another test:

CPU-Time: 0.5336521949999999
Wall Time: 34.145110845565796

My CPU-time went up by 3.23 seconds. That's about 9.5% of wall time. In conclusion: yes, the browser keeps consuming your CPU-time, but not as much as wall time. And I don't think you can know in advance how much it will go up.
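For anyone who wants to redo the arithmetic, here is a small sketch that uses only the numbers reported in the two tests. The function name share_of_wall is made up for illustration. It also prints the share for the CPU-time the script itself reported, which is much smaller than the dashboard figure. The gap is plausibly the browser and driver processes, but that is an assumption, since the measuring code was not posted.

```python
def share_of_wall(cpu_seconds, wall_seconds):
    """Return CPU-time as a percentage of wall time."""
    return 100.0 * cpu_seconds / wall_seconds


# Figures copied from the two tests in this thread.
runs = [
    {"script_cpu": 0.21535476300000006, "wall": 14.220614433288574, "dashboard_cpu": 2.61},
    {"script_cpu": 0.5336521949999999, "wall": 34.145110845565796, "dashboard_cpu": 3.23},
]

for number, run in enumerate(runs, start=1):
    dashboard_pct = share_of_wall(run["dashboard_cpu"], run["wall"])
    script_pct = share_of_wall(run["script_cpu"], run["wall"])
    print(f"test {number}: dashboard {dashboard_pct:.1f}% of wall time, "
          f"script alone {script_pct:.1f}%")
```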

Interesting results. Thanks for sharing!