Optimize Downloader #41
Comments
Maybe using processes instead of threads would be better. The GIL may be locking all the threads, so that in the end the code runs like serial code.
That's the thing: we need to do some profiling first, as well as measure throughput.
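As a starting point for that measurement, here is a minimal sketch of a timing harness. `measure_throughput` and its `handle` callback are hypothetical names, not part of the project; `handle` stands in for whatever the downloader does per feed.

```python
import time


def measure_throughput(items, handle):
    """Call handle(item) for each item and report timing statistics."""
    timings = []
    start = time.perf_counter()
    for item in items:
        t0 = time.perf_counter()
        handle(item)  # hypothetical per-feed work (download, parse, store)
        timings.append((item, time.perf_counter() - t0))
    elapsed = time.perf_counter() - start
    return {
        "count": len(timings),
        "elapsed_s": elapsed,
        "items_per_min": 60 * len(timings) / elapsed if elapsed > 0 else float("inf"),
        # The five slowest items are usually the first place to look.
        "slowest": sorted(timings, key=lambda t: t[1], reverse=True)[:5],
    }
```

Running this over a sample of feeds before and after a change gives comparable numbers; for a function-level breakdown, the standard library's `cProfile` module can be run on the same loop.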
The collection object is shared among all threads. In other projects, when I need to do some crawling in parallel, what I usually do is:
Only the main process has access to the database. I think the class
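A minimal sketch of that pattern, under stated assumptions: `fetch_feed`, `crawl`, and the `save` callback are hypothetical names, and `fetch_feed` only simulates a download. Workers return plain data, and only the calling process writes to the database.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def fetch_feed(url):
    """Hypothetical worker: stands in for the real download and parse step.

    It returns plain, picklable data and never touches the database.
    """
    return [{"feed": url, "title": f"article from {url}"}]


def crawl(feed_urls, save, executor_cls=ProcessPoolExecutor, max_workers=4):
    """Fetch feeds in workers; call save() only in the calling process."""
    saved = 0
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_feed, url): url for url in feed_urls}
        for future in as_completed(futures):
            try:
                articles = future.result()
            except Exception as exc:  # one bad feed should not stop the run
                print(f"{futures[future]} failed: {exc}")
                continue
            for article in articles:
                save(article)  # the only place the database is touched
                saved += 1
    return saved
```

Here `save` would wrap the insert on the real collection object. With the default `ProcessPoolExecutor`, the call to `crawl` should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Passing `ThreadPoolExecutor` as `executor_cls` keeps the same structure with threads.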
Good point. I'll try to refactor it soon. But if you want to take a stab
The downloader currently takes 143 minutes to run through our list of 2162 feeds and download new articles (662 for this particular measurement).
We need to improve the efficiency of the capture, perhaps by increasing the number of concurrent download threads or simply by reducing the time required to handle each download.
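To put the measurement in perspective, the baseline rates can be worked out from the figures above. The 30-minute target in this sketch is a hypothetical goal chosen for illustration, not something stated in this issue.

```python
FEEDS = 2162          # from the measurement above
NEW_ARTICLES = 662    # from the measurement above
RUN_MINUTES = 143     # from the measurement above
TARGET_MINUTES = 30   # hypothetical goal, not stated in the issue

feeds_per_min = FEEDS / RUN_MINUTES
secs_per_feed = RUN_MINUTES * 60 / FEEDS  # wall-clock time, not per-request latency
articles_per_min = NEW_ARTICLES / RUN_MINUTES
required_speedup = RUN_MINUTES / TARGET_MINUTES

print(f"{feeds_per_min:.1f} feeds/min")          # 15.1 feeds/min
print(f"{secs_per_feed:.2f} s per feed")         # 3.97 s per feed
print(f"{articles_per_min:.1f} articles/min")    # 4.6 articles/min
print(f"{required_speedup:.1f}x speedup needed") # 4.8x speedup needed
```

Roughly four wall-clock seconds per feed, with threads already running, suggests that a few slow or timing-out feeds may dominate the run, which is worth confirming by profiling before raising the thread count.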