Kiwix goes into non responsive when trying to download 78GB wikipedia #87

Closed
Emnolope opened this issue Dec 23, 2018 · 45 comments · Fixed by #1038

@Emnolope

Honestly, this was expected. Is there a better way for Kiwix to handle downloads of such large files?
While I had my doubts, I checked the file size in Windows Explorer and it is indeed growing. However, because of the large size, the computer has difficulty using the full capacity of the network card.

Basically, there should be a cleaner way for Kiwix to handle such large downloads.

@kelson42
Collaborator

kelson42 commented Dec 23, 2018

@Emnolope This is a bit strange. Does this happen directly after starting the download, or later? Is Kiwix Desktop then unresponsive during the whole download?

@kelson42 kelson42 added the bug label Dec 23, 2018
@kelson42 kelson42 self-assigned this Dec 23, 2018
@Emnolope
Author

Emnolope commented Dec 25, 2018 via email

@Emnolope
Author

Emnolope commented Dec 25, 2018 via email

@mgautierfr
Member

This shouldn't happen. The download itself is handled by a different process, and the Kiwix UI just updates the information every second.
Which version of kiwix-desktop are you using? On Windows or Linux?

@kelson42
Collaborator

@Emnolope Have you been able to reproduce the problem with the latest beta?

@kelson42
Collaborator

@Emnolope I'm pretty convinced we have fixed all of this in the latest betas. If the problem still happens, please reopen the ticket.

@kelson42
Collaborator

kelson42 commented Aug 21, 2019

@mgautierfr @jetownfeve21 I have to reopen this ticket as it still does not work properly with RC1. I downloaded the latest version of WPDE (with pictures). The Kiwix UI freezes from time to time during a long/big download (and Ubuntu also complains about it). It also freezes at the very end of the download process.

@kelson42 kelson42 reopened this Aug 21, 2019
@kelson42 kelson42 assigned mgautierfr and unassigned kelson42 Sep 7, 2019
@kelson42 kelson42 pinned this issue Sep 27, 2019
@stale

stale bot commented Nov 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Nov 27, 2019
@kelson42 kelson42 unpinned this issue Apr 9, 2020
@stale stale bot removed the stale label Apr 9, 2020
@stale

stale bot commented Jun 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale

stale bot commented Oct 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Oct 28, 2020
@AllanWegan

I too experience absurd GUI response delays of multiple seconds up to a minute when saturating my internet connection by downloading multiple things. There is almost no CPU usage (probably some IO thing going on).

Does the GUI loop interact synchronously with the download processes?

@stale stale bot removed the stale label Jan 31, 2021
@kelson42
Collaborator

kelson42 commented Feb 1, 2021

@AllanWegan It does indeed look like part of the process still runs in the main UI loop. It is unclear so far which part.

@mgautierfr
Member

I think I have found the root cause: in libkiwix's aria2.cpp, we use a lock to prevent a race condition where the same curl context could be reused (https://github.com/kiwix/libkiwix/blob/main/src/aria2.cpp#L137-L156).

While this makes the aria2 wrapper thread-safe (we can call it safely from different threads), it is not really multithread-compliant (we cannot do several requests in parallel).
So by definition, if the addUri method (which is used to start a download) takes time, all other requests will be blocked, whether or not they are made from the same thread.

We have to make the aria2 wrapper fully multithreaded and also make the libkiwix::Downloader thread-safe/compliant.
Then it would be possible to use it correctly in a multithreaded client (kiwix-desktop) without such a bottleneck.
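
To make the bottleneck concrete, here is a minimal Python sketch. It is not libkiwix code: `SerializedRpcClient`, `slow_add_uri`, and `try_status` are hypothetical names, and a `threading.Lock` stands in for the lock around the shared curl context. It only shows that a status request cannot proceed while a long-running request holds the single lock.

```python
import threading


class SerializedRpcClient:
    """Hypothetical wrapper that guards one shared connection with a single lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def slow_add_uri(self, started, release):
        # The whole request runs under the lock, as in the pattern described above.
        with self._lock:
            started.set()    # tell the caller the request is in flight
            release.wait()   # simulate an addUri call that takes a long time
            return "added"

    def try_status(self, timeout):
        # A status request needs the same lock; give up after `timeout` seconds.
        if not self._lock.acquire(timeout=timeout):
            return None
        try:
            return "active"
        finally:
            self._lock.release()


def demo():
    client = SerializedRpcClient()
    started, release = threading.Event(), threading.Event()
    results = []
    worker = threading.Thread(
        target=lambda: results.append(client.slow_add_uri(started, release))
    )
    worker.start()
    started.wait()
    blocked = client.try_status(timeout=0.05)    # lock is held by the slow call
    release.set()
    worker.join()
    unblocked = client.try_status(timeout=0.05)  # lock is free again
    return blocked, unblocked, results[0]
```

Here `demo()` returns `(None, "active", "added")`: the status request gets nothing while the slow call is in flight and succeeds once it has finished. Giving each request its own connection context would remove the need for the shared lock in this toy model.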

@adamlamar
Collaborator

@mgautierfr I believe this line is blocking the BackgroundDownloader's event loop: https://github.com/kiwix/kiwix-desktop/pull/919/files#diff-de6d6dc21894f626a8d8aa19ae0974692384776ff9ea5796987397fd1dcf2832R111

> So the idea is to NOT get the lock when doing rpc call

The RPC call is outside of the lock. The overall event loop looks like this:

Thread 1 - UI Thread
Runs code from many classes, including ContentManager
Has its own event loop

Thread 2 - parentless QThread started in BackgroundDownloader
Runs code from BackgroundDownloader only
  event loop runs one of:
  - updateStatus() (once per second as invoked by the QTimer)
  - startDownload()
  - completeDownload()
  - pauseDownload()
  - resumeDownload()
  - cancelDownload()

When updateStatus() blocks, the whole event loop blocks and received signals queue up behind it. Due to file preallocation, this could last for minutes. The user sees the delay when they attempt the next action, say downloading another zim. The program does not go into Not Responding (as it did before), but the UI does not act correctly (e.g. the download does not start after pressing the Download text).
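
The queuing behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the Qt code: a plain `queue.Queue` and a worker thread stand in for the QThread event loop, and the task names are borrowed from the list above.

```python
import queue
import threading


def run_event_loop(tasks, log):
    """Process tasks one at a time, in arrival order, until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        name, action = task
        action()          # a blocking action stalls everything queued behind it
        log.append(name)


def demo():
    tasks, log = queue.Queue(), []
    entered, release = threading.Event(), threading.Event()

    def blocking_update_status():
        entered.set()
        release.wait()    # simulate an RPC call stuck behind file allocation

    loop = threading.Thread(target=run_event_loop, args=(tasks, log))
    loop.start()
    tasks.put(("updateStatus", blocking_update_status))
    tasks.put(("startDownload", lambda: None))
    entered.wait()
    pending = tasks.qsize()       # startDownload is still waiting in the queue
    done_while_blocked = list(log)
    release.set()
    tasks.put(None)
    loop.join()
    return pending, done_while_blocked, log
```

While the simulated updateStatus is stuck, `demo()` observes one pending task and an empty log; startDownload only runs after the blocking call returns.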

> in BackgroundDownloader::startDownload you have a lock

That's true. I don't believe the startDownload call normally blocks, but I can remove the locking around mp_downloader since there is no concurrent access (only event loop access). The only concurrent access occurs against m_status.

My interpretation of aria2/aria2#1851, aria2/aria2#1842, and aria2/aria2#1396 is that file preallocation will always occur if split>1, and split=5 by default. Setting file-allocation=trunc might help when NTFS is used, but the user will still see the delay if FAT/exFAT is used (e.g. removable disk). This seems to be true in my testing - when I set split=1 manually, there is no delay starting downloads, but they run much slower.
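
For anyone who wants to repeat the experiment, the two options discussed here can be passed on the aria2c command line. The helper below is a hypothetical sketch (kiwix-desktop configures aria2 through libkiwix, not through this function); only the flag names `--split`, `--file-allocation`, `--enable-rpc`, and `--rpc-listen-port` come from aria2 itself, and 6800 is aria2's default RPC port.

```python
FILE_ALLOCATION_METHODS = {"none", "prealloc", "trunc", "falloc"}


def build_aria2c_args(split, file_allocation, rpc_port=6800):
    """Build an aria2c command line for testing split/file-allocation combinations."""
    if split < 1:
        raise ValueError("split must be at least 1")
    if file_allocation not in FILE_ALLOCATION_METHODS:
        raise ValueError(f"unknown file allocation method: {file_allocation}")
    return [
        "aria2c",
        "--enable-rpc",
        f"--rpc-listen-port={rpc_port}",
        f"--split={split}",
        f"--file-allocation={file_allocation}",
    ]


# The configuration that showed no start-up delay in the test described above:
single_connection = build_aria2c_args(split=1, file_allocation="trunc")
```

The resulting list can be handed to `subprocess.Popen` to start a local aria2c instance with the chosen settings.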

@adamlamar
Collaborator

On the lock in aria2.cpp, I don't know if it would make a difference in kiwix-desktop because there is only one thread trying to invoke the downloader at any given time. So while it could be an overall improvement, I am not sure if it will solve the specific problem here.

Since we cannot always prevent aria2 from blocking during file preallocation, I will look into timing out the libcurl request and let you know.

@mgautierfr
Member

> My interpretation of aria2/aria2#1851, aria2/aria2#1842, and aria2/aria2#1396 is that file preallocation will always occur if split>1, and split=5 by default. Setting file-allocation=trunc might help when NTFS is used, but the user will still see the delay if FAT/exFAT is used (e.g. removable disk). This seems to be true in my testing - when I set split=1 manually, there is no delay starting downloads, but they run much slower.

Ok. Indeed, the issue is not in the "real" preallocation. Just after that, when aria2 starts the different downloads, it does some preallocation to be sure that the download threads write data at the right position. If I understand correctly, it does this preallocation not as a specific step, but when the real downloads start.

> On the lock in aria2.cpp, I don't know if it would make a difference in kiwix-desktop because there is only one thread trying to invoke the downloader at any given time. So while it could be an overall improvement, I am not sure if it will solve the specific problem here.

I was thinking that it was startDownload that was blocking (because of preallocation). If that were true, we could have moved updateStatus into its own thread, so the status would always be up to date even if starting a download blocks for minutes.
But if it is updateStatus that is blocking, we are a bit stuck here. I see different options:

  • We currently run one updateStatus per download. So we could put the started download in a special status (allocating) and display that to the user. If other RPC calls are not blocked (this needs a change at least in libkiwix), the information for other downloads would be kept up to date.
  • If aria2 itself is blocked by the long RPC call and doesn't answer other RPC calls, we cannot do a lot here. A blocking allocation will block us. What you've done (not blocking the UI) is the best we can do. But if this is confirmed, we should open an issue upstream to have aria2 not block.
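
A rough Python sketch of the first option, assuming that other RPC calls are not blocked and that a status call signals a timeout by raising `TimeoutError`. The names `poll_statuses`, `fake_fetch`, and the "allocating" label are hypothetical; nothing here is existing libkiwix or kiwix-desktop code.

```python
def poll_statuses(gids, fetch_status):
    """Poll each download separately; a timed-out poll is reported as 'allocating'."""
    statuses = {}
    for gid in gids:
        try:
            statuses[gid] = fetch_status(gid)
        except TimeoutError:
            # Assumption: a timed-out status call means aria2 is busy allocating the file.
            statuses[gid] = "allocating"
    return statuses


def fake_fetch(gid):
    """Stand-in for a per-download status RPC; 'big' simulates a blocked call."""
    if gid == "big":
        raise TimeoutError
    return "active"


statuses = poll_statuses(["big", "small"], fake_fetch)
```

With the stand-in fetcher, `statuses` is `{"big": "allocating", "small": "active"}`, so the stuck download gets a visible state while the other one keeps reporting normally.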

And maybe --file-allocation=none is counterproductive here. If we had file preallocation, maybe the preallocation would be handled by aria as a specific step and it would return a correct status (which we lose here: https://github.com/kiwix/libkiwix/blob/main/src/downloader.cpp#L59-L65). But with no preallocation, the allocation is made later as an implementation detail, and then the status is blocked. Can you try setting --file-allocation=falloc and see what status aria returns?

@adamlamar
Collaborator

Yes I agree, it seems like aria is doing the "preallocation" during the download itself, since I often see a few status updates (having downloaded a few bytes) before it hangs.

Setting a timeout on RPC requests to aria should help prevent updateStatus blocking for a long period. And that's a good point - I am actually not sure if aria blocks all other RPC requests, or just the one. I'll try the file-allocation=falloc too. Will keep investigating as I have time.
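
As a rough illustration of the timeout idea, the sketch below uses Python's `xmlrpc.client` in place of libcurl. `TimeoutTransport` and `tell_status_or_none` are hypothetical helpers; `aria2.tellStatus` is the aria2 RPC method for querying one download, and the endpoint URL in the usage comment assumes aria2's default RPC port.

```python
import socket
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connection gives up after `timeout` seconds."""

    def __init__(self, timeout):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout  # applied when the socket is opened
        return conn


def tell_status_or_none(proxy, gid):
    """Return the status for `gid`, or None if the RPC server did not answer in time."""
    try:
        return proxy.aria2.tellStatus(gid)
    except socket.timeout:
        return None


# Usage (assumes an aria2c RPC server listening on the default port):
# proxy = xmlrpc.client.ServerProxy(
#     "http://localhost:6800/rpc", transport=TimeoutTransport(2.0))
# status = tell_status_or_none(proxy, gid)
```

A `None` result lets the caller keep its event loop moving and retry on the next tick instead of waiting for the blocked call to return.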

@mgautierfr
Member

@adamlamar I have a PR to make libkiwix more thread-safe and compliant on the downloading side: kiwix/libkiwix#886

With this PR, you will be able to call the Downloader safely from different threads and get the status of the different downloads in parallel.
If aria itself doesn't have an internal lock, we should be able to make it work properly on the kiwix-desktop side.

@mgautierfr
Member

@adamlamar Do you plan to finish this PR, or should I finish it?

@Sopheus

Sopheus commented Jul 12, 2023

I presume this issue is still not fixed? I am on version 2.3.1 and it crashes every time I try to download big files. When it does not crash, it simply does not download them fully (zim and aria files in the Roaming folder), while Kiwix says it has downloaded the file. If you try to open the unfinished file, Kiwix crashes as well.

@fartwhif

fartwhif commented Aug 9, 2023

\kiwix-desktop_windows_x64_2.3.1-2
kiwix-desktop.exe

freezes while downloading
crashes when i end the task in task manager
killed all processes
start it again, now the process runs, but the UI won't appear. it's in the list of running processes, but there is NO GUI. Perhaps like it's stuck on some initialization.

the UI reliably hangs while it is downloading, sometimes briefly, sometimes practically forever.

🔧 🔨 ⚒ 🛠 ⛏

@Ultra980

Ultra980 commented Sep 7, 2023

On NixOS (Linux) it also freezes while downloading. If I click something, it opens after around 6 seconds. In the command line, it also throws out errors like:

Cannot download favicon from library.kiwix.org/catalog/v2/illustration/91bb58ae-13df-0100-9423-d2b8617607b0/?size=48

@kelson42
Collaborator

@mgautierfr @adamlamar Adding @veloman-yunkan as he is expected to complete the PR and, hopefully, after years and years, fix this issue.

@veloman-yunkan
Collaborator

I've started working on this issue. It looks like #946 has introduced new bugs (#1021, #1022, #1023) related to download management. I will fix those too.

@kelson42
Collaborator

@veloman-yunkan Thank you very much!!!

@adamlamar
Collaborator

Thanks a bunch @veloman-yunkan, I kind of lost steam on this issue. Thinking about it I wonder if using the QT Download manager would be a better approach. If we can get the URL from libzim, we can have the QT Download manager fetch the zim file asynchronously.

@kelson42
Collaborator

@adamlamar we have to rely on aria2. We don't want to stick only to http download.

@adamlamar
Collaborator

adamlamar commented Dec 11, 2023

What do you mean only http download? Looks like the QT Download manager supports HTTP, HTTPS, and FTP.

Are you saying kiwix-desktop also supports other protocols like BitTorrent today using aria2?

@kelson42
Collaborator

> Are you saying kiwix-desktop also supports other protocols like BitTorrent today using aria2?

Yes, even if this is not used yet. It's based on the whole Metalink infrastructure.

@adamlamar
Collaborator

I see. If we want metalink, BitTorrent, and other protocol support, maybe we need to fix aria2's blocking on file allocation. It's pretty hard to work around aria2's blocking in the UI.
