-
Notifications
You must be signed in to change notification settings - Fork 604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid multiple downloads for same image #2732
Comments
Somehow related: #1354 |
No, just that we should not timeout waiting for the lock; we need to wait indefinitely and rely on the process holding the lock to eventually release the lock. Otherwise you will have to start checking for this in scripts running |
This comment was marked as resolved.
This comment was marked as resolved.
IIRC the behavior above (attempting to download the snapshot) happens because another process have the release file already created, but checksum is wrong (because it's still downloading). That's why 2nd lima process tries to go snapshot instead. Not 100% sure though.. I roughly remember something like that. |
This should not be possible with current code. The data file is created only when the download is completed. But the checksum file is created only after the data file, so there is a tiny window when the checksum file is missing. |
To solve the races during concurrent downloads and avoid unneeded work and bandwidth, we allow one concurrent download of the same image. When a limactl process try to access the cache, it takes a lock on the file cache directory. If multiple processes try to get the lock in the same time, only one will take the lock, and the other will block. The process that took the lock tries to get the file from the cache. This is the fast path and the common case. This can fail if the file is not in the cache, the digest does not match, or the cached last modified time does not match the last modified returned from the server. If the process cannot get the file from the cache, it downloads the file from the remote server, and update the cached data and metadata files. Finally the process release the lock on the cache directory. Other limactl processes waiting on the lock wake up and take the lock. In the common case they will find the image in the cache and will release the lock quickly. Since we have exactly one concurrent download, updating the metadata files is trivial and we don't need the writeFirst() helper. Fixes: lima-vm#2902 Fixes: lima-vm#2732 Signed-off-by: Nir Soffer <[email protected]>
To solve the races during concurrent downloads and avoid unneeded work and bandwidth, we allow one concurrent download of the same image. When a limactl process try to access the cache, it takes a lock on the file cache directory. If multiple processes try to get the lock in the same time, only one will take the lock, and the other will block. The process that took the lock tries to get the file from the cache. This is the fast path and the common case. This can fail if the file is not in the cache, the digest does not match, or the cached last modified time does not match the last modified returned from the server. If the process cannot get the file from the cache, it downloads the file from the remote server, and update the cached data and metadata files. Finally the process release the lock on the cache directory. Other limactl processes waiting on the lock wake up and take the lock. In the common case they will find the image in the cache and will release the lock quickly. Since we have exactly one concurrent download, updating the metadata files is trivial and we don't need the writeFirst() helper. Fixes: lima-vm#2902 Fixes: lima-vm#2732 Signed-off-by: Nir Soffer <[email protected]>
Description
Currently when starting multiple clusters using the same image before the image is cached, we download the same image in parallel. The first download stores the file in the cache and the other downloaded files are dropped.
Possible flow
we can use it in these cases:
Possible issues
@jandubois you was concerned about issues, anything to add?
The text was updated successfully, but these errors were encountered: