
changing any LFS object invalidates the entire cache, which can cost lots of bandwidth #27

Open
connorjclark opened this issue Jan 22, 2023 · 0 comments · May be fixed by #34

connorjclark commented Jan 22, 2023

As an extreme example, in my CI I have 4 environments to run tests on, each of which has 6 shards. They all start at roughly the same time, so when something tracked in LFS changes, all 24 jobs attempt to download the LFS objects at the same time, eating up bandwidth very quickly.

As a workaround, I've added an extra job that simply runs this action and made all my other jobs wait for it to finish, so that the LFS cache is warm by the time they run. This still pays the cost of downloading everything again whenever anything changes, though.

jobs:
  # Get the LFS cache ready. This avoids every shard of each environment downloading the LFS
  # objects at the same time whenever any of them change, which is disastrous for quota.
  warm-lfs-cache:
    runs-on: ubuntu-latest
    steps:
      - name: git clone
        uses: nschloe/action-cached-lfs-checkout@v1

  test:
    needs: warm-lfs-cache
    strategy:
      matrix:
        ...

Some approaches that may fix the problem:

  • Utilize the restore-keys: property of actions/cache@v3, so that a run whose exact key misses can fall back to the most recent cache and only the LFS objects that have actually changed need to be downloaded again. Something like the following may work (see the sketch after this list):

      key: lfs-v3-${{ github.repository }}-${{ hashFiles('.lfs-assets-id') }}
      restore-keys: lfs-v3-${{ github.repository }}-

  • Split the LFS objects across multiple caches, perhaps bucketing them by a hash of their filename, so that changing one object only invalidates the bucket it belongs to.
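
A minimal sketch of the restore-keys approach, assuming the cache stores .git/lfs and is keyed on a generated .lfs-assets-id file (the step names and cache path below are illustrative, not necessarily what this action does internally):

steps:
  - uses: actions/checkout@v3

  # Build a stable list of LFS object IDs to key the cache on.
  - name: Create LFS file list
    run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

  # restore-keys lets the run fall back to the most recent prefix match when
  # the exact key misses, so an older cache is reused instead of starting cold.
  - name: Restore LFS cache
    uses: actions/cache@v3
    with:
      path: .git/lfs
      key: lfs-v3-${{ github.repository }}-${{ hashFiles('.lfs-assets-id') }}
      restore-keys: |
        lfs-v3-${{ github.repository }}-

  # git lfs pull only downloads objects that are missing from .git/lfs, so a
  # partial cache hit still saves most of the bandwidth.
  - name: Git LFS pull
    run: git lfs pull

With this layout, changing one LFS object makes the exact key miss, the restore-key restores the previous cache, only the changed object is fetched, and the updated cache is saved under the new key when the job completes.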