Users need better access to reproducers. #5345

Open
Tracked by #5331
tarasmadan opened this issue Sep 27, 2024 · 20 comments

@tarasmadan
Collaborator

tarasmadan commented Sep 27, 2024

Motivation:

  1. LPC customers asked for better access to reproducers.
  2. Distros need them to improve their own CIs.
  3. github.com/ksteuck wants to get reproducers by filter.
@tarasmadan tarasmadan mentioned this issue Sep 27, 2024
14 tasks
@tarasmadan tarasmadan self-assigned this Sep 27, 2024
@tarasmadan
Collaborator Author

We have tools/syz-reprolist, which was created for this purpose. It currently uses the dashAPI and requires client names + access keys.

The proposal is to:

  1. Export required data as a jsonAPI. We're already doing it for bugs.
  2. Switch reprolist.go from dashapi to jsonAPI.
  3. Let reprolist.go download "upstream" namespace reproducers if called without any parameters.
  4. Add filter support to fetch only a subset, e.g. the reproducers for a specific subsystem.
  5. Switch reprolist.go authentication to "gcloud auth login".
  6. Let reprolist.go download reproducers from selected namespace.

@tarasmadan
Collaborator Author

@dvyukov @a-nogikh wdyt?

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

Do you consider doing just raw export, or something that does regression testing out-of-the-box?
I would assume that raw export won't be too useful for most users. They won't be able to use them, or will use them incorrectly.
An end-to-end solution that distros can use for testing should also include build/run wrappers that check the kernel config, run tests in parallel with timeouts, and monitor dmesg output for bugs, plus docs on how to use all of this.
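
A minimal sketch of the "run tests in parallel with timeouts" part of such a wrapper, assuming each reproducer has already been built into an executable. The names `run_one` and `run_all` are hypothetical and are not part of syzkaller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd, timeout_s):
    """Run one reproducer command; return 'exited', 'timeout', or 'error'."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return "exited"
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        # The binary is missing or not executable.
        return "error"


def run_all(cmds, timeout_s=10.0, workers=4):
    """Run commands in parallel; results are in the same order as cmds."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_one(c, timeout_s), cmds))
```

The returned status only says whether the process finished in time. It says nothing about whether the kernel reported a bug, which is why the kernel log still has to be checked separately.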

If we export them (which is required for export from non-public namespaces), then the current auth can work as well. "gcloud auth login" is a bit handier, but not a game changer. What would be a game changer is fully automated periodic export.

Also, there is an unresolved problem with C repros missing in lots of cases. syz-reprolist is slow and unreliable (may be broken already). I think we should keep C repros in datastore rather than re-create them.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

For filtering purposes we could also annotate exported reproducers with some metadata (subsystem, expected running time, bug type, etc). There will be lots of reproducers (tens of thousands), so users may want to run only some subsets of tests (faster ones, or only those for more critical bug types). A runner program could accept these filters and run the corresponding subsets.
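
A rough sketch of what filtering on such annotations could look like. The `ReproMeta` record and all of its field names are assumptions made for illustration; the thread only names the kinds of metadata, not a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReproMeta:
    # Hypothetical fields; no export format has been defined in this thread.
    path: str
    subsystem: str
    bug_type: str
    expected_seconds: float


def select(repros, subsystems=None, bug_types=None, max_seconds=None):
    """Return reproducers matching every filter that is set (None = no filter)."""
    out = []
    for r in repros:
        if subsystems is not None and r.subsystem not in subsystems:
            continue
        if bug_types is not None and r.bug_type not in bug_types:
            continue
        if max_seconds is not None and r.expected_seconds > max_seconds:
            continue
        out.append(r)
    return out
```

For example, `select(repros, subsystems={"net"}, max_seconds=5)` would keep only the quick networking reproducers.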

@tarasmadan
Collaborator Author

tarasmadan commented Sep 27, 2024

> Do you consider doing just raw export, or something that does regression testing out-of-the-box?

I want the user to get a collection of C reproducers like https://github.com/dvyukov/syzkaller-repros.

> What would be a game changer is fully automated periodic export.

What do you mean? I want every syz-reprolist call to create the latest snapshot.

@a-nogikh
Collaborator

> What would be a game changer is fully automated periodic export.

+1. Maybe even to some git repository exactly like it was done manually before.

> I think we should keep C repros in datastore rather than re-create.

But for older ones we'd still have to invoke older syz-prog2c versions, right? Or, probably, just ignore the syz repro bugs in this export? There are not too many of them.

@tarasmadan
Collaborator Author

> to some git repository exactly like it was done manually before

Pro:

  1. It offloads the traffic to git repo.
  2. It makes the results reachable for robots.
  3. Generally looks easier to do.
  4. Some access to the per-bug historical repro data out of the box.

Contra:

  1. What about private namespaces? More git repos?
  2. How to track usage?
  3. The filter based selection looks more complex.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

> What do you mean? I want every syz-reprolist call to create the latest snapshot.

Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

> But for older ones we'd still have to invoke older syz-prog2c versions, right? Or, probably, just ignore the syz repro bugs in this export? There are not too many of them.

Yes, either ignore, or upload once what we can easily recover.
syz-reprolist may run for days, but it's fine if done once.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

> What about private namespaces? More git repos?

I would export into a single repo all reproducers that were obtained on kernels with public source code.

> How to track usage?

Don't track. I'm not sure the raw number of API invocations is very important. Users may cache the result on their side, in which case the number will be low. Or they can pull it every minute, but what's the impact of that?

> The filter based selection looks more complex.

I would concentrate on end user use cases. This looks like a minor impl detail. Sacrificing user experience and adoption to avoid writing a few dozen lines of code does not look like a good tradeoff.

@tarasmadan
Collaborator Author

> > What do you mean? I want every syz-reprolist call to create the latest snapshot.
>
> Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.

Tens of thousands is doable if the benefits justify it.

6k_repros.tar.gz from https://github.com/dvyukov/syzkaller-repros is 28 megabytes.
But that repo is 2 years old. We have since added the filesystems... and want to scale fuzzing. It can take hundreds of megabytes in a few years.
Agree, git looks better from this perspective.
Combined with repro annotations it covers any scenario I can think about.

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

@gkennedy12 also periodically asks for updates (which unfortunately slipped through the cracks).

@tarasmadan
Collaborator Author

Thanks for the inputs. Let's try once more!
For every public namespace we want to mirror ReproC files from the datastore to some public git repository.

Something like this:

  • repo
    • upstream
      • bug1
        • repro1.c
        • repro2.c
    • android-6-1
      • bug1
        • repro1.c
        • repro2.c
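
The layout above could be produced by a small path helper along these lines. This is only a sketch: `repro_path` is a hypothetical name, and the `repro<N>.c` naming simply follows the example tree.

```python
from pathlib import PurePosixPath


def repro_path(root, namespace, bug_id, index):
    """Build <root>/<namespace>/<bug_id>/repro<index>.c; index is 1-based."""
    if index < 1:
        raise ValueError("index is 1-based")
    for part in (namespace, bug_id):
        # Reject components that would escape the intended directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    return PurePosixPath(root) / namespace / bug_id / f"repro{index}.c"
```

For instance, `repro_path("repo", "android-6-1", "bug1", 2)` gives `repo/android-6-1/bug1/repro2.c`.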

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

What's the use case for separating them by namespace?
We can also export from non-public but open-source kernels (that's what I used to do).

  • store all tentative C repros in the datastore
  • easy way to build, properly run, and monitor these reproducers

@tarasmadan
Collaborator Author

tarasmadan commented Sep 27, 2024

> and monitor these reproducers

What is it about?

@dvyukov
Collaborator

dvyukov commented Sep 27, 2024

> > and monitor these reproducers
>
> What is it about?

Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore actual bugs they provoked in the kernel, so tests look like passing.
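
A minimal sketch of that kind of check: scan the captured kernel log for lines that look like bug reports. The prefixes below are a few well-known kernel message markers chosen for illustration. They are not the set of patterns syzkaller itself matches, and `find_reports` is a hypothetical name.

```python
import re

# Illustrative subset of kernel report markers; a real checker needs many more.
CRASH_RE = re.compile(
    r"(BUG: |WARNING: |KASAN: |general protection fault|Kernel panic - not syncing)"
)


def find_reports(log_text):
    """Return the log lines that look like kernel bug reports."""
    return [line for line in log_text.splitlines() if CRASH_RE.search(line)]
```

A test run would then count as failed whenever `find_reports` returns a non-empty list for the log captured during the run, regardless of the reproducer's exit status.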

@syzbot-noreply

https://github.com/syzbot-noreply is now registered to perform the bot operations.

@tarasmadan
Collaborator Author

> https://github.com/syzbot-noreply is now registered to perform the bot operations.

It was me.

@dvyukov
Collaborator

dvyukov commented Sep 30, 2024

> Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore actual bugs they provoked in the kernel, so tests look like passing.

We have lots of the required logic in syzkaller already. It could be a new syz-manager/execprog mode. But on the other hand, it may complicate things for users. Not sure what's the right balance.

@tarasmadan
Collaborator Author

#5374 to continuously export the reproducers
