Users need better access to reproducers. #5345

tarasmadan · 2024-09-27T08:31:51Z

Motivation:

LPC customers asked for better access to reproducers.
Distros need them to improve their own CIs.
github.com/ksteuck wants to fetch reproducers selected by a filter.

tarasmadan · 2024-09-27T08:55:03Z

We already have tools/syz-reprolist for this purpose. It currently uses the dashapi and requires client names and access keys.

The proposal is to:

- Export the required data via a JSON API. We're already doing this for bugs.
- Switch reprolist.go from the dashapi to the JSON API.
- Let reprolist.go download "upstream" namespace reproducers if called without any parameters.
- Add filter support to get only the reproducers for specific subsystems, etc.
- Switch reprolist.go authentication to "gcloud auth login".
- Let reprolist.go download reproducers from a selected namespace.
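A minimal sketch of the proposed invocation behavior, assuming hypothetical `-namespace` and `-subsystem` flags (illustrative names, not syz-reprolist's actual interface): called without arguments it falls back to the "upstream" namespace, and an optional comma-separated subsystem filter narrows the selection.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// Options holds hypothetical selection parameters for a reproducer download.
type Options struct {
	Namespace  string
	Subsystems []string // empty means "no subsystem filter"
}

// parseOptions parses hypothetical flags; with no arguments it defaults
// to the "upstream" namespace and no subsystem filter.
func parseOptions(args []string) (Options, error) {
	fs := flag.NewFlagSet("syz-reprolist", flag.ContinueOnError)
	namespace := fs.String("namespace", "upstream", "namespace to download reproducers from")
	subsystems := fs.String("subsystem", "", "comma-separated subsystems to keep")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	opts := Options{Namespace: *namespace}
	for _, s := range strings.Split(*subsystems, ",") {
		if s = strings.TrimSpace(s); s != "" {
			opts.Subsystems = append(opts.Subsystems, s)
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("namespace=%s subsystems=%v\n", opts.Namespace, opts.Subsystems)
}
```

Defaulting in the flag definition keeps the zero-argument case identical to an explicit `-namespace upstream`.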

tarasmadan · 2024-09-27T08:59:53Z

@dvyukov @a-nogikh wdyt?

dvyukov · 2024-09-27T09:25:15Z

Do you consider doing just raw export, or something that does regression testing out-of-the-box?
I would assume that raw export won't be too useful for most users. They won't be able to use them, or will use them incorrectly.
An end-to-end solution that distros can use for testing should also include build/run wrappers that check the kernel config, run tests in parallel with timeouts, and monitor dmesg output for bugs, plus docs on how to use all of this.

If we export them (which is required for export from non-public namespaces), then the current auth can work as well. "gcloud auth login" is a bit handier, but not a game changer. What would be a game changer is fully automated periodic export.

Plus, there is an unresolved problem with missing C repros in lots of cases. syz-reprolist is slow and unreliable (it may be broken already). I think we should keep C repros in the datastore rather than re-create them.

dvyukov · 2024-09-27T09:30:09Z

For filtering purposes we could also annotate exported reproducers with some metadata (subsystem, expected running time, bug type, etc.). There will be lots of reproducers (tens of thousands), so users may want to run only some subsets of tests (faster ones, or ones for more critical bug types). A runner program could accept these filters and run the corresponding subsets.

tarasmadan · 2024-09-27T09:36:09Z

> Do you consider doing just raw export, or something that does regression testing out-of-the-box?

I want the user to get a collection of C reproducers like https://github.com/dvyukov/syzkaller-repros.

> What would be a game changer is fully automated periodic export.

What do you mean? I want every syz-reprolist call to create the latest snapshot.

a-nogikh · 2024-09-27T09:38:18Z

> What would be a game changer is fully automated periodic export.

+1. Maybe even to some git repository exactly like it was done manually before.

> I think we should keep C repros in the datastore rather than re-create them.

But for older ones we'd still have to invoke older syz-prog2c versions, right? Or perhaps just ignore the bugs with only syz repros in this export? There are not too many of them.

tarasmadan · 2024-09-27T09:48:11Z

> to some git repository exactly like it was done manually before

Pros:

- It offloads the traffic to the git repo.
- It makes the results reachable for robots.
- It generally looks easier to do.
- Out-of-the-box access to per-bug historical repro data.

Cons:

- What about private namespaces? More git repos?
- How to track usage?
- The filter-based selection looks more complex.

dvyukov · 2024-09-27T10:00:51Z

> What do you mean? I want every syz-reprolist call to create the latest snapshot.

Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.

dvyukov · 2024-09-27T10:01:55Z

> But for older ones we'd still have to invoke older syz-prog2c versions, right? Or perhaps just ignore the bugs with only syz repros in this export? There are not too many of them.

Yes, either ignore them, or upload once whatever we can easily recover.
syz-reprolist may run for days, but that's fine if it is done once.

dvyukov · 2024-09-27T10:06:52Z

> What about private namespaces? More git repos?

I would export into a single repo all reproducers that were obtained on kernels with public source code.

> How to track usage?

Don't track. I'm not sure the raw number of API invocations is very important. Users may cache the results on their side, in which case the number will be low. Or they may pull every minute, but what would the impact of that be?

> The filter-based selection looks more complex.

I would concentrate on end-user use cases. This looks like a minor implementation detail. Saving several dozen lines of code at the expense of user experience and adoption does not look like a good tradeoff.

tarasmadan · 2024-09-27T10:18:09Z

>> What do you mean? I want every syz-reprolist call to create the latest snapshot.
>
> Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.

Tens of thousands is doable if the benefits are good.

6k_repros.tar.gz from https://github.com/dvyukov/syzkaller-repros is 28 megabytes.
But that repo is 2 years old. We have since added filesystems... and want to scale fuzzing. It could grow to hundreds of megabytes in a few years.
Agreed, git looks better from this perspective.
Combined with repro annotations, it covers every scenario I can think of.
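A back-of-the-envelope extrapolation of the archive size from the figures above. It assumes "6k" means roughly 6,000 reproducers and that the average compressed size per reproducer stays constant; filesystem reproducers embed image data, so real growth could be larger.

```go
package main

import "fmt"

// Figures from this thread: 6k_repros.tar.gz holds roughly 6,000 reproducers in 28 MB.
const (
	sampleRepros = 6000
	sampleMB     = 28.0
)

// estimateMB linearly extrapolates the compressed archive size for n reproducers,
// assuming the average compressed size per reproducer does not change.
func estimateMB(n int) float64 {
	return sampleMB / sampleRepros * float64(n)
}

func main() {
	for _, n := range []int{10000, 50000, 100000} {
		fmt.Printf("%6d repros: about %.0f MB\n", n, estimateMB(n))
	}
}
```

Under these assumptions, 50,000 reproducers come to roughly 233 MB, which is consistent with the "hundreds of megabytes" estimate.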

dvyukov · 2024-09-27T11:24:16Z

@gkennedy12 also periodically asks for updates (which unfortunately slipped through the cracks).

tarasmadan · 2024-09-27T11:55:59Z

Thanks for the input. Let's try once more!
For every public namespace, we want to mirror ReproC files from the datastore to some public git repository.

Something like this:

repo
- upstream
  - bug1
    - repro1.c
    - repro2.c
- android-6-1
  - bug1
    - repro1.c
    - repro2.c

dvyukov · 2024-09-27T12:11:19Z

What's the use case for separating them by namespace?
We can also export from non-public but open-source kernels (that's what I used to do).

- store all tentative C repros in the datastore
- an easy way to build, properly run, and monitor these reproducers

tarasmadan · 2024-09-27T12:16:22Z

> and monitor these reproducers

What is it about?

dvyukov · 2024-09-27T12:54:10Z

>> and monitor these reproducers
>
> What is it about?

Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore the actual bugs they provoke in the kernel, so the tests appear to pass.

syzbot-noreply · 2024-09-30T09:57:11Z

https://github.com/syzbot-noreply is now registered to perform the bot operations.

tarasmadan · 2024-09-30T10:01:42Z

> https://github.com/syzbot-noreply is now registered to perform the bot operations.

It was me.

dvyukov · 2024-09-30T10:03:07Z

> Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore the actual bugs they provoke in the kernel, so the tests appear to pass.

We already have much of the required logic in syzkaller. It could be a new syz-manager/execprog mode. On the other hand, it may complicate things for users. Not sure what the right balance is.

tarasmadan · 2024-10-09T11:27:55Z

#5374 to continuously export the reproducers

tarasmadan mentioned this issue Sep 27, 2024

all: lpc24 requests #5331


tarasmadan self-assigned this Sep 27, 2024
