-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add machine-readable patch to fix script injections in workflows #4218
base: main
Are you sure you want to change the base?
Conversation
Note: this feature is large enough it won't make the v5.0.0 cutoff, but excited to take a look later |
Thanks for the PR, I'll try to take a more in-depth look tomorrow but a few questions now based only on your PR description:
My initial thoughts were around clobbering the environment variables, but it seems like you have a lot of test cases that deal with these scenarios. So I'll have to wait until my deep dive review tomorrow.
This is a really cool validation step! Nicely done. Questions for you
Open question responses
In the
Until something else wants to use it, I'd say don't worry about where it could live. I'd say a good practice is marking the code as So making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial thoughts, ran out of review time for today
Sorry, I'd missed these questions before.
Those hashes aren't relevant; the resulting patch could be applied at any time, at any stage of the repository. In fact, those hashes aren't even for the actual repository, they're the hashes for the "in-memory" repository used to generate the diff.
I don't really know how expensive this will be in the worst case. But on the vast majority of cases it'll be a no-op, since most projects don't have workflows vulnerable to script injection. Looking at the latest BQ data, out of the 1.2M projects scanned, it only found ~2.5k workflows with script injections, each of which likely only has one or two vulnerabilities. But if we were to try to fix a "malicious" workflow with hundreds of script injections... yeah, I don't know how long that'd take (I'd still guess not too long, though?).
Done. PTAL. I added documentation to
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good thanks. With regard to efficiency, the cron has other bottle necks, so I don't think this will be an issue. But if it is, we can always profile and revisit.
I tried using --format sarif with the probe, but the SARIF came out empty, so I don't know if SARIF and probes are integrated yet
The magic incantation:
ENABLE_SARIF=1 go run main.go \
--local=. --checks Dangerous-Workflow --show-details \
--format sarif --policy ../scorecard-action.git/policies/template.yml| jq
}) | ||
findings = append(findings, *f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm conflicted here, because we add the finding, and then mutate a copy later. but it works because Remediation is a pointer. But if we move this append later, it complicates the error handling (all of your continue
s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the issue? Would you prefer that we only append the finding once it's "finished"?
If so, I can extract the if curr != wp
logic (which has all those continue
s) into a separate function, simplifying the error handling:
for _, e := range r.Workflows {
// ...
f = f.WithLocation(...)
err = parseWorkflow(...)
if err == nil {
// generate the patch and store in finding
}
findings = append(findings, *f)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the issue?
The issue is the probe returns a []finding
. As written, the code adds a copy of the finding to the slice when we deference the pointer append(findings, *f)
, but then we go on to modify the original and the copy in slice doesn't change.
The fact that this works as written is accidental in my opinion, since both copies have a pointer to the same finding.Remediation
object.
Would you prefer that we only append the finding once it's "finished"?
Yes, that's the outcome I'd like to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL.
probes/hasDangerousWorkflowScriptInjection/internal/patch/impl.go
Outdated
Show resolved
Hide resolved
also DCO and |
/scdiff generate Dangerous-Workflow |
…-fix create environment for patch on DW script injections Signed-off-by: Diogo Teles Sant'Anna <[email protected]> Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
…-with-remediation-output Include the generated patch in the output Signed-off-by: Joyce Brum <[email protected]> Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
…erate-patch Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Git diff created using hexops/gotextdiff, WHICH IS ARCHIVED. It is unfortunately the only package I found which could do it. To be discussed with Scorecard maintainers whether it's worth it. Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- Test patchWorkflow instead of GeneratePatch. This avoids the complication of comparing diff files; we can instead simply compare the output workflow to an expected "fixed" workflow. - Examples with multiple findings must have separate "fixed" workflows for each finding, not a single file which covers all findings - Instead of hard-coding the finding details (snippet, line position), run raw.DangerousWorkflow() to get that data automatically. This does make these tests a bit more "integration-test-like", but makes them substantially easier to maintain. Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- misc refactors - use go-git to generate diff - Most functions now return errors instead of bools. This can be later used for simpler logging - Existing environment variables are now detected by parsing the files as GH workflows. This is WIP to handle existing envvars in our patches. - Remove instances of C-style for-loops, unnecessarily dangerous! - Fixed proper detection of existing env, handling blank lines and comments. Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- Fix inconsistencies between original and "fixed" versions - Store multiple "fixed" workflows for tests with multiple findings. Each "fixed" workflow fixes a single finding. The files are numbered according to the order in which the findings are found by moving down the file. - allKindsOfUserInput removed. Would require too many "fixed" workflows to test. The behavior can be tested more directly. Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- If an envvar with our name and value already existed but simply wasn't used, the patch no longer duplicates it. - After the patched workflow is created, we validate that it is valid. Or, at least did not introduce any syntax errors that were not present in the original workflow. Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- Create helper function `readWorkflow` - Improved error handling in case of failed workflow validation - Allow the declaration of duplicate findings (cases where 2+ findings have the same patch) Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
- Simplify use of unsafePatterns - Replaced boolean returns with errors, for easier log/debugging - Improved documentation - Changes to satisfy linter, adoption of 120-char line limit Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
Signed-off-by: Pedro Kaj Kjellerup Nacht <[email protected]>
What kind of change does this PR introduce?
What is the current behavior?
Findings have a
.Remediation.Patch
field which is meant to contain a machine-readable patch fixing that particular finding:scorecard/finding/probe.go
Lines 50 to 52 in 3155309
This field currently isn't used by any Scorecard probe.
What is the new behavior (if this is a feature change)?**
This PR adds a machine-readable patch (following the "unified diff" format which users can then apply using
patch
orgit apply
) fixing allhasDangerousWorkflowScriptInjection
findings.Each finding is patched by creating a global environment variable that wraps the dangerous GitHub variable and replacing that GitHub variable by the envvar in the relevant
run
command. That is:Or, on a real example:
Where the patch is:
The patched workflow is then validated by parsing it with
actionlint
. As long as the patch added no new parsing errors, it is accepted.Which issue(s) this PR fixes
Fixes #3950
Special notes for your reviewer
pkg/scorecard_result.populateRawResults
.actionlint.Parse()
because it loses style information (i.e. whitespace, order of elements, etc). We must therefore parse the workflow ourselves to ensure we make minimal patches that follow the original file's style as best we can. However, the logic does use theactionlint.Workflow
when possible (which is only to read any existing environment variables).env
must be created:jobs:
labeljobs:
label. The number of lines matches the number between thejobs:
label and the end of the preceding block in the original workflow.sclog.NewLogger(WarnLevel)
?).Known limitations:
This would require a full parsing of the entire workflow to understand precisely which step, in which job, the finding is flagging, which environment variables exist at that step, etc. Given that such name collisions seem exceedingly rare, the current "basic" implementation seems sufficient, in my opinion.
$FOO
, butenv.foo
(i.e. in a more complex GitHub variable expansion) orprocess.env.foo
(i.e. when usingactions/github-script
). The current implementation does not handle these situations properly, and always uses$FOO
.Open questions:
checks.yml/md
?remediation/patch.go
?Does this PR introduce a user-facing change?
Thanks to @joycebrum and @diogoteles08 who helped come up with the tests and the logic to integrate with
hasDangerousWorkflowScriptInjection.Run
.