feat: add provenance.jq #76

Merged
merged 6 commits into docker-library:main from feat-add-provenance
Sep 19, 2024

mrjoelkamp
Contributor

@mrjoelkamp mrjoelkamp commented Sep 9, 2024

summary

  • adds provenance.jq filter to generate in-toto provenance for GHA builds

usage

example usage:

json="$(
  jq -L.scripts '
    include "meta";
    .[env.BUILD_ID]
    | select(needs_build and .build.arch == env.BASHBREW_ARCH) # sanity check
    | .commands = commands
  ' builds.json
)"

image-digest() {
  local dir="$1"
  local manifest
  manifest="$dir/blobs/$(jq -r '.manifests[0].digest | sub(":"; "/")' "$dir/index.json")" || return "$?"
  jq -L.scripts -r -s '
    include "oci";
    . | image_digest("linux"; env.BASHBREW_ARCH)
  ' "$manifest" || return "$?"
}
digest=$(image-digest temp)

echo "$json" | jq -L.scripts --argjson github '${{ env.GITHUB_CONTEXT }}' --argjson runner '${{ toJson(runner) }}' --arg digest "$digest" '
  include "provenance";
  github_actions_provenance($github; $runner; $digest)
' >> provenance.json
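
For readers following the image-digest helper above: the `sub(":"; "/")` step turns a digest like `sha256:<hex>` into the relative blob path inside the OCI layout directory. A minimal shell sketch of the same mapping follows; `blob_path` is a hypothetical name used only for illustration and is not part of this PR.

```shell
# Hypothetical helper (not part of provenance.jq): map an OCI digest of the
# form "<algorithm>:<hex>" to its blob path inside an OCI layout directory.
blob_path() {
  # $1 = layout directory, $2 = digest such as "sha256:b58c16..."
  # "${2%%:*}" keeps the text before the first ":", "${2#*:}" the text after it.
  printf '%s/blobs/%s/%s\n' "$1" "${2%%:*}" "${2#*:}"
}

blob_path temp sha256:b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb
```

This assumes a well-formed digest containing exactly one `:`; the jq version in the usage example reads the digest from `index.json` rather than taking it as an argument.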

output

example output:

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "pkg:docker/notary:server-0.7.0?platform=linux%2Famd64",
      "digest": {
        "sha256": "b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb"
      }
    },
    {
      "name": "pkg:docker/notary:server?platform=linux%2Famd64",
      "digest": {
        "sha256": "b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb"
      }
    },
    {
      "name": "pkg:docker/amd64/notary:server-0.7.0?platform=linux%2Famd64",
      "digest": {
        "sha256": "b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb"
      }
    },
    {
      "name": "pkg:docker/amd64/notary:server?platform=linux%2Famd64",
      "digest": {
        "sha256": "b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb"
      }
    },
    {
      "name": "pkg:docker/oisupport/staging-amd64:96b43f70bdd4601f5dd42b5525f9818e4b834fde87a0311e28401b4c9a836ff4?platform=linux%2Famd64",
      "digest": {
        "sha256": "b58c16c79286924b4b96f11061c34498fe137db319aa3e5b606e309f492f2edb"
      }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://actions.github.io/buildtypes/workflow/v1",
      "externalParameters": {
        "inputs": {
          "bashbrewArch": "amd64",
          "buildId": "96b43f70bdd4601f5dd42b5525f9818e4b834fde87a0311e28401b4c9a836ff4",
          "firstTag": "notary:server-0.7.0"
        },
        "workflow": {
          "ref": "refs/heads/main",
          "repository": "https://github.com/mrjoelkamp/gha-test",
          "path": ".github/workflows/test.yml",
          "digest": {
            "sha256": "11d1f3b6c0efade1b907d0ff3ab41e960787fbf4"
          }
        }
      },
      "internalParameters": {
        "github": {
          "event_name": "workflow_dispatch",
          "repository_id": "712982891",
          "repository_owner_id": "2976326"
        }
      },
      "resolvedDependencies": [
        {
          "uri": "git+https://github.com/mrjoelkamp/gha-test@refs/heads/main",
          "digest": {
            "gitCommit": "11d1f3b6c0efade1b907d0ff3ab41e960787fbf4"
          }
        }
      ]
    },
    "runDetails": {
      "builder": {
        "id": "https://github.com/mrjoelkamp/gha-test/.github/workflows/test.yml@refs/heads/main"
      },
      "metadata": {
        "invocationId": "https://github.com/mrjoelkamp/gha-test/actions/runs/10780651714/attempts/1"
      }
    }
  }
}
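
The `subject` names in the example output all follow one visible pattern: `pkg:docker/<image reference>?platform=<os>%2F<arch>`. A small shell sketch that reproduces that shape is below. `subject_name` is a hypothetical helper, and the assumption that only `/` in the platform needs percent-encoding is inferred from the example output, not from the filter's source.

```shell
# Hypothetical helper mirroring the "name" fields in the example output.
# Assumption: "/" is the only character in the platform string that needs
# percent-encoding (true for values like "linux/amd64").
subject_name() {
  # $1 = image reference (e.g. "notary:server"), $2 = platform (e.g. "linux/amd64")
  printf 'pkg:docker/%s?platform=%s\n' "$1" "$(printf '%s' "$2" | sed 's,/,%2F,g')"
}

subject_name notary:server-0.7.0 linux/amd64
subject_name amd64/notary:server linux/amd64
```

Each subject entry pairs such a name with the same image digest, which is why the `sha256` value repeats across all five entries above.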

@tianon
Member

tianon commented Sep 10, 2024

I think the shape of this is probably roughly OK -- I've got a lot of minor comments, but I don't want the higher-level discussion to get lost in them, so I'll save them for now. 😄

I am concerned by how GitHub-centric it is. Perhaps slightly more accurately, I'm concerned that there aren't any obvious indications or guardrails or places to put a clear TODO for non-GitHub usage of code that is mostly generic. 👀

Do you have any thoughts on how/where this might change if we were to generate it from Jenkins also? Which values would need to change? To be clear, I don't think we need to do all the work for that, but IMO it's worth annotating at least the things we know are GitHub-only and/or somewhere in the file adding a note that makes it really clear this is GitHub specific for now, perhaps even naming the function github_actions_provenance or something instead? (Thinking through it more, that last one is probably enough to make me feel satisfied 😂)

Do you want me to go through nits here/now, or wait until we're closer to integrating before hitting the low-level/cosmetic stuff?

@mrjoelkamp
Contributor Author

... IMO it's worth annotating at least the things we know are GitHub-only and/or somewhere in the file adding a note that makes it really clear this is GitHub specific for now, perhaps even naming the function github_actions_provenance or something instead? (Thinking through it more, that last one is probably enough to make me feel satisfied 😂)

Good point! I added github to the functions that are specific to GHA provenance. If we define Jenkins worker provenance at some point we can add similar functions that are specific to that provenance statement.

Do you want me to go through nits here/now, or wait until we're closer to integrating before hitting the low-level/cosmetic stuff?

Sure! Whatever is more convenient for you, comments here, pairing or even just pushing changes to my branch (you should have write access). This is mostly a PoC to get things started but I would be delighted if we ended up using it!

@mrjoelkamp mrjoelkamp marked this pull request as ready for review September 11, 2024 15:12
@mrjoelkamp mrjoelkamp force-pushed the feat-add-provenance branch 2 times, most recently from 1fec599 to 11c49b5 Compare September 12, 2024 19:01
Collaborator

@whalelines whalelines left a comment

One question

Member

@tianon tianon left a comment

A few more specific comments to hopefully generate more discussion to fine-tune a few points before getting into nits like whitespace 🙈 ❤️

Member

@tianon tianon left a comment

I'm not completely done, and I think there's still some discussion points open (notably the high-level input) so I'm trying to avoid commenting in places that might change or go away based on those discussions, but here's a few more nit-picking notes before I go to lunch 👀

@mrjoelkamp
Contributor Author

mrjoelkamp commented Sep 18, 2024

I'm not completely done, and I think there's still some discussion points open (notably the high-level input) so I'm trying to avoid commenting in places that might change or go away based on those discussions, but here's a few more nit-picking notes before I go to lunch 👀

Sounds good! @LaurentGoderre and I addressed most of these comments, and we also refactored the high-level input to use the build object (for the buildId), which is an output of the build job that gets passed into the sign job, to hopefully resolve #76 (comment)

Remember that we have a working test environment, and some of the high-level changes aren't captured here but will be in a future PR with changes to build.yml. All of these changes are tested and verified working in GHA in a separate repo.

@tianon
Member

tianon commented Sep 19, 2024

I had more suggestions, but figured they might be easier to digest as explicit commits instead of a pile of interconnected suggestions (or prose / whatever it is that GitHub does with suggestion blocks that span a large number of lines):

  • 96f29bc

    Collapse more functions into main doc, add more trailing commas, minor calculation adjustments

    The primary justification for collapsing everything back into the main document (aside from there only being a single caller for each function) is that it makes verifying that we didn't miss anything easier -- as you scan the document, for each field that contains a "load-bearing" URL (first https://in-toto.io/Statement/v1, then https://slsa.dev/provenance/v1, and finally https://actions.github.io/buildtypes/workflow/v1), if you open the URL it describes the expected format, fields, and values, and with them all here and in-order, they're much easier to match up and validate to be correct and exhaustive. Granted, that will change over time as we shove more and more (optional) data into this document so that it includes a more complete picture, but for now, this makes it really easy to double-check our work (and the end result is no less organized; for example, the externalParameters are still all grouped together under a suitable heading describing what their purpose is).

    I also made some minor changes to the way values were calculated, especially in the workflow block, but very related to the above justification: now the way we calculate the values matches the way they're described in https://actions.github.io/buildtypes/workflow/v1 (specifically using the exact fields parsed in the exact ways they suggest). We will probably deviate from that over time (as suggested by a new "TODO" comment I included), but at least this way our baseline matches theirs and the delta will be easier to track.

    Additionally, I removed the (env.GITHUB_CONTEXT | fromjson) as $github line from here, because I think that's more appropriate behavior for the caller (and added back the explicit function arguments). This will be more clearly meaningful in my follow-up commit adding a basic test.

  • ad28b37

    Add a basic test of the provenance (both amd64 and windows, to illustrate/validate the inputs variance)

You can also view them via 24e679e...ad28b37 or steal them directly via fetch-by-commit or https://github.com/infosiftr/doi-meta-scripts/tree/tianon-provenance if you find them unobjectionable as-is. (Of course, happy to discuss further, explain more/better, push directly, adjust, etc etc.)

@mrjoelkamp
Contributor Author

You can also view them via 24e679e...ad28b37 or steal them directly via fetch-by-commit or https://github.com/infosiftr/doi-meta-scripts/tree/tianon-provenance if you find them unobjectionable as-is. (Of course, happy to discuss further, explain more/better, push directly, adjust, etc etc.)

@tianon 🙇 thanks for making the time to take a pass on this! I pushed your commits since I don't really have any objections and like the way it turned out.

In d9b5735, I also added the runner context so that runner.environment is no longer hardcoded.

I didn't realize there was a whole test framework here 😆 , appreciate the addition and updated it with the runner context.

@mrjoelkamp
Contributor Author

mrjoelkamp commented Sep 19, 2024

I also made some minor changes to the way values were calculated, especially in the workflow block, but very related to the above justification: now the way we calculate the values matches the way they're described in https://actions.github.io/buildtypes/workflow/v1 (specifically using the exact fields parsed in the exact ways they suggest). We will probably deviate from that over time (as suggested by a new "TODO" comment I included), but at least this way our baseline matches theirs and the delta will be easier to track.

I might want to revert the workflow block back to what we had before, specifically because it really should represent the workflow claims. The implementation in https://github.com/actions/attest-build-provenance, for example, also uses github.workflow_ref as the basis for the workflow metadata:

    const [workflowPath, workflowRef] = claims.workflow_ref
        .replace(`${claims.repository}/`, '')
        .split('@');

...

                externalParameters: {
                    workflow: {
                        ref: workflowRef,
                        repository: `${serverURL}/${claims.repository}`,
                        path: workflowPath
                    }
                },

I believe this is more accurate despite the example given in the build type definition.

edit: fixed in 03048f2
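
For anyone comparing against the shell side of this repo, the `workflow_ref` split quoted above can be sketched in POSIX shell as follows. The function and variable names are hypothetical, and the sample values are taken from the example output earlier in this PR. One difference from the TypeScript: this strips `<repository>/` only as a leading prefix, whereas `String.replace` replaces the first occurrence wherever it appears.

```shell
# Hypothetical helper: split a value shaped like github.workflow_ref
# ("<owner>/<repo>/<path>@<ref>") into the workflow path and its ref.
split_workflow_ref() {
  # $1 = repository ("<owner>/<repo>"), $2 = workflow_ref
  rest="${2#"$1/"}"              # drop the leading "<owner>/<repo>/"
  workflow_path="${rest%%@*}"    # everything before the first "@"
  workflow_git_ref="${rest#*@}"  # everything after the first "@"
}

split_workflow_ref mrjoelkamp/gha-test \
  mrjoelkamp/gha-test/.github/workflows/test.yml@refs/heads/main
echo "path=$workflow_path ref=$workflow_git_ref"
```

The function sets two global shell variables instead of printing, so that a caller can use both parts without re-parsing the output.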

@tianon
Member

tianon commented Sep 19, 2024

The problem is we don't have a workflow_repository variable, so we're still relying on the context being the same so it didn't feel worthwhile IMO to only fix that dissonance halfway 🤔

@mrjoelkamp
Contributor Author

The problem is we don't have a workflow_repository variable, so we're still relying on the context being the same so it didn't feel worthwhile IMO to only fix that dissonance halfway 🤔

hmm, yeah. This is not ideal.

The key thing for me here is that github.ref is the commit or tag that triggered the workflow and github.workflow_ref could point to a workflow on a different commit or tag.

Although, if we are limiting this to workflow_dispatch events, I don't think this is even a possible scenario since workflow_ref would always be equivalent to github.ref https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#create-a-workflow-dispatch-event

Having talked through it, I think I'll go back to what you had for this and then we can avoid #76 (comment)
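
If the "refs are always the same for workflow_dispatch" assumption ever needs to be checked rather than trusted, a guard along these lines could sit in the calling workflow. This is only a sketch: `ref_matches_workflow_ref` is a hypothetical name, nothing like it exists in this PR, and the sample values come from the example output above.

```shell
# Hypothetical guard: succeed only when github.ref equals the ref portion
# (the text after "@") of github.workflow_ref.
ref_matches_workflow_ref() {
  # $1 = github.ref, $2 = github.workflow_ref ("<owner>/<repo>/<path>@<ref>")
  [ "$1" = "${2#*@}" ]
}

if ref_matches_workflow_ref refs/heads/main \
    mrjoelkamp/gha-test/.github/workflows/test.yml@refs/heads/main; then
  echo "refs agree"
else
  echo "refs differ" >&2
fi
```

It only compares refs; as noted above, there is no equivalent variable for the workflow's repository, so that half of the dissonance would remain unchecked.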

Co-authored-by: Tianon Gravi <[email protected]>
Member

@tianon tianon left a comment

👀

@LaurentGoderre LaurentGoderre merged commit 3f90094 into docker-library:main Sep 19, 2024
1 check passed
@mrjoelkamp mrjoelkamp deleted the feat-add-provenance branch September 19, 2024 20:41
4 participants