index.bs

<pre class='metadata'>
Title: Signature-based Integrity
Shortname: signature-based-sri
Level: none
Status: w3c/CG-DRAFT
Group: wicg
Repository: mikewest/signature-based-sri
URL: https://mikewest.github.io/signature-based-sri/
Editor: Mike West, Google LLC., mkwst@google.com
Abstract: 
    A monkey-patch spec that enhances SRI with signature-based
    integrity checks. These are conceptually similar to the
    content-based checks currently defined, but have different
    properties that seem interesting to explore.
Complain About: accidental-2119 yes, missing-example-ids yes
Markup Shorthands: markdown yes, css no
Toggle Diffs: true
</pre>
<pre class="anchors">
urlPrefix: https://www.rfc-editor.org/rfc/rfc9651; type: dfn; spec: RFC9651
    text: structured header; url: #name-introduction
    for: structured header
        text: token; url: #name-tokens
        text: list; url: #name-list
urlPrefix: https://w3c.github.io/webappsec-subresource-integrity/; spec: SRI
    type: dfn
        text: valid SRI hash algorithm token
urlPrefix: https://www.ietf.org/archive/id/draft-pardue-http-identity-digest-01.html; spec: ID.pardue-http-identity-digest
    type: http-header; 
        text: Identity-Digest; url: name-the-identity-digest-field
urlPrefix: https://www.rfc-editor.org/rfc/rfc9421.html; spec: RFC9421
    type: http-header; 
        text: Accept-Signature; url: name-the-accept-signature-field
        text: Signature-Input; url: name-the-signature-input-field
        text: Signature; url: name-the-signature-field
</pre>
<pre class="biblio">
{
  "PCIv4-SRI-Gaps": {
    "authors": [ "Yoav Weiss", "Ilya Grigorik" ],
    "href": "https://docs.google.com/document/d/1RcUpbpWPxXTyW0Qwczs9GCTLPD3-LcbbhL4ooBUevTM/edit?usp=sharing",
    "title": "PCIv4: SRI gaps and opportunities"
  },
  "ID.pardue-http-identity-digest": {
    "authors": [ "Lucas Pardue" ],
    "href": "https://www.ietf.org/archive/id/draft-pardue-http-identity-digest-01.html",
    "title": "HTTP Identity Digest"
  }
}
</pre>

Introduction {#intro}
=====================

Subresource Integrity [[SRI]] defines a mechanism by which developers can
ensure that script or stylesheet loaded into their pages' contexts are
_exactly_ those scripts or stylesheets the developer expected. By specifying
a SHA-256 hash of a resource's content, any malicious or accidental deviation
will be blocked before being executed. This is an excellent defense, but its
deployment turns out to be brittle. If the resource living at a specific URL
is dynamic, then content-based integrity checks require pages and the
resources they depend upon to update in lockstep. This turns out to be
~impossible in practice, which makes SRI less usable than it could be.

Particularly as the industry becomes more interested in supply-chain integrity
(see Shopify's [[PCIv4-SRI-Gaps]], for instance), it seems reasonable to explore
alternatives to static hashes that could allow wider deployment of these checks,
and therefore better understanding of the application experiences that
developers are _actually_ composing. 

This document outlines the changes that would be necessary to [[Fetch]], and
[[SRI]] in order to support the simplest version of a signature-based check:

<div class="example" id="basic-example">
    Pages will embed an Ed25519 public key assertion into `integrity`
    attributes:

    <xmp highlight="html">
      <script src="https://my.cdn/script.js"
              crossorigin="anonymous"
              integrity="ed25519-[base64-encoded-public-key]"
              ...></script>
    </xmp>

    Servers will deliver a signature over the resource content using the
    corresponding private key along with the resource as an HTTP message
    signature [[RFC9421]]:

    <xmp highlight="http">
        HTTP/1.1 200 OK
        Accept-Ranges: none
        Vary: Accept-Encoding
        Content-Type: text/javascript; charset=UTF-8
        Access-Control-Allow-Origin: *
        Identity-Digest: sha-512=:[base64-encoded digest of `console.log("Hello, world!");`]:
        Signature-Input: sig1=("identity-digest"); alg="Ed25519"; keyid="[base64-encoded public key]"
        Signature: sig1=:[base64-encoded result of Ed25519(`console.log("Hello, world!");`)]:

        console.log("Hello, world!");
    </xmp>

    The user agent will validate the signature using the expected public key
    before executing the response.

    That's it!
</div>

The goal here is to flesh out the proposal for discussion, recognizing that it
might be too simple to ship. Then again, it might be _just_ simple enough...

Signatures are not Hashes {#signatures-vs-hashes}
-------------------------------------------------

Subresource Integrity's existing hash-based checks ensure that specific, known
_content_ executes. It doesn't care who made the file or from which server it
was retrieved: as long as the content matches the expectation, we're good to
go. This gives developers the ability to ensure that a specific set of audited
scripts are the only ones that can execute in their pages, providing a strong
defense against some kinds of threats.

The signature-based checks described briefly above are different. Rather than
validating that a specific script or stylesheet is known-good, they instead
act as a proof of _provenance_ which ensures that scripts will only execute if
they're signed with a known private key. Assuming good key-management practices
(easy, right?), this gives a guarantee which is different in kind, but
similarly removes the necessity to trust intermediaries.

With these properties in mind, signature-based integrity checks aim to protect
against attackers who might be able to manipulate the content of resources that
a site depends upon, but who cannot gain access to the signing key.

Monkey Patches {#monkey-patches}
================================

Extending SRI to support signatures will require changes to three
specifications, along with some additional infrastructure.

Patches to SRI {#monkey-patch-sri}
----------------------------------

At a high level, we'll make the following changes to SRI:

1.  We'll define the accepted algorithm values. Currently, these are left up to
    user agents in order to allow for future flexibility: given that the years
    since SRI's introduction have left the set of accepted algorithms and their
    practical ordering unchanged, we should define that explicitly.

2.  With known algorithms, we can adjust the prioritization model to return a
    set of the strongest content-based and signature-based algorithms specified
    in a given element. This would enable developers to specify both a hash and
    signature expectation for a resource, ensuring both that known resources
    load, _and_ that they're accepted by a trusted party.
    
    ISSUE: This might not be necessary. It allows us to explain things like
    packaging constraints in ways that seem useful, but does introduce some
    additional complexity in developers' mental model. So, consider it a
    decision point.

3.  Finally, we'll adjust the matching algorithm to correctly handle signatures
    by passing the public key in to the comparison operation.

The following sections adjust algorithms accordingly.


<h4 id="parsing" algorithm>Parse |metadata|.</h4>

First, we'll define valid signature algorithms:

*   <ins>The <dfn>valid SRI signature algorithm token set</dfn> is the
    [=ordered set=] « "`ed25519`" » (corresponding to Ed25519 [[!RFC8032]]).</ins>

*   <ins>A string is a <dfn>valid SRI signature algorithm token</dfn> if its
    [=ASCII lowercase=] is [=set/contained=] in the
    [=valid SRI signature algorithm token set=].

Then, we'll adjust SRI's <dfn abstract-op>Parse |metadata|</dfn>. algorithm as
follows:

This algorithm accepts a string, and returns a map containing one set of hash
expressions whose hash functions are understood by the user agent, and one set
of signature expressions which are likewise understood:

1.  Let |result| be <del>the empty set</del><ins>the [=ordered map=]
      «[ "hashes" → « », "signatures" → « » ]».</ins>
2.  For each |item| returned by <a lt="strictly split">splitting</a>
    |metadata| on spaces:
    1.  Let |expression-and-options| be the result of
        <a lt="strictly split">splitting</a> |item| on U+003F (?).
    2.  Let |algorithm-expression| be |expression-and-options|[0].
    3.  Let |base64-value| be the empty string.
    4.  Let |algorithm-and-value| be the result of
        <a lt="strictly split">splitting</a> |algorithm-expression| on U+002D (-).
    5.  Let |algorithm| be |algorithm-and-value|[0].
    6.  If |algorithm-and-value|[1] <a for=list>exists</a>, set
        |base64-value| to |algorithm-and-value|[1].
    7.  <del>If |algorithm| is not a [=valid SRI hash algorithm token=], then [=iteration/continue=].</del>
    8.  Let <del>|metadata|</del><ins>|data|</ins> be the ordered map  «["alg" → |algorithm|, "val" → |base64-value|]».</del>
    9.  <del><a for=list>Append</a> |metadata| to |result|.</del>
    11. <ins>If |algorithm| is a [=valid SRI hash algorithm token=], then [=set/append=] |data| to |result|["`hashes`"].</ins>
    12. <ins>Otherwise, if |algorithm| is a [=valid SRI signature algorithm token=], then [=set/append=] |data| to |result|["`signatures`"].</ins>
3.  Return |result|.


<h4 id="matching" algorithm>Do |bytes| and |response| match |metadataList|?</h4>

Since we adjusted the result of [[#parsing]] above, we need to adjust the
matching algorithm to match. The core change will be processing both hashing
and signature algorithms: if only one kind is present, the story will be
similar to today, and multiple strong algorithms can be present, allowing
multiple distinct resources. If both hashing and signature algorithms are
present, both will be required to match. This is conceptually similar to
the [application of multiple Content Security Policies](https://w3c.github.io/webappsec-csp/#multiple-policies).

In order to validate signatures, we'll need to change Fetch to pass in the
relevant HTTP response header. For the moment, let's simply pass in the
entire [=/response=] (|response|), as that makes the integration with [[RFC9421]] somewhat
explicable:

1.  Let |parsedMetadata| be the result of executing [[SRI#parse-metadata]|] on |metadataList|.
2.  If both |parsedMetadata|<ins>["`hashes`"] and |parsedMetadata["`signatures`"]</ins> are [=set/empty=] set, return `true`.
3.  Let <del>|metadata|</del><ins>|hash-metadata|</ins> be the result of executing [[SRI#get-the-strongest-metadata]] on |parsedMetadata|<ins>["`hashes`"]</ins>.</a>.
4.  <ins>Let |signature-metadata| be the result of executing [[SRI#get-the-strongest-metadata]] on |parsedMetadata|["`signatures`"].</ins>
5.  <ins>Let |hash-match| be `true` if |hash-metadata| is [=list/empty=], and `false` otherwise.</ins>
6.  <ins>Let |signature-match| be `true` if |signature-metadata| is [=list/empty=], and `false` otherwise.</ins>
7.  For each |item| in <del>|metadata|</del><ins>|hash-metadata|</ins>:
    1.  Let |algorithm| be the |item|["alg"].
    2.  Let |expectedValue| be the |item|["val"].
    3.  Let |actualValue| be the result of [[SRI#apply-algorithm-to-response]] on |algorithm| and |bytes|.
    4.  If |actualValue| is a case-sensitive match for
        |expectedValue|, <del>return `true`</del><ins>set |hash-match| to `true` and [=iteration/break=].</ins>
8.  <ins>For each |item| in |signature-metadata|:</ins>
    1.  <ins>Let |algorithm| be the |item|["alg"].</ins>
    2.  <ins>Let |public key| be the |item|["val"].</ins>
    3.  <ins>Let |result| be the result of [$validating an integrity signature$]
        over |response| using |algorithm| and |public key|.</ins>
    4.  <ins>If |result| is `true`, set |signature-match| to `true` and [=iteration/break=].</ins>
9.  <del>Return `false`.</del><ins>Return `true` if both |hash-match| and |signature-match| are `true`. Otherwise return `false`.</ins>

<h4 id="validation" algorithm>Validate a signature over |response| using |algorithm| and |public key|</h4>

The matching algorithm above calls into a new signature validation function.
Let's write that down. At core, it will execute the Ed25519 validation steps
from [[RFC8032]], using signatures extracted from HTTP Message Signature
headers defined in [[RFC9421]].

To <dfn abstract-op lt="validating an integrity signature">validate an integrity signature</dfn>
over a [=/response=] |response|, [=string=] |algorithm|, and [=string=] |public key|, execute the
following steps. They return `valid` if the signature is valid, or `invalid` otherwise.

1.  If |algorithm| is not an [=ASCII case-insensitive=] match for "ed25519",
    then return "`invalid`".
2.  Verify an HTTP message signature as defined in
    [Section 3.2](https://www.rfc-editor.org/rfc/rfc9421.html#name-verifying-a-signature)
    of [[!RFC9421]], given |response| and the [=verification requirements for SRI=] given
    "`Ed25519`" and |public key|.
    
    Note: When we support more than one algorithm, we'll want to change this from a
    hard-coded string to a mapping between case-insensitive and canonical strings.
3.  If a signature can be validated, return "`valid`".
4.  Otherwise, return "`invalid`".


The HTTP Message Signature <dfn>verification requirements for SRI</dfn> are the
following, given a [=string=] |algorithm| and a [=string=] |public key|:

1.  The signature's covered components MUST include the [:Identity-Digest:] field.
    
    ISSUE: It might be a good idea to limit this to *only* including the single
    `Identity-Digest` header while we're gaining implementation experiences with
    the signature mechanism generally?

2.  The signature's parameters satisfy all of the following requirements:
    *   The `alg` parameter's value is |algorithm|.
    *   The `keyid` parameter's value is |public key|.

    ISSUE: It might be reasonable to require an `expires` parameter to give
    developers clear guidance about mitigating some of the risks described in
    [[#security-rollback]].

</ins>


Patches to Fetch {#monkey-patch-fetch}
--------------------------------------

The only change we need to make to Fetch is to pass additional information into the
matching algorithm as redefined above.

Step 22.3.1 of [[Fetch#main-fetch]] should be updated as follows:

1.  If <var ignore>bytes</var> do not match <var ignore>request</var>’s
    [=request/integrity metadata=]<ins> and <var ignore>response</var></ins>,
    then run processBodyError and abort these steps. [[!SRI]]


Deployment Scenarios {#deployment-scenarios}
=======================================

Signature-based SRI is meant to be a general primitive that can be used in a wide variety of ways that we can't possibly exhaustively document. But below we document a few different scenarios for how signature-based SRI can be used to enable new functionality for the web. 

Non-versioned third-party libraries {#deployment-scenario-3p}
-------------------------------------------

The web is built on composability and it is quite common to include JS from third-parties (e.g. analytics scripts or tools for [real user monitoring](https://en.wikipedia.org/wiki/Real_user_monitoring)). These scripts are often non-versioned to allow third-parties to continually update and improve these libraries. Signature-based SRI makes it possible to enable integrity validation for these libraries, to ensure that the included libraries are built and served in a trustworthy manner.

### Architectural Notes ### {#deployment-scenario-3p-notes}

In this deployment scenario, `third-party.com/library.js` would deploy signature-based SRI. `third-party.com` would then document that when including the library, reliant websites should specify `integrity="ed25519-[base64-encoded-public-key]"`. 

If `third-party.com` offers multiple different libraries for different purposes, it is recommended to use isolated keys for each library. This ensures that an attacker can't swap in `third-party.com/foo.js` for `third-party.com/bar.js`.

Protecting first-party libraries {#deployment-scenario-1p}
-------------------------------------------

An alternate deployment scenario is a site using this to protect first-party resources. In many cases, hash-based SRI can work well for first-party use cases. But, some sites have deploy processes where they deploy the main-page separately from subresources, in which case it is possible for any hashes specified in the main-page to become out of date with the contents of subresources. Signature-based SRI makes it possible to enable integrity validation for these first-party resources without adding any constraints on how web apps are deployed.


Deployment Considerations {#deployment}
=======================================

Key Management {#deployment-key-management}
-------------------------------------------

Key management is hard. This proposal doesn't change that.

It aims instead to be very lightweight. Perhaps it errs in that direction, but
the goal is to be the simplest possible mechanimsm that supports known
use-cases.

A different take on this proposal could be arbitrarily complex, replicating
aspects of the web PKI to chain trust, allow delegation, etc. That seems like
more than we need today, and substantially more work. Perhaps something small
is good enough?


Key Rotation {#deployment-key-rotation}
---------------------------------------

Since this design relies on websites pinning a specific public key in the
`integrity` attribute, this design does not easily support key rotation. If a
signing key is compromised, there is no easy way to rotate the key and ensure
that reliant websites check signatures against an updated public key.

For now, we think this is probably enough. If the key is compromised, the
security model falls back to the status quo web security model, meaning that
the impact of a compromised key is limited. In the future if this does turn
out to be a significant issue, we could also explore alternate designs that
do support key rotation. One simple proposal could be adding support for the
client to signal the requested public key in request headers, allowing
different parties to specify different public keys. A more complex proposal
could support automated key rotation.

Note: This proposal does support pinning multiple keys for a single
resource, so it will be possible to support rotation in a coordinated way
without requiring each entity to move in lockstep.


Security Considerations {#security}
===================================

Secure Contexts {#security-secure-context}
------------------------------------------

SRI does not require a secure context, nor does it apply only to resources
delivered via encrypted and authenticated channels. That means that it's
entirely possible to believe that SRI offers a level of protection that it
simply cannot aspire to. Signatures do not change that calculus.

Thus, it remains recommended that developers rely on integrity metadata only
within [=secure contexts=]. See also [[SECURING-WEB]].


Provenance, not Content {#security-provenance-not-content}
----------------------------------------------------------

Signatures do not provide any assurance that the content delivered is the
content a developer expected. They ensure only that the content was signed
by the expected entity. This could allow resources signed by the same
entity to be substituted for one another in ways that could violate developer
expectations.

In some cases, developers can defend against this confusion by using hashes
instead of signatures (or, as discussed above, both hashes *and* signatures).
Servers can likewise defend against this risk by minting fresh keys for each
interesting resource. This, of course, creates more key-management problems,
but it might be a reasonable tradeoff.


Rollback Attacks {#security-rollback}
-------------------------------------

The simple signature checks described in this document *only* provide proof of
provenance, ensuring that a given resource was at one point signed by someone
in posession of the relevant private key. It does not say anything about whether
that entity intended to deliver a given resource to you *now*. In other words,
these checks do not prevent rollback/downgrade attacks in which old, known-bad
versions of a resource might be delivered, along with their known signatures.

This might not be a problem, depending on developers' use cases. If it becomes
a problem, it seems possible to add mitigations in the future. These could take
various forms. For example:

1.  We could allow developers to require an "`expires`" parameter in the
    [:Signature-Input:] field, and adjust our [=verification requirements for SRI=]
    to enforce against it. Similarly, we could enforce a maximum age based on
    the inclusion of a "`created`" parameter.

2.  We could require additional components of the request to be included in the
    signature ("`content-type`", "`@path;req`", "`@method;req`", etc) in order
    to reduce the scope of potential resource substitution.

3.  We could allow developers to send a challenge along with the request (as
    an [:Accept-Signature:] parameter), and require that it be incorporated into
    the [:Signature-Input:]'s "`nonce`" parameter.

We'd want to evaluate the tradeoffs in these and other approaches (the last, for
example, makes offline signing difficult), of course, but they seem quite plausibly
valuable as future enhancements.


Privacy Considerations {#privacy}
=================================

Given that the validation of a response's signature continues to require the
response to opt-into legibility via CORS, this mechanism does not seem to add
any new data channels from the server to the client. The choice of private
key used to sign the resource is potentially interesting, but doesn't seem to
offer any capability that isn't possible more directly by altering the resource
body or headers.