diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index e23061ddb..61aa6b568 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -595,11 +595,7 @@ The following response types require an explicit opt-in, can only be requested w - Raw Block (`?format=raw`) - Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw). - CAR (`?format=car`) - - A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and :cite[trustless-gateway]. - - **Note:** by default, block order in CAR response is not deterministic, - blocks can be returned in different order, depending on implementation - choices (traversal, speed at which blocks arrive from the network, etc). - An opt-in ordered CAR responses MAY be introduced in a future IPIP. + - A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and Section 5 (CAR Responses) at :cite[trustless-gateway]. - TAR (`?format=tar`) - Deserialized UnixFS files and directories as a TAR file or a stream, see :cite[ipip-0288]. 
- IPNS Record diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 4a8632898..949e2b0bf 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -13,6 +13,10 @@ editors: - name: Henrique Dias github: hacdias url: https://hacdias.com/ +xref: + - url + - path-gateway + - ipip-0412 tags: ['httpGateways', 'lowLevelHttpGateways'] order: 1 --- @@ -25,11 +29,11 @@ The minimal implementation means: - response type is always fully verifiable: client can decide between a raw block or a CAR stream - no UnixFS/IPLD deserialization -- for CAR files: - - the behavior is identical to :cite[path-gateway] - for raw blocks: - data is requested by CID, only supported path is `/ipfs/{cid}` - no path traversal or recursive resolution +- for CAR files: + - the pathing behavior is identical to :cite[path-gateway] # HTTP API @@ -37,11 +41,11 @@ A subset of "HTTP API" of :cite[path-gateway]. ## `GET /ipfs/{cid}[/{path}][?{params}]` -Downloads verifiable data for the specified **immutable** content path. +Downloads verifiable, content-addressed data for the specified **immutable** content path. -Optional `path` is permitted for requests that specify CAR format (`application/vnd.ipld.car`). +Optional `path` is permitted for requests that specify CAR format (`?format=car` or `Accept: application/vnd.ipld.car`). -For RAW requests, only `GET /ipfs/{cid}[?{params}]` is supported. +For block requests (`?format=raw` or `Accept: application/vnd.ipld.raw`), only `GET /ipfs/{cid}[?{params}]` is supported. ## `HEAD /ipfs/{cid}[/{path}][?{params}]` @@ -49,7 +53,7 @@ Same as GET, but does not return any payload. ## `GET /ipns/{key}[?{params}]` -Downloads data at specified IPNS Key. Verifiable :cite[ipns-record] can be requested via `?format=ipns-record` +Downloads data at the specified IPNS Key. A verifiable :cite[ipns-record] can be requested via `?format=ipns-record` or `Accept: application/vnd.ipfs.ipns-record`.
## `HEAD /ipns/{key}[?{params}]` @@ -63,17 +67,26 @@ Same as in :cite[path-gateway], but with limited number of supported response ty ### `Accept` (request header) -This HTTP header is required when running in a strict, trustless mode. +A Client SHOULD send this HTTP header to leverage content type negotiation +based on section 12.5.1 of :cite[rfc9110]. + +The following response types MUST be supported: -Below response types MUST to be supported: -- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – requests a single, verifiable raw block to be returned +- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) + - A single, verifiable raw block to be returned. -Below response types SHOULD to be supported: -- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned -- [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] (multicodec `0x0300`). +The following response types SHOULD be supported: -Gateway SHOULD return HTTP 400 Bad Request when running in strict trustless -mode (no deserialized responses) and `Accept` header is missing. +- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) + - Disables IPLD/IPFS deserialization and requests a verifiable CAR stream to be + returned. Implementations MAY support optional CAR content type parameters + (:cite[ipip-0412]) and the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request). + +- [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) + - A verifiable :cite[ipns-record] (multicodec `0x0300`).
+ +A Gateway SHOULD return HTTP 400 Bad Request when running in strict trustless +mode (no deserialized responses) and `Accept` header is missing. ## Request Query Parameters @@ -113,7 +126,7 @@ When the terminating entity at the end of the specified content path: specified byte range of that entity. - When dealing with a sharded UnixFS file (`dag-pb`, `0x70`) and a non-zero - `from` value, the UnixFS data and `blocksizes` determine the + `from` value, the UnixFS data and `blocksizes` determine the corresponding starting block for a given `from` offset. - cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON @@ -150,14 +163,14 @@ that includes enough blocks for the client to understand why the requested returned: - If the requested `entity-bytes` resolves to a range that partially falls - outside of the entity's byte range, the response MUST include the subset of + outside the entity's byte range, the response MUST include the subset of blocks within the entity's bytes. - This allows clients to request valid ranges of the entity without needing to know its total size beforehand, and it does not require the Gateway to buffer the entire entity before returning the response. - If the requested `entity-bytes` resolves to a zero-length range or falls - fully outside of the entity's bytes, the response is equivalent to + fully outside the entity's bytes, the response is equivalent to `dag-scope=block`. - This allows client to produce a meaningful error (e.g, in case of UnixFS, leverage `Data.blocksizes` information present in the root `dag-pb` block). @@ -180,41 +193,45 @@ Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gatew MUST be returned and include additional format-specific parameters when possible. -If a CAR stream was requested, the response MUST include the parameter specifying CAR version. 
-For example: `Content-Type: application/vnd.ipld.car; version=1` +If a CAR stream was requested: +- the response MUST include the parameter specifying CAR version. For example: + `Content-Type: application/vnd.ipld.car; version=1` +- the response SHOULD include additional content type parameters, as noted in + [CAR format signaling in Response](#car-format-signaling-in-response). ### `Content-Disposition` (response header) MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser. -## Response Payload - -### Block Response +# Block Responses (application/vnd.ipld.raw) An opaque bytes matching the requested block CID ([application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)). The Body hash MUST match the Multihash from the requested CID. -### CAR Response +# CAR Responses (application/vnd.ipld.car) A CAR stream for the requested [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) -content type, path and optional `dag-scope` and `entity-bytes` URL parameters. +content type (with optional `order` and `dups` params), path and optional +`dag-scope` and `entity-bytes` URL parameters. -#### CAR version +## CAR version Value returned in [`CarV1Header.version`](https://ipld.io/specs/transport/car/carv1/#header) field MUST match the `version` parameter returned in `Content-Type` header. -#### CAR roots +## CAR roots The behavior associated with the [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field is not currently specified. -Clients MAY ignore it. +The lack of a standard here means a client MUST assume different Gateways could return a different value. + +A Client SHOULD ignore this field.
:::issue @@ -222,27 +239,148 @@ As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved ::: -#### CAR determinism +## CAR `order` (content type parameter) + +The `order` parameter allows clients to specify the desired block order in the +response. It supports the following values: + +- `dfs`: [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search) + order, which enables streaming responses with minimal memory usage. +- `unk` (or missing): Unknown order, which serves as the implicit default when the `order` + parameter is unspecified. In this case, the client cannot make any assumptions + about the block order: blocks may arrive in a random order or be a result of + a custom DAG traversal algorithm. + +A Gateway SHOULD always return an explicit `order` in CAR's `Content-Type` response header. + +A Gateway MAY skip `order` in the CAR response if no order was explicitly requested +by the client and the default order is unknown. + +A Client MUST assume implicit `order=unk` when `order` is missing, unknown, or empty. + +## CAR `dups` (content type parameter) + +The `dups` parameter specifies whether duplicate blocks (the same block +occurring multiple times in the requested DAG) will be present in the CAR +response. This is useful when a deterministic block order is used. + +It accepts two values: +- `y`: Duplicate blocks MUST be sent every time they occur during the DAG walk. +- `n`: Duplicate blocks MUST be sent only once. + +When set to `y`, light clients are able to discard blocks after +reading them, removing the need for caching in-memory or on-disk. + +Setting to `n` allows for more efficient data transfer of certain types of +data, but introduces additional resource cost on the receiving end, as each +block needs to be kept around in case its CID appears again. + +If the `dups` parameter is absent from the `Accept` request header, the +behavior is unspecified.
In such cases, a Gateway should respond with `dups=n` +if it has control over the duplicate status, or without the `dups` parameter if it +does not. +Defaulting to the inclusion of duplicate blocks (`dups=y`) SHOULD only be +implemented by Gateway systems that exclusively support `dups=y` and do not +support any other behavior. + +A Client MUST NOT assume any implicit behavior when `dups` is missing. + +If the `dups` parameter is absent from the `Content-Type` response header, the +behavior is unspecified, and the CAR response includes an arbitrary list of +blocks. In this unknown state, the client MUST assume duplicates are not sent, +but also MUST ignore duplicates and other unexpected blocks if they are present. + +A Gateway MUST always return `dups` in the `Content-Type` response header +when the duplicate status is known at the time of processing the request. +A Gateway SHOULD NOT return `dups` if determining the duplicate status is not +possible at the time of processing the request. + +A Gateway MUST NOT include virtual blocks identified by identity CIDs +(multihash with `0x00` code) in CAR responses. This exclusion applies regardless +of their presence in the DAG or the value assigned to the `dups` parameter, as +the raw data is already present in the parent block that links to the identity +CID. -The default CAR header and block order in a CAR response is not specified and is non-deterministic. +## CAR format parameters and determinism + +The default header and block order in the CAR format are not specified by the IPLD specifications. Clients MUST NOT assume that CAR responses are deterministic (byte-for-byte identical) across different gateways. Clients MUST NOT assume that CAR includes CIDs and their blocks in the same order across different gateways. +Clients MUST NOT assume any block order or duplicate status unless the `Content-Type` returned with a CAR response includes the optional `order` or `dups` parameters, as specified by :cite[ipip-0412].
+ +A Gateway SHOULD support some aspects of determinism by implementing content type negotiation and signaling via `Accept` and `Content-Type` headers. + :::issue -In controlled environments, clients MAY choose to rely on undocumented CAR determinism, -subject to the agreement of the following conditions between the client and the -gateway: +In controlled environments, clients MAY choose to rely on implicit and +undocumented CAR determinism, subject to the agreement of the following +conditions between the client and the gateway: - CAR version - content of [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field -- order of blocks -- status of duplicate blocks +- order of blocks (`order` from :cite[ipip-0412]) +- status of duplicate blocks (`dups` from :cite[ipip-0412]) -In the future, there may be an introduction of a convention to indicate aspects -of determinism in CAR responses. Please refer to -[IPIP-412](https://github.com/ipfs/specs/pull/412) for potential developments -in this area. +Note that this is undocumented behavior, and it MUST NOT be relied on in public networks. ::: + +### CAR format signaling in Request + +Content type negotiation is based on section 12.5.1 of :cite[rfc9110]. + +Clients MAY indicate their preferred block order by sending an `Accept` header in +the HTTP request. The `Accept` header format is as follows: + +``` +Accept: application/vnd.ipld.car; version=1; order=dfs; dups=y +``` + +In the future, when more orders or parameters exist, clients will be able to +specify a list of preferences, for example: + +``` +Accept: application/vnd.ipld.car;order=foo, application/vnd.ipld.car;order=dfs;dups=y;q=0.5 +``` + +The above example is a list of preferences: the client would prefer to use +the hypothetical `order=foo`, but if this is not available, it would accept +`order=dfs` with `dups=y` instead (lower priority is indicated via the `q` parameter, +as noted in :cite[rfc9110]).
+ +### CAR format signaling in Response + +The Trustless Gateway MUST always respond with a `Content-Type` header that includes +information about all supported and known parameters, even if the client did not +specify them in the request. + +The `Content-Type` header format is as follows: + +``` +Content-Type: application/vnd.ipld.car;version=1;order=dfs;dups=n +``` + +Gateway implementations SHOULD decide on the implicit default ordering or +other parameters, and use them in responses when the client did not explicitly +specify any matching preference. + +A Gateway MAY choose to implement only some parameters and return HTTP +400 Bad Request or 406 Not Acceptable when a client requests a response with +an unsupported content type variant. + +A Client MUST verify the `Content-Type` returned with a CAR response before +processing the payload, as a legacy gateway may not support optional content +type parameters like `order` and `dups` and may return plain +`application/vnd.ipld.car`. + +# IPNS Record Responses (application/vnd.ipfs.ipns-record) + +Opaque bytes matching the [Signed IPNS Record](https://specs.ipfs.tech/ipns/ipns-record/#ipns-record) +for the requested [IPNS Name](https://specs.ipfs.tech/ipns/ipns-record/#ipns-name) +returned as [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record). + +A Client MUST confirm that the record signature matches the `libp2p-key` from the requested IPNS Name. + +A Client MUST [perform additional record verification according to the IPNS specification](https://specs.ipfs.tech/ipns/ipns-record/#record-verification).
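The request and response signaling rules in the trustless gateway changes above can be sketched from the client side. This is a minimal Python sketch under stated assumptions, not code from any gateway or client implementation: `build_car_accept`, `parse_media_type`, `interpret_car_content_type` and `CarResponseFormat` are hypothetical names, and the parameter parser is simplified (it does not handle quoted values containing `;`). It builds an `Accept` value matching the request example and maps a CAR `Content-Type` to an effective `order` (falling back to `unk` when the parameter is missing, empty, or unrecognized) and a tri-state `dups` (unknown when absent, in which case a client assumes duplicates are not sent but tolerates them).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Tuple

CAR_MEDIA_TYPE = "application/vnd.ipld.car"


def build_car_accept(order: str = "dfs", dups: str = "y", version: int = 1) -> str:
    """Accept header value asking for a CAR with explicit order and dups."""
    return f"{CAR_MEDIA_TYPE}; version={version}; order={order}; dups={dups}"


def parse_media_type(header: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; k=v; ...' into a lowercase media type and params.

    Simplified: quoted values containing ';' are not supported.
    """
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower()
    params: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep:
            continue  # ignore malformed parameters without '='
        params[key.strip().lower()] = value.strip().strip('"')
    return media_type, params


@dataclass
class CarResponseFormat:
    version: Optional[str]  # gateways MUST send it; a strict client may reject None
    order: str              # "dfs" or "unk"
    dups: Optional[bool]    # True for dups=y, False for dups=n, None when unknown


def interpret_car_content_type(content_type: str) -> CarResponseFormat:
    """Map a CAR Content-Type response header to the effective block layout."""
    media_type, params = parse_media_type(content_type)
    if media_type != CAR_MEDIA_TYPE:
        raise ValueError(f"not a CAR response: {media_type!r}")
    # A missing, empty, or unrecognized order is treated as order=unk.
    order = params.get("order", "")
    if order not in ("dfs", "unk"):
        order = "unk"
    # A missing or unrecognized dups means the duplicate status is unknown:
    # assume duplicates are not sent, but tolerate them if they show up.
    dups = {"y": True, "n": False}.get(params.get("dups", ""))
    return CarResponseFormat(version=params.get("version"), order=order, dups=dups)
```

For example, `interpret_car_content_type("application/vnd.ipld.car;version=1;order=dfs;dups=n")` returns `CarResponseFormat(version='1', order='dfs', dups=False)`, while a legacy `application/vnd.ipld.car; version=1` yields `order='unk'` and `dups=None`, so the client falls back to buffering blocks.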
diff --git a/src/ipips/ipip-0412.md b/src/ipips/ipip-0412.md new file mode 100644 index 000000000..93f96314c --- /dev/null +++ b/src/ipips/ipip-0412.md @@ -0,0 +1,206 @@ +--- +title: "IPIP-0412: Signaling Block Order in CARs on HTTP Gateways" +date: 2023-05-15 +ipip: ratified +editors: + - name: Marcin Rataj + github: lidel + url: https://lidel.org/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ + - name: Jorropo + github: Jorropo + affiliation: + name: Protocol Labs + url: https://protocol.ai/ +relatedIssues: + - https://github.com/ipfs/specs/issues/348 + - https://github.com/ipfs/specs/pull/330 + - https://github.com/ipfs/specs/pull/402 + - https://github.com/ipfs/specs/pull/412 +order: 412 +tags: ['ipips'] +--- + +## Summary + +Adds support for additional, optional content type options that allow the +client and server to signal or negotiate a specific block order in the returned +CAR. + +## Motivation + +We want to make it easier to build light-clients for IPFS. We want them to have +low memory footprints on arbitrarily sized files. The main pain point preventing +this is the fact that CAR ordering isn't specified. + +This requires keeping some kind of reference, either on disk or in memory, to +previously seen blocks for two reasons. + +1. Blocks can arrive out of order, meaning the moment a block is received and + the moment it is consumed (data is read and returned to the consumer) might not match. + +1. Blocks can be reused multiple times; this is handy for cases when you plan + to cache on disk, but not at all when you want to process a stream with a use & + forget policy. + +What we really want is for the gateway to help us a bit, and give us blocks in +a useful order. + +The existing Trustless Gateway specification does not provide a mechanism for +negotiating the order of blocks in CAR responses. + +This IPIP aims to improve the status quo.
+ +## Detailed design + +CAR content type +([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) +already supports the `version` parameter, which allows the gateway to indicate which +CAR flavor is returned with the response. + +The proposed solution introduces two new parameters for the content type headers +in HTTP requests and responses: `order` and `dups`. + +The `order` parameter allows the client to indicate its preference for a +specific block order in the CAR response, and the `dups` parameter specifies +whether duplicate blocks are allowed in the response. + +A Client SHOULD send the `Accept` HTTP header to leverage content type negotiation +based on section 12.5.1 of :cite[rfc9110] to get the preferred response type. + +More details are in Section 5 (CAR Responses) of :cite[trustless-gateway]. + +## Design rationale + +The proposed specification change aims to address the limitations of the +existing Trustless Gateway specification by introducing a mechanism for +negotiating the block order in CAR responses. + +By allowing clients to indicate their preferred block order, Trustless Gateways +can cache CAR responses for popular content, resulting in improved performance +and reduced network load. Clients benefit from more efficient data handling by +deserializing blocks as they arrive. + +We reuse existing HTTP content type negotiation and the CAR content type, which +already had the optional `version` parameter. + +### User benefit + +The proposed specification change brings several benefits to end users: + +1. Improved Performance: Gateways can decide on their implicit default ordering + and cache CAR responses for popular content. In turn, clients can benefit + from a strong `Etag` in ordered (deterministic) responses. This reduces the + response time for subsequent requests, resulting in faster content retrieval + for users. + +2.
Reduced Memory Usage: Clients no longer need to buffer the entire CAR + response in memory until the deserialization of the requested entity is + finished. With the ability to deserialize blocks as they arrive, users can + conserve memory resources, especially when dealing with large CAR responses. + +3. Efficient Data Handling: By discarding blocks as soon as the CID is + validated and data is deserialized, clients can efficiently process the data + in real-time. This is particularly useful for light clients, IoT devices, + mobile web browsers, and other streaming applications where immediate access + to the data is required. + +4. Customizable Ordering: Clients can indicate their preferred block order in the + `Accept` header, allowing them to prioritize specific ordering strategies that + align with their use cases. This flexibility enhances the user experience + and empowers users to optimize content retrieval according to their needs. + +### Compatibility + +The proposed specification change is backward compatible with existing client +and server implementations. + +Trustless Gateways that do not support the negotiation of block order in CAR +responses will continue to function as before, providing their existing default +behavior, and the clients will be able to detect it by inspecting the +`Content-Type` header present in HTTP response. + +Clients that do not send the `Accept` header or do not recognize the `order` +and `dups` parameters in the `Content-Type` header will receive and process CAR +responses as they did before: buffering/caching all blocks until done with the +final deserialization. + +Existing implementations can choose to adopt the new specification and +implement support for the negotiation of block order incrementally. This allows +for a smooth transition and ensures compatibility with both new and old +clients. 
+ +### Security + +The proposed specification change does not introduce any negative security +implications beyond those already present in the existing Trustless Gateway +specification. It focuses on enhancing performance and data handling without +affecting the underlying security model of IPFS. + +Light clients with support for `order` and `dups` CAR content type parameters +will be able to detect malicious responses faster, reducing the risk of +memory-based DoS attacks from malicious gateways. + +### Alternatives + +Several alternative approaches were considered before arriving at the proposed solution: + +1. Implicit Server-Side Configuration: Instead of negotiating the block order + in the CAR response, the Trustless Gateway could have a server-side + configuration that specifies the default order. However, this approach would + limit the flexibility for clients, requiring them to have prior knowledge + about the order supported by each gateway. + +2. Fixed Block Order: Another option was to enforce a fixed block order in the + CAR responses. However, this approach would not cater to the varying needs + and preferences of different clients and use cases, and is not backward + compatible with the existing Trustless Gateways which return CAR responses + with a weak `Etag` and unspecified block order. + +3. Separate `X-` HTTP Header: Introduction of a separate HTTP header was + rejected because we try to use HTTP semantics where possible, and gateways + already use HTTP content type negotiation for CAR `version` and reusing it + saves a few bytes in each round-trip. Also, :cite[rfc6648] advises against + the use of `X-` and similar constructs in new protocols. + +4. Single Preset Pack: The decision not to implement a single preset pack with predefined behavior, + instead of separate parameters for order and duplicates (dups), was driven + by considerations of ambiguity and potential future problems when adding + more determinism to responses.
For instance, if we were to include a new + behavior like `foo=y|n` alongside an existing preset like `pack=orderdfs+dupsy`, + it would either necessitate the addition of a separate parameter or impose + the adoption of a new version of every preset (e.g., `orderdfs+dupsy+fooy` and + `orderdfs+dupsy+foon`). Maintaining and deploying such changes across a + decentralized ecosystem, where gateways may operate on different software, + becomes more complex. In contrast, utilizing separate parameters for each + behavior enables easier maintenance and deployment in a decentralized + ecosystem with varying gateway software. + +The proposed solution of negotiating the block order through headers is +future-proof and allows for flexibility, interoperability, and customization while +maintaining compatibility with existing implementations. + +## Test fixtures + +Implementation compliance can be determined by testing the negotiation process +between clients and Trustless Gateways using various combinations of `order` and +`dups` parameters. + +Relevant tests were added to the +[gateway-conformance](https://github.com/ipfs/gateway-conformance) test suite +in [#87](https://github.com/ipfs/gateway-conformance/pull/87), and include the following fixture. + +- `bafybeihchr7vmgjaasntayyatmp5sv6xza57iy2h4xj7g46bpjij6yhrmy` + ([CAR](https://github.com/ipfs/gateway-conformance/raw/v0.3.0/fixtures/trustless_gateway_car/dir-with-duplicate-files.car)) + - A UnixFS directory with two files that are the same (same CID). + - If `dups=n`, then there should be no duplicate blocks in the returned CAR. + - If `dups=y`, then the blocks of the file are sent twice. + - The same fixture can be used for testing `order=dfs` and checking if blocks that belong to files arrive in the DFS order.
+ - It is encouraged to also test DFS order with a HAMT fixture such as `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` + ([CAR](https://github.com/ipfs/gateway-conformance/raw/v0.3.0/fixtures/trustless_gateway_car/single-layer-hamt-with-multi-block-files.car)) + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
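The `order=dfs` and `dups` expectations for the `dir-with-duplicate-files` fixture can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not code from gateway-conformance or any gateway: string names stand in for CIDs, a block's only content is its ordered list of links, hashing is omitted, and `car_block_order` and `verify_dfs_dups_y` are hypothetical helpers. The first shows which blocks a gateway would write with `dups=y` versus `dups=n`. The second shows how a light client can check an `order=dfs;dups=y` stream while holding only a stack of pending CIDs.

```python
from typing import Dict, Iterable, List, Tuple

# Toy model (hypothetical): a block is named by a string standing in for its CID,
# and its only content is the ordered list of CIDs it links to. Real blocks are
# bytes whose multihash must match the CID; hashing is omitted here.
ToyDag = Dict[str, List[str]]


def car_block_order(dag: ToyDag, root: str, dups: bool) -> List[str]:
    """CIDs in the order blocks would be written for order=dfs.

    dups=True mirrors dups=y: a block is sent every time the walk reaches it.
    dups=False mirrors dups=n: each block is sent once, and a repeated subtree
    is skipped because DFS already sent all of it on the first visit.
    """
    written: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if not dups and cid in seen:
            continue
        seen.add(cid)
        written.append(cid)  # pre-order: parent before its children
        # Push links in reverse so the first link is visited first.
        stack.extend(reversed(dag.get(cid, [])))
    return written


def verify_dfs_dups_y(root: str, received: Iterable[Tuple[str, List[str]]]) -> bool:
    """Check a stream of (cid, links) pairs claimed to be order=dfs;dups=y.

    Only a stack of pending CIDs is kept, so each block can be discarded right
    after its links are read.
    """
    pending = [root]
    for cid, links in received:
        if not pending or pending.pop() != cid:
            return False  # extra or out-of-order block: reject early
        pending.extend(reversed(links))
    return not pending  # True only if every expected block arrived


# Shaped like dir-with-duplicate-files: one directory linking to the same
# two-chunk file twice (names are made up).
dag = {"dir": ["file", "file"], "file": ["chunk1", "chunk2"]}
with_dups = car_block_order(dag, "dir", dups=True)
# ['dir', 'file', 'chunk1', 'chunk2', 'file', 'chunk1', 'chunk2']
without_dups = car_block_order(dag, "dir", dups=False)
# ['dir', 'file', 'chunk1', 'chunk2']
```

With `dups=n` the second link to `file` produces no blocks, so feeding `without_dups` to `verify_dfs_dups_y` returns `False`: a client receiving `dups=n` has to remember previously seen CIDs, which is the resource trade-off described for the `dups` parameter.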