Partial CAR Support on Trustless Gateways
Summary
Add path and partial CAR response support to the :cite[trustless-gateway].
Motivation
:cite[trustless-gateway] solves the verifiability problem in HTTP contexts, and allows for inexpensive retrieval and caching in a way that is not tied to a specific service, library or IPFS implementation.
This IPIP improves the performance related to trustless HTTP gateways by adding additional capabilities to requests for CAR files.
The goal is to enable a client capable of translating/decoding CAR files to make a single request to a trustless gateway that in most cases allows them to render the same output generated via a request to a trusted gateway (or, if not in a single request, as few requests as possible).
This saves round-trips and allows for more efficient resumption and parallel downloads.
Detailed design
The solution is to allow the :cite[trustless-gateway] to support partial responses by:
- allowing for requesting sub-paths within a DAG, and getting blocks necessary for traversing all path segments for end-to-end verification
- opt-in `dag-scope` parameter that allows for narrowing down returned blocks to a `block`, `entity` (a logical IPLD entity, such as a file, directory, CBOR document), or `all` (default)
- opt-in `entity-bytes` parameter that allows for returning only a subset of blocks within a logical IPLD entity
Details are in :cite[trustless-gateway].
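As a minimal sketch of how a client might assemble such a request, the helper below builds a CAR request URL from a CID, an optional sub-path, and the two opt-in parameters. The function name `build_car_url`, the `gateway.example` host, and the shortened CID are hypothetical and used only for illustration; the normative parameter definitions remain in :cite[trustless-gateway].

```python
from urllib.parse import quote, urlencode

VALID_SCOPES = ("block", "entity", "all")


def build_car_url(gateway, cid, path="", dag_scope=None, entity_bytes=None):
    """Build a CAR request URL (hypothetical helper, not part of any library).

    entity_bytes is an optional (from, to) pair, e.g. (0, 1023).
    """
    if dag_scope is not None and dag_scope not in VALID_SCOPES:
        raise ValueError(f"unknown dag-scope: {dag_scope!r}")

    # format=car is always sent; the other parameters are opt-in.
    params = {"format": "car"}
    if dag_scope is not None:
        params["dag-scope"] = dag_scope
    if entity_bytes is not None:
        start, end = entity_bytes
        params["entity-bytes"] = f"{start}:{end}"

    content_path = f"/ipfs/{cid}"
    if path:
        # Percent-encode path segments but keep "/" separators intact.
        content_path += "/" + quote(path.lstrip("/"))

    # Keep ":" and "*" readable in the query string.
    query = urlencode(params, safe=":*")
    return f"{gateway.rstrip('/')}{content_path}?{query}"


if __name__ == "__main__":
    print(build_car_url("https://gateway.example", "bafyexample",
                        "subdir/ascii.txt", dag_scope="entity",
                        entity_bytes=(0, 1023)))
```

Omitting both optional arguments yields a plain `?format=car` request, which corresponds to the implicit `dag-scope=all` default.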
Design rationale
The approach here is pragmatic: we add a minimum set of predefined CAR export
scopes based on real world needs, as a product of the rough consensus across
key stakeholders: gateway operators, Project Saturn, Project Rhea (ipfs.io
and dweb.link), light clients such as Capyloon and IPFS in Chromium, and
gateway implementation in boxo/gateway library (Go).
Terse rationale for each feature:
- Including blocks necessary for traversing parents when a sub-path is present makes a better default, as it produces a verifiable archive that does not require any follow-up requests. The response is always enough to verify end-to-end and reject any unexpected / invalid blocks.
- The ability to narrow down CAR response based on logical scope or specific byte range within an entity comes directly from the types of requests existing path gateways need to handle.
  - `dag-scope=block` allows for resolving content paths to the final CID, and learning its type (unixfs file/directory, or a custom codec)
  - `dag-scope=entity` covers the majority of website hosting needs (returning a file, enumerating directory contents, or any other IPLD entity)
  - `dag-scope=all` returns all blocks in a DAG: this was the existing behavior and remains the implicit default
  - `entity-bytes=from:to` enables an efficient, verifiable analog to HTTP Range Requests (resuming downloads or seeking within bigger files, such as videos)
    - `from` and `to` match the behavior of HTTP Range Requests.
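To illustrate the analogy with HTTP Range Requests, the sketch below translates a single-range `Range` header into an `entity-bytes` value. It assumes that an open-ended range is written with `*` as the `to` value and that a suffix range maps to a negative `from`; these forms are assumptions for illustration and should be checked against :cite[trustless-gateway]. The function name `range_to_entity_bytes` is hypothetical.

```python
import re

# Matches a single byte range such as "bytes=0-1023", "bytes=1024-" or "bytes=-500".
_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def range_to_entity_bytes(range_header):
    """Translate one HTTP Range header value into an entity-bytes value.

    Assumptions (not taken from the text above): "*" marks an open end,
    and a negative "from" counts back from the end of the entity.
    """
    match = _SINGLE_RANGE.match(range_header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {range_header!r}")

    first, last = match.groups()
    if first == "" and last == "":
        raise ValueError("Range header has neither a start nor an end")
    if first == "":
        # Suffix range: the final <last> bytes of the entity.
        return f"-{last}:*"
    if last == "":
        # Open-ended range: from <first> to the end of the entity.
        return f"{first}:*"
    if int(last) < int(first):
        raise ValueError("range end precedes range start")
    # Both ends are inclusive, as in HTTP Range Requests.
    return f"{first}:{last}"
```

For example, a video player seeking with `Range: bytes=1024-` would, under these assumptions, request `entity-bytes=1024:*` together with `dag-scope=entity`.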
User benefit
- Trustless HTTP Clients will be able to fetch a CAR with a file, byte range, or a directory enumeration using far fewer HTTP requests, which will translate to improved resource utilization, longer battery time on mobile, and lower latency due to fewer round trips.
- CAR files downloaded from HTTP Gateways will always be end-to-end verifiable. In the past, users had to manually ensure they had blocks for all path segments. With this IPIP, the CAR will always include parent blocks when a file located on a sub-path is requested.
- Creating a standard way of fetching partial CAR over HTTP enables a diverse set of clients and gateway services to interoperate, and reuse libraries:
  - Service Worker Gateway based on Helia
  - IPFS in Chromium
  - boxo/gateway provides HTTP Gateway implementation for
- Trustless Gateway is solidified as the ecosystem wide standard.
- IPIP tests added to gateway-conformance test suite and fixtures listed at the end of this IPIP make it easier to implement or operate a conformant gateway, and reduce maintenance costs.
- End users are empowered with primitives and tools that reduce retrieval cost, encourage self-hosting, or make validation of conformance claims of free or commercial gateways possible.
Compatibility
CAR responses with blocks for parent path segments
In order to serve CAR requests for content paths other than just a CID root in a trustless manner, we are requiring the gateway to return intermediate blocks from the CID root to the path terminus as part of the returned CAR file.
HTTP Gateway implementations are currently only returning blocks starting at the end of the content path, which means an implementation of this IPIP will introduce additional blocks required for verifying.
As long as the client was written in a trustless manner and, following the robustness principle, discards unexpected blocks, this will be a backward-compatible change.
CAR format with entity-bytes and dag-scope parameters
These parameters are opt-in, which means no breaking changes.
Gateways ignore unknown URL parameters. A client sending them to a gateway that does not implement this IPIP will get all blocks for the requested DAG.
CAR roots and determinism
As of 2023-06-20, the behavior of the roots CAR field remains an unresolved item within the CARv1 specification:
Regarding the roots property of the Header block:
- The current Go implementation assumes at least one CID when creating a CAR
- The current Go implementation requires at least one CID when reading a CAR
- The current JavaScript implementation allows for zero or more roots
- Current usage of the CAR format in Filecoin requires exactly one CID
[..]
It is unresolved how the roots array should be constrained. It is recommended that only a single root CID be used in this version of the CAR format.
A work-around for use-cases where the inclusion of a root CID is difficult but needing to be safely within the "at least one" recommendation is to use an empty CID: `\x01\x55\x00\x00` (zero-length "identity" multihash with "raw" codec). Since current implementations for this version of the CAR specification don't check for the existence of root CIDs (see Root CID block existence), this will be safe as far as CAR implementations are concerned. However, there is no guarantee that applications that use CAR files will correctly consume (ignore) this empty root CID.
Due to the inconsistent and non-deterministic nature of CAR implementations, the gateway specification faces limitations in providing specific recommendations. Nevertheless, it is crucial for implementations to refrain from making implicit assumptions based on the legacy behavior of individual CAR implementations.
Due to this, gateway specification changes introduced in this IPIP clarify that:
- The CAR `roots` behavior is out of scope, and clients MAY ignore it.
- CAR determinism is not present by default; responses may differ across requests and gateways.
- Opt-in determinism is possible, but a standardized signaling mechanism does not exist until we have IPIP-412 or similar.
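For readers unfamiliar with the empty root CID quoted from the CARv1 specification, the following sketch decodes the four bytes `\x01\x55\x00\x00` field by field. It assumes the usual binary CIDv1 layout of unsigned varints (version, codec, multihash code, digest length) followed by the digest; the helper names are hypothetical and this is not a complete CID parser.

```python
def read_uvarint(buf, pos=0):
    """Read one unsigned LEB128 varint and return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def decode_cid_v1(raw):
    """Split a binary CIDv1 into its fields (illustrative sketch only)."""
    version, pos = read_uvarint(raw)
    codec, pos = read_uvarint(raw, pos)
    hash_code, pos = read_uvarint(raw, pos)
    digest_len, pos = read_uvarint(raw, pos)
    digest = raw[pos:pos + digest_len]
    if len(digest) != digest_len:
        raise ValueError("truncated multihash digest")
    return {
        "version": version,
        "codec": codec,          # 0x55 is the "raw" codec
        "hash_code": hash_code,  # 0x00 is the "identity" multihash
        "digest": digest,
    }


EMPTY_ROOT = b"\x01\x55\x00\x00"

if __name__ == "__main__":
    print(decode_cid_v1(EMPTY_ROOT))
```

A client that chooses to inspect `roots` could use a check like this to recognize and skip the placeholder, although, as stated above, clients MAY ignore the field entirely.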
Security
This IPIP allows clients to narrow down the amount of data returned as a CAR, and introduces a need for defensive programming when the feature set of the remote gateway is unknown.
To avoid denial of service and resource starvation, clients should probe whether the gateway supports the features described in this IPIP before requesting data, so that they do not fetch big DAGs when only a small subset of blocks is expected.
Following the robustness principle, invalid, duplicate or unexpected blocks should be discarded.
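The discard rule can be sketched as a small filter over incoming blocks. To stay self-contained, this example identifies each block by the hex form of its sha2-256 digest rather than by a real CID, which is a simplifying assumption: an actual client would decode the CID and verify the multihash it carries. The name `filter_blocks` is hypothetical.

```python
import hashlib


def filter_blocks(blocks, wanted):
    """Yield only blocks that are expected, hash-valid, and not yet seen.

    blocks: iterable of (digest_hex, data) pairs as read from a response.
    wanted: set of hex sha2-256 digests the client expects to receive.
    """
    seen = set()
    for digest_hex, data in blocks:
        if digest_hex not in wanted:
            continue  # unexpected block: not part of the requested DAG
        if digest_hex in seen:
            continue  # duplicate block: already accepted once
        if hashlib.sha256(data).hexdigest() != digest_hex:
            continue  # invalid block: bytes do not match the claimed hash
        seen.add(digest_hex)
        yield digest_hex, data
```

Because invalid blocks are not recorded as seen, a valid copy that arrives later in the same stream is still accepted.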
Alternatives
Below are alternate designs that were considered, but deemed out of scope for this IPIP.
Arbitrary IPLD Selectors
Passing arbitrary selectors was rejected due to the implementation complexity, risks, and weak value proposition, as discussed during IPFS Thing 2022.
Additional "Web" Scope
A request for
/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&dag-scope=entity
returns all blocks required for enumeration of the big HAMT /wiki directory,
and then an additional request for index.html needs to be issued.
The website hosting use case could be made more efficient if the gateway returned a CAR
with index.html instead of all blocks for directory enumeration. The server
already did the work: it knows the entity is a directory, it already parsed it, and it
knows it has a child entity named index.html. Everyone would pay a lower cost, because
fewer blocks would be returned, in a single round-trip instead of two.
Rhea/Saturn projects requested this to be out of scope for now, but this "web" entity scope could be added in the future, as a follow-up optimization IPIP.
Requesting specific DAG depth
Blindly requesting a specific DAG depth does not map to any type of
request that web gateways like ipfs.io, or the one in the Brave browser, have to handle.
It is impossible to know if some entity on a sub-path is a file or a directory, without sending a probe for the root block, which introduces one round-trip overhead per entity.
This problem is not present in the case of dag-scope=entity, which shifts the
decision to the server, and allows for fetching unknown UnixFS entity with a
single request.
Test fixtures
Relevant tests were added to
gateway-conformance test suite
in #56 and
#85.
A detailed list of compliance checks for dag-scope and entity-bytes can be found in
v0.2.0/trustless_gateway_car_test.go or later.
Below are CIDs, CARs, and a short summary of each fixture.
Testing pathing
The main utility of this scope is saving round-trips when retrieving a specific entity as a member of a bigger DAG.
To test, request a small file that fits in a single block from a sub-path. The returned CAR MUST include both the block with the file data and all blocks necessary for traversing from the root CID to the terminating element (all parents, root CID and a subdirectory below it).
:::example
Sample fixtures:
- `bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu` from `subdir-with-two-single-block-files.car` for testing `/ipfs/dag-pb-cid/subdir/ascii.txt?format=car` (UnixFS file in a subdirectory)
- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` from `single-layer-hamt-with-multi-block-files.car` for testing `/ipfs/dag-pb-hamt-cid/686.txt?format=car` (UnixFS file on a path within HAMT-sharded parent directory)
- `bafybeia264q44a3kmfc2otctzu4egp2k235o3t7mslz2yjraymp4nv6asi` from `dir-with-dag-cbor-with-links.car` for testing `/ipfs/dag-cbor-cid/document?format=car` (UnixFS file on a path with DAG-CBOR root CID)
:::
Testing dag-scope=block
The main utility of this scope is resolving content paths. This means a CAR response with blocks related to path traversal, and the root block of the terminating entity.
To test real world use, request UnixFS file or a directory from a sub-path.
The returned CAR MUST include blocks required for path traversal and ONLY the
root block of the terminating entity.
:::example
Sample fixtures:
- `bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu` from `subdir-with-two-single-block-files.car` for testing:
  - `/ipfs/dag-pb-cid/subdir/ascii.txt?format=car&dag-scope=block` (UnixFS file in a subdirectory)
  - `/ipfs/dag-pb-cid?format=car&dag-scope=block` (UnixFS directory)
- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` from `single-layer-hamt-with-multi-block-files.car` for testing:
  - `/ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=block` (UnixFS multi-block file on a path within HAMT-sharded parent directory)
  - `/ipfs/dag-pb-hamt-cid?format=car&dag-scope=block` (UnixFS HAMT-sharded directory)
:::
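The `dag-scope=block` requirement (path traversal blocks plus ONLY the root block of the terminating entity) can be expressed as a set comparison. The sketch below is not code from the gateway-conformance suite; `check_block_scope` and the placeholder CID strings are hypothetical, and it assumes the CIDs in the response have already been extracted and verified.

```python
def check_block_scope(car_cids, path_cids, terminus_cid):
    """Compare the CIDs found in a dag-scope=block response with expectations.

    car_cids: CIDs of all blocks found in the returned CAR.
    path_cids: CIDs of blocks needed to traverse from the root to the terminus.
    terminus_cid: CID of the root block of the terminating entity.

    Returns (missing, extra); both sets are empty for a conformant response.
    """
    expected = set(path_cids) | {terminus_cid}
    received = set(car_cids)
    missing = expected - received  # blocks required for verification but absent
    extra = received - expected    # blocks beyond the terminus root block
    return missing, extra


if __name__ == "__main__":
    # Placeholder identifiers standing in for real CIDs.
    missing, extra = check_block_scope(
        car_cids=["root-dir", "subdir", "file-root", "file-chunk-1"],
        path_cids=["root-dir", "subdir"],
        terminus_cid="file-root",
    )
    print("missing:", missing, "extra:", extra)
```

The same comparison can serve the plain pathing test by checking only that `missing` is empty, since a request without `dag-scope=block` is expected to return more than the terminus root block.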
Testing dag-scope=entity
The main utility of this scope is retrieving all blocks related to a meaningful IPLD entity. Currently, the most popular entity types are:
- UnixFS file (blocks for all chunks with file data)
- UnixFS directory (blocks for the directory node, allowing its enumeration; no root blocks for any of the child entities).
- raw / dag-cbor (block with raw data or DAG-CBOR document, potentially linking to other CIDs)
:::example
Sample fixtures:
- `bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu` from `subdir-with-mixed-block-files.car` for testing:
  - `/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity` (UnixFS multi-block file in a subdirectory)
  - `/ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity` (UnixFS directory)
- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` from `single-layer-hamt-with-multi-block-files.car` for testing:
  - `/ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=entity` (UnixFS multi-block file on a path within HAMT-sharded parent directory, returned blocks MUST be enough to deserialize the file)
  - `/ipfs/dag-pb-hamt-cid?format=car&dag-scope=entity` (UnixFS HAMT-sharded directory, response MUST include the minimal set of blocks required for enume
[Content truncated — view full spec at source]