feat: concept doc for PoDSI and how to for retrieving filecoin info #58

Merged
merged 11 commits into from
Dec 12, 2023
Changes from 5 commits
1 change: 1 addition & 0 deletions src/pages/docs/concepts/_meta.json
@@ -5,6 +5,7 @@
"architecture-options": "Architecture options",
"car": "Content Archive (CAR) files",
"filecoin-storage": "Filecoin",
"podsi": "Proof of Data Segment Inclusion (PoDSI)",
"ipfs-gateways": "IPFS Gateways",
"did": {
"display": "hidden"
122 changes: 122 additions & 0 deletions src/pages/docs/concepts/podsi.mdx
@@ -0,0 +1,122 @@
import { Callout, Steps } from 'nextra/components'

# Proof of Data Segment Inclusion (PoDSI)

Data uploaded to web3.storage is aggregated together with data from other users of the service. When an aggregate is big enough, it is stored with multiple Filecoin Storage Providers.

The web3.storage service uses **Proof of Data Segment Inclusion** (PoDSI), which allows clients to verify the correct aggregation of their data and prove this fact to third parties.

## Shard CIDs

Data uploaded to web3.storage is packed up and sent as CAR files. We call these _shards_, and each one is referenced by its shard CID.

```sh
# example CAR shard CID
bagbaieralcueppbj7cpxhlsfuokxatqzqdutb47mgs44myg7dsmktsb34zxa
```

The IPLD codec for a CAR shard CID is `0x0202`. You can [inspect this CID on cid.ipfs.tech](https://cid.ipfs.tech/#bagbaieralcueppbj7cpxhlsfuokxatqzqdutb47mgs44myg7dsmktsb34zxa). It is **not** your content root CID - that's a different CID that refers to the root node of a DAG that has been built from your data. The shard CID is a hash of the (CAR) file your DAG has been packed into.
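
To make the structure of a shard CID concrete, here is a minimal sketch that decodes the base32 string and reads the leading varints (version, codec, multihash code, digest length). The helper names `decodeBase32`, `readVarint` and `inspectCid` are illustrative and not part of any web3.storage library; in practice a CID library would do this for you.

```js
// Sketch only: hand-rolled CIDv1 prefix reader, helper names are illustrative.
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567'

/** Decode a multibase base32 string (lowercase, unpadded, "b" prefix) to bytes. */
function decodeBase32 (text) {
  if (text[0] !== 'b') throw new Error('expected multibase base32 prefix "b"')
  const bytes = []
  let buffer = 0
  let bits = 0
  for (const char of text.slice(1)) {
    const value = BASE32_ALPHABET.indexOf(char)
    if (value === -1) throw new Error(`invalid base32 character: ${char}`)
    buffer = (buffer << 5) | value
    bits += 5
    if (bits >= 8) {
      bits -= 8
      bytes.push((buffer >> bits) & 0xff)
      buffer &= (1 << bits) - 1 // keep only the bits not yet emitted
    }
  }
  return Uint8Array.from(bytes)
}

/** Read an unsigned varint at `offset`, returning [value, nextOffset]. */
function readVarint (bytes, offset) {
  let value = 0
  let shift = 0
  while (true) {
    if (offset >= bytes.length) throw new Error('truncated varint')
    const byte = bytes[offset++]
    value += (byte & 0x7f) * 2 ** shift
    shift += 7
    if ((byte & 0x80) === 0) return [value, offset]
  }
}

/** Split a CIDv1 string into version, codec, multihash code and digest. */
function inspectCid (cid) {
  const bytes = decodeBase32(cid)
  const [version, afterVersion] = readVarint(bytes, 0)
  const [codec, afterCodec] = readVarint(bytes, afterVersion)
  const [hashCode, afterHashCode] = readVarint(bytes, afterCodec)
  const [digestLength, digestStart] = readVarint(bytes, afterHashCode)
  const digest = bytes.subarray(digestStart, digestStart + digestLength)
  return { version, codec, hashCode, digest }
}

const shard = inspectCid('bagbaieralcueppbj7cpxhlsfuokxatqzqdutb47mgs44myg7dsmktsb34zxa')
console.log(`codec: 0x${shard.codec.toString(16)}`) // codec: 0x202 (CAR)
console.log(`multihash: 0x${shard.hashCode.toString(16)}`) // multihash: 0x12 (sha2-256)
```

The digest is the SHA-256 hash of the CAR file bytes, which is why the shard CID identifies the packed file and not the DAG root inside it.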

<Callout type="info">
Your content may be split between one or more shards.
</Callout>

## Piece CIDs

Piece CIDs are the primary means of referencing data stored in a sector of a Filecoin Storage Provider. Each piece CID is loosely equivalent to a corresponding [shard CID](#shard-cids).

The piece CIDs used in web3.storage are v2 piece CIDs since these also encode tree _height_ information. On chain and in various chain explorers online you may see v1 piece CIDs displayed. You can convert from v2 to v1 but not v1 to v2, unless you also know the tree height.

[FRC-0069 documentation for Piece CID v2](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md).

```sh
# example piece v2 CID
bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di
```

The multihash code for the above CID is `0x1011` (fr32-sha2-256-trunc254-padded-binary-tree). You can [inspect this CID on cid.ipfs.tech](https://cid.ipfs.tech/#bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di).
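
The one-way nature of the v2 to v1 conversion can be sketched as follows. This assumes the FRC-0069 digest layout (a varint padding, a one-byte tree height, then the 32-byte root) and the v1 convention of the `fil-commitment-unsealed` codec (`0xf101`) with the `sha2-256-trunc254-padded` multihash (`0x1012`). All helper names are hypothetical; they are not taken from `@web3-storage/data-segment`.

```js
// Sketch only: helper names are illustrative, not a web3.storage API.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567'
const RAW = 0x55 // CID codec used by v2 piece CIDs
const PIECE_V2_HASH = 0x1011 // fr32-sha2-256-trunc254-padded-binary-tree
const FIL_COMMITMENT_UNSEALED = 0xf101 // CID codec used by v1 piece CIDs
const SHA2_256_TRUNC254_PADDED = 0x1012 // multihash used by v1 piece CIDs

/** Decode unpadded lowercase base32 into bytes. */
function fromBase32 (text) {
  const out = []
  let buffer = 0
  let bits = 0
  for (const char of text) {
    const value = ALPHABET.indexOf(char)
    if (value === -1) throw new Error(`invalid base32 character: ${char}`)
    buffer = (buffer << 5) | value
    bits += 5
    if (bits >= 8) {
      bits -= 8
      out.push((buffer >> bits) & 0xff)
      buffer &= (1 << bits) - 1
    }
  }
  return Uint8Array.from(out)
}

/** Encode bytes as unpadded lowercase base32. */
function toBase32 (bytes) {
  let out = ''
  let buffer = 0
  let bits = 0
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte
    bits += 8
    while (bits >= 5) {
      bits -= 5
      out += ALPHABET[(buffer >> bits) & 31]
      buffer &= (1 << bits) - 1
    }
  }
  if (bits > 0) out += ALPHABET[(buffer << (5 - bits)) & 31]
  return out
}

/** Encode a non-negative integer as an unsigned varint. */
function varint (value) {
  const out = []
  while (value >= 0x80) {
    out.push((value & 0x7f) | 0x80)
    value = Math.floor(value / 128)
  }
  out.push(value)
  return out
}

/** Split a v2 piece CID into padding, tree height and 32-byte root. */
function parsePieceV2 (cid) {
  if (cid[0] !== 'b') throw new Error('expected a base32 CID (multibase prefix "b")')
  const bytes = fromBase32(cid.slice(1))
  let offset = 0
  // Read one unsigned varint and advance the offset.
  const next = () => {
    let value = 0
    let shift = 0
    while (true) {
      if (offset >= bytes.length) throw new Error('truncated CID')
      const byte = bytes[offset++]
      value += (byte & 0x7f) * 2 ** shift
      shift += 7
      if ((byte & 0x80) === 0) return value
    }
  }
  const [version, codec, hashCode, digestLength] = [next(), next(), next(), next()]
  if (version !== 1 || codec !== RAW || hashCode !== PIECE_V2_HASH) {
    throw new Error('not a v2 piece CID')
  }
  const digestEnd = offset + digestLength
  const padding = next()
  const height = bytes[offset]
  const root = bytes.subarray(offset + 1, digestEnd)
  if (root.length !== 32) throw new Error('unexpected root length')
  return { padding, height, root }
}

/** Convert a v2 piece CID to v1. Padding and height are dropped, so this cannot be reversed. */
function toPieceV1 (cid) {
  const { root } = parsePieceV2(cid)
  const prefix = [1, ...varint(FIL_COMMITMENT_UNSEALED), ...varint(SHA2_256_TRUNC254_PADDED)]
  return 'b' + toBase32([...prefix, root.length, ...root])
}

const pieceV2 = 'bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di'
const pieceV1 = toPieceV1(pieceV2)
console.log(pieceV1) // a v1 piece CID, starting with "baga6ea4seaq"
```

Because the v1 form carries only the root, going back to v2 needs the tree height (and padding) from some other source, which matches the limitation described above.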

## Proof of Data Segment Inclusion (PoDSI)

PoDSI enables clients using data aggregation services like web3.storage to verify the correct aggregation of their data and to prove this fact to third parties.

Put simply, it is a proof that a smaller piece (a segment) has been included in a larger piece (an aggregate).

[FRC-0058 documentation for PoDSI](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md).

```sh
# example merkle proof showing path from aggregate (piece) CID to segment (piece) CID
bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
└─┬ bafkzcibcaapie6hlazrzph5ui3dx2xxhkie3qbju35shf2bhsdhocrgbp5i2opq
  └─┬ bafkzcibcaaorabhed2eafo5l7xhwiqycjqp24dfidpxtanhdi53uz33anbjpsma
    └─┬ bafkzcibcaaojol2me7hy6z2egwmr24otf3pklb2ybxu6qo7fqmwj7avmqise2mi
      └─┬ bafkzcibcaans72xvshaijv5ew3nnr7ytn32fesfhwkmh3bm42uqtmmckim3jcdy
        └─┬ bafkzcibcaanc4f4qf2fgmmmt6svgqkk3bkgek3xcubrp6zliq4ldtuhcgt3lcmi
          └─┬ bafkzcibcaam5ab7crtl3upsf24prtm7egprc4uamwmdy65or5ttx3tvtiipfypi
            └─┬ bafkzcibcaamdx5p77d4mvlhalb4le4l2mdgkivjqnr4y6yuccdyvda72wctwmoy
              └─┬ bafkzcibcaal6br6ujd6rlazb4grghepfitehzhxowh7zsbte33gvvcvzetnuuaa
                └─┬ bafkzcibcaalnrywyaaduk63o6vbewveipgvgn4hd6fbx7z3hfyrcxiakj2bzefa
                  └─┬ bafkzcibcaakuwcl737abdrpbanbtyhzrwwlqzjvshenvyfda5d66zbnkyuf5cey
                    └─┬ bafkzcibcaakdjuz3vxhpsh4z5enxfsphdquyufqki4jwfq6xwmpmzyzzvkzicfi
                      └─┬ bafkzcibcaajvatl2odwkinga5qdngzqhxkuv4np2vjmm2yzia6el3zktd2lc6ki
                        └─┬ bafkzcibcaajk3bjkzabhi53fmdmc2rgmbqhanpkmxoqrg6nvd4lmomu4klvc2lq
                          └─┬ bafkzcibcaaiua66kdlvtfpgbrhoyha32wemunseubsszahixj2io2zjvthxxypq
                            └─┬ bafkzcibcaaift45zx5rhxe4tucqjwpekucnij54onv4n52njju4zg7e2insaiea
                              └── bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di
```

<Callout type="info">
The above example does not visualize _all_ the information that a PoDSI contains, just the direct path from aggregate (piece) CID to segment (piece) CID.
</Callout>
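
The core idea of an inclusion proof can be sketched with a small binary Merkle tree. This is a simplified illustration, not the FRC-0058 format: it assumes 32-byte nodes combined with SHA-256 truncated to 254 bits, and a proof made of the sibling hashes from leaf to root plus the leaf's position. The function names and the proof shape (`path`, `index`) are assumptions for this example, and it uses Node's built-in `crypto` module via `require` so it runs as a plain script.

```js
// Sketch only: a simplified Merkle inclusion check, not the full PoDSI structure.
const { createHash } = require('node:crypto')

/** Hash two 32-byte nodes into their parent, truncated to 254 bits. */
function hashPair (left, right) {
  const digest = createHash('sha256').update(left).update(right).digest()
  digest[31] &= 0b00111111 // clear the top two bits of the last byte
  return digest
}

/**
 * Recompute the root from a node, its sibling path (ordered leaf to root)
 * and the node's index within its level.
 */
function resolveRoot (node, path, index) {
  let current = node
  let position = index
  for (const sibling of path) {
    // Even positions are left children, odd positions are right children.
    current = position % 2 === 0
      ? hashPair(current, sibling)
      : hashPair(sibling, current)
    position = Math.floor(position / 2)
  }
  return current
}

/** Check that `node` is included in the tree with root `tree`. */
function verifyInclusion ({ path, index }, { tree, node }) {
  return Buffer.compare(resolveRoot(node, path, index), tree) === 0
}

// Build a four-leaf tree by hand and prove inclusion of the third leaf.
const leaves = ['a', 'b', 'c', 'd'].map(label => Buffer.alloc(32, label))
const left = hashPair(leaves[0], leaves[1])
const right = hashPair(leaves[2], leaves[3])
const root = hashPair(left, right)
const proof = { path: [leaves[3], left], index: 2 }

console.log(verifyInclusion(proof, { tree: root, node: leaves[2] })) // true
```

In the example tree shown earlier, each line plays the role of one step of `resolveRoot`: starting from the segment (piece) CID at the bottom, every level combines the current node with a sibling until the aggregate (piece) CID is reached.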

## Data Aggregation Proof

A data aggregation proof is a [PoDSI](#proof-of-data-segment-inclusion-podsi), _plus_ information that ties an aggregate piece to a Filecoin Storage Provider. At the time of writing, this information is one or more **Deal ID**s.

## Verifiable Aggregation Pipeline

The web3.storage aggregation pipeline is fully verifiable thanks to [UCAN](/docs/concepts/ucans-and-web3storage)s. Your piece can be tracked through the pipeline via signed UCAN receipts.

There are 4 roles in the aggregation pipeline:

1. **Storefront** - facilitates data storage services to applications and users, getting the requested data stored into Filecoin deals asynchronously.
1. **Aggregator** - aggregates smaller data (Filecoin Pieces) into a larger piece that can effectively be stored with a Filecoin Storage Provider.
1. **Dealer** - arranges deals with Filecoin Storage Providers for the aggregates.
1. **Deal Tracker** - follows the Filecoin chain to keep track of successful deals.

Roughly speaking, a piece progresses through the pipeline via the following stages:

<Steps>
### `filecoin/offer`

The client submits a piece to the _storefront_ (web3.storage) for aggregation and storage with Filecoin Storage Providers. The receipt for this invocation contains two links for async tasks:

1. `filecoin/submit` - allows the client to continue following the receipt chain through the aggregation pipeline. It is executed after the _storefront_ has verified the piece CID corresponds to the shard CID.
1. `filecoin/accept` - a "short cut" to the end of the pipeline, where the data aggregation proof will eventually become available. It is executed when the _dealer_ has successfully stored an aggregate containing the submitted piece in one or more Filecoin Storage Providers.

### `filecoin/submit`

The _storefront_ issues a receipt for `filecoin/submit` to indicate it has verified the offered piece and submitted it to the pipeline. The receipt for this invocation contains a link for an async task `piece/offer`, which is executed when the storefront offers the piece to an _aggregator_.

### `piece/offer`

The _aggregator_ issues a receipt for `piece/offer` when the storefront offers a piece to be aggregated. The receipt contains a link for an async task `piece/accept`, which is executed when the piece has been included in an aggregate.

### `piece/accept`

The _aggregator_ issues `piece/accept` receipts when an aggregate is big enough. _Every_ piece in the aggregate is issued a receipt which includes a [Proof of Data Segment Inclusion (PoDSI)](#proof-of-data-segment-inclusion-podsi). The receipt contains a link for an async task `aggregate/offer`, which is executed when the _aggregator_ offers the aggregate to a _dealer_.

### `aggregate/offer`

The _dealer_ issues an `aggregate/offer` receipt when the aggregator offers an aggregate to be stored by Filecoin Storage Providers. The receipt contains a link for an async task `aggregate/accept`, which is executed when the aggregate has been stored by at least one Filecoin Storage Provider.

### `aggregate/accept`

The _dealer_ issues an `aggregate/accept` receipt when an aggregate has been stored by at least one Filecoin Storage Provider. The receipt includes information that ties the aggregate to a Storage Provider, which is used in the next step to create a [Data Aggregation Proof](#data-aggregation-proof).

### `filecoin/accept`

The _storefront_ periodically checks for an `aggregate/accept` receipt for offered aggregates. When an aggregate is accepted, the _storefront_ issues `filecoin/accept` receipts for each piece in the aggregate. The receipt includes a [Data Aggregation Proof](#data-aggregation-proof). This is the end of the pipeline.
</Steps>
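
The stages above form a chain of receipts, each linking to the next async task. The following sketch models that chain as plain data so the order is easy to see. It is a hypothetical illustration of the description above, not a web3.storage API: the `PIPELINE` table, its `role` and `next` fields, and `followReceiptChain` are names made up for this example.

```js
// Sketch only: a plain-data model of the receipt chain, not a real client API.
// `role` is the pipeline role that handles the task and issues its receipt;
// `next` lists the async tasks linked from that receipt.
const PIPELINE = {
  'filecoin/offer': { role: 'storefront', next: ['filecoin/submit', 'filecoin/accept'] },
  'filecoin/submit': { role: 'storefront', next: ['piece/offer'] },
  'piece/offer': { role: 'aggregator', next: ['piece/accept'] },
  'piece/accept': { role: 'aggregator', next: ['aggregate/offer'] },
  'aggregate/offer': { role: 'dealer', next: ['aggregate/accept'] },
  'aggregate/accept': { role: 'dealer', next: [] },
  'filecoin/accept': { role: 'storefront', next: [] }
}

/** Follow the first linked task from each receipt, starting at `start`. */
function followReceiptChain (start) {
  const visited = []
  let current = start
  while (current) {
    if (!PIPELINE[current]) throw new Error(`unknown capability: ${current}`)
    visited.push(current)
    current = PIPELINE[current].next[0]
  }
  return visited
}

console.log(followReceiptChain('filecoin/offer').join(' -> '))
// filecoin/offer -> filecoin/submit -> piece/offer -> piece/accept -> aggregate/offer -> aggregate/accept
```

The second link on `filecoin/offer` is the "short cut": a client that only cares about the final data aggregation proof can wait on `filecoin/accept` directly instead of walking every intermediate receipt.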

[Spec for w3filecoin pipeline](https://github.com/web3-storage/specs/blob/main/w3-filecoin.md).
13 changes: 13 additions & 0 deletions src/pages/docs/glossary.yml
@@ -135,6 +135,19 @@
When delegating permissions, the issuer is the agent who currently possesses
the given permissions and is delegating them to another agent (the audience).

- id: podsi
name: PoDSI
definition: Proof of Data Segment Inclusion.
details: |
PoDSI enables clients using data aggregation services like web3.storage to
verify the correct aggregation of their data and to prove this fact
to third parties.

Put simply, it is a proof that a smaller piece (a segment) has been
included in a larger piece (an aggregate).

https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md

- id: space
definition: A unique identifier that acts as a "namespace" for uploaded content.
details: |
1 change: 1 addition & 0 deletions src/pages/docs/how-to/_filecoin-info.js
@@ -0,0 +1 @@
export const items = ['JS client', 'CLI']
3 changes: 2 additions & 1 deletion src/pages/docs/how-to/_meta.json
@@ -12,5 +12,6 @@
"receipts": {
"title": "Query UCAN receipts",
"display": "hidden"
}
},
"filecoin-info": "Get Filecoin info"
}
88 changes: 88 additions & 0 deletions src/pages/docs/how-to/filecoin-info.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
import { Callout, Tabs } from 'nextra/components'
import { items } from './_filecoin-info.js'

# Get Filecoin information for a Piece

To retrieve a [Data Aggregation Proof](/docs/concepts/podsi#data-aggregation-proof) (including [PoDSI](/docs/concepts/podsi)) you can issue a `filecoin/info` invocation to the web3.storage service.

<Tabs items={items}>
Review comment (Contributor): somehow this is not in the deployed version

<Tabs.Tab>
Piece CIDs are calculated by the client before data is uploaded and are made available in the `onShardStored` callback of `uploadFile`, `uploadDirectory` or `uploadCAR`.

<Callout type="info">
If you did not store the piece CID when your content was uploaded you can query [content claims](https://www.npmjs.com/package/@web3-storage/content-claims) for an equivalency claim.
</Callout>

Later, you can use the piece CID to retrieve a [Data Aggregation Proof](/docs/concepts/podsi#data-aggregation-proof):

```js
const result = await client.capability.filecoin.info(piece)
```

The result contains aggregate piece CIDs and inclusion proofs (PoDSI):

```js
for (const { aggregate, inclusion } of result.ok.aggregates) {
  console.log(`Aggregate CID: ${aggregate}`)
  console.log(`Inclusion Proof: ${inclusion}`)
}
```

Inclusion proofs can be verified using the `@web3-storage/data-segment` library:

```js
import { Proof, Piece } from '@web3-storage/data-segment'

const result = Proof.verify(inclusion.subtree, {
  tree: Piece.fromLink(aggregate).root,
  node: Piece.fromLink(piece).root
})
```

The result also contains details of storage providers and deals the aggregate appears in:

```js
for (const { provider, aux } of result.ok.deals) {
  console.log(`Storage Provider: f0${provider}`)
  console.log(`Deal ID: ${aux.dataSource.dealID}`)
}
```
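
The snippets above can be combined into a small helper that turns a `filecoin/info` result into printable values. This is a sketch under assumptions: it relies only on the fields used above (`result.ok.aggregates`, `result.ok.deals`, `provider`, `aux.dataSource.dealID`), it assumes a failed invocation exposes `result.error`, and `summarizeFilecoinInfo` and `mockResult` are hypothetical names. The mock values are copied from the CLI example output in this guide.

```js
// Sketch only: `summarizeFilecoinInfo` is an illustrative helper, not a client method.
function summarizeFilecoinInfo (result) {
  // Assumption: a failed invocation carries an `error` instead of `ok`.
  if (result.error) {
    throw new Error(`filecoin/info failed: ${result.error.message}`)
  }
  const { aggregates = [], deals = [] } = result.ok
  return {
    aggregates: aggregates.map(({ aggregate }) => String(aggregate)),
    deals: deals.map(({ provider, aux }) => ({
      provider: `f0${provider}`, // Filecoin ID address of the Storage Provider
      dealID: String(aux.dataSource.dealID)
    }))
  }
}

// Stand-in for the value returned by `client.capability.filecoin.info(piece)`.
const mockResult = {
  ok: {
    aggregates: [{
      aggregate: 'bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy',
      inclusion: {}
    }],
    deals: [
      { provider: 1392893, aux: { dataSource: { dealID: 65895671 } } },
      { provider: 1771403, aux: { dataSource: { dealID: 65895759 } } }
    ]
  }
}

const summary = summarizeFilecoinInfo(mockResult)
for (const deal of summary.deals) {
  console.log(`${deal.provider} deal ${deal.dealID}`)
}
// f01392893 deal 65895671
// f01771403 deal 65895759
```

An empty `deals` list would mean the piece has not yet been stored with a Storage Provider, so callers may want to retry later in that case.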
</Tabs.Tab>
<Tabs.Tab>
Piece CIDs are calculated by the client before data is uploaded and will be printed to the terminal if you pass the `--verbose` option to `w3 up`.

<Callout type="info">
If you did not store the piece CID when your content was uploaded you can query [content claims](https://www.npmjs.com/package/@web3-storage/content-claims) for an equivalency claim.
</Callout>

Later, you can use the piece CID to print out [Data Aggregation Proof](/docs/concepts/podsi#data-aggregation-proof) information:

```sh
$ w3 can filecoin info bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di

Piece CID: bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di
Deals:
Aggregate: bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
Provider: 1392893
Deal ID: 65895671

Aggregate: bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
Provider: 1771403
Deal ID: 65895759

Aggregate: bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
Provider: 97777
Deal ID: 65903995

Aggregate: bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
Provider: 717969
Deal ID: 65922477

Aggregate: bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
Provider: 20378
Deal ID: 65929686
```


</Tabs.Tab>
</Tabs>