Native git support: lsRefs(), sparseCheckout(), GitPathControl (#1764)
## Motivation

Related to #1787

Adds a set of TypeScript functions that support the native git protocol
and can power a sparse checkout feature. This is the basis for a faster,
more user-friendly git integration. No more guessing repository paths.
Just provide the repo URL, browse the files, and tell Playground which
directories are plugins, themes, etc.

Technically, this PR performs [git sparse checkout using just
JavaScript](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/page/1)
and a generic CORS proxy.

**This PR doesn't provide any user-facing feature yet.** However, it
paves the way to features like:

* Check out any git repo, even non-GitHub ones, without going through the
OAuth flow
* Retrieve a subset of the files directly from the repo, without
going through zipballs.
* Provide a visual git repo browser (instead of asking the user to
manually type the path)
* Introduce a new Blueprint resource type: git repo
* Fetch the names of all the repository branches (or just the branches
with the specified prefix)
* (future) commit and push to any git repo, even non-GitHub ones

## Notable points of this PR

* Exposes the `sparseCheckout()`, `lsRefs()`, and `listFiles()`
functions from the `@wp-playground/storage` package. I'm not yet sure
whether we need a dedicated `@wp-playground/git` package or not.
* Ships basic unit test coverage for those functions.
* Silences a few warnings in the CORS proxy. CC @brandonpayton: we may
not want to do that in the production release.
* Adds `isomorphic-git` as a git submodule in the `/isomorphic-git`
path. We can't rely on the published npm package because it doesn't
export the internal APIs we need to use here.
* Adds a bunch of WIP components in `@wp-playground/components`. They're
not used anywhere on the website yet and I'd rather keep them moving
with the project than isolate them in a PR until they're perfect. We'll
need some accessibility and mobile testing before using them in the
webapp, though.

## How does it even work?

Let me quote [my own
article](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/):

### Running a Git Client in the browser

The good news was
[isomorphic-git](https://github.com/isomorphic-git/isomorphic-git),
[wasm-git](https://github.com/petersalomonsen/wasm-git), and a few other
projects were already running Git in the browser. The bad news was none
of them supported fetching a subset of files via [sparse
checkout](https://git-scm.com/docs/git-sparse-checkout). You’d still
have to download 20MB of data even if you only wanted 100KB.

However, everything the desktop Git client does, including sparse
checkouts, can be done via
[HTTP](https://git-scm.com/docs/http-protocol/2.5.6) by requesting URLs
like
[https://github.com/WordPress/wordpress-playground.git](https://github.com/isomorphic-git/isomorphic-git.git).

Git [documentation](https://git-scm.com/) was… less than helpful, but
eventually it worked! A few hours later I was running Git commands by
sending GET and POST requests to the repository URLs.

### Fetching a hash of the branch

The first command I needed was `ls-refs`, to get the SHA1 hash of the right
git branch. Here’s how you can get it with fetch() for the HEAD branch
of the WordPress/gutenberg repo:

```ts
const response = await fetch(
  'https://github.com/WordPress/gutenberg.git/git-upload-pack',
  {
    method: 'POST',
    headers: {
      'Accept': 'application/x-git-upload-pack-advertisement',
      'content-type': 'application/x-git-upload-pack-request',
      'Git-Protocol': 'version=2',
    },
    body: [
      `0014command=ls-refs\n`,
      // ^^^^ line length in hex, counting these 4 characters too
      `0015agent=git/2.37.3\n`,
      `0017object-format=sha1\n`,
      '0001',
      // ^^^^ delimiter between the capabilities and the command arguments
      // Filter the results to only contain the HEAD branch,
      // otherwise it will return all the branches and
      // tags which may require downloading many
      // megabytes of data:
      `0009peel\n`,
      `0014ref-prefix HEAD\n`,
      '0000',
      // ^^^^ end of request (flush packet)
    ].join(''),
  }
);
```
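The four-hex-digit prefixes are pkt-line lengths: each one counts the whole line, including the four prefix characters themselves, while `0000` (flush) and `0001` (delimiter) carry no payload at all. Here's a minimal sketch of an encoder that rebuilds the request body above. `pktLine()` and `lsRefsBody()` are hypothetical helpers, not functions shipped in this PR.

```typescript
// Hypothetical helpers, not part of this PR: pkt-line framing as described
// in Git's protocol documentation.
const FLUSH_PKT = '0000'; // ends a message
const DELIM_PKT = '0001'; // protocol v2: separates capabilities from arguments
const MAX_PKT_LENGTH = 65520; // largest allowed pkt-line, prefix included

// Prefixes a payload with its length as 4 hex digits. The length counts
// the payload's UTF-8 bytes plus the 4 prefix characters themselves.
function pktLine(payload: string): string {
  const length = new TextEncoder().encode(payload).length + 4;
  if (length > MAX_PKT_LENGTH) {
    throw new Error(`pkt-line too long: ${length} bytes`);
  }
  return length.toString(16).padStart(4, '0') + payload;
}

// Rebuilds the ls-refs request body from the fetch() example above.
function lsRefsBody(refPrefix: string): string {
  return [
    pktLine('command=ls-refs\n'),
    pktLine('agent=git/2.37.3\n'),
    pktLine('object-format=sha1\n'),
    DELIM_PKT,
    pktLine('peel\n'),
    pktLine(`ref-prefix ${refPrefix}\n`),
    FLUSH_PKT,
  ].join('');
}
```

With that, the request body becomes `body: lsRefsBody('HEAD')`, and there's no hand-counting of line lengths.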

I won’t go into the details of the Git protocol. The point is that with a few
special headers and lines, you can be a Git client. If you paste that
fetch() into your devtools while on GitHub.com, it will return a response
similar to this:

```
0032950f5c8239b6e78e9051ec5e845bac5aa863c4cb HEAD
0000
```

Good! That’s our commit hash.
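Reading that reply is the same framing in reverse. Here's a rough sketch, assuming the body was read with `await response.text()` and is ASCII-only (JavaScript string offsets count UTF-16 code units, not bytes). `parseLsRefs()` is a hypothetical name; the PR's own `lsRefs()` may work differently.

```typescript
// Hypothetical helpers, not this PR's lsRefs(). They assume an ASCII-only
// response: JavaScript string offsets count UTF-16 code units, not bytes.
function* readTextPktLines(text: string): Generator<string | null> {
  let offset = 0;
  while (offset < text.length) {
    const length = parseInt(text.slice(offset, offset + 4), 16);
    if (Number.isNaN(length)) {
      throw new Error(`Invalid pkt-line length at offset ${offset}`);
    }
    if (length < 4) {
      // 0000 (flush), 0001 (delim), and 0002 (response-end) have no payload.
      yield null;
      offset += 4;
      continue;
    }
    if (offset + length > text.length) {
      throw new Error(`Truncated pkt-line at offset ${offset}`);
    }
    yield text.slice(offset + 4, offset + length);
    offset += length;
  }
}

// Maps ref names to object ids. Each ls-refs line is "<oid> <refname>",
// optionally followed by attributes such as "peeled:<oid>".
function parseLsRefs(text: string): Record<string, string> {
  const refs: Record<string, string> = {};
  for (const payload of readTextPktLines(text)) {
    if (payload === null) continue;
    const [oid, name] = payload.trimEnd().split(' ');
    refs[name] = oid;
  }
  return refs;
}
```

Applied to the response above, `parseLsRefs(text)['HEAD']` yields the commit hash.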

### Fetching a list of objects at a specific commit
With this, we can fetch [the list of
objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) in
that branch:

```ts
const response = await fetch("https://github.com/wordpress/gutenberg/git-upload-pack", {
  "headers": {
    "accept": "application/x-git-upload-pack-advertisement",
    "content-type": "application/x-git-upload-pack-request",
  },
  "body": [
      `0088want 950f5c8239b6e78e9051ec5e845bac5aa863c4cb multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3 filter \n`,
      `0015filter blob:none\n`,
      // ^ the sparse checkout secret sauce:
      // only fetch the list of objects, without
      // the file contents
      `0035shallow 950f5c8239b6e78e9051ec5e845bac5aa863c4cb\n`,
      `000ddeepen 1\n`,
      `0000`,
      `0009done\n`,
      `0009done\n`,
  ].join(""),
  "method": "POST"
});
```

And here’s the response:

```
00000008NAK
0026�Enumerating objects: 2189, done.
0025�Counting objects:   0% (1/2189)
...
0032�Compressing objects: 100% (1568/1568), done.
2004�PACK��(binary data)
0040 Total 2189 (delta 1), reused 1550 (delta 0), pack-reused 0
0006��0000
```
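Because the request asked for `side-band-64k`, the server multiplexes its reply: after the `NAK` line, each pkt-line starts with a band byte, where 1 is packfile data, 2 is progress text, and 3 is a fatal error. Those bytes are the `�` characters above. Here's a hypothetical demultiplexer, assuming the whole body was buffered first (e.g. with `new Uint8Array(await response.arrayBuffer())`). It's an illustration, not the isomorphic-git code this PR actually uses.

```typescript
// Hypothetical sketch, not this PR's code: splits a buffered upload-pack
// response into packfile bytes, progress messages, and control lines.
interface UploadPackResponse {
  packfile: Uint8Array; // concatenated band-1 payloads
  progress: string[]; // band-2 messages
  control: string[]; // non-multiplexed lines such as "NAK"
}

function demuxSideBand(bytes: Uint8Array): UploadPackResponse {
  const decoder = new TextDecoder();
  const chunks: Uint8Array[] = [];
  const progress: string[] = [];
  const control: string[] = [];
  let offset = 0;
  while (offset + 4 <= bytes.length) {
    const length = parseInt(decoder.decode(bytes.subarray(offset, offset + 4)), 16);
    if (Number.isNaN(length)) throw new Error(`Invalid pkt-line at offset ${offset}`);
    if (length < 4) {
      offset += 4; // flush and delimiter packets carry no payload
      continue;
    }
    const payload = bytes.subarray(offset + 4, offset + length);
    offset += length;
    const band = payload[0];
    if (band === 1) chunks.push(payload.subarray(1));
    else if (band === 2) progress.push(decoder.decode(payload.subarray(1)));
    else if (band === 3) throw new Error(`Remote error: ${decoder.decode(payload.subarray(1))}`);
    else control.push(decoder.decode(payload).trimEnd());
  }
  // Join the band-1 chunks into one contiguous packfile buffer.
  const packfile = new Uint8Array(chunks.reduce((sum, chunk) => sum + chunk.length, 0));
  let position = 0;
  for (const chunk of chunks) {
    packfile.set(chunk, position);
    position += chunk.length;
  }
  return { packfile, progress, control };
}
```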

The binary data after PACK is a compressed list of all the objects the
repository had at commit `950f5c8239b6e78e9051ec5e845bac5aa863c4cb`,
minus the file contents excluded by `filter blob:none`. It is not a list
of files that were changed in `950f5c`. It’s all files.

The [pack format](https://git-scm.com/docs/pack-format) is a binary
blob. It’s similar to
[ZIP](https://en.wikipedia.org/wiki/ZIP_(file_format)) in that it
encodes a series of objects, each stored as a binary header followed by
binary data. Here’s an approximate visual to help grok the idea:

```
PACK format – inaccurate explanation,
Pack consists of the string "PACK" and binary data structured roughly as follows:

 ___________________________________
|                                   |
|        ASCII string "PACK"        |
|        Binary data starts         |
|           Pack Header             |
|___________________________________|
|                                   |
|        Offset 0x0010              |
|          Object 1 Header          |  (Object type, hash,
|                                   |   data length, etc.)
|        ________________           |
|       |                |          |
|       |  Object 1 Data |          |  (zlib-compressed data)
|       |________________|          |
|                                   |
|        Offset 0x0050              |
|          Object 2 Header          |
|                                   |
|        ________________           |
|       |                |          |
|       |  Object 2 Data |          |  (zlib-compressed data)
|       |________________|          |
|___________________________________|
|                                   |
|           Pack Footer             |
|         Binary data ends          |
|___________________________________|
```
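The real layout isn't far off and is simple enough to decode by hand, per Git's pack-format documentation: `PACK`, a 4-byte version, and a 4-byte object count, all big-endian, followed by objects whose headers pack a 3-bit type and a variable-length size. That puts the first object at byte 12, consistent with the offset `12` in the index dump below. This is an illustrative sketch, not isomorphic-git's decoder. It stops at the headers and doesn't inflate data or resolve deltas.

```typescript
// Illustrative sketch based on Git's pack-format docs; not isomorphic-git.
const OBJECT_TYPES: Record<number, string> = {
  1: 'commit', 2: 'tree', 3: 'blob', 4: 'tag', 6: 'ofs_delta', 7: 'ref_delta',
};

interface PackHeader { version: number; objectCount: number; }

// The 12-byte pack header: "PACK", version, object count (big-endian).
function readPackHeader(pack: Uint8Array): PackHeader {
  if (new TextDecoder().decode(pack.subarray(0, 4)) !== 'PACK') {
    throw new Error('Not a packfile');
  }
  const view = new DataView(pack.buffer, pack.byteOffset, pack.byteLength);
  return { version: view.getUint32(4), objectCount: view.getUint32(8) };
}

interface ObjectHeader { type: string; size: number; headerLength: number; }

// Each object header stores a 3-bit type and the uncompressed size:
// 4 size bits in the first byte, then 7 more bits per continuation byte
// while the high bit is set. The zlib stream follows the header
// (delta objects first add a base reference, which this sketch ignores).
function readObjectHeader(pack: Uint8Array, offset: number): ObjectHeader {
  let byte = pack[offset];
  const typeId = (byte >> 4) & 0b111;
  let size = byte & 0b1111;
  let shift = 4;
  let pos = offset + 1;
  while (byte & 0x80) {
    byte = pack[pos++];
    size += (byte & 0x7f) * 2 ** shift; // multiply to avoid 32-bit overflow
    shift += 7;
  }
  return { type: OBJECT_TYPES[typeId] ?? 'unknown', size, headerLength: pos - offset };
}
```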

The decoding is tedious, so I used [the
decoder](https://github.com/isomorphic-git/isomorphic-git/blob/main/src/models/GitPackIndex.js)
provided by the isomorphic-git package:

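```ts
// parseUploadPackResponse(), collect(), and GitPackIndex come from
// isomorphic-git's source tree; the npm package doesn't export them.
// streamToIterator() turns the fetch() ReadableStream into an async iterator.
const iterator = streamToIterator(response.body);
const parsed = await parseUploadPackResponse(iterator);
const packfile = Buffer.from(await collect(parsed.packfile));

const index = await GitPackIndex.fromPack({
    pack: packfile
});
```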

The parsed index object provides information about all the objects
encoded in the received packfile. Let’s peek inside:

```
{
  // ...
  "hashes": [
    "5f4f0a5367476fdb7c98ffa5fa35300ec4c3f48b",
    "950f5c8239b6e78e9051ec5e845bac5aa863c4cb",
    // ...
  ],
  "offsets": {
    "5f4f0a5367476fdb7c98ffa5fa35300ec4c3f48b": 12,
    "950f5c8239b6e78e9051ec5e845bac5aa863c4cb": 181,
    // ...
  },
  "offsetCache": {
    "12": {
      "type": "tree",
      "object": "100644 async-http-download.php\u0000��p4��\u0014�g\u0015i��\u0004��\\���100644 async-http.php\u0000�\n�8K�RT������F\u001b8�� (more binary data)"
    },
    // ...
  },
  "readDepth": 4,
  "externalReadDepth": 0
}
```

Each object has a type and some data. The decoder stored some objects in
the offsetCache and kept track of the others as a `hash => offset`
mapping into the packfile.

Let’s read the details of the commit from our parsed index:

```ts
> const commit = await index.read({
    oid: '950f5c8239b6e78e9051ec5e845bac5aa863c4cb'
  });

{
  "type": "commit",
  "object": "tree c7b8440c83b8c987895f9a1949650eb60bccd2ec\nparent b6132f2d381865353e09edf88aa64a0dd042811a\nauthor Adam Zieliński <[email protected]> 1717689108 +0200\ncommitter Adam Zieliński <[email protected]> 1717689108 +0200\n\nUpdate rebuild workflow\n"
}
```

It’s the object type and the uncompressed object bytes which, in this
case, describe the commit in a specific microformat. From here, we can
get the tree hash and look for its details in the same index we’ve
already downloaded:
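Pulling the tree hash out of that commit text takes little more than splitting lines: a commit is a block of `key value` header lines, a blank line, and then the message. A minimal sketch with a hypothetical `parseCommit()`; it's not isomorphic-git's decoder and it ignores everything except `tree`, `parent`, and the message.

```typescript
// Hypothetical sketch: a commit object is "key value" header lines,
// a blank line, then the free-form commit message.
interface CommitInfo { tree: string; parents: string[]; message: string; }

function parseCommit(text: string): CommitInfo {
  const separator = text.indexOf('\n\n');
  const headerText = separator === -1 ? text : text.slice(0, separator);
  const message = separator === -1 ? '' : text.slice(separator + 2);
  let tree = '';
  const parents: string[] = [];
  for (const line of headerText.split('\n')) {
    const space = line.indexOf(' ');
    // Skip malformed lines and continuation lines (e.g. inside gpgsig),
    // which start with a space.
    if (space <= 0) continue;
    const key = line.slice(0, space);
    const value = line.slice(space + 1);
    if (key === 'tree') tree = value;
    else if (key === 'parent') parents.push(value);
    // author, committer, gpgsig, etc. are ignored in this sketch.
  }
  return { tree, parents, message };
}
```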

```ts
> const tree = await index.read({ oid: "c7b8440c83b8c987895f9a1949650eb60bccd2ec" })

{
  "type": "tree",
  "object": "40000 .github\u0000_O\nSgGo�|����50\u000e���40000 (... binary data ...)"
}
```

The contents of the tree object are a list of files in the repository.
Just like with commits, tree details are encoded in their own
microformat. Luckily, isomorphic-git ships the relevant decoders:

```ts
> GitTree.from(tree.object).entries()
[
  {
    "mode": "040000",
    "path": ".github",
    "oid": "ece277ec006eb517d5c5399d7a5c00b7e61018f1",
    "type": "tree"
  },
  {
    "mode": "100644",
    "path": "readme.txt",
    "oid": "3fe6e3aaf1dc4df204be575041383fc8e2e1e070",
    "type": "blob"
  },
  {
    "mode": "040000",
    "path": "src",
    "oid": "dbc84f20ee64fbd924617b41ee0e66128c9a8d97",
    "type": "tree"
  },
  // ...
]
```

Yay! That’s the list of files and directories in the repository root
with their hashes! From here we can recursively retrieve the ones
relevant for our sparse checkout.
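For the curious, the raw tree microformat is one entry after another: `<mode> <name>\0` followed by the 20-byte binary SHA-1, which is why the tree dump above looks garbled. Here's a rough decoder, assuming the tree bytes are available as a `Uint8Array`. `parseTree()` is hypothetical, and padding `40000` to `040000` only mirrors the mode format in the `GitTree` output above.

```typescript
// Hypothetical sketch of the raw tree format (not isomorphic-git's GitTree):
// each entry is "<mode> <name>\0" followed by a 20-byte binary SHA-1.
interface TreeEntry {
  mode: string;
  path: string;
  oid: string;
  type: 'tree' | 'blob' | 'commit';
}

function parseTree(bytes: Uint8Array): TreeEntry[] {
  const decoder = new TextDecoder();
  const entries: TreeEntry[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    const space = bytes.indexOf(0x20, offset);
    if (space === -1) throw new Error('Malformed tree entry');
    const nul = bytes.indexOf(0x00, space);
    if (nul === -1 || nul + 21 > bytes.length) throw new Error('Malformed tree entry');
    // Directories are stored as "40000"; pad to the 6-digit form.
    const mode = decoder.decode(bytes.subarray(offset, space)).padStart(6, '0');
    const path = decoder.decode(bytes.subarray(space + 1, nul));
    const oid = Array.from(bytes.subarray(nul + 1, nul + 21), (b) =>
      b.toString(16).padStart(2, '0'),
    ).join('');
    // 040000 = directory, 160000 = submodule commit, anything else = blob.
    const type: TreeEntry['type'] =
      mode === '040000' ? 'tree' : mode === '160000' ? 'commit' : 'blob';
    entries.push({ mode, path, oid, type });
    offset = nul + 21;
  }
  return entries;
}
```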

### Fetching full files from specific paths

We’re finally ready to check out a few particular paths. Let’s ask for a
blob at readme.txt and a tree at docs/tool:

```ts
const response = await fetch("https://github.com/wordpress/gutenberg/git-upload-pack", {
  "headers": {
    "accept": "application/x-git-upload-pack-advertisement",
    "content-type": "application/x-git-upload-pack-request",
  },
  "body": [
      // Capabilities go on the first "want" line only.
      `0080want 28facb763312f40c9ab3251fb91edb87c8476cf9 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3\n`,
      `0032want 3fe6e3aaf1dc4df204be575041383fc8e2e1e070\n`,
      `0000`,
      `0009done\n`,
  ].join(""),
  "method": "POST"
});
```

The response is another packfile, but this time each blob comes with its
binary contents. Some decoding and recursive processing later, we
finally get this:

```ts
{
    "readme.txt": "=== Gutenberg ===\nContri (...)",
    "docs/tool": {
        "index.js": "/**\n * External depe (...)",
        "manifest.js": "/* eslint no-console (...)"
    }
}
```
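The "recursive processing" step can be sketched as a walk over already-decoded objects. This assumes a hypothetical in-memory store that maps each oid to either a blob's text or a tree's parsed entries. `materializePaths()` is an illustrative name, not this PR's `sparseCheckout()`.

```typescript
// Hypothetical object store: oid -> decoded object, with tree entries
// already parsed and blob contents already decoded as text.
type GitObject =
  | { type: 'blob'; content: string }
  | { type: 'tree'; entries: { path: string; oid: string }[] };

type FileTree = { [name: string]: string | FileTree };

// Blobs become strings; trees become nested objects, recursively.
function materializeObject(store: Map<string, GitObject>, oid: string): string | FileTree {
  const object = store.get(oid);
  if (!object) throw new Error(`Missing object ${oid}`);
  if (object.type === 'blob') return object.content;
  const result: FileTree = {};
  for (const entry of object.entries) {
    result[entry.path] = materializeObject(store, entry.oid);
  }
  return result;
}

// Builds the final result keyed by the requested repository paths.
function materializePaths(store: Map<string, GitObject>, wanted: Record<string, string>): FileTree {
  const files: FileTree = {};
  for (const [path, oid] of Object.entries(wanted)) {
    files[path] = materializeObject(store, oid);
  }
  return files;
}
```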

Yay! It took some effort, but it was worth it!

### CORS proxy and other notes

You’ll still need to run a CORS proxy. The fetch() examples above will
work if you try them in devtools on github.com, but you won’t be able to
just use them on your site. Git servers typically don't send the
Access-Control-* headers the browser requires to allow these cross-origin requests.

So we need a server after all. Was this a failure, then? No! A CORS
proxy is cheaper, simpler, and safer to maintain than a Git service.
Also, it can fetch all the files in three fetch() requests instead of the
two requests per file that the GitHub REST API requires.

#### Try it yourself

I’ve shared a functional demo that includes a CORS proxy in this
repository on GitHub:
https://github.com/adamziel/git-sparse-checkout-in-js


## Testing instructions

* Start two terminals
* Run `nx dev playground-components` in the first one
* Run `nx start playground-php-cors-proxy` in the second one to start
the PHP CORS proxy
* Go to http://localhost:5173/ and play with the UI
* Play with an early demo of the git repository browser shipped in this PR:



https://github.com/user-attachments/assets/731b2a89-8004-4d0b-8c6f-8646d4840a29
adamziel authored Oct 7, 2024
1 parent 1035a25 commit 8639616
Showing 48 changed files with 1,733 additions and 453 deletions.
2 changes: 1 addition & 1 deletion .github/actions/prepare-playground/action.yml
@@ -5,7 +5,7 @@ runs:
steps:
- name: Fetch trunk
shell: bash
run: git fetch origin trunk --depth=1
run: git fetch origin trunk --depth=1 --recurse-submodules
- uses: actions/setup-node@v4
with:
node-version: 18
4 changes: 3 additions & 1 deletion .github/workflows/build-website.yml
@@ -29,7 +29,9 @@ jobs:
environment:
name: playground-wordpress-net-wp-cloud
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: npm run build
- run: tar -czf wasm-wordpress-net.tar.gz dist/packages/playground/wasm-wordpress-net
14 changes: 14 additions & 0 deletions .github/workflows/ci.yml
@@ -18,6 +18,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: npx nx affected --target=lint
- run: npx nx affected --target=typecheck
@@ -26,6 +28,8 @@ jobs:
needs: [lint-and-typecheck]
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: node --expose-gc node_modules/nx/bin/nx affected --target=test --configuration=ci
test-e2e:
@@ -34,6 +38,8 @@ jobs:
# Run as root to allow node to bind to port 80
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: sudo ./node_modules/.bin/cypress install --force
- run: sudo CYPRESS_CI=1 npx nx e2e playground-website --configuration=ci --verbose
@@ -49,6 +55,8 @@ jobs:
needs: [lint-and-typecheck]
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- name: Install Playwright Browsers
run: sudo npx playwright install --with-deps
@@ -70,6 +78,8 @@ jobs:
part: ['chromium', 'firefox', 'webkit']
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- name: Download dist
uses: actions/download-artifact@v4
@@ -104,6 +114,8 @@ jobs:
needs: [lint-and-typecheck]
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: npx nx affected --target=build --parallel=3 --verbose

@@ -128,6 +140,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: ./.github/actions/prepare-playground
- run: npm run build:docs
- uses: actions/upload-pages-artifact@v1
1 change: 1 addition & 0 deletions .github/workflows/publish-npm-packages.yml
@@ -40,6 +40,7 @@ jobs:
ref: ${{ github.event.pull_request.head.ref }}
clean: true
persist-credentials: false
submodules: true
- name: Config git user
run: |
git config --global user.name "deployment_bot"
3 changes: 2 additions & 1 deletion .github/workflows/refresh-sqlite-integration.yml
@@ -23,11 +23,12 @@ jobs:
concurrency:
group: check-version-and-run-build
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.ref }}
clean: true
persist-credentials: false
submodules: true
- uses: ./.github/actions/prepare-playground
- name: 'Refresh the SQLite bundle'
shell: bash
3 changes: 2 additions & 1 deletion .github/workflows/refresh-wordpress-major-and-beta.yml
@@ -28,11 +28,12 @@ jobs:
concurrency:
group: check-version-and-run-build
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.ref }}
clean: true
persist-credentials: false
submodules: true
- name: 'Install bun'
run: |
curl -fsSL https://bun.sh/install | bash
3 changes: 2 additions & 1 deletion .github/workflows/refresh-wordpress-nightly.yml
@@ -21,11 +21,12 @@ jobs:
environment:
name: wordpress-assets
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.ref }}
clean: true
persist-credentials: false
submodules: true
- uses: ./.github/actions/prepare-playground
- name: 'Install bun'
run: |
5 changes: 3 additions & 2 deletions .github/workflows/update-changelog.yml
@@ -31,14 +31,15 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
submodules: true
ref: trunk
clean: true
persist-credentials: false
- name: Fetch trunk
shell: bash
run: git fetch origin trunk --depth=1
run: git fetch origin trunk --depth=1 --recurse-submodules
- name: 'Install bun (for the changelog)'
run: |
curl -fsSL https://bun.sh/install | bash
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "isomorphic-git"]
path="isomorphic-git"
url=git@github.com:adamziel/isomorphic-git.git
4 changes: 2 additions & 2 deletions README.md
@@ -84,15 +84,15 @@ The vanilla `git clone` command will take ages. Here's a faster alternative that
only pull the latest revision of the trunk branch:

```
git clone -b trunk --single-branch --depth 1 git@github.com:WordPress/wordpress-playground.git
git clone -b trunk --single-branch --depth 1 --recurse-submodules git@github.com:WordPress/wordpress-playground.git
```

## Running WordPress Playground locally

You also can run WordPress Playground locally as follows:

```bash
git clone -b trunk --single-branch --depth 1 git@github.com:WordPress/wordpress-playground.git
git clone -b trunk --single-branch --depth 1 --recurse-submodules git@github.com:WordPress/wordpress-playground.git
cd wordpress-playground
npm install
npm run dev
1 change: 1 addition & 0 deletions isomorphic-git
Submodule isomorphic-git added at cdca7e
80 changes: 80 additions & 0 deletions package-lock.json


6 changes: 6 additions & 0 deletions package.json
@@ -61,21 +61,27 @@
"@types/react-transition-group": "4.4.11",
"@types/wicg-file-system-access": "2023.10.5",
"ajv": "8.12.0",
"async-lock": "1.4.1",
"axios": "1.6.1",
"classnames": "^2.3.2",
"comlink": "^4.4.1",
"crc-32": "1.2.2",
"diff3": "0.0.4",
"express": "4.19.2",
"file-saver": "^2.0.5",
"fs-extra": "11.1.1",
"ini": "4.1.2",
"octokit": "3.1.1",
"octokit-plugin-create-pull-request": "5.1.1",
"pako": "1.0.10",
"react": "^18.2.25",
"react-dom": "^18.2.25",
"react-hook-form": "7.53.0",
"react-modal": "^3.16.1",
"react-redux": "8.1.3",
"react-transition-group": "4.4.5",
"sha.js": "2.4.11",
"sha1": "1.1.1",
"unzipper": "0.10.11",
"vite-plugin-api": "1.0.4",
"wouter": "3.3.5",
@@ -58,7 +58,7 @@ The most flexible and customizable method is to build the site locally.
Create a shallow clone of the Playground repository, or your own fork.

```sh
git clone -b trunk --single-branch --depth 1 git@github.com:WordPress/wordpress-playground.git
git clone -b trunk --single-branch --depth 1 --recurse-submodules git@github.com:WordPress/wordpress-playground.git
```

Enter the `wordpress-playground` directory.
2 changes: 1 addition & 1 deletion packages/docs/site/docs/main/contributing/code.md
@@ -26,7 +26,7 @@ Be sure to review the following resources before you begin:
[Fork the Playground repository](https://github.com/WordPress/wordpress-playground/fork) and clone it to your local machine. To do that, copy and paste these commands into your terminal:

```bash
git clone -b trunk --single-branch --depth 1
git clone -b trunk --single-branch --depth 1 --recurse-submodules

# replace `YOUR-GITHUB-USERNAME` with your GitHub username:
git@github.com:YOUR-GITHUB-USERNAME/wordpress-playground.git
2 changes: 1 addition & 1 deletion packages/playground/components/index.html
@@ -7,6 +7,6 @@
</head>
<body>
<div id="root"></div>
<script type="module" src="./src/demos.tsx"></script>
<script type="module" src="./src/demos/index.tsx"></script>
</body>
</html>