Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing configurations.
It currently supports three configurations: mesh, linear and torus.
https://github.com/tenstorrent/tt-topology/
Build and editing instructions are as follows:
Install and source Rust (needed for the luwen library):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
Generate and source a Python environment. This isolates your environment and can make debugging and day-to-day use easier. The environment can be shared if you want to use a single environment for all your Tenstorrent tools.
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
Install tt-topology.
pip3 install .
For development, generate and source a python3 environment and install pre-commit:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install pre-commit
For users who would like to edit the code without re-building, install TT-Topology in editable mode.
pip install --editable .
Recommended: install the pre-commit hooks so that all files are auto-formatted on commit.
pre-commit install
Command line arguments
usage: tt-topology [-h] [-v] [-l {linear,torus,mesh}] [-ls] [--log [log]] [-p [plot]] [-o] [-g [GENERATE_RESET_JSON]] [-r config.json]
Tenstorrent Topology (TT-Topology) is a command line utility to flash ethernet coordinates when multiple NB's are connected together.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-l {linear,torus,mesh}, --layout {linear,torus,mesh}
Select the layout (linear, torus, mesh). Default is linear.
-ls, --list List out all the boards on host with their coordinates and layout.
--log [log] Change filename for the topology flash log. Default:
~/tt_topology_logs/<timestamp>_log.json
-p [plot], --plot_filename [plot]
Change the plot of the png that will have the graph layout of the chips. Default:
chip_layout.png
-o, --octopus octopus support in galaxy
-g [GENERATE_RESET_JSON], --generate_reset_json [GENERATE_RESET_JSON]
Generate default reset json file that reset consumes. Default stored at ~/.config/tenstorrent/reset_config.json. Update the generated file and use it as an
input for the --reset option
-r config.json, --reset config.json
Provide a json file with reset configs. Generate a default reset json file with the -g option.
TT-Topology does the following when calculating and flashing the coordinates:
- Flash all the boards to default - set all eth port disables to 0 and reset coordinates to (0,0) for local chips and (1,0) for n300 remote chips.
- Issue a board level reset to apply the new flash to the chips.
- Generate a mapping of all possible connections and their type between the available chips.
- Use a graph algorithm to generate coordinates for each chip based on the layout selected by the user. These layouts are discussed in detail in the sections below.
- Write the new coordinates to the chips.
- Issue a board level reset to apply the new flash to the chips.
- Produce a PNG with a graphical representation of the layout and a .json log file with details of the above steps.
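Step 3 above builds a connection map between chips. As a minimal sketch, assuming each ethernet link is reported as a hypothetical (chip_a, port_a, chip_b, port_b) tuple (the real discovery goes through the luwen library and also records a connection type, neither of which is modeled here), the map can be stored as an adjacency list. The function name connection_map is made up for illustration:

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical link record: (chip_a, eth_port_a, chip_b, eth_port_b).
Link = Tuple[int, int, int, int]
ConnMap = DefaultDict[int, List[Tuple[int, int, int]]]


def connection_map(links: List[Link]) -> ConnMap:
    """Map each chip to (local_port, neighbor_chip, neighbor_port) for every link."""
    conn: ConnMap = defaultdict(list)
    for a, pa, b, pb in links:
        conn[a].append((pa, b, pb))  # ethernet links are bidirectional,
        conn[b].append((pb, a, pa))  # so record the link from both ends
    return conn


# Made-up example: two cables between chip 0 and chip 1.
links = [(0, 8, 1, 0), (0, 9, 1, 1)]
print(dict(connection_map(links)))
# {0: [(8, 1, 0), (9, 1, 1)], 1: [(0, 0, 8), (1, 0, 9)]}
```

Keeping the port numbers on both ends means a layout algorithm can later decide which physical port a coordinate step corresponds to.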
TT-Topology can be used to flash one of three chip layouts: mesh, linear and torus.
The mesh layout is a trivalent graph where each node can have at most 3 connections. A BFS algorithm is used to assign the coordinates. Command to generate a mesh layout:
$ tt-topology -l mesh -p mesh_layout.png
For hosts with 2 n300 cards and with 4 n300 cards, the command generates layouts that look as follows:
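BFS coordinate assignment can be sketched as follows. This is not tt-topology's code: it assumes a hypothetical graph format in which every edge carries a (dx, dy) offset standing in for the connection type, and the function name bfs_mesh_coords is made up for illustration:

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical graph format: chip id -> [(neighbor id, (dx, dy)), ...].
# The (dx, dy) offset is an assumed stand-in for the connection type.
Graph = Dict[int, List[Tuple[int, Tuple[int, int]]]]


def bfs_mesh_coords(graph: Graph, root: int) -> Dict[int, Tuple[int, int]]:
    """Assign (x, y) coordinates to every chip reachable from root, breadth first."""
    if any(len(edges) > 3 for edges in graph.values()):
        raise ValueError("mesh layout expects at most 3 connections per chip")
    coords = {root: (0, 0)}
    queue = deque([root])
    while queue:
        chip = queue.popleft()
        x, y = coords[chip]
        for neighbor, (dx, dy) in graph[chip]:
            if neighbor not in coords:  # the first visit fixes the coordinate
                coords[neighbor] = (x + dx, y + dy)
                queue.append(neighbor)
    return coords


# Hypothetical 2x2 mesh of four chips (for example, two n300 cards).
mesh: Graph = {
    0: [(1, (1, 0)), (2, (0, 1))],
    1: [(0, (-1, 0)), (3, (0, 1))],
    2: [(0, (0, -1)), (3, (1, 0))],
    3: [(1, (0, -1)), (2, (-1, 0))],
}
print(bfs_mesh_coords(mesh, 0))  # {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
```

BFS places chips in order of hop distance from the root, so each chip takes its coordinate from the first already-placed neighbor that reaches it.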
The linear layout, as the name suggests, is a layout where all chips are connected by a single line. The coordinates are assigned by finding a cycle in the graph and then assigning coordinates in order. Command to generate a linear layout:
$ tt-topology -l linear -p linear_layout.png
For hosts with 2 n300 cards and with 4 n300 cards, the command generates layouts that look as follows:
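The "find a cycle, then assign coordinates in order" approach can be sketched with a simple backtracking search for a cycle that visits every chip once. The function names, the adjacency format, and placing chips along x with y = 0 are all assumptions for illustration, not tt-topology's implementation:

```python
from typing import Dict, List, Optional, Tuple


def find_cycle(adj: Dict[int, List[int]], start: int) -> Optional[List[int]]:
    """Backtracking search for a cycle that visits every chip exactly once."""
    n = len(adj)
    path = [start]
    visited = {start}

    def extend() -> bool:
        if len(path) == n:
            return start in adj[path[-1]]  # the last chip must link back to start
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                if extend():
                    return True
                path.pop()  # dead end: undo and try the next neighbor
                visited.remove(nxt)
        return False

    return path if extend() else None


def linear_coords(order: List[int]) -> Dict[int, Tuple[int, int]]:
    """Assign coordinates in cycle order (assumed: x = position, y = 0)."""
    return {chip: (i, 0) for i, chip in enumerate(order)}


# Hypothetical 4-chip ring: 0-1-3-2-0
ring = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
cycle = find_cycle(ring, 0)
print(cycle)                 # [0, 1, 3, 2]
print(linear_coords(cycle))  # {0: (0, 0), 1: (1, 0), 3: (2, 0), 2: (3, 0)}
```

Backtracking is exponential in the worst case, so this form is only practical for small graphs such as the few chips on a single host.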
The torus layout is a cyclic graph: a single line connects all the chips and wraps around back to the first one. The coordinates are assigned by finding a cycle in the graph and then assigning coordinates in order. Command to generate a torus layout:
$ tt-topology -l torus -p torus_layout.png
For a host with four n300 cards, the command generates a layout that looks as follows:
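What distinguishes the torus from the linear layout is that the two ends of the line are joined, so neighbor lookups wrap around. A minimal sketch using modular indexing over a cycle order; the helper ring_neighbors is hypothetical, and four chips are used for brevity:

```python
from typing import List, Tuple


def ring_neighbors(order: List[int], chip: int) -> Tuple[int, int]:
    """Return the (previous, next) chips of `chip` on a ring, wrapping modulo n."""
    n = len(order)
    i = order.index(chip)
    return order[(i - 1) % n], order[(i + 1) % n]


order = [0, 1, 3, 2]  # hypothetical cycle found for four chips
print(ring_neighbors(order, 0))  # (2, 1): the first chip wraps to the last
print(ring_neighbors(order, 2))  # (3, 0): the last chip wraps to the first
```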
The octopus option (-o) supports the following Galaxy settings:
- TGG setting: 8 n150s connected to 2 Galaxies
- TG setting: 4 n150s connected to 1 Galaxy
Flash these settings as follows:
- Generate a default mobo reset json file, saved at ~/.config/tenstorrent/reset_config.json, by running the following command:
$ tt-topology -g
- Fill in "mobo", "credo", and "disabled_ports" under "wh_mobo_reset". Here is an example of what your reset_config.json file may look like:
{
  "time": "2024-03-06T20:12:27.640859",
  "host_name": "yyz-lab-212",
  "gs_tensix_reset": {
    "pci_index": []
  },
  "wh_link_reset": {
    "pci_index": [0, 1, 2, 3]
  },
  "re_init_devices": true,
  "wh_mobo_reset": [
    {
      "nb_host_pci_idx": [0, 1, 2, 3],
      "mobo": "mobo-ce-44",
      "credo": ["6:0", "6:1", "7:0", "7:1"],
      "disabled_ports": ["0:2", "1:2", "6:2", "7:2"]
    }
  ]
}
- Flash multiple NB cards to use specific eth routing configurations by running the following command:
$ tt-topology -o -r ~/.config/tenstorrent/reset_config.json
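Before running the flash command, a quick sanity check of reset_config.json can catch fields that were left unfilled. This is a hypothetical pre-check, not part of tt-topology; the function name missing_mobo_fields is made up, and it treats an empty value as unfilled, which may be stricter than the tool itself:

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("mobo", "credo", "disabled_ports")


def missing_mobo_fields(config: Dict[str, Any]) -> List[str]:
    """List 'wh_mobo_reset[i].field' for every entry with a missing or empty field."""
    problems = []
    for i, entry in enumerate(config.get("wh_mobo_reset", [])):
        for field in REQUIRED_FIELDS:
            if not entry.get(field):  # absent, empty string or empty list
                problems.append(f"wh_mobo_reset[{i}].{field}")
    return problems


# Trimmed copy of the example above; in practice, json.load() the file at
# ~/.config/tenstorrent/reset_config.json instead.
example = json.loads("""
{"wh_mobo_reset": [{"nb_host_pci_idx": [0, 1, 2, 3],
                    "mobo": "mobo-ce-44",
                    "credo": ["6:0", "6:1", "7:0", "7:1"],
                    "disabled_ports": ["0:2", "1:2", "6:2", "7:2"]}]}
""")
print(missing_mobo_fields(example))  # []
print(missing_mobo_fields({"wh_mobo_reset": [{"mobo": ""}]}))
# ['wh_mobo_reset[0].mobo', 'wh_mobo_reset[0].credo', 'wh_mobo_reset[0].disabled_ports']
```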
With -o, TT-Topology performs the following steps:
- Set up mobo_eth_en on every local n150 to train with the Galaxy.
- Program the shelf/rack of the Galaxies.
- Program all local n150s to rack 0, shelf 0, x 0, y 0.
- Reset with the following retimer_sel and disable_sel, and wait for training:
  - retimer_sel: from the credo field of the reset json file for the specific Galaxy
  - disable_sel: all the other ports not specified by retimer_sel
- Check the QSFP link and change the shelf number for each n150 according to the shelf of the connected Galaxy.
- Program the x, y coords of the local n150s based on the other side of the link
- Reset again with the following retimer_sel and disable_sel, wait for training, and verify that all chips show up:
  - retimer_sel: from the credo field of the reset json file for the specific Galaxy
  - disable_sel: from the disabled_ports field of the reset json file for the specific Galaxy
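The two resets differ only in where disable_sel comes from. As a sketch, assuming a hypothetical list of all candidate port names for a Galaxy (the real port set is not given here) and made-up helper names, the first reset disables the complement of retimer_sel, while the second takes disabled_ports as-is:

```python
from typing import List


def first_reset_disable_sel(all_ports: List[str], retimer_sel: List[str]) -> List[str]:
    """First reset: disable every port that retimer_sel does not select."""
    selected = set(retimer_sel)
    return [port for port in all_ports if port not in selected]


def second_reset_disable_sel(disabled_ports: List[str]) -> List[str]:
    """Second reset: disable_sel is taken directly from the disabled_ports field."""
    return list(disabled_ports)


all_ports = ["6:0", "6:1", "6:2", "7:0", "7:1", "7:2"]  # hypothetical port set
credo = ["6:0", "6:1", "7:0", "7:1"]                    # retimer_sel from the json
print(first_reset_disable_sel(all_ports, credo))        # ['6:2', '7:2']
```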
TT-Topology records the relevant SPI registers before and after flashing, the connection map, and the coordinates of the chips in a .json file for record keeping and debugging.
By default it is stored at ~/tt_topology_logs/<timestamp>_log.json. This can be changed with the --log command line argument as follows:
$ tt-topology --log new_log.json ...
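To inspect the most recent log when debugging, the sketch below picks the newest *_log.json in the default directory by modification time (so it does not depend on the exact timestamp format) and loads it with the standard json module. The helper names are hypothetical, and nothing is assumed about the keys inside the log:

```python
import json
from pathlib import Path
from typing import Any, Optional


def latest_topology_log(log_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the most recently modified *_log.json in log_dir, or None if there is none."""
    log_dir = log_dir or Path.home() / "tt_topology_logs"  # default location
    if not log_dir.is_dir():
        return None
    logs = list(log_dir.glob("*_log.json"))
    return max(logs, key=lambda p: p.stat().st_mtime) if logs else None


def load_log(path: Path) -> Any:
    """Parse one flash log as JSON."""
    with path.open() as f:
        return json.load(f)


latest = latest_topology_log()
if latest is not None:
    log = load_log(latest)
    print(latest, sorted(log) if isinstance(log, dict) else type(log).__name__)
```

Logs written elsewhere with --log are not picked up; pass their directory as log_dir instead.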
Apache 2.0 - https://www.apache.org/licenses/LICENSE-2.0.txt