Merge branch 'main' of https://github.com/Xilinx/mlir-aie into compute-tile-repeat
abisca committed Oct 21, 2024
2 parents 5634023 + b2e372a commit 532f016
Showing 23 changed files with 750 additions and 65 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/buildAndTestRyzenAISw.yml
@@ -5,7 +5,7 @@ on:
branches:
- main
- ryzen-ai-sw-test
-# pull_request:
+pull_request:
workflow_dispatch:
inputs:
AIE_COMMIT:
4 changes: 1 addition & 3 deletions .github/workflows/buildRyzenWheels.yml
@@ -117,9 +117,7 @@ jobs:
fail-fast: false
matrix:
python_version: [
-# "3.8", "3.9",
-"3.10",
-# "3.11", "3.12"
+"3.10", "3.12",
]

steps:
205 changes: 205 additions & 0 deletions README.md
@@ -16,6 +16,211 @@ This repository contains an [MLIR-based](https://mlir.llvm.org/) toolchain for A

This project is primarily intended to support the open-source community, particularly tool builders, with low-level access to AIE devices and enable the development of a wide variety of programming models from higher-level abstractions. We provide an example programming flow: Interface Representation for hands-ON (IRON) close-to-metal programming of the AIE-array. IRON is an open access toolkit enabling performance engineers to build fast and efficient, often specialized designs through a set of Python language bindings around the mlir-aie dialect. As such, it contains some examples; however, this project is not intended to represent an end-to-end compilation flow for all application designs. If you're looking for an out-of-the-box experience for highly efficient machine learning, check out the [AMD Ryzen™ AI Software Platform](https://github.com/amd/RyzenAI-SW/).

# Getting Started for AMD Ryzen™ AI - Linux Quick Setup Instructions

These instructions guide you through everything required to build and execute a program on the Ryzen™ AI NPU, starting from a fresh bare-bones **Ubuntu 24.10** install with the Linux 6.11 kernel.

## Initial Setup

#### Update BIOS:

Be sure you have the latest BIOS for your laptop or mini PC; this ensures the NPU (sometimes referred to as IPU) is enabled in the system. You may need to manually enable the NPU:
```Advanced → CPU Configuration → IPU```

> **NOTE:** Some manufacturers only provide Windows executables to update the BIOS. If that applies to your machine, update the BIOS before installing Ubuntu.
#### BIOS Settings:

Turn off Secure Boot (this allows unsigned drivers to be installed):
```BIOS → Security → Secure boot → Disable```

## Prerequisites

### Install AIETools

#### Supporting AMD Ryzen™ AI with AMD XDNA™/AIE-ML (AIE2) and AMD XDNA™ 2 (AIE2P): Install AMD Vitis™ AIE Essentials

1. Install Vitis™ AIE Essentials from [Ryzen AI Software 1.3 Early Access](https://account.amd.com/en/member/ryzenai-sw-ea.html#tabs-a5e122f973-item-4757898120-tab). We will assume you use the installation directory `/tools/ryzen_ai-1.3.0/vitis_aie_essentials`.

> This is an early access lounge; at this time you must register and be granted access.
1. Download the VAIML installer for Linux-based compilation: `ryzen_ai-1.3.0ea1.tgz`

1. Extract the required tools:

``` bash
tar -xzvf ryzen_ai-1.3.0ea1.tgz
cd ryzen_ai-1.3.0
mkdir vitis_aie_essentials
mv vitis_aie_essentials*.whl vitis_aie_essentials
cd vitis_aie_essentials
unzip vitis_aie_essentials*.whl
```

1. Set up an AI Engine license.

1. Get a local license for AI Engine tools from [https://www.xilinx.com/getlicense](https://www.xilinx.com/getlicense).

1. Copy your license file (`Xilinx.lic`) to your preferred location, e.g. `/opt/Xilinx.lic`.

1. Set up your environment for the Vitis™ AIETools using the following script:

```bash
#!/bin/bash
#################################################################################
# Setup Vitis AIE Essentials
#################################################################################
export AIETOOLS_ROOT=/tools/ryzen_ai-1.3.0/vitis_aie_essentials
export PATH=$PATH:${AIETOOLS_ROOT}/bin
export LM_LICENSE_FILE=/opt/Xilinx.lic
```
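Before moving on, it can help to confirm that the variables exported by the setup script point at things that actually exist. The following is a minimal sketch and not part of the repository: `check_aietools_env` is a hypothetical helper name, and it assumes `LM_LICENSE_FILE` holds a single local file path, as in the script above (it does not handle `port@host` or colon-separated license lists).

```bash
#!/bin/bash
# Hypothetical helper (not part of mlir-aie): check that the variables exported
# by the setup script above refer to an existing tools directory and license.
# Assumption: LM_LICENSE_FILE is a single local file path.
check_aietools_env() {
  rc=0
  if [ ! -d "${AIETOOLS_ROOT:-}/bin" ]; then
    echo "AIETOOLS_ROOT/bin not found: '${AIETOOLS_ROOT:-}/bin'" >&2
    rc=1
  fi
  if [ ! -f "${LM_LICENSE_FILE:-}" ]; then
    echo "license file not found: '${LM_LICENSE_FILE:-}'" >&2
    rc=1
  fi
  return "$rc"
}
```

After sourcing your setup script, run `check_aietools_env && echo "AIETools environment looks OK"`.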

### Install the XDNA™ Driver

1. Install the following prerequisite packages.

```bash
sudo apt install \
libidn11-dev
```

1. Clone the XDNA™ driver repository and its submodules.
```bash
git clone https://github.com/amd/xdna-driver.git
export XDNA_SRC_DIR=$(realpath xdna-driver)
cd xdna-driver
git reset --hard 3d5a8cf1af2adfbb6306ad71b45e5f3e1ffc5b37
git submodule update --init --recursive
```

> The submodules use SSH remotes. You will need a GitHub account and locally installed SSH keys to pull the submodules. Follow [these instructions](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) to set up an SSH key. Alternatively, edit `.gitmodules` to use HTTPS instead of SSH.

1. Install XRT. (The steps below are adapted from the [XRT build documentation](https://xilinx.github.io/XRT/master/html/build.html).)

1. Install XRT prerequisites.

```bash
cd $XDNA_SRC_DIR
sudo ./tools/amdxdna_deps.sh
```

2. Build XRT. Remember to source the Vitis™ AIE Essentials setup script from [above](#install-aietools).

```bash
cd $XDNA_SRC_DIR/xrt/build
./build.sh -noert -noalveo
```

3. Install XRT.

```bash
cd $XDNA_SRC_DIR/xrt/build/Release
sudo apt reinstall ./xrt_202420.2.18.0_24.10-amd64-xrt.deb ./xrt_202420.2.18.0_24.10-amd64-xbflash.deb
```

> **An error is expected in this step.** Ignore it.



1. Build the XDNA™ driver. The steps below are adapted from the [xdna-driver repository](https://github.com/amd/xdna-driver).

```bash
cd $XDNA_SRC_DIR/build
./build.sh -release
./build.sh -package
```

1. Install XDNA™.

```bash
cd $XDNA_SRC_DIR/build/Release
sudo apt reinstall ./xrt_plugin.2.18.0_ubuntu24.10-x86_64-amdxdna.deb
```

1. Check that the NPU is working by confirming that the device appears in the `xrt-smi` output:

```bash
source /opt/xilinx/xrt/setup.sh
xrt-smi examine
```

> At the bottom of the output you should see:
> ```
> Devices present
> BDF : Name
> ------------------------------------
> [0000:66:00.1] : RyzenAI-npu1
> ```
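If you want to script this check rather than read the output by eye, the sketch below scans the `xrt-smi examine` output for a device line. `npu_present` is a hypothetical helper name, and the pattern assumes the device line has the `[BDF] : Name` layout shown above with a name starting with `RyzenAI-npu`; other XRT versions may print a different layout.

```bash
#!/bin/bash
# Hypothetical helper: read `xrt-smi examine` output on stdin and succeed only
# if a device line naming a Ryzen AI NPU is present.
# Assumption: the line looks like "[0000:66:00.1] : RyzenAI-npu1", as above.
npu_present() {
  grep -Eq '\[[0-9a-fA-F:.]+\][[:space:]]*:[[:space:]]*RyzenAI-npu'
}
```

Typical use, after sourcing the XRT setup script: `xrt-smi examine | npu_present && echo "NPU detected"`.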

### Install IRON and MLIR-AIE Prerequisites

1. Install the following packages needed for MLIR-AIE:

```bash
sudo apt install \
build-essential clang clang-14 lld lld-14 cmake python3-venv python3-pip libxrender1 libxtst6 libxi6 virtualenv
```

1. Install g++-13 and OpenCV, which are needed for some programming examples:

```bash
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-13 g++-13 -y
sudo apt install libopencv-dev python3-opencv
```

1. Remember to source the Vitis™ AIE Essentials setup script from [above](#install-aietools).

1. Remember to source the XRT setup script: `source /opt/xilinx/xrt/setup.sh`
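A quick way to confirm the prerequisites are reachable is to check that the expected commands are on `PATH`. This is a sketch only: `missing_tools` is a hypothetical helper, and the command names passed to it in the usage note are assumptions about what the packages above install.

```bash
#!/bin/bash
# Hypothetical helper: print the name of every argument that is not found as a
# command on PATH. Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools clang ld.lld cmake python3 g++-13 xrt-smi` prints one line per command that still needs to be installed or added to `PATH`.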

## Install IRON for AMD Ryzen™ AI AIE Application Development

1. Clone [the mlir-aie repository](https://github.com/Xilinx/mlir-aie.git), best under /home/username for speed (yourPathToBuildMLIR-AIE):
```bash
git clone https://github.com/Xilinx/mlir-aie.git
cd mlir-aie
```

1. Source `utils/quick_setup.sh` to set up the prerequisites and
install the mlir-aie and LLVM compiler tools from wheels.

## Build an IRON Design for AIEs in the AMD Ryzen™ AI NPU

> Remember to set up your environment including Vitis™ AIE Essentials, your license, XRT, and IRON
> ```
> source yourVitisSetupScript.sh
> export LM_LICENSE_FILE=/opt/Xilinx.lic
> source /opt/xilinx/xrt/setup.sh
> source utils/env_setup.sh my_install/mlir_aie my_install/mlir my_install/llvm-aie
> ```

For your design of interest, for instance one from [programming_examples](programming_examples/), two steps are needed: (i) build the AIE design and then (ii) build the host code.

### Build Device AIE Part

1. Go to the design of interest and run `make`.

### Build and Run Host Part

1. Build: Go to the same design folder where the AIE design was just built (see above):
```bash
make <testName>.exe
```
> Note that the host code target has a `.exe` file extension even on Linux. Although unusual, this is an easy way for us to distinguish whether we want to compile device code or host code.


1. Run (program arguments are just an example for the add_one design):
```bash
make run
```
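The device build, host build, and run steps above can be chained in one small wrapper. This is a sketch under stated assumptions, not a script shipped with the repository: `build_and_run` and the `MAKE_CMD` override are hypothetical names, and the targets are exactly the three `make` invocations described above.

```bash
#!/bin/bash
# Hypothetical wrapper around the three make invocations described above.
# Usage: build_and_run <design_dir> <testName>
# MAKE_CMD can be overridden (e.g. MAKE_CMD=echo) to preview the targets
# without building anything. The body runs in a subshell, so the caller's
# working directory is left unchanged.
build_and_run() (
  design_dir=$1
  test_name=$2
  make_cmd=${MAKE_CMD:-make}
  cd "$design_dir" || exit 1
  "$make_cmd" || exit 1                     # (i) build the AIE device design
  "$make_cmd" "${test_name}.exe" || exit 1  # (ii) build the host code
  "$make_cmd" run                           # run the design on the NPU
)
```

Stopping at the first failing step keeps a broken device build from being masked by a stale host executable.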

## Learn more about NPU programming with IRON

1. Continue to the [IRON AIE Application Programming Guide](programming_guide)

# Detailed Getting Started Guides and Documentation:

[Getting Started on a Versal™ board](docs/Building.md)

[Running on a Versal™ board](docs/Platform.md)
8 changes: 4 additions & 4 deletions aie_kernels/aie2/cascade_mm.cc
@@ -34,7 +34,7 @@ void matmul_scalar_cascade_put_only(T_in *a, T_in *b, T_out *c) {
running_sum += a[row * colA + i] * b[i * colB + col];
}
v16int32 v16 = undef_v16int32();
-v16 = upd_elem(v16, 0, running_sum);
+v16 = upd_elem(v16, 0, (int)running_sum);
put_mcd(v16);
}
}
@@ -51,7 +51,7 @@ void matmul_scalar_cascade_get_only(T_in *a, T_in *b, T_out *c) {
running_sum += a[row * colA + i] * b[i * colB + col];
}
v16int32 v16 = get_scd_v16int32();
-running_sum += ext_elem(v16, 0);
+running_sum += ext_elem(v16, 0U);
c[row * colB + col] += running_sum;
}
}
@@ -68,8 +68,8 @@ void matmul_scalar_cascade_put_get(T_in *a, T_in *b, T_out *c) {
running_sum += a[row * colA + i] * b[i * colB + col];
}
v16int32 v16 = get_scd_v16int32();
-running_sum += ext_elem(v16, 0);
-v16 = upd_elem(v16, 0, running_sum);
+running_sum += ext_elem(v16, 0U);
+v16 = upd_elem(v16, 0, (int)running_sum);
put_mcd(v16);
}
}
10 changes: 5 additions & 5 deletions aie_kernels/aie2/reduce_add.cc
@@ -27,15 +27,15 @@ static void _reduce_add_vector(int32_t *restrict in, int32_t *restrict out,
running_total = test;
}
after_vector = running_total;
-v16int32 first = shift_bytes(after_vector, after_vector, 32);
+v16int32 first = shift_bytes(after_vector, after_vector, 32U);
v16int32 second = add(after_vector, first);
-v16int32 second_shift = shift_bytes(second, second, 16);
+v16int32 second_shift = shift_bytes(second, second, 16U);
v16int32 third = add(second, second_shift);
-v16int32 third_shift = shift_bytes(third, third, 8);
+v16int32 third_shift = shift_bytes(third, third, 8U);
v16int32 fourth = add(third, third_shift);
-v16int32 fourth_shift = shift_bytes(fourth, fourth, 4);
+v16int32 fourth_shift = shift_bytes(fourth, fourth, 4U);
v16int32 fifth = add(fourth, fourth_shift);
-int32_t last = extract_elem(fifth, 0);
+int32_t last = extract_elem(fifth, 0U);
*(int32_t *)out = last;
return;
}
10 changes: 5 additions & 5 deletions aie_kernels/aie2/reduce_max.cc
@@ -19,15 +19,15 @@ void _reduce_max_vector(int32_t *restrict in, int32_t *restrict out,
running_max = test;
}
after_vector = running_max;
-v16int32 first = shift_bytes(after_vector, after_vector, 32);
+v16int32 first = shift_bytes(after_vector, after_vector, 32U);
v16int32 second = max(after_vector, first);
-v16int32 second_shift = shift_bytes(second, second, 16);
+v16int32 second_shift = shift_bytes(second, second, 16U);
v16int32 third = max(second, second_shift);
-v16int32 third_shift = shift_bytes(third, third, 8);
+v16int32 third_shift = shift_bytes(third, third, 8U);
v16int32 fourth = max(third, third_shift);
-v16int32 fourth_shift = shift_bytes(fourth, fourth, 4);
+v16int32 fourth_shift = shift_bytes(fourth, fourth, 4U);
v16int32 fifth = max(fourth, fourth_shift);
-int32_t last = extract_elem(fifth, 0);
+int32_t last = extract_elem(fifth, 0U);
*(int32_t *)out = last;
return;
}
10 changes: 5 additions & 5 deletions aie_kernels/aie2/reduce_min.cc
@@ -19,15 +19,15 @@ void _reduce_min_vector(int32_t *restrict in, int32_t *restrict out,
running_min = test;
}
after_vector = running_min;
-v16int32 first = shift_bytes(after_vector, after_vector, 32);
+v16int32 first = shift_bytes(after_vector, after_vector, 32U);
v16int32 second = min(after_vector, first);
-v16int32 second_shift = shift_bytes(second, second, 16);
+v16int32 second_shift = shift_bytes(second, second, 16U);
v16int32 third = min(second, second_shift);
-v16int32 third_shift = shift_bytes(third, third, 8);
+v16int32 third_shift = shift_bytes(third, third, 8U);
v16int32 fourth = min(third, third_shift);
-v16int32 fourth_shift = shift_bytes(fourth, fourth, 4);
+v16int32 fourth_shift = shift_bytes(fourth, fourth, 4U);
v16int32 fifth = min(fourth, fourth_shift);
-int32_t last = extract_elem(fifth, 0);
+int32_t last = extract_elem(fifth, 0U);
*(int32_t *)out = last;
return;
}
4 changes: 2 additions & 2 deletions lib/Targets/AIEVecToCpp/TranslateAIEVecToCpp.cpp
@@ -1045,12 +1045,12 @@ static LogicalResult printOperation(CppEmitter &emitter,
os << emitter.getOrCreateName(lhs);
os << ", ";
os << emitter.getOrCreateName(rhs);
-os << ", ";
+os << ", static_cast<uint32_t>(";

if (!emitter.hasValueInScope(shift))
return failure();
os << emitter.getOrCreateName(shift);
-os << ")";
+os << "))";

return success();
}
@@ -25,8 +25,6 @@ target_suffix=${M}x${K}x${N}_${m}x${k}x${n}_${n_aie_cols}c

include ${srcdir}/../makefile-common

-CHESS=true
-
build/mm_${m}x${k}x${n}.o: ${kernels_dir}/cascade_mm.cc
mkdir -p ${@D}
-cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -DBIT_WIDTH=8 -DDIM_M=${m} -DDIM_K=${k} -DDIM_N=${n} -c $< -o ${@F}
+cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -fno-unroll-loops -DBIT_WIDTH=8 -DDIM_M=${m} -DDIM_K=${k} -DDIM_N=${n} -c $< -o ${@F}
@@ -1,7 +1,7 @@
// (c) Copyright 2024 Advanced Micro Devices, Inc.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
-// REQUIRES: ryzen_ai, chess
+// REQUIRES: ryzen_ai, peano
//
// RUN: make -f %S/Makefile clean
// RUN: make -f %S/Makefile
4 changes: 2 additions & 2 deletions programming_examples/basic/vector_reduce_add/Makefile
@@ -15,15 +15,14 @@ include ${srcdir}/../../makefile-common
targetname = reduce_add
devicename = npu
col = 0
-CHESS_FLAGS=${CHESSCCWRAP2_FLAGS}

all: build/final.xclbin build/insts.txt

VPATH := ${srcdir}/../../../aie_kernels/aie2

build/%.cc.o: %.cc
mkdir -p ${@D}
-cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $< -o ${@F}
+cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -c $< -o ${@F}

build/aie.mlir: ${srcdir}/aie2.py
mkdir -p ${@D}
@@ -32,6 +31,7 @@ build/aie.mlir: ${srcdir}/aie2.py
build/final.xclbin: build/aie.mlir build/reduce_add.cc.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
+--no-xchesscc --no-xbridge --peano ${PEANO_INSTALL_DIR} \
--aie-generate-npu --npu-insts-name=insts.txt $(<:%=../%)

${targetname}.exe: ${srcdir}/test.cpp
@@ -1,7 +1,7 @@
// (c) Copyright 2024 Advanced Micro Devices, Inc.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
-// REQUIRES: ryzen_ai, chess
+// REQUIRES: ryzen_ai, peano
//
// RUN: make -f %S/Makefile clean
// RUN: make -f %S/Makefile