[WIP][TEST] Mat mul whole array implementation using tiler helper tools #1924

Draft
wants to merge 145 commits into
base: main
747ca3a
First version of tensor tiler
hunhoffe Oct 21, 2024
e5518fa
Merge branch 'main' into tiler-helper
hunhoffe Oct 21, 2024
1f86f05
Add some tests for the tiler
hunhoffe Oct 21, 2024
39e0a5d
Some improvements
hunhoffe Oct 21, 2024
03a0741
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
8d67307
Some small improvements to tensortiler
hunhoffe Oct 22, 2024
637e314
Stub out example
hunhoffe Oct 22, 2024
d18a2ef
Added simple tiling examples
hunhoffe Oct 22, 2024
3c8ffb3
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
d293c96
Update programming_examples/basic/tiling_exploration/per_tile/aie2.py
hunhoffe Oct 22, 2024
9c2ce5f
Fix makefile typos
hunhoffe Oct 22, 2024
2a3a484
Add tensor tiler tests
hunhoffe Oct 22, 2024
a47df3a
a couple more tests
hunhoffe Oct 22, 2024
babf9e7
Add a few more tests, remove template
hunhoffe Oct 22, 2024
46a487c
Add one more test
hunhoffe Oct 22, 2024
1071ee0
make tensortile test formatting a bit more sane
hunhoffe Oct 22, 2024
192194d
More python formatting
hunhoffe Oct 22, 2024
4f9656a
A few more tests
hunhoffe Oct 22, 2024
34ea2d8
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
e437776
add visualization example
hunhoffe Oct 22, 2024
c744299
caption more correctly
hunhoffe Oct 22, 2024
d51e5c8
A bit of progress towards matrix_vector
hunhoffe Oct 23, 2024
87df9a7
Merge branch 'main' into tiler-helper
hunhoffe Oct 23, 2024
3bb4d19
Merge branch 'main' into tiler-helper
hunhoffe Oct 25, 2024
c4d2071
Updates from erika-iron-brainstorming branch
hunhoffe Oct 25, 2024
7b08c1a
Merge branch 'main' into tiler-helper
hunhoffe Oct 25, 2024
b08eebf
python format
hunhoffe Oct 25, 2024
06716e1
Add some visualization of access count (in addition to existing acces…
hunhoffe Oct 25, 2024
7f91465
Merge branch 'main' into tiler-helper
hunhoffe Oct 28, 2024
64879eb
Fix up tests after access count visualization changes
hunhoffe Oct 28, 2024
9accb78
Try to simplify form of sizes/strides by collapses stride value which…
hunhoffe Oct 28, 2024
6ad0860
Rename chunk to tile_group; repeat working in initial tests
hunhoffe Oct 28, 2024
864c72e
Missed adding in previous commit
hunhoffe Oct 28, 2024
db8fe9b
Some refinement for repeat count
hunhoffe Oct 28, 2024
0cf026a
Complete matrix vector tiling sweep test
hunhoffe Oct 28, 2024
eeec0a9
npu_dma_memcpy_nd take TensorTile, some repeat tests
hunhoffe Oct 28, 2024
ec47d34
More tile repeat tests
hunhoffe Oct 28, 2024
516803b
Finish tile repeat test suite
hunhoffe Oct 28, 2024
0d93ac0
Fix bad change from a few commits ago
hunhoffe Oct 28, 2024
364b0a4
Fix another bad change from a few commits ago
hunhoffe Oct 28, 2024
854de4d
First attempt at tile step in tile helper
hunhoffe Oct 30, 2024
100b866
Saving progress
hunhoffe Oct 30, 2024
00e632e
Add test file, will remove later
hunhoffe Oct 30, 2024
108fa81
Merge branch 'main' into tiler-helper
hunhoffe Oct 30, 2024
6dfd1db
move scratch file to somewhere less disruptive
hunhoffe Oct 31, 2024
17f4125
Disable tensor tiler 2d mat mul whole array test (for now)
hunhoffe Oct 31, 2024
d758e6a
Merge branch 'main' into tiler-helper
hunhoffe Oct 31, 2024
b2685bc
Add notes for how to proceed with impl
hunhoffe Oct 31, 2024
f104e7a
First step of tiler cleanup
hunhoffe Nov 1, 2024
246df71
Saving progress
hunhoffe Nov 1, 2024
38d7465
plot size based on tensor dims
hunhoffe Nov 1, 2024
64f99c7
access order looks nice even with larger tensors
hunhoffe Nov 2, 2024
def3c0e
Add experimentation notebook, will probably delete later
hunhoffe Nov 2, 2024
a783ef7
simpler_tiler appears functional
hunhoffe Nov 2, 2024
bd22201
tile sequence access order visualization seems good
hunhoffe Nov 2, 2024
b95b1a6
forgot to add init file
hunhoffe Nov 2, 2024
7f242d9
Animation is working in notebook but not in visualization
hunhoffe Nov 2, 2024
821ace1
Saving progress
hunhoffe Nov 2, 2024
f2c70a3
Just starting to test tile groups
hunhoffe Nov 2, 2024
59836c2
update to tiling speed
hunhoffe Nov 2, 2024
b375cdb
some tile groups working
hunhoffe Nov 2, 2024
0cf2dea
Better sizes for partial
hunhoffe Nov 4, 2024
38f6105
fixed some bugs with partial tile groups
hunhoffe Nov 4, 2024
a83149a
Seems to be working for step iteration without partial and without re…
hunhoffe Nov 4, 2024
07462cb
Fix bug
hunhoffe Nov 4, 2024
32fc56d
Step partial not implemented yet, but the rest seems good
hunhoffe Nov 4, 2024
4153d8d
Remove old code
hunhoffe Nov 4, 2024
aeeebc1
Move new code over to prepare for testing
hunhoffe Nov 4, 2024
0c34136
Fix file paths
hunhoffe Nov 4, 2024
6e6c223
add first few new tests
hunhoffe Nov 4, 2024
b50a5e3
Tests for simple tiler
hunhoffe Nov 4, 2024
a7a5e1c
Remove notebook
hunhoffe Nov 4, 2024
16abd03
Merge branch 'main' into tiler-helper
hunhoffe Nov 4, 2024
827a1e0
Remove old tests, first group of group_tiler tests
hunhoffe Nov 4, 2024
f5039f8
Small iterations on tests
hunhoffe Nov 4, 2024
f9202c2
Small iterations on tests
hunhoffe Nov 4, 2024
48cb941
Add some partial tests
hunhoffe Nov 5, 2024
beb4478
Add more partial tests
hunhoffe Nov 5, 2024
ba191dd
Finish partial tests for now
hunhoffe Nov 5, 2024
f406042
Add checks for type of pattern_repeat
hunhoffe Nov 5, 2024
22b793f
some unchecked changes
hunhoffe Nov 5, 2024
9d02033
fix small bugs
hunhoffe Nov 6, 2024
a821ef5
code simplification is mostly done
hunhoffe Nov 7, 2024
27e7fa6
Start fixing up number of dimensions
hunhoffe Nov 7, 2024
8cea0a6
Try to reduce hard-coded dimensions in 2d tiler
hunhoffe Nov 7, 2024
a5813b5
reduce hardcoded dimensionality a little bit more
hunhoffe Nov 7, 2024
a125ea1
small improvements for testing
hunhoffe Nov 7, 2024
d86ac40
Add first tests for step tiler
hunhoffe Nov 7, 2024
fa803af
Finish step tiler without partial tests
hunhoffe Nov 7, 2024
eca45bf
stub out step tiler partial
hunhoffe Nov 7, 2024
86891f2
Access tensors from tile sequence
hunhoffe Nov 7, 2024
6a6ed95
Add access tensor checks to simple tiler tests
hunhoffe Nov 7, 2024
2b06819
Improving group tiler tests
hunhoffe Nov 7, 2024
cf130b6
Finished extensions to group_tiler test
hunhoffe Nov 7, 2024
c2b8b7c
add tensor checks for some of group tiler partial
hunhoffe Nov 7, 2024
4518109
Add tensor tests to more of the group tiler partial
hunhoffe Nov 7, 2024
6dd3ad6
finished adding tensor checks to group tiler partial tests
hunhoffe Nov 7, 2024
02191ae
add tensor checks to the step tiler
hunhoffe Nov 7, 2024
c853b1d
fix one bug, there is at least one more bug though
hunhoffe Nov 7, 2024
92744af
First partial step tiler test
hunhoffe Nov 7, 2024
46f8e80
more step tiler partial tests
hunhoffe Nov 7, 2024
3bf4088
add step tiler partial col test
hunhoffe Nov 7, 2024
7d6eb83
stub out step tiler partial row and both tests
hunhoffe Nov 7, 2024
05325df
More tiler partial row tests
hunhoffe Nov 7, 2024
c2303dc
finish step tiler partial row tests
hunhoffe Nov 7, 2024
b40af01
Merge branch 'main' into tiler-helper
hunhoffe Nov 8, 2024
ae301de
Start porting older tests to new tensortiler interface
hunhoffe Nov 8, 2024
9dc3a2b
Finish adding old tests (including matvec) to tensortiler lit tests
hunhoffe Nov 8, 2024
4f4671a
Stub out outline for matmul whole array tiling sweep test
hunhoffe Nov 8, 2024
cc58804
Mat mul sweep is working, clean up other tests, address test performance
hunhoffe Nov 11, 2024
cb62f2b
Merge branch 'main' into tiler-helper
hunhoffe Nov 11, 2024
570d455
Start updating some examples
hunhoffe Nov 12, 2024
9d0ab4e
fix format
hunhoffe Nov 12, 2024
26d14a1
update dma transpose example
hunhoffe Nov 12, 2024
4e198ac
Fix another example
hunhoffe Nov 12, 2024
e77567a
Merge branch 'main' into tiler-helper
hunhoffe Nov 12, 2024
c4027ca
Merge branch 'main' into tiler-helper
hunhoffe Nov 12, 2024
73d89e8
Continue fixing up examples
hunhoffe Nov 12, 2024
ca4f1ff
Make diagram clearer
hunhoffe Nov 12, 2024
13d072b
Finish documenting tiling exploration examples
hunhoffe Nov 12, 2024
c1f3d62
fix up readme for tile group example
hunhoffe Nov 12, 2024
9612dbc
Start adding visualization to matmul whole array
hunhoffe Nov 12, 2024
dc46323
Add visualization notebook for mat mul whole array
hunhoffe Nov 12, 2024
6d39b61
Fix py formatting
hunhoffe Nov 12, 2024
0ab222a
Remove unused tiling functions
hunhoffe Nov 12, 2024
7703aa2
Add tiling tools overview notebook
hunhoffe Nov 12, 2024
18476ec
Merge branch 'main' into tiler-helper
hunhoffe Nov 12, 2024
029010c
Merge branch 'main' into tiler-helper
hunhoffe Nov 13, 2024
acbb79d
Add tilerhelper how-to notebook to CI testing, add README
hunhoffe Nov 13, 2024
f9ecca1
Add documentation for the matrix multiplication visualization notebook
hunhoffe Nov 13, 2024
501cd4b
Merge branch 'main' into tiler-helper
hunhoffe Nov 13, 2024
3315246
Added comments to explain TensorTile additions in mat mul whole_array…
hunhoffe Nov 13, 2024
bb2a8c1
Merge branch 'main' into tiler-helper
hunhoffe Nov 13, 2024
3c683f3
Strip output from notebooks
hunhoffe Nov 13, 2024
d7fab49
Add top-level tiling_exploration README
hunhoffe Nov 14, 2024
ff2e876
Merge branch 'main' into tiler-helper
hunhoffe Nov 14, 2024
1f6d58f
Merge branch 'main' into tiler-helper
hunhoffe Nov 14, 2024
0ef29db
First version of tiled mat mul whole array design
hunhoffe Nov 15, 2024
0ec2f76
Fix small bug, I think it works now
hunhoffe Nov 15, 2024
270b935
Fix formatting in notebook
hunhoffe Nov 15, 2024
d5ec744
Merge branch 'main' into tiler-helper
hunhoffe Nov 15, 2024
6eeee26
Update whole array sweep test
hunhoffe Nov 15, 2024
9a961a5
Merge branch 'tiler-helper' into tiled-mat-mul-whole-array
hunhoffe Nov 15, 2024
e39bb52
Add col major, fix errors in lit files
hunhoffe Nov 15, 2024
e1466df
Fix more typos
hunhoffe Nov 15, 2024
4 changes: 4 additions & 0 deletions programming_examples/basic/dma_transpose/Makefile
@@ -44,5 +44,9 @@ endif
 run: ${targetname}.exe build/final.xclbin
 	${powershell} ./$< -x build/final.xclbin -i build/insts.txt -k MLIR_AIE --M ${M} --K ${K}

+generate_access_map: ${srcdir}/aie2.py
+	mkdir -p ${@D}
+	python3 $< --generate-access-map ${M} ${K}
+
 clean:
 	rm -rf build _build inst ${targetname}.exe
15 changes: 14 additions & 1 deletion programming_examples/basic/dma_transpose/README.md
@@ -15,11 +15,24 @@ This reference design can be run on a Ryzen™ AI NPU.
In the [design](./aie2.py), a 2-D array in a row-major layout is read from external memory to `ComputeTile2` with a transposed layout,
by using an implicit copy via the compute tile's Data Movement Accelerator (DMA). The data is read from and written to external memory through the Shim tile (`col`, 0).

This data movement transformation can be visualized as a map that shows the order in which the data is streamed (e.g., in transposed layout):
<p align="center">
<img
src="transpose_data.png">
<h3 align="center"> Visualization of the Transpose Data Transformation for M=64, K=32.
</h3>
</p>

The implicit copy is performed using the `object_fifo_link` operation that specifies how input data arriving via `of_in` should be sent further via `of_out` by specifically leveraging the compute tile's DMA. This operation and its functionality are described in more depth in [Section-2b](../../../programming_guide/section-2/section-2b/README.md/#object-fifo-link) of the programming guide.


To compile and run the design for NPU:
```
```bash
make
make run
```

To generate a data visualization of the transpose (like that above), run:
```bash
make generate_access_map
```
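
To see why the `sizes=[1, 1, K, M]` and `strides=[1, 1, 1, K]` used in `aie2.py` read a row-major M×K array in transposed order, the access pattern can be enumerated by hand. The following is a minimal pure-Python sketch and not part of the design. `access_order` is a hypothetical helper, and the sketch assumes that strides count elements and that the last dimension varies fastest.

```python
from itertools import product


def access_order(offset, sizes, strides):
    """Return the flat element indices visited by an n-d strided transfer.

    Hypothetical helper: dimensions are listed outermost first, so the
    last entry of `sizes`/`strides` is the fastest-varying one.
    """
    order = []
    for idx in product(*(range(s) for s in sizes)):
        order.append(offset + sum(i * st for i, st in zip(idx, strides)))
    return order


M, K = 4, 3  # small stand-in for the M x K tensor of the design
order = access_order(0, sizes=[1, 1, K, M], strides=[1, 1, 1, K])

# Row-major M x K input whose elements are numbered 0..M*K-1.
rows = [[r * K + c for c in range(K)] for r in range(M)]
# The transposed layout, flattened: column 0 first, then column 1, ...
transposed = [rows[r][c] for c in range(K) for r in range(M)]

print(order)                 # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(order == transposed)   # True
```

Each inner run of `M` elements steps through memory with stride `K`, which walks down one column of the row-major input. The outer dimension then advances by one element to move to the next column.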
48 changes: 34 additions & 14 deletions programming_examples/basic/dma_transpose/aie2.py
@@ -5,27 +5,28 @@
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 #
 # (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates
+import argparse
 import numpy as np
 import sys

 from aie.dialects.aie import *
 from aie.dialects.aiex import *
 from aie.extras.context import mlir_mod_ctx
 from aie.helpers.dialects.ext.scf import _for as range_
+from aie.helpers.tensortiler import TensorTile

-N = 4096
-M = 64
-K = 64

-if len(sys.argv) == 3:
-    M = int(sys.argv[1])
-    K = int(sys.argv[2])
-    N = M * K
+def my_passthrough(M, K, N, generate_access_map=False):
+    tensor_ty = np.ndarray[(M, K), np.dtype[np.int32]]
+    data_transform = TensorTile(
+        (M, K), offset=0, sizes=[1, 1, K, M], strides=[1, 1, 1, K]
+    )
+    if generate_access_map:
+        data_transform.visualize(
+            show_arrows=True, plot_access_count=False, file_path="transpose_data.png"
+        )
+        return

-tensor_ty = np.ndarray[(M, K), np.dtype[np.int32]]
-
-
-def my_passthrough():
     with mlir_mod_ctx() as ctx:

         @device(AIEDevice.npu1_1col)
@@ -56,8 +57,7 @@ def sequence(A, B, C):
                     metadata=of_in,
                     bd_id=1,
                     mem=A,
-                    sizes=[1, 1, K, M],
-                    strides=[1, 1, 1, K],
+                    tensor_tile=data_transform,
                     issue_token=True,
                 )
                 npu_dma_memcpy_nd(metadata=of_out, bd_id=0, mem=C, sizes=[1, 1, 1, N])
@@ -66,4 +66,24 @@ def sequence(A, B, C):
     print(ctx.module)


-my_passthrough()
+if __name__ == "__main__":
+    p = argparse.ArgumentParser()
+    p.add_argument("dims", help="M K", type=int, nargs="*", default=[64, 64])
+    p.add_argument(
+        "--generate-access-map",
+        action="store_true",
+        help="Produce a file showing data access order",
+    )
+    args = p.parse_args()
+
+    if len(args.dims) != 2:
+        print(
+            "ERROR: Must provide either no dimensions or both M and K", file=sys.stderr
+        )
+        exit(-1)
+    my_passthrough(
+        M=args.dims[0],
+        K=args.dims[1],
+        N=args.dims[0] * args.dims[1],
+        generate_access_map=args.generate_access_map,
+    )
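
The command-line handling added above can be exercised on its own, without the `aie` imports. The sketch below mirrors the parser from the diff in a hypothetical `parse_dims` function. As one deliberate difference, it reports a wrong number of dimensions through `ArgumentParser.error` rather than the `print` plus `exit(-1)` used in the PR.

```python
import argparse


def parse_dims(argv):
    """Parse `[M K] [--generate-access-map]` and return (M, K, N, flag).

    Hypothetical helper for illustration; the PR does this inline under
    `if __name__ == "__main__":`.
    """
    p = argparse.ArgumentParser()
    p.add_argument("dims", help="M K", type=int, nargs="*", default=[64, 64])
    p.add_argument(
        "--generate-access-map",
        action="store_true",
        help="Produce a file showing data access order",
    )
    args = p.parse_args(argv)
    if len(args.dims) != 2:
        # error() prints a usage message to stderr and raises SystemExit.
        p.error("Must provide either no dimensions or both M and K")
    M, K = args.dims
    return M, K, M * K, args.generate_access_map


print(parse_dims([]))                                      # (64, 64, 4096, False)
print(parse_dims(["32", "16", "--generate-access-map"]))   # (32, 16, 512, True)
```

Because `dims` uses `nargs="*"` with a list default, omitting both dimensions falls back to 64×64, and `N` is always derived as `M * K`.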
@@ -24,6 +24,11 @@ kernels=mm_${m}x${k}x${n}
 aieargs+=-m $m -k $k -n $n --n-aie-cols ${n_aie_cols} --b-col-maj ${b_col_maj}
 runargs+=--b_col_maj ${b_col_maj}
 target_suffix=${M}x${K}x${N}_${m}x${k}x${n}_${n_aie_cols}c
+use_tiler?=0
+
+ifeq (${use_tiler}, 1)
+aie_py_src=aie2_tiler.py
+endif

 include ${srcdir}/../makefile-common

@@ -29,6 +29,12 @@ include ${srcdir}/../makefile-common

 CHESS=true

+use_tiler?=0
+
+ifeq (${use_tiler}, 1)
+aie_py_src=aie2_tiler.py
+endif
+
 build/mm_b_row_maj_${m}x${k}x${n}.o: ${kernels_dir}/mm.cc
 	mkdir -p ${@D}
 	cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -DBIT_WIDTH=8 -D${dtype_in}_${dtype_out}_ONLY -DDIM_M=${m} -DDIM_K=${k} -DDIM_N=${n} -c $< -o ${@F}

@@ -141,6 +141,30 @@ Of note is the `object_fifo_link()` operation. This operation establishes a conn

We assume our data are stored in **row-major format** in the host's memory. For processing on the AIE compute cores, we need to transform the data layouts such that the above-listed *sub-matrix tiles* are laid out contiguously in AIE compute core memory. Thankfully, AIE hardware has extensive support for transforming data using the DMAs as it is received and sent, at zero cost. In the following, we will explain how we make use of this hardware feature to transform our data.

#### Runtime Sequence Tiling and Data Layout Transformations Notebook

A notebook is included that visualizes the runtime sequence `npu_dma_memcpy_nd` operations used to transfer matrices A, B, and C.

To run the notebook:
* Start a jupyter server at the root directory of your clone of `mlir-aie`.
  Make sure you use a terminal that has run the `utils/setup_env.sh` script
  so that the correct environment variables are passed on to jupyter.
  Below is an example of how to start a jupyter server:
  ```bash
  python3 -m jupyter notebook --no-browser --port=8080
  ```
* In your browser, navigate to the URL (which includes a token) found
  in the output of the above command.
* Navigate to `programming_examples/basic/matrix_multiplication/whole_array`
* Double click `mat_mul_whole_array_visualization.ipynb` to start the notebook; choose the ipykernel called `ironenv`.
* You should now be good to go! Note that generating the animations in the notebook can take several minutes.

#### Run the Notebook as a Script
```bash
make clean
make run
```

##### Tiling to Vector Intrinsic Size

The `memA_fifos` and `memB_fifos` receive sub-matrices of size `m`&times;`k` and `k`&times;`n`, respectively. The FIFOs translate those matrices from a row-major format (or, alternatively, column-major for `B` if `b_col_maj` is set) into the `r`&times;`s`-sized and `s`&times;`t`-sized blocks required by the hardware's vector intrinsics before sending them into the compute cores' memory.
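
As an illustration of the re-tiling described above, the sketch below enumerates the order in which a row-major `m`×`k` sub-matrix would be read so that each `r`×`s` block ends up contiguous. It is a pure-Python sketch under assumptions, not the design's code. The sizes `[m // r, k // s, r, s]` and strides `[r * k, s, k, 1]` are the conventional way to express this blocking and are assumed here, `blocked_order` is a hypothetical helper, and the toy values of `m`, `k`, `r`, and `s` are chosen only to keep the output short.

```python
from itertools import product


def blocked_order(m, k, r, s):
    """Flat row-major indices of an m x k matrix, visited block by block.

    Assumed pattern (outermost dimension first): block row, block column,
    row inside the block, column inside the block.
    """
    sizes = [m // r, k // s, r, s]
    strides = [r * k, s, k, 1]
    return [
        sum(i * st for i, st in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    ]


order = blocked_order(m=4, k=4, r=2, s=2)
print(order)
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

The first four indices `0, 1, 4, 5` are the top-left 2×2 block of the 4×4 matrix, so after the transfer that block occupies four consecutive memory locations.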
119 changes: 94 additions & 25 deletions programming_examples/basic/matrix_multiplication/whole_array/aie2.py
@@ -14,6 +14,7 @@
 from aie.dialects.aie import *
 from aie.dialects.aiex import *
 from aie.helpers.dialects.ext.scf import _for as range_
+from aie.helpers.tensortiler import TensorTile, TensorTileSequence

 dtype_map = {
     "bf16": bfloat16,
@@ -47,9 +48,15 @@ def main():
         default="i16",
     )
     argparser.add_argument("--trace_size", type=int, default=0)
+    argparser.add_argument(
+        "--generate-tiles",
+        action="store_true",
+        help="Generate TensorTiles, a Python object to represent each data transfer "
+        "of the input/output matrices. These objects can be used for visualization.",
+    )
     args = argparser.parse_args()
     with mlir_mod_ctx() as ctx:
-        my_matmul(
+        maybe_tiles = my_matmul(
             args.M,
             args.K,
             args.N,
@@ -61,19 +68,33 @@ def main():
             args.dtype_out,
             args.b_col_maj,
             args.trace_size,
+            args.generate_tiles,
         )
         # print(ctx.module.operation.verify())
         print(ctx.module)

+    if args.generate_tiles:
+        return maybe_tiles
+

 def ceildiv(a, b):
     return (a + b - 1) // b


 def my_matmul(
-    M, K, N, m, k, n, n_aie_cols, dtype_in_str, dtype_out_str, b_col_maj, trace_size
+    M,
+    K,
+    N,
+    m,
+    k,
+    n,
+    n_aie_cols,
+    dtype_in_str,
+    dtype_out_str,
+    b_col_maj,
+    trace_size,
+    generate_tiles=False,
 ):
-
     n_aie_rows = 4
     n_aie_cores = n_aie_rows * n_aie_cols

@@ -148,6 +169,12 @@ def my_matmul(
     elif n_aie_cols == 4:
         dev = AIEDevice.npu1_4col

+    # These will hold TensorTile objects that represent the runtime
+    # npu_dma_memcpy_nd operations of this design. They are only used if generate_tiles is true
+    A_tensor_tiles = []
+    B_tensor_tiles = []
+    C_tensor_tiles = []
+
     @device(dev)
     def device_body():
         A_l2_ty = np.ndarray[(m * k * n_A_tiles_per_shim,), np.dtype[dtype_in]]
@@ -375,13 +402,26 @@ def sequence(A, B, C):
                         C_row_offset = row_base * m * n_aie_rows * N
                         C_col_offset = col * n
                         C_offset = C_col_offset + C_row_offset
+                        C_sizes = [tb_n_rows, N // n // n_aie_cols, m * n_aie_rows, n]
+                        C_strides = [m * n_aie_rows * N, n * n_aie_cols, N, 1]
                         npu_dma_memcpy_nd(
                             metadata=C_l2l3_fifos[col],
                             bd_id=bd_id_base,
                             mem=C,
                             offsets=[0, 0, 0, C_offset],
-                            sizes=[tb_n_rows, N // n // n_aie_cols, m * n_aie_rows, n],
-                            strides=[m * n_aie_rows * N, n * n_aie_cols, N, 1],
+                            sizes=C_sizes,
+                            strides=C_strides,
                         )
+                        # Use the calculated sizes/strides/offsets to record the data movement
+                        # caused by the above call to npu_dma_memcpy_nd.
+                        # This line does not change MLIR output at all.
+                        C_tensor_tiles.append(
+                            TensorTile(
+                                (M, N),
+                                offset=C_offset,
+                                sizes=C_sizes,
+                                strides=C_strides,
+                            )
+                        )

                         for tile_row in range(tb_n_rows):
@@ -411,18 +451,31 @@ def sequence(A, B, C):
                                 col * n_A_tiles_per_shim * m * K
                             )  # base address for the shim in this column
                             A_offset = A_block_offset + A_row_offset
+                            A_sizes = [
+                                N // n // n_aie_cols,
+                                K // k,
+                                m * n_A_tiles_per_shim,
+                                k,
+                            ]
+                            A_strides = [0, k, K, 1]
                             npu_dma_memcpy_nd(
                                 metadata=A_l3l2_fifos[col],
                                 bd_id=bd_id_base + 2 * tile_row + 1,
                                 mem=A,
                                 offsets=[0, 0, 0, A_offset],
-                                sizes=[
-                                    N // n // n_aie_cols,
-                                    K // k,
-                                    m * n_A_tiles_per_shim,
-                                    k,
-                                ],
-                                strides=[0, k, K, 1],
+                                sizes=A_sizes,
+                                strides=A_strides,
                             )
+                            # Use the calculated sizes/strides/offsets to record the data movement
+                            # caused by the above call to npu_dma_memcpy_nd.
+                            # This line does not change MLIR output at all.
+                            A_tensor_tiles.append(
+                                TensorTile(
+                                    (M, K),
+                                    offset=A_offset,
+                                    sizes=A_sizes,
+                                    strides=A_strides,
+                                )
+                            )

                             # B input transfer:
@@ -444,29 +497,45 @@ def sequence(A, B, C):
                             # |0011 0011 |
                             # ----------------
                             B_col_offset = col * n if not b_col_maj else col * n * K
+                            if not b_col_maj:
+                                B_sizes = [N // n // n_aie_cols, K // k, k, n]
+                                B_strides = [n * n_aie_cols, k * N, N, 1]
+                            else:
+                                B_sizes = [N // n // n_aie_cols, K // k, n, k]
+                                B_strides = [n * n_aie_cols * K, k, K, 1]
+
                             npu_dma_memcpy_nd(
                                 metadata=B_l3l2_fifos[col],
                                 bd_id=bd_id_base + 2 * tile_row + 2,
                                 mem=B,
                                 offsets=[0, 0, 0, B_col_offset],
-                                sizes=(
-                                    [N // n // n_aie_cols, K // k, k, n]
-                                    if not b_col_maj
-                                    else [N // n // n_aie_cols, K // k, n, k]
-                                ),
-                                strides=(
-                                    [n * n_aie_cols, k * N, N, 1]
-                                    if not b_col_maj
-                                    else [n * n_aie_cols * K, k, K, 1]
-                                ),
+                                sizes=B_sizes,
+                                strides=B_strides,
                             )
+                            # Use the calculated sizes/strides/offsets to record the data movement
+                            # caused by the above call to npu_dma_memcpy_nd.
+                            # This line does not change MLIR output at all.
+                            B_tensor_tiles.append(
+                                TensorTile(
+                                    (K, N),
+                                    offset=B_col_offset,
+                                    sizes=B_sizes,
+                                    strides=B_strides,
+                                )
+                            )
                     if tb > 0 or (tb == 0 and pingpong > 0):
                         dma_wait(*C_l2l3_fifos)
             dma_wait(*C_l2l3_fifos)

+    if generate_tiles:
+        # If generate tiles is true, return a representation of tensor tiles
+        # representing all the npu_dma_memcpy_nd runtime sequence operations per input/output tensor.
+        return (
+            TensorTileSequence.from_tiles(A_tensor_tiles),
+            TensorTileSequence.from_tiles(B_tensor_tiles),
+            TensorTileSequence.from_tiles(C_tensor_tiles),
+        )
+


 if __name__ == "__main__":
     main()
-else:
-    print("Not meant to be imported")
-    sys.exit(1)
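
The `C_sizes`/`C_strides` recorded in the `TensorTile` objects above can be sanity-checked without the `aie` package by counting how often each element of `C` is touched. The sketch below is pure Python and rests on stated assumptions: the problem sizes are toy values, `access_counts` is a hypothetical helper, and a single transfer block with `row_base = 0` spanning all tile rows stands in for the design's ping-pong transfer blocks.

```python
from collections import Counter
from itertools import product


def access_counts(offset, sizes, strides):
    """Count how often each flat element index is touched by one transfer."""
    counts = Counter()
    for idx in product(*(range(s) for s in sizes)):
        counts[offset + sum(i * st for i, st in zip(idx, strides))] += 1
    return counts


# Toy problem size (assumed values, chosen so that everything divides evenly).
M, N = 16, 16
m, n = 2, 4
n_aie_rows, n_aie_cols = 4, 2

# Assumption: one transfer block starting at row_base = 0 covers all tile rows.
row_base = 0
tb_n_rows = M // m // n_aie_rows

total = Counter()
for col in range(n_aie_cols):
    # Same formulas as in the diff above.
    C_offset = col * n + row_base * m * n_aie_rows * N
    C_sizes = [tb_n_rows, N // n // n_aie_cols, m * n_aie_rows, n]
    C_strides = [m * n_aie_rows * N, n * n_aie_cols, N, 1]
    total += access_counts(C_offset, C_sizes, C_strides)

covered_once = len(total) == M * N and all(total[i] == 1 for i in range(M * N))
print("every element of C transferred exactly once:", covered_once)
```

With these values each column's shim handles alternating `n`-wide column strips of `C`, and the two columns together cover the matrix with no overlap. The same helper applied to `A_strides = [0, k, K, 1]` would report counts greater than one, since the stride of 0 in the outermost dimension repeats the same data.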