This article gives you a quick tour of the RISC-V instruction formats and opcode slots, covering the basic ideas and the implementation steps for extending the RISC-V ISA.
The following external links are referenced; you may want to open them in browser tabs in advance:
All base RISC-V instructions are 32-bit vectors. Please refer to page 130 (the number in the upper right of each page, not the PDF reader's page number) of the RISC-V manual for the format of these vectors.
To choose a proper format for your extended instruction, consider the number of source/destination operands, the type of each operand (immediate or register), and the number of bits to be encoded. R, I, and S are recommended: B is just a complicated version of S, and U and J have no funct3 field, so an instruction in either format may occupy a whole line of instruction slots (refer to the next section for more details). We have 32 bits in total: 7 bits are occupied by the opcode; 3 bits are occupied by funct3; each register occupies 5 bits.
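The bit budget above can be checked with a little arithmetic. The following is a minimal sketch; the helper name free_bits and the budget table are made up for illustration, and only the field widths (7, 3, and 5 bits) come from the text.

```python
# Field widths named in the text above.
INSN_BITS = 32
OPCODE_BITS = 7
FUNCT3_BITS = 3
REG_BITS = 5


def free_bits(num_regs: int, has_funct3: bool = True) -> int:
    """Bits left over for immediates or extra function codes."""
    used = OPCODE_BITS + num_regs * REG_BITS
    if has_funct3:
        used += FUNCT3_BITS
    return INSN_BITS - used


# Hypothetical summary table: format name -> remaining bits.
budget = {
    "R": free_bits(3),                    # rd, rs1, rs2
    "I": free_bits(2),                    # rd, rs1
    "S": free_bits(2),                    # rs1, rs2
    "U": free_bits(1, has_funct3=False),  # rd only, no funct3
}

for fmt, bits in budget.items():
    print(f"{fmt}-type: {bits} bits left")
```

The R format leaves the fewest spare bits, while U spends no bits on funct3, which is why one U-type instruction covers every funct3 value of its opcode.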
The instruction formats are described in the risc-v opcodes repo, and you can open opcodes-rv32i, the most basic module of the RISC-V ISA, for examples. To understand this file, we use the addi instruction as an example and match it against the ADDI row on page 130.
Both the figure and the text description are in little-endian format.
addi rd rs1 imm12 14..12=0 6..2=0x04 1..0=3
rd, rs1, and imm12 describe the operands of this instruction; 14..12 describes funct3; 6..2 describes the opcode. According to Table 24.1 on page 129 (this page carries no printed number in the PDF), the lowest two bits (1..0) are always 11, which is why the line ends with 1..0=3.
For more information on the operand tokens that appear in this file, refer to this. This Python dict declares the bit range each token occupies. The semantics of each token can be understood from its bit range, together with the figure of the instruction format.
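To make the description format concrete, here is a minimal sketch of how one line can be turned into a match/mask pair and how operands are then packed into their bit ranges. It is not the parser shipped in the repo: ARG_BITS, parse_opcode_line, and encode are illustrative names, and the bit ranges are read off the base-ISA format figure. The sample line uses the base-ISA ADDI encoding (funct3 = 0, opcode 0x13).

```python
# Hypothetical token table: operand token -> (msb, lsb).
ARG_BITS = {
    "rd": (11, 7),
    "rs1": (19, 15),
    "rs2": (24, 20),
    "imm12": (31, 20),
}


def parse_opcode_line(line: str):
    """Return (name, operands, match, mask) for one description line."""
    name, *tokens = line.split()
    operands, match, mask = [], 0, 0
    for tok in tokens:
        if "=" in tok:
            # Fixed field such as "14..12=0": record its value and position.
            bits, value = tok.split("=")
            msb, lsb = (int(b) for b in bits.split(".."))
            width = msb - lsb + 1
            val = int(value, 0)
            if val >= (1 << width):
                raise ValueError(f"{value} does not fit in {bits}")
            mask |= ((1 << width) - 1) << lsb
            match |= val << lsb
        elif tok in ARG_BITS:
            operands.append(tok)
        else:
            raise ValueError(f"unknown token: {tok}")
    return name, operands, match, mask


def encode(match: int, **fields: int) -> int:
    """Pack operand values into their bit ranges on top of the fixed bits."""
    word = match
    for tok, val in fields.items():
        msb, lsb = ARG_BITS[tok]
        width = msb - lsb + 1
        word |= (val & ((1 << width) - 1)) << lsb
    return word


name, operands, match, mask = parse_opcode_line(
    "addi rd rs1 imm12 14..12=0 6..2=0x04 1..0=3"
)
print(name, operands, hex(match), hex(mask))
print(hex(encode(match, rd=1, rs1=2, imm12=5)))  # addi x1, x2, 5
```

The mask marks which bits are fixed by the encoding, and the match gives the values those bits must have; everything outside the mask belongs to operands.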
To understand the constraints on adding new instructions, we need to know:
- where the available slots for the extended instructions are, and
- the additional rules/standards/constraints of each slot.
Refer to the custom-0/1/2/3 cells in Table 24.1 on page 129. These four slots are reserved for instruction extensions.
Refer to this file for the operand constraints of each instruction. The operand signature of each extended instruction must be exactly the same as that of its corresponding slot in custom.
Do read this if you do not want to refactor the ISA aggressively after your project has already grown large!
You cannot give funct3 arbitrary values. The meaning of each of the 3 bits of funct3 is critical:
- 1: Send rs2 from host to the accelerator.
- 2: Send rs1 from host to the accelerator.
- 4: Receive a value from the accelerator to rd.
rs2 only appears when rs1 appears, so the bit of 1 cannot be enabled without the bit of 2. Therefore, 1 and 5 cannot be used as funct3. Also, as mentioned above, if we want to use 0 as funct3, we cannot use the U-type format.
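The rule about the funct3 bits can be written as a small check. This is a sketch under the bit meanings listed above; the constant names XS2, XS1, and XD and the function is_valid_funct3 are chosen here for illustration only.

```python
# Bit meanings of funct3 as described in the text (names are illustrative).
XS2 = 1  # send rs2 from host to the accelerator
XS1 = 2  # send rs1 from host to the accelerator
XD = 4   # receive a value from the accelerator into rd


def is_valid_funct3(funct3: int) -> bool:
    """rs2 may only be sent when rs1 is sent as well."""
    if not 0 <= funct3 <= 7:
        return False
    uses_rs2 = bool(funct3 & XS2)
    uses_rs1 = bool(funct3 & XS1)
    return not (uses_rs2 and not uses_rs1)


valid = [f for f in range(8) if is_valid_funct3(f)]
invalid = [f for f in range(8) if not is_valid_funct3(f)]
print("valid:", valid)
print("invalid:", invalid)
```

Running such a check over every line of your instruction description before generating any code is a cheap way to catch an illegal funct3 early.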
After deciding what your instructions look like, we need to integrate them into the compiler, both the binary encoding and the text mnemonic. This is done by hacking the subrepo, riscv-gnu-toolchain/riscv-binutils.
To integrate the binary encoding of the extended instructions, we want to replace the code segments (1, 2) related to customized opcodes with the extended encoding.
In risc-v opcodes, scripts are provided to generate this encoding code. Use the following command:
cat opcodes-custom | ./parse-opcode -c > snippet
Edit opcodes-custom beforehand to name the extended instructions and define their operands.
Not every line of snippet is useful; open the file and find the lines that correspond to your instructions.
Copy those lines and use them to replace the code segments mentioned above.
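Picking the relevant lines out of snippet can be scripted. The sketch below assumes that the generated file contains `#define MATCH_<NAME>`, `#define MASK_<NAME>`, and `DECLARE_INSN(<name>, ...)` lines; check this against your own snippet, since the exact output depends on the version of the script. The instruction name myadd, the sample values, and the function select_lines are made up for illustration.

```python
import re

# Assumed shapes of the generated lines (verify against your own snippet).
DEFINE_RE = re.compile(r"#define\s+(?:MATCH|MASK)_(\w+)\s+\S+")
DECLARE_RE = re.compile(r"DECLARE_INSN\(\s*(\w+)\s*,")


def select_lines(snippet_text: str, names):
    """Keep only the lines that belong to the given instruction names."""
    wanted = {n.lower().replace(".", "_") for n in names}
    keep = []
    for line in snippet_text.splitlines():
        m = DEFINE_RE.match(line) or DECLARE_RE.match(line)
        if m and m.group(1).lower() in wanted:
            keep.append(line)
    return keep


# Made-up sample standing in for the generated file.
sample = """\
#define MATCH_MYADD 0xb
#define MASK_MYADD  0xfe00707f
#define MATCH_CUSTOM0 0xb
#define MASK_CUSTOM0  0x707f
DECLARE_INSN(myadd, MATCH_MYADD, MASK_MYADD)
DECLARE_INSN(custom0, MATCH_CUSTOM0, MASK_CUSTOM0)
"""

selected = select_lines(sample, ["myadd"])
print("\n".join(selected))
```

In practice you would read the real file with `open("snippet").read()` instead of using the inline sample.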
To integrate the mnemonic (text) format of the extended instructions, we want to add additional rules below this line. The meaning of each column is:
- Name string;
- The default data width; zero means the same as the machine bits; I suggest giving 0 here;
- The module the instruction belongs to; I suggest just giving "I", the most basic module;
- The operand description; there is no documentation for the meaning of each letter, but you can refer to this git issue and read the source code for more details; typically, knowing s, t, j, d, and q is enough;
- For instructions without aliasing or pseudo representations, the next two columns can just be the MASK_* and MATCH_* generated in snippet.
- I believe this one is also related to aliasing and pseudo instructions, and giving 0 should also suffice.
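As a reading aid for the operand description column, the sketch below expands an operand string into field names. The letter meanings come from a reading of the binutils RISC-V source rather than from official documentation, so treat the table as an assumption and confirm it against the binutils version you are patching. OPERAND_LETTERS and describe_operands are illustrative names.

```python
# Assumed meanings of the five letters mentioned above.
OPERAND_LETTERS = {
    "d": "rd",
    "s": "rs1",
    "t": "rs2",
    "j": "I-type immediate",
    "q": "S-type (store) offset",
}

# Characters that appear literally in the assembly text.
PUNCTUATION = ",()"


def describe_operands(spec: str):
    """Translate an operand string such as "d,s,j" into field names."""
    fields = []
    for ch in spec:
        if ch in PUNCTUATION:
            continue
        if ch not in OPERAND_LETTERS:
            raise ValueError(f"letter not covered by this sketch: {ch!r}")
        fields.append(OPERAND_LETTERS[ch])
    return fields


print(describe_operands("d,s,t"))   # register-register form
print(describe_operands("d,s,j"))   # register-immediate form
print(describe_operands("t,q(s)"))  # store-like form
```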
This section describes some design decisions I made. Though they are subjective, I hope they help your development experience. I adopt an auto-patcher in my project.
This patcher should be in the same folder as riscv-gnu-toolchain. Say you have a directory stack/; then both riscv-gnu-toolchain and patcher should be in stack/.
I made this design decision for the following reasons:
- I want to minimize the intrusion into the gnu toolchain so that the cost of rebasing the gnu toolchain is also minimized;
- opcodes-custom and riscv-opc.c should be updated together so that they follow the same format of the extended instructions, so it is highly desirable to have a unified programming interface to generate and integrate both. For further development, this interface can also be used to generate the ISA extension for the LLVM toolchain.
- Last but not least, the manual flow involves a lot of "select, copy, and paste", which is error-prone and breaks the automation of the compilation flow of the infrastructure stack.
I use a text format to describe what the extended instructions look like; refer to isa.ext. Then refer to the Makefile and auto-patch.py for how the involved files are modified to integrate the extended instructions.
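To illustrate the general idea of patching generated lines into an existing source file, here is a minimal sketch. It is not the auto-patch.py mentioned above: the function patch_after_marker, the marker comment, the tag, and the sample row are all hypothetical, and the real script may work quite differently.

```python
def patch_after_marker(text, marker, new_lines, tag="/* auto-patch */"):
    """Insert new_lines after the first line containing marker.

    Lines carrying the tag from an earlier run are removed first,
    so running the patcher repeatedly gives the same result.
    """
    lines = [l for l in text.splitlines() if not l.endswith(tag)]
    for i, line in enumerate(lines):
        if marker in line:
            generated = [f"{nl} {tag}" for nl in new_lines]
            lines[i + 1:i + 1] = generated
            return "\n".join(lines) + "\n"
    raise ValueError(f"marker not found: {marker!r}")


# Made-up file content standing in for the file being patched.
source = "before\n/* EXTENDED INSNS */\nafter\n"
row = '{"myadd", ...},'  # placeholder row, not a real table entry

once = patch_after_marker(source, "/* EXTENDED INSNS */", [row])
twice = patch_after_marker(once, "/* EXTENDED INSNS */", [row])
print(once)
```

Tagging every generated line keeps the change to the toolchain sources small and easy to strip out again before a rebase.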