This repository has been archived by the owner on Feb 15, 2024. It is now read-only.

ZFinx Specification v0.41 DEPRECATED - SEE THIS DOCUMENT INSTEAD

1. Spec update history

version  change
0.41     change misa.F to be hardwired to zero, note that we are waiting for a standard CSR discovery mechanism

2. Overview

Every current and future extension that uses F registers will also have a ZFinx version, which is incompatible with the base extension. The ZFinx version reuses the integer register file (i.e. X registers) for all scalar floating point instructions. This saves area, and also gives faster context switching as there are fewer registers to save and restore.

ZFinx is the version of the F extension which uses the X registers, hence the name F-in-x, and it is also used as the general term for the -inx version of floating point extensions.

If any ZFinx extension is implemented, then all floating point extensions must use their respective Zfinx version, i.e. the F registers must not be referred to by any extension.

The ZFinx version of an extension additionally removes all:

  • floating point load instructions (e.g. FLW)

  • floating point store instructions (e.g. FSW)

  • integer to/from floating point register move instructions (e.g. FMV.X.W)

In all cases the integer versions of these are required instead.

The ZFinx version also changes the assembler syntax of floating point instructions so that they only refer to X registers. Therefore on an RV32F core, this is legal syntax:

FLW     fa4, 12(sp)        //load floating point data
FMADD.S fa1, fa2, fa3, fa4 //floating point arithmetic for RV32F

On an RV32_ZFinx core, this syntax must be used as the F registers are not implemented, and FLW is not a supported instruction:

LW      a4, 12(sp)     //load integer data
FMADD.S a1, a2, a3, a4 //floating point arithmetic for RV32F ZFinx

Note that only the assembler syntax differs between the two FMADD.S instructions; the encoding is the same.

The assembler syntax changes to avoid code-porting bugs: the registers must be explicitly updated and cannot simply be reused from non-ZFinx code.

If any ZFinx version is implemented then the core is referred to as a ZFinx core, otherwise it’s a non-ZFinx core.

3. Configurations

Any RISC-V extension that uses F registers has an equivalent ZFinx version. The ZFinx version

  1. inherits all details from the base extension

  2. remaps the F registers to X registers as set out in this specification

  3. deletes all floating point loads and stores from the extension

  4. deletes F to X / X to F moves from the extension

Table 1. ZFinx versions of extensions

Base Extension  ZFinx version        Comment
F               ZFinx                Base ZFinx extension
D               ZDinx                Implies ZFinx
Q               ZQinx                Implies ZDinx, ZFinx
Zfh             Z[FDQ]inx_Zfh{,min}  ZFinx version of [FDQ] must also be implemented
V               Z[FDQ]inx_V          V supporting vector FP implies [FDQ]

In Table 1 all the existing extensions require ZFinx to be present, i.e. the modified version of the F extension. Such a rule is set by the FP extension, and not by this specification.

The ZFinx version may be used with I or E integer variants. The number of integer registers directly affects the ZFinx version as it refers to the X registers. This is no different to the effect on (e.g.) the M extension on an RV32I or RV32E core.

ZFinx may be used with any extension that uses F registers. The relative sizes of XLEN and FLEN affect the ZFinx specification.

Table 2. supported ZFinx configurations

Architecture  ZFinx version   Comment
RV32D         RV32_ZDinx      XLEN<FLEN
RV32D_Zfh     RV32_ZDinx_Zfh  XLEN<FLEN
RV32F         RV32_ZFinx      XLEN==FLEN
RV32F_Zfh     RV32_ZFinx_Zfh  XLEN==FLEN
RV64D         RV64_ZDinx      XLEN==FLEN
RV64D_Zfh     RV64_ZDinx_Zfh  XLEN==FLEN
RV64F         RV64_ZFinx      XLEN>FLEN
RV64F_Zfh     RV64_ZFinx_Zfh  XLEN>FLEN

Note
RV32_ZDinx{_Zfh} requires register pairs so is more complex than the other cases, see Chapter 7.
Note
ZQinx is not considered in the table, or further in this specification.

4. Semantic Differences

The NaN-boxing behaviour of floating point arithmetic instructions is modified in one respect only: checking of sources is suppressed. Floating point results are still always NaN-boxed to XLEN bits.

NaN-box checking is removed because integer loads do not NaN-box their result. Loading fewer than XLEN bits (for example using LW to load floating point data on an RV64 core) would otherwise require NaN-boxing in software, which costs performance and code size.

There are no other semantic differences for floating point instruction behaviour for ZFinx versions, but there are some differences for special cases (such as x0 handling) as listed later in this specification.

5. Discovery

If any ZFinx extension is specified then the compiler will have the following #define set:

__riscv_zfinx

So software can use this to choose between ZFinx or normal versions of floating point code.

Privileged code can detect whether any ZFinx extension is implemented by checking if:

  • mstatus.FS is hardwired to zero, and

  • misa.F is hardwired to zero, and

  • a CSR indicating Z[FDQ]inx is set; this CSR has yet to be specified, pending a standard discovery approach for extensions

Non-privileged code can detect whether ZFinx is implemented as follows:

li a0, 0 # set a0 to zero
#ifdef __riscv_zfinx
fneg.s a0, a0 # this will invert a0
#else
fneg.s fa0, fa0 # this will invert fa0
#endif

If a0 is non-zero then it’s a ZFinx core, otherwise it’s a non-ZFinx core. Both branches result in the same encoding, but the assembly syntax is different for each variant.
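The probe above can be modelled in a few lines of Python. This is only a sketch, not part of the specification: the names `fneg_s` and `probe_is_zfinx` and the `zfinx` flag are hypothetical, the model tracks just `a0` and `fa0` on an RV32 core, and it assumes the instruction executes without trapping.

```python
SIGN_BIT = 0x8000_0000  # sign bit of a single precision bit pattern


def fneg_s(value: int) -> int:
    """FNEG.S only flips the sign bit of a 32-bit float bit pattern."""
    return (value ^ SIGN_BIT) & 0xFFFF_FFFF


def probe_is_zfinx(zfinx: bool, fa0_initial: int = 0x3F80_0000) -> bool:
    """Model 'li a0, 0' followed by the FNEG.S encoding with rd = rs1 = 10."""
    a0 = 0             # li a0, 0
    fa0 = fa0_initial  # arbitrary previous contents of fa0 (assumed value)
    if zfinx:
        a0 = fneg_s(a0)    # register 10 is an X register: a0 changes
    else:
        fa0 = fneg_s(fa0)  # register 10 is an F register: a0 is untouched
    return a0 != 0
```

Negating the bit pattern zero gives negative zero (0x80000000), which is why a0 becomes non-zero only when the instruction operates on the X registers.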

6. mstatus.fs

For ZFinx cores mstatus.fs is hardwired to zero, because all the integer registers already form part of the current context. This gives a performance advantage when saving/restoring contexts. Note however that fcsr still needs to be saved and restored.

Floating point instructions and fcsr accesses do not trap if mstatus.fs=0. This is different to non-ZFinx cores.

7. Register pair handling for XLEN < FLEN

For RV32_ZDinx, all D-extension instructions that are implemented will access register pairs:

  1. The specified register must be even; odd registers will cause an illegal instruction exception.

  2. Even registers will cause an even/odd pair to be accessed.

    1. Accessing Xn will cause the {Xn+1, Xn} pair to be accessed, which is consistent for big and little endian modes. For example if n = 2:

      1. X2 is the least significant half (bits [31:0])

      2. X3 the most significant half (bits [63:32])

  3. X0 has special handling:

    1. Reading {X1, X0} will read all zeros.

    2. Writing {X1, X0} will discard the entire result, it will not write to X1.

The register pairs are only used by the floating point arithmetic instructions. All integer loads and stores will only access XLEN bits, not FLEN.
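The pair rules above can be sketched as a small Python model. It is illustrative only: the class and method names are hypothetical, and it assumes an RV32I-style file of 32 X registers with XLEN=32 and FLEN=64.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal instruction exception."""


class PairedRegFile:
    """Hypothetical model of the X registers on an RV32_ZDinx core."""

    def __init__(self):
        self.x = [0] * 32  # x[0] is never written, so it always reads as zero

    def read_pair(self, n: int) -> int:
        if n % 2:
            raise IllegalInstruction(f"odd register x{n} used as a pair")
        if n == 0:
            return 0  # {X1, X0} reads as all zeros, X1 is ignored
        return (self.x[n + 1] << 32) | self.x[n]  # {Xn+1, Xn}

    def write_pair(self, n: int, value: int) -> None:
        if n % 2:
            raise IllegalInstruction(f"odd register x{n} used as a pair")
        if n == 0:
            return  # the entire result is discarded, X1 is not written
        self.x[n] = value & 0xFFFF_FFFF              # least significant half
        self.x[n + 1] = (value >> 32) & 0xFFFF_FFFF  # most significant half
```

For example, writing the double precision bit pattern 0x3FF0000000000000 to the pair at X2 leaves X2 holding 0x00000000 and X3 holding 0x3FF00000, independent of endianness.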

Note
Zp64 from the P-extension will specify consistent register pair handling, but at the time of writing swaps the registers in the pair in big endian mode.
Note
The decision was taken not to swap the order of registers in the pair for big endian mode to reduce read-muxing in the register file, or in the ALU. If big-endian pair swapping is required it will need to be done in software or by a future load-pair instruction.
Note
Big endian mode is enabled in M-mode if mstatus.MBE=1, in S-mode if mstatus.SBE=1, or in U-mode if mstatus.UBE=1.

8. x0 register target

If a floating point instruction targets x0 then it will still execute, and will set any required flags in fcsr. It will not write to a target register. This matches the standard F extension behaviour for:

fcvt.w.s x0, f0

If the floating point source is invalid then it will set the fflags.NV bit, regardless of whether F or ZFinx is implemented. The target register is not written as it is x0.

If fcsr.RM is in an illegal state then floating point instruction behaviour is the same whether the target register is x0 or not, i.e. targeting x0 doesn’t disable any execution side effects.

In the case of RV32_ZDinx, register pairs are used. See above for x0 handling.

9. NaN-boxing

For ZFinx cores the NaN-boxing is limited to XLEN bits, not FLEN bits. Therefore an FADD.S executed on an RV64D core will write a 64-bit value (the most significant half will be all 1’s). On an RV32_ZDinx core it will write a 32-bit register, i.e. a single X register only. This means there is a semantic difference between these code sequences:

#ifdef __riscv_zfinx
fadd.s x2, x3, x4 # only x2 (32-bits) is written, the other half of its pair (x3) is not written
#else
fadd.s f2, f3, f4 # NaN-box the 32-bit result into the 64-bit f2 register
#endif

NaN-box generation is supported by ZFinx cores. NaN-box checking is not supported by scalar floating point instructions. For example for RV64F:

#ifdef __riscv_zfinx
lw[u] x1, 0(sp)   # load 32-bits into x1 and sign / zero extend upper 32-bits
fadd.s x1, x1, x1 # use x1 but do not check source is NaN-boxed, NaN-box output
#else
flw    f1, 0(sp)  # load 32-bits into f1 and NaN-box to 64-bits (set upper 32-bits to 0xFFFFFFFF)
fadd.s f1, f1, f1 # check f1 is NaN-boxed, NaN-box output
#endif

Floating point loads are not supported on ZFinx cores so x1 is not NaN-boxed in the example above, therefore the FADD.S instruction does not check the input for NaN-boxing. The result of FADD.S is NaN-boxed, which means setting the upper half of the output register to all 1’s.
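The difference in source handling can be illustrated with a short Python sketch for a 64-bit register holding a single precision operand. The function names are hypothetical. The non-ZFinx behaviour shown (an operand that is not correctly NaN-boxed is treated as the canonical NaN) is taken from the standard F/D NaN-boxing rule rather than from this document.

```python
CANONICAL_NAN_S = 0x7FC0_0000  # canonical single precision quiet NaN
MASK32 = 0xFFFF_FFFF


def single_source_non_zfinx(reg64: int) -> int:
    """Non-ZFinx: the source is checked for NaN-boxing before use."""
    if (reg64 >> 32) != MASK32:
        return CANONICAL_NAN_S  # not NaN-boxed: operand becomes canonical NaN
    return reg64 & MASK32


def single_source_zfinx(reg64: int) -> int:
    """ZFinx: no check, the low 32 bits are used as they are."""
    return reg64 & MASK32


# 1.0f loaded with LW on RV64 is sign extended, so the upper half is zero.
loaded_by_lw = 0x0000_0000_3F80_0000
```

With `loaded_by_lw` as input, the ZFinx model yields the bit pattern of 1.0f, while the non-ZFinx model yields the canonical NaN. This is why software NaN-boxing would be needed if source checking were kept.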

The table shows the effect of writing each possible width of value to the register file for all supported combinations. Note that Verilog syntax is used in the final column.

Table 3. NaN-boxing for supported configurations

                       Xreg writeback value
XLEN  FP output width  functional description                   implementation
64    16               NaN_box_to_XLEN(result[15:0])            {48{1'b1}, result[15:0]}
32    16               NaN_box_to_XLEN(result[15:0])            {16{1'b1}, result[15:0]}
64    32               NaN_box_to_XLEN(result[31:0])            {32{1'b1}, result[31:0]}
32    32               NaN_box_to_XLEN(result[31:0])            result[31:0]
64    64               NaN_box_to_XLEN(result[63:0])            result[63:0]

Little or big endian (special handling Xreg={0, 1})
32    64               EvenXreg: NaN_box_to_XLEN(result[31:0])  EvenXreg: result[31:0]
                       OddXreg: NaN_box_to_XLEN(result[63:32])  OddXreg: result[63:32]

Therefore, for example, if an FADD.S instruction is issued on an RV64_ZFinx core then the upper 32 bits of the target integer register will be set to all ones, and an FADD.H (half precision floating point add) instruction will set the upper 48 bits to all ones.
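The writeback rule for fixed XLEN can be expressed as a short Python sketch. The function `nan_box_to_xlen` mirrors the name used in the functional description; `xreg_writeback` and its tuple return value are hypothetical conveniences for illustration, and the x0 special case is not modelled.

```python
def nan_box_to_xlen(result: int, width: int, xlen: int) -> int:
    """Keep the low `width` bits of result and set bits [xlen-1:width] to 1."""
    if width > xlen:
        raise ValueError("a result wider than XLEN needs a register pair")
    value = result & ((1 << width) - 1)
    upper_ones = ((1 << (xlen - width)) - 1) << width
    return upper_ones | value


def xreg_writeback(result: int, width: int, xlen: int) -> tuple:
    """Return the X register value(s) written: (value,) or (even, odd)."""
    if width <= xlen:
        return (nan_box_to_xlen(result, width, xlen),)
    mask = (1 << xlen) - 1
    # XLEN < FLEN: the even register takes the low half, the odd the high half
    return (result & mask, (result >> xlen) & mask)
```

For instance, `xreg_writeback(0x3F800000, 32, 64)` returns a one-element tuple holding 0xFFFFFFFF3F800000, and `xreg_writeback(0x3FF0000000000000, 64, 32)` returns the pair (0x00000000, 0x3FF00000).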

9.1. misa.mxl

misa.mxl can be programmed to change the current value of XLEN.

The combination of ZFinx and programming misa.mxl to reduce XLEN from the maximum implemented value gives additional cases to consider, as shown in the table.

The result from the floating point instruction is NaN-boxed to the current value of XLEN, and then sign extended to the maximum value of XLEN.

Table 4. NaN-boxing for supported configurations with varying misa.mxl

XLEN      XLEN                        Xreg writeback value
maximum   misa.mxl  FP output width  functional description                        implementation
128       64        16               SignExt_to_128(NaN_box_to_64(result[15:0]))   {112{1'b1}, result[15:0]}
128       32        16               SignExt_to_128(NaN_box_to_32(result[15:0]))   {112{1'b1}, result[15:0]}
64        32        16               SignExt_to_64(NaN_box_to_32(result[15:0]))    {48{1'b1}, result[15:0]}
128       64        32               SignExt_to_128(NaN_box_to_64(result[31:0]))   {96{1'b1}, result[31:0]}
128       32        32               SignExt_to_128(result[31:0])                  {96{result[31]}, result[31:0]}
64        32        32               SignExt_to_64(result[31:0])                   {32{result[31]}, result[31:0]}
128       64        64               SignExt_to_128(result[63:0])                  {64{result[63]}, result[63:0]}

Little or big endian (special handling Xreg={0, 1})
128       32        64               EvenXreg: SignExt_to_128(result[31:0])        EvenXreg: {96{result[31]}, result[31:0]}
                                     OddXreg: SignExt_to_128(result[63:32])        OddXreg: {96{result[63]}, result[63:32]}
64        32        64               EvenXreg: SignExt_to_64(result[31:0])         EvenXreg: {32{result[31]}, result[31:0]}
                                     OddXreg: SignExt_to_64(result[63:32])         OddXreg: {32{result[63]}, result[63:32]}
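The two-step rule (NaN-box to the current XLEN, then sign extend to the maximum XLEN) can be sketched in Python as follows. The function names are hypothetical, and only results no wider than the current XLEN are covered; the register pair rows are left out.

```python
def nan_box(result: int, width: int, xlen: int) -> int:
    """Set bits [xlen-1:width] to 1 above the low `width` bits of result."""
    value = result & ((1 << width) - 1)
    return (((1 << (xlen - width)) - 1) << width) | value


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate bit from_bits-1 of value into bits [to_bits-1:from_bits]."""
    value &= (1 << from_bits) - 1
    if (value >> (from_bits - 1)) & 1:
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value


def writeback_with_mxl(result: int, width: int, cur_xlen: int, max_xlen: int) -> int:
    """NaN-box to the current XLEN, then sign extend to the maximum XLEN."""
    if width > cur_xlen:
        raise ValueError("register pair cases are not modelled here")
    return sign_extend(nan_box(result, width, cur_xlen), cur_xlen, max_xlen)
```

When the result is narrower than the current XLEN the NaN-boxing sets the top bit, so sign extension fills the remaining bits with ones. When the result is exactly as wide as the current XLEN the upper bits instead copy the sign bit of the result.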

10. Assembly Syntax and Code Porting

Any references to F registers, or removed instructions will cause assembler errors.

For example, the encoding for:

FMADD.S <1>, <2>, <3>, <4>

will disassemble and execute as:

FMADD.S f1, f2, f3, f4

on a non-ZFinx core, or:

FMADD.S x1, x2, x3, x4

on a ZFinx core.
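As an illustration of the shared encoding, the Python sketch below builds the FMADD.S instruction word from the standard R4-type field layout and then prints it with either register naming. The function names and the `zfinx` flag are hypothetical, and the rounding mode defaults to dynamic (0b111) as an assumption.

```python
OPCODE_MADD = 0b1000011  # major opcode of FMADD
FMT_S = 0b00             # single precision format field


def encode_fmadd_s(rd: int, rs1: int, rs2: int, rs3: int, rm: int = 0b111) -> int:
    """Pack the R4-type fields: rs3 | fmt | rs2 | rs1 | rm | rd | opcode."""
    for reg in (rd, rs1, rs2, rs3):
        if not 0 <= reg < 32:
            raise ValueError("register specifier out of range")
    return ((rs3 << 27) | (FMT_S << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OPCODE_MADD)


def disassemble_fmadd_s(word: int, zfinx: bool) -> str:
    """Same bits, different register names depending on the core type."""
    prefix = "x" if zfinx else "f"
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    rs3 = (word >> 27) & 0x1F
    return f"FMADD.S {prefix}{rd}, {prefix}{rs1}, {prefix}{rs2}, {prefix}{rs3}"
```

Calling `disassemble_fmadd_s(encode_fmadd_s(1, 2, 3, 4), zfinx=False)` gives "FMADD.S f1, f2, f3, f4", and the same word with `zfinx=True` gives "FMADD.S x1, x2, x3, x4".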

We considered allowing pseudo-instructions for the deleted instructions for easier code porting, for example allowing FLW to be a pseudo-instruction for LW, but decided not to. Because the register specifiers must change to integer registers, it makes sense to also remove the use of FLW etc. In this way the user is forced to rewrite their code for a ZFinx core, reducing the chance of undiscovered porting bugs. This only affects assembly code; high level language code is unaffected as the compiler will target the correct architecture.

11. Modifications from extensions to the ZFinx versions

All floating point loads, stores and floating point to/from integer moves are removed on ZFinx cores. The following sections show the deleted instructions and give suggested replacements to get the same semantics.

Note
Where a floating point load loads fewer than XLEN bits then NaN-boxing in software is required to get the same semantics as a non-ZFinx core. This is specified for consistency but is unlikely to be necessary. The compiler should not NaN-box in software as there is no reason to do so. Assembly writers can choose whether to NaN-box in software to give better error detection.
Note
Where a floating point move moves fewer than XLEN bits then either sign extension (if the target is an X register) or NaN-boxing (if the target is an F register) is required in software to get the same semantics.

11.1. Modifications from F to ZFinx

The modifications to the ISA of the F extension are shown in Table 5.

Table 5. replacements for F extension floating point load/store/move instructions

                          suggested replacements
Instruction               RV32_ZFinx  RV64_ZFinx
FLW frd, offset(xrs1)     LW          LW[U] and NaN-box in software
C.FLW frd, offset(xrs1)   C.LW        C.LW and NaN-box in software
C.FLWSP frd, uimm(x2)     C.LWSP      C.LWSP and NaN-box in software
FSW frd, offset(xrs1)     SW          SW
C.FSW frd, offset(xrs1)   C.SW        C.SW
C.FSWSP frd, uimm(x2)     C.SWSP      C.SWSP
FMV.X.W xrd, frs1         MV          MV and sign extend in software
FMV.W.X frd, xrs1         MV          MV and NaN-box in software

11.2. Modifications from D to ZDinx

The modifications to the ISA of the D extension are shown in Table 6.

Table 6. replacements for D extension floating point load/store/move instructions

                          suggested replacements
Instruction               RV32_ZDinx      RV64_ZDinx
FLD frd, offset(xrs1)     LW,LW           LD
C.FLD frd, offset(xrs1)   C.LW,C.LW       C.LD
C.FLDSP frd, uimm(x2)     C.LWSP,C.LWSP   C.LDSP
FSD frd, offset(xrs1)     SW,SW           SD
C.FSD frd, offset(xrs1)   C.SW,C.SW       C.SD
C.FSDSP frd, uimm(x2)     C.SWSP,C.SWSP   C.SDSP
FMV.X.D xrd, frs1         MV,MV           MV
FMV.D.X frd, xrs1         MV,MV           MV

11.3. Modifications from Zfh to ZFinx_Zfh

The modifications to the ISA of the Zfh extension are shown in Table 7, in addition to Table 5.

Table 7. replacements for Zfh floating point load/store/move instructions

                        suggested replacements
Instruction             RV32_ZFinx_Zfh and RV64_ZFinx_Zfh
FLH frd, offset(xrs1)   LH[U] and NaN-box in software
FSH frd, offset(xrs1)   SH
FMV.X.H xrd, frs1       MV and sign extend in software
FMV.H.X frd, xrs1       MV and NaN-box in software

11.3.1. Use of the B-extension

The B-extension is useful for sign extending and NaN-boxing.

To sign-extend using the B-extension:

FMV.X.H rd, rs1

is replaced by:

SEXT.H rd, rs1

Without the B-extension two instructions are required: shift left 16 places, then arithmetic shift right 16 places.

NaN boxing in software is more involved, as the upper part of the register must be set to 1. The B-extension is also helpful in this case.

FMV.H.X a0, a1

is replaced by:

C.ADDI a2, zero, -1

PACK a0, a1, a2
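The effect of both replacement sequences on the register value can be checked with a small Python sketch. It assumes XLEN=32, and the function names are hypothetical; the shift-based routine follows the two-instruction sequence described above for cores without the B-extension.

```python
XLEN = 32                 # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def sext_h_by_shifts(rs1: int) -> int:
    """Sign extend bits [15:0]: shift left 16, then arithmetic shift right 16."""
    shifted = (rs1 << (XLEN - 16)) & MASK
    if shifted >> (XLEN - 1):          # reinterpret as a negative signed value
        shifted -= 1 << XLEN
    return (shifted >> (XLEN - 16)) & MASK


def nan_box_h(rs1: int) -> int:
    """NaN-box bits [15:0]: set every bit above bit 15 to 1."""
    return (rs1 & 0xFFFF) | (MASK & ~0xFFFF)
```

For the half precision value -1.0 (0xBC00) the sign-extended and NaN-boxed forms are identical (0xFFFFBC00). For +1.0 (0x3C00) they differ: 0x00003C00 when sign extended and 0xFFFF3C00 when NaN-boxed.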

12. Modifications from V to Z[FD]inxV

The following instructions are deleted, and the integer version is to be used instead.

Table 8. replacements for scalar floating point instructions

Instruction  Integer version
vfmv.v.f     vmv.v.x
vfmv.f.s     vmv.x.s
vfmv.s.f     vmv.s.x
vfmerge.vfm  vmerge.vxm

Additionally, all instructions with funct3=OPFVF take the scalar floating point source from either a single or pair of X registers instead of a single F register.

13. GDB

When using GDB on a ZFinx core, GDB must report x-registers instead of f-registers when disassembling floating point opcodes. No other changes are required.

14. ABI

For details of the current calling conventions see:

The ABI when using ZFinx must be one of the standard integer calling conventions as listed below:

  • ilp32e

  • ilp32

  • lp64

Note
Currently the ELF header is using a temporary flag to denote ZFinx so that the disassembler knows whether to decode e.g. FADD.S x0, x1, x2 or FADD.S f0, f1, f2

15. Floating Point Configurations To Reduce Area

To reduce the area overhead of FPU hardware, new configurations will make the F[N]MADD.*, F[N]MSUB.*, FDIV.* and FSQRT.* instructions optional in hardware. This then gives the choice of implementing them in software instead by:

  1. Taking an illegal instruction trap, and calling the required software routine in the trap handler. This requires that the opcodes are not reallocated and gives binary compatibility between cores with/without hardware support for F[N]MADD.*, F[N]MSUB.* and FDIV.*, FSQRT.*, but is lower performance than option 2.

  2. Use the GCC options below so that a software library is used to execute them

This argument already exists for RISC-V:

gcc -mno-fdiv

This argument exists for other architectures (e.g. MIPS) but not for RISC-V, so it needs to be added:

gcc -mno-fused-madd

To achieve this we break all current and future floating point extensions into four parts: Z*base, Z*ma, Z*div and Z*ldstmv. There is an -inx version of the first three.

Table 9. floating point configurations

Options     Meaning

base ISA
Zfhbase     Support half precision base instructions
Zfbase      Support single precision base instructions
Zdbase      Support double precision base instructions
Zqbase      Support quad precision base instructions

base ISA-in-x
Zfhbaseinx  Support ZFinx half precision base instructions
Zfbaseinx   Support ZFinx single precision base instructions
Zdbaseinx   Support ZFinx double precision base instructions
Zqbaseinx   Support ZFinx quad precision base instructions

FMA
Zfhma       Support half precision multiply-add
Zfma        Support single precision multiply-add
Zdma        Support double precision multiply-add
Zqma        Support quad precision multiply-add

FMA-in-x
Zfhmainx    Support ZFinx half precision multiply-add
Zfmainx     Support ZFinx single precision multiply-add
Zdmainx     Support ZFinx double precision multiply-add
Zqmainx     Support ZFinx quad precision multiply-add

FDIV
Zfhdiv      Support half precision divide/square-root
Zfdiv       Support single precision divide/square-root
Zddiv       Support double precision divide/square-root
Zqdiv       Support quad precision divide/square-root

FDIV-in-x
Zfhdivinx   Support ZFinx half precision divide/square-root
Zfdivinx    Support ZFinx single precision divide/square-root
Zddivinx    Support ZFinx double precision divide/square-root
Zqdivinx    Support ZFinx quad precision divide/square-root

load/store/move, incompatible with -inx options
Zfhldstmv   Support load, store and integer to/from FP move
Zfldstmv    Support load, store and integer to/from FP move
Zdldstmv    Support load, store and integer to/from FP move
Zqldstmv    Support load, store and integer to/from FP move

Therefore:

  • RV32F can be expressed as rv32_Zfbase_Zfma_Zfdiv_Zfldstmv.

  • RV32D can be expressed as rv32_Zfbase_Zfma_Zfdiv_Zfldstmv_Zdbase_Zdma_Zddiv_Zdldstmv.

  • RV32_ZFinx can be expressed as rv32_Zfbaseinx_Zfmainx_Zfdivinx.

  • RV32_ZDinx can be expressed as rv32_Zfbaseinx_Zfmainx_Zfdivinx_Zdbaseinx_Zdmainx_Zddivinx.

If any -inx extension is specified, then all extensions from Table 9 must have an -inx suffix. The options are all additive; none of them remove or change instructions.
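The mixing rule can be checked mechanically. The Python sketch below is a hypothetical helper, not part of any toolchain: it splits an architecture string on underscores, recognises only the option names from Table 9, and ignores everything else.

```python
PRECISIONS = ("fh", "f", "d", "q")

# Option names from Table 9; the ldstmv options have no -inx version.
PLAIN = {f"Z{p}{part}" for p in PRECISIONS
         for part in ("base", "ma", "div", "ldstmv")}
INX = {f"Z{p}{part}inx" for p in PRECISIONS
       for part in ("base", "ma", "div")}


def inx_rule_holds(arch: str) -> bool:
    """True if no -inx option is mixed with a non-inx floating point option."""
    options = arch.split("_")[1:]  # drop the leading base, e.g. "rv32"
    fp_options = [o for o in options if o in PLAIN or o in INX]
    uses_inx = any(o in INX for o in fp_options)
    return not uses_inx or all(o in INX for o in fp_options)
```

Under these assumptions `inx_rule_holds("rv32_Zfbaseinx_Zfmainx_Zfdivinx")` is true, while a mixed string such as "rv32_Zfbaseinx_Zfldstmv" is rejected.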

16. Rationale, why implement ZFinx?

Small embedded cores that need to implement floating point extensions have some options:

  • Use software emulation of floating point instructions and do not implement a hardware FPU, which gives the minimum core area:

    • The floating point library can be large, and expensive in terms of ROM or flash storage, costing power and energy consumption.

    • The performance of this solution is very low.

  • Low core area floating point implementations:

    • Share the integer registers for floating point instructions (ZFinx).

      • Will cause more register spills/fills than having a separate register file, but the effect of this is application dependent.

      • No need for special instructions such as load and stores to access floating point registers, and moves between integer and floating point registers.

    • There are still performance/area tradeoffs to make for the FPU design itself.

      • e.g. pipelined versus iterative.

    • Optionally remove multiply-add instructions to save area in the FPU and a register file read port.

    • Optionally remove divide/square root instructions to save area in the FPU.

  • Dedicated FPU registers, and higher performance FPU implementations use the most area:

    • Separate floating point registers allow fewer register spills/fills, and can also be used for integer code to prevent spilling to memory.

    • There are the same performance/area tradeoffs for the FPU design.

ZFinx is implemented to allow core area reduction as the area of the F register file is significant, for example:

  • RV32I_ZFinx saves 1/2 the register file state compared to RV32IF.

  • RV32E_ZFinx saves 2/3 the register file state compared to RV32EF.
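The two fractions follow from simple counting. The sketch below is a hypothetical helper and assumes that X and F registers have the same width (XLEN equal to FLEN) and that the F register file always has 32 registers.

```python
from fractions import Fraction


def register_state_saving(x_regs: int, f_regs: int = 32) -> Fraction:
    """Fraction of register file state removed by dropping the F registers."""
    return Fraction(f_regs, x_regs + f_regs)


rv32i_saving = register_state_saving(32)  # RV32IF has 32 X and 32 F registers
rv32e_saving = register_state_saving(16)  # RV32EF has 16 X and 32 F registers
```

With 32 X registers the saving is 32 out of 64 registers, i.e. 1/2. With 16 X registers it is 32 out of 48, i.e. 2/3.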

Therefore ZFinx should allow small embedded cores to support floating point with:

  • Minimal area increase

  • Similar context switch time as an integer only core

    • there are no F registers to save/restore

  • Reduced code size by removing the floating point library