blake256: Add _asm note about AVX2 attempts.
davecgh committed Jul 16, 2024
1 parent 6cf5cfd commit fb17c36
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions crypto/blake256/internal/_asm/gen_compress_asm_amd64.go
@@ -1102,6 +1102,18 @@ func blocksAVX() {
}

func main() {
// -------------------------------------------------------------------------
// NOTE: Various attempts were made to optimize using the larger 256-bit
// registers provided by AVX2. However, since only 4 columns can be computed
// in parallel, the extra overhead of shuffling data around offsets any
// gains from the few places where the larger registers do help. The
// attempts included converting the message to big endian using 2x256-bit
// registers, as well as packing more data into the larger registers to
// free up others, which were then used to cache the results of xoring the
// message and constants for reuse in the final rounds where they are the
// same.
// -------------------------------------------------------------------------

// Ideally this would just reference the compress package with the struct
// definition, but avo doesn't seem to have a way to specify a build tag
// for this statement and the compress package is unable to build before the
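For readers unfamiliar with why "only 4 columns can be computed in parallel" limits AVX2 here, the following is a minimal sketch in plain Go, not the generated assembly or the package's actual code. It assumes the standard BLAKE-256 G function (32-bit words, right rotations by 16, 12, 8 and 7). The `row` type, the helper names, and the `mx`/`my` parameters (standing in for the already-xored message and constant words) are hypothetical names chosen for illustration. Each `row` models one 128-bit register: four 32-bit lanes, one per column. Every step depends on the result of the previous one, so under these assumptions a 256-bit register has no additional independent columns to work on without first shuffling data into place.

```go
package main

import (
	"fmt"
	"math/bits"
)

// row models one 128-bit register: four 32-bit lanes, one per state column.
// (Hypothetical type for illustration only.)
type row [4]uint32

// Lane-wise helpers, each standing in for a single 128-bit SIMD instruction.
func add(x, y row) row {
	for i := range x {
		x[i] += y[i]
	}
	return x
}

func xor(x, y row) row {
	for i := range x {
		x[i] ^= y[i]
	}
	return x
}

func rotr(x row, n int) row {
	for i := range x {
		x[i] = bits.RotateLeft32(x[i], -n)
	}
	return x
}

// g is the scalar BLAKE-256 G function applied to a single column. mx and my
// are assumed to be the message words already xored with the constants.
func g(a, b, c, d, mx, my uint32) (uint32, uint32, uint32, uint32) {
	a += b + mx
	d = bits.RotateLeft32(d^a, -16)
	c += d
	b = bits.RotateLeft32(b^c, -12)
	a += b + my
	d = bits.RotateLeft32(d^a, -8)
	c += d
	b = bits.RotateLeft32(b^c, -7)
	return a, b, c, d
}

// columnStep runs G on all four columns at once. Each statement consumes the
// output of the one before it, so the only parallelism is across the 4 lanes.
func columnStep(a, b, c, d, mx, my row) (row, row, row, row) {
	a = add(add(a, b), mx)
	d = rotr(xor(d, a), 16)
	c = add(c, d)
	b = rotr(xor(b, c), 12)
	a = add(add(a, b), my)
	d = rotr(xor(d, a), 8)
	c = add(c, d)
	b = rotr(xor(b, c), 7)
	return a, b, c, d
}

func main() {
	// Arbitrary input values, not BLAKE-256 test vectors.
	a := row{1, 2, 3, 4}
	b := row{5, 6, 7, 8}
	c := row{9, 10, 11, 12}
	d := row{13, 14, 15, 16}
	mx := row{17, 18, 19, 20}
	my := row{21, 22, 23, 24}

	ra, rb, rc, rd := columnStep(a, b, c, d, mx, my)

	// Check every lane against the scalar G function.
	ok := true
	for i := 0; i < 4; i++ {
		sa, sb, sc, sd := g(a[i], b[i], c[i], d[i], mx[i], my[i])
		if ra[i] != sa || rb[i] != sb || rc[i] != sc || rd[i] != sd {
			ok = false
		}
	}
	fmt.Println("lanes match scalar:", ok)
}
```

The sketch covers only the column step. The diagonal step has the same shape after the rows are rotated into alignment, which is the kind of data shuffling the commit note refers to.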