Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a API to check SVE Length support on ARM CPU. #255

Merged
merged 3 commits into from
Aug 7, 2024

Conversation

maajidkhann
Copy link
Contributor

@maajidkhann maajidkhann commented Aug 6, 2024

This pull request introduces a new feature to the cpuinfo library that adds an API to return the maximum supported Scalable Vector Extension (SVE) vector length on the given ARM CPU. This enhancement will allow users to query and determine the maximum SVE vector lengths on a given ARM CPU, providing better insights and flexibility for optimizing applications that utilize SVE.

Key Features:
New API Function:

Introduces a single API function - cpuinfo_get_max_arm_sve_length() that returns the maximum SVE Vector Length supported on the given ARM CPU.

The function is designed to be easy to integrate with existing code in other projects like PyTorch (pytorch/pytorch#119571) and provides a straightforward interface for querying SVE VL.

Here's the sample output on SVE supported instance:

Query:
test_cpp_aug7

Output on SVE256 supported Hardware - Graviton3:
test_cpp_output

@maajidkhann
Copy link
Contributor Author

@malfet @digantdesai Please review this PR. This will make it easy for people to use API's from CPUINFO into PyTorch to get SVE vector length info from the hardware.

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix formatting (see lint failure in CI)
Also, all links to the images seems to be broken
And last but not least, please explain why boolean has_sve128 are better than single get_max_sve_len API?

Comment on lines 160 to 161
#define PR_SVE_GET_VL 51
#define PR_SVE_VL_LEN_MASK 0xffff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to add something like #ifndef PR_SVE_GET_VL guard here? And also, please mention which system headers they should be defined in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tools/isa-info.c Outdated
Comment on lines 176 to 178
printf("\tSVE 128-bit: %s\n", cpuinfo_support_arm_sve128() ? "Yes" : "No");
printf("\tSVE 256-bit: %s\n", cpuinfo_support_arm_sve256() ? "Yes" : "No");
printf("\tSVE 512-bit: %s\n", cpuinfo_support_arm_sve512() ? "Yes" : "No");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.arm.com/documentation/101458/2404/Compiler-options/-msve-vector-bits- seems to indicate, that SVE length can be up to 2048? If this is the case, wouldn't it be better to add cpuinfo_get_arm_sve_lengh() method rather than enum? Also, if it's always a power of two, wouldn't it be easier if it returned a bitshifts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Done.
My intention in the first place was to introduce functions like how AVX has done for AVX2 and AVX512 which returns a bool and for SVE as only VL=128, 256 and 512 are relevant in ARM ecosystem, I added three different functions.

But it makes sense to just add a single function that returns maximum VL supported on that hardware and now the user can work out the logic required with the value returned.

@@ -1670,6 +1670,7 @@ struct cpuinfo_arm_isa {
bool sve;
bool sve2;
bool i8mm;
int svelen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why signed type? Is -1 used to indicate unsupported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As that was the initial thought, -1 = unsupported. Now I have changed it to unsigned int as we don't deal with negative values and svelen value always remains >=0 in our context.

#define PR_SVE_VL_LEN_MASK 0xffff
int ret = prctl(PR_SVE_GET_VL);
if (ret < 0) {
perror("prctl(PR_SVE_GET_VL) failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
perror("prctl(PR_SVE_GET_VL) failed");
cpuinfo_log_error("prctl(PR_SVE_GET_VL) failed");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

->Addresses review comments and adds a function
that directly returns the max VL on ARM supported
CPU's.

->Fixes Lint Issues.

Signed-off-by: maajidkhann <[email protected]>
@maajidkhann
Copy link
Contributor Author

Please fix formatting (see lint failure in CI) Also, all links to the images seems to be broken And last but not least, please explain why boolean has_sve128 are better than single get_max_sve_len API?

Addresses all the comments. Lint issues also have been now fixed.

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the updates, looks good, but please consider addressing few minor remarks

static inline int cpuinfo_get_max_arm_sve_length(void) {
#if CPUINFO_ARCH_ARM64
if ((cpuinfo_isa.sve || cpuinfo_isa.sve2) && (cpuinfo_isa.svelen > 0))
return cpuinfo_isa.svelen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood your previous diff correctly, svelen is returned in bytes, but API developers usually think about it in bitlength, therefore it must be multiplied by 8, shouldn't it?

Suggested change
return cpuinfo_isa.svelen;
return cpuinfo_isa.svelen * 8;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its better to return bit length directly compared to bytes to avoid confusion to the end users.
Changes done.

@@ -1670,6 +1670,7 @@ struct cpuinfo_arm_isa {
bool sve;
bool sve2;
bool i8mm;
unsigned int svelen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned int svelen;
uint32_t svelen;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 2047 to 2056
static inline int cpuinfo_get_max_arm_sve_length(void) {
#if CPUINFO_ARCH_ARM64
if ((cpuinfo_isa.sve || cpuinfo_isa.sve2) && (cpuinfo_isa.svelen > 0))
return cpuinfo_isa.svelen;
else
return -1; // Returns -1 on Non SVE supported ARM CPU's.
#else
return -1; // Returns -1 on Non ARM CPU's.
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep it simple? One does not need a check, on whether or not svelen is supported, as it should be initialized to 0, shouldn't it?

Suggested change
static inline int cpuinfo_get_max_arm_sve_length(void) {
#if CPUINFO_ARCH_ARM64
if ((cpuinfo_isa.sve || cpuinfo_isa.sve2) && (cpuinfo_isa.svelen > 0))
return cpuinfo_isa.svelen;
else
return -1; // Returns -1 on Non SVE supported ARM CPU's.
#else
return -1; // Returns -1 on Non ARM CPU's.
#endif
}
static inline uint32_t cpuinfo_get_max_arm_sve_length(void) {
#if CPUINFO_ARCH_ARM64
return cpuinfo_isa.svelen;
#else
return 0;
#endif
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, changes done.

// Mask out the SVE vector length bits
isa->svelen = ret & PR_SVE_VL_LEN_MASK;
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#endif
#else
isa->svelen = 0;
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not required anymore as we have removed the following check - #if defined(linux) (Not required as the code is compiled for Linux anyways).

@@ -151,4 +151,25 @@ void cpuinfo_arm64_linux_decode_isa_from_proc_cpuinfo(
if (features & CPUINFO_ARM_LINUX_FEATURE_ASIMDFHM) {
isa->fhm = true;
}

#if defined(__linux__)
#include <sys/prctl.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding #include directive in the body of the function feels very fragile, can you please move it to the top level of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -151,4 +151,25 @@ void cpuinfo_arm64_linux_decode_isa_from_proc_cpuinfo(
if (features & CPUINFO_ARM_LINUX_FEATURE_ASIMDFHM) {
isa->fhm = true;
}

#if defined(__linux__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it needs #ifdef __linux__, as this file should only be compiled for Linux anyway, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Removed this macro.

@maajidkhann
Copy link
Contributor Author

Thank you for all the updates, looks good, but please consider addressing few minor remarks

All changes made as suggested.

Here's the test output, it now returns 256 on Graviton 3 EC2 instance.

test_output_cpuifo_aug7

@maajidkhann
Copy link
Contributor Author

I was also able to call this new API from Pytorch source code and access the vector length correctly.

image

@malfet malfet merged commit 16bfc16 into pytorch:main Aug 7, 2024
11 checks passed
ng-05 added a commit to ng-05/pytorch that referenced this pull request Sep 12, 2024
Regression PR : pytorch/cpuinfo#255

Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: I56cec061072be11ec33ccb661114360b979fc7aa
pytorch-bot bot pushed a commit to pytorch/pytorch that referenced this pull request Sep 17, 2024
Regression PR : pytorch/cpuinfo#255

Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: I56cec061072be11ec33ccb661114360b979fc7aa
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Sep 17, 2024
…re (#135857)

Regression PR : pytorch/cpuinfo#255

Change-Id: I56cec061072be11ec33ccb661114360b979fc7aa

Pull Request resolved: #135857
Approved by: https://github.com/digantdesai, https://github.com/malfet
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…re (pytorch#135857)

Regression PR : pytorch/cpuinfo#255

Change-Id: I56cec061072be11ec33ccb661114360b979fc7aa

Pull Request resolved: pytorch#135857
Approved by: https://github.com/digantdesai, https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants