-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intgemm is not compiling with icc (>16). #82
Comments
Does 65276ad fix this? |
@sidkashyap-at-Intel Can you setup CI for icc? |
I cherry-picked the intgemm, compiled it and got the following output: compile.log. |
Sigh there's a bug in the Intel compiler in which constructors with target attributes compile but fail to link. Minimal example:
Command:
Whereas a regular method compiles and links fine:
@akarbown it looks like you work for Intel? Can you help with icc bugs? We collaborate with @sidkashyap-at-Intel. |
@XapaJIaMnu says he tried to report this to Intel two years ago via @sidkashyap-at-Intel but couldn't figure out how to raise it to the team. |
It compiles when it looks as follows:
I get to know that it is because the compiler creates a resolver function something like this:
It seems to be pretty interesting that for a regular function it works without the attribute default. |
Target attributes pre-date function multiversioning. Function multiversioning is a feature that should only be activated if there are two function declarations with the same signature that differ only in target attributes. The bug in the Intel compiler is that it activates function multiversioning for constructors when there is only one version. We wanted to use function multi-versioning in intgemm but it's actually quite bad for performance due to adding the CPUID dispatch to all the functions, even in contexts where the target architecture is already known: https://hannes.hauswedell.net/post/2017/12/09/fmv/ . So we do target attributes inside the code then a call-by-function-pointer on the external APIs. |
I tried to find documentation on icc. To quote page 2212 of https://software.intel.com/content/dam/develop/external/us/en/documents/cpp_compiler_classic.pdf :
So it would seem the defined behavior for icc is to match whatever gcc does. And the gcc documentation https://gcc.gnu.org/wiki/FunctionMultiVersioning states
It further defines this as
I don't see anything about a constructor being treated differently. And gcc empirically handles a single constructor with a target attribute correctly. Moreover, icc is moving to a clang-based version of target attributes, which is what we are doing: https://software.intel.com/content/www/us/en/develop/articles/porting-guide-for-icc-users-to-dpcpp-or-icx.html I tried But anyway try 2647d6c ; I just removed the target attributes entirely from icc. Possibly less efficient via icc but deservedly so. |
Thanks for the temporary solution! I'll try to poke a compiler team in that case. |
@akarbown: Have forwarded the meeting minutes with the MKL where we discussed these issues, let us sync offline. |
I've created jira issue internally. If you have any more questions connected with that issue I think that @mibintc could give you more information or direct you to the proper person. |
When the icc compiler sees a target attribute, it thinks you are using gnu function multiversioning. Since your program didn't provide a "default" function, the multiversion definition wasn't complete. The icc compiler really doesn't support the other meaning of target attribute. As I understand it from reading the gcc documentation, __attribute((target("avx")) which is not gnu function multiversioning means that the compiler is entitled to generate avx2 instructions for the function body. If the code was executed on a cpu that didn't support avx2 then you would get an instruction fault. I believe that you could get the effect of avx2 instructions by separating out Foo()'s definition into a separate compilation unit and choose target architecture avx2 using the command line option. Since icc doesn't support the other meaning of target attribute, we could ignore it, I don't know if that would be sufficient to meet your needs. I haven't studied your code base, I just looked at the small example provided. |
If that is your policy then this should be an error. But it compiles fine.
Ok. Did I miss some documentation?
Correct. The goal here is to avoid CPUID dispatch for every little internal function call and only do it once at the top-level interface. If every function call had multiversioning, it would be very slow without a compiler that realizes it's already in a achitecture-dependent function and can directly call / inline the next architecture-dependent function.
We have user-defined C++ template arguments. Separate compilation units for each architecture don't really work for that, because it would require our users to separately compile all their code separately for each architecture. Suppose you had some architecture-dependent optimization for std::vector.
It appears that icc allows any intrinsic inside any function, regardless of architecture consistency. Therefore this would work for us and be compatible with gcc's behavior, albeit with the possibility of missed optimizations. |
|
@mibintc we intended to make use of function multiversioning initially, but it appears compiler support for it is extremely patchy. Here's a list of bugs against clang, gcc and icc: from GCC: from clang: For ICC (inline code snippets, with examples).
#include <xmmintrin.h>
#include <emmintrin.h>
#include <immintrin.h>
//clang and gcc expect the precise instruction set name
//(eg avx512bw and avx512f)/ icc only works with avx512f
__attribute__((target("avx512bw"))) static inline __m512 max_ps(__m512 first, __m512 second) {
return _mm512_max_ps(first, second);
}
// Technically __AVX512DQ__
__attribute__((target("avx512dq"))) static inline __m512 and_ps(__m512 first, __m512 second) {
return _mm512_and_ps(first, second);
}
int main()
{
__m512 first, second, res1, res2;
res1 = max_ps(first, second);
res2 = and_ps(first, second);
return 2;
}
class foo {
public:
int a;
foo() {
a = 1;
}
__attribute__ ((target("avx2"))) int operator()() {
return a+2;
}
};
int main() {
foo a = foo();
return a();
}
template <class T>
__attribute__ ((target("avx2"))) T foo(T num) {
return num + 1;
}
template <class T>
__attribute__ ((target("default"))) T foo(T num) {
return num + 2;
}
int __attribute__ ((target("avx2"))) main() {
return foo(1);
}
__attribute__ ((target("avx2"))) int foo(int num) {
return num + 1;
}
__attribute__ ((target("default"))) double foo(double num) {
return num + 2;
}
int __attribute__ ((target("avx2"))) main() {
return foo(1);
} |
I think markdown indentation ate your post. Can you space it out more? While it's nice that call sites can be patched at load time, function multiversioning still adds an optimization barrier and prevents inlining. See https://godbolt.org/z/E9o7hP . Input code:
We can see in the output that there's code in
This might be because the compiler doesn't know if I've made a third Due to the performance overhead, we've abandoned function multiversioning in favor of separate namespaces for each architecture that call within the same namespace using functions with a single target attribute defined. The compilers are then able to properly inline functions with the same target attribute into each other and we only do dispatch outside. |
@mibintc I really appreciate you stopping by! |
BTW I asked the technical support team to help out. I downloaded intgemm and skimmed the source for target attribute. It looks like you are using conditional compilation to avoid using target attribute for MSVC. You could lump the __INTEL_COMPILER into that, thereby avoiding the use of the target attribute, since the target attribute isn't adding anything in the way that you are using it and only causing pain. Even if we fix this issue in icc, (i.e. by more or less ignoring the target attribute) due to the release cadence there would be a frustrating wait. I also want to draw your attention to the next generation Intel C++ compiler (macro __INTEL_LLVM_COMPILER) which is clang based and available for installation without cost. In contrast to icc, the icx compiler does pay attention to target attribute and it does force you to match the target to the intrinsics being used. icx/icpx is in this package: https://software.intel.com/content/www/us/en/develop/tools/oneapi.html |
We are actually compiling with the (old) Intel compiler as of 5 January 2647d6c (as noted above) and you can see the CI here https://github.com/kpu/intgemm/runs/1919056224 . It works by removing all target attributes from the Intel compiler, like it does for MSVC. In theory the lack of target attributes prevents the compiler from generating its own instructions. In practice, we don't leave much room for the compiler to do that. Though it might happen for some loads where we just use * instead of an intrinsic. We're aware of the llvm-based compiler and already support clang-type behavior. By the way though why doesn't the Intel compiler support |
While compiling intgemm with one of the latest versions of the ICC (icpc (ICC) 19.1.3.304) I got the following result:
as far as I know the attribute ((target ("sse2/sse3/avx2"))) that is used here is a "gnu extension" to the c/c++ programming language and it is not working with ICC compiler.
The text was updated successfully, but these errors were encountered: