Re-interpreting a value of a large type as an array of a smaller type #28

Open
ergawy opened this issue Apr 13, 2016 · 8 comments

@ergawy

ergawy commented Apr 13, 2016

I want to re-interpret a value as an array of values of a smaller type, for example an i32 as an array of 4 i8s. Both the source (large) and target (small) types are polymorphic, so the number of elements in the target array is not fixed, even though its total size in bytes always matches the source. Can I do that?

@leissa
Member

leissa commented Apr 14, 2016

No, bitcasting i32 to [i8 * 4] is not allowed. Doing stuff like this is not even allowed in C. For example, the following C code is illegal:

int32_t* p = /*...*/;
int16_t* q = (int16_t*) p;
int16_t i = q[1]; // the C standard does not allow this.

I don't understand the second part of your question. Can you give me some more context or an example?

@ergawy
Author

ergawy commented Apr 14, 2016

EDIT:
I just wanted to clarify my intent with a simpler example. Here is Rust code that does it:

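use std::mem;

fn main() {
    let x: i32 = 200; // 0x0000_00C8
    // Copy the four bytes of `x` into an array of i8.
    let one: [i8; 4] = unsafe { mem::transmute_copy(&x) };

    // On a little-endian target the low byte 0xC8 comes first; as an i8 it is -56.
    assert_eq!(-56, one[0]);
    assert_eq!(0, one[1]);
}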
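
A note on the example above: the assertion on one[0] depends on the byte order of the target, because transmute_copy copies bytes in memory order. Here is a minimal sketch of a safe variant, assuming little-endian order is what is wanted. The helper name split_le is made up for illustration.

```rust
/// Splits `x` into four signed bytes, least significant byte first.
/// `split_le` is a hypothetical helper name, not part of any library.
fn split_le(x: i32) -> [i8; 4] {
    // `to_le_bytes` fixes the byte order, so the result is the same on
    // every target; each u8 is then reinterpreted as an i8.
    x.to_le_bytes().map(|b| b as i8)
}
```

With x = 200 (0xC8), split_le(200) gives [-56, 0, 0, 0], which matches the assertions above without needing unsafe.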

ORIGINAL COMMENT:
Sorry for not giving an example from the start.

I want to use the PTX warp shuffle instruction.

fn shuffle_words[T, T2](input: T, src_lane: i32) -> () {
   let num_words = sizeof[T]() / sizeof[T2]();
   let in_words : [T2 * num_words] = bitcast[[T2 * num_words]](&input); // I know that I can't use an expr for allocation here, but I am just trying to explain my point.
   let out_words : [T2 * num_words];
   for i in range(0, num_words) {
      out_words(i) = shuffle(in_words(i), src_lane);
   }
}

For more context, I am implementing parallel operations equivalent to what is provided by CUDA's CUB library. Here is an implementation of the shuffle operation from CUB's code:

template <typename T>
__device__ __forceinline__ T ShuffleDown(
    T               input,                                  ///< [in] The value to broadcast
    int             src_offset,                             ///< [in] The relative up-offset of the peer to read from
    int             last_lane = CUB_PTX_WARP_THREADS - 1)   ///< [in] Index of first lane in segment
{
    typedef typename UnitWord<T>::ShuffleWord ShuffleWord;

    const int       WORDS           = (sizeof(T) + sizeof(ShuffleWord) - 1) / sizeof(ShuffleWord);

    T               output;
    ShuffleWord     *output_alias   = reinterpret_cast<ShuffleWord *>(&output);
    ShuffleWord     *input_alias    = reinterpret_cast<ShuffleWord *>(&input);

    unsigned int shuffle_word;
    asm volatile("shfl.down.b32 %0, %1, %2, %3;"
        : "=r"(shuffle_word) : "r"((unsigned int) input_alias[0]), "r"(src_offset), "r"(last_lane));
    output_alias[0] = shuffle_word;

    #pragma unroll
    for (int WORD = 1; WORD < WORDS; ++WORD)
    {
        asm volatile("shfl.down.b32 %0, %1, %2, %3;"
            : "=r"(shuffle_word) : "r"((unsigned int) input_alias[WORD]), "r"(src_offset), "r"(last_lane));
        output_alias[WORD] = shuffle_word;
    }

//    ShuffleDown(input_alias, output_alias, src_offset, last_lane, Int2Type<WORDS - 1>());

    return output;
}
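
The word-splitting idea in the CUB code can also be expressed with copies instead of pointer aliases. The following is a rough Rust sketch, not CUB's or Impala's implementation. It assumes that shuffle is a hypothetical stand-in that simply returns its input, that the 32-bit word size mirrors shfl.down.b32, and that the caller only passes plain data without padding bytes.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical stand-in for the warp shuffle; it just returns its input.
fn shuffle(word: u32, _src_lane: i32) -> u32 {
    word
}

/// Applies `shuffle` to `input` one 32-bit word at a time.
///
/// # Safety
/// `T` must be plain data: no padding bytes, and every bit pattern valid.
unsafe fn shuffle_words<T: Copy>(input: T, src_lane: i32) -> T {
    // Round up, as the CUB code does, so a trailing partial word is covered.
    let num_words = (size_of::<T>() + size_of::<u32>() - 1) / size_of::<u32>();

    // Copy the bytes of `input` into a zero-filled word buffer.
    let mut words = vec![0u32; num_words];
    unsafe {
        ptr::copy_nonoverlapping(
            &input as *const T as *const u8,
            words.as_mut_ptr() as *mut u8,
            size_of::<T>(),
        );
    }

    // Shuffle each word independently.
    for w in words.iter_mut() {
        *w = shuffle(*w, src_lane);
    }

    // Copy the shuffled bytes back into a fresh value of type `T`.
    let mut output = input;
    unsafe {
        ptr::copy_nonoverlapping(
            words.as_ptr() as *const u8,
            &mut output as *mut T as *mut u8,
            size_of::<T>(),
        );
    }
    output
}
```

Because the number of words is computed at run time and the buffer lives on the heap, no array type with a size that depends on T is needed. The price is an allocation that a real GPU implementation would want to avoid.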

@slusallek

Hi,

We should be careful here: line 3 is certainly legal (it only indexes an
array). The only issue is the pointer cast, which is also not "illegal"
but probably means that the results are undefined (subsequent accesses
might give a "bus error" or similar due to misalignment).

However, if handled properly (alignment and such), the results can be
well defined and are actually really useful. So it makes sense to think
about "How can we design the system to allow this (with enough
safeguards)?" For example, a pointer cast between 32-bit and 16-bit ints
should always be fine on x86, as far as I know, as long as the input
pointer is OK.

Best,

Philipp

Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern

Management:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff
Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes

Registered office: Kaiserslautern (HRB 2313)

VAT ID: DE 148646973, Tax number: 19/673/0060/3

@leissa
Member

leissa commented Apr 15, 2016

No, line 3 is illegal (i.e. undefined behavior). According to the C standard (§6.5, paragraph 7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object

[...]

The effective type of the object accessed through q[1] is int32_t, but the lvalue q[1] has type int16_t, which is not compatible with it. Line 2 is legal as long as the implementation-defined alignment requirements are obeyed.

But I agree that we want to support such things in the long run. I'm just saying that there are good reasons for C to prohibit them. Allowing them would cost us performance elsewhere, because the compiler could no longer assume that pointers to different types do not alias (missed optimization opportunities).
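
For comparison, a well-defined way to get at the value that q[1] is after is to copy the bytes out instead of reading them through a differently typed pointer. Here is a minimal Rust sketch of that idea. The helper name halves is hypothetical, and which half holds which bits depends on the byte order of the target.

```rust
/// Returns the two 16-bit halves of `value` in memory order.
/// `halves` is a hypothetical helper name used only for illustration.
fn halves(value: i32) -> [i16; 2] {
    // Copy out the bytes in native (memory) order; no pointer cast involved.
    let b = value.to_ne_bytes();
    [
        i16::from_ne_bytes([b[0], b[1]]),
        i16::from_ne_bytes([b[2], b[3]]),
    ]
}
```

Here halves(v)[1] plays the role of q[1]. Since only copies are involved, the compiler stays free to assume that an i32 and an i16 never share storage.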

@KareemErgawy: I will check out your example at the weekend.

@slusallek

OK, I see. This is certainly not how C has been used for ages, but it
does seem to be undefined after all.

Thanks,

Philipp

@leissa
Member

leissa commented Apr 15, 2016

Yes, it's a big problem. Many C programmers don't understand the standard and there are many myths regarding the language. And then programmers curse the compiler engineers when the C compiler apparently emits "incorrect" code. But actually the input program was broken to begin with... That's why icc, for instance, has very conservative optimization flags regarding aliasing rules - even with -O3.

But anyway, we're getting off-topic :)

@leissa
Member

leissa commented Apr 18, 2016

@KareemErgawy: I think the easiest solution would be to simply adopt Rust's idea and provide something like transmute_copy. Would this be fine for you?

@ergawy
Author

ergawy commented Apr 18, 2016

Yes, that would be totally fine. For now, I ended up using bitwise operators, as suggested by Arsene, which do the same trick in a less elegant way.
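
The bitwise approach is not spelled out in this thread. The following is one guess at what it might look like, sketched in Rust with hypothetical helper names, and assuming 8-bit pieces of a 32-bit value:

```rust
/// Extracts byte `index` of `value` (0 = least significant).
/// `index` must be less than 4. Hypothetical helper for illustration.
fn byte_at(value: u32, index: u32) -> u8 {
    // Shift the wanted byte down to the low bits, then mask it off.
    ((value >> (8 * index)) & 0xFF) as u8
}

/// Returns `value` with byte `index` replaced by `byte`.
/// `index` must be less than 4. Hypothetical helper for illustration.
fn with_byte(value: u32, index: u32, byte: u8) -> u32 {
    let shift = 8 * index;
    // Clear the target byte, then OR in the new one.
    (value & !(0xFFu32 << shift)) | ((byte as u32) << shift)
}
```

Unlike a bitcast, shifts and masks work on the numeric value rather than on memory, so the result does not depend on the byte order of the target.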

The downside is that this only lets us read the words of a large variable, not write them, because the result is a copy rather than an actual alias. But I understand that allowing an alias would go against the design choice for more type safety.
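
To illustrate the copy semantics in Rust terms: a write to the transmuted array does not reach the original value, so "writing a word" takes a second copy back. This is a minimal sketch, and zero_first_byte is a hypothetical helper name.

```rust
use std::mem;

/// Returns a copy of `x` whose first byte in memory order is zeroed.
/// `zero_first_byte` is a hypothetical helper for illustration.
fn zero_first_byte(x: u32) -> u32 {
    // `transmute_copy` yields an independent copy of the bytes of `x`.
    let mut bytes: [u8; 4] = unsafe { mem::transmute_copy(&x) };

    // This write changes only the copy; `x` itself is untouched.
    bytes[0] = 0;

    // Writing "through" the view means copying the bytes back into a value.
    unsafe { mem::transmute_copy(&bytes) }
}
```

The round trip costs two copies on paper, though for small fixed-size types a compiler can usually keep everything in registers.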

But again, this is totally fine.

Thanks.
