Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating kernel panics on Mac OS X #65

Closed
michael-p opened this issue Apr 20, 2017 · 20 comments
Closed

Creating kernel panics on Mac OS X #65

michael-p opened this issue Apr 20, 2017 · 20 comments

Comments

@michael-p
Copy link
Contributor

Dear all,
I tried running the "trivial.rs" example on my Mac (OS X 10.11, pre-installed OpenCL 1.2, ocl version 0.13.0) put it panics when creating the kernel because Kernel::new("add", &program) returns Err([NONE]). It does this in the versions in main(), main_explained(), and main_exploded(), but interestingly NOT in main_cored() which uses ocl-core...

In main_cored I also added a call to core::get_kernel_info(&kernel, KernelInfo::NumArgs) immediately after creating the kernel (because that's what the other non-working examples do behind the scenes) but that's working fine.

I tested this using both the CPU and GPU devices in the platform, both do not work. Interestingly, when using just the GPU device and printing that one out using println!("Device: {:?}", device); I get the following suspiciously looking output:

Device: Device { Type: DEVICE_TYPE_GPU, VendorId: 16925952, MaxComputeUnits: 48, MaxWorkItemDimensions: 3, MaxWorkGroupSize: 256, MaxWorkItemSizes: [256, 256, 256], PreferredVectorWidthChar: 1, PreferredVectorWidthShort: 1, PreferredVectorWidthInt: 1, PreferredVectorWidthLong: 1, PreferredVectorWidthFloat: 1, PreferredVectorWidthDouble: 0, MaxClockFrequency: 1050, AddressBits: 64, MaxReadImageArgs: 128, MaxWriteImageArgs: 8, MaxMemAllocSize: 402653184, Image2dMaxWidth: 16384, Image2dMaxHeight: 16384, Image3dMaxWidth: 2048, Image3dMaxHeight: 2048, Image3dMaxDepth: 2048, ImageSupport: true, MaxParameterSize: 1024, MaxSamplers: 16, MemBaseAddrAlign: 1024, MinDataTypeAlignSize: 128, SingleFpConfig: FP_INF_NAN | FP_ROUND_TO_NEAREST | FP_ROUND_TO_ZERO | FP_ROUND_TO_INF | FP_FMA | FP_CORRECTLY_ROUNDED_DIVIDE_SQRT, GlobalMemCacheType: None, GlobalMemCachelineSize: 0, GlobalMemCacheSize: 0, GlobalMemSize: 1610612736, MaxConstantBufferSize: 65536, MaxConstantArgs: 8, LocalMemType: Local, LocalMemSize: 65536, ErrorCorrectionSupport: false, ProfilingTimerResolution: 80, EndianLittle: true, Available: true, CompilerAvailable: true, ExecutionCapabilities: EXEC_KERNEL, QueueProperties: QUEUE_PROFILING_ENABLE, Name: Intel(R) Iris(TM) Graphics 6100, Vendor: Intel Inc., DriverVersion: 1.2(Feb 17 2017 12:40:05), Profile: FULL_PROFILE, Version: OpenCL 1.2 , Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes , Platform: PlatformId(0x7fff0000), DoubleFpConfig: , HalfFpConfig: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetDeviceInfo  

Status error code: CL_INVALID_OPERATION (-59)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetDeviceInfo.html#errors  

############################################################################# 
, PreferredVectorWidthHalf: 0, HostUnifiedMemory: true, NativeVectorWidthChar: 1, NativeVectorWidthShort: 1, NativeVectorWidthInt: 1, NativeVectorWidthLong: 1, NativeVectorWidthFloat: 1, NativeVectorWidthDouble: 0, NativeVectorWidthHalf: 0, OpenclCVersion: OpenCL C 1.2 , LinkerAvailable: true, BuiltInKernels: , ImageMaxBufferSize: 25165824, ImageMaxArraySize: 2048, ParentDevice: None, PartitionMaxSubDevices: 0, PartitionProperties: [], PartitionAffinityDomain: , PartitionType: [], ReferenceCount: 1, PreferredInteropUserSync: true, PrintfBufferSize: 1048576, ImagePitchAlignment: 32, ImageBaseAddressAlignment: 4 }

Please let me know if you need further information (like the output of clinfo command) or if there is anything else I could try!

Thank you very much!
Michael

@c0gent
Copy link
Member

c0gent commented Apr 20, 2017

Michael,
Thank you for the thorough bug report. I have a good idea what may be the problem. I'll look into this asap and get back to you.

@c0gent
Copy link
Member

c0gent commented Apr 20, 2017

Could you please provide the stack trace from the trivial.rs example?

Would you mind also running device_check.rs and including the stack trace from that if it panics?

I have a strong suspicion of the cause but I don't have access to a Mac so I can't be sure. I appreciate your help getting to the bottom of this.

@c0gent
Copy link
Member

c0gent commented Apr 20, 2017

Also, you may as well include the output of the info_core.rs example as well. Thanks :)

@michael-p
Copy link
Contributor Author

michael-p commented Apr 20, 2017

Sure, here's the backtrace from trivial.rs: (no line numbers on OS X, sorry.... It's the line let kernel = Kernel::new("add", &program).unwrap();)

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: [NONE]', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:868
stack backtrace:
   1:        0x1053f6fbc - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x1053f886e - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x1053f8510 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x1053f8cc7 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x1053f8b74 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x1053f8a92 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x1053f89f7 - rust_begin_unwind
   8:        0x10541fa90 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x10531db7e - core::result::unwrap_failed::hce586d3f0e25aed9
  10:        0x105313b64 - <core::result::Result<T, E>>::unwrap::h2c867b3dc6d3815f
  11:        0x10533a91b - opencl_test::main_exploded::h25f10bc223d6e76a
  12:        0x10533a38c - opencl_test::main::hbe49819fd96bc4d9
  13:        0x1053f9afa - __rust_maybe_catch_panic
  14:        0x1053f9096 - std::rt::lang_start::hfc9882558f9403bf
  15:        0x10533b0f9 - main

Note that I commented out everything in main() except the call to main_exploded() (otherwise it already panics in the corresponding Kernel::new(...).unwrap() call in main)

Output of device_check.rs:

Platform: Apple
Device: Intel Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: 

################################ OPENCL ERROR ############################### 

Error executing function: clCreateCommandQueue  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html#errors  

############################################################################# 
', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:868
stack backtrace:
   1:        0x103ce090c - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x103ce271e - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x103ce23c0 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x103ce2bd7 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x103ce2a84 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x103ce29a2 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x103ce2907 - rust_begin_unwind
   8:        0x103d099c0 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x103bbab9e - core::result::unwrap_failed::ha4276462ba934998
  10:        0x103ba93eb - <core::result::Result<T, E>>::unwrap::hfef6b6a65c687181
  11:        0x103c019be - device_check::create_queues::ha24c9c810beaf220
  12:        0x103c028d7 - device_check::check::hf01c3b6062324d22
  13:        0x103c0e26a - device_check::main::h5a898c1a49a1c9cc
  14:        0x103ce3a0a - __rust_maybe_catch_panic
  15:        0x103ce2fa6 - std::rt::lang_start::hfc9882558f9403bf
  16:        0x103c0f859 - main

Output of info_core.rs:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: [NONE]', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:868
stack backtrace:
   1:        0x100bdfccc - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x100be157e - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x100be1220 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x100be19d7 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x100be1884 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x100be17a2 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x100be1707 - rust_begin_unwind
   8:        0x100c04540 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x100ad278e - core::result::unwrap_failed::h12ff1147fb33b283
  10:        0x100ac6be4 - <core::result::Result<T, E>>::unwrap::h00f4c7e69811a1f0
  11:        0x100af7af1 - info_core::print_platform_device::hf24eff3aed640303
  12:        0x100af75c6 - info_core::print_platform::h5260af10885e9dec
  13:        0x100af7443 - info_core::main::h5e67230b07b13f62
  14:        0x100be280a - __rust_maybe_catch_panic
  15:        0x100be1da6 - std::rt::lang_start::hfc9882558f9403bf
  16:        0x100b01589 - main

Let me know if I can test anything for you! Debugging this without a access to a Mac is probably kinda challenging ;)

@c0gent
Copy link
Member

c0gent commented Apr 20, 2017

I see we also have a secondary error produced by device_check.rs.

I have what should be a fix to the initial issue up on the master branch. Please clone/pull this repo and try running trivial.rs again if you would.

If it works, we'll have a look at the command queue issue.

@michael-p
Copy link
Contributor Author

trivial.rs is working now, device_info.rs produces the same output as above, and info_core.rs has the following longish output with quite some error messages:

############### OpenCL Platform-Device Full Info ################

Platform:
    Profile: FULL_PROFILE
    Version: OpenCL 1.2 (Nov  1 2016 21:34:57)
    Name: Apple
    Vendor: Apple
    Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

Device[0]: 
    Type: DEVICE_TYPE_CPU
    VendorId: 4294967295
    MaxComputeUnits: 4
    MaxWorkItemDimensions: 3
    MaxWorkGroupSize: 1024
    MaxWorkItemSizes: [1024, 1, 1]
    PreferredVectorWidthChar: 16
    PreferredVectorWidthShort: 8
    PreferredVectorWidthInt: 4
    PreferredVectorWidthLong: 2
    PreferredVectorWidthFloat: 4
    PreferredVectorWidthDouble: 2
    MaxClockFrequency: 2700
    AddressBits: 64
    MaxReadImageArgs: 128
    MaxWriteImageArgs: 8
    MaxMemAllocSize: 2147483648
    Image2dMaxWidth: 8192
    Image2dMaxHeight: 8192
    Image3dMaxWidth: 2048
    Image3dMaxHeight: 2048
    Image3dMaxDepth: 2048
    ImageSupport: true
    MaxParameterSize: 4096
    MaxSamplers: 16
    MemBaseAddrAlign: 1024
    MinDataTypeAlignSize: 128
    SingleFpConfig: FP_DENORM | FP_INF_NAN | FP_ROUND_TO_NEAREST | FP_ROUND_TO_ZERO | FP_ROUND_TO_INF | FP_FMA | FP_CORRECTLY_ROUNDED_DIVIDE_SQRT
    GlobalMemCacheType: ReadWriteCache
    GlobalMemCachelineSize: 3145728
    GlobalMemCacheSize: 64
    GlobalMemSize: 8589934592
    MaxConstantBufferSize: 65536
    MaxConstantArgs: 8
    LocalMemType: Global
    LocalMemSize: 32768
    ErrorCorrectionSupport: false
    ProfilingTimerResolution: 1
    EndianLittle: true
    Available: true
    CompilerAvailable: true
    ExecutionCapabilities: EXEC_KERNEL | EXEC_NATIVE_KERNEL
    QueueProperties: QUEUE_PROFILING_ENABLE
    Name: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Vendor: Intel
    DriverVersion: 1.1
    Profile: FULL_PROFILE
    Version: OpenCL 1.2 
    Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
    Platform: PlatformId(0x7fff0000)
    DoubleFpConfig: FP_DENORM | FP_INF_NAN | FP_ROUND_TO_NEAREST | FP_ROUND_TO_ZERO | FP_ROUND_TO_INF | FP_FMA
    HalfFpConfig: [UNAVAILABLE (CL_INVALID_VALUE)]
    PreferredVectorWidthHalf: 0
    HostUnifiedMemory: true
    NativeVectorWidthChar: 16
    NativeVectorWidthShort: 8
    NativeVectorWidthInt: 4
    NativeVectorWidthLong: 2
    NativeVectorWidthFloat: 4
    NativeVectorWidthDouble: 2
    NativeVectorWidthHalf: 0
    OpenclCVersion: OpenCL C 1.2 
    LinkerAvailable: true
    BuiltInKernels: 
    ImageMaxBufferSize: 65536
    ImageMaxArraySize: 2048
    ParentDevice: None
    PartitionMaxSubDevices: 0
    PartitionProperties: []
    PartitionAffinityDomain: 
    PartitionType: []
    ReferenceCount: 1
    PreferredInteropUserSync: true
    PrintfBufferSize: 1048576
    ImagePitchAlignment: 1
    ImageBaseAddressAlignment: 1

Context:
    Reference Count: 8
    Devices: [DeviceId(0xffffffff)]
    Properties: ContextProperties({Platform: Platform(PlatformId(0x7fff0000))})
    Device Count: 1

Command Queue:
    Context: Context(0x7f8e73e03a10)
    Device: DeviceId(0xffffffff)
    ReferenceCount: 6
    Properties: QUEUE_PROFILING_ENABLE

Buffer Memory:
    Type: Buffer
    Flags: MEM_READ_WRITE
    Size: 4194304
    HostPtr: no mem info available
    MapCount: 0
    ReferenceCount: 2
    Context: Context(0x7f8e73e03a10)
    AssociatedMemobject: None
    Offset: 0

Image: 
    ElementSize: 4
    RowPitch: 4096
    SlicePitch: 0
    Width: 1024
    Height: 0
    Depth: 0
    ArraySize: 0
    Buffer: None
    NumMipLevels: 0
    NumSamples: 0

    Image Memory:
        Type: Buffer
        Flags: MEM_READ_WRITE
        Size: 4194304
        HostPtr: no mem info available
        MapCount: 0
        ReferenceCount: 2
        Context: Context(0x7f8e73e03a10)
        AssociatedMemobject: None
        Offset: 0

Sampler:
    ReferenceCount: 1
    Context: Context(0x7f8e73e03a10)
    NormalizedCoords: false
    AddressingMode: None
    FilterMode: Nearest

Program:
    ReferenceCount: 2
    Context: Context(0x7f8e73e03a10)
    NumDevices: 1
    Devices: [DeviceId(0xffffffff)]
    Source: 


    __kernel void multiply(float coeff, __global float* buffer) {
        buffer[get_global_id(0)] *= coeff;
    }

    BinarySizes: [4396]
    Binaries: n/a
    NumKernels: 1
    KernelNames: multiply

Program Build:
    BuildStatus: Success
    BuildOptions: 
    BuildLog: 



    BinaryType: PROGRAM_BINARY_TYPE_EXECUTABLE

Kernel Info:
    FunctionName: multiply
    NumArgs: 2
    ReferenceCount: 1
    Context: Context(0x7f8e73e03a10)
    Program: Program(0x7f8e73e04cd0)
    Attributes: 

Kernel Argument [0]:
    AddressQualifier: Private
    AccessQualifier: None
    TypeName: no kernel argument info available
    TypeQualifier: 
    Name: no kernel argument info available

Kernel Work Group:
    WorkGroupSize: 128
    CompileWorkGroupSize: [0, 0, 0]
    LocalMemSize: 0
    PreferredWorkGroupSizeMultiple: 1
    PrivateMemSize: 0
    GlobalWorkSize: only available for custom devices or built-in kernels

Event:
    CommandQueue: CommandQueue(0x7f8e73e04720)
    CommandType: WriteBuffer
    ReferenceCount: 1
    CommandExecutionStatus: Complete
    Context: Context(0x7f8e73e03a10)

Event Profiling:
    Queued: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    Submit: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    Start: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    End: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 



############### OpenCL Platform-Device Full Info ################

Platform:
    Profile: FULL_PROFILE
    Version: OpenCL 1.2 (Nov  1 2016 21:34:57)
    Name: Apple
    Vendor: Apple
    Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

Device[0]: 
    Type: DEVICE_TYPE_GPU
    VendorId: 16925952
    MaxComputeUnits: 48
    MaxWorkItemDimensions: 3
    MaxWorkGroupSize: 256
    MaxWorkItemSizes: [256, 256, 256]
    PreferredVectorWidthChar: 1
    PreferredVectorWidthShort: 1
    PreferredVectorWidthInt: 1
    PreferredVectorWidthLong: 1
    PreferredVectorWidthFloat: 1
    PreferredVectorWidthDouble: 0
    MaxClockFrequency: 1050
    AddressBits: 64
    MaxReadImageArgs: 128
    MaxWriteImageArgs: 8
    MaxMemAllocSize: 402653184
    Image2dMaxWidth: 16384
    Image2dMaxHeight: 16384
    Image3dMaxWidth: 2048
    Image3dMaxHeight: 2048
    Image3dMaxDepth: 2048
    ImageSupport: true
    MaxParameterSize: 1024
    MaxSamplers: 16
    MemBaseAddrAlign: 1024
    MinDataTypeAlignSize: 128
    SingleFpConfig: FP_INF_NAN | FP_ROUND_TO_NEAREST | FP_ROUND_TO_ZERO | FP_ROUND_TO_INF | FP_FMA | FP_CORRECTLY_ROUNDED_DIVIDE_SQRT
    GlobalMemCacheType: None
    GlobalMemCachelineSize: 0
    GlobalMemCacheSize: 0
    GlobalMemSize: 1610612736
    MaxConstantBufferSize: 65536
    MaxConstantArgs: 8
    LocalMemType: Local
    LocalMemSize: 65536
    ErrorCorrectionSupport: false
    ProfilingTimerResolution: 80
    EndianLittle: true
    Available: true
    CompilerAvailable: true
    ExecutionCapabilities: EXEC_KERNEL
    QueueProperties: QUEUE_PROFILING_ENABLE
    Name: Intel(R) Iris(TM) Graphics 6100
    Vendor: Intel Inc.
    DriverVersion: 1.2(Feb 17 2017 12:40:05)
    Profile: FULL_PROFILE
    Version: OpenCL 1.2 
    Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes 
    Platform: PlatformId(0x7fff0000)
    DoubleFpConfig: 
    HalfFpConfig: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetDeviceInfo  

Status error code: CL_INVALID_OPERATION (-59)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetDeviceInfo.html#errors  

############################################################################# 

    PreferredVectorWidthHalf: 0
    HostUnifiedMemory: true
    NativeVectorWidthChar: 1
    NativeVectorWidthShort: 1
    NativeVectorWidthInt: 1
    NativeVectorWidthLong: 1
    NativeVectorWidthFloat: 1
    NativeVectorWidthDouble: 0
    NativeVectorWidthHalf: 0
    OpenclCVersion: OpenCL C 1.2 
    LinkerAvailable: true
    BuiltInKernels: 
    ImageMaxBufferSize: 25165824
    ImageMaxArraySize: 2048
    ParentDevice: None
    PartitionMaxSubDevices: 0
    PartitionProperties: []
    PartitionAffinityDomain: 
    PartitionType: []
    ReferenceCount: 1
    PreferredInteropUserSync: true
    PrintfBufferSize: 1048576
    ImagePitchAlignment: 32
    ImageBaseAddressAlignment: 4

Context:
    Reference Count: 8
    Devices: [DeviceId(0x1024500)]
    Properties: ContextProperties({Platform: Platform(PlatformId(0x7fff0000))})
    Device Count: 1

Command Queue:
    Context: Context(0x7f8e73e04720)
    Device: DeviceId(0x1024500)
    ReferenceCount: 6
    Properties: QUEUE_PROFILING_ENABLE

Buffer Memory:
    Type: Buffer
    Flags: MEM_READ_WRITE
    Size: 4194304
    HostPtr: no mem info available
    MapCount: 0
    ReferenceCount: 2
    Context: Context(0x7f8e73e04720)
    AssociatedMemobject: None
    Offset: 0

Image: 
    ElementSize: 4
    RowPitch: 4194304
    SlicePitch: 0
    Width: 1024
    Height: 0
    Depth: 0
    ArraySize: 0
    Buffer: None
    NumMipLevels: 0
    NumSamples: 0

    Image Memory:
        Type: Buffer
        Flags: MEM_READ_WRITE
        Size: 4194304
        HostPtr: no mem info available
        MapCount: 0
        ReferenceCount: 2
        Context: Context(0x7f8e73e04720)
        AssociatedMemobject: None
        Offset: 0

Sampler:
    ReferenceCount: 1
    Context: Context(0x7f8e73e04720)
    NormalizedCoords: false
    AddressingMode: None
    FilterMode: Nearest

Program:
    ReferenceCount: 2
    Context: Context(0x7f8e73e04720)
    NumDevices: 1
    Devices: [DeviceId(0x1024500)]
    Source: 


    __kernel void multiply(float coeff, __global float* buffer) {
        buffer[get_global_id(0)] *= coeff;
    }

    BinarySizes: [1991]
    Binaries: n/a
    NumKernels: 1
    KernelNames: multiply

Program Build:
    BuildStatus: Success
    BuildOptions: 
    BuildLog: 



    BinaryType: PROGRAM_BINARY_TYPE_EXECUTABLE

Kernel Info:
    FunctionName: multiply
    NumArgs: 2
    ReferenceCount: 1
    Context: Context(0x7f8e73e04720)
    Program: Program(0x7f8e73c04c80)
    Attributes: 

Kernel Argument [0]:
    AddressQualifier: Private
    AccessQualifier: None
    TypeName: no kernel argument info available
    TypeQualifier: 
    Name: no kernel argument info available

Kernel Work Group:
    WorkGroupSize: 256
    CompileWorkGroupSize: [0, 0, 0]
    LocalMemSize: 0
    PreferredWorkGroupSizeMultiple: 32
    PrivateMemSize: 0
    GlobalWorkSize: only available for custom devices or built-in kernels

Event:
    CommandQueue: CommandQueue(0x7f8e73e04c80)
    CommandType: WriteBuffer
    ReferenceCount: 1
    CommandExecutionStatus: Complete
    Context: Context(0x7f8e73e04720)

Event Profiling:
    Queued: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    Submit: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    Start: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

    End: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetEventProfilingInfo  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html#errors  

############################################################################# 

@c0gent
Copy link
Member

c0gent commented Apr 20, 2017

Ok great.

Those errors in the info output are just features that your platform doesn't support and is working as intended. I may add an extra check and squelch/condense that error message.

As to the secondary issue, the device_check.rs example creates command queues with the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE flag which allows for out-of-order queues (which can provide performance improvements). Apparently the OpenCL implementation you are using (Apple/Intel) does not support that feature. I'll update that example (and the other new async_*.rs examples) to accommodate platforms which don't support that. Let's leave this issue open until I have time to do that (probably won't be today).

I appreciate your help testing this :)

@michael-p
Copy link
Contributor Author

That's good to hear! :) There is also one test tests::async::rw_vec which seems to be using that feature and hence fails on Mac OS.

Thank you very much for your work and your super-fast response! :)

@c0gent
Copy link
Member

c0gent commented Apr 21, 2017

f992ab3 should fix issues with queue creation.

When you have a chance, please verify that:

  • tests::async::rw_vec passes
  • all 3 async_*.rs examples work
  • the device_check.rs fails with an appropriate error message (let me know if you think the presentation of this error message is clear enough... I can't actually test it myself)

cogciprocate/ocl-core@87167ca should make output from the info_core.rs example a bit less scary looking. Let me know what you think.

No hurry on any of this. Thanks a lot for your help.

@michael-p
Copy link
Contributor Author

I tried it using Master, here are the results:

  • tests::async::rw_vec seems to hang, I cancelled it after something like 5 minutes. This not only happens on OSX but also on a Linux System I tested it on (kernel 4.4.41-yocto-standard; Intel HD Graphics with Intel drivers+runtime). On the latter platform device_check.rs also hangs with the following output:
Platform: Intel(R) OpenCL
Device: Intel(R) Corporation Intel(R) HD Graphics
  • async_cycles.rs and async_process.rs work, async_menagerie.rs fails:
Platform: Apple
Device: Intel Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: 

################################ OPENCL ERROR ############################### 

Error executing function: clCreateCommandQueue  

Status error code: CL_INVALID_VALUE (-30)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html#errors  

############################################################################# 
', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:868
stack backtrace:
   1:        0x100b0931c - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x100b0b21e - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x100b0aec0 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x100b0b6d7 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x100b0b584 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x100b0b4a2 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x100b0b407 - rust_begin_unwind
   8:        0x100b32500 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x1009a8ebe - core::result::unwrap_failed::hfe876ad4a633cc55
  10:        0x10098c49b - <core::result::Result<T, E>>::unwrap::h12877bbcfd7df23f
  11:        0x1009f5102 - async_menagerie::main::h2a6e245f96ade4ea
  12:        0x100b0c56a - __rust_maybe_catch_panic
  13:        0x100b0baa6 - std::rt::lang_start::hfc9882558f9403bf
  14:        0x1009f6799 - main
  • device_check.rs hangs after some time, output until then is:
Platform: Apple
Device: Intel Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Device does not support out of order queues.
    Out-of-order MW/Async-CB:       <failure>
    Device does not support out of order queues.
    Out-of-order MW/Async-CB+AHP:   <failure>
    Device does not support out of order queues.
    Out-of-order MW/ASync+CB/MR:    <failure>
    In-order MW/ASync+CB:           <success>
    Device does not support out of order queues.
    Out-of-order MW/ELOOP:          <failure>
    Device does not support out of order queues.
    Out-of-order MW/ELOOP+CB:       <failure>
    In-order RwVec Multi-thread:    <success>
Platform: Apple
Device: Intel Inc. Intel(R) Iris(TM) Graphics 6100
    Device does not support out of order queues.
    Out-of-order MW/Async-CB:       <failure>
    Device does not support out of order queues.
    Out-of-order MW/Async-CB+AHP:   <failure>
    Device does not support out of order queues.
    Out-of-order MW/ASync+CB/MR:    <failure>
  • info_core.rs works but reports that event profiling is unavailable (which I'm not sure this is really unsupported on OSX, at least PyOpenCL does support it but I'm not sure whether they are doing some kind of magic behind the scenes to make it work...)
  • img_formats.rs fails:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error converting to 'ImageChannelDataType'.', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:868
stack backtrace:
   1:        0x10162912c - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x10162a8de - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x10162a580 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x10162acd7 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x10162ab84 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x10162aaa2 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x10162aa07 - rust_begin_unwind
   8:        0x10164d5e0 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x1015d7a2e - core::result::unwrap_failed::hb87977514c463e1a
  10:        0x1015d5d82 - <core::result::Result<T, E>>::unwrap::h02e1b0985c36c175
  11:        0x1015dbc8f - img_formats::main::h5d4ed350b81fabe2
  12:        0x10162bb0a - __rust_maybe_catch_panic
  13:        0x10162b0a6 - std::rt::lang_start::hfc9882558f9403bf
  14:        0x1015dbdf9 - main

I apologize for causing so much trouble ;)

@michael-p
Copy link
Contributor Author

Let me add that none of the above failures are critical for me personally and I know that debugging this is next to impossible for you, so feel free to close this issue as wontfix.

The only thing that would be nice to have are the profiling events, I'll check tomorrow with a minimal C example whether they are supported on OSX and if yes, I'll open a new issue here with more details.

@c0gent
Copy link
Member

c0gent commented Apr 23, 2017

No trouble at all. I appreciate all of this useful feedback. I want things to work 100% on all platforms (or at least fail for clear reasons).

Some of the hangs and errors related to the new asynchronous features I've added in 0.13 are somewhat expected. I even get them when running on older Intel processors (Sandy Bridge) due to elements of the processor design which aren't totally compatible with OpenCL (and is why OpenCL isn't officially supported on that family). By running the async tests and examples with the --features async_block flag (i.e. cargo test --features async_block) we should at least alleviate the hanging (if it doesn't - let's fix that). async_block is a sort of compatibility mode for platforms that have trouble of one sort or another when doing things fully asynchronously.

I'm confused as to why we're still getting clCreateCommandQueue errors. I changed queue creation to fallback to creating a default, in-order queue in the event of error. i.e.:

let common_queue = Queue::new(&context, device, queue_flags).or_else(|_|
        Queue::new(&context, device, None)).unwrap();

Is Queue::new(&context, device, None) somehow also causing an error? If so I'm at a loss as to why and that needs looking into. I must have made a mistake somewhere I'm not taking into account. Perhaps you could fiddle around with those queue-creation lines within the examples (for example async_cycles.rs::440) and maybe shed some light on this.

I'll respond to the profiling issues in the new thread :)

@michael-p
Copy link
Contributor Author

I think we're finally about to get to the bottom of this :) Tests run fine when using --features async_block, the failing async-example async_menagerie.rs is just missing your queue creation fallback on line 627.

So what's left:

  • device_check.rs still hangs, even when run with cargo run --features async_block --example device_check
  • the problems with img_formats.rs. Maybe open a new issue for this one?

@c0gent
Copy link
Member

c0gent commented Apr 23, 2017

Ok great. I'm changing the async::rw_vec test and device_check.rs so that they fail instead of using the queue fallback because they are only designed to work with out-of-order queues and will deadlock otherwise. Most of the other ones should work fine either way (hopefully you can confirm that before we close this issue -- I'll make changes to async_menagerie.rs shortly).

For now, please try out the async::rw_vec test and the device_check.rs example both with and without --features async_block. They should fail with a reasonably clear failure messages in both cases.

I'll fix the other example and let's confirm that they're all working / failing properly.

Please create another new issue for img_formats.rs. I'm not at all sure what's going on there.

@c0gent
Copy link
Member

c0gent commented Apr 23, 2017

The async_cycles.rs, async_menagerie.rs, and async_process.rs examples should all now be using queue fallbacks and should definitely work with async_block and might work without it (will deadlock if not). If they any of them do deadlock I will remove the queue fallback and allow them to fail when out-of-order queues are unavailable.

Let me know :)

@michael-p
Copy link
Contributor Author

So I just tested this with current master, async::rw_vec test still deadlocks when run without --features async_block, works fine with that feature.

device_check.rs hangs (even when using --features async_block), I added a println!("Creating queue with flags {:?}", flags); to the beginning of create_queue(...) to see whether an out-of-order queue is actually used just before it deadlocks and it seems this is not the case:

Platform: Apple
Device: Intel Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/Async-CB:       <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/Async-CB+AHP:   <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/ASync+CB/MR:    <failure>
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)
    In-order MW/ASync+CB:           <success>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/ELOOP:          <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/ELOOP+CB:       <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE)
    Device does not support out of order queues.
    In-order RwVec Multi-thread:    <failure>
Platform: Apple
Device: Intel Inc. Intel(R) Iris(TM) Graphics 6100
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/Async-CB:       <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/Async-CB+AHP:   <failure>
Creating queue with flags Some(QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | QUEUE_PROFILING_ENABLE)
    Device does not support out of order queues.
    Out-of-order MW/ASync+CB/MR:    <failure>
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)
Creating queue with flags Some(QUEUE_PROFILING_ENABLE)

async_cycles.rs and async_process.rs work, even when run without the async_block feature.

async_menagerie.rs works WITHOUT async_block but deadlocks when run with that feature:

Platform: Apple
Device: Intel Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Creating and enqueuing tasks...
Task [1] (simple): Buffer initialized.
Task [2] (complex): Buffer initialized.
Buffer pool is now full.
Waiting on 18 tasks to complete...
Task [0] (simple): Buffer initialized.
Task [1] (simple): Verify successful: 42927 values correct.
Task [3] (simple): Buffer initialized.
Task [0] (simple): Verify successful: 414836 values correct.
Task [2] (complex): Verify successful: 312748 values correct.
Task [3] (simple): Verify successful: 418757 values correct.

(output varies between runs, sometimes it completes some more tasks, sometimes less).

@c0gent
Copy link
Member

c0gent commented Apr 25, 2017

Ok interesting... I'll make some more tweaks here in the next day or so.

@c0gent
Copy link
Member

c0gent commented Apr 28, 2017

d239b1a: The async::rw_vec test should correctly error.

device_check appears to deadlock on the in-order test when run on the integrated GPU but runs fine on the CPU. I have no idea what's going on here and I'm inclined to essentially consider a deadlock a test failure. I may one day add code to detect a deadlock so that it can fail and continue but it won't be anytime soon so we'll consider this one working as intended for now.

That leaves us with async_menagerie.rs... If you would, please try running the following commands and let me know what happens/output on each:

OCL_DEFAULT_DEVICE_TYPE=CPU cargo run --example async_menagerie --features async_block

and

OCL_DEFAULT_DEVICE_TYPE=GPU cargo run --example async_menagerie --features async_block

If the CPU device runs it fine but the GPU one deadlocks just like device_check, we at least know it's isolated to that device.

@michael-p
Copy link
Contributor Author

I just checked this on current master, both the CPU and GPU version of async_menagerie with async_block feature still deadlock.

Tests do not compile:

   Compiling ocl-extras v0.1.0 (file:///Users/michaelp/Downloads/ocl-master/ocl-extras)
   Compiling ocl v0.13.1 (file:///Users/michaelp/Downloads/ocl-master)
error: expected one of `!`, `.`, `::`, `;`, `?`, `{`, `}`, or an operator, found `,`
   --> src/tests/async.rs:405:21
    |
405 |                 None,
    |                     ^

error[E0308]: if and else have incompatible types
   --> src/tests/async.rs:404:31
    |
404 |               let queue_flags = if cfg!(feature = "async_block") {
    |  _______________________________^ starting here...
405 | |                 None,
406 | |             } else {
407 | |                 Some(CommandQueueProperties::new().out_of_order());
408 | |             };
    | |_____________^ ...ending here: expected enum `std::option::Option`, found ()
    |
    = note: expected type `std::option::Option<_>`
               found type `()`

After fixing that error (just a comma and semicolon too much) tests run fine with --features=async_block and without that feature the async::rw_vec fails but with an appropriate message, so all is good here!

@c0gent
Copy link
Member

c0gent commented May 4, 2017

Ok so if I'm not missing anything only one last issue to go. Let's move the discussion to #73.

@c0gent c0gent closed this as completed May 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants