
High Numerical Errors in Mish Activation with FLOAT16 Precision on Neural Engine #2359

Open
ChinChangYang opened this issue Oct 5, 2024 · 2 comments
Labels: bug (Unexpected behaviour that should be corrected)

Comments

@ChinChangYang
Contributor

🐞Describing the bug

The built-in Mish activation function in coremltools introduces significant numerical errors in Core ML models when using 16-bit floating point precision (FLOAT16) with ComputeUnit=CPU_AND_NE. Specifically, converting models that use the Mish activation results in substantial discrepancies in output predictions compared to the original model, leading to high error rates across several metrics.
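
For reference, here is a minimal sketch of the configuration involved. This is not the KataGo network: the tiny model, input shape, and input scale below are illustrative assumptions only, the size of any FLOAT16 discrepancy depends on the input range, and prediction with CPU_AND_NE requires Apple hardware with a Neural Engine.

import numpy as np
import torch
import coremltools as ct

# Tiny illustrative model containing only the built-in Mish activation.
class TinyMish(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.mish(x)

model = TinyMish().eval()
example = torch.randn(1, 256) * 20.0  # large magnitudes stress softplus/exp in FP16
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS16,
)

reference = model(example).detach().numpy()
prediction = list(mlmodel.predict({"x": example.numpy()}).values())[0]
print("mean abs error:", np.abs(prediction - reference).mean())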

Stack Trace

N/A

To Reproduce

Follow the steps below to reproduce the high numerical errors using the built-in Mish activation function:

  1. Clone the KataGo repository:

    git clone --branch v1.15.3-coreml1 https://github.com/ChinChangYang/KataGo.git KataGo-v1.15.3-coreml1
    cd KataGo-v1.15.3-coreml1/python
  2. Download a KataGo model in RAW checkpoint format:

    wget https://media.katagotraining.org/uploaded/networks/zips/kata1/kata1-b18c384nbt-s9996604416-d4316597426.zip
    unzip kata1-b18c384nbt-s9996604416-d4316597426.zip
    ln -s kata1-b18c384nbt-s9996604416-d4316597426/model.ckpt model.ckpt
  3. Install Python Modules:

    pip install torch coremltools matplotlib
  4. Evaluate the high error using the built-in Mish implementation:

    wget https://gist.githubusercontent.com/ChinChangYang/529ccdffb90b60d307550b067f2fbab8/raw/abc3050cfad77e1ec87c92f61bd4b8c1b4f6cc28/testcoremlerror_original.py
    python testcoremlerror_original.py

    Expected Output:

    Mean Absolute Errors Across Samples:
      var_2572:
        FLOAT16: 1.042287
        FLOAT32: 0.000095
      linear_9:
        FLOAT16: 3.587491
        FLOAT32: 0.000245
      linear_10:
        FLOAT16: 2.812497
        FLOAT32: 0.000182
      linear_11:
        FLOAT16: 2.498940
        FLOAT32: 0.000269
      var_2631:
        FLOAT16: 0.079012
        FLOAT32: 0.000011
    
  5. Evaluate the lower error using the alternative Mish implementation:

    wget https://gist.githubusercontent.com/ChinChangYang/b9d45f13a40ff738baa607a265a0b2c3/raw/8bf3ae8e66946451be7dbd0d6debdae9d8e82fcf/testcoremlerror_workaround.py
    python testcoremlerror_workaround.py

    Expected Output:

    Mean Absolute Errors Across Samples:
      var_2572:
        FLOAT16: 0.008898
        FLOAT32: 0.000395
      linear_9:
        FLOAT16: 0.018509
        FLOAT32: 0.000812
      linear_10:
        FLOAT16: 0.014011
        FLOAT32: 0.000628
      linear_11:
        FLOAT16: 0.016918
        FLOAT32: 0.000859
      var_2631:
        FLOAT16: 0.001414
        FLOAT32: 0.000036
    

System environment (please complete the following information):

  • coremltools version: 8.0
  • OS: macOS 15.0
  • Any other relevant version information:
    • PyTorch: 2.4.1

Additional context

The issue arises specifically when using ComputeUnit=CPU_AND_NE with Precision=FLOAT16. The built-in Mish activation function in coremltools leads to high numerical errors, as evidenced by metrics such as winrateError, leadError, and others showing discrepancies upwards of 25%. Switching to an alternative Mish implementation drastically reduces these errors to below 1%, albeit with a 32% increase in inference time due to the additional operators introduced.

This problem is isolated to 16-bit floating point precision on the Neural Engine (NE), as experiments with other compute units and precision settings (e.g., FLOAT32) do not exhibit the same high error rates. The significant reduction in error using the alternative Mish implementation suggests that the built-in Mish operator may have implementation issues when used in this specific configuration.
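
One plausible contributing factor (my assumption, not a confirmed root cause): softplus(x) = ln(1 + e^x) requires evaluating e^x, and e^x overflows half precision once x exceeds ln(65504) ≈ 11.09. That is consistent with the ~10.39 threshold in the workaround below, which caps the input that reaches softplus. A quick numeric check:

import math
import numpy as np

fp16_max = float(np.finfo(np.float16).max)  # 65504.0
print(math.log(fp16_max))                   # ~11.09; exp(x) overflows FP16 beyond this
print(np.exp(np.float16(12.0)))             # inf when computed in float16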

This issue is based on a detailed analysis of numerical errors in Core ML models using the Mish activation function with 16-bit precision, as documented in the related blog post. Further investigation and collaboration from the coremltools engineering team would be greatly appreciated.

@ChinChangYang
Contributor Author

I wrote the alternative Mish implementation as follows:

# These imports use coremltools' internal MIL frontend API; the paths below are
# correct for recent coremltools releases but may change between versions.
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs


def mish_torch_sigmoid(context, node):
    inputs = _get_inputs(context, node, expected=1)
    x = inputs[0]

    # Above this threshold, softplus(x) is effectively equal to x, and exp(x)
    # still fits within float16 range inside the softplus below.
    threshold = 10.39

    # Smooth blend between the two branches via a sigmoid gate
    sigmoid_threshold = mb.sigmoid(x=mb.sub(x=x, y=threshold))

    # Approximate softplus: exact softplus for x <= threshold, identity above it
    softplus_part = mb.softplus(x=mb.minimum(x=x, y=threshold))
    softplus = mb.add(x=mb.mul(x=x, y=sigmoid_threshold),
                      y=mb.mul(x=softplus_part, y=mb.sub(x=1.0, y=sigmoid_threshold)))

    # Mish(x) = x * tanh(softplus(x))
    tanh_softplus = mb.tanh(x=softplus)
    res = mb.mul(x=x, y=tanh_softplus, name=node.name)
    context.add(res)
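
To use this translation instead of the built-in one, it has to be registered so it overrides the stock mish lowering. A sketch, assuming coremltools' internal register_torch_op helper (the torch_alias/override arguments are internals and may differ across versions):

from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

# Register the function above under the "mish" op name, replacing the built-in
# translation. This relies on coremltools internals and is a sketch only.
register_torch_op(mish_torch_sigmoid, torch_alias=["mish"], override=True)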

@TobyRoseman
Collaborator

For security reasons, I am not able to download and run your network. Please create a minimal example that demonstrates the issue: ideally, a small amount of self-contained code that I can just copy and paste.
