
High Numerical Errors in Mish Activation with FLOAT16 Precision on Neural Engine #2359

Open
ChinChangYang opened this issue Oct 5, 2024 · 2 comments
Labels: bug (Unexpected behaviour that should be corrected)

Comments

@ChinChangYang
Contributor

🐞Describing the bug

The built-in Mish activation function in coremltools introduces significant numerical errors in Core ML models when using 16-bit floating point precision (FLOAT16) with ComputeUnit=CPU_AND_NE. Specifically, converting models that use the Mish activation results in substantial discrepancies in output predictions compared to the original model, leading to high error rates across several metrics.
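
For reference, here is a minimal sketch of the configuration involved. This is not the KataGo network: the tiny model, input shape, and input scale below are illustrative assumptions only, the size of any FLOAT16 discrepancy depends on the input range, and prediction with CPU_AND_NE requires Apple hardware with a Neural Engine.

import numpy as np
import torch
import coremltools as ct

# Tiny illustrative model containing only the built-in Mish activation.
class TinyMish(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.mish(x)

model = TinyMish().eval()
example = torch.randn(1, 256) * 20.0  # large magnitudes stress softplus/exp in FP16
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS16,
)

reference = model(example).detach().numpy()
prediction = list(mlmodel.predict({"x": example.numpy()}).values())[0]
print("mean abs error:", np.abs(prediction - reference).mean())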

Stack Trace

N/A

To Reproduce

Follow the steps below to reproduce the high numerical errors using the built-in Mish activation function:

  1. Clone the KataGo repository:

    git clone --branch v1.15.3-coreml1 https://github.com/ChinChangYang/KataGo.git KataGo-v1.15.3-coreml1
    cd KataGo-v1.15.3-coreml1/python
  2. Download a KataGo model in RAW checkpoint format:

    wget https://media.katagotraining.org/uploaded/networks/zips/kata1/kata1-b18c384nbt-s9996604416-d4316597426.zip
    unzip kata1-b18c384nbt-s9996604416-d4316597426.zip
    ln -s kata1-b18c384nbt-s9996604416-d4316597426/model.ckpt model.ckpt
  3. Install Python Modules:

    pip install torch coremltools matplotlib
  4. Evaluate the high error using the built-in Mish implementation:

    wget https://gist.githubusercontent.com/ChinChangYang/529ccdffb90b60d307550b067f2fbab8/raw/abc3050cfad77e1ec87c92f61bd4b8c1b4f6cc28/testcoremlerror_original.py
    python testcoremlerror_original.py

    Expected Output:

    Mean Absolute Errors Across Samples:
      var_2572:
        FLOAT16: 1.042287
        FLOAT32: 0.000095
      linear_9:
        FLOAT16: 3.587491
        FLOAT32: 0.000245
      linear_10:
        FLOAT16: 2.812497
        FLOAT32: 0.000182
      linear_11:
        FLOAT16: 2.498940
        FLOAT32: 0.000269
      var_2631:
        FLOAT16: 0.079012
        FLOAT32: 0.000011
    
  5. Evaluate the lower error using the alternative Mish implementation:

    wget https://gist.githubusercontent.com/ChinChangYang/b9d45f13a40ff738baa607a265a0b2c3/raw/8bf3ae8e66946451be7dbd0d6debdae9d8e82fcf/testcoremlerror_workaround.py
    python testcoremlerror_workaround.py

    Expected Output:

    Mean Absolute Errors Across Samples:
      var_2572:
        FLOAT16: 0.008898
        FLOAT32: 0.000395
      linear_9:
        FLOAT16: 0.018509
        FLOAT32: 0.000812
      linear_10:
        FLOAT16: 0.014011
        FLOAT32: 0.000628
      linear_11:
        FLOAT16: 0.016918
        FLOAT32: 0.000859
      var_2631:
        FLOAT16: 0.001414
        FLOAT32: 0.000036
    

System environment (please complete the following information):

  • coremltools version: 8.0
  • OS: macOS 15.0
  • Any other relevant version information:
    • PyTorch: 2.4.1

Additional context

The issue arises specifically when using ComputeUnit=CPU_AND_NE with Precision=FLOAT16. The built-in Mish activation function in coremltools leads to high numerical errors, as evidenced by metrics such as winrateError, leadError, and others showing discrepancies upwards of 25%. Switching to an alternative Mish implementation drastically reduces these errors to below 1%, albeit with a 32% increase in inference time due to the additional operators introduced.

This problem is isolated to 16-bit floating point precision on the Neural Engine (NE), as experiments with other compute units and precision settings (e.g., FLOAT32) do not exhibit the same high error rates. The significant reduction in error using the alternative Mish implementation suggests that the built-in Mish operator may have implementation issues when used in this specific configuration.
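
One plausible contributing factor (my assumption, not a confirmed root cause): softplus(x) = ln(1 + e^x) requires evaluating e^x, and e^x overflows half precision once x exceeds ln(65504) ≈ 11.09. That is consistent with the ~10.39 threshold in the workaround below, which caps the input that reaches softplus. A quick numeric check:

import math
import numpy as np

fp16_max = float(np.finfo(np.float16).max)  # 65504.0
print(math.log(fp16_max))                   # ~11.09; exp(x) overflows FP16 beyond this
print(np.exp(np.float16(12.0)))             # inf when computed in float16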

This issue is based on a detailed analysis of numerical errors in Core ML models using the Mish activation function with 16-bit precision, as documented in the related blog post. Further investigation and collaboration from the coremltools engineering team would be greatly appreciated.

@ChinChangYang
Contributor Author

I wrote the alternative Mish implementation as follows:

# These imports use coremltools' internal MIL frontend API; the paths below are
# correct for recent coremltools releases but may change between versions.
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs


def mish_torch_sigmoid(context, node):
    inputs = _get_inputs(context, node, expected=1)
    x = inputs[0]

    # Above this threshold, softplus(x) is effectively equal to x, and exp(x)
    # still fits within float16 range inside the softplus below.
    threshold = 10.39

    # Smooth blend between the two branches via a sigmoid gate
    sigmoid_threshold = mb.sigmoid(x=mb.sub(x=x, y=threshold))

    # Approximate softplus: exact softplus for x <= threshold, identity above it
    softplus_part = mb.softplus(x=mb.minimum(x=x, y=threshold))
    softplus = mb.add(x=mb.mul(x=x, y=sigmoid_threshold),
                      y=mb.mul(x=softplus_part, y=mb.sub(x=1.0, y=sigmoid_threshold)))

    # Mish(x) = x * tanh(softplus(x))
    tanh_softplus = mb.tanh(x=softplus)
    res = mb.mul(x=x, y=tanh_softplus, name=node.name)
    context.add(res)
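
To use this translation instead of the built-in one, it has to be registered so it overrides the stock mish lowering. A sketch, assuming coremltools' internal register_torch_op helper (the torch_alias/override arguments are internals and may differ across versions):

from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

# Register the function above under the "mish" op name, replacing the built-in
# translation. This relies on coremltools internals and is a sketch only.
register_torch_op(mish_torch_sigmoid, torch_alias=["mish"], override=True)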

@TobyRoseman
Collaborator

For security reasons, I am not able to download and run your network. Please create a minimal example that demonstrates the issue: ideally, a small amount of self-contained code that I can just copy and paste.
