Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvalidProgramException while trying coyote on my library. #433

Open
Kuinox opened this issue Nov 23, 2022 · 13 comments
Open

InvalidProgramException while trying coyote on my library. #433

Kuinox opened this issue Nov 23, 2022 · 13 comments
Labels
area-binary-rewriting Issues related to binary rewriting bug Something isn't working

Comments

@Kuinox
Copy link

Kuinox commented Nov 23, 2022

Hello,
I wanted to test out coyote to try to find bugs in my MQTT Library.

Sadly it looks like the IL rewrite has a problematic bug:

Unhandled exception. System.InvalidProgramException: Common Language Runtime detected an invalid program.
   at CK.MQTT.Client.Tests.Helpers.ApplicationMessageExtensions.PublishAsync(TestMQTTClient this, ApplicationMessage message)
   at CK.MQTT.Client.Tests.CoyoteTests.simple_publish_qos2_works_Async()
   at Microsoft.Coyote.Runtime.CoyoteRuntime.<>c__DisplayClass59_0.<RunTestAsync>b__0() in D:\a\1\s\Source\Core\Runtime\CoyoteRuntime.cs:line 373
   at Microsoft.Coyote.Runtime.CoyoteRuntime.<>c__DisplayClass62_0.<ScheduleOperation>b__0() in D:\a\1\s\Source\Core\Runtime\CoyoteRuntime.cs:line 467

The corrupted method do something a bit unusual:

        public static ValueTask<Task> PublishAsync( this TestMQTTClient @this, ApplicationMessage message )
            => @this.PublishAsync( message.Topic, message.QoS, message.Retain, message.Payload );

And the IL Diff:

"FullName": "CK.MQTT.Client.Tests.Helpers.ApplicationMessageExtensions",
"Methods": [
  {
    "Name": "PublishAsync",
    "FullName": "System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task> CK.MQTT.Client.Tests.Helpers.ApplicationMessageExtensions::PublishAsync(CK.MQTT.Client.TestMQTTClient,CK.MQTT.Client.ApplicationMessage)",
    "Instructions": [
      "[-] IL_001e: ret",
      "[+] IL_001e: ldstr \"CK.MQTT.SmallOutgoingApplicationMessageExtensions.PublishAsync\"",
      "[+] IL_0023: call System.Threading.Tasks.ValueTask`1<TResult>& Microsoft.Coyote.Runtime.ExceptionProvider::ThrowIfReturnedValueTaskNotControlled<System.Threading.Tasks.Task>(System.Threading.Tasks.ValueTask`1<TResult>&,System.String)",
      "[+] IL_0028: ret"
    ]
  }
]

If you need to reproduce it, I made a branch on my repo:
https://github.com/signature-opensource/CK-MQTT/tree/testing-coyote

@pdeligia
Copy link
Member

Thanks for reporting this @Kuinox! And sorry for the delay in responding, I was away last week and just saw your issue. Give me a few days to get on this & investigate, it might be some bug due to what you are showing me being an extension method and the rewriter might not be handling that properly. (At least this is my top hunch from what I see here.)

@pdeligia
Copy link
Member

pdeligia commented Dec 2, 2022

Hi @Kuinox, I tried setup a similar mini test to see if I can repro this, but was not able to:

public static class Extension
{
  public static ValueTask<Task> PublishAsync(this Program @this, int value) => @this.PublishAsync(value);
}

public class Program
{
  public ValueTask<Task> PublishAsync(int value) => ValueTask.FromResult((Task)Task.FromResult(value));

  [Test]
  public static async Task Test()
  {
    var program = new Program();
    await program.PublishAsync(0);
  }
}

I run dotnet's ilverify in the end, which should catch these kind of bugs, and all passed:

dotnet ilverify Source\Example\bin\net6.0\Example.dll -r "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.11\*.dll" -r "Source\Example\bin\net6.0\*.dll"
All Classes and Methods in D:\coyote-example\Source\Example\bin\net6.0\Example.dll Verified.

What does ilverify give you if you run it from your side on your rewritten DLL?

Are you able to send me a minimal snippet that exhibits this bug outside of your own repo? Perhaps similar to the one I wrote above, but that actually triggers the bug. Perhaps your code does something more intricate and its not just an extension method that looks like this. That will make it much easier for me to test it & debug without having to clone & build your repo.

Thanks!

@Kuinox
Copy link
Author

Kuinox commented Dec 2, 2022

What does ilverify give you if you run it from your side on your rewritten DLL?

[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.BytePerBytePipeReader::CompleteAsync([S.P.CoreLib]System.Exception)][offset 0x00000011][found value '[S.P.CoreLib]System.Threading.Tasks.ValueTask'][expected address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.BytePerBytePipeReader::CompleteAsync([S.P.CoreLib]System.Exception)][offset 0x00000016][found address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask'][expected value '[S.P.CoreLib]System.Threading.Tasks.ValueTask'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.Helpers.PipeReaderCop::CompleteAsync([S.P.CoreLib]System.Exception)][offset 0x00000011][found value '[S.P.CoreLib]System.Threading.Tasks.ValueTask'][expected address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.Helpers.PipeReaderCop::CompleteAsync([S.P.CoreLib]System.Exception)][offset 0x00000016][found address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask'][expected value '[S.P.CoreLib]System.Threading.Tasks.ValueTask'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.Helpers.ApplicationMessageExtensions::PublishAsync([CK.MQTT.Client.Tests]CK.MQTT.Client.TestMQTTClient, [CK.MQTT.Client]CK.MQTT.Client.ApplicationMessage)][offset 0x00000023][found value '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'][expected address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll : CK.MQTT.Client.Tests.Helpers.ApplicationMessageExtensions::PublishAsync([CK.MQTT.Client.Tests]CK.MQTT.Client.TestMQTTClient, [CK.MQTT.Client]CK.MQTT.Client.ApplicationMessage)][offset 0x00000028][found address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'][expected value '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'] Unexpected type on the stack.
6 Error(s) Verifying C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6\CK.MQTT.Client.Tests.dll

I will try to do a minimal reproduction.

@Kuinox
Copy link
Author

Kuinox commented Dec 2, 2022

Do you mind a minimal reproduction with one of my DLL as a dependency ?

@pdeligia
Copy link
Member

pdeligia commented Dec 2, 2022

That is fine too, as long as you can give me instructions how to build and run it :)

@Kuinox
Copy link
Author

Kuinox commented Dec 2, 2022

I made a repo here:
https://github.com/Kuinox/Coyote-Repro
There is a script Repo.ps1 that run dotnet publish call coyote to rewrite the dll, and run ilverify.

PS C:\dev\Coyote-Repro> .\Repro.ps1
MSBuild version 17.4.0+18d5aef85 for .NET
  Determining projects to restore...
  Restored C:\dev\Coyote-Repro\Coyote-Repro\Coyote-Repro.csproj (in 155 ms).
  Coyote-Repro -> C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\Coyote-Repro.dll
  Coyote-Repro -> C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\
Microsoft (R) Coyote version 1.7.1.0 for .NET 6.0.11
Copyright (C) Microsoft Corporation. All rights reserved.

. Rewriting C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\Coyote-Repro.dll.
... Rewriting the 'Coyote-Repro.dll' assembly (Coyote-Repro, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
..... Writing the modified 'Coyote-Repro.dll' assembly to C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\Coyote-Repro.dll
... Elapsed 0.5188698 sec.
[IL]: Error [StackUnexpected]: [C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\Coyote-Repro.dll : Coyote_Repro.ApplicationMessageExtensions::PublishAsync([CK.MQTT.Client]CK.MQTT.Client.MQTTClientAgent, [CK.MQTT.Client]CK.MQTT.Client.ApplicationMessage)][offset 0x00000023][found value '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'][expected address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'] Unexpected type on the stack.
[IL]: Error [StackUnexpected]: [C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\Coyote-Repro.dll : Coyote_Repro.ApplicationMessageExtensions::PublishAsync([CK.MQTT.Client]CK.MQTT.Client.MQTTClientAgent, [CK.MQTT.Client]CK.MQTT.Client.ApplicationMessage)][offset 0x00000028][found address of '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'][expected value '[S.P.CoreLib]System.Threading.Tasks.ValueTask`1<System.Threading.Tasks.Task>'] Unexpected type on the stack.
2 Error(s) Verifying C:\dev\Coyote-Repro\Coyote-Repro\bin\Debug\net6.0\publish\Coyote-Repro.dll

@pdeligia
Copy link
Member

Hi @Kuinox, your repro was very helpful! Sorry took me a bit to get on this as it is year end and I am in the middle of a lot of other work, but I managed to dive into this issue and figure out what is going on. It is some very weird rewriting corner case that is not handled properly by the tool right now, but I have a workaround.

Can you try changing:

public static ValueTask<Task> PublishAsync(this MQTTClientAgent @this, ApplicationMessage message)
  => @this.PublishAsync(message.Topic, message.QoS, message.Retain, message.Payload);

To simply use explicit brackets:

public static ValueTask<Task> PublishAsync(this MQTTClientAgent @this, ApplicationMessage message)
{
  return @this.PublishAsync(message.Topic, message.QoS, message.Retain, message.Payload);
}

This should do the trick (at least it does for me) as it interestingly generates different IL. The IL generated in the simpler => version causes as issue with the injected by Coyote Microsoft.Coyote.Runtime.ExceptionProvider::ThrowIfReturnedValueTaskNotControlled call, corrupting the IL stack, leading to that error you were getting. Can you please let me know if the workaround works for you?

I will try fix the issue ASAP in the meantime (but as a lower priority since there is a workaround, hopefully in the next couple of releases), so leaving this issue open to track it.

Also note that right now your coyote test will still not work properly even with this workaround in place. This has to do with the ThrowIfReturnedValueTaskNotControlled injected call. Coyote injects this call when the program is partially rewritten, and a method that returns a task comes from a DLL that has not been rewritten. This is explained in more detail here in the documentation, but basically Coyote can only control the concurrency in DLLs that are rewritten. It uses heuristics to deal with uncontrolled or partially-controlled concurrency, but it is really best effort and mostly geared for not having to rewrite "external" DLLs (i.e. DLLs that you do not own and assume they are working correctly). In your case, you most likely want to also rewrite the CK.* DLLs as they seem to be your core logic. You can rewrite multiple DLLs using a JSON file as explained here.

Hope this helps? Let me know if you need more help/clarifications!

@pdeligia pdeligia added bug Something isn't working area-binary-rewriting Issues related to binary rewriting labels Dec 12, 2022
@Kuinox
Copy link
Author

Kuinox commented Dec 13, 2022

Hi @Kuinox, your repro was very helpful! Sorry took me a bit to get on this as it is year end and I am in the middle of a lot of other work,

No worries, thanks for looking into this :D

This should do the trick (at least it does for me) as it interestingly generates different IL. The IL generated in the simpler => version causes as issue with the injected by Coyote

I speculate it will only work in debug mode, since release mode compile to the same IL. Not a problem for me.

Also note that right now your coyote test will still not work properly even with this workaround in place.

Hope this helps? Let me know if you need more help/clarifications!

Yep, I managed to run tests, and it found a bug !

PS C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten> coyote test .\CK.MQTT.Client.Tests.dll -i 100
Microsoft (R) Coyote version 1.7.1.0 for .NET 6.0.11
Copyright (C) Microsoft Corporation. All rights reserved.

. Testing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\CK.MQTT.Client.Tests.dll.
..... Anonymized telemetry is enabled, see https://aka.ms/coyote-telemetry.
... Setting up the 'CK.MQTT.Client.Tests.CoyoteTests.simple_publish_qos2_works_Async' test:
..... Using the portfolio[fair,seed:1074707696] exploration strategy.
... Running test iterations:
..... Iteration #1
..... Iteration #2
..... Iteration #3
..... Iteration #4
..... Iteration #5
..... Iteration #6
..... Iteration #7
..... Iteration #8
..... Iteration #9
..... Iteration #10
..... Iteration #16 found bug #1
Deadlock detected. Op(0) is paused on a dependency, but no other controlled operations are enabled.
... Emitting trace-related reports:
..... Writing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\Output\CK.MQTT.Client.Tests.dll\CoyoteOutput\CK.MQTT.Client.Tests_10.txt
..... Writing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\Output\CK.MQTT.Client.Tests.dll\CoyoteOutput\CK.MQTT.Client.Tests_10.trace
... Emitting coverage reports:
..... No coverage reports available.
... Testing statistics:
..... Found 1 bug.
... Scheduling statistics:
..... Explored 16 schedules: 16 fair and 0 unfair.
..... Found 6.25% buggy schedules.
..... Controlled 167 operations: 3 (min), 10 (avg), 11 (max).
..... Degree of concurrency: 2 (min), 2 (avg), 3 (max).
..... Degree of operation grouping: 2 (min), 3 (avg), 4 (max).
..... Number of scheduling decisions in fair terminating schedules: 16 (min), 61 (avg), 118 (max).
... Elapsed 0.1814259 sec.

Bur it looks like it fails to reproduce the bug ?

PS C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten> coyote replay .\CK.MQTT.Client.Tests.dll ./Output/CK.MQTT.Client.Tests.dll/CoyoteOutput/CK.MQTT.Client.Tests_9.trace
Microsoft (R) Coyote version 1.7.1.0 for .NET 6.0.11
Copyright (C) Microsoft Corporation. All rights reserved.

. Reproducing trace in C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\CK.MQTT.Client.Tests.dll.
..... Anonymized telemetry is enabled, see https://aka.ms/coyote-telemetry.
... Setting up the 'CK.MQTT.Client.Tests.CoyoteTests.simple_publish_qos2_works_Async' test:
..... Using the probabilistic[bound:0,seed:766327259'] exploration strategy.
... Running test.
Failed to reproduce the bug.
... Elapsed 0.0453896 sec.

@pdeligia
Copy link
Member

pdeligia commented Dec 14, 2022

I speculate it will only work in debug mode, since release mode compile to the same IL. Not a problem for me.

Oh, I was not aware that this was the case with this pattern, thanks for the heads up, very helpful once I get to fix this issue properly!

Yep, I managed to run tests, and it found a bug !

This is awesome!

Bur it looks like it fails to reproduce the bug ?

This is interesting, could you copy paste here the JSON configuration file that you are using for rewriting? Which DLLs are you rewriting? Is it only a partial set of the test dependencies?

One very likely reason this might happen is due to the partially-controlled concurrency mode (enabled by default) that I mentioned earlier. Basically, to deterministically replay a bug using coyote replay, Coyote must be able to control all sources of nondeterminism (both schedule, such as tasks running or returned from methods from unrewritten DLLs, and data, e.g. random decisions, random GUIDs/DateTime influencing branch decisions), else any uncontrolled such sources can interfere with replaying causing the bug to be missed! In such cases you get coverage from coyote test, but sadly miss out on deterministic repro.

Could this be what is happening here? Do you see any logs mentioning "uncontrolled" if you enable -v debug (or Configuration.WithVerbosityEnabled(VerbosityLevel.Debug) if you are instead using the TestingEngine programmatic API)? (These logs are not exhaustive but catch most of the main sources of uncontrolled nondeterminism that the tool detects)

@Kuinox
Copy link
Author

Kuinox commented Dec 14, 2022

Do you see any logs mentioning "uncontrolled" if you enable -v debug

Doesn't looks like:

(base) PS C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten> coyote test .\CK.MQTT.Client.Tests.dll -i 100 -v debug
Microsoft (R) Coyote version 1.7.1.0 for .NET 6.0.12
Copyright (C) Microsoft Corporation. All rights reserved.

. Testing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\CK.MQTT.Client.Tests.dll.
[coyote::debug] Resolving assembly 'CK.MQTT.Client'.
[coyote::debug] Resolving assembly 'CK.MQTT.Abstractions'.
[coyote::debug] Resolving assembly 'CK.MQTT.LowLevelClient'.
..... Anonymized telemetry is enabled, see https://aka.ms/coyote-telemetry.
[coyote::debug] Resolving assembly 'nunit.framework'.
... Setting up the 'CK.MQTT.Client.Tests.CoyoteTests.simple_publish_qos2_works_Async' test:
..... Using the portfolio[fair,seed:996220488] exploration strategy.
... Running test iterations:
..... Iteration #1
[coyote::debug] Resolving assembly 'CK.ActivityMonitor'.
[coyote::debug] Resolving assembly 'CK.Core'.
[coyote::debug] Resolving assembly 'Microsoft.IO.RecyclableMemoryStream'.
[coyote::debug] Resolving assembly 'CK.PerfectEvent'.
[coyote::debug] Resolving assembly 'CK.ActivityMonitor.SimpleSender'.
[coyote::debug] Resolving assembly 'CommunityToolkit.HighPerformance'.
..... Iteration #2
..... Iteration #3
..... Iteration #3 found bug #1
Deadlock detected. Op(0) is paused on a dependency, but no other controlled operations are enabled.
[coyote::telemetry] Tracking event: test.
[coyote::telemetry] Tracking metric: test-time=0.1535273.
[coyote::telemetry] Tracking metric: test-bugs=1.
... Emitting trace-related reports:
..... Writing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\Output\CK.MQTT.Client.Tests.dll\CoyoteOutput\CK.MQTT.Client.Tests_14.txt
..... Writing C:\dev\CK-MQTT\Tests\CK.MQTT.Client.Tests\bin\Debug\net6.0\rewritten\Output\CK.MQTT.Client.Tests.dll\CoyoteOutput\CK.MQTT.Client.Tests_14.trace
... Emitting coverage reports:
..... No coverage reports available.
... Testing statistics:
..... Found 1 bug.
... Scheduling statistics:
..... Explored 3 schedules: 3 fair and 0 unfair.
..... Found 33.33% buggy schedules.
..... Controlled 24 operations: 3 (min), 8 (avg), 11 (max).
..... Degree of concurrency: 2 (min), 2 (avg), 3 (max).
..... Degree of operation grouping: 2 (min), 3 (avg), 4 (max).
..... Number of scheduling decisions in fair terminating schedules: 16 (min), 63 (avg), 115 (max).
... Elapsed 0.1535273 sec.

This is interesting, could you copy paste here the JSON configuration file that you are using for rewriting?

It's available here: https://github.com/signature-opensource/CK-MQTT/blob/testing-coyote/Tests/CK.MQTT.Client.Tests/coyote.json

I have put all our companies library as to be rewritten, and the behavior doesn't change.

@Kuinox
Copy link
Author

Kuinox commented Dec 14, 2022

I retried by putting every single DLL in the "Assemblies" list(except the one on the exclude list), and it shows the same behavior

@pdeligia
Copy link
Member

That is strange! It must be some source of uncontrolled concurrency/data-nondeterminism that is not intercepted/logged by Coyote yet ... (if we figure what that is, I am happy to try support it, and at least intercept/log it first).

About the deadlock do you have a sense yet what might be causing it?

@Kuinox
Copy link
Author

Kuinox commented Dec 20, 2022

About the deadlock do you have a sense yet what might be causing it?

No, I need to investigate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-binary-rewriting Issues related to binary rewriting bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants