Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application get crash calling LDClientSDK_Identify right after system wake up from sleep on Windows 10. #425

Closed
ngangomsamananda opened this issue Jul 22, 2024 · 37 comments
Labels
bug Something isn't working package: sdk/client Issues affecting the C++ Client SDK waiting for feedback Indicates LaunchDarkly is waiting for customer feedback before issue is closed due to staleness.

Comments

@ngangomsamananda
Copy link

ngangomsamananda commented Jul 22, 2024

Is this a support request?
This issue tracker is maintained by LaunchDarkly SDK developers and is intended for feedback on the code in this library. If you're not sure whether the problem you are having is specifically related to this library, or to the LaunchDarkly service overall, it may be more appropriate to contact the LaunchDarkly support team; they can help to investigate the problem and will consult the SDK team if necessary. You can submit a support request by going here and clicking "submit a request", or by emailing [email protected].

Note that issues filed on this issue tracker are publicly accessible. Do not provide any private account information on your issues. If your problem is specific to your account, you should submit a support request as described above.

Describe the bug
Application gets crash calling LDClientSDK_Identify right after system wake up from sleep on Windows 10. There was a similar issue on MacOS but I was able to fix that by having a check for Initialize before calling LDClientSDK_Identify but on Windows the behavior looks different.

Difference notice on log between MacOS and Windows:

MacOS:
When system go sleep the internet got disconnected then the SDK reach a timeout after 5-10 minutes after that backing off log started. I also get the status LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_SHUTDOWN. LDClientSDK_Initialized is false

Windows
When system go to sleep the internet get disconnected then backing off message started. I don't see reach to timeout log nor LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_SHUTDOWN. LDClientSDK_Initialized is true. But calling LDClientSDK_Identify gets the crash.

Note: The crash does not happen all the time sometimes it just works fine but we get quite a number of crashes.

To reproduce

#include < iostream >
#include <launchdarkly/client_side/bindings/c/sdk.h>
#include <launchdarkly/bindings/c/context_builder.h>
#include <launchdarkly/client_side/bindings/c/config/builder.h>
#include <launchdarkly/bindings/c/object_builder.h>
#include <launchdarkly/bindings/c/memory_routines.h>



LDClientConfigBuilder ConfigBuilder = nullptr;
LDClientSDK g_pLDClient = nullptr;
LDListenerConnection m_listenerConnection = nullptr;

static bool enabled(const LDLogLevel level, void* user_data)
{
    return true;
}

void globalLogger(const LDLogLevel level, const char* text, void* user_data)
{
}
void initLDCallback(LDDataSourceStatus status, void* userData)
{
    LDDataSourceStatus_State statusState = LDDataSourceStatus_GetState(status);
    if (statusState == LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_INITIALIZING)
    {
        // log message
    }
    else if (statusState == LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_VALID)
    {
    }
    else if (statusState == LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_INTERRUPTED)
    { 
    }
    else if (statusState == LDDataSourceStatus_State::LD_DATASOURCESTATUS_STATE_OFFLINE)
    {  
    }
    else
    {
    }
}
bool getPeerVerify()
{
    return false;
}

bool resetNewUserContext(const std::string& userAnalyticsId)
{
    if (g_pLDClient != nullptr && LDClientSDK_Initialized(g_pLDClient))
    {
        LDContextBuilder contextBuilder = nullptr;
        if (!userAnalyticsId.empty())
        {
            contextBuilder = LDContextBuilder_New();
            LDContextBuilder_AddKind(contextBuilder, "user", userAnalyticsId.c_str());
        }

        bool identified = false;
        unsigned int waitTime = 5000; // milliseconds
        LDContext updatedContext = LDContextBuilder_Build(contextBuilder);

        if (LDClientSDK_Identify(g_pLDClient, updatedContext, waitTime, &identified))
        {
            if (identified)
            {
                // log message
            }
            else
            {
                // log message
            }
        }
        else
        {
            // log message
        }
    }
  return true;
}


int main()
{
    const char* userid = "user-key";
    const char* mobile_key = "mob-abc-123";
	ConfigBuilder = LDClientConfigBuilder_New(mobile_key);

    bool bOffline = false;
    
    if (bOffline)
        LDClientConfigBuilder_Offline(ConfigBuilder, bOffline);

    // logging
    LDLogBackend backend;
    LDLogBackend_Init(&backend);
    backend.Write = globalLogger;
    backend.Enabled = enabled;
    LDLoggingCustomBuilder customLogging = LDLoggingCustomBuilder_New();
    LDLoggingCustomBuilder_Backend(customLogging, backend);
    LDClientConfigBuilder_Logging_Custom(ConfigBuilder, customLogging);


    LDClientConfigBuilder_Events_PrivateAttribute(ConfigBuilder, "hubId");

    LDClientHttpPropertiesTlsBuilder tlsBuilder = LDClientHttpPropertiesTlsBuilder_New();
    if (tlsBuilder)
    {
        // Disable peer verification
        LDClientHttpPropertiesTlsBuilder_SkipVerifyPeer(tlsBuilder, true);
        bool verifyPeer = getPeerVerify();
        if(verifyPeer)
        {
            LDClientHttpPropertiesTlsBuilder_SkipVerifyPeer(tlsBuilder, false);
        }
        LDClientConfigBuilder_HttpProperties_Tls(ConfigBuilder, tlsBuilder);
    }

    LDClientConfig config;
    LDStatus status = LDClientConfigBuilder_Build(ConfigBuilder, &config);
    if (!LDStatus_Ok(status)) {
      std::cout<<"ldcsdk client config builder failed";
    }

    LDContextBuilder context_builder = LDContextBuilder_New();
    LDContextBuilder_AddKind(context_builder, "user", userid);

    LDContextBuilder_Attributes_SetPrivate(context_builder, "user", "email", LDValue_NewString("[email protected]"));

    LDContext context = LDContextBuilder_Build(context_builder);
    g_pLDClient = LDClientSDK_New(config, context);
    unsigned int maxwait = 10 * 1000; /* 10 seconds */

    LDDataSourceStatusListener listener;
    LDDataSourceStatusListener_Init(&listener);
    listener.StatusChanged = initLDCallback;
    m_listenerConnection = LDClientSDK_DataSourceStatus_OnStatusChange(g_pLDClient, listener);

    bool initialized_successfully;
    if (LDClientSDK_Start(g_pLDClient, maxwait, &initialized_successfully)) 
    {
        if (initialized_successfully) 
        {
            std::cout << "LaunchDarkly: client Initialization Succeded." << std::endl;
        }
        else 
        {
            std::cout<<"LaunchDarkly: client Initialization failed\n";
        }
    } 
    else {
            std::cout<<"LaunchDarkly: The client is still initializing.\n";
    }

    const char* newUserid = "new-user-key";
    resetNewUserContext(newUserid);

}

resetNewUserContext(newUserid) is call right after its internet get connected. On MacOS the crash does not happen.

I have done what you have suggested for the MacOS.

Expected behavior
Calling LDClientSDK_Identify should not get crash similar to MacOS.

I am out of ideas why the crash happened. I am also wondering why there is behavior difference between MacOS and Windows.
Could you provide some suggestion on what I am doing wrong or what I should be doing to avoid the crash?

Logs
This is the call stack of the crash, but I don't get much useful information from it.
image

SDK version
3.5.0

Language version, developer tools
C++17

OS/platform
Windows 10

Additional context
Add any other context about the problem here.

@ngangomsamananda ngangomsamananda added bug Something isn't working package: sdk/client Issues affecting the C++ Client SDK labels Jul 22, 2024
@kinyoklion
Copy link
Member

kinyoklion commented Jul 22, 2024

Hello @ngangomsamananda,

Thank you for the report. I will see if there is anything I can ascertain from the information you have provided so far.

One thing that strikes me as interesting is that the callstack only seems to have information which is likely related to logging. I am curious what the implementation of the log back-end you are using looks like. There isn't anything in the SDK itself that calls LDLogLevel_Name, so I assume you may be calling it during logging?

(Though maybe the stack is symbolicating improperly)

Thank you,
Ryan

@ngangomsamananda
Copy link
Author

we have a global logger similar to

void globalLogger(const LDLogLevel level, const char* text, void* user_data)
{
}

which I have in the sample code. We are not calling any other API except LDClientSDK_Initialized

@kinyoklion
Copy link
Member

kinyoklion commented Jul 23, 2024

@ngangomsamananda In the global logger are you not converting the LDLogLevel, which is part of the signature, to a string?

@kinyoklion
Copy link
Member

If you are converting it, then it means that the calls in the stack could be related to real calls. If you are not, then it for sure means the stack is not providing actionable information.

@kinyoklion
Copy link
Member

To clarify you see that exact stack trace with the stub code, and that stack trace isn't from production code which does more actual work in those methods.

@ngangomsamananda
Copy link
Author

@ngangomsamananda In the global logger are you not converting the LDLogLevel, which is part of the signature, to a string?

LDLogLevel is not converting to a string. LDLogLevel in globalLogger is a constant.

void globalLogger(const LDLogLevel level, const char* text, void* user_data)
{
}

@kinyoklion
Copy link
Member

It is a constant, but a normal implementation would create a string from that constant for the purposes of putting it into a log message. So that your log message can have "info" for instance instead of a number. That is what the LDLogLevel_Name function in the callstack does.

@kinyoklion
Copy link
Member

It may be, due to the way the binary is built, that all the calls inside the SDK are being reported as LDLogLevel_Name. Which, discards the value of the stack. Are you using one of the pre-built DLLs, or are you building the SDK in any specific configuration? (For windows the variation of runtime library needs to match exactly if you are building a DLL independently of your executable.)

@ngangomsamananda
Copy link
Author

It is a constant, but a normal implementation would create a string from that constant for the purposes of putting it into a log message. So that your log message can have "info" for instance instead of a number. That is what the LDLogLevel_Name function in the callstack does.

We are using our logging level (info, warning and error). LDLogLevel in globalLogger is use only to get the log level by comparing with.
enum LDLogLevel {
LD_LOG_DEBUG = 0,
LD_LOG_INFO = 1,
LD_LOG_WARN = 2,
LD_LOG_ERROR = 3,
};

@ngangomsamananda
Copy link
Author

ngangomsamananda commented Jul 23, 2024

It may be, due to the way the binary is built, that all the calls inside the SDK are being reported as LDLogLevel_Name. Which, discards the value of the stack. Are you using one of the pre-built DLLs, or are you building the SDK in any specific configuration? (For windows the variation of runtime library needs to match exactly if you are building a DLL independently of your executable.)

Yes, I am using the pre-build DLLs available in the repository.

@kinyoklion
Copy link
Member

We provide 2 DLLs, one which is a multi-threaded debug DLL (/MDd) and one which is a multi-threaded DLL (/MD).

I want to verify which of these, and what build configuration your application and any other DLLs are using. One way that things become problematic is, for instance if you are using the non-debug DLL, which links to MSVCRT.lib, with an application linking to MSVCRTD.lib. (Or the opposite a debug DLL with a non-debug configuration).

@ngangomsamananda
Copy link
Author

We are using https://github.com/launchdarkly/cpp-sdks/releases/download/launchdarkly-cpp-client-v3.5.0/windows-msvc-x64-dynamic.zip

@ngangomsamananda
Copy link
Author

ngangomsamananda commented Jul 23, 2024

Difference notice on log between MacOS and Windows:

There is some behavior different in SDK between MacOS and Windows. I have added that in this ticket. I am wondering why that happened. On MacOS I am not getting crash anymore.

FYI, I build the binaries for MacOS myself.

@ngangomsamananda
Copy link
Author

There was a similar issue which gets on MacOS. But that got resolved by following the suggestion in #385 (comment)

@kinyoklion
Copy link
Member

I am able to produce a crash when a network is not available. Likely network availability may take some time to recover after sleep. I will look into it and let you know.

@ngangomsamananda
Copy link
Author

Thank you. It was not easy to reproduce for us too. Sometimes nothing happens and sometimes it gets the crash.

@kinyoklion
Copy link
Member

Hello @ngangomsamananda

I published a release with a fix for the crash I was able to find. I cannot be completely certain that this was your problem, but it seems likely.

https://github.com/launchdarkly/cpp-sdks/releases/tag/launchdarkly-cpp-client-v3.6.3

Please let me know if it resolves your problem.

Thank you,
Ryan

@kinyoklion kinyoklion added the waiting for feedback Indicates LaunchDarkly is waiting for customer feedback before issue is closed due to staleness. label Jul 25, 2024
@ngangomsamananda
Copy link
Author

@kinyoklion Thank you so much. We will try this fix and see if that resolves the issue.
I have a logged a feature request ticket #426 . We could not move to v3.x version due to that. It will be very helpful if you could take up that new feature ticket on priority.

@ngangomsamananda
Copy link
Author

Hi @kinyoklion, I was testing the sdk v3.6.3 which you had released for the fix. Now it gets crash when the new client-side SDK is construct. The crash happens at LDClientSDK_New. I use the exact same code which I use for v3.5.0. I am also able to reproduce in the sample code which I have provided in this ticket. Could you have a look what the issue is?

@kinyoklion
Copy link
Member

kinyoklion commented Jul 29, 2024

Hello @ngangomsamananda,

I ran your sample code, with just the mobile key changed, and it initialized the client successfully using both the debug and the release versions of the DLLs (when built in the correct configuration.).

I would again verify you are using a compatible compiler in exactly the same build configuration as the release DLLs, or to build the DLLs with your configuration.

I tested using Visual Studio Community 2022 version 17.10.1.

Thank you,
Ryan

@kinyoklion
Copy link
Member

I will do additional testing running it many times.

@kinyoklion
Copy link
Member

kinyoklion commented Jul 29, 2024

There may be some problem with the TLS configuration. I cannot produce a crash creating a single new client, but creating multiple clients encounters a problem with TLS configuration. So we will look into that.

TLS config implementation was fine. Usage was incorrect.

@kinyoklion
Copy link
Member

@ngangomsamananda

There is a problem with your example code. The configuration of TLS is happening after the configuration builder is already invalid.

It has been consumed by creating the config. You need to configure TLS before building the configuration.

Thank you,
Ryan

@kinyoklion
Copy link
Member

    LDClientConfig config;
    LDStatus status = LDClientConfigBuilder_Build(ConfigBuilder, &config);
    if (!LDStatus_Ok(status)) {
      std::cout<<"ldcsdk client config builder failed";
    }

	// !!!!!!!!!!! This code needs to happen before LDClientConfigBuilder_Build
    LDClientHttpPropertiesTlsBuilder tlsBuilder = LDClientHttpPropertiesTlsBuilder_New();
    if (tlsBuilder)
    {
        // Disable peer verification
        LDClientHttpPropertiesTlsBuilder_SkipVerifyPeer(tlsBuilder, true);
        bool verifyPeer = getPeerVerify();
        if(verifyPeer)
        {
            LDClientHttpPropertiesTlsBuilder_SkipVerifyPeer(tlsBuilder, false);
        }
        LDClientConfigBuilder_HttpProperties_Tls(ConfigBuilder, tlsBuilder);
    }

@ngangomsamananda
Copy link
Author

I have moved the TLS build as you have suggested. I still get the crash.
image

I am using Visual Studio Professional 2022 (64-bit) - Current Version 17.4.4

@ngangomsamananda
Copy link
Author

This is the call stack.
image

@kinyoklion
Copy link
Member

kinyoklion commented Jul 30, 2024

@ngangomsamananda Was that crash with the debug version of the library? I see msvcp140d.dll which indicates a debug build, so I again want to validate that the debug DLL is used in debug and the release DLL in release. If it is the release DLL, then use the debug one and we can get a correct stack.

This crash is in locking a std::mutex (if you are using the debug version, if you are using the release build it could be anywhere.) Which likely indicates it is either memory corruption (similar to creating the TLS config after the config is no longer valid), or because of something like an ABI mismatch, or potentially initialization order. I can really only act on it if I know that the stack trace can be trusted.

The first crash, that started this thread, was from an exception being thrown under a specific condition, which should have been handled by the SDK.

At this point I cannot reproduce either. I updated your example to run inside a loop (with the corrected config and without global variables) and executed it continuously for about 5 hours and encountered no issues.

@kinyoklion
Copy link
Member

I also think you should be able to get the debug symbols for msvcp140.dll from Microsoft symbol servers.

@kinyoklion
Copy link
Member

At this point I would do the following.

1.Comment out all but the minimum of configuration. If it runs with this configuration, then incrementally re-add configuration until it doesn't. Then that will either reveal what the problem was (for instance the TLS config initialization order when I did this process.) Or reveal that even the basic configuration has some problem.

  1. Build the DLL locally and see if it has the same problem.

The only difference for the SDK version to resolve the identify issue was catching the exception. So whatever else is different is between the version of the DLL you were using and the intermediate releases.

So, another possibility is to download a few of those versions and see if one specific one starts crashing. Then we can isolate changes between those.

Thank you,
Ryan

@ngangomsamananda
Copy link
Author

ngangomsamananda commented Jul 31, 2024

I have removed and keep only the minimum code still get the crash

#include <iostream>
#include <launchdarkly/client_side/bindings/c/sdk.h>
#include <launchdarkly/bindings/c/context_builder.h>
#include <launchdarkly/client_side/bindings/c/config/builder.h>

LDClientSDK g_pLDClient = nullptr;
int main()
{
    const char* userid = "user_id";
    const char* mobile_key = "mob-123-abc";
    LDClientConfigBuilder configBuilder = LDClientConfigBuilder_New(mobile_key);
    LDClientConfig config;
    LDStatus status = LDClientConfigBuilder_Build(configBuilder, &config);
    if (!LDStatus_Ok(status)) {
        std::cout << "ldcsdk client config builder failed";
    }

    LDContextBuilder context_builder = LDContextBuilder_New();
    LDContextBuilder_AddKind(context_builder, "user", userid);
    LDContext context = LDContextBuilder_Build(context_builder);
    g_pLDClient = LDClientSDK_New(config, context);
    unsigned int maxwait = 10 * 1000; /* 10 seconds */
    bool initialized_successfully;
    if (LDClientSDK_Start(g_pLDClient, maxwait, &initialized_successfully))
    {
        if (initialized_successfully)
        {
            std::cout << "LaunchDarkly: client Initialization Succeded." << std::endl;
        }
        else
        {
            std::cout << "LaunchDarkly: client Initialization failed\n";
        }
    }
    else {
        std::cout << "LaunchDarkly: The client is still initializing.\n";
    }
}

The crash started from v3.6.1. v3.6.2 also get the same crash. I had tried v3.6.0 and it works fine.

@ngangomsamananda
Copy link
Author

ngangomsamananda commented Jul 31, 2024

@ngangomsamananda Was that crash with the debug version of the library? I see msvcp140d.dll which indicates a debug build, so I again want to validate that the debug DLL is used in debug and the release DLL in release. If it is the release DLL, then use the debug one and we can get a correct stack.

This crash is in locking a std::mutex (if you are using the debug version, if you are using the release build it could be anywhere.) Which likely indicates it is either memory corruption (similar to creating the TLS config after the config is no longer valid), or because of something like an ABI mismatch, or potentially initialization order. I can really only act on it if I know that the stack trace can be trusted.

The first crash, that started this thread, was from an exception being thrown under a specific condition, which should have been handled by the SDK.

At this point I cannot reproduce either. I updated your example to run inside a loop (with the corrected config and without global variables) and executed it continuously for about 5 hours and encountered no issues.

I use the debug version of library to see the call stack. It crashes on both release and debug version.

@cwaldren-ld
Copy link
Contributor

Hi @ngangomsamananda , I can't see anything in 3.6.1 and 3.6.2 that would introduce an issue.

I have noticed a bug in the example. The SDK is not freed.

Could you add LDClientSDK_Free(g_pLDClient); as the last line before main function exits?

@ngangomsamananda
Copy link
Author

ngangomsamananda commented Aug 7, 2024

The crash occurs at LDClientSDK_New before the client SDK started. So, adding LDClientSDK_Free(g_pLDClient) at the end of main function does not make any difference.

I am wondering why the same code does not get crash in the version 3.6.0 and the versions released before and it gets crash in v3.6.1, v3.6.2 and v3.6.3.

@cwaldren-ld
Copy link
Contributor

cwaldren-ld commented Aug 7, 2024

Can you remove all of the code after LDClientSDK_New and test again?

#include <iostream>
#include <launchdarkly/client_side/bindings/c/sdk.h>
#include <launchdarkly/bindings/c/context_builder.h>
#include <launchdarkly/client_side/bindings/c/config/builder.h>

LDClientSDK g_pLDClient = nullptr;
int main()
{
    const char* userid = "user_id";
    const char* mobile_key = "mob-123-abc";
    LDClientConfigBuilder configBuilder = LDClientConfigBuilder_New(mobile_key);
    LDClientConfig config;
    LDStatus status = LDClientConfigBuilder_Build(configBuilder, &config);
    if (!LDStatus_Ok(status)) {
        std::cout << "ldcsdk client config builder failed";
    }

    LDContextBuilder context_builder = LDContextBuilder_New();
    LDContextBuilder_AddKind(context_builder, "user", userid);
    LDContext context = LDContextBuilder_Build(context_builder);
    g_pLDClient = LDClientSDK_New(config, context);
}

Please make sure you are performing a totally clean build when testing.

Additionally, if you are able, it would help to try building the SDK from source using cmake. This would eliminate any compiler mismatch between our release DLLs and your project. If it works, then we know the issue is due to some mismatch.

Lastly, you might trying clicking Repair button as in this StackOverflow.

@ngangomsamananda
Copy link
Author

As you have suggested I have tried by removing all the codes after LDClientSDK_New with a clean build. But the result is same. Still get the crash at LDClientSDK_New.

I will try to build the SDK locally from the source and see if that solve the issue.

@ngangomsamananda
Copy link
Author

Hi @cwaldren-ld I am not getting the crash at LDClientSDK_New anymore after the Visual Studio is updated to the latest version. I will test and update about the fixed of crash when the network is not available/ back from sleep.

@ngangomsamananda
Copy link
Author

@cwaldren-ld We are not able to reproduce the crash anymore with the fixed provided in v3.6.3. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working package: sdk/client Issues affecting the C++ Client SDK waiting for feedback Indicates LaunchDarkly is waiting for customer feedback before issue is closed due to staleness.
Projects
None yet
Development

No branches or pull requests

3 participants