Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace bou.ke/monkey with bytedance/mockey #570

Merged
merged 6 commits into from
Sep 13, 2024

Conversation

samanhappy
Copy link
Collaborator

@samanhappy samanhappy commented Sep 5, 2024

As mentioned in #519, the bou.ke/monkey library has been archived for a long time and fails on the macos-arm64 architecture. Consequently, we need to find an alternative library.

After extensive testing, I found that gomonkey bytedance/mockey is the only library that works, although its APIs are not compatible with bou.ke/monkey. To address this, I have introduced a separate monkey module in EaseProbe that encapsulates the gomonkey bytedance/mockey library. This approach minimizes the cost of modifying test code and allows us to easily replace the library in the future without affecting our test code.

Key code changes include:

  1. Makefile Update: Changed -gcflags=-l to -gcflags=all=-l to accommodate changes for gomonkey bytedance/mockey.
  2. Test Files Update: Most test files only required changing the import path from bou.ke/monkey to github.com/megaease/easeprobe/monkey. However, in probe/ssh/ssh_test.go, where the original usage was unsupported, I modified the logic accordingly.
  3. Monkey Module: In monkey/monkey.go, I used a sync.Map to store all patches, which supports reset operations. A notable method called wait employs busy waiting to prevent test failures on macos-arm64. The exact cause of these errors is unclear to me, and this is a workaround. If anyone has a better solution, please let me know, as I am not very familiar with the ARM64 architecture.

Resolve #519

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 86.56716% with 9 lines in your changes missing coverage. Please review.

Project coverage is 89.29%. Comparing base (eef3a25) to head (615c2f3).
Report is 65 commits behind head on main.

Files with missing lines Patch % Lines
monkey/monkey.go 86.56% 7 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #570      +/-   ##
==========================================
- Coverage   98.80%   89.29%   -9.52%     
==========================================
  Files          85       93       +8     
  Lines        5867     6528     +661     
==========================================
+ Hits         5797     5829      +32     
- Misses         51      670     +619     
- Partials       19       29      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@samanhappy
Copy link
Collaborator Author

Refined: The SIGBUS error still occurs occasionally, as seen here, and I haven't found a solution yet.

Copy link
Collaborator

@proditis proditis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge change 💪 , and monkey patching is still kind of foreign to me. Testing and letting you know.

@suchen-sci
Copy link
Contributor

Refined: The SIGBUS error still occurs occasionally, as seen here, and I haven't found a solution yet.

Try to fix it, then you will be the new monkey king!

@samanhappy samanhappy changed the title replace bou.ke/monkey with gomonkey replace bou.ke/monkey with bytedance/mockey Sep 8, 2024
@samanhappy
Copy link
Collaborator Author

Bad news: It appears that the SIGBUS error is related to the Gomonkey library, as discussed in this issue.

Good news: I've discovered a new monkey patching library bytedance/mockey. After replacing gomonkey with it, the test results are looking promising so far.

Copy link
Collaborator

@proditis proditis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run the tests a few times on my system and i seem to be getting some random errors. I havent pinpointed the cause yet just letting you know in case this something you've seen also.

See attached image, these are two consecutive make test runs with no change between invocations.
image

edit: Sorry i forgot to say that I have made a change to the way the tests run in order to avoid caching, not sure if this is the cause for the error go test -count=1 -gcflags=all=-l -cover -race ${TEST_FLAGS} -v ./...

edit 2: Following up on this, i run the tests 100 times and this error came up 6 times. This is something we'll have to figure out before merging otherwise we will end up with random failures during tests.

@samanhappy
Copy link
Collaborator Author

@proditis Thanks so much for your testing efforts! The random errors are also unacceptable to me. To help us investigate further, could you please provide the following information about your test environment:

  1. What is your system version? Is it macOS ARM64?
  2. What are the exact error messages when the tests fail? Do the errors have anything in common, or are they entirely random?

@proditis
Copy link
Collaborator

proditis commented Sep 8, 2024

My system is Linux x86_64 and go version go1.23.0 linux/amd64.

I will repeat the tests to try to figure out what is causing this failure. Will keep you posted

edit: Please find attached the output from two failed tests test1.log & test2.log

As far as i can see all failed tests are related to the TestOpenLogFail tests.

@proditis
Copy link
Collaborator

proditis commented Sep 8, 2024

I took a closer look at the failed test. Do we need to monkey.patch this call?
image

I think that with a bit of rework on this test case we may be able to get rid of this error.

@samanhappy
Copy link
Collaborator Author

Sorry for the delayed response. I reviewed the two error logs and the original TestOpenLogFail code. Here are some thoughts:

Both failures occur within the lumberjack library's internal functions. We need to determine whether these errors are caused solely by lumberjack or if our patch operations are contributing to the issue.

While the patch call in the image you attached is necessary, the patch call starting at line 188 (shown below) doesn't seem to have any effect on our test:

var lum *lumberjack.Logger
monkey.PatchInstanceMethod(reflect.TypeOf(lum), "Rotate", func(_ *lumberjack.Logger) error {
    return fmt.Errorf("error")
})

I couldn't reproduce the error on my MacBook. My suggestion is to remove this patch code and run the tests again on your system to determine whether the issue originates from the library or the patch. Does that sound like a good plan?

@proditis
Copy link
Collaborator

Hi thanks for the feedback, although i still dont understand why these calls need to be patched 🤣

I commented out the following

	//var lum *lumberjack.Logger
	//	monkey.PatchInstanceMethod(reflect.TypeOf(lum), "Rotate", func(_ *lumberjack.Logger) error {
	//		return fmt.Errorf("error")
	//	})

Rerun the tests and got the following error test3.log

PS: This is how I run my tests in case it helps, i am getting a failed run every ~20-40 runs

fail=0; runs=1; while [ "$fail" -eq 0 ];do echo "[$runs] Running tests"; make test >test.log || ((fail=fail+1)); ((runs=runs+1)); done

Copy link
Collaborator

@proditis proditis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 🎉 🎉 🎉

@samanhappy
Copy link
Collaborator Author

Many thanks to @proditis for helping us resolve the random errors. @suchen-sci, you can now review this PR again.

@suchen-sci suchen-sci merged commit 1ccb321 into megaease:main Sep 13, 2024
6 checks passed
@suchen-sci
Copy link
Contributor

Sorry for the delayed response... LGTM!

@samanhappy
Copy link
Collaborator Author

Oops, the SIGBUS: bus error still exists as show here !

@proditis
Copy link
Collaborator

Oops, the SIGBUS: bus error still exists as show here !

I dont have a mac to run tests, but i suspect the same errors will come up with OpenBSD as well, so i'll prepare a system to run tests and see what fails. OpenBSD is much more strict with these types of violations, so hopefully we'll be able to catch most of these errors.

Could this failure be logger related also?

PS: Maybe we need to open an issue to keep track of these crashes and close it only after we have run enough tests to ensure we fixed all such failures?

@samanhappy
Copy link
Collaborator Author

I'm not sure if these errors are related to the logger, but all the SIGBUS: bus error occurrences seem to be specific to macOS-arm64.

I've opened an issue as per your advice. Let's track the progress there and keep each other updated on any developments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Find an alternative solution for bou.ke/monkey
4 participants