fix: add events when pre- or post-upgrade check fails #3211

james-munson · 2024-10-14T22:35:43Z

Which issue(s) this PR fixes:

longhorn/longhorn#9569

What this PR does / why we need it:

Add an event when the upgrade pre- or post-check job completes, carrying either an error or a success message.

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

New Features
- Enhanced event broadcasting and recording capabilities during the upgrade process.
- Improved logging for pre-upgrade checks and outcomes.
Bug Fixes
- Simplified event handling by streamlining the event broadcaster functionality.
Documentation
- Updated comments for better clarity on upgrade path checks.

james-munson · 2024-10-15T19:20:49Z

app/post_upgrade.go

+		// hang around so logs can be collected.
+		// TODO - make this a --ttl argument.
+		time.Sleep(1 * time.Hour)
+	}


Note that with the event captured, this is not as necessary, but it might be useful. I'm not sure what to use for the time to wait. The pre-upgrade job itself has spec.activeDeadlineSeconds: 900 so the pod will be killed after 15 minutes anyway, and perhaps that is a reasonable value to use.

If the event is emitted, does it need to sleep for minutes?

Not really. The event will last for an hour, so if a support bundle is collected in that time, the event should be there.
It would be the way to accomplish the goal in longhorn/longhorn#9448.
I can certainly take it out if preferred.

I removed it.

I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.

> I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.

Can you elaborate more on the statement?

> I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.

Looks like the AI suggestion can solve this one. WDYT @james-munson ?

https://github.com/longhorn/longhorn-manager/pull/3211/files#r1803865276

The events are queued by Event/Eventf, but not necessarily propagated to the sink before the Event() call returns. They can be lost when the next thing that happens is an os.Exit as part of the log.Fatal.
However, the AI suggestion is a good one. I tested it, and it looks like eventBroadcaster.Shutdown() forces a flush of the queued events, so they show up even without a sleep to delay the exit. I have pushed up the change.
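To make the failure mode above concrete, here is a minimal standard-library sketch of a recorder whose `Event` call only enqueues a message, with a `Shutdown` that drains the queue before returning. `asyncRecorder` and its methods are hypothetical stand-ins written for illustration; they are not the client-go broadcaster, and the sketch makes no claim about how that library flushes internally.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncRecorder is a hypothetical stand-in for an event broadcaster:
// Event only enqueues, and a background goroutine delivers to the sink.
type asyncRecorder struct {
	queue chan string
	done  chan struct{}

	mu        sync.Mutex
	delivered []string
}

func newAsyncRecorder() *asyncRecorder {
	r := &asyncRecorder{
		queue: make(chan string, 16),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(r.done)
		for msg := range r.queue {
			r.mu.Lock()
			r.delivered = append(r.delivered, msg)
			r.mu.Unlock()
		}
	}()
	return r
}

// Event returns as soon as the message is queued, not when it is delivered.
func (r *asyncRecorder) Event(msg string) { r.queue <- msg }

// Shutdown closes the queue and blocks until every queued message is delivered.
func (r *asyncRecorder) Shutdown() {
	close(r.queue)
	<-r.done
}

// Delivered returns a copy of the messages that reached the sink.
func (r *asyncRecorder) Delivered() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.delivered...)
}

func main() {
	r := newAsyncRecorder()
	r.Event("pre-upgrade check failed")
	// Exiting the process at this point could lose the queued event.
	r.Shutdown() // drain the queue first
	fmt.Println(r.Delivered())
}
```

In this toy model, calling `Shutdown` before any fatal exit is what guarantees delivery; a fixed sleep only makes delivery likely.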

constant/events.go

coderabbitai · 2024-10-16T19:49:26Z

📝 Walkthrough

Walkthrough

The changes in this pull request focus on enhancing the upgrade process in the Longhorn application. Key modifications include the introduction of new constants for event handling, updates to the postUpgrade and preUpgrade functions to incorporate event broadcasting and recording, and the restructuring of related components to improve organization and clarity. Additionally, a new utility function for creating event broadcasters is added, while some functions are removed or modified for simplicity.

Changes

| File Path | Change Summary |
|---|---|
| app/post_upgrade.go | Introduced constant `PostUpgradeEventer`, modified `postUpgrade` for event handling, updated `newPostUpgrader` to accept `eventRecorder`, and modified `Run` method for event recording. |
| app/pre_upgrade.go | Defined constant `PreUpgradeEventer`, improved logging in `PreUpgradeCmd`, expanded `preUpgrade` for event handling, added `newPreUpgrader` method, and created `preUpgrader` struct with `Run` method. |
| app/recurring_job.go | Removed `createEventBroadcaster` function, modified `eventCreate` method for direct event handling. |
| app/util.go | Added `createEventBroadcaster` function to initialize event broadcasters. |
| constant/events.go | Added new constants for upgrade events and updated `EventReasonUpgrade` signature. |
| upgrade/upgrade.go | Updated comments in `doResourceUpgrade` function for clarity on upgrade path checks. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PreUpgrader
    participant PostUpgrader
    participant EventRecorder

    User->>PreUpgrader: Initiate Pre-Upgrade
    PreUpgrader->>EventRecorder: Create Event
    PreUpgrader->>PreUpgrader: Run Pre-Upgrade Checks
    PreUpgrader->>EventRecorder: Record Outcome
    PreUpgrader-->>User: Pre-Upgrade Complete

    User->>PostUpgrader: Initiate Post-Upgrade
    PostUpgrader->>EventRecorder: Create Event
    PostUpgrader->>PostUpgrader: Run Post-Upgrade Checks
    PostUpgrader->>EventRecorder: Record Outcome
    PostUpgrader-->>User: Post-Upgrade Complete


coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)

app/util.go (1)

22-23: Address the TODO comment regarding client usage.

There's a TODO comment indicating that the wrapper should be removed when all clients have moved to use the clientset. This suggests that there might be ongoing refactoring or migration work.

Consider creating a tracking issue for this TODO item to ensure it's not forgotten. Additionally, it would be helpful to provide more context about the timeline or conditions for removing this wrapper.

upgrade/upgrade.go (1)

Line range hint 290-315: LGTM: Resource status upgrades and final cleanup steps look good

The implementation correctly handles resource status upgrades for various version paths, consistent with the previous upgrade steps. The final calls to update resource statuses, delete removed settings, and update the Longhorn version setting are crucial for maintaining system consistency after the upgrade process.

One minor suggestion for improved readability:

Consider extracting the repeated semver comparison logic into a helper function to reduce code duplication. For example:
func shouldUpgrade(currentVersion, targetVersion string) bool {
    return semver.Compare(currentVersion, targetVersion) < 0
}

// Usage
if shouldUpgrade(lhVersionBeforeUpgrade, "v1.5.0") {
    // Upgrade logic here
}
This would make the code more concise and easier to maintain.

app/post_upgrade.go (2)

29-29: Consider renaming PostUpgradeEventer for clarity

The constant PostUpgradeEventer represents the event source component name. To enhance readability and alignment with naming conventions, consider renaming it to PostUpgradeEventComponent or PostUpgradeEventSource.

113-113: Consider adding context with timeout to waitManagerUpgradeComplete

The waitManagerUpgradeComplete method uses a fixed retry count and interval, potentially causing long waits. Incorporate a context with a timeout or deadline to allow for cancellation and better control over the waiting period.

Example modification:
func (u *postUpgrader) waitManagerUpgradeComplete(ctx context.Context) error {
	// Use ctx in API calls and add select statements to handle cancellation.
}
Ensure that when calling this function, you pass an appropriate context, possibly with a timeout.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8387bcf and 31ffc3c.

⛔ Files ignored due to path filters (1)

coverage.out is excluded by !**/*.out

📒 Files selected for processing (6)

app/post_upgrade.go (3 hunks)
app/pre_upgrade.go (3 hunks)
app/recurring_job.go (0 hunks)
app/util.go (1 hunks)
constant/events.go (1 hunks)
upgrade/upgrade.go (4 hunks)

💤 Files with no reviewable changes (1)

app/recurring_job.go

🧰 Additional context used

🔇 Additional comments (15)

app/util.go (3)

3-12: LGTM: Imports are well-organized and relevant.

The imports are correctly structured, following good practices such as grouping and ordering. All imported packages are relevant to the function being implemented.

14-26: LGTM: Well-implemented event broadcaster creation.

The createEventBroadcaster function is well-structured and correctly implements the creation of a Kubernetes event broadcaster. It properly handles error cases, sets up logging, and configures event recording to a Kubernetes sink.

1-26: Summary: New utility function aligns with PR objectives.

The introduction of the createEventBroadcaster function in app/util.go aligns well with the PR objectives of enhancing the upgrade process by adding events. This utility function provides a centralized way to create event broadcasters, which can be used to log pre- and post-upgrade check results.

The implementation is solid, following good practices in error handling, resource initialization, and code structure. It sets a good foundation for improving the visibility and tracking of the upgrade process as intended by this PR.

constant/events.go (1)

65-68: LGTM! Consistent and clear event reason constants added.

The new constants for upgrade events (EventReasonFailedUpgradePreCheck, EventReasonFailedUpgradePostCheck, and EventReasonPassedUpgradeCheck) are well-named and consistent with the existing naming conventions. They effectively address the PR's objective of adding events for pre- and post-upgrade checks.

The reformatting of EventReasonUpgrade improves overall consistency. The naming convention now clearly distinguishes between failure and success scenarios, addressing the concerns raised in previous discussions.

upgrade/upgrade.go (5)

Line range hint 245-259: LGTM: Upgrade path for v1.4.x to v1.5.2 looks good

The implementation correctly handles the upgrade paths from v1.4.x to v1.5.0 and v1.5.1 to v1.5.2. The use of semver comparison ensures that upgrades are only performed when necessary.

259-266: LGTM: Upgrade path for v1.5.x to v1.6.0 is correctly implemented

The code segment properly handles the upgrade path from v1.5.x to v1.6.0, using semver comparison to determine if the upgrade is necessary. This implementation is consistent with the overall upgrade strategy.

Line range hint 266-279: LGTM: Upgrade paths for v1.6.x to v1.7.1 are properly implemented

The code correctly handles the upgrade paths from v1.6.x to v1.7.0 and v1.7.0 to v1.7.1. The use of semver comparisons ensures that upgrades are performed only when necessary, maintaining consistency with the overall upgrade strategy.

Line range hint 279-290: LGTM: Upgrade path for v1.7.x to v1.8.0 and final resource update look good

The code correctly implements the upgrade path from v1.7.x to v1.8.0, consistent with the previous upgrade steps. The final call to upgradeutil.UpdateResources ensures that all resources are updated after the version-specific upgrades, which is a good practice for maintaining system consistency.

Line range hint 1-315: Overall, the upgrade implementation looks solid and well-structured

The changes in this file successfully enhance the upgrade process for Longhorn, addressing various version paths and ensuring proper resource and status updates. The implementation is consistent, follows a clear pattern, and aligns well with the PR objectives.

Key points:

Proper use of semver comparisons for version checks

Consistent handling of upgrade paths for different versions

Appropriate updating of resources and their statuses

Final cleanup steps to maintain system consistency

The code quality is good, with only a minor suggestion for improving readability by extracting the repeated semver comparison logic into a helper function.

app/post_upgrade.go (6)

44-47: Ensure FlagNamespace environment variable is properly set

The FlagNamespace flag is now required and uses types.EnvPodNamespace as its environment variable. Verify that this environment variable is correctly set in all deployment environments to prevent potential issues with namespace resolution.

79-79: Confirm the event recorder is correctly initialized

The event recorder is initialized with the new scheme and event source. Ensure that the scheme includes all necessary types and that PostUpgradeEventer accurately represents the component emitting events.

86-88: Check error handling for newPostUpgrader().Run()

The error returned by newPostUpgrader(namespace, kubeClient, eventRecorder).Run() is assigned to err and returned. Ensure that any errors are appropriately logged or handled upstream to provide clear diagnostics in case of failures.

92-94: Addition of eventRecorder enhances event handling

Adding the eventRecorder to the postUpgrader struct allows the upgrade process to emit events, improving observability.

97-98: Update constructor to include eventRecorder

The newPostUpgrader function now accepts eventRecorder as a parameter, aligning with the updated struct definition. This change ensures that the recorder is properly passed and available within postUpgrader.

69-73: Ensure createEventBroadcaster function is defined and error handling is comprehensive

Verify that the createEventBroadcaster function exists and properly initializes the event broadcaster. Ensure comprehensive error handling within this function to prevent nil returns without errors.

If the function is not defined, you might need to implement it or import the correct package.

app/pre_upgrade.go

app/post_upgrade.go

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 31ffc3c and eaa24a8.

📒 Files selected for processing (6)

app/post_upgrade.go (3 hunks)
app/pre_upgrade.go (3 hunks)
app/recurring_job.go (0 hunks)
app/util.go (1 hunks)
constant/events.go (1 hunks)
upgrade/upgrade.go (4 hunks)

💤 Files with no reviewable changes (1)

app/recurring_job.go

🚧 Files skipped from review as they are similar to previous changes (4)

app/post_upgrade.go
app/util.go
constant/events.go
upgrade/upgrade.go

🧰 Additional context used

🔇 Additional comments (1)

app/pre_upgrade.go (1)

103-108: Ensure the ObjectReference in events refers to a valid Kubernetes object

The ObjectReference used in eventRecorder.Event should refer to an existing Kubernetes object. Using Name: PreUpgradeEventer may not correspond to a valid object, which could affect event visibility and association. Consider referencing a relevant object, such as a Pod, Deployment, or a Longhorn custom resource.

To confirm the validity of the ObjectReference, run the following script:

If no such object exists, update the ObjectReference to point to an existing resource.

app/pre_upgrade.go

Signed-off-by: James Munson <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

app/pre_upgrade.go (2)
60-84: LGTM: Enhanced pre-upgrade process with event recording.

The changes to the preUpgrade function significantly improve the pre-upgrade process by adding event recording capabilities. The event broadcaster, scheme, and recorder are correctly set up and used.

However, consider improving the error handling in the newPreUpgrader().Run() call:
 err = newPreUpgrader(namespace, lhClient, eventRecorder).Run()
 if err != nil {
-    logrus.Warnf("Done with Run() ... err is %v", err)
+    logrus.Errorf("Pre-upgrade encountered an error: %v", err)
 }
This change provides more clarity about the nature of the log message.

96-116: LGTM: Well-implemented Run method with proper event recording.

The Run method effectively encapsulates the pre-upgrade process logic, including proper event recording for both success and failure scenarios. The removal of the sleep after checks is a good improvement.

Consider wrapping the error from upgradeutil.CheckUpgradePath for better context:
 if err = upgradeutil.CheckUpgradePath(u.namespace, u.lhClient, u.eventRecorder, true); err != nil {
-    return err
+    return errors.Wrap(err, "failed to check upgrade path")
 }
This change provides more context when the error is logged or returned.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between eaa24a8 and e8cf7fb.

📒 Files selected for processing (6)

app/post_upgrade.go (3 hunks)
app/pre_upgrade.go (3 hunks)
app/recurring_job.go (0 hunks)
app/util.go (1 hunks)
constant/events.go (1 hunks)
upgrade/upgrade.go (4 hunks)

💤 Files with no reviewable changes (1)

app/recurring_job.go

🚧 Files skipped from review as they are similar to previous changes (4)

app/post_upgrade.go
app/util.go
constant/events.go
upgrade/upgrade.go

🧰 Additional context used

🔇 Additional comments (5)

app/pre_upgrade.go (5)
8-8: LGTM: New imports are appropriate for the added functionality.

The new imports are necessary and correctly added to support the enhanced pre-upgrade process, including event recording and Longhorn-specific operations.

Also applies to: 10-10, 12-14, 17-17

22-24: LGTM: New constant for event recording.

The PreUpgradeEventer constant is well-named and appropriately used for identifying the component in event recording.

42-43: Consider moving the completion log after error handling.

The deferred log statement defer logrus.Info("Completed pre-upgrade.") may not execute if the program exits before returning from the function, such as when logrus.WithError(err).Fatalf is called. This is because deferred functions are not run when os.Exit() is called within Fatalf.

To verify this behavior, we can search for similar patterns in the codebase:
#!/bin/bash
# Search for deferred logs followed by Fatalf calls
rg -U --type go 'defer\s+logrus\..*\n.*logrus\..*\.Fatalf'
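
The concern about deferred calls being skipped by `os.Exit` is often handled by keeping the work in a function that returns an error and exiting only from `main`. The sketch below shows that shape; `run` and the `trace` slice are illustrative names and do not come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the real work and returns an error instead of exiting, so its
// deferred calls always execute. The names here are illustrative only.
func run(fail bool, trace *[]string) error {
	defer func() { *trace = append(*trace, "completed") }()
	if fail {
		return errors.New("check failed")
	}
	return nil
}

func main() {
	var trace []string
	if err := run(false, &trace); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // exit only after run's deferred calls have finished
	}
	fmt.Println(trace)
}
```

Because `run` has already returned by the time `os.Exit` is reached, its deferred completion step runs on both the success and the failure path.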
86-90: LGTM: Well-structured preUpgrader struct.

The preUpgrader struct is well-designed, containing all necessary fields for the pre-upgrade process. It encapsulates the required dependencies, promoting better organization and modularity of the code.

92-94: LGTM: Proper constructor for preUpgrader.

The newPreUpgrader function serves as a clean and concise constructor for the preUpgrader struct, correctly initializing all required fields.

PhanLe1010

LGTM

derekbit

LGTM

derekbit · 2024-10-18T00:26:58Z

@mergify backport v1.6.x v1.7.x

mergify · 2024-10-18T00:27:06Z

backport v1.6.x v1.7.x

✅ Backports have been created

#3218 fix: add events when pre- or post-upgrade check fails (backport #3211) has been created for branch v1.6.x but encountered conflicts
#3219 fix: add events when pre- or post-upgrade check fails (backport #3211) has been created for branch v1.7.x but encountered conflicts

james-munson requested review from PhanLe1010 and a team October 14, 2024 22:35

james-munson commented Oct 15, 2024

james-munson force-pushed the 9569-pre-upgrade-failure-event branch from 6a1ec9c to cf295c1 Compare October 15, 2024 23:27

derekbit reviewed Oct 16, 2024

derekbit assigned james-munson Oct 16, 2024

derekbit requested review from shuo-wu and c3y1huang October 16, 2024 08:34

james-munson force-pushed the 9569-pre-upgrade-failure-event branch from cf295c1 to 31ffc3c Compare October 16, 2024 19:49

coderabbitai bot reviewed Oct 16, 2024

james-munson force-pushed the 9569-pre-upgrade-failure-event branch from 31ffc3c to eaa24a8 Compare October 16, 2024 22:16

coderabbitai bot reviewed Oct 16, 2024

fix: add events when pre- or post-upgrade check fails

e8cf7fb

Signed-off-by: James Munson <[email protected]>

james-munson force-pushed the 9569-pre-upgrade-failure-event branch from eaa24a8 to e8cf7fb Compare October 17, 2024 16:29

coderabbitai bot reviewed Oct 17, 2024

PhanLe1010 approved these changes Oct 17, 2024

derekbit approved these changes Oct 18, 2024

derekbit merged commit e6ce3f2 into longhorn:master Oct 18, 2024
9 checks passed

This was referenced Oct 18, 2024

fix: add events when pre- or post-upgrade check fails (backport #3211) #3218

Merged

fix: add events when pre- or post-upgrade check fails (backport #3211) #3219

Merged

yangchiu mentioned this pull request Oct 21, 2024

[BUG] Pre-upgrade pod should event the reason for any failures. longhorn/longhorn#9569

Closed

coderabbitai bot mentioned this pull request Oct 24, 2024

fix(upgrade): add sleep to allow event to flush before panic #3234

Merged
