-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add events when pre- or post-upgrade check fails #3211
fix: add events when pre- or post-upgrade check fails #3211
Conversation
// hang around so logs cam be collected. | ||
// TODO - make this a --ttl argument. | ||
time.Sleep(1 * time.Hour) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that with the event captured, this is not as necessary, but it might be useful. I'm not sure what to use for the time to wait. The pre-upgrade job itself has spec.activeDeadlineSeconds: 900
so the pod will be killed after 15 minutes anyway, and perhaps that is a reasonable value to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the event is emitted, does it need to sleep for minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. The event will last for an hour, so if a support bundle is collected in that time, the event should be there.
It would be the way to accomplish the goal in longhorn/longhorn#9448.
I can certainly take it out if preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.
Can you elaborate more on the statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it back. Without it, sometimes the panic from the "fatal" error means the event does not get created.
Looks like the AI suggestion can solve this one. WDYT @james-munson ?
https://github.com/longhorn/longhorn-manager/pull/3211/files#r1803865276
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The events are queued by Event/Eventf, but not necessarily propagated to the sink before the Event() call returns. They can be lost when the next thing that happens is an os.Exit as part of the log.Fatal.
However, the AI suggestion is a good one. I tested it, and it looks like eventBroadcaster.Shutdown() forces a flush of the queued events, so they show up even without a sleep to delay the exit. I have pushed up the change.
6a1ec9c
to
cf295c1
Compare
cf295c1
to
31ffc3c
Compare
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request focus on enhancing the upgrade process in the Longhorn application. Key modifications include the introduction of new constants for event handling, updates to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PreUpgrader
participant PostUpgrader
participant EventRecorder
User->>PreUpgrader: Initiate Pre-Upgrade
PreUpgrader->>EventRecorder: Create Event
PreUpgrader->>PreUpgrader: Run Pre-Upgrade Checks
PreUpgrader->>EventRecorder: Record Outcome
PreUpgrader-->>User: Pre-Upgrade Complete
User->>PostUpgrader: Initiate Post-Upgrade
PostUpgrader->>EventRecorder: Create Event
PostUpgrader->>PostUpgrader: Run Post-Upgrade Checks
PostUpgrader->>EventRecorder: Record Outcome
PostUpgrader-->>User: Post-Upgrade Complete
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (4)
app/util.go (1)
22-23
: Address the TODO comment regarding client usage.There's a TODO comment indicating that the wrapper should be removed when all clients have moved to use the clientset. This suggests that there might be ongoing refactoring or migration work.
Consider creating a tracking issue for this TODO item to ensure it's not forgotten. Additionally, it would be helpful to provide more context about the timeline or conditions for removing this wrapper.
upgrade/upgrade.go (1)
Line range hint
290-315
: LGTM: Resource status upgrades and final cleanup steps look goodThe implementation correctly handles resource status upgrades for various version paths, consistent with the previous upgrade steps. The final calls to update resource statuses, delete removed settings, and update the Longhorn version setting are crucial for maintaining system consistency after the upgrade process.
One minor suggestion for improved readability:
Consider extracting the repeated semver comparison logic into a helper function to reduce code duplication. For example:
func shouldUpgrade(currentVersion, targetVersion string) bool { return semver.Compare(currentVersion, targetVersion) < 0 } // Usage if shouldUpgrade(lhVersionBeforeUpgrade, "v1.5.0") { // Upgrade logic here }This would make the code more concise and easier to maintain.
app/post_upgrade.go (2)
29-29
: Consider renamingPostUpgradeEventer
for clarityThe constant
PostUpgradeEventer
represents the event source component name. To enhance readability and alignment with naming conventions, consider renaming it toPostUpgradeEventComponent
orPostUpgradeEventSource
.
113-113
: Consider adding context with timeout towaitManagerUpgradeComplete
The
waitManagerUpgradeComplete
method uses a fixed retry count and interval, potentially causing long waits. Incorporate a context with a timeout or deadline to allow for cancellation and better control over the waiting period.Example modification:
func (u *postUpgrader) waitManagerUpgradeComplete(ctx context.Context) error { // Use ctx in API calls and add select statements to handle cancellation. }Ensure that when calling this function, you pass an appropriate context, possibly with a timeout.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (1)
coverage.out
is excluded by!**/*.out
📒 Files selected for processing (6)
- app/post_upgrade.go (3 hunks)
- app/pre_upgrade.go (3 hunks)
- app/recurring_job.go (0 hunks)
- app/util.go (1 hunks)
- constant/events.go (1 hunks)
- upgrade/upgrade.go (4 hunks)
💤 Files with no reviewable changes (1)
- app/recurring_job.go
🧰 Additional context used
🔇 Additional comments (15)
app/util.go (3)
3-12
: LGTM: Imports are well-organized and relevant.The imports are correctly structured, following good practices such as grouping and ordering. All imported packages are relevant to the function being implemented.
14-26
: LGTM: Well-implemented event broadcaster creation.The
createEventBroadcaster
function is well-structured and correctly implements the creation of a Kubernetes event broadcaster. It properly handles error cases, sets up logging, and configures event recording to a Kubernetes sink.
1-26
: Summary: New utility function aligns with PR objectives.The introduction of the
createEventBroadcaster
function inapp/util.go
aligns well with the PR objectives of enhancing the upgrade process by adding events. This utility function provides a centralized way to create event broadcasters, which can be used to log pre- and post-upgrade check results.The implementation is solid, following good practices in error handling, resource initialization, and code structure. It sets a good foundation for improving the visibility and tracking of the upgrade process as intended by this PR.
constant/events.go (1)
65-68
: LGTM! Consistent and clear event reason constants added.The new constants for upgrade events (EventReasonFailedUpgradePreCheck, EventReasonFailedUpgradePostCheck, and EventReasonPassedUpgradeCheck) are well-named and consistent with the existing naming conventions. They effectively address the PR's objective of adding events for pre- and post-upgrade checks.
The reformatting of EventReasonUpgrade improves overall consistency. The naming convention now clearly distinguishes between failure and success scenarios, addressing the concerns raised in previous discussions.
upgrade/upgrade.go (5)
Line range hint
245-259
: LGTM: Upgrade path for v1.4.x to v1.5.2 looks goodThe implementation correctly handles the upgrade paths from v1.4.x to v1.5.0 and v1.5.1 to v1.5.2. The use of semver comparison ensures that upgrades are only performed when necessary.
259-266
: LGTM: Upgrade path for v1.5.x to v1.6.0 is correctly implementedThe code segment properly handles the upgrade path from v1.5.x to v1.6.0, using semver comparison to determine if the upgrade is necessary. This implementation is consistent with the overall upgrade strategy.
Line range hint
266-279
: LGTM: Upgrade paths for v1.6.x to v1.7.1 are properly implementedThe code correctly handles the upgrade paths from v1.6.x to v1.7.0 and v1.7.0 to v1.7.1. The use of semver comparisons ensures that upgrades are performed only when necessary, maintaining consistency with the overall upgrade strategy.
Line range hint
279-290
: LGTM: Upgrade path for v1.7.x to v1.8.0 and final resource update look goodThe code correctly implements the upgrade path from v1.7.x to v1.8.0, consistent with the previous upgrade steps. The final call to
upgradeutil.UpdateResources
ensures that all resources are updated after the version-specific upgrades, which is a good practice for maintaining system consistency.
Line range hint
1-315
: Overall, the upgrade implementation looks solid and well-structuredThe changes in this file successfully enhance the upgrade process for Longhorn, addressing various version paths and ensuring proper resource and status updates. The implementation is consistent, follows a clear pattern, and aligns well with the PR objectives.
Key points:
- Proper use of semver comparisons for version checks
- Consistent handling of upgrade paths for different versions
- Appropriate updating of resources and their statuses
- Final cleanup steps to maintain system consistency
The code quality is good, with only a minor suggestion for improving readability by extracting the repeated semver comparison logic into a helper function.
app/post_upgrade.go (6)
44-47
: EnsureFlagNamespace
environment variable is properly setThe
FlagNamespace
flag is now required and usestypes.EnvPodNamespace
as its environment variable. Verify that this environment variable is correctly set in all deployment environments to prevent potential issues with namespace resolution.
79-79
: Confirm the event recorder is correctly initializedThe event recorder is initialized with the new scheme and event source. Ensure that the
scheme
includes all necessary types and thatPostUpgradeEventer
accurately represents the component emitting events.
86-88
: Check error handling fornewPostUpgrader().Run()
The error returned by
newPostUpgrader(namespace, kubeClient, eventRecorder).Run()
is assigned toerr
and returned. Ensure that any errors are appropriately logged or handled upstream to provide clear diagnostics in case of failures.
92-94
: Addition ofeventRecorder
enhances event handlingAdding the
eventRecorder
to thepostUpgrader
struct allows the upgrade process to emit events, improving observability.
97-98
: Update constructor to includeeventRecorder
The
newPostUpgrader
function now acceptseventRecorder
as a parameter, aligning with the updated struct definition. This change ensures that the recorder is properly passed and available withinpostUpgrader
.
69-73
: EnsurecreateEventBroadcaster
function is defined and error handling is comprehensiveVerify that the
createEventBroadcaster
function exists and properly initializes the event broadcaster. Ensure comprehensive error handling within this function to prevent nil returns without errors.If the function is not defined, you might need to implement it or import the correct package.
31ffc3c
to
eaa24a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- app/post_upgrade.go (3 hunks)
- app/pre_upgrade.go (3 hunks)
- app/recurring_job.go (0 hunks)
- app/util.go (1 hunks)
- constant/events.go (1 hunks)
- upgrade/upgrade.go (4 hunks)
💤 Files with no reviewable changes (1)
- app/recurring_job.go
🚧 Files skipped from review as they are similar to previous changes (4)
- app/post_upgrade.go
- app/util.go
- constant/events.go
- upgrade/upgrade.go
🧰 Additional context used
🔇 Additional comments (1)
app/pre_upgrade.go (1)
103-108
: Ensure the ObjectReference in events refers to a valid Kubernetes objectThe
ObjectReference
used ineventRecorder.Event
should refer to an existing Kubernetes object. UsingName: PreUpgradeEventer
may not correspond to a valid object, which could affect event visibility and association. Consider referencing a relevant object, such as aPod
,Deployment
, or a Longhorn custom resource.To confirm the validity of the
ObjectReference
, run the following script:If no such object exists, update the
ObjectReference
to point to an existing resource.
Signed-off-by: James Munson <[email protected]>
eaa24a8
to
e8cf7fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
app/pre_upgrade.go (2)
60-84
: LGTM: Enhanced pre-upgrade process with event recording.The changes to the
preUpgrade
function significantly improve the pre-upgrade process by adding event recording capabilities. The event broadcaster, scheme, and recorder are correctly set up and used.However, consider improving the error handling in the
newPreUpgrader().Run()
call:err = newPreUpgrader(namespace, lhClient, eventRecorder).Run() if err != nil { - logrus.Warnf("Done with Run() ... err is %v", err) + logrus.Errorf("Pre-upgrade encountered an error: %v", err) }This change provides more clarity about the nature of the log message.
96-116
: LGTM: Well-implementedRun
method with proper event recording.The
Run
method effectively encapsulates the pre-upgrade process logic, including proper event recording for both success and failure scenarios. The removal of the sleep after checks is a good improvement.Consider wrapping the error from
upgradeutil.CheckUpgradePath
for better context:if err = upgradeutil.CheckUpgradePath(u.namespace, u.lhClient, u.eventRecorder, true); err != nil { - return err + return errors.Wrap(err, "failed to check upgrade path") }This change provides more context when the error is logged or returned.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- app/post_upgrade.go (3 hunks)
- app/pre_upgrade.go (3 hunks)
- app/recurring_job.go (0 hunks)
- app/util.go (1 hunks)
- constant/events.go (1 hunks)
- upgrade/upgrade.go (4 hunks)
💤 Files with no reviewable changes (1)
- app/recurring_job.go
🚧 Files skipped from review as they are similar to previous changes (4)
- app/post_upgrade.go
- app/util.go
- constant/events.go
- upgrade/upgrade.go
🧰 Additional context used
🔇 Additional comments (5)
app/pre_upgrade.go (5)
8-8
: LGTM: New imports are appropriate for the added functionality.The new imports are necessary and correctly added to support the enhanced pre-upgrade process, including event recording and Longhorn-specific operations.
Also applies to: 10-10, 12-14, 17-17
22-24
: LGTM: New constant for event recording.The
PreUpgradeEventer
constant is well-named and appropriately used for identifying the component in event recording.
42-43
: Consider moving the completion log after error handling.The deferred log statement
defer logrus.Info("Completed pre-upgrade.")
may not execute if the program exits before returning from the function, such as whenlogrus.WithError(err).Fatalf
is called. This is because deferred functions are not run whenos.Exit()
is called withinFatalf
.To verify this behavior, we can search for similar patterns in the codebase:
#!/bin/bash # Search for deferred logs followed by Fatalf calls rg --type go 'defer\s+logrus\..*\n.*logrus\..*\.Fatalf'
86-90
: LGTM: Well-structuredpreUpgrader
struct.The
preUpgrader
struct is well-designed, containing all necessary fields for the pre-upgrade process. It encapsulates the required dependencies, promoting better organization and modularity of the code.
92-94
: LGTM: Proper constructor forpreUpgrader
.The
newPreUpgrader
function serves as a clean and concise constructor for thepreUpgrader
struct, correctly initializing all required fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mergify backport v1.6.x v1.7.x |
✅ Backports have been created
|
Which issue(s) this PR fixes:
longhorn/longhorn#9569
What this PR does / why we need it:
Add an event when upgrade pre- or post-check job completes, either with error or success message.
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
New Features
Bug Fixes
Documentation