-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add case test_metric_longhorn_backup #2145
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes involve modifications to the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (1)
manager/integration/tests/test_metric.py (1)
161-175
: Ensure consistent parameter naming between functionsThe parameter
metric_labels
inwait_for_metric_sum_on_all_nodes
is passed tocheck_metric_sum_on_all_nodes
asmetric_labels
, butcheck_metric_sum_on_all_nodes
defines this parameter asexpected_labels
. For consistency and readability, consider using the same parameter name in both functions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- manager/integration/tests/test_metric.py (4 hunks)
🧰 Additional context used
🪛 Ruff
manager/integration/tests/test_metric.py
4-4:
backupstore
imported but unusedRemove unused import:
backupstore
(F401)
for backup in backups: | ||
if backup['snapshotName'] == "volume-head": | ||
continue | ||
|
||
backup_size = int(backup['size']) | ||
assert backup_size > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential issue with backup size assignment
In the loop iterating over backups, the variable backup_size
is assigned the size of each backup without accumulating. If multiple backups exist, backup_size
will hold the size of the last backup processed. Ensure that you are capturing the correct backup size intended for the test, possibly by identifying the specific backup needed or summing the sizes if appropriate.
for backup in backups: | ||
if backup['snapshotName'] == "volume-head": | ||
continue | ||
|
||
recurring_backup_size = int(backup['size']) | ||
assert recurring_backup_size > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential issue with recurring backup size assignment
Similar to the user backup size, in the loop iterating over backups, recurring_backup_size
is assigned the size of each backup without accumulating. If multiple backups exist, recurring_backup_size
will hold the size of the last backup processed. Ensure that you are capturing the correct backup size for the recurring backup test, possibly by identifying the specific backup associated with the recurring job.
backup_size = 0 | ||
backups = bv.backupList().data | ||
for backup in backups: | ||
if backup['snapshotName'] == "volume-head": | ||
continue | ||
|
||
backup_size = int(backup['size']) | ||
assert backup_size > 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to eliminate code duplication when calculating backup sizes
The code for obtaining backup_size
(lines 701-709) and recurring_backup_size
(lines 749-757) is similar. Consider refactoring this duplicated code into a helper function to improve maintainability and reduce redundancy.
Here's an example of a helper function:
def get_backup_size(backup_volume):
backups = backup_volume.backupList().data
for backup in backups:
if backup['snapshotName'] == "volume-head":
continue
return int(backup['size'])
return 0
You can then use this function to obtain the backup sizes:
backup_size = get_backup_size(bv)
Also applies to: 749-757
@@ -1,12 +1,14 @@ | |||
import pytest | |||
import requests | |||
import time | |||
import backupstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import backupstore
The import statement import backupstore
is not used in the code and should be removed to clean up the imports.
Apply this diff to remove the unused import:
-import backupstore
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import backupstore |
🧰 Tools
🪛 Ruff
4-4:
backupstore
imported but unusedRemove unused import:
backupstore
(F401)
233c3f7
to
32cfb33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
manager/integration/tests/test_metric.py (1)
710-710
: Add more descriptive assertion messagesThe current assertions for backup size being greater than 0 lack descriptive messages. Adding more informative messages can help in debugging if the test fails.
Consider updating the assertions like this:
assert backup_size > 0, f"User backup size should be greater than 0, but got {backup_size}" assert recurring_backup_size > 0, f"Recurring backup size should be greater than 0, but got {recurring_backup_size}"Also applies to: 758-758
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- manager/integration/tests/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (3)
manager/integration/tests/test_metric.py (3)
38-56
: LGTM: New imports and constants are well-organizedThe new imports from the
common
module and the added constants for recurring job parameters are relevant to the new test function and follow the existing code style.
160-173
: LGTM: New helper function is well-implementedThe
wait_for_metric_sum_on_all_nodes
function provides a useful abstraction for waiting and checking metric sums across all nodes. It follows the existing code style and uses an appropriate retry mechanism.
654-773
: Overall, good addition to test coverage for backup metricsThe new test function
test_metric_longhorn_backup
is a valuable addition to the test suite. It comprehensively covers both user-created and recurring backups, verifying thelonghorn_backup_actual_size_bytes
andlonghorn_backup_state
metrics. The implementation is generally good, following the existing code style and using appropriate helper functions.A few suggestions for improvement have been made:
- Replace the fixed sleep with a more robust waiting mechanism.
- Refactor duplicate code for getting backup size into a helper function.
- Add more descriptive assertion messages.
These changes will enhance the reliability and maintainability of the test.
# wait for the recurring backup job to run. | ||
time.sleep(60) | ||
bv = client.by_id_backupVolume(volume_name) | ||
wait_for_backup_count(bv, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider replacing sleep with a more robust waiting mechanism
Using a fixed time.sleep(60)
may introduce unnecessary delays or may not be sufficient if the backup takes longer. Consider using a polling mechanism or a wait function that checks for the specific condition you're waiting for.
Here's a suggested approach:
def wait_for_backup_completion(client, volume_name, timeout=300, interval=2):
start_time = time.time()
while time.time() - start_time < timeout:
bv = client.by_id_backupVolume(volume_name)
if len(bv.backupList().data) > 0:
return True
time.sleep(interval)
raise TimeoutError(f"Backup for volume {volume_name} did not complete within {timeout} seconds")
# Replace the sleep and subsequent lines with:
wait_for_backup_completion(client, volume_name)
bv = client.by_id_backupVolume(volume_name)
backup_size = 0 | ||
backups = bv.backupList().data | ||
for backup in backups: | ||
if backup['snapshotName'] == "volume-head": | ||
continue | ||
|
||
backup_size = int(backup['size']) | ||
assert backup_size > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor duplicate code for getting backup size
The code for obtaining the backup size is duplicated for user backups and recurring backups. Consider refactoring this into a helper function to improve maintainability and reduce redundancy.
Here's a suggested helper function:
def get_backup_size(backup_volume):
backups = backup_volume.backupList().data
for backup in backups:
if backup['snapshotName'] == "volume-head":
continue
return int(backup['size'])
return 0
# Then use it like this:
backup_size = get_backup_size(bv)
assert backup_size > 0, "Backup size should be greater than 0"
Also applies to: 751-758
longhorn/longhorn#9430 Signed-off-by: Roger Yao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- manager/integration/tests/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (9)
manager/integration/tests/test_metric.py (9)
38-44
: LGTM: New imports added for backup-related functionalityThe new imports are relevant to the added backup testing functionality. They include necessary functions for creating and managing backups, as well as working with recurring jobs.
46-55
: LGTM: Constants added for recurring job configurationThese constants are well-defined and will be used in the new test function for configuring recurring backup jobs. The naming is clear and follows the existing coding style.
654-681
: LGTM: Well-documented test functionThe test function is well-documented with a clear scenario description, issue reference, and expected outcomes. This makes it easy for other developers to understand the purpose and expectations of the test.
693-711
: Potential issue with backup size assignmentIn the loop iterating over backups, the variable
backup_size
is assigned the size of each backup without accumulating. If multiple backups exist,backup_size
will hold the size of the last backup processed. Ensure that you are capturing the correct backup size intended for the test, possibly by identifying the specific backup needed or summing the sizes if appropriate.
713-726
: LGTM: Proper metric verification for user-initiated backupThe code correctly verifies both the size and state metrics for the user-initiated backup. The use of
wait_for_metric_sum_on_all_nodes
ensures that the test waits for the metrics to be updated across all nodes.
731-747
: LGTM: Well-structured recurring backup job creationThe creation of the recurring backup job is well-structured and uses the previously defined constants. The use of
check_recurring_jobs
andwait_for_cron_job_count
ensures that the job is properly created before proceeding.
746-749
: Consider replacing sleep with a more robust waiting mechanismUsing a fixed
time.sleep(60)
may introduce unnecessary delays or may not be sufficient if the backup takes longer. Consider using a polling mechanism or a wait function that checks for the specific condition you're waiting for.Here's a suggested approach:
def wait_for_backup_completion(client, volume_name, timeout=300, interval=2): start_time = time.time() while time.time() - start_time < timeout: bv = client.by_id_backupVolume(volume_name) if len(bv.backupList().data) > 0: return True time.sleep(interval) raise TimeoutError(f"Backup for volume {volume_name} did not complete within {timeout} seconds") # Replace the sleep and subsequent lines with: wait_for_backup_completion(client, volume_name) bv = client.by_id_backupVolume(volume_name)
751-759
: Potential issue with recurring backup size assignmentSimilar to the user backup size, in the loop iterating over backups,
recurring_backup_size
is assigned the size of each backup without accumulating. If multiple backups exist,recurring_backup_size
will hold the size of the last backup processed. Ensure that you are capturing the correct backup size for the recurring backup test, possibly by identifying the specific backup associated with the recurring job.
761-774
: LGTM: Proper metric verification for recurring backupThe code correctly verifies both the size and state metrics for the recurring backup. The use of
wait_for_metric_sum_on_all_nodes
ensures that the test waits for the metrics to be updated across all nodes.
def wait_for_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value): # NOQA | ||
for _ in range(RETRY_COUNTS): | ||
time.sleep(RETRY_INTERVAL) | ||
|
||
try: | ||
check_metric_sum_on_all_nodes(client, core_api, metric_name, | ||
metric_labels, expected_value) | ||
return | ||
except AssertionError: | ||
continue | ||
|
||
check_metric_sum_on_all_nodes(client, core_api, metric_name, | ||
metric_labels, expected_value) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider enhancing the wait_for_metric_sum_on_all_nodes function
The function is well-implemented, but consider the following improvements:
- Add parameters for
RETRY_COUNTS
andRETRY_INTERVAL
to make the function more flexible. - Consider using exponential backoff instead of fixed intervals for more efficient retrying.
- Add logging to provide more visibility into the waiting process.
- Consider returning a boolean to indicate success or failure instead of relying on an exception.
Example implementation:
def wait_for_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value, max_wait_time=300, initial_interval=1):
start_time = time.time()
interval = initial_interval
while time.time() - start_time < max_wait_time:
try:
check_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value)
return True
except AssertionError:
time.sleep(interval)
interval = min(interval * 2, 60) # exponential backoff, max 60 seconds
return False
This implementation provides more flexibility and better handles long-running waits.
# create a volume and attach it to a node. | ||
volume_size = 50 * Mi | ||
client.create_volume(name=volume_name, | ||
numberOfReplicas=1, | ||
size=str(volume_size)) | ||
volume = wait_for_volume_detached(client, volume_name) | ||
volume.attach(hostId=self_hostId) | ||
volume = wait_for_volume_healthy(client, volume_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider parameterizing volume creation
The volume creation process is hardcoded. Consider parameterizing the volume size and number of replicas to make the test more flexible and reusable.
Example:
def create_test_volume(client, name, size=50*Mi, replicas=1):
client.create_volume(name=name, numberOfReplicas=replicas, size=str(size))
volume = wait_for_volume_detached(client, name)
volume.attach(hostId=get_self_host_id())
return wait_for_volume_healthy(client, name)
volume = create_test_volume(client, volume_name)
# delete the existing backup before creating a recurring backup job. | ||
delete_backup_volume(client, volume_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding error handling for backup volume deletion
The delete_backup_volume
call should be wrapped in a try-except block to handle potential errors during deletion. This will make the test more robust.
Example:
try:
delete_backup_volume(client, volume_name)
except Exception as e:
pytest.fail(f"Failed to delete backup volume: {str(e)}")
3) | ||
|
||
# delete the existing backup before creating a recurring backup job. | ||
delete_backup_volume(client, volume_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @c3y1huang
In my test case test_metric_longhorn_backup
, I have referenced wait_for_metric_count_all_nodes
for similar usage. However, I have a question regarding my test case design. Although I have deleted the backups of volumes that were not created by the recurring job, I am concerned about potential data caching issues when using the same volume to apply a backup recurring job to check the Longhorn backup metric. Could you please help review this test case?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After longhorn/longhorn-manager#3216 was merged, my test case passed, and I did not observe any potential data caching issues when using the same volume to apply a backup recurring job to check the Longhorn backup metrics.
cc @ChanYiLin @c3y1huang as it relates to longhorn/longhorn#9429 |
Which issue(s) this PR fixes:
longhorn/longhorn#9430
What this PR does / why we need it:
This PR adds a test case to cover the backup metrics.
Special notes for your reviewer:
@c3y1huang @ChanYiLin
Additional documentation or context
longhorn/longhorn#9429
Summary by CodeRabbit