Add case test_metric_longhorn_backup #2145

Open · wants to merge 1 commit into base: master

Conversation

@roger-ryao (Contributor) commented Oct 18, 2024

Which issue(s) this PR fixes:

longhorn/longhorn#9430

What this PR does / why we need it:

This PR adds a test case to cover the backup metrics.

Special notes for your reviewer:

@c3y1huang @ChanYiLin

Additional documentation or context

longhorn/longhorn#9429

Summary by CodeRabbit

  • New Features
    • Introduced enhanced testing for backup metrics, including validation of metrics for user-created and recurring backups.
    • Added a new helper function to improve metric validation across all nodes.
  • Bug Fixes
    • Improved existing test functions to ensure comprehensive coverage of backup operations.
  • Tests
    • Added new test functions to validate backup metrics and ensure correct reporting.
    • Defined new constants for recurring job parameters to streamline testing processes.

@roger-ryao roger-ryao requested a review from a team as a code owner October 18, 2024 08:51
coderabbitai bot commented Oct 18, 2024

Walkthrough

The changes involve modifications to the test_metric.py file, enhancing the testing of backup metrics and recurring jobs. New imports from the backupstore module have been added, and several functions related to backup operations are introduced. Additionally, constants for recurring job parameters are defined, and a new test function is created to validate the metrics associated with backups. Existing test functions are updated to ensure comprehensive coverage of the new backup functionality.

Changes

File Change Summary
manager/integration/tests/test_metric.py - Added imports from backupstore module.
- Introduced new functions for backup operations.
- Defined new constants for recurring job parameters.
- Added test_metric_longhorn_backup to validate backup metrics.
- Modified existing test functions to integrate new backup functionality.

Poem

In the meadow, where backups grow,
New tests sprout, like seeds we sow.
Metrics dance in the morning light,
Ensuring our backups are just right.
With every hop, our code takes flight! 🐇✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (1)
manager/integration/tests/test_metric.py (1)

161-175: Ensure consistent parameter naming between functions

The parameter metric_labels in wait_for_metric_sum_on_all_nodes is passed to check_metric_sum_on_all_nodes as metric_labels, but check_metric_sum_on_all_nodes defines this parameter as expected_labels. For consistency and readability, consider using the same parameter name in both functions.
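A minimal sketch of the suggested rename, with both helpers using expected_labels. The metric-checking body is stubbed out here for illustration only; the real check_metric_sum_on_all_nodes in test_metric.py scrapes each node's metrics endpoint:

```python
import time

RETRY_COUNTS = 3
RETRY_INTERVAL = 0.01


def check_metric_sum_on_all_nodes(client, core_api, metric_name,
                                  expected_labels, expected_value):
    # Stub for illustration: the real helper sums the metric across
    # all nodes and asserts it equals expected_value.
    assert expected_value == 42


def wait_for_metric_sum_on_all_nodes(client, core_api, metric_name,
                                     expected_labels, expected_value):
    # Same parameter name (`expected_labels`) as the checker, so the
    # call sites read consistently.
    for _ in range(RETRY_COUNTS):
        time.sleep(RETRY_INTERVAL)
        try:
            check_metric_sum_on_all_nodes(client, core_api, metric_name,
                                          expected_labels, expected_value)
            return
        except AssertionError:
            continue
    check_metric_sum_on_all_nodes(client, core_api, metric_name,
                                  expected_labels, expected_value)


wait_for_metric_sum_on_all_nodes(None, None, "longhorn_backup_state",
                                 {"volume": "vol-1"}, 42)
```

The client, metric name, and label values above are placeholders, not the test's actual fixtures.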

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 460c7b0 and 8424fb5.

📒 Files selected for processing (1)
  • manager/integration/tests/test_metric.py (4 hunks)
🧰 Additional context used
🪛 Ruff
manager/integration/tests/test_metric.py

4-4: backupstore imported but unused

Remove unused import: backupstore

(F401)

Comment on lines +703 to +711
for backup in backups:
    if backup['snapshotName'] == "volume-head":
        continue

    backup_size = int(backup['size'])

assert backup_size > 0

⚠️ Potential issue

Potential issue with backup size assignment

In the loop iterating over backups, the variable backup_size is assigned the size of each backup without accumulating. If multiple backups exist, backup_size will hold the size of the last backup processed. Ensure that you are capturing the correct backup size intended for the test, possibly by identifying the specific backup needed or summing the sizes if appropriate.
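One way to address this, sketched here with hypothetical backup-list entries mirroring the structure used in the test, is to sum the sizes of all non-volume-head backups instead of overwriting the variable on each iteration:

```python
# Hypothetical backup list; real entries come from bv.backupList().data.
backups = [
    {'snapshotName': 'volume-head', 'size': '0'},
    {'snapshotName': 'snap-1', 'size': '1024'},
    {'snapshotName': 'snap-2', 'size': '2048'},
]

# Accumulate sizes across all completed backups, skipping volume-head.
backup_size = sum(
    int(b['size']) for b in backups
    if b['snapshotName'] != "volume-head"
)

assert backup_size > 0
```

If the test instead needs the size of one specific backup, filtering by the backup's snapshot name would be the alternative.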

Comment on lines +751 to +759
for backup in backups:
    if backup['snapshotName'] == "volume-head":
        continue

    recurring_backup_size = int(backup['size'])

assert recurring_backup_size > 0

⚠️ Potential issue

Potential issue with recurring backup size assignment

Similar to the user backup size, in the loop iterating over backups, recurring_backup_size is assigned the size of each backup without accumulating. If multiple backups exist, recurring_backup_size will hold the size of the last backup processed. Ensure that you are capturing the correct backup size for the recurring backup test, possibly by identifying the specific backup associated with the recurring job.

Comment on lines +701 to +712
backup_size = 0
backups = bv.backupList().data
for backup in backups:
    if backup['snapshotName'] == "volume-head":
        continue

    backup_size = int(backup['size'])

assert backup_size > 0


🛠️ Refactor suggestion

Refactor to eliminate code duplication when calculating backup sizes

The code for obtaining backup_size (lines 701-709) and recurring_backup_size (lines 749-757) is similar. Consider refactoring this duplicated code into a helper function to improve maintainability and reduce redundancy.

Here's an example of a helper function:

def get_backup_size(backup_volume):
    backups = backup_volume.backupList().data
    for backup in backups:
        if backup['snapshotName'] == "volume-head":
            continue
        return int(backup['size'])
    return 0

You can then use this function to obtain the backup sizes:

backup_size = get_backup_size(bv)

Also applies to: 749-757

@@ -1,12 +1,14 @@
import pytest
import requests
import time
import backupstore

⚠️ Potential issue

Remove unused import backupstore

The import statement import backupstore is not used in the code and should be removed to clean up the imports.

Apply this diff to remove the unused import:

-import backupstore

@roger-ryao roger-ryao self-assigned this Oct 18, 2024
@roger-ryao roger-ryao force-pushed the issue9430 branch 2 times, most recently from 233c3f7 to 32cfb33 Compare October 18, 2024 10:13
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
manager/integration/tests/test_metric.py (1)

710-710: Add more descriptive assertion messages

The current assertions for backup size being greater than 0 lack descriptive messages. Adding more informative messages can help in debugging if the test fails.

Consider updating the assertions like this:

assert backup_size > 0, f"User backup size should be greater than 0, but got {backup_size}"
assert recurring_backup_size > 0, f"Recurring backup size should be greater than 0, but got {recurring_backup_size}"

Also applies to: 758-758

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 233c3f7 and 32cfb33.

📒 Files selected for processing (1)
  • manager/integration/tests/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (3)
manager/integration/tests/test_metric.py (3)

38-56: LGTM: New imports and constants are well-organized

The new imports from the common module and the added constants for recurring job parameters are relevant to the new test function and follow the existing code style.


160-173: LGTM: New helper function is well-implemented

The wait_for_metric_sum_on_all_nodes function provides a useful abstraction for waiting and checking metric sums across all nodes. It follows the existing code style and uses an appropriate retry mechanism.


654-773: Overall, good addition to test coverage for backup metrics

The new test function test_metric_longhorn_backup is a valuable addition to the test suite. It comprehensively covers both user-created and recurring backups, verifying the longhorn_backup_actual_size_bytes and longhorn_backup_state metrics. The implementation is generally good, following the existing code style and using appropriate helper functions.

A few suggestions for improvement have been made:

  1. Replace the fixed sleep with a more robust waiting mechanism.
  2. Refactor duplicate code for getting backup size into a helper function.
  3. Add more descriptive assertion messages.

These changes will enhance the reliability and maintainability of the test.

Comment on lines +745 to +749
# wait for the recurring backup job to run.
time.sleep(60)
bv = client.by_id_backupVolume(volume_name)
wait_for_backup_count(bv, 1)

🛠️ Refactor suggestion

Consider replacing sleep with a more robust waiting mechanism

Using a fixed time.sleep(60) may introduce unnecessary delays or may not be sufficient if the backup takes longer. Consider using a polling mechanism or a wait function that checks for the specific condition you're waiting for.

Here's a suggested approach:

def wait_for_backup_completion(client, volume_name, timeout=300, interval=2):
    start_time = time.time()
    while time.time() - start_time < timeout:
        bv = client.by_id_backupVolume(volume_name)
        if len(bv.backupList().data) > 0:
            return True
        time.sleep(interval)
    raise TimeoutError(f"Backup for volume {volume_name} did not complete within {timeout} seconds")

# Replace the sleep and subsequent lines with:
wait_for_backup_completion(client, volume_name)
bv = client.by_id_backupVolume(volume_name)

Comment on lines +703 to +711
backup_size = 0
backups = bv.backupList().data
for backup in backups:
    if backup['snapshotName'] == "volume-head":
        continue

    backup_size = int(backup['size'])

assert backup_size > 0

🛠️ Refactor suggestion

Refactor duplicate code for getting backup size

The code for obtaining the backup size is duplicated for user backups and recurring backups. Consider refactoring this into a helper function to improve maintainability and reduce redundancy.

Here's a suggested helper function:

def get_backup_size(backup_volume):
    backups = backup_volume.backupList().data
    for backup in backups:
        if backup['snapshotName'] == "volume-head":
            continue
        return int(backup['size'])
    return 0

# Then use it like this:
backup_size = get_backup_size(bv)
assert backup_size > 0, "Backup size should be greater than 0"

Also applies to: 751-758

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 32cfb33 and 70caa8a.

📒 Files selected for processing (1)
  • manager/integration/tests/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (9)
manager/integration/tests/test_metric.py (9)

38-44: LGTM: New imports added for backup-related functionality

The new imports are relevant to the added backup testing functionality. They include necessary functions for creating and managing backups, as well as working with recurring jobs.


46-55: LGTM: Constants added for recurring job configuration

These constants are well-defined and will be used in the new test function for configuring recurring backup jobs. The naming is clear and follows the existing coding style.


654-681: LGTM: Well-documented test function

The test function is well-documented with a clear scenario description, issue reference, and expected outcomes. This makes it easy for other developers to understand the purpose and expectations of the test.


693-711: Potential issue with backup size assignment

In the loop iterating over backups, the variable backup_size is assigned the size of each backup without accumulating. If multiple backups exist, backup_size will hold the size of the last backup processed. Ensure that you are capturing the correct backup size intended for the test, possibly by identifying the specific backup needed or summing the sizes if appropriate.


713-726: LGTM: Proper metric verification for user-initiated backup

The code correctly verifies both the size and state metrics for the user-initiated backup. The use of wait_for_metric_sum_on_all_nodes ensures that the test waits for the metrics to be updated across all nodes.


731-747: LGTM: Well-structured recurring backup job creation

The creation of the recurring backup job is well-structured and uses the previously defined constants. The use of check_recurring_jobs and wait_for_cron_job_count ensures that the job is properly created before proceeding.


746-749: Consider replacing sleep with a more robust waiting mechanism

Using a fixed time.sleep(60) may introduce unnecessary delays or may not be sufficient if the backup takes longer. Consider using a polling mechanism or a wait function that checks for the specific condition you're waiting for.

Here's a suggested approach:

def wait_for_backup_completion(client, volume_name, timeout=300, interval=2):
    start_time = time.time()
    while time.time() - start_time < timeout:
        bv = client.by_id_backupVolume(volume_name)
        if len(bv.backupList().data) > 0:
            return True
        time.sleep(interval)
    raise TimeoutError(f"Backup for volume {volume_name} did not complete within {timeout} seconds")

# Replace the sleep and subsequent lines with:
wait_for_backup_completion(client, volume_name)
bv = client.by_id_backupVolume(volume_name)

751-759: Potential issue with recurring backup size assignment

Similar to the user backup size, in the loop iterating over backups, recurring_backup_size is assigned the size of each backup without accumulating. If multiple backups exist, recurring_backup_size will hold the size of the last backup processed. Ensure that you are capturing the correct backup size for the recurring backup test, possibly by identifying the specific backup associated with the recurring job.


761-774: LGTM: Proper metric verification for recurring backup

The code correctly verifies both the size and state metrics for the recurring backup. The use of wait_for_metric_sum_on_all_nodes ensures that the test waits for the metrics to be updated across all nodes.

Comment on lines +160 to +173
def wait_for_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value):  # NOQA
    for _ in range(RETRY_COUNTS):
        time.sleep(RETRY_INTERVAL)

        try:
            check_metric_sum_on_all_nodes(client, core_api, metric_name,
                                          metric_labels, expected_value)
            return
        except AssertionError:
            continue

    check_metric_sum_on_all_nodes(client, core_api, metric_name,
                                  metric_labels, expected_value)


🛠️ Refactor suggestion

Consider enhancing the wait_for_metric_sum_on_all_nodes function

The function is well-implemented, but consider the following improvements:

  1. Add parameters for RETRY_COUNTS and RETRY_INTERVAL to make the function more flexible.
  2. Consider using exponential backoff instead of fixed intervals for more efficient retrying.
  3. Add logging to provide more visibility into the waiting process.
  4. Consider returning a boolean to indicate success or failure instead of relying on an exception.

Example implementation:

def wait_for_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value, max_wait_time=300, initial_interval=1):
    start_time = time.time()
    interval = initial_interval
    while time.time() - start_time < max_wait_time:
        try:
            check_metric_sum_on_all_nodes(client, core_api, metric_name, metric_labels, expected_value)
            return True
        except AssertionError:
            time.sleep(interval)
            interval = min(interval * 2, 60)  # exponential backoff, max 60 seconds
    
    return False

This implementation provides more flexibility and better handles long-running waits.

Comment on lines +684 to +691
# create a volume and attach it to a node.
volume_size = 50 * Mi
client.create_volume(name=volume_name,
                     numberOfReplicas=1,
                     size=str(volume_size))
volume = wait_for_volume_detached(client, volume_name)
volume.attach(hostId=self_hostId)
volume = wait_for_volume_healthy(client, volume_name)

🛠️ Refactor suggestion

Consider parameterizing volume creation

The volume creation process is hardcoded. Consider parameterizing the volume size and number of replicas to make the test more flexible and reusable.

Example:

def create_test_volume(client, name, size=50*Mi, replicas=1):
    client.create_volume(name=name, numberOfReplicas=replicas, size=str(size))
    volume = wait_for_volume_detached(client, name)
    volume.attach(hostId=get_self_host_id())
    return wait_for_volume_healthy(client, name)

volume = create_test_volume(client, volume_name)

Comment on lines +728 to +729
# delete the existing backup before creating a recurring backup job.
delete_backup_volume(client, volume_name)

🛠️ Refactor suggestion

Consider adding error handling for backup volume deletion

The delete_backup_volume call should be wrapped in a try-except block to handle potential errors during deletion. This will make the test more robust.

Example:

try:
    delete_backup_volume(client, volume_name)
except Exception as e:
    pytest.fail(f"Failed to delete backup volume: {str(e)}")

# delete the existing backup before creating a recurring backup job.
delete_backup_volume(client, volume_name)

roger-ryao (Contributor, Author) commented:

Hi @c3y1huang

In my test case test_metric_longhorn_backup, I referenced wait_for_metric_count_all_nodes for similar usage. However, I have a question about the test design: although I delete the backups of the volume that were not created by the recurring job, I am concerned about potential data-caching issues when reusing the same volume for a backup recurring job while checking the Longhorn backup metric. Could you please help review this test case?

Thanks

roger-ryao (Contributor, Author) commented:

After longhorn/longhorn-manager#3216 was merged, my test case passed, and I did not observe any potential data caching issues when using the same volume to apply a backup recurring job to check the Longhorn backup metrics.

(screenshot: Screenshot_20241024_154554)

@yangchiu (Member) commented Nov 6, 2024

cc @ChanYiLin @c3y1huang as it relates to longhorn/longhorn#9429
