ci: support v2 data engine for hal cluster #2147

yangchiu · 2024-10-22T05:46:51Z

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9671

What this PR does / why we need it:

support v2 data engine for hal cluster

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

New Features
- Enhanced test environment configurability with dynamic disk path management.
- Updated Longhorn CLI to the latest stable version for improved installation processes.
- Expanded Terraform configurations for better resource management in Rancher environments.
Bug Fixes
- Improved error handling in installation scripts to ensure reliability during setup.
Documentation
- Added new packages and commands in user data for enhanced service management and kernel module loading.

Signed-off-by: Yang Chiu <[email protected]>

coderabbitai · 2024-10-22T05:46:58Z

Caution

Review failed

The pull request is closed.

Walkthrough

The pull request includes modifications to several scripts and configuration files. Key changes involve the addition of the OperatingSystem library and dynamic variable assignments in the common.resource file, enhancing environment configurability. The longhorn-setup.sh scripts have been updated to use the latest version of longhornctl, while the Terraform configuration file has been modified to improve disk settings and user data for better resource management in a Rancher environment.

Changes

File Path	Change Summary
e2e/keywords/common.resource	Added `OperatingSystem` library; introduced `${host_provider}` and `${disk_path}` variables; updated `Set test environment` keyword to use dynamic `disk_path`.
pipelines/e2e/scripts/longhorn-setup.sh	Updated `longhornctl` version from `v1.7.1-rc2` to `v1.7.2`; maintained structure and error handling.
test_framework/scripts/longhorn-setup.sh	Updated Longhorn CLI version from `v1.7.1-rc2` to `v1.7.2`; enhanced installation process and error handling.
test_framework/terraform/harvester/ubuntu/main.tf	Updated disk configuration, enhanced `user_data` section with new packages and commands; improved resource management.

Assessment against linked issues

Objective	Addressed	Explanation
Support v2 volume for HAL cluster (#9671)	❓	It is unclear if the changes directly address the issue related to hugepages capacity.

Possibly related PRs

ci: fix test failures caused by unstable harvester cluster #2144: The changes in the common.resource file enhance the configurability of the test environment, which may relate to the improvements in the testing framework aimed at stabilizing tests in the Harvester cluster.
fix(robot): fix typo when passing parameters to uninstallation test case #2148: The modifications in the common.resource regarding dynamic variable assignments and environment variable checks may connect with the handling of environment variables in the Longhorn test pod as addressed in PR fix(robot): fix typo when passing parameters to uninstallation test case #2148.

Suggested reviewers

chriscchien

Poem

In the land of code where rabbits play,
New paths are set, come what may.
With disks that dance and scripts that sing,
Our test environment's a wondrous thing!
Hops of joy in every line,
Code so bright, it surely will shine! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)

pipelines/e2e/scripts/longhorn-setup.sh (1)
Line range hint 1-78: Consider enhancing error handling throughout the script.

While the longhornctl_check function has error handling, other critical operations in the script might benefit from additional checks. For example, consider adding error checks after key operations like creating secrets, installing backupstores, or installing the CSI snapshotter.

Here's an example of how you could enhance error handling:
install_backupstores
if [ $? -ne 0 ]; then
    echo "Failed to install backupstores"
    exit 1
fi
Adding similar checks throughout the script would improve its robustness and make troubleshooting easier.
e2e/keywords/common.resource (1)
35-37: Approve with suggestion: Dynamic disk path assignment

The dynamic assignment of disk_path based on the host_provider is a good improvement. It allows for different disk configurations depending on the environment, which is crucial for supporting various setups including the HAL cluster mentioned in the PR objectives.

However, there's room for improvement:

The current implementation assumes '/dev/xvdh' for all non-Harvester environments, which might not be universally correct.

Consider making the disk path more configurable for non-Harvester environments.

Consider refactoring the disk path assignment to be more flexible:
${default_disk_path}=    Get Environment Variable    DEFAULT_DISK_PATH    /dev/xvdh
${disk_path}=    Set Variable If    "${host_provider}" == "harvester"    /dev/vdc    ${default_disk_path}
This approach allows setting a default disk path via an environment variable, falling back to '/dev/xvdh' if not specified. This provides more flexibility for different environments while maintaining the current behavior as a default.
test_framework/scripts/longhorn-setup.sh (2)

Line range hint 1-562: Consider modularizing the script for better maintainability

While the current change is minimal and correct, the overall script is quite long and complex. Consider breaking it down into smaller, more manageable modules or separate scripts for different functionalities (e.g., installation, testing, upgrade). This would improve readability and maintainability in the long run.

Additionally, adding more inline comments explaining the purpose and functionality of each major section would greatly enhance the script's readability and make it easier for other developers to understand and maintain.

Line range hint 1-562: Enhance error handling and logging

To improve the robustness and debuggability of the script, consider implementing the following enhancements:

Add a logging function to standardize log output format and severity levels.

Implement more comprehensive error handling, especially for critical operations.

Consider adding a cleanup function to handle interruptions or failures gracefully.

These improvements would make the script more resilient and easier to troubleshoot in case of issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 86eb70b and b113b1d.

📒 Files selected for processing (4)

e2e/keywords/common.resource (2 hunks)
pipelines/e2e/scripts/longhorn-setup.sh (1 hunks)
test_framework/scripts/longhorn-setup.sh (1 hunks)
test_framework/terraform/harvester/ubuntu/main.tf (2 hunks)

🧰 Additional context used

🔇 Additional comments (11)

pipelines/e2e/scripts/longhorn-setup.sh (2)
29-29: Verify if the longhornctl version update aligns with PR objectives.

The update to longhornctl v1.7.2 seems appropriate, but we should verify if this change is explicitly mentioned in the PR objectives or if it's part of the broader goal of supporting the v2 data engine for the HAL cluster.

To ensure this change aligns with the PR objectives:

Check if the version update is mentioned in the PR description or linked issues.

Verify if v1.7.2 of longhornctl is compatible with or required for the v2 data engine support.

Could you please clarify if this version update is a necessary part of implementing v2 data engine support for the HAL cluster?

29-29: Approve the update to longhornctl version.

The change from v1.7.1-rc2 to v1.7.2 for longhornctl is appropriate. This update moves from a release candidate to a stable version, which is generally a good practice for production environments.

To ensure consistency across the codebase, let's verify if this version update has been applied uniformly:

This will help us confirm that all references to longhornctl have been updated to v1.7.2 throughout the project.

✅ Verification successful

Approve the update to longhornctl version.

The update to longhornctl from v1.7.1-rc2 to v1.7.2 is consistent across the codebase and does not introduce any discrepancies.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for consistent longhornctl version across the codebase

# Test: Search for longhornctl version references
rg -i 'longhornctl.*v1\.7\.' --type sh
Length of output: 40

Script:
#!/bin/bash
# Description: Verify consistent longhornctl version references across the entire codebase

# Test: Search for longhornctl version references in all file types with a flexible pattern
rg -i 'longhornctl.*v1\.7\.\d+' 
Length of output: 33
e2e/keywords/common.resource (3)

4-4: LGTM: Addition of OperatingSystem library

The inclusion of the OperatingSystem library is a good addition. It provides useful keywords for OS-related tasks, which can be beneficial for file system operations and environment variable handling in the test automation process.

34-34: LGTM: Dynamic host provider configuration

The introduction of the host_provider variable using the Get Environment Variable keyword is a good practice. It allows for dynamic configuration based on the host environment, which enhances the flexibility and reusability of the test suite across different setups, including the HAL cluster mentioned in the PR objectives.

Line range hint 4-37: Summary: Good improvements with minor suggestion

Overall, the changes in this file are well-aligned with the PR objectives of supporting the v2 data engine for the HAL cluster. The additions and modifications enhance the flexibility and configurability of the test environment, which is crucial for supporting various setups.

Key improvements:

Addition of the OperatingSystem library for enhanced OS-related operations.

Introduction of dynamic host provider configuration.

Implementation of conditional disk path assignment.

The only suggestion is to consider making the disk path assignment more flexible for non-Harvester environments, as detailed in the previous comment.

These changes contribute positively to the test suite's ability to handle different environments, including the HAL cluster, and support the v2 data engine as intended.

test_framework/scripts/longhorn-setup.sh (1)

Line range hint 356-362: Longhorn CLI version updated to v1.7.2

The Longhorn CLI version has been updated from v1.7.1-rc2 to v1.7.2. This change ensures that the script uses the latest stable version of the Longhorn CLI.

To ensure this change is consistent across the project, please run the following command:

This will help identify any other scripts that might need to be updated to use the new version.
test_framework/terraform/harvester/ubuntu/main.tf (5)
82-83: Confirm the necessity of added packages cryptsetup and dmsetup

The packages cryptsetup and dmsetup have been added to the installation list. These are typically used for disk encryption and device mapping.

Ensure that these packages are required for your setup. If disk encryption or device management features are needed, verify that subsequent configurations utilize these packages appropriately.

89-94: Review the disabling of the multipathd service

The following commands have been added to stop and disable the multipathd service and socket:
- systemctl stop multipathd.service
- systemctl stop multipathd.socket
- systemctl disable multipathd.service
- systemctl disable multipathd.socket
Confirm that disabling multipathd will not affect storage devices that rely on multipath I/O in your environment.

95-105: Load necessary kernel modules and ensure persistence

Kernel modules are being loaded and set to persist across reboots:

Modules loaded:

uio

uio_pci_generic

vfio_pci

nvme-tcp

dm_crypt

These modules are added to /etc/modules-load.d/modules.conf for persistence.

Ensure that all these modules are required for your application, and that they are compatible with the system's kernel version.

106-107: Set up hugepages configuration correctly

Hugepages are configured with:

Runtime setting: echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Persistent setting: echo "vm.nr_hugepages=1024" >> /etc/sysctl.conf

Make sure that allocating 1024 hugepages (2 MiB each) aligns with your application's memory requirements and that the system has sufficient resources.

51-58: Verify the updated disk configuration and image references

The disk_info section has been modified:

The imageName is updated to "longhorn-qa/image-kkkjv". Ensure that this image exists and is accessible in the longhorn-qa namespace.

A new disk configuration is added with storageClassName set to "harvester-longhorn", a size of 100, and a bootOrder of 2.

Make sure that the new storage class is correctly set up in your environment and that the boot order is intended.

Run the following script to confirm the existence of the image and storage class:

ci: support v2 data engine for hal cluster

b113b1d

Signed-off-by: Yang Chiu <[email protected]>

yangchiu self-assigned this Oct 22, 2024

yangchiu requested a review from a team as a code owner October 22, 2024 05:46

coderabbitai bot reviewed Oct 22, 2024

View reviewed changes

khushboo-rancher approved these changes Oct 25, 2024

View reviewed changes

khushboo-rancher enabled auto-merge (rebase) October 25, 2024 18:24

Merge branch 'master' into hal-v2

1eb36aa

khushboo-rancher merged commit 942e57e into longhorn:master Oct 25, 2024
1 of 2 checks passed

This was referenced Oct 28, 2024

ci: support rockylinux for hal cluster #2152

Open

ci: support sle-micro for hal cluster #2150

Merged

ci: support sles for hal cluster #2149

Merged

ci: support cifs and azurite on pytest and e2e #2155

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: support v2 data engine for hal cluster #2147

ci: support v2 data engine for hal cluster #2147

yangchiu commented Oct 22, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 22, 2024 •

edited

Loading

Review failed

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

ci: support v2 data engine for hal cluster #2147

ci: support v2 data engine for hal cluster #2147

Conversation

yangchiu commented Oct 22, 2024 • edited by coderabbitai bot Loading

Which issue(s) this PR fixes:

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

coderabbitai bot commented Oct 22, 2024 • edited Loading

Review failed

Walkthrough

Changes

Assessment against linked issues

Possibly related PRs

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

yangchiu commented Oct 22, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 22, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)