vmware: Fall back to vmtoolsd if vmware-rpctool errs #4444

Merged (1 commit) on Sep 26, 2023

Contributor

@akutz akutz commented Sep 18, 2023

Addresses #4436

Proposed Commit Message

vmware: Fallback to vmtoolsd if rpctool errs

This patch updates the ds-identify script and the
VMware datasource to fall back to using the
vmtoolsd program if vmware-rpctool errors.

Fixes GH-4436

Additional Context

I validated the output and exit code for both commands:

$ vmtoolsd --cmd "info-get guestinfo.userdata.encoding"; echo $?
gzip+base64
0
$ vmware-rpctool "info-get guestinfo.userdata.encoding"; echo $?
gzip+base64
0
$ vmtoolsd --cmd "info-get guestinfo.userdata.enc"; echo $?
No value found
1
$ vmware-rpctool "info-get guestinfo.userdata.enc"; echo $?
No value found
1
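
Since both commands print the same output and return the same exit codes, a caller can treat them interchangeably and switch tools based on the exit status alone. The following is a minimal sketch of that idea, not the code from this PR: the function name `guestinfo_get`, the `run` parameter, and the choice to fall back on any non-zero exit are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Paths as they appear in the logs in this PR; adjust for other systems.
RPCTOOL = "/usr/bin/vmware-rpctool"
VMTOOLSD = "/usr/bin/vmtoolsd"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(args: Sequence[str]) -> subprocess.CompletedProcess:
    # check=False: inspect the return code ourselves instead of raising.
    return subprocess.run(args, capture_output=True, text=True, check=False)


def guestinfo_get(key: str, run: Runner = _run) -> Optional[str]:
    """Hypothetical helper: return guestinfo.<key>, or None if both tools fail."""
    request = f"info-get guestinfo.{key}"
    commands = (
        [RPCTOOL, request],            # preferred tool
        [VMTOOLSD, "--cmd", request],  # fallback tool
    )
    for args in commands:
        try:
            result = run(args)
        except OSError:
            # Binary missing or not executable: try the next tool.
            continue
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```

Passing the runner in as a parameter keeps the sketch testable on machines that have neither tool installed.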

Test Steps

  • Locally I verified the OVF datasource with:

    make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v --log-level=DEBUG tests/unittests/sources/test_ovf.py 
  • Locally I verified the VMware datasource with:

    make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v --log-level=DEBUG tests/unittests/sources/test_vmware.py 
  • Locally I verified ds-identify with:

    make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v --log-level=DEBUG tests/unittests/test_ds_identify.py
  • I also copied the modified ds-identify and datasources to a VM and ran the following:

      sudo rm -f cloud-init*.log; sudo cloud-init clean; sudo cloud-init init

    Everything behaved as expected.

  • I also replaced /usr/bin/vmware-rpctool with:

    #!/bin/sh
    
    exit 1

    And ran:

    sudo rm -f cloud-init*.log; sudo cloud-init clean; sudo cloud-init init

    And /var/log/cloud-init.log showed the expected behavior occurred:

    2023-09-18 18:29:04,244 - util.py[WARNING]: Failed to get guestinfo value for key metadata: Unexpected error while running command.
    Command: ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata']
    Exit code: 1
    Reason: -
    Stdout: 
    Stderr: 
    2023-09-18 18:29:04,244 - util.py[DEBUG]: Failed to get guestinfo value for key metadata: Unexpected error while running command.
    Command: ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata']
    Exit code: 1
    Reason: -
    Stdout: 
    Stderr: 
    Traceback (most recent call last):
      File "/usr/lib/python3.11/site-packages/cloudinit/sources/DataSourceVMware.py", line 551, in guestinfo_get_value
        (stdout, stderr) = subp(args)
                          ^^^^^^^^^^
      File "/usr/lib/python3.11/site-packages/cloudinit/subp.py", line 335, in subp
        raise ProcessExecutionError(
    cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
    Command: ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata']
    Exit code: 1
    Reason: -
    Stdout: 
    Stderr: 
    2023-09-18 18:29:04,246 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key metadata with /usr/bin/vmtoolsd
    2023-09-18 18:29:04,246 - subp.py[DEBUG]: Running command ['/usr/bin/vmtoolsd', '--cmd', 'info-get guestinfo.metadata'] with allowed return codes [0] (shell=False, capture=True)
    2023-09-18 18:29:04,295 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key metadata.encoding with /usr/bin/vmtoolsd
    2023-09-18 18:29:04,295 - subp.py[DEBUG]: Running command ['/usr/bin/vmtoolsd', '--cmd', 'info-get guestinfo.metadata.encoding'] with allowed return codes [0] (shell=False, capture=True)
    2023-09-18 18:29:04,329 - DataSourceVMware.py[DEBUG]: Getting encoded data for key=guestinfo.metadata, enc=gzip+base64
    2023-09-18 18:29:04,330 - DataSourceVMware.py[DEBUG]: Decoding gzip+base64 format guestinfo.metadata
    2023-09-18 18:29:04,330 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key userdata with /usr/bin/vmtoolsd
    2023-09-18 18:29:04,330 - subp.py[DEBUG]: Running command ['/usr/bin/vmtoolsd', '--cmd', 'info-get guestinfo.userdata'] with allowed return codes [0] (shell=False, capture=True)
    2023-09-18 18:29:04,373 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key userdata.encoding with /usr/bin/vmtoolsd
    2023-09-18 18:29:04,373 - subp.py[DEBUG]: Running command ['/usr/bin/vmtoolsd', '--cmd', 'info-get guestinfo.userdata.encoding'] with allowed return codes [0] (shell=False, capture=True)
    2023-09-18 18:29:04,411 - DataSourceVMware.py[DEBUG]: Getting encoded data for key=guestinfo.userdata, enc=gzip+base64
    2023-09-18 18:29:04,411 - DataSourceVMware.py[DEBUG]: Decoding gzip+base64 format guestinfo.userdata
    2023-09-18 18:29:04,411 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key vendordata with /usr/bin/vmtoolsd
    
  • With /usr/bin/vmware-rpctool still replaced, I validated ds-identify by running sudo rm /var/run/cloud-init/.ds-identify.result /var/run/cloud-init/ds-identify.log and then sudo /usr/lib/cloud-init/ds-identify. The resulting log file showed that the VMware datasource was still detected:

    $ sudo stat /var/run/cloud-init/ds-identify.log 
      File: /var/run/cloud-init/ds-identify.log
      Size: 1360      	Blocks: 8          IO Block: 4096   regular file
    Device: 0,23	Inode: 2318        Links: 1
    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2023-09-18 19:54:51.286717602 +0000
    Modify: 2023-09-18 19:54:48.650490201 +0000
    Change: 2023-09-18 19:54:48.650490201 +0000
     Birth: 2023-09-18 19:54:48.498477087 +0000
    [up 1725229.52s] ds-identify 
    policy loaded: mode=search report=false found=all maybe=all notfound=disabled
    /etc/cloud/cloud.cfg set datasource_list: ['NoCloud', 'ConfigDrive', 'OpenStack', 'VMware', None]
    DMI_PRODUCT_NAME=VMware20,1
    DMI_SYS_VENDOR=VMware, Inc.
    DMI_PRODUCT_SERIAL=VMware-42 1a b5 b3 64 97 90 e6-35 7b d8 97 66 df c3 d0
    DMI_PRODUCT_UUID=b3b51a42-9764-e690-357b-d89766dfc3d0
    PID_1_PRODUCT_NAME=unavailable
    DMI_CHASSIS_ASSET_TAG=No Asset Tag
    DMI_BOARD_NAME=440BX Desktop Reference Platform
    FS_LABELS=
    ISO9660_DEVS=
    KERNEL_CMDLINE=BOOT_IMAGE=/boot/vmlinuz-6.1.10-10.ph5-esx root=PARTUUID=c724cded-cf33-4d1c-b510-59478c322f84 init=/lib/systemd/systemd rcupdate.rcu_expedited=1 rw systemd.show_status=0 quiet noreplace-smp cpu_init_udelay=0 net.ifnames=0 plymouth.enable=0 systemd.unified_cgroup_hierarchy=yes
    VIRT=vmware
    UNAME_KERNEL_NAME=Linux
    UNAME_KERNEL_RELEASE=6.1.10-10.ph5-esx
    UNAME_KERNEL_VERSION=#1-photon SMP Mon Apr 24 22:51:08 UTC 2023
    UNAME_MACHINE=x86_64
    UNAME_NODENAME=photon5-w-tpm
    UNAME_OPERATING_SYSTEM=GNU/Linux
    DSNAME=
    DSLIST=NoCloud ConfigDrive OpenStack VMware None
    MODE=search
    ON_FOUND=all
    ON_MAYBE=all
    ON_NOTFOUND=disabled
    pid=24540 ppid=24539
    is_container=false
    is_ds_enabled(IBMCloud) = false.
    is_ds_enabled(IBMCloud) = false.
    check for 'VMware' returned found
    Found single datasource: VMware
    [up 1725229.68s] returning 0
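
The cloud-init.log excerpt above shows the datasource reading an `.encoding` key alongside each value and then "Decoding gzip+base64 format". A minimal sketch of what that decoding step amounts to, assuming the payload is gzip-compressed and then base64-encoded; the function name `decode_guestinfo` is hypothetical, and the plain `base64` branch is an assumption added for illustration:

```python
import base64
import gzip


def decode_guestinfo(raw: str, encoding: str) -> str:
    """Hypothetical helper: decode a guestinfo payload per its declared encoding."""
    enc = encoding.strip().lower()
    if enc == "gzip+base64":
        # Undo the encoding in reverse order: base64 first, then gzip.
        return gzip.decompress(base64.b64decode(raw)).decode("utf-8")
    if enc == "base64":
        # Assumption: a base64-only encoding is handled the same way, minus gzip.
        return base64.b64decode(raw).decode("utf-8")
    # No (or unrecognized) encoding: treat the value as plain text.
    return raw
```

A matching payload can be produced for testing with `base64.b64encode(gzip.compress(data))`.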
    

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch 4 times, most recently from 80479ce to f9a0ba4, on September 18, 2023 18:34
@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch 7 times, most recently from e306216 to f1930b8, on September 18, 2023 20:34
Contributor Author

akutz commented Sep 18, 2023

@TheRealFalcon I think the integration test is failing for an unrelated reason. I looked into it, and it's a wireguard module test:

FAILED tests/integration_tests/modules/test_wireguard.py::TestWireguardWithoutKmod::test_wireguard_tools_installed

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from f1930b8 to 4fb5978, on September 18, 2023 21:47
Contributor Author

akutz commented Sep 19, 2023

Okay @TheRealFalcon, it's ready for review. Cc @PengpengSun

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from 4fb5978 to cdeb997, on September 19, 2023 17:14
Contributor Author

akutz commented Sep 19, 2023

@TheRealFalcon / @PengpengSun,

I spent an hour refactoring DataSourceVMware a bit to support better unit testing around the fallback case. All tests pass.

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from cdeb997 to 490a255, on September 19, 2023 17:43
@TheRealFalcon TheRealFalcon self-assigned this Sep 20, 2023
Member

@TheRealFalcon TheRealFalcon left a comment

@akutz Thanks for this change! I left some inline comments, but good change overall.

Inline review threads (outdated, resolved):

  • cloudinit/sources/DataSourceOVF.py (3 threads)
  • cloudinit/sources/DataSourceVMware.py (3 threads)
  • doc/rtd/reference/datasources/ovf.rst (1 thread)
  • doc/rtd/reference/datasources/vmware.rst (1 thread)
  • tools/ds-identify (2 threads)
@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch 5 times, most recently from 6f3ea96 to a0386f4, on September 21, 2023 18:32
Contributor Author

akutz commented Sep 21, 2023

Hi @TheRealFalcon,

I am missing something. The tests are passing for me locally, but failing on GH actions.

@TheRealFalcon
Member

@akutz

From the tests/unittests/sources/test_vmware.py::TestDataSourceVMwareIMC::test_get_data_cloudinit_metadata_json failure:

E           cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
E           Command: ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata']
E           Exit code: 255
E           Reason: -
E           Stdout: 
E           Stderr: Error: /usr/bin/vmware-rpctool must be run inside a virtual machine on a VMware hypervisor product.

Are you missing a mock somewhere?

Contributor Author

akutz commented Sep 21, 2023

> @akutz
>
> From the tests/unittests/sources/test_vmware.py::TestDataSourceVMwareIMC::test_get_data_cloudinit_metadata_json failure:
>
> E           cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
> E           Command: ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata']
> E           Exit code: 255
> E           Reason: -
> E           Stdout: 
> E           Stderr: Error: /usr/bin/vmware-rpctool must be run inside a virtual machine on a VMware hypervisor product.
>
> Are you missing a mock somewhere?

Quite possibly? But why would it work locally?

Member

TheRealFalcon commented Sep 21, 2023

> Quite possibly? But why would it work locally?

It looks like it might actually exist on these runners. I don't have it locally and the test passes for me, so I think the missing binary is triggering the normal fallback, but the exit code on the runners is causing us to raise early because of the 255 exit code.

Contributor Author

akutz commented Sep 21, 2023

> > Quite possibly? But why would it work locally?
>
> It looks like it might actually exist on these runners. I don't have it locally and the test passes for me, so I think the missing binary is triggering the normal fallback, but the exit code on the runners is causing us to raise early because of the 255 exit code.

Huh. I don't have it locally either. I tested for it being missing, not existing and returning 255. Let me think about that.

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch 2 times, most recently from 4310dbd to 5cfd96f, on September 21, 2023 20:52
Contributor Author

akutz commented Sep 21, 2023

> > > Quite possibly? But why would it work locally?
> >
> > It looks like it might actually exist on these runners. I don't have it locally and the test passes for me, so I think the missing binary is triggering the normal fallback, but the exit code on the runners is causing us to raise early because of the 255 exit code.
>
> Huh. I don't have it locally either. I tested for it being missing, not existing and returning 255. Let me think about that.

@TheRealFalcon I was able to repro it locally by adding the same command to my PATH and have it exit with 255. My latest change addresses the issue. I did not think there was any issue with raising the exception, but I forgot that this prevents additional sub-platforms from being found.
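
The local-versus-runner difference can be removed from a test by mocking the process call, so the result no longer depends on which binaries the host happens to have. The sketch below is only an illustration of that approach and is not the test added in this PR: `get_value` and `fake_run` are hypothetical, and the sketch patches `subprocess.run` rather than cloud-init's own `subp` helper.

```python
import subprocess
from unittest import mock

RPCTOOL = "/usr/bin/vmware-rpctool"  # path taken from the traceback above
VMTOOLSD = "/usr/bin/vmtoolsd"


def get_value(key):
    """Hypothetical helper: try rpctool, then vmtoolsd; None if both fail."""
    request = f"info-get guestinfo.{key}"
    for args in ([RPCTOOL, request], [VMTOOLSD, "--cmd", request]):
        try:
            done = subprocess.run(args, capture_output=True, text=True)
        except OSError:
            continue  # binary not installed on this host
        if done.returncode == 0:
            return done.stdout.strip()
    return None


def fake_run(args, **kwargs):
    """Simulate a CI runner: rpctool exists but exits 255 outside a VM."""
    if args[0] == RPCTOOL:
        return subprocess.CompletedProcess(args, 255, "", "not in a VM")
    return subprocess.CompletedProcess(args, 0, "hello\n", "")


def test_falls_back_when_rpctool_exits_255():
    with mock.patch("subprocess.run", side_effect=fake_run) as run:
        assert get_value("metadata") == "hello"
    # Both tools were tried, in order, whatever the host has installed.
    assert [c.args[0][0] for c in run.call_args_list] == [RPCTOOL, VMTOOLSD]
```

With the call mocked, the "binary missing" and "binary present but exits 255" cases become two explicit test inputs instead of properties of the machine running the suite.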

@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from 5cfd96f to 6f907f3, on September 21, 2023 20:58
tools/ds-identify (inline review thread; outdated, resolved)
@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from 6f907f3 to fc7b92b, on September 21, 2023 21:06
@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch 3 times, most recently from 58a0268 to 90a3c28, on September 21, 2023 21:51
This patch updates the ds-identify script and the VMware datasource
to fall back to using the vmtoolsd program if vmware-rpctool errors.
@akutz akutz force-pushed the bugfix/vmware-rpctool-fallback branch from 90a3c28 to 964e6ab, on September 22, 2023 14:39
Contributor Author

akutz commented Sep 22, 2023

@TheRealFalcon I verified again that this works on a real system. I force-pushed one more time to add a little extra logging: it was not working on the real system, and the reason was not apparent. The cause turned out to be that the current codebase has b64 in atomic_helper, not in util where it used to be, and my system had the old location, but that error was being swallowed. Anyway, hopefully this is finally good to go :)

Contributor Author

akutz commented Sep 26, 2023

Hi @TheRealFalcon,

Please let me know if there is anything else needed to resolve this PR. Thanks!

Member

@TheRealFalcon TheRealFalcon left a comment

Thanks @akutz! LGTM!

@TheRealFalcon TheRealFalcon merged commit 3a031a7 into canonical:main Sep 26, 2023
26 checks passed
@akutz akutz deleted the bugfix/vmware-rpctool-fallback branch September 26, 2023 22:37
@playerla

Hi,
Have you been able to test with secure RPC? That was the reason behind this MR (#4436).

I just got the daily build cloud-init_23.3.daily-202309271631-d9cdc298~ubuntu20.04.1_all.deb. As intended, it falls back to vmtoolsd, but unexpectedly I still get Permission Denied.

I opened #4475
