iperf3 tests need control over bind address to support tests with NAT'd hosts #1476
Comments
Add support for the bind_address parameter to be read from the iperf3.conf file and use it to override bind address interpretation from the test spec. This enables pscheduler throughput tests against hosts that are deployed in NAT environments where their public IP used by clients is different from their local network address. The config provides the local knowledge to use the correct bind address. Proposed fix for perfsonar#1476
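The override described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the INI-style layout of iperf3.conf, the `[iperf3]` section name, and the helper names `load_bind_override` and `choose_bind_address` are assumptions made for the example. Only the `bind_address` key name comes from the description above.

```python
import configparser

# Hypothetical contents of iperf3.conf. The INI layout and the [iperf3]
# section name are assumptions for this sketch.
SAMPLE_CONF = """
[iperf3]
bind_address = 10.9.8.7
"""


def load_bind_override(conf_text):
    """Return the configured bind_address, or None if it is absent or empty."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    value = parser.get("iperf3", "bind_address", fallback=None)
    if value is None or not value.strip():
        return None
    return value.strip()


def choose_bind_address(spec_address, conf_text):
    """Prefer the locally configured address over the one from the test spec."""
    override = load_bind_override(conf_text)
    return override if override is not None else spec_address


if __name__ == "__main__":
    # 203.0.113.10 stands in for the public address named in the test spec.
    print(choose_bind_address("203.0.113.10", SAMPLE_CONF))
```

The point of the precedence is that only the NAT'd host knows its inside address, so a local setting wins over whatever address the remote side put in the test spec.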
Thanks for creating this issue and for the PR. I'm experiencing the same issue, so would also be interested in a resolution for this issue. |
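The better way to deal with this is:
- Put an entry in the NATted host's /etc/hosts that points the outside host's name at the inside address (e.g., 10.9.8.7 outside.cloud.example.org).
- Make sure the resolver is configured to query hosts before dns.
- Always refer to the host by its FQDN and never by its outside IP when configuring tests.
|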
Thanks @mfeit-internet2 , that's indeed a valid workaround that works. What also works is to add the options However I'd still be interested in a more permanent solution going forward that doesn't require further manual adjustments. |
Thanks Mark. This /etc/hosts-based workaround looks promising. I'll give it a try.
One issue with some of our ps builds is that they don't have registered host names. This is fixable, but it has caused us to favor IP addresses initially for these cases.
On Thu, Oct 10, 2024, 10:46 AM Mark Feit wrote:
The better way to deal with this is:
- Put an entry in the NATted host's /etc/hosts that points the outside host's name at the inside address (e.g., 10.9.8.7 outside.cloud.example.org).
- Make sure the resolver is configured to query hosts before dns.
- Always refer to the host by its FQDN and never by its outside IP when configuring tests.
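One way to check that the hosts entry is taking effect is to ask the system resolver, from the NATted host, what the outside name maps to. The sketch below is only an illustration: the helper names are made up, and 10.9.8.7 is the example inside address from the list above.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated addresses the system resolver gives."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def points_at_private_address(hostname):
    """True if any resolved address is in a private (inside) range."""
    for address in resolved_addresses(hostname):
        # Drop any IPv6 scope suffix such as "%eth0" before parsing.
        if ipaddress.ip_address(address.split("%")[0]).is_private:
            return True
    return False


if __name__ == "__main__":
    # On the NATted host, pass the outside FQDN here; with the hosts entry
    # in place it is expected to resolve to the inside address.
    print(resolved_addresses("10.9.8.7"), points_at_private_address("10.9.8.7"))
```

Because `socket.getaddrinfo` goes through the system resolver, it reflects the hosts-before-dns ordering in a way that a direct DNS query would not.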
|
Hi again, perhaps this should go on a separate issue (let me know if you'd like me to open a new one) but since it's very related I'll post here first: There's the same issue affecting owampd, where it will try to bind to the address specified as the destination. Unfortunately in this case the workaround of using /etc/hosts does not seem to work. With that /etc/hosts edit, iperf3 works just fine, and Is this expected? If there isn't and changing the code is required, would you accept a patch that works similarly to the one in this PR, where it's possible to optionally override the listen address for UDP tests? I could look into writing it if you'd like to see a patch. (Having a config option in this sense I think would make sense, but let me know if you have a specific opinion on how this should be done) Thanks! |
@pllopis That should be filed against the owamp repository. The OWAMP protocol embeds addresses in its payloads and I'd have to think about whether or not making alterations to support NAT would conform to RFC 4656. |
@mfeit-internet2 I've set up an install of 5.1.4 on Jetstream2 with the name-based configuration you recommended for NAT environments (https://js2.ps.rundmz.projects.rcops.dev). This works for reaching the host and performing ad hoc tests, but I'm still not seeing scheduled test results show up in our campus toolkit install. I have our site configured to test every 6 hours but don't see any throughput results showing up. The trace tests appear to report correctly. I have my NAT'd ps node set up with its correct FQDN.
Here are the tests configured by our campus ps node (ps-sd.rc.uab.edu) showing up on my js2 instance:
Here are those tests from the ps-sd side:
I'm noticing that the js2 node reports the tests under its hostname without the domain name. I'm not sure if that's the source of the problem. Do you have any suggestions on where to look to fix this config? |
It appears that throughput results from ps-sd->js2 have started to show up. I'm wondering if these one-directional results relate to the URL hostname vs. FQDN differences noted above. Does the throughput task take its name from the equivalent of the |
Bidirectional tests with the NAT'd node still appear to have problems. I changed the NAT'd device hostname to be the FQDN, rather than relying on the system config to provide the hostname+domain.
This does ensure that the FQDN for the NAT host shows up in its own view of the scheduled test URLs reported by pscheduler, but it doesn't appear to resolve the bidirectional test not working. Inspecting the scheduled tests, it appears they are run and succeed in both directions. This is the record of bidirectional tests registered:
The test URLs show that there are results in both directions. For some reason, however, the web UI of the testing node does not show the test results when the NAT'd node is the source (second test above). Looking at the results of a test shows that the iperf test did execute and generated results: However, I cannot access those results through the web UI dashboard: |
Looking at the tests from the perspective of the NAT'd node suggests it sees the tests as failing.
In the result URL above, the reported failure is due to not being able to reach the started iperf server:
{"added":"2024-11-01T04:34:06+00:00","state":"failed","errors":null,"result":{"diags":"/usr/bin/iperf3 -p 5201 -4 -B js2.ps.rundmz.projects.rcops.dev -c 138.26.220.66 -t 20 --json --rsa-public-key-path /var/pscheduler-server/runner/tmp/tmpf0q07ueb/tmpkp6skygy/public-key --username 9VGmQPkKmNOYAm0jirbR","error":"iperf3 returned an error: unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Connection refused","succeeded":false},"duration":"PT3S","end-time":"2024-11-01T17:13:08+00:00","priority":null,"start-time":"2024-11-01T17:13:05+00:00","limit-diags":"Hints:\n requester: 138.26.220.66\n server: 10.1.221.182\nIdentified as everybody\nClassified as default\nApplication: Defaults applied to non-friendly hosts\n Group 1: Limit 'allowed-tests' passed\n Group 1: Limit 'throughput-default-parallel' passed\n Group 1: Limit 'throughput-default-time' passed\n Group 1: Limit 'throughput-default-udp' passed\n Group 1: Want all, 4/4 passed, 0/4 failed: PASS\n Application PASSES\nPassed one application. Stopping.\nProposal meets limits\nPriority set at 0:\n Initial priority (Set to 0)","participant":0,"result-full":[{"diags":"/usr/bin/iperf3 -p 5201 -4 -B js2.ps.rundmz.projects.rcops.dev -c 138.26.220.66 -t 20 --json --rsa-public-key-path /var/pscheduler-server/runner/tmp/tmpf0q07ueb/tmpkp6skygy/public-key --username 9VGmQPkKmNOYAm0jirbR","error":"iperf3 returned an error: unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Connection refused","succeeded":false},null],"clock-survey":[{"time":"2024-11-01T13:13:34.472013-04:00","offset":5.435943603515625e-05,"source":"ntp","reference":"secondary reference (2) from 130.207.244.240","synchronized":true},{"time":"2024-11-01T12:13:34.788386-05:00","offset":2.258e-06,"source":"chrony","reference":"23.155.40.38","synchronized":true}],"participants":["js2.ps.rundmz.projects.rcops.dev","138.26.220.66"],"result-merged":{"diags":"Participant 0:\n/usr/bin/iperf3 -p 5201 -4 -B js2.ps.rundmz.projects.rcops.dev -c 138.26.220.66 -t 20 --json --rsa-public-key-path /var/pscheduler-server/runner/tmp/tmpf0q07ueb/tmpkp6skygy/public-key --username 9VGmQPkKmNOYAm0jirbR\n","error":"iperf3 returned an error: unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Connection refused","succeeded":false},"state-display":"Failed","participant-data":{"schema":3,"iperf3-version":"3.17.1"},"participant-data-full":[{"schema":3,"iperf3-version":"3.17.1"},{"_auth":null,"schema":3,"server_port":5201,"iperf3-version":"3.17.1"}],"href":"https://js2.ps.rundmz.projects.rcops.dev/pscheduler/tasks/3ebf604f-54d6-4eb6-9ccc-3dbf94f439f3/runs/f4e99390-6cf0-4bf1-9a3b-906172843fab","task-href":"https://js2.ps.rundmz.projects.rcops.dev/pscheduler/tasks/3ebf604f-54d6-4eb6-9ccc-3dbf94f439f3","result-href":"https://js2.ps.rundmz.projects.rcops.dev/pscheduler/tasks/3ebf604f-54d6-4eb6-9ccc-3dbf94f439f3/runs/f4e99390-6cf0-4bf1-9a3b-906172843fab/result"} |
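A run result like the one above can be checked programmatically for the address iperf3 was told to bind to and the address it tried to reach. The following is a small sketch under assumptions: `summarize_run` is a made-up helper, and `SAMPLE` is a trimmed copy of the result above that keeps only the fields the helper reads.

```python
import json
import shlex

# Trimmed from the run result above; only the fields used below are kept.
SAMPLE = json.dumps({
    "state": "failed",
    "result": {
        "diags": "/usr/bin/iperf3 -p 5201 -4 -B js2.ps.rundmz.projects.rcops.dev"
                 " -c 138.26.220.66 -t 20 --json",
        "error": "iperf3 returned an error: unable to connect to server - "
                 "server may have stopped running or use a different port, "
                 "firewall issue, etc.: Connection refused",
        "succeeded": False,
    },
})


def summarize_run(run_json):
    """Pull the bind address, target, state, and error out of a run result."""
    run = json.loads(run_json)
    result = run.get("result") or {}
    # The diags field holds the iperf3 command line as a single string.
    argv = shlex.split(result.get("diags") or "")

    def option_value(flag):
        if flag in argv and argv.index(flag) + 1 < len(argv):
            return argv[argv.index(flag) + 1]
        return None

    return {
        "state": run.get("state"),
        "bind": option_value("-B"),
        "connect_to": option_value("-c"),
        "error": result.get("error"),
    }


if __name__ == "__main__":
    for key, value in summarize_run(SAMPLE).items():
        print(f"{key}: {value}")
```

Seeing the `-B` value next to the `-c` target makes it easier to tell which side of the test a given failure record actually describes.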
An additional, confusing note on the tests reported by the NAT node: for a js2->ps-sd test as seen from the NAT'd node, the URL referenced for the test appears to be for a test in the reverse direction, i.e., data flowing to the js2 node (the listener). Here is a failed test record:
Inspecting the results URL indicates failure but for a test in the opposite direction.
|
Using pscheduler to test throughput against servers running in NAT'd environments, such as cloud hosting, requires the external test participant to use the instance's public IP while the NAT'd server itself uses the instance's internal (often private) IP.
The pscheduler task subcommand and throughput test have some support for specifying bind addresses, but these do not get translated to the correct bind behavior when running iperf3.
An easy option is to avoid specifying the iperf3 bind parameter -B in command invocations. This causes iperf3 to bind on any interface and allows the test to proceed. Unfortunately, the only way to cause iperf3 to be called in this way is to submit pscheduler throughput tasks with only a destination host parameter, e.g.
pscheduler task throughput -d <destination>
These test specifications work to test throughput to and from a NAT'd host but require that the test is submitted from a shell on the source host (so that the source address is implied). This works because the code in run_client() and run_server() is written to not pass the -B bind parameter to iperf3 if there is no source host provided in the test specification.
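The behavior described above amounts to something like the following. It is a simplified sketch, not the actual run_client() from the iperf3 tool: the `build_client_argv` name and its parameters are invented for illustration, and only the iperf3 options themselves (-p, -B, -c, -t, --json) are real.

```python
def build_client_argv(dest, source=None, port=5201, duration=20):
    """Build an iperf3 client command line (hypothetical helper).

    -B is added only when the test spec names a source host. Without it,
    iperf3 lets the operating system choose the local address.
    """
    argv = ["iperf3", "-p", str(port)]
    if source is not None:
        argv += ["-B", source]
    argv += ["-c", dest, "-t", str(duration), "--json"]
    return argv


if __name__ == "__main__":
    # Destination only: no -B, so this works from behind NAT.
    print(" ".join(build_client_argv("ps.example.org")))
    # Source and destination: -B carries whatever address the spec implies.
    print(" ".join(build_client_argv("ps.example.org", source="203.0.113.10")))
```

In the second call the bind address is the one implied by the test spec, which on a NAT'd host is the public address that no local interface actually holds.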
This doesn't work with throughput tests that specify both a source and a destination host. In those cases, the -B parameter is passed to both the server and the client iperf3 invocations and will default to the public IP of the NAT'd host. This causes the iperf3 bind to fail and kills the command. The pscheduler task result then reports a timeout and the test fails. Specifying the various bind parameters of the task subcommand or throughput test doesn't work either, because these do not make it through to the iperf3 command construction in a way that makes sense for the NAT'd use case.
Since scheduled tests and third-party tests specify both source and destination hosts in the test specification, the existing code prevents these tests from working with hosts that are using NAT. This prevents regular testing of hosts in cloud environments.