Update agentstate to include task network namespace and default interface name to populate task network config #4315

Merged
merged 1 commit into from
Sep 12, 2024

Conversation

mye956
Contributor

@mye956 mye956 commented Sep 4, 2024

Summary

This PR updates the agent state so that it keeps track of the task network namespace path and the default interface name of the corresponding task namespace. This is a follow-up to the following PRs:

where we will be populating the new TaskNetworkConfig struct and its fields for the TaskResponse.

Note: There will be a follow-up PR to actually obtain and populate the DefaultIfname in host mode.

Implementation details

  • agent/api/task/task.go
    • Added a new DefaultIfname field to the Task struct to keep track of the default interface name of the task
    • Added new getters and setters for NetworkNamespace and DefaultIfname
  • agent/engine/docker_task_engine.go
    • Obtaining the task network namespace from the CNI config returned by buildCNIConfigFromTaskContainerAwsvpc and setting it as the value of the task's NetworkNamespace
    • Obtaining the default interface name from the NetworkConfigs of the CNI config returned by buildCNIConfigFromTaskContainerAwsvpc. By default on Linux, the interface name is eth0, and the first entry in the NetworkConfigs list should correspond to the task ENI that was passed in from the task payload. If that entry is missing, it would mean we're trying to start a task in AWSVPC mode without a corresponding task ENI
  • agent/handlers/v4/tmdsstate.go
    • Now setting the values of FaultInjectionEnabled and TaskNetworkConfig for the TaskResponse based on the agent state (before, they were empty)
    • No longer restricting these values based on whether the task is FaultInjectionEnabled, for future proofing
    • If the task is running in host mode, we always set the task network namespace path to host so that it can be used in the concurrent map introduced in "Add read/write lock in the fault handler #4309" for the fault handlers. It won't be used in the actual fault injection implementation
  • ecs-agent/tmds/handlers/v4/state/response.go
    • New function, NewTaskNetworkConfig, to create a new TaskNetworkConfig pointer object based on the passed-in networkMode, networkNamespacePath, and interfaceName. This is called in agent/handlers/v4/tmdsstate.go
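The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Task and TaskNetworkConfig types here are trimmed-down, hypothetical stand-ins (the real response type may be nested differently), and taskNetworkConfigFromState is an invented helper name. Only the getter/setter names and the NewTaskNetworkConfig parameter list come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical, trimmed-down stand-in for the agent's Task struct.
type Task struct {
	lock             sync.RWMutex
	NetworkMode      string // treated as immutable after creation in this sketch
	NetworkNamespace string
	DefaultIfname    string
}

// SetNetworkNamespace records the task network namespace path.
func (t *Task) SetNetworkNamespace(ns string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.NetworkNamespace = ns
}

// GetNetworkNamespace returns the task network namespace path.
func (t *Task) GetNetworkNamespace() string {
	t.lock.RLock()
	defer t.lock.RUnlock()
	return t.NetworkNamespace
}

// SetDefaultIfname records the default interface name of the task namespace.
func (t *Task) SetDefaultIfname(name string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.DefaultIfname = name
}

// GetDefaultIfname returns the default interface name of the task namespace.
func (t *Task) GetDefaultIfname() string {
	t.lock.RLock()
	defer t.lock.RUnlock()
	return t.DefaultIfname
}

// TaskNetworkConfig is a flattened, assumed shape used only for illustration.
type TaskNetworkConfig struct {
	NetworkMode          string
	NetworkNamespacePath string
	InterfaceName        string
}

// NewTaskNetworkConfig builds a config pointer from the three inputs named in the PR description.
func NewTaskNetworkConfig(networkMode, networkNamespacePath, interfaceName string) *TaskNetworkConfig {
	return &TaskNetworkConfig{
		NetworkMode:          networkMode,
		NetworkNamespacePath: networkNamespacePath,
		InterfaceName:        interfaceName,
	}
}

// taskNetworkConfigFromState (hypothetical helper) mirrors the described behavior:
// in host mode the namespace path is always reported as "host".
func taskNetworkConfigFromState(t *Task) *TaskNetworkConfig {
	nsPath := t.GetNetworkNamespace()
	if t.NetworkMode == "host" {
		nsPath = "host"
	}
	return NewTaskNetworkConfig(t.NetworkMode, nsPath, t.GetDefaultIfname())
}

func main() {
	task := &Task{NetworkMode: "awsvpc"}
	task.SetNetworkNamespace("/host/proc/12141/ns/net")
	task.SetDefaultIfname("eth0")
	fmt.Printf("%+v\n", *taskNetworkConfigFromState(task))
}
```

Guarding the fields with a read/write lock is a design assumption here; it keeps concurrent metadata requests from racing with the task engine while it sets the values.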

Testing

  • Added a new test, TestV4GetTaskMetadataWithTaskNetworkConfig, in agent/handlers/task_server_setup_test.go to check that we can create and populate the TaskNetworkConfig of the TaskResponse based on the corresponding task
  • Modified TestProvisionContainerResourcesAwsvpcSetPausePIDInVolumeResources in agent/engine/docker_task_engine_test.go so that it also checks that the DefaultIfname and NetworkNamespace of the task are being set properly

Manual Testing
Tested whether or not the DefaultIfname and NetworkNamespace can be loaded back after an agent restart by first launching a task in AWSVPC mode that will call the TMDS endpoint to get the task metadata.

Note: Additional logging statements were added but not pushed as part of the changes in order to better see the expected behavior/results.

Results:

Task start up

level=debug time=2024-09-05T19:13:00Z msg="Handling http request" method="GET" from="169.254.172.2:42614"
level=info time=2024-09-05T19:13:00Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="d1105a10-be0b-4f8a-82c7-56a19a9ee517" taskARN="arn:aws:ecs:us-west-2:113424923516:task/default/35467b7441554e759180b43f13b6e3eb"
level=info time=2024-09-05T19:13:00Z msg="[DEBUG] Task Network config from task metadata request" networkMode="awsvpc" networkNamespace="/host/proc/12141/ns/net" defaultInterfaceName="eth0"
level=debug time=2024-09-05T19:13:00Z msg="Received non-transition events" task="35467b7441554e759180b43f13b6e3eb"
level=debug time=2024-09-05T19:13:00Z msg="Updating task's known status" task="35467b7441554e759180b43f13b6e3eb"
level=debug time=2024-09-05T19:13:00Z msg="Found container with earliest known status" container="test" knownStatus=RUNNING desiredStatus=RUNNING task="35467b7441554e759180b43f13b6e3eb"
level=debug time=2024-09-05T19:13:00Z msg="Updating task's desired status" taskKnownStatus="RUNNING" taskDesiredStatus="RUNNING" nContainers=2 nENIs=1 taskFamily="tmds-test" taskVersion="4" taskArn="arn:aws:ecs:us-west-2:113424923516:task/default/35467b7441554e759180b43f13b6e3eb"

After agent restart

level=info time=2024-09-05T19:15:08Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2024-09-05T19:15:08Z msg="[DEBUG] Loaded task from local state. Task default interface name: eth0, Task default network namespace: /host/proc/12141/ns/net, Task network mode: awsvpc" module=data.go
level=debug time=2024-09-05T19:15:08Z msg="Setting cluster to default; none configured" module=agent.go
level=info time=2024-09-05T19:15:08Z msg="Cluster was successfully restored" cluster="default"
level=debug time=2024-09-05T19:15:08Z msg="Loading pause container tarball:" image="/images/amazon-ecs-pause.tar"
level=debug time=2024-09-05T19:15:08Z msg="Inspecting container image: " image="amazon/amazon-ecs-pause:0.1.0"
level=debug time=2024-09-05T19:15:08Z msg="Setting up ENI Watcher" module=agent_unix.go
level=info time=2024-09-05T19:15:08Z msg="eni watcher has been initialized" module=watcher_linux.go
level=debug time=2024-09-05T19:15:15Z msg="Handling http request" method="GET" from="169.254.172.2:57386"
level=info time=2024-09-05T19:15:15Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="d1105a10-be0b-4f8a-82c7-56a19a9ee517" taskARN="arn:aws:ecs:us-west-2:113424923516:task/default/35467b7441554e759180b43f13b6e3eb"
level=info time=2024-09-05T19:15:15Z msg="[DEBUG] Task Network config from task metadata request" networkNamespace="/host/proc/12141/ns/net" defaultInterfaceName="eth0" networkMode="awsvpc"

Host Mode

level=debug time=2024-09-05T19:18:09Z msg="Handling http request" method="GET" from="172.31.47.213:40596"
level=info time=2024-09-05T19:18:09Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="d70285f4-d6a1-4538-81b4-c91f92cb9174" taskARN="arn:aws:ecs:us-west-2:113424923516:task/default/1c4a5f45c75c4638a7da1b5133a840b0"
level=info time=2024-09-05T19:18:09Z msg="[DEBUG] Task Network config from task metadata request" networkMode="host" networkNamespace="host" defaultInterfaceName=""

After restart

level=info time=2024-09-05T19:19:03Z msg="[DEBUG] Loaded task from local state. Task default interface name: , Task default network namespace: , Task network mode: host" module=data.go
level=debug time=2024-09-05T19:19:03Z msg="Setting cluster to default; none configured" module=agent.go
level=info time=2024-09-05T19:19:03Z msg="Cluster was successfully restored" cluster="default"
level=debug time=2024-09-05T19:19:03Z msg="Loading pause container tarball:" image="/images/amazon-ecs-pause.tar"
level=debug time=2024-09-05T19:19:09Z msg="Handling http request" method="GET" from="172.31.47.213:57346"
level=info time=2024-09-05T19:19:09Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="d70285f4-d6a1-4538-81b4-c91f92cb9174" taskARN="arn:aws:ecs:us-west-2:113424923516:task/default/1c4a5f45c75c4638a7da1b5133a840b0"
level=info time=2024-09-05T19:19:09Z msg="[DEBUG] Task Network config from task metadata request" networkMode="host" networkNamespace="host" defaultInterfaceName=""
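The restart logs above are consistent with the new fields being persisted along with the rest of the task state. As a rough illustration, assuming the state is serialized as JSON (as the struct tags quoted in the review suggest), a round trip could look like the sketch below. The taskState type, the roundTrip helper, and the NetworkMode tag are hypothetical; only the other three field tags appear in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskState is a hypothetical subset of the persisted task fields.
type taskState struct {
	NetworkMode           string `json:"NetworkMode,omitempty"` // assumed tag, for illustration
	NetworkNamespace      string `json:"NetworkNamespace,omitempty"`
	FaultInjectionEnabled bool   `json:"FaultInjectionEnabled,omitempty"`
	DefaultIfname         string `json:"DefaultIfname,omitempty"`
}

// roundTrip marshals the state and loads it back, similar in spirit to
// saving state before an agent restart and restoring it afterwards.
func roundTrip(in taskState) (taskState, string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return taskState{}, "", err
	}
	var out taskState
	if err := json.Unmarshal(raw, &out); err != nil {
		return taskState{}, "", err
	}
	return out, string(raw), nil
}

func main() {
	states := []taskState{
		{NetworkMode: "awsvpc", NetworkNamespace: "/host/proc/12141/ns/net", DefaultIfname: "eth0"},
		{NetworkMode: "host"}, // namespace and interface name are not stored in host mode
	}
	for _, s := range states {
		restored, raw, err := roundTrip(s)
		if err != nil {
			fmt.Println("round trip failed:", err)
			continue
		}
		fmt.Println(raw)
		fmt.Printf("%+v\n", restored)
	}
}
```

With omitempty, the host-mode task carries neither field in its stored form, which matches the empty values in the "Loaded task from local state" log line; the "host" namespace path seen in the metadata response is filled in at response time instead.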

New tests cover the changes: Yes

Description for the changelog

Feature: Update agent state to initialize and populate task default interface name and network namespace

Additional Information

Does this PR include breaking model changes? If so, Have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mye956 mye956 requested a review from a team as a code owner September 4, 2024 23:03
@mye956 mye956 force-pushed the fis-agentstate branch 7 times, most recently from 6d1fd7b to 37fb05b September 5, 2024 04:05
@mye956 mye956 added the bot/test label Sep 5, 2024
@mye956 mye956 force-pushed the fis-agentstate branch 3 times, most recently from 8c59d8d to b38e6ce September 5, 2024 18:36
@mye956 mye956 changed the title WIP DO NOT REVIEW Update agentstate to include task network namespace and default interface name to populate task network config Sep 5, 2024
@mye956 mye956 added the bot/test label Sep 5, 2024
Contributor

@xxx0624 xxx0624 left a comment

LGTM

NetworkNamespace string `json:"NetworkNamespace,omitempty"`

// TODO: Will need to initialize/set the value in a follow PR
FaultInjectionEnabled bool `json:"FaultInjectionEnabled,omitempty"`

DefaultIfname string `json:"DefaultIfname,omitempty"`
Contributor

nit: it would be good to have a comment about what DefaultIfname means in the different network modes.

Contributor Author

good idea, thanks!

Contributor

it can eth0 -> it can be eth0?

Contributor

Where do we populate DefaultIfname for Host mode?

Contributor

We can also add a comment for expected value for Bridge mode

ecs-agent/tmds/handlers/v4/state/response.go Outdated
xxx0624 previously approved these changes Sep 5, 2024
xxx0624 previously approved these changes Sep 6, 2024
@@ -2358,6 +2358,12 @@ func (engine *DockerTaskEngine) provisionContainerResourcesAwsvpc(task *apitask.
field.TaskID: task.GetID(),
"ip": taskIP,
})
task.SetNetworkNamespace(cniConfig.ContainerNetNS)
// Note: By default, the interface name is set to eth0 within the CNI configs. We can also always assume that the first entry of the CNI network config to be
Contributor

Why is this ordering true? Or is it always config for task ENI? Can there be other Network configs returned in the slice here other than the config for task ENI?

Contributor

Is the name eth0 assigned by Agent? Do we make any assumption on the string being eth0? Are there any assumptions of host interface name?

Contributor Author

@mye956 mye956 Sep 6, 2024

From what I understand, there can only be one ENI that will be used for a task in AWSVPC (and it seems to be true for trunk ENI as well). For reference, this is what I'm referring to on why we can make this assumption -> https://github.com/aws/amazon-ecs-agent/blob/master/agent/api/task/task_linux.go#L285

It does seem like there are other network configs that will be appended/added but the primary task ENI/network config is what's being processed first. FWIW, we're also sort of already making an assumption that the very first entry of the list of task ENI will be the primary network interface here -> https://github.com/aws/amazon-ecs-agent/blob/master/agent/api/task/task.go#L2826
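To make the assumption discussed here concrete, the lookup could be guarded along these lines. This is only a sketch: the Config and NetworkConfig types are simplified, hypothetical stand-ins for the CNI config returned by buildCNIConfigFromTaskContainerAwsvpc, and defaultIfname and errNoTaskENIConfig are invented names.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkConfig is a hypothetical stand-in for one CNI network config entry.
type NetworkConfig struct {
	IfName string
}

// Config is a hypothetical stand-in for the CNI config built for an awsvpc task.
type Config struct {
	ContainerNetNS string
	NetworkConfigs []*NetworkConfig
}

var errNoTaskENIConfig = errors.New("no network config found for the task ENI")

// defaultIfname returns the interface name of the first network config,
// which is assumed to correspond to the primary task ENI.
func defaultIfname(cfg *Config) (string, error) {
	if cfg == nil || len(cfg.NetworkConfigs) == 0 || cfg.NetworkConfigs[0] == nil {
		return "", errNoTaskENIConfig
	}
	return cfg.NetworkConfigs[0].IfName, nil
}

func main() {
	cfg := &Config{
		ContainerNetNS: "/host/proc/12141/ns/net",
		NetworkConfigs: []*NetworkConfig{
			{IfName: "eth0"},  // task ENI, processed first
			{IfName: "other"}, // any additional configs appended afterwards
		},
	}
	name, err := defaultIfname(cfg)
	fmt.Println(name, err)
}
```

Returning an error for an empty list, rather than indexing blindly, covers the case of an AWSVPC task that somehow has no task ENI config.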

Contributor Author

Is the name eth0 assigned by Agent? Do we make any assumption on the string being eth0? Are there any assumptions of host interface name?

Yep, the name eth0 is the default ENI name on Linux and it's set here.

From what I can tell, seems like we might be making an assumption for eth0? -> https://github.com/search?q=repo%3Aaws%2Famazon-ecs-agent%20eth0&type=code

As for host mode, no, there are no such assumptions, and we can't make one since the interface name can vary across platforms and hardware.

Contributor Author

@mye956 mye956 left a comment

@prateekchaudhry

Where do we populate DefaultIfname for Host mode?

This will be done in a near-future PR. Currently, the agent doesn't know the default host network interface.

We can also add a comment for expected value for Bridge mode

It isn't being obtained for the rest of the network modes like bridge in these changes and we don't have plans currently to do so.

@mye956 mye956 merged commit e75627d into aws:dev Sep 12, 2024
40 checks passed