Failed to run NFF-go in AWS EC2 with ena driver #718

Open
guesslin opened this issue Oct 8, 2020 · 2 comments

guesslin commented Oct 8, 2020

Hi, I can't run nff-go on an AWS EC2 instance. DPDK reports a port initialization failure with the ENA driver.

  • error message about DPDK port init failure
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a product of 1
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: ERROR: Cannot init port  0 !
Oct 08 02:56:02 ip-172-31-41-87 router[18200]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a 
Full message
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: ------------***-------- Initializing DPDK --------***------------
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: Detected 2 lcore(s)
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: Detected 1 NUMA nodes
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: Selected IOVA mode 'PA'
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: No available hugepages reported in hugepages-1048576kB
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: Probing VFIO support...
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL: Probing VFIO support...
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL:   Invalid NUMA socket, default to 0
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL:   Invalid NUMA socket, default to 0
Oct 08 02:56:01 ip-172-31-41-87 router[18195]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL:   Invalid NUMA socket, default to 0
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL:   Invalid NUMA socket, default to 0
Oct 08 02:56:01 ip-172-31-41-87 router[18200]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: PMD: Placement policy: Regular
Oct 08 02:56:02 ip-172-31-41-87 router[18200]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 08 02:56:02 ip-172-31-41-87 router[18200]: PMD: Placement policy: Regular
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: ------------***------ Initializing scheduler -----***------------
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: DEBUG: Scheduler can use cores: [0 1]
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: ------------***---------- Creating ports ---------***------------
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a product of 1
Oct 08 02:56:02 ip-172-31-41-87 router[18195]: ERROR: Cannot init port  0 !
Oct 08 02:56:02 ip-172-31-41-87 router[18200]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a product of 1

I checked, and the code at https://github.com/DPDK/dpdk/blob/main/lib/librte_ethdev/rte_ethdev.c#L2019-L2034 generates this error message. nff-go/internal/low/low.h sets nb_tx_desc to 2048.
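
For context, the check behind this message can be paraphrased as follows. This is a minimal Go sketch, not DPDK's actual C code: the function name `checkDescCount` and its parameters are made up for illustration, and the limits are the ones printed in the log above (maximum 1024, minimum 128, alignment 1).

```go
package main

import "fmt"

// checkDescCount mirrors the three conditions named in the error message:
// the descriptor count must not exceed hi, must not be below lo, and must
// be a multiple of align. Hypothetical helper, not part of DPDK or nff-go.
func checkDescCount(n, lo, hi, align uint16) error {
	if n > hi || n < lo || (align != 0 && n%align != 0) {
		return fmt.Errorf("invalid value for nb_tx_desc(=%d), should be: <= %d, >= %d, and a product of %d",
			n, hi, lo, align)
	}
	return nil
}

func main() {
	// Limits as reported for the ENA port in the log above.
	fmt.Println(checkDescCount(2048, 128, 1024, 1)) // rejected: 2048 > 1024
	fmt.Println(checkDescCount(1024, 128, 1024, 1)) // accepted
}
```

With these limits, any value from 128 to 1024 passes, which is why lowering the ring size is the obvious thing to try.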

I tried reducing TX_RING_SIZE to 1024. That produced a different warning, and I still can't process packets from the DPDK flow.

  • error message
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: WARNING: Can't start new clone for segment1 instance 0
Full message
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: ------------***-------- Initializing DPDK --------***------------
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: Detected 2 lcore(s)
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: Detected 1 NUMA nodes
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: Selected IOVA mode 'PA'
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: No available hugepages reported in hugepages-1048576kB
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: Probing VFIO support...
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL: Probing VFIO support...
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL:   Invalid NUMA socket, default to 0
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL:   Invalid NUMA socket, default to 0
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL:   Invalid NUMA socket, default to 0
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL:   Invalid NUMA socket, default to 0
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: PMD: Placement policy: Regular
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 08 04:55:31 ip-172-31-41-87 router[20235]: PMD: Placement policy: Regular
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: ------------***------ Initializing scheduler -----***------------
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Scheduler can use cores: [0 1]
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: ------------***---------- Creating ports ---------***------------
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Port 0 MAC address: 06:10:b8:ab:99:db
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: ------------***------ Starting FlowFunctions -----***------------
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Start SCHEDULER at 0 core
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Start STOP at scheduler 0 core
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Start new instance for receiverPort
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: 1
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Start new clone for receiverPort
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: 1 instance 0 at 1 core
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: DEBUG: Start new instance for segment1
Oct 08 04:55:31 ip-172-31-41-87 router[20230]: WARNING: Can't start new clone for segment1 instance 0
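
One possible reading of this log, offered as an assumption rather than anything confirmed in this thread: the scheduler reports only cores [0 1], core 0 runs the scheduler and core 1 runs the receiver, so no core may be left for segment1. The Go sketch below models that bookkeeping with a hypothetical `corePool` type. It is not nff-go's scheduler code.

```go
package main

import (
	"errors"
	"fmt"
)

// corePool is a hypothetical model of a fixed set of CPU cores that are
// handed out one at a time and never shared.
type corePool struct{ free []int }

// take returns the next free core, or an error when none are left.
func (p *corePool) take() (int, error) {
	if len(p.free) == 0 {
		return 0, errors.New("no free core")
	}
	c := p.free[0]
	p.free = p.free[1:]
	return c, nil
}

func main() {
	// Two cores, as in "Scheduler can use cores: [0 1]".
	pool := &corePool{free: []int{0, 1}}
	for _, name := range []string{"scheduler", "receiverPort1", "segment1"} {
		if c, err := pool.take(); err != nil {
			fmt.Printf("WARNING: can't start %s: %v\n", name, err)
		} else {
			fmt.Printf("start %s at core %d\n", name, c)
		}
	}
}
```

Under this assumption, the third component is the first one to find the pool empty, which matches the order of messages in the log.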

Here's the information about the environment.

  • Linux distribution: Ubuntu 18.04 LTS
  • AWS instance type: t3.large
  • Linux kernel version: 4.15.0-1065-aws
  • nff-go version: v0.9.2
  • ethtool -i ens6
driver: ena
version: 2.2.10g
firmware-version:
expansion-rom-version:
bus-info: 0000:00:06.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
  • ifconfig
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.41.87  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::4f9:c1ff:fee6:a36f  prefixlen 64  scopeid 0x20<link>
        ether 06:f9:c1:e6:a3:6f  txqueuelen 1000  (Ethernet)
        RX packets 146118  bytes 189039782 (189.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 49107  bytes 4985206 (4.9 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.47.232  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::410:b8ff:feab:99db  prefixlen 64  scopeid 0x20<link>
        ether 06:10:b8:ab:99:db  txqueuelen 1000  (Ethernet)
        RX packets 192  bytes 15232 (15.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 82  bytes 4244 (4.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3404  bytes 254476 (254.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3404  bytes 254476 (254.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061
00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
00:06.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
@gshimansky
Contributor

It looks like the ENA driver doesn't support this number of TX rings. You can try passing an appropriate value in the arguments to SystemInit: https://github.com/intel-go/nff-go/blob/7ff09bf9d84c823f55fc99f770be6ea7ceeedb1c/flow/flow.go#L587

@guesslin
Contributor Author

Hi @gshimansky, I tried passing different values to TXQueuesNumberPerPort in the flow.Config (0 ~ 4), but the problem still occurs.

config := &flow.Config{
        HWTXChecksum:          true,
        TXQueuesNumberPerPort: values,        // XXX: tried [0-8], but didn't change the error message
}
flow.SystemInit(config)
journal logs
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: ------------***-------- Initializing DPDK --------***------------
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: Detected 2 lcore(s)
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: Detected 1 NUMA nodes
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: Selected IOVA mode 'PA'
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: No available hugepages reported in hugepages-1048576kB
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: Probing VFIO support...
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL: Probing VFIO support...
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL:   Invalid NUMA socket, default to 0
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL:   Invalid NUMA socket, default to 0
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL:   Invalid NUMA socket, default to 0
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL: PCI device 0000:00:06.0 on NUMA socket -1
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL:   Invalid NUMA socket, default to 0
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: EAL:   probe driver: 1d0f:ec20 net_ena
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: PMD: Placement policy: Regular
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: PMD: LLQ is not supported. Fallback to host mode policy.
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: PMD: Placement policy: Regular
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: ------------***------ Initializing scheduler -----***------------
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: DEBUG: Scheduler can use cores: [0 1]
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: ------------***---------- Creating ports ---------***------------
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a product of 1
Oct 13 06:26:05 ip-172-31-41-87 router[2386]: ERROR: Cannot init port  0 !
Oct 13 06:26:05 ip-172-31-41-87 router[2391]: Invalid value for nb_tx_desc(=2048), should be: <= 1024, >= 128, and a product of 1

If I set TXQueuesNumberPerPort to 1024, I get a different message, this time about the number of TX queues.

Oct 13 07:04:23 ip-172-31-41-87 router[3394]: Warning! Port 0 does not support requested number of TX queues 1024. Setting number of TX queues to 8
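
The warning suggests that TXQueuesNumberPerPort controls how many TX queues are created, which is a separate setting from the number of descriptors in each queue (nb_tx_desc). A minimal Go sketch of the clamping that the message describes, using a hypothetical `clampTXQueues` helper and the limit of 8 taken from the log line:

```go
package main

import "fmt"

// clampTXQueues caps the requested queue count at what the port supports.
// Hypothetical helper for illustration; not the nff-go implementation.
func clampTXQueues(requested, maxSupported int) int {
	if requested > maxSupported {
		fmt.Printf("Warning! Port does not support requested number of TX queues %d. Setting number of TX queues to %d\n",
			requested, maxSupported)
		return maxSupported
	}
	return requested
}

func main() {
	fmt.Println(clampTXQueues(1024, 8)) // capped at the port limit
	fmt.Println(clampTXQueues(4, 8))    // within the limit, unchanged
}
```

Because the queue count and the per-queue descriptor count are checked independently, changing one would not be expected to clear an error about the other.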

The problem seems to be that the ENA driver can't support the TX ring size (2048) defined in https://github.com/intel-go/nff-go/blob/v0.9.2/internal/low/low.h#L37-L39

Is there any way I can configure the TX_RING_SIZE correctly?
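
One way to think about a "correct" value is to fit the requested ring size to the limits the driver reports. The Go sketch below is an illustration only: `fitRingSize` is a hypothetical helper, not an nff-go or DPDK API, and the limits (128, 1024, alignment 1) are copied from the error message. It also assumes the lower limit is itself a multiple of the alignment.

```go
package main

import "fmt"

// fitRingSize clamps want into [lo, hi] and then rounds down to a multiple
// of align so the result never exceeds hi. Hypothetical helper; assumes lo
// is already a multiple of align.
func fitRingSize(want, lo, hi, align int) int {
	if want < lo {
		want = lo
	}
	if want > hi {
		want = hi
	}
	if align > 1 {
		want -= want % align
	}
	return want
}

func main() {
	// nff-go v0.9.2 asks for 2048; the ENA port accepts 128..1024.
	fmt.Println(fitRingSize(2048, 128, 1024, 1))
	// A value already inside the range is left alone.
	fmt.Println(fitRingSize(512, 128, 1024, 1))
}
```

DPDK itself offers `rte_eth_dev_adjust_nb_rx_tx_desc` for adjusting descriptor counts to a device's limits; whether and how nff-go could call it is not covered in this thread.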

Update

I tried again: I set TX_RING_SIZE with a patch and changed the EC2 instance type from t3.large to t3.xlarge, and now I can run nff-go.

Performance with iperf3
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.98 GBytes  2.56 Gbits/sec    5             sender
[  4]   0.00-10.00  sec  2.98 GBytes  2.56 Gbits/sec                  receiver
