Improve the bandwidth/throughput calculation on Flow Aggregator side #2211
Labels
area/flow-visibility/aggregator
area/flow-visibility
kind/design
lifecycle/stale
Describe what you are trying to solve
When working on #1802, we found that using the `octetDeltaCountFromSourceNode` and `flowEndSeconds` fields to calculate bandwidth on the Flow Aggregator side may give wrong results in some corner cases. This is because `flowEndSeconds` is updated whenever the Flow Aggregator receives a record from the Flow Exporter on either the source or the destination node, whereas `octetDeltaCountFromSourceNode` is only updated when a record from the Flow Exporter on the source node arrives.
For example, in one run of the Flow Aggregator E2E test, the iperf duration is 12s, the Flow Exporter active timeout is 2s, and the Flow Aggregator active timeout is 4s. The Flow Exporters on both the source and destination nodes export records at 2s, 4s, 6s, 8s, 10s, and 12s after the iperf traffic begins, and the Flow Aggregator exports records at around 6s, 10s, and 14s. At 6s, we expect the Flow Aggregator to have aggregated the 3 records exported at 2s, 4s, and 6s. However, it may happen that by 6s the Flow Aggregator has received the 6s record only from the Exporter on the destination node. It then updates `flowEndSeconds` to 6s, while `octetDeltaCountFromSourceNode` still holds the value from the 4s source record, so the calculated bandwidth is wrong.
So in that PR, we chose 3.5s as the Flow Aggregator active timeout and expect 3 aggregated records at 5.5s, 9s, and 12.5s after the iperf traffic begins, which aggregate the records exported at (2s, 4s), (6s, 8s), and (10s, 12s) respectively. This significantly decreases the likelihood of the corner case but does not eliminate it.
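To make the corner case concrete, here is a minimal Go simulation of the ~6s export in the example above. It is a sketch, not Flow Aggregator code: `exporterRecord`, `aggregatedRecord`, `add`, and `bandwidth` are hypothetical names, the iperf rate is assumed constant at 1000 bytes/s, and the first interval is assumed to start when the traffic begins (0s). Only the field names `flowEndSeconds` and `octetDeltaCountFromSourceNode` come from the issue.

```go
package main

import "fmt"

// exporterRecord is a hypothetical, simplified view of one record from a
// Flow Exporter: which node sent it, its export time, and its octet delta.
type exporterRecord struct {
	fromSource bool
	endSeconds int64 // export time, seconds after the iperf traffic begins
	octetDelta uint64
}

// aggregatedRecord models the behavior described above: one flowEndSeconds
// shared by both nodes, and an octet delta fed only by source-node records.
type aggregatedRecord struct {
	flowEndSeconds                int64
	octetDeltaCountFromSourceNode uint64
}

func (a *aggregatedRecord) add(r exporterRecord) {
	// flowEndSeconds advances for records from either node.
	if r.endSeconds > a.flowEndSeconds {
		a.flowEndSeconds = r.endSeconds
	}
	// The source octet delta only changes when a source-node record arrives.
	if r.fromSource {
		a.octetDeltaCountFromSourceNode += r.octetDelta
	}
}

// bandwidth returns bytes per second over the interval (prevEnd, end].
func bandwidth(octets uint64, prevEnd, end int64) float64 {
	return float64(octets) / float64(end-prevEnd)
}

func main() {
	const rate = 1000 // assumed constant iperf rate, bytes per second

	// Records received by the Flow Aggregator before its export at ~6s.
	// The destination record from 6s has arrived; the source one has not.
	received := []exporterRecord{
		{fromSource: true, endSeconds: 2, octetDelta: 2 * rate},
		{fromSource: false, endSeconds: 2, octetDelta: 2 * rate},
		{fromSource: true, endSeconds: 4, octetDelta: 2 * rate},
		{fromSource: false, endSeconds: 4, octetDelta: 2 * rate},
		{fromSource: false, endSeconds: 6, octetDelta: 2 * rate},
	}

	var agg aggregatedRecord
	for _, r := range received {
		agg.add(r)
	}

	// Traffic started at 0s, so the first interval starts there.
	got := bandwidth(agg.octetDeltaCountFromSourceNode, 0, agg.flowEndSeconds)
	fmt.Printf("flowEndSeconds=%ds octets=%d bandwidth=%.1f B/s (actual %d B/s)\n",
		agg.flowEndSeconds, agg.octetDeltaCountFromSourceNode, got, rate)
}
```

Running it prints `flowEndSeconds=6s octets=4000 bandwidth=666.7 B/s (actual 1000 B/s)`: the 4000 bytes cover only 4s of source-side traffic, but they are divided by 6s because the destination record moved `flowEndSeconds` forward.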
Also, we cannot control the Flow Aggregator active timeout configured by the user, and the ELK flow collector relies on these fields to calculate bandwidth, so it is necessary to solve this problem.
Describe the solution you have in mind
So we are thinking about using two fields, `flowEndSecondsFromSource` and `flowEndSecondsFromDestination`, to replace the `flowEndSeconds` field in the Flow Aggregator. This way, we can divide `octetDeltaCountFromSourceNode` by the difference between the `flowEndSecondsFromSource` values of two consecutive flow records to get the correct bandwidth on the Flow Aggregator side.
Test plan
Update the Flow Aggregator E2E bandwidth tests to use the flow update times of the last record and the current record as the interval, instead of a hardcoded 2x the exporter active timeout. Also try other Flow Aggregator active timeout values to cover the corner cases.
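The proposal and the test plan can be sketched together in Go. This is a hypothetical model, not Antrea's implementation: the struct, `update`, `sourceBandwidth`, and `withinTolerance` are illustrative helpers, the rate is assumed constant at 1000 bytes/s, the source octet delta is assumed to reset after each aggregated export, and the 10% tolerance is arbitrary. It replays the same late-arrival case: with the interval taken from consecutive `flowEndSecondsFromSource` values, both aggregated records give the expected rate, while the hardcoded 2x exporter active timeout (4s) over-reports the second one.

```go
package main

import (
	"fmt"
	"math"
)

// aggregatedRecord sketches the proposal: flowEndSeconds is replaced by one
// end time per side. Field names follow the issue; the struct is hypothetical.
type aggregatedRecord struct {
	flowEndSecondsFromSource      int64
	flowEndSecondsFromDestination int64
	octetDeltaCountFromSourceNode uint64
}

// update applies one Flow Exporter record. Only the sending side's end time
// moves, so flowEndSecondsFromSource always matches the source octet delta.
func (a *aggregatedRecord) update(fromSource bool, endSeconds int64, octetDelta uint64) {
	if fromSource {
		a.flowEndSecondsFromSource = endSeconds
		a.octetDeltaCountFromSourceNode += octetDelta
	} else {
		a.flowEndSecondsFromDestination = endSeconds
	}
}

// sourceBandwidth divides the source octet delta of cur by the difference of
// flowEndSecondsFromSource between two consecutive aggregated records.
func sourceBandwidth(prev, cur aggregatedRecord) (float64, error) {
	interval := cur.flowEndSecondsFromSource - prev.flowEndSecondsFromSource
	if interval <= 0 {
		return 0, fmt.Errorf("no new source-side data (interval %ds)", interval)
	}
	return float64(cur.octetDeltaCountFromSourceNode) / float64(interval), nil
}

// withinTolerance reports whether got is within frac (e.g. 0.1 = 10%) of want.
func withinTolerance(got, want, frac float64) bool {
	return math.Abs(got-want) <= frac*want
}

func main() {
	const rate = 1000              // assumed constant rate, bytes per second
	const hardcodedInterval = 4.0  // old test: 2x a 2s exporter active timeout
	var agg, start aggregatedRecord // start: traffic begins at 0s

	// Export at ~6s: the source record exported at 6s is still in flight.
	agg.update(true, 2, 2*rate)
	agg.update(false, 2, 2*rate)
	agg.update(true, 4, 2*rate)
	agg.update(false, 4, 2*rate)
	agg.update(false, 6, 2*rate)
	first := agg
	agg.octetDeltaCountFromSourceNode = 0 // assumed: delta resets per export

	// Export at ~10s: the late 6s source record arrives with the others.
	agg.update(true, 6, 2*rate)
	agg.update(true, 8, 2*rate)
	agg.update(false, 8, 2*rate)
	agg.update(true, 10, 2*rate)
	agg.update(false, 10, 2*rate)
	second := agg

	for i, p := range [][2]aggregatedRecord{{start, first}, {first, second}} {
		bw, err := sourceBandwidth(p[0], p[1])
		if err != nil {
			fmt.Println(err)
			continue
		}
		old := float64(p[1].octetDeltaCountFromSourceNode) / hardcodedInterval
		fmt.Printf("record %d: interval-based %.1f B/s (ok=%t), hardcoded %.1f B/s (ok=%t)\n",
			i+1, bw, withinTolerance(bw, rate, 0.1), old, withinTolerance(old, rate, 0.1))
	}
}
```

Expected output:
record 1: interval-based 1000.0 B/s (ok=true), hardcoded 1000.0 B/s (ok=true)
record 2: interval-based 1000.0 B/s (ok=true), hardcoded 1500.0 B/s (ok=false)
The second record carries 6000 bytes because the late 6s source record is counted there; dividing by the actual 6s source-side interval recovers the rate, while the fixed 4s interval does not.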