Improve the bandwidth/throughput calculation on Flow Aggregator side #2211

Closed
dreamtalen opened this issue May 27, 2021 · 2 comments · Fixed by #2692
Labels
area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator area/flow-visibility Issues or PRs related to flow visibility support in Antrea kind/design Categorizes issue or PR as related to design. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@dreamtalen
Contributor

Describe what you are trying to solve
While working on #1802, we found that using the octetDeltaCountFromSourceNode and flowEndSeconds fields to calculate bandwidth on the Flow Aggregator side can be wrong in some corner cases. The reason is that flowEndSeconds is updated whenever the Flow Aggregator receives a record from the Flow Exporter on either the source or the destination node, whereas octetDeltaCountFromSourceNode is updated only when a record from the Flow Exporter on the source node arrives.

For example, in one run of the Flow Aggregator E2E test, the iperf duration is 12s, the Flow Exporter active timeout is 2s, and the Flow Aggregator active timeout is 4s. The Flow Exporters on both the source and destination nodes export records at 2s, 4s, 6s, 8s, 10s, and 12s after the iperf traffic begins, and the Flow Aggregator exports records at around 6s, 10s, and 14s. At 6s, we expect the Flow Aggregator to aggregate the 3 records from the Exporter that were exported at 2s, 4s, and 6s. However, it may happen that by 6s the Flow Aggregator has received the 6s record only from the Exporter on the destination node. It then updates flowEndSeconds to 6s, while octetDeltaCountFromSourceNode still reflects the traffic up to 4s. The bandwidth calculated in this case will be wrong.
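The scenario above can be sketched in a few lines of Python. This is only a simplified model of the aggregated fields, not the actual Flow Aggregator or go-ipfix code. The constant traffic rate of 1,000,000 bytes/s, the `ExporterRecord` type, the helper functions, and the choice to measure the interval from the flow start at 0s are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExporterRecord:
    node: str               # "source" or "destination"
    flow_end_seconds: int   # seconds since the iperf traffic began
    octet_delta_count: int  # bytes counted since the previous export


RATE = 1_000_000      # hypothetical constant traffic rate, bytes per second
EXPORTER_TIMEOUT = 2  # Flow Exporter active timeout from the example, seconds


def exporter_records(node, export_times):
    """Build one record per export time, each covering one exporter timeout."""
    return [ExporterRecord(node, t, RATE * EXPORTER_TIMEOUT) for t in export_times]


def aggregate(records):
    """Simplified model: flowEndSeconds follows any record, octets only source records."""
    flow_end_seconds = max(r.flow_end_seconds for r in records)
    octets_from_source = sum(
        r.octet_delta_count for r in records if r.node == "source"
    )
    return flow_end_seconds, octets_from_source


# Corner case: at the first Aggregator export (around 6s), the 6s record
# from the source node has not arrived yet, but the destination one has.
received = exporter_records("source", [2, 4]) + exporter_records("destination", [2, 4, 6])
flow_end_seconds, octets_from_source = aggregate(received)

flow_start_seconds = 0  # assumption: interval measured from the flow start
computed_bandwidth = octets_from_source / (flow_end_seconds - flow_start_seconds)
print(computed_bandwidth, "bytes/s computed vs", RATE, "bytes/s actual")
```

In this model the byte count covers 4s of traffic while the interval covers 6s, so the computed bandwidth comes out below the actual rate.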

So in that PR, we chose 3.5s as the active timeout of the Flow Aggregator, and we expect 3 aggregated records at 5.5s, 9s, and 12.5s after the iperf traffic begins, which aggregate the Exporter records at (2s, 4s), (6s, 8s), and (10s, 12s) respectively. This significantly reduces the likelihood of hitting the corner case, but it doesn't solve the problem completely.
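To see why 3.5s helps, here is a rough sketch that groups the Exporter records into Aggregator export windows for both timeout values. It assumes the Aggregator's active timer starts when the first Exporter record arrives at 2s, which matches the export times quoted above but is an assumption of this sketch; `aggregation_windows` and `race_margin` are hypothetical helpers, not Antrea functions.

```python
def aggregation_windows(agg_timeout, exporter_times, first_arrival):
    """Return (aggregator_export_time, exporter_record_times) pairs.

    Assumption: the first Aggregator export happens agg_timeout seconds
    after the first Exporter record arrives, then every agg_timeout seconds.
    """
    windows = []
    remaining = sorted(exporter_times)
    export_time = first_arrival + agg_timeout
    while remaining:
        batch = [t for t in remaining if t <= export_time]
        remaining = [t for t in remaining if t > export_time]
        windows.append((export_time, batch))
        export_time += agg_timeout
    return windows


def race_margin(windows, exporter_times):
    """Smallest gap between any Aggregator export and any Exporter export."""
    return min(abs(export - t) for export, _ in windows for t in exporter_times)


EXPORTER_TIMES = [2, 4, 6, 8, 10, 12]  # Exporter active timeout of 2s, 12s of iperf

windows_4s = aggregation_windows(4, EXPORTER_TIMES, first_arrival=2)
windows_3_5s = aggregation_windows(3.5, EXPORTER_TIMES, first_arrival=2)

for name, windows in (("4s", windows_4s), ("3.5s", windows_3_5s)):
    print(name, windows, "margin:", race_margin(windows, EXPORTER_TIMES))
```

With a 4s timeout, two Aggregator exports coincide with Exporter exports (margin 0), so a late source record can easily miss its window. With 3.5s the closest pair is 0.5s apart, which makes the race less likely but still possible.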

Also, we cannot control the Flow Aggregator active timeout configured by the user, and the ELK flow collector relies on these fields to calculate bandwidth. So it's necessary to solve this problem.

Describe the solution you have in mind
So we are thinking about replacing the flowEndSeconds field in the Flow Aggregator with two fields, flowEndSecondsFromSource and flowEndSecondsFromDestination. This way, we could divide octetDeltaCountFromSourceNode by the difference between the flowEndSecondsFromSource values of two consecutive flow records to get the correct bandwidth on the Flow Aggregator side.
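A minimal sketch of the proposed calculation follows. The `AggregatedRecord` type and `source_bandwidth` helper are hypothetical, the sample values assume a constant 1,000,000 bytes/s flow, and the "old" calculation models flowEndSeconds as the later of the two per-node end times, which is a simplification of the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregatedRecord:
    flow_end_seconds_from_source: int
    flow_end_seconds_from_destination: int
    octet_delta_count_from_source_node: int  # bytes since the previous aggregated record


def source_bandwidth(prev: AggregatedRecord, cur: AggregatedRecord) -> Optional[float]:
    """Bytes per second, using only timestamps reported by the source node."""
    interval = cur.flow_end_seconds_from_source - prev.flow_end_seconds_from_source
    if interval <= 0:
        # No new source record since the previous aggregated record.
        return None
    return cur.octet_delta_count_from_source_node / interval


# First aggregated record: the 6s source record was late, the destination one was not.
prev = AggregatedRecord(4, 6, 4_000_000)
# Second aggregated record: source records for 6s, 8s, and 10s have all arrived.
cur = AggregatedRecord(10, 10, 6_000_000)

proposed = source_bandwidth(prev, cur)

# Old behavior (simplified): a single flowEndSeconds updated by either node.
old_interval = max(cur.flow_end_seconds_from_source, cur.flow_end_seconds_from_destination) - max(
    prev.flow_end_seconds_from_source, prev.flow_end_seconds_from_destination
)
old = cur.octet_delta_count_from_source_node / old_interval

print("proposed:", proposed, "bytes/s, old:", old, "bytes/s")
```

Because the byte count and the interval now both come from the source node, a late source record only shifts bytes into the next aggregated record together with the matching time span. Multiply the result by 8 if bits per second are needed.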

Test plan
Update the Flow Aggregator E2E bandwidth tests to use the difference between the flow updated times of the previous record and the current record as the interval, instead of the hardcoded 2x Exporter active timeout. Also try other Flow Aggregator active timeout values to cover the corner cases.

@dreamtalen dreamtalen added kind/design Categorizes issue or PR as related to design. area/flow-visibility Issues or PRs related to flow visibility support in Antrea area/component/flow-aggregator labels May 27, 2021
@dreamtalen dreamtalen self-assigned this Jun 2, 2021
@zyiou zyiou added area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator and removed area/component/flow-aggregator labels Jun 9, 2021
@heanlan heanlan assigned heanlan and unassigned dreamtalen Aug 26, 2021
@zyiou
Contributor

zyiou commented Aug 26, 2021

Another solution is to make octetDeltaCount reflect a more accurate statistic across the source and destination flow records, one that corresponds to the current flowEndSeconds. We have an issue tracking this in go-ipfix: vmware/go-ipfix#250

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2021
3 participants