Add Pktgen Check for CI (#148)
CI will now run Pktgen as an additional test metric.

Commit log:

* Save performance in ci

* Add flask to install

* Allowed for public linting of unauthorized users

* Starting the queue

* Start fixing the queue problem

* Added run mode for future advancements

* Major fixes to ci list system

* Remove debug lines

* Clean up definitions

* Remove exit statement

* Add error handling in run_ci and fix docs

* Create polling thread for requests

* Delete new line

* Account for run mode from #140

* Errors are much more difficult in threads :(

* Smarter than using tuples

* Added event-based handling instead of polling

* Let request handler clear event

* Initialize ci for pktgen

* Huge update to pktgen configuration

* Updated gitignore for log files

* Added speed tester and pktgen in conjunction

* Fix redundancies and bugs

* Disabled flow lookup and created better benchmarks

* Fixed things so nn30 works with pktgen

* Fixes from github comments

* Make results more stable by restarting manager

* Added fixes from review and redundancies

* Fixes from github review

* Additions to readme about CI developments

* Add a section about Pktgen helper nodes

* Fix missed line

* Add compatibility for different interfaces

* Fixes for user input and debugging

* Fixes for Pktgen and binding interface

* Renaming a ton of files

* Fixed symbolic link

* Added benchmark info to README and renamed file

* Fix Pktgen install requested changes

* Address Github change requests

* Adjustments to interface configuration

* Fixed manager debug
kevindweb authored and koolzz committed Aug 6, 2019
1 parent e217d71 commit ca145c6
Showing 21 changed files with 610 additions and 151 deletions.
6 changes: 4 additions & 2 deletions ci/.gitignore
@@ -1,4 +1,3 @@
access_log
config
webhook-config.json
encrypted_secret.bin
@@ -10,4 +9,7 @@ repository
linter-output.txt
*key
*.pub
*.stats
*stats
*out*
*log*
nimbnode*
37 changes: 34 additions & 3 deletions ci/README.md
@@ -69,7 +69,7 @@ The CI process can be broken into multiple steps:

5. Run linter on the checked out code

Runs the `run_linter` function in `helper-functions.sh`
Runs the `run_linter` function in `helper-manager-functions.sh`

6. Clean up and restart all worker nodes

@@ -83,11 +83,15 @@ The CI process can be broken into multiple steps:

Use paramiko to ssh and run `run-workload.py`

9. Acquire results from the worker nodes
9. Run modes are supplied to tell the worker which applications to test

Handle installation with `worker_files/worker.sh` for builds, and set up the manager for performance tests

10. Acquire results from the worker nodes

Use scp to copy the result stat file from worker

10. Submit results as a comment on github
11. Submit results as a comment on github

Uses the `post-msg.py` script

@@ -106,6 +110,33 @@ ProxyPassReverse /onvm-ci/ http://nimbnode44:8080/
```
(Also need to setup github webhook to post to **http://nimbus.seas.gwu.edu/onvm-ci/github-webhook**)

### Public and Private CI Runs

CI now accepts requests from users who are not on the authorized list. The public project keeps a list of GitHub users who are allowed to trigger a full run. Anyone who is able to view the private `-dev` repository can run CI there as well. In `openNetVM`, if a user is not on the list, only the linter and branch checks are executed; the statistics calculations from the worker nodes are skipped.
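
The access rule above can be summarized in a few lines. This is a minimal sketch, not the CI's actual implementation: the names `AUTHORIZED_USERS`, `PUBLIC_REPO`, and `plan_run`, the stage names, and the example usernames are all assumptions made for illustration.

```python
# Hypothetical sketch of the public/private run policy described above.
# The allow-list contents, stage names, and function name are illustrative
# assumptions, not taken from the CI source.
AUTHORIZED_USERS = {"alice", "bob"}  # placeholder usernames
PUBLIC_REPO = "openNetVM"


def plan_run(repo_name, github_user):
    """Return the list of CI stages to execute for a request."""
    # Branch checks and the linter run for every request.
    stages = ["branch_checks", "linter"]
    # Anyone who can reach the private -dev repository gets a full run;
    # in the public repository only allow-listed users do.
    if repo_name != PUBLIC_REPO or github_user in AUTHORIZED_USERS:
        stages.append("worker_statistics")
    return stages
```

With this sketch, a request from an unlisted user on `openNetVM` yields only the branch checks and linter stages, while the same user on the `-dev` repository gets the worker statistics stage as well.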

### Setting Up a Connected Worker

Connecting two nodes is useful for measuring statistics with tools like Pktgen and the mTCP stack. Some setup is required to get a working connection. First, an SFP+ 10Gb Intel cable is required to connect the Network Interface Cards in the two machines. Once the cable is in place, bring up the correct interfaces to establish a stable connection. Some debugging might be required:
- If you don't know which interface listed by `ifconfig -a` is correct, use `ethtool -p <interface name> 120`
- This will blink a light on the interface for 120 seconds (you have to be next to the machine for this to help)
- Do this on both machines to find the names of the interfaces that are linked
- Run `sudo ifconfig <interface name> 11.0.0.1/24 up` on the first machine and `sudo ifconfig <interface name> 11.0.0.2/24 up` on the second
- This gives each interface an IP address, so `ping` knows which address it is supposed to talk to
- If `ping -I <interface> 11.0.0.2` works on the first machine, the link is up. If not, try changing the IP addresses or check `dmesg` for errors
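
For reference, the commands from the list above can be generated per machine so that both sides of the link are configured the same way. This is only a sketch: the function name `link_setup_commands`, the `-c 3` ping count, and the interface name used in the usage note are assumptions, and the code only builds the command strings rather than running them.

```python
import shlex


def link_setup_commands(interface, local_cidr, peer_ip, blink_seconds=120):
    """Return the shell commands for one side of the link, in order.

    Nothing is executed here; the caller decides how to run the commands
    (by hand, over ssh, and so on).
    """
    iface = shlex.quote(interface)
    return [
        # Blink the port LED to confirm which physical port this is
        f"ethtool -p {iface} {blink_seconds}",
        # Assign an address and bring the interface up
        f"sudo ifconfig {iface} {shlex.quote(local_cidr)} up",
        # Check that the peer answers over this specific interface
        # (-c 3 is an added assumption so the ping terminates)
        f"ping -c 3 -I {iface} {shlex.quote(peer_ip)}",
    ]
```

For example, `link_setup_commands("eth1", "11.0.0.1/24", "11.0.0.2")` gives the commands for the first machine, where `eth1` is a placeholder interface name; swapping the two addresses gives the commands for the second.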

Now that the interfaces are connected, choose which machine will be the CI worker and which will be the helper (running Pktgen, for example). Install Pktgen on the helper node by copying the `ci/install_pktgen` files to that machine's home folder. *Remember that public keys must be created for all new machines*. Store these public keys in a folder named after the server; see the next section on statistics for more information. Run `chmod +x install-pktgen.sh` if the script is not already executable, then run `./install-pktgen.sh` to install everything. If there are dependency errors, the machine might be running a different OS version, so try installing the missing packages manually. Once everything is installed, test ONVM->Pktgen between the machines; if a connection is established, CI should work with no further setup.

### Advanced Statistics

As CI gained more programs to test with, benchmarks were added to track the average performance of each worker. In the future, CI will be able to handle multiple workers running many different tests. Server configurations differ, some with different hardware (Intel X710 vs. X520 NICs, for example), so performance will not be the same across nodes. What matters to CI is that the result of a run is the same or better, not globally across all nodes, but relative to the specific server it ran on. For each worker, create a folder in the `ci` directory named after the worker's address as it appears in the worker list (the manager looks it up as `./$worker_ip/`). For example, if `nimbnode17` is the current worker, a folder with path `/ci/nimbnode17/` should exist. This folder should contain at least three files. The first, a `benchmarks` file (used by the manager), should look similar to this:

```
AVG_SPEED_TESTER_SPEED=40000000
AVG_PKTGEN_SPEED=10000000
AVG_MTCP_SPEED=.230
```
This is a configuration file, sourced by the manager to keep track of `nimbnode17`'s average performance for each test (currently Speed Tester, Pktgen, and mTCP). The other two files in the folder should be the two public keys, one for the worker, and the second for the worker's client server. Check the previous section on setting up a connection for more information.
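
To make the "same or better on this server" rule concrete, the sketch below parses a `benchmarks` file in the format shown above and compares a measured value against the stored average. It is not the code of `pktgen-analysis.py` or `speed-tester-analysis.py`: the function names and the 5% `tolerance` are assumptions, and it assumes a higher number means better performance, which may not hold for every metric (the mTCP value, for instance).

```python
def parse_benchmarks(text):
    """Parse KEY=VALUE lines (the benchmarks file format) into floats."""
    benchmarks = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        benchmarks[key.strip()] = float(value.strip().strip('"'))
    return benchmarks


def meets_benchmark(measured, average, tolerance=0.05):
    """True if `measured` is no more than `tolerance` below `average`.

    The tolerance is a hypothetical allowance for run-to-run noise.
    """
    return measured >= average * (1.0 - tolerance)
```

Because each worker has its own `benchmarks` file, the same measured rate can pass on one node and fail on another, which is the per-server comparison the paragraph above describes.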

### Checking if Online

If you are worried that CI is offline, or want to make sure it is listening for events, check the status URL: `curl http://nimbus.seas.gwu.edu/onvm-ci/status`. If that URL returns 404, CI is offline. Otherwise it will display a message saying it is online.
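
The same check can be scripted, for example from a monitoring job. This is a hedged sketch using only the Python standard library; the function names are hypothetical, and the "unknown" result for codes other than 200 and 404, as well as treating an unreachable host as offline, are assumptions that go slightly beyond the rule stated above.

```python
import urllib.error
import urllib.request

STATUS_URL = "http://nimbus.seas.gwu.edu/onvm-ci/status"


def interpret_status(http_code):
    """Map an HTTP status code to a CI state."""
    if http_code == 404:
        return "offline"
    if http_code == 200:
        return "online"
    # Assumption: any other code is reported instead of guessed at
    return "unknown"


def check_ci(url=STATUS_URL, timeout=5):
    """Fetch the status URL and report whether CI appears to be online."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still available
        return interpret_status(err.code)
    except (urllib.error.URLError, OSError):
        # Assumption: an unreachable host is treated as offline
        return "offline"
```

Calling `check_ci()` performs the request, while `interpret_status` can be reused with a status code obtained some other way.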
2 changes: 1 addition & 1 deletion ci/ci_busy.sh
@@ -2,7 +2,7 @@

set -e

. helper-functions.sh
. helper-manager-functions.sh
SCRIPT_LOC=$(pwd)

print_header "Validating Config File and Sourcing Variables"
45 changes: 6 additions & 39 deletions ci/helper-functions.sh → ci/helper-manager-functions.sh
@@ -38,45 +38,6 @@ print_header() {
echo ""
}

# sets up dpdk, sets env variables, and runs the install script
install_env() {
git submodule sync
git submodule update --init

echo export ONVM_HOME=$(pwd) >> ~/.bashrc
export ONVM_HOME=$(pwd)

cd dpdk

echo export RTE_SDK=$(pwd) >> ~/.bashrc
export RTE_SDK=$(pwd)

echo export RTE_TARGET=x86_64-native-linuxapp-gcc >> ~/.bashrc
export RTE_TARGET=x86_64-native-linuxapp-gcc

echo export ONVM_NUM_HUGEPAGES=1024 >> ~/.bashrc
export ONVM_NUM_HUGEPAGES=1024

echo $RTE_SDK

sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"

cd ../
pwd
. ./scripts/install.sh
}

# makes all onvm code
build_onvm() {
cd onvm
make clean && make
cd ../

cd examples
make clean && make
cd ../
}

# obtains core config in cores.out file
obtain_core_config() {
cd scripts
@@ -140,3 +101,9 @@ run_linter() {
fi
done
}

# inputs are key_file, worker ip address, stats file - in that order 1,2,3
fetch_files() {
scp -i $1 -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $2:$3 ./$2.$3
check_exit_code "ERROR: Failed to fetch results from $2"
}
1 change: 1 addition & 0 deletions ci/install_pktgen/helper-worker-functions.sh
41 changes: 41 additions & 0 deletions ci/install_pktgen/install-pktgen.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
#!/bin/bash

. helper-install-functions.sh

set -e

sudo rm -rf repository

git clone https://github.com/sdnfv/openNetVM.git repository
check_exit_code "ERROR: Failed cloning"

print_header "Installing Dependencies"
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y build-essential linux-headers-$(uname -r) git
sudo apt-get install -y libnuma1
sudo apt-get install -y libnuma-dev
sudo apt-get install -y libpcap-dev
sudo apt-get install -y libreadline-dev

print_header "Installing Lua"
curl -R -O http://www.lua.org/ftp/lua-5.3.5.tar.gz
tar zxf lua-5.3.5.tar.gz
cd lua-5.3.5
sudo make linux test
sudo make install

# step back out of the lua source directory before entering the clone
cd ../repository

print_header "Installing Environment"
install_env $RUN_PKT
check_exit_code "ERROR: Installing environment failed"

print_header "Make pktgen-dpdk"
cd ~/repository/tools/Pktgen/pktgen-dpdk/
make

print_header "Updating lua script"
cp ~/pktgen-timed-config.lua ~/repository/tools/Pktgen/openNetVM-Scripts/pktgen-config.lua

print_header "Pktgen installed"
110 changes: 110 additions & 0 deletions ci/install_pktgen/pktgen-timed-config.lua
@@ -0,0 +1,110 @@
-- openNetVM
-- https://github.com/sdnfv/openNetVM
--
-- BSD LICENSE
--
-- Copyright(c)
-- 2015-2016 George Washington University
-- 2015-2016 University of California Riverside
-- All rights reserved.

-- Redistribution and use in source and binary forms, with or without
-- modification, are permitted provided that the following conditions
-- are met:

-- Redistributions of source code must retain the above copyright
-- notice, this list of conditions and the following disclaimer.
-- Redistributions in binary form must reproduce the above copyright
-- notice, this list of conditions and the following disclaimer in
-- the documentation and/or other materials provided with the
-- distribution.
-- The name of the author may not be used to endorse or promote
-- products derived from this software without specific prior
-- written permission.

-- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-- "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-- LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-- A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-- OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-- SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-- LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-- DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-- THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-- (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-- OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-- Change any of the settings below to configure Pktgen-DPDK

-- A list of the test script for Pktgen and Lua.
-- Each command somewhat mirrors the pktgen command line versions.
-- A couple of the arguments have been changed to be more like the others.

package.path = package.path ..";?.lua;test/?.lua;app/?.lua;"

require "Pktgen"

local function doWait(port, waitTime)
    pktgen.delay(1000);

    -- Output file for the per-second receive rates (kept local to this function)
    local pkt_rate_file = io.open("port_stats", "w");

    if ( waitTime == 0 ) then
        -- Close the file before returning early so the handle is not leaked
        pkt_rate_file:close();
        return;
    end
    waitTime = waitTime - 1;

    -- Try to wait for the total number of packets to be sent.
    local idx = 0;
    while( idx < waitTime ) do
        -- Write port stats to output file separated by line
        pkt_rate_file:write(pktgen.portStats("all", "rate")[0]["pkts_rx"] .. "\n");
        idx = idx + 1;

        local sending = pktgen.isSending(port);
        if ( sending[tonumber(port)] == "n" ) then
            break;
        end
        pktgen.delay(1000);
    end

    pkt_rate_file:close()
end

printf("Lua Version : %s\n", pktgen.info.Lua_Version);
printf("Pktgen Version : %s\n", pktgen.info.Pktgen_Version);
printf("Pktgen Copyright : %s\n", pktgen.info.Pktgen_Copyright);

prints("pktgen.info", pktgen.info);

printf("Port Count %d\n", pktgen.portCount());
printf("Total port Count %d\n", pktgen.totalPorts());


-- set up a mac address to set flow to
--
-- TO DO LIST:
--
-- Please update this part with the destination mac address, source and destination ip address you would like to send packets to

pktgen.set_mac("0", "90:e2:ba:5e:73:21");
pktgen.set_ipaddr("0", "dst", "10.11.1.17");
pktgen.set_ipaddr("0", "src", "10.11.1.16");

pktgen.set_proto("all", "udp");
pktgen.set_type("all", "ipv4");

pktgen.set("all", "size", 64)
pktgen.set("all", "burst", 32);
pktgen.set("all", "sport", 1234);
pktgen.set("all", "dport", 1234);
pktgen.set("all", "count", 1000000000);
pktgen.set("all", "rate",100);

pktgen.vlan_id("all", "start", 1);

pktgen.start("all");
doWait("all", 30);
pktgen.quit();
43 changes: 30 additions & 13 deletions ci/manager.sh
@@ -4,7 +4,7 @@
set -e

# source helper functions file
. helper-functions.sh
. helper-manager-functions.sh
SCRIPT_LOC=$(pwd)

print_header "Validating Config File and Sourcing Variables"
@@ -78,7 +78,7 @@ fi
print_header "Cleaning up Old Results"

sudo rm -f *.txt
sudo rm -rf stats
sudo rm -rf *stats
sudo rm -rf repository

print_header "Checking Worker and GitHub Creds Exist"
@@ -127,7 +127,6 @@ then
fi

print_header "Preparing Workers"

for worker_tuple in "${WORKER_LIST[@]}"
do
tuple_arr=($worker_tuple)
@@ -157,10 +156,16 @@ do
tuple_arr=($worker_tuple)
worker_ip="${tuple_arr[0]}"
worker_key_file="${tuple_arr[1]}"
scp -i $worker_key_file -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -r ./repository $worker_ip:
check_exit_code "ERROR: Failed to copy ONVM files to $worker_ip"
scp -i $worker_key_file -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null helper-functions.sh worker.sh $worker_ip:
# make sure the config file is updated with the correct run mode
sed -i "/WORKER_MODE*/c\\WORKER_MODE=\"${RUN_MODE}\"" worker_files/worker-config
# create directory for scp
mkdir -p temp
# put all files in one temporary folder for one scp
cp -r ./$worker_ip/* ./repository ./worker_files/* temp
scp -i $worker_key_file -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -r ./temp/* $worker_ip:
check_exit_code "ERROR: Failed to copy ONVM files to $worker_ip"
# get rid of the temp folder now for next worker
sudo rm -rf temp
done

print_header "Running Workloads on Workers"
@@ -174,18 +179,30 @@ do
done

print_header "Obtaining Performance Results from all workers"

rm -f results_summary.stats

for worker_tuple in "${WORKER_LIST[@]}"
do
tuple_arr=($worker_tuple)
worker_ip="${tuple_arr[0]}"
worker_key_file="${tuple_arr[1]}"
scp -i $worker_key_file -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $worker_ip:stats ./$worker_ip.stats
check_exit_code "ERROR: Failed to fetch results from $worker_ip"
# TODO: this will overwrite results if we have more than 1 worker, investigate this case
python3 speed-tester-analysis.py ./$worker_ip.stats $worker_ip results_summary.stats
# get the benchmarks for each node (some servers are faster)
. ./$worker_ip/benchmarks
# TODO: this will overwrite results if we have more than 1 worker, investigate this case
if [[ "$RUN_MODE" -eq "0" ]]
then
# fetch pktgen stats
fetch_files $worker_key_file $worker_ip pktgen_stats
python3 pktgen-analysis.py ./$worker_ip.pktgen_stats $worker_ip pktgen_summary.stats $AVG_PKTGEN_SPEED
check_exit_code "Failed to parse Pktgen stats"
# fetch speed_tester stats
fetch_files $worker_key_file $worker_ip speed_stats
python3 speed-tester-analysis.py ./$worker_ip.speed_stats $worker_ip speed_summary.stats $AVG_SPEED_TESTER_SPEED
check_exit_code "Failed to parse Speed Tester stats"
else
# only fetch speed tester stats if mode is not 0
fetch_files $worker_key_file $worker_ip speed_stats
python3 speed-tester-analysis.py ./$worker_ip.speed_stats $worker_ip speed_summary.stats $AVG_SPEED_TESTER_SPEED
check_exit_code "Failed to parse Speed Tester stats"
fi
check_exit_code "ERROR: Failed to analyze results from $worker_ip"
done
