A data analytics pipeline for pSSID that receives, stores, and visualizes WiFi test metrics gathered by Raspberry Pi WiFi probes.
The picture on the left gives an overview of the entire pSSID architecture, with the role of this data analytics pipeline highlighted. In short, the pipeline receives test results (metrics) gathered by the probes, then stores and visualizes them.
The picture on the right shows the architecture of the pipeline itself. It follows the idea of the ELK stack, simply replacing Elasticsearch and Kibana with Opensearch and Grafana, respectively.
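In text form, the data flows through the pipeline roughly as follows (the ports are the defaults described later in this document):

```
pSSID probes (Filebeat) --> Logstash --> Opensearch --> Grafana
   /var/log/pssid.log      port 9400    port 9200     port 3000
```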
The setup of the pipeline assumes that you have a virtual machine running Ubuntu 22 with Docker installed. If not, you can install Docker and docker-compose with
sudo apt update && sudo apt install docker.io docker-compose -y
- Clone this repository to the machine you would like to host the pipeline on. Each service has its own docker-compose file for better modularization. If demand changes, say you need more Opensearch nodes, you can simply provision more nodes without touching other components of the pipeline.
- Set passwords for Opensearch, which is required since version 2.12.0. The easiest way to do so is with environment variables. Add the following lines to your .bashrc file. This documentation uses admin as the username and OpensearchInit2024 as the password for demonstration.
export OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpensearchInit2024
export OPENSEARCH_USER=admin
export OPENSEARCH_PASSWORD=OpensearchInit2024
These variable names are referenced in opensearch-one-node.yml and logstash.yml, so it is not recommended that you change them unless there is a good reason. You can freely change their values.
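For reference, docker-compose substitutes these variables into the compose files at startup. A hypothetical excerpt of how a compose file can consume them (the actual opensearch-one-node.yml in this repository may differ):

```yaml
# Hypothetical excerpt: pass the host's variable through to the container
services:
  opensearch-node1:
    environment:
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
```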
Don't forget to run source ~/.bashrc to load the environment variables.
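After sourcing, you can sanity-check that the variables are visible to your shell. The fallback assignments below are only for illustration, using the demo credentials from above:

```shell
# Fall back to the demo values if a variable is not set (illustration only);
# on a configured machine, `source ~/.bashrc` makes these fallbacks unnecessary.
: "${OPENSEARCH_USER:=admin}"
: "${OPENSEARCH_PASSWORD:=OpensearchInit2024}"
: "${OPENSEARCH_INITIAL_ADMIN_PASSWORD:=OpensearchInit2024}"

# Aborts with an error message if the variable is empty.
echo "Opensearch user: ${OPENSEARCH_USER:?}"
```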
Do not run docker-compose with sudo, since the root user cannot read the environment variables defined by non-root users. Make sure the current user is in the docker group so that you can run docker-compose directly without sudo.
Add yourself to the docker group and activate it by running the following command.
sudo usermod -aG docker ${USER} && newgrp docker
Opensearch requires vm.max_map_count to be at least 262144. Check your current value by running
sysctl vm.max_map_count
If it is too low (for example 65530, the default on some machines), edit the /etc/sysctl.conf file and add the following line
vm.max_map_count=262144
Apply the change with
sudo sysctl -p
- Configure Logstash. Create a directory on the host machine, say logstash-pipeline, with at least a logstash.conf file in it. logstash.conf defines the input and output sources, plus any custom filters you would like to implement. A sample file is provided inside the logstash-pipeline directory of this repository. You could use it as your pipeline directory and add more .conf files to it.
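A minimal sketch of what such a logstash.conf might look like for this pipeline is shown below. The filter and index name are hypothetical; the sample file shipped in the repository is the authoritative reference:

```
# Hypothetical logstash.conf sketch -- adapt to your pSSID log format
input {
  beats {
    port => 9400          # Filebeat on the probes ships logs to this port
  }
}

filter {
  # Example filter: parse each log line as JSON (assumes JSON-formatted logs)
  json {
    source => "message"
  }
}

output {
  opensearch {
    hosts => ["https://opensearch-node1:9200"]
    user => "${OPENSEARCH_USER}"
    password => "${OPENSEARCH_PASSWORD}"
    ssl_certificate_verification => false
    index => "pssid-%{+YYYY.MM.dd}"   # hypothetical index name pattern
  }
}
```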
Open logstash.yml and edit the following TODO item: mount the directory you just created to the pipeline directory inside the container.
# TODO: mount your pipeline directory into the container. USE ABSOLUTE PATH!
- <ABS_PATH_TO_YOUR_PIPELINE_DIRECTORY>:/usr/share/logstash/pipeline
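For example, if you created the directory at /home/ubuntu/logstash-pipeline (a hypothetical path), the finished entry would read:

```yaml
volumes:
  - /home/ubuntu/logstash-pipeline:/usr/share/logstash/pipeline
```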
- No configuration is required for Grafana.
- Start the three components of the service with docker-compose.
docker-compose -f <path-to-opensearch.yml> up -d
docker-compose -f <path-to-logstash.yml> up -d
docker-compose -f <path-to-grafana.yml> up -d
Optional: you can also start the Opensearch dashboard in the same way.
docker-compose -f <path-to-opensearch-dashboard.yml> up -d
By default, Logstash listens for Filebeat input at port 9400, Opensearch listens for Logstash input at port 9200, the Grafana dashboard is hosted at port 3000, and the optional Opensearch dashboard is hosted at port 5601. Make sure the firewall settings allow external traffic to ports 9400, 3000, and 5601.
This file contains the input source, custom filters, and output destination. See the sample file for more details. The input and output fields generally require minimal changes, if any. Most of the customization happens in the filter field: you can implement as many filters as you like, and more thorough filtering at the Logstash level usually results in simpler configuration later at the Grafana level. The sample file contains a single pipeline with multiple filters applied. Refer to the official documentation for more advanced examples with multiple pipelines.
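As an illustration of that tradeoff, a filter block that does more work up front might look like the sketch below (the field name and tag are hypothetical, not taken from the sample file):

```
filter {
  # Parse JSON-formatted pSSID results into structured fields
  json {
    source => "message"
  }
  # Tag failed tests so Grafana can chart them without extra transformations
  if [status] == "fail" {
    mutate {
      add_tag => ["pssid_failure"]
    }
  }
}
```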
On each WiFi probe, install Filebeat. Refer to the documentation here. Then open the configuration file /etc/filebeat/filebeat.yml and edit the following fields.
Specify the input source for Filebeat, which is the output destination of pSSID. In the following example, test results gathered by pSSID are written to /var/log/pssid.log on the probe.
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/pssid.log
Comment out the output section for Elasticsearch and uncomment the one for Logstash.
output.logstash:
hosts: ["<pipeline-hostname>:9400"]
Navigate to the Grafana dashboard at <pipeline-hostname>:3000. By default, the Grafana username and password are both admin. To add a data source, select Opensearch in the list of available sources and configure it as follows.
Remarks:
- URL: use https instead of http, and check Basic auth and Skip TLS Verify under the Auth section. User and Password under Basic Auth Details are the OPENSEARCH_USER and OPENSEARCH_PASSWORD defined earlier, which are admin and OpensearchInit2024 in our example. Also make sure to use the Docker aliased hostname opensearch-node1 instead of the actual hostname of your pipeline machine.
- Index name: wildcard patterns are allowed here. To see the list of all Opensearch indices, run
curl -u <OPENSEARCH_USER>:<OPENSEARCH_PASSWORD> --insecure \
    "https://localhost:9200/_cat/indices?v"
on the pipeline machine.
- Click on Get Version and Save, which should automatically populate the Version and Max concurrent Shard Requests fields, indicating a successful configuration.
Having configured the data source, you can now create visualization panels and dashboards.