GitHub - dentropy/discord-binding

Discord Binding

The goal of this project is to take the data exported from Tyrrrz/DiscordChatExporter and load it into a relational database, so that aggregations can be calculated easily and the data can be used in other parts of an ETL pipeline.
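To illustrate the kind of aggregation a relational layout makes easy, here is a minimal sketch that uses Python's built-in sqlite3 instead of the Postgres database this project targets. The table name `messages_t`, its columns, and the sample export are hypothetical; the nested `messages` / `author` layout is an assumption about DiscordChatExporter's JSON output and is not taken from this repository's schemas.

```python
import json
import sqlite3

# Hypothetical, heavily trimmed stand-in for one exported channel file.
SAMPLE_EXPORT = json.dumps({
    "channel": {"id": "100", "name": "general"},
    "messages": [
        {"id": "1", "timestamp": "2023-01-01T10:00:00+00:00",
         "content": "hello", "author": {"id": "a1", "name": "alice"}},
        {"id": "2", "timestamp": "2023-01-01T10:01:00+00:00",
         "content": "hi there", "author": {"id": "b2", "name": "bob"}},
        {"id": "3", "timestamp": "2023-01-01T10:02:00+00:00",
         "content": "how are you?", "author": {"id": "a1", "name": "alice"}},
    ],
})


def load_messages(conn, export_json):
    """Flatten the nested export into one row per message."""
    export = json.loads(export_json)
    channel_id = export["channel"]["id"]
    rows = [
        (m["id"], channel_id, m["author"]["id"], m["author"]["name"],
         m["timestamp"], m["content"])
        for m in export["messages"]
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages_t ("
        "id TEXT PRIMARY KEY, channel_id TEXT, author_id TEXT, "
        "author_name TEXT, msg_timestamp TEXT, content TEXT)"
    )
    # INSERT OR IGNORE keeps re-runs from duplicating messages.
    conn.executemany(
        "INSERT OR IGNORE INTO messages_t VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def messages_per_author(conn):
    """Example aggregation: message count per author, busiest first."""
    cur = conn.execute(
        "SELECT author_name, COUNT(*) AS n FROM messages_t "
        "GROUP BY author_id, author_name ORDER BY n DESC, author_name"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_messages(conn, SAMPLE_EXPORT)
print(messages_per_author(conn))
```

Once the messages are rows rather than nested JSON, the same `GROUP BY` pattern extends to per-channel or per-day counts.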

Additional Reference Docs

Scraping Discord
- Explains how to get your own Discord data to feed into this ETL pipeline
Setup Postgres
- Instructions for setting up and accessing a local Postgres server
Setup Postgraphile
- Postgraphile generates and runs a GraphQL API by inspecting a Postgres database
neo4j Docs
- How to set up neo4j, with some example queries, including how to reset the database

Transforming the data from DiscordChatExporter

Requirements:

S3 Bucket loaded with data from DiscordChatExporter
Postgres database; you can use postgres.dockercompose.yml if you do not have one already set up

Steps:

Set up a Python virtual environment and install the packages in requirements.txt

Python 3.10 is the minimum version, unless you install the dependencies manually

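# Install pip if it is not already available
curl https://bootstrap.pypa.io/get-pip.py | python3
python3 -m pip install virtualenv
# Debian-based distros need the venv module installed separately
sudo apt install python3-venv
# Create and activate the virtual environment, then install dependencies
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt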
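
The Python 3.10 minimum noted in this step can be checked from inside the activated environment. This is a small optional sketch; `meets_minimum` is a hypothetical helper and not part of this repository.

```python
import sys

MIN_VERSION = (3, 10)  # minimum stated in the setup notes


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version OK")
else:
    found = sys.version.split()[0]
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ expected, found {found}")
```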

Set environment variables using a .env file

cp .env_example .env
$EDITOR .env

Update the environment variables in the DB Select and S3 sections, shown below

# DB Select
db_select='postgres'
db_url='psql://$USER:$PASS@$HOSTNAME:$PORT/$DATABASE_NAME'

# S3
aws_access_key_id=''
aws_secret_access_key=''
endpoint_url=''
bucket_name=''
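
Before starting the pipeline it can help to confirm that the `.env` values are filled in. The sketch below is a standalone check that uses only the standard library: it parses simple `key='value'` lines (the project itself may load `.env` differently, for example with a dotenv library) and splits `db_url` with `urllib.parse`. The helpers `parse_env_text`, `missing_keys`, and `describe_db_url`, along with all sample values, are hypothetical.

```python
from urllib.parse import urlsplit

# Keys listed in the DB Select and S3 sections of .env_example.
REQUIRED_KEYS = [
    "db_select", "db_url",
    "aws_access_key_id", "aws_secret_access_key",
    "endpoint_url", "bucket_name",
]


def parse_env_text(text):
    """Parse simple key='value' lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values):
    """Return required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def describe_db_url(db_url):
    """Split a URL-style connection string into its parts (password omitted)."""
    parts = urlsplit(db_url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Made-up values for illustration only.
SAMPLE_ENV = """
# DB Select
db_select='postgres'
db_url='psql://etl_user:secret@localhost:5432/discord'

# S3
aws_access_key_id='EXAMPLEKEY'
aws_secret_access_key=''
endpoint_url='http://localhost:9000'
bucket_name='discord-exports'
"""

env = parse_env_text(SAMPLE_ENV)
print("missing:", missing_keys(env))
print(describe_db_url(env["db_url"]))
```

To check a real file, pass `open(".env").read()` to `parse_env_text` instead of the sample text.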

Run the ETL pipeline. For long runs, consider starting it inside tmux so it keeps running after you disconnect

# Using Bash
source env/bin/activate
python3 run_S3_dag.py &
cat *.log
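
The `&` in the command above backgrounds the job only within the current shell, which is presumably why tmux is mentioned. As an alternative sketch, a detached run with its own log file can be started from Python with `subprocess`. The helper `start_detached` and the `etl_run.log` file name are hypothetical, and `start_new_session` has an effect on POSIX systems only.

```python
import subprocess
import sys


def start_detached(command, log_path):
    """Start `command` in its own session, appending its output to `log_path`."""
    log_file = open(log_path, "ab")
    process = subprocess.Popen(
        command,
        stdout=log_file,
        stderr=subprocess.STDOUT,   # keep errors in the same log
        stdin=subprocess.DEVNULL,
        start_new_session=True,     # POSIX: detach from the terminal's session
    )
    log_file.close()  # the child holds its own copy of the file descriptor
    return process


if __name__ == "__main__":
    # Placeholder command; replace it with the pipeline script you run.
    proc = start_detached(
        [sys.executable, "-c", "print('pipeline placeholder')"],
        "etl_run.log",
    )
    print("started pid", proc.pid)
```

The log can then be followed with `tail -f etl_run.log` from any terminal.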
