AudioDup - Near-duplicate Detection of Audio Files

This repository presents my simple approach to near-duplicate detection of audio files, based on generating acoustic fingerprints.
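The README does not spell out how the fingerprints are built. The database being named dejavu hints at a Dejavu/Shazam-style landmark scheme, so the sketch below assumes that: spectrogram peaks (assumed to be already extracted) are paired into hashes that do not depend on absolute time, and two recordings count as near-duplicates when many hashes agree on a single time offset. The names `fingerprint`, `match` and the `FAN_OUT` value are hypothetical, not this project's code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[int, int]       # (time frame, frequency bin) of a spectrogram peak
Hash = Tuple[int, int, int]  # (anchor freq, target freq, time delta)
Landmark = Tuple[Hash, int]  # (hash, anchor time)

FAN_OUT = 5  # hypothetical: number of later peaks paired with each anchor peak


def fingerprint(peaks: Iterable[Peak], fan_out: int = FAN_OUT) -> List[Landmark]:
    """Pair each peak with the next few peaks in time.

    The hash (f1, f2, dt) ignores absolute time, so a clip that starts
    later in the same recording yields the same hashes, just shifted.
    """
    ordered = sorted(peaks)
    landmarks: List[Landmark] = []
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1 : i + 1 + fan_out]:
            landmarks.append(((f1, f2, t2 - t1), t1))
    return landmarks


def match(query: List[Landmark], reference: List[Landmark]) -> Tuple[int, int]:
    """Return (best time offset, number of hashes agreeing on it).

    True duplicates share many hashes at one consistent offset;
    accidental hash collisions scatter across different offsets.
    """
    index: Dict[Hash, List[int]] = {}
    for h, t in reference:
        index.setdefault(h, []).append(t)
    votes: Counter = Counter()
    for h, t_query in query:
        for t_ref in index.get(h, []):
            votes[t_ref - t_query] += 1
    if not votes:
        return 0, 0
    offset, count = votes.most_common(1)[0]
    return offset, count


if __name__ == "__main__":
    peaks = [(0, 10), (1, 20), (3, 15), (4, 30), (6, 12), (8, 25)]
    shifted = [(t + 10, f) for t, f in peaks]  # same audio, starting 10 frames later
    print(match(fingerprint(shifted), fingerprint(peaks)))  # → (-10, 15)
```

Voting on the offset, rather than just counting shared hashes, is what makes the comparison robust: a trimmed or re-encoded copy still lines up at one offset, while unrelated audio does not.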

Setup Instructions

We assume that you are using a computer running macOS. However, any Unix/Linux-based system should generally work as well.
Make sure you have installed Python 3.7 and the latest version of pipenv.
Install the MySQL connector using brew install mysql-connector-c.
- If needed, fix a potential bug by following this.
Install PortAudio and FFmpeg using brew install portaudio && brew install ffmpeg.
Install all dependencies with pipenv install.
Set up a database and user for the program:

CREATE DATABASE dejavu;
CREATE USER 'dejavu'@'localhost' IDENTIFIED BY 'dejavu';
GRANT ALL PRIVILEGES ON dejavu.* TO 'dejavu'@'localhost';
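The program itself keeps its data in the MySQL database created above. To illustrate what such fingerprint storage might look like without a running MySQL server, the sketch below swaps in Python's built-in sqlite3 module; the tables `songs` and `fingerprints` and their columns are hypothetical and need not match the project's actual schema. The `IN (...)` lookup is the step a matcher would use to fetch candidate files for a query clip's hashes.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables: one row per audio file, one per fingerprint hash."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS songs (
            song_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS fingerprints (
            hash        TEXT    NOT NULL,
            song_id     INTEGER NOT NULL REFERENCES songs(song_id),
            anchor_time INTEGER NOT NULL
        );
        -- Lookups go by hash, so index that column.
        CREATE INDEX IF NOT EXISTS idx_fingerprints_hash ON fingerprints(hash);
        """
    )


def store(conn: sqlite3.Connection, name: str, landmarks: Iterable[Tuple[str, int]]) -> int:
    """Insert one file and its (hash, anchor_time) landmarks; return its song_id."""
    cur = conn.execute("INSERT INTO songs (name) VALUES (?)", (name,))
    song_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO fingerprints (hash, song_id, anchor_time) VALUES (?, ?, ?)",
        [(h, song_id, t) for h, t in landmarks],
    )
    conn.commit()
    return song_id


def lookup(conn: sqlite3.Connection, hashes: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return every stored (hash, song_id, anchor_time) whose hash is in `hashes`."""
    wanted = list(hashes)
    if not wanted:
        return []
    # One placeholder per hash; very large queries should be split into batches
    # because SQLite limits the number of bound parameters per statement.
    placeholders = ",".join("?" * len(wanted))
    return conn.execute(
        f"SELECT hash, song_id, anchor_time FROM fingerprints WHERE hash IN ({placeholders})",
        wanted,
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    sid = store(conn, "000002.mp3", [("h1", 0), ("h2", 3)])
    print(lookup(conn, ["h2", "missing"]))
```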

We use the FMA (Free Music Archive) dataset for testing. To save time and disk space, you do not have to download the whole dataset.
Put the downloaded audio files into the data folder.
Run pipenv run python3 collect.py to collect all fingerprints.
Run pipenv run python3 test.py to collect the test results.
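The README does not say what test.py measures. If the goal is to find pairs of near-duplicate files, one common way to score the results is precision and recall over file pairs. A minimal sketch under that assumption; `to_pairs`, `precision_recall` and the file names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def to_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Normalise pairs so that (a, b) and (b, a) count once; drop self-pairs."""
    return {frozenset((a, b)) for a, b in pairs if a != b}


def precision_recall(
    predicted: Iterable[Tuple[str, str]], truth: Iterable[Tuple[str, str]]
) -> Tuple[float, float]:
    """Precision and recall of predicted duplicate pairs against ground truth."""
    pred, true = to_pairs(predicted), to_pairs(truth)
    hits = len(pred & true)
    # Convention: an empty set scores 1.0, since it contains nothing wrong.
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(true) if true else 1.0
    return precision, recall


if __name__ == "__main__":
    predicted = [("a.mp3", "b.mp3"), ("c.mp3", "d.mp3"), ("e.mp3", "f.mp3")]
    truth = [("b.mp3", "a.mp3"), ("c.mp3", "d.mp3"), ("g.mp3", "h.mp3"), ("i.mp3", "j.mp3")]
    print(precision_recall(predicted, truth))  # 2 of 3 predictions correct, 2 of 4 pairs found
```

Treating pairs as unordered sets matters here: a detector that reports (b, a) for a ground-truth pair (a, b) should still get credit.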

Repository files:
- data/
- templates/
- .gitignore
- LICENSE
- Pipfile
- Pipfile.lock
- README.md
- _config.yml
- check_md5.py
- collect.py
- collect_max_pooling.py
- collect_md5.py
- collect_video_fingerprint.py
- extract_audio.py
- init_db.sql
- recognize.py
- server.py
- test.py
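Judging only by their names, check_md5.py and collect_md5.py probably compare files by MD5 hash. Such a baseline catches byte-identical copies but misses re-encoded, trimmed, or volume-changed versions, which is the gap acoustic fingerprints are meant to close. A hypothetical sketch that scans the data folder and groups identical files; `md5_of_file`, `exact_duplicates` and the extension filter are assumptions, not the repository's code.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Hypothetical filter; the FMA dataset ships MP3 files.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(data_dir: str = "data") -> List[List[Path]]:
    """Group audio files under data_dir (recursively) whose bytes are identical."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            groups[md5_of_file(path)].append(path)
    # Only groups with two or more files are duplicates.
    return [files for files in groups.values() if len(files) > 1]


if __name__ == "__main__":
    for group in exact_duplicates("data"):
        print(" == ".join(str(p) for p in group))
```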