This package is meant for deep learning applications and automation, not for downloading specific images or post IDs, or for searching by tag. For those use cases, please check out py621, which is not related to this package in any way.
The package works with the official E621 database (db) exports, specifically the posts export. See here for available db exports and here for general information on the API.
!! This is a pre-release version, and is not meant for production use !!
Proper documentation, tests, and automated updates to the package will be added later.
You can install the package on Python >= 3.11 using:
pip install py621dl
The E621Downloader class must be initialized using the Reader class, to which the csv file must be passed. The Reader supports only the official db export csv files of the format "posts-YYYY-MM-DD.csv.gz", either compressed or uncompressed.
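As a rough illustration of the file-name convention above, the sketch below checks a name against the "posts-YYYY-MM-DD.csv.gz" pattern and opens either the compressed or the uncompressed variant as text. The helper name open_export and the regular expression are hypothetical and are not part of py621dl; the Reader may perform this check differently.

```python
import gzip
import re
import tempfile
from pathlib import Path

# Hypothetical pattern for the export name described above (not taken from py621dl).
EXPORT_NAME = re.compile(r"^posts-\d{4}-\d{2}-\d{2}\.csv(\.gz)?$")


def open_export(path):
    """Open a posts export as text, whether gzip-compressed or not."""
    path = Path(path)
    if EXPORT_NAME.match(path.name) is None:
        raise ValueError(f"unexpected export file name: {path.name}")
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "r", encoding="utf-8", newline="")


# Demo on two tiny synthetic files, so no real export is needed.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "posts-2022-10-30.csv"
    packed = Path(tmp) / "posts-2022-10-30.csv.gz"
    plain.write_text("id,score\n1,5\n", encoding="utf-8")
    with gzip.open(packed, "wt", encoding="utf-8") as handle:
        handle.write("id,score\n1,5\n")
    with open_export(plain) as a, open_export(packed) as b:
        plain_header = a.readline().strip()
        packed_header = b.readline().strip()
```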
The E621Downloader class can be initialized with the following parameters:
- csv_reader: the Reader object
- timeout: the timeout for the requests, in seconds
- retries: the number of retries for the requests
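To make the timeout and retries parameters more concrete, here is a minimal sketch of how a per-request timeout and a retry count typically interact. It is not the package's implementation: fetch_with_retries, http_get, and the delay argument are hypothetical names, and the sketch assumes that retries counts the extra attempts made after the first one.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_get(url: str, timeout: float) -> bytes:
    """Fetch raw bytes over HTTP using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(
    url: str,
    timeout: float = 10.0,
    retries: int = 3,
    delay: float = 0.0,
    fetch: Callable[[str, float], bytes] = http_get,
) -> Optional[bytes]:
    """Try `fetch` once, then up to `retries` more times; None on failure."""
    # Assumption: `retries` counts extra attempts after the first one.
    for attempt in range(retries + 1):
        try:
            return fetch(url, timeout)
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            if attempt < retries:
                time.sleep(delay)
    return None  # give up so the caller can skip this post


# Demo with a fake fetcher that fails twice before succeeding (no network).
calls = {"count": 0}


def flaky_fetch(url: str, timeout: float) -> bytes:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return b"image-bytes"


result = fetch_with_retries(
    "https://example.invalid/a.png", retries=3, fetch=flaky_fetch
)
```

Returning None rather than raising lets a caller skip the failed post and move on to the next one, which is one plausible way to keep batches full.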
The downloader can be used as an iterable, yielding lists of np.ndarray objects, one per image. The list size depends on the batch_size specified for the Reader. The images are in OpenCV BGR format.
The downloader automatically handles and filters out deleted or flagged posts, and attempts to fill the batch with new images so that it always yields a full batch.
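Because the images arrive in BGR channel order, most deep learning pipelines will want to convert them to RGB first. The sketch below does this with plain NumPy on a synthetic image; bgr_batch_to_rgb is a hypothetical helper and is not provided by py621dl. It assumes each image is an H x W x 3 array.

```python
import numpy as np


def bgr_batch_to_rgb(batch):
    """Return RGB copies of a list of H x W x 3 BGR images."""
    # Reversing the last axis swaps the B and R channels;
    # ascontiguousarray turns the reversed view into a regular array.
    return [np.ascontiguousarray(image[..., ::-1]) for image in batch]


# Synthetic 2x2 image that is pure blue in BGR order (channel 0 = blue).
blue_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
blue_bgr[..., 0] = 255

rgb_batch = bgr_batch_to_rgb([blue_bgr])
blue_rgb = rgb_batch[0]  # blue now sits in the last channel
```

Images in a batch may well differ in size, so resizing is likely needed before stacking them into a single array.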
The Reader class can be initialized with the following parameters:
- csv_file: the path to the csv file
- batch_size: the size of the batch to be returned by the E621Downloader
- excluded_tags: a list of E621 tags to be excluded from the results
- minimum_score: the minimum score of the posts to be included in the results
- chunk_size: the size of the chunk to be read from the csv file at once
- checkpoint_file: the path to the checkpoint file, to resume from any point. If the path doesn't exist, a new file will be created.
- repeat: whether to automatically restart from the beginning of the csv file when the end is reached. Otherwise, StopIteration is raised. E621Downloader handles this exception and raises its own StopIteration when the end is reached.
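The filtering parameters above can be pictured with a small standard-library sketch that reads a csv in chunks and drops rows by tag, score, and deletion status. This is only an illustration and not how the Reader is implemented. The function name iter_filtered_chunks is hypothetical, and the column names (tag_string, score, is_deleted), the "t"/"f" flag values, and the inclusive score threshold are assumptions about the export format.

```python
import csv
import io
from itertools import islice


def iter_filtered_chunks(lines, excluded_tags=(), minimum_score=0, chunk_size=2):
    """Yield lists of rows, read `chunk_size` at a time, after filtering."""
    reader = csv.DictReader(lines)
    excluded = set(excluded_tags)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        # Assumed columns: is_deleted ("t"/"f"), score, space-separated tag_string.
        yield [
            row
            for row in chunk
            if row["is_deleted"] == "f"
            and int(row["score"]) >= minimum_score
            and excluded.isdisjoint(row["tag_string"].split())
        ]


# Tiny synthetic export with made-up tags; a real export has many more columns.
SAMPLE = """id,md5,file_ext,tag_string,score,is_deleted
1,aaa,png,tag_a tag_b,12,f
2,bbb,jpg,tag_c tag_x,40,f
3,ccc,png,tag_a tag_d,3,f
4,ddd,jpg,tag_e tag_f,25,t
5,eee,png,tag_g tag_h,30,f
"""

chunks = list(
    iter_filtered_chunks(
        io.StringIO(SAMPLE), excluded_tags=["tag_x"], minimum_score=10, chunk_size=2
    )
)
kept_ids = [row["id"] for chunk in chunks for row in chunk]
```

Note that a chunk can come back empty after filtering, which is why a consumer that wants full batches has to keep reading further chunks.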
from py621dl import Reader, E621Downloader
reader = Reader("posts-2022-10-30.csv.gz")
downloader = E621Downloader(reader, timeout=10, retries=3)
for batch in downloader:
    # do something with the batch
    pass
For any open issue, please create a linked branch for that issue and open pull requests into the test branch once your edits are complete.
To start contributing to this repository, you will need Python 3.11 and Poetry. Then navigate to the folder into which you cloned this repository and run the following:
poetry env use 3.11
poetry install --with dev
Note that Python 3.11 must be on your PATH for poetry env use 3.11 to work. Otherwise, refer to the Poetry documentation.
To write your own tests for new code (strongly recommended), run pip install -e . from the project folder. This installs the package locally in editable mode, based on the current state of the files, so that pytest can use it as if it were properly installed on an end-user system, without rebuilding and reinstalling it after every change.
The editable install also lets you simply use import py621dl while debugging, and any changes in your code are reflected without reinstalling.