py621dl - an iterable E621 downloader

This package is meant for deep learning applications and automation, not for downloading specific images or post IDs or for searching by tag. For those uses, please check out py621, which is not related to this package in any way.

The package is meant to be used with the posts file from the official E621 db export. See here for available db exports and here for general information on the API.

!! This is a pre-release version, and is not meant for production use !!

Proper documentation, tests, and automated updates to the package will be added later.

Installation

You can install the package with pip install py621dl on Python >= 3.11.

Usage

The E621Downloader class must be initialized with a Reader object, and the Reader in turn takes the path to the csv file. The Reader supports only the official db export csv files named in the format "posts-YYYY-MM-DD.csv.gz", either compressed or uncompressed.
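Accepting both the compressed and the uncompressed export can be sketched with the standard library alone. This is an illustration, not the package's actual code: open_posts_csv and read_header are hypothetical helper names, and checking the ".gz" suffix is an assumed way of telling the two cases apart.

```python
import csv
import gzip
from pathlib import Path


def open_posts_csv(path):
    """Open a posts export as text, whether or not it is gzip-compressed.

    Hypothetical helper for illustration; not part of py621dl.
    """
    path = Path(path)
    if path.suffix == ".gz":
        # "rt" makes gzip decode the bytes to text, which the csv module expects
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "r", encoding="utf-8", newline="")


def read_header(path):
    """Return the column names found in the first row of the export."""
    with open_posts_csv(path) as handle:
        return next(csv.reader(handle))
```

Calling read_header on either form of the same export should return the same list of column names.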

The E621Downloader class can be initialized with the following parameters:

csv_reader: the Reader object
timeout: the timeout for the requests, in seconds
retries: the number of retries for the requests

The downloader is iterable and yields lists of images as np.ndarray objects. The length of each list is the batch_size given to the Reader. The images are in OpenCV BGR format. The downloader automatically filters out deleted or flagged posts and attempts to fill the gaps with new images, so that it always yields a full batch.
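The "always yield a full batch" behaviour can be pictured as a generator that keeps pulling candidate posts until enough of them succeed. The sketch below is not the package's implementation: iter_full_batches and the fetch callable are hypothetical, and dropping a trailing partial batch is an assumption made here to keep the example short.

```python
def iter_full_batches(candidates, fetch, batch_size):
    """Yield lists holding exactly batch_size fetched items.

    candidates: any iterable of post records
    fetch: callable returning an image, or None when the post is unusable
           (for example deleted, flagged, or failed to download)
    """
    batch = []
    for post in candidates:
        image = fetch(post)
        if image is None:
            # Skip this post and keep pulling until the batch is full
            continue
        batch.append(image)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Assumption: a trailing partial batch is dropped, so every batch is full


# Stand-in data: odd-numbered "posts" are treated as unusable
def fake_fetch(post_id):
    return None if post_id % 2 else f"img{post_id}"


batches = list(iter_full_batches(range(10), fake_fetch, batch_size=2))
```

With the stand-in data, five posts are usable, so two full batches are produced and the fifth image is left over.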

The Reader class can be initialized with the following parameters:

csv_file: the path to the csv file
batch_size: the size of the batch to be returned by the E621Downloader
excluded_tags: a list of E621 tags to be excluded from the results
minimum_score: the minimum score of the posts to be included in the results
chunk_size: the size of the chunk to be read from the csv file at once
checkpoint_file: the path to the checkpoint file, to resume from any point. If path doesn't exist, a new file will be created.
repeat: whether to automatically restart from the beginning of the csv file when the end is reached. If disabled, the Reader raises StopIteration at the end of the file. E621Downloader handles this exception and raises its own StopIteration when the end is reached.
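To make the effect of excluded_tags and minimum_score concrete, here is a minimal sketch of that kind of filter over csv rows. It is not the Reader's actual code: keep_post is a hypothetical helper, the sample rows are made up, and the column names tag_string (space-separated tags) and score are assumptions about the export layout.

```python
import csv
import io


def keep_post(row, excluded_tags, minimum_score):
    """Return True when a row passes both the score and the tag filter."""
    if int(row["score"]) < minimum_score:
        return False
    # Assumption: tags are stored as one space-separated string
    tags = set(row["tag_string"].split())
    return tags.isdisjoint(excluded_tags)


# Made-up sample standing in for a few lines of the posts export
SAMPLE = """id,tag_string,score
1,canine solo,12
2,feline duo,3
3,canine comic,40
4,avian solo,25
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
kept = [row["id"] for row in rows if keep_post(row, ["comic"], 10)]
```

Here post 2 is dropped for its low score and post 3 for carrying an excluded tag, leaving posts 1 and 4.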

Example use

from py621dl import Reader, E621Downloader

reader = Reader("posts-2022-10-30.csv.gz")
downloader = E621Downloader(reader, timeout=10, retries=3)

for batch in downloader:
    # do something with the batch
    pass

Contributing

For any opened issues, please create a linked branch for that issue and create pull requests into the test branch for completed edits.

To start contributing to this repository, you will need Python 3.11 and Poetry. Then navigate to the folder into which you cloned this repository and run the following:

poetry env use 3.11
poetry install --with dev

Note that Python 3.11 needs to be in your PATH for poetry env use 3.11 to work. Otherwise, refer to the Poetry documentation.

To write your own tests for new code (strongly recommended), run pip install -e . from the project folder. This installs the package locally from the current state of the files, so that pytest can use it as if it were properly installed on an end-user system, without rebuilding and reinstalling it after every change you make.

You can also use pip install -e . to install the package locally, so that you can simply use import py621dl and have any changes in your code reflected immediately while you debug.
