A Python package to find license expressions and copyright statements in a codebase.
Based on Google LicenseClassifer V2, GoLicense-Classifier (or glc for short) focuses on performance without compromising with accuracy.
Note: Currently, this package only supports Linux Platform. Work is in progress for Windows and Mac.
Installing GoLicense-Classifier is as simple as
pip install golicense-classifier
Or, you can build the package from source as
git clone https://github.com/AvishrantsSh/GoLicense-Classifier.git
make dev
make package
To get started, import LicenseClassifier
class from the module as
from LicenseClassifier.classifier import LicenseClassifier
Note: Work on Copyright Statement is still in beta phase. Expect some issues, mostly with binary files
The class comes bundled with some handy functions, each suited for a different task.
-
scan_directory
This method is used to recursively walk through a directory and find license expressions and copyright statements. It returns a dictionary object with keys
header
andfiles
.
classifier = LicenseClassifier() res = classifier.scan_directory('PATH_TO_DIR')
-
max_size
Maximum size of file in MB. Default is set to 10MB. Set
max_size < 0
to ignore size constraints -
use_buffer
(Experimental)
Set toTrue
to use buffered file scanning.max_size
will be used as buffer size. -
use_scancode_mapping
Set to
True
if you want to use Scancode license key mappings. Default is set toTrue
.
-
-
scan_file
This method is used to find license expressions and copyright statements in a single file.
classifier = LicenseClassifier() res = classifier.scan_file('PATH_TO_FILE')
-
max_size
Maximum size of file in MB. Default is set to 10MB. Set
max_size < 0
to ignore size constraints -
use_buffer
(Experimental)
Set toTrue
to use buffered file scanning.max_size
will be used as buffer size. -
use_scancode_mapping
Set to
True
if you want to use Scancode license key mappings. Default is set toTrue
.
-
You can set custom threshold for scanning purpose that best suits your need. Simply change the parameter threshold
during object creation as
classifier = LicenseClassifier(threshold = 0.9)
Contributions are what makes the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
To get started, read the Contributing Guide.
- Google LicenseClassfifer V2 https://github.com/google/licenseclassifier/
- Ctypes Shared Library Code https://github.com/AvishrantsSh/Ctypes_LicenseClassifier
- Ctypes https://docs.python.org/3/library/ctypes.html