A Scrapy pipeline to categorize items using MonkeyLearn.
The size of the item batches sent to MonkeyLearn.
Default: 200
Example:
MONKEYLEARN_BATCH_SIZE = 200
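As a rough illustration of what a batch size means here, the sketch below splits a stream of items into groups of at most MONKEYLEARN_BATCH_SIZE. The helper name `iter_batches` is hypothetical and is not part of this package; the pipeline's actual batching code may differ.

```python
from itertools import islice


def iter_batches(items, batch_size=200):
    """Yield successive lists holding at most batch_size items each.

    Hypothetical helper for illustration only, not the pipeline's own code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 450 items with a batch size of 200 give two full batches and one partial batch.
sizes = [len(batch) for batch in iter_batches(range(450), batch_size=200)]
```

A larger batch size means fewer requests per crawl, at the cost of bigger payloads per request.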
The ID of the MonkeyLearn module.
Example:
MONKEYLEARN_MODULE = 'cl_oFKL5wft'
When the module is a classifier, whether its sandbox version should be used.
Default: False
Example:
MONKEYLEARN_USE_SANDBOX = True
The MonkeyLearn auth token.
Example:
MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'
A single Item text field, or a list of Item text fields, to use for classification. A comma-separated string of field names is also supported.
Example:
MONKEYLEARN_FIELD_TO_PROCESS = 'title'
MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']
MONKEYLEARN_FIELD_TO_PROCESS = 'title,description'
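The three accepted forms can be reduced to one list of field names. The following is a minimal sketch of such a normalization, assuming surrounding whitespace in the comma-separated form should be ignored; `normalize_fields` is a hypothetical name, not a function exported by this package.

```python
def normalize_fields(value):
    """Return a list of field names from a string, comma-separated string, or list.

    Hypothetical helper for illustration; whitespace trimming is an assumption.
    """
    if isinstance(value, str):
        # 'title' and 'title,description' are both handled by splitting on commas.
        return [name.strip() for name in value.split(",") if name.strip()]
    # Lists (or tuples) of names are copied as-is.
    return list(value)


single = normalize_fields("title")
listed = normalize_fields(["title", "description"])
joined = normalize_fields("title,description")
```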
The field where the MonkeyLearn output will be stored.
Example:
MONKEYLEARN_FIELD_OUTPUT = 'categories'
An example value of the MONKEYLEARN_FIELD_OUTPUT field after classification is:
[{'label': 'English', 'probability': 0.321}]
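Downstream code, such as a later pipeline or an exporter, can read this field like any other list of dicts. The sketch below picks the label with the highest probability; it assumes the output is a list of dicts with 'label' and 'probability' keys, as in the example value above, and `top_label` is a hypothetical helper.

```python
def top_label(categories):
    """Return the label with the highest probability, or None if there are none.

    Hypothetical helper; assumes each entry has 'label' and 'probability' keys.
    """
    if not categories:
        return None
    best = max(categories, key=lambda entry: entry["probability"])
    return best["label"]


# An item shaped like the example output, stored under the 'categories' field.
item = {"title": "Hello world", "categories": [{"label": "English", "probability": 0.321}]}
label = top_label(item["categories"])
```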
In your settings.py file, add the settings described above, then add MonkeyLearnPipeline
to your pipelines, e.g.:
ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}
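Put together, a settings.py fragment might look like the sketch below. All values are the example values from this document and should be replaced with your own; the token is left truncated exactly as shown in the example.

```python
# settings.py (fragment): example values only, replace with your own.

MONKEYLEARN_BATCH_SIZE = 200             # items per batch sent to MonkeyLearn
MONKEYLEARN_MODULE = 'cl_oFKL5wft'       # ID of the MonkeyLearn module
MONKEYLEARN_USE_SANDBOX = False          # only relevant for classifiers
MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'    # placeholder auth token, truncated as in the example
MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']
MONKEYLEARN_FIELD_OUTPUT = 'categories'  # field that receives the MonkeyLearn output

ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}
```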
Copyright (c) 2015 MonkeyLearn.
Released under the MIT license.