bayespam

A simple Bayesian spam classifier.

About

Bayespam is inspired by Naive Bayes classifiers, a popular statistical technique for e-mail filtering.

Here, the message to be identified is split into simple words, also called tokens. These tokens are compared against the whole corpus of messages (spam or not) to determine how frequently each token occurs in each category.
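The tokenizing and counting step described above can be sketched in a few lines of Rust. This is only an illustration, not the crate's actual implementation: the `Counter` struct, the `tokenize` and `count_tokens` functions, and the rule "lowercase, keep alphanumeric characters only" are all assumptions made for the example. The crate's real tokenizer may filter words differently.

```rust
use std::collections::HashMap;

/// How many times a token was seen in each category (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
struct Counter {
    ham: u32,
    spam: u32,
}

/// Split a message on whitespace, drop non-alphanumeric characters and
/// lowercase each word. This rule is an assumption for the sketch.
fn tokenize(msg: &str) -> Vec<String> {
    msg.split_whitespace()
        .map(|word| {
            word.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|word| !word.is_empty())
        .collect()
}

/// Add every token occurrence of `msg` to the table under the given category.
fn count_tokens(table: &mut HashMap<String, Counter>, msg: &str, is_spam: bool) {
    for token in tokenize(msg) {
        let entry = table.entry(token).or_default();
        if is_spam {
            entry.spam += 1;
        } else {
            entry.ham += 1;
        }
    }
}
```

Calling `count_tokens` once per training message builds a table that maps each token to its ham and spam counts, which is the information a frequency-based classifier needs.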

A probabilistic formula is then used to calculate the probability that the message is spam. When the probability is high enough, the classifier categorizes the message as likely spam; otherwise it is categorized as likely ham. The probability threshold is fixed at 0.8 by default.
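To make the scoring step more concrete, here is a minimal sketch of one common way to combine per-token spam probabilities under the naive independence assumption. It is not taken from the crate's source: the function names `combine` and `is_spam`, the clamping bounds, and the strict `>` comparison are assumptions for illustration. Only the 0.8 threshold comes from the text above.

```rust
/// Default decision threshold quoted in the text above.
const SPAM_THRESHOLD: f64 = 0.8;

/// Combine per-token spam probabilities p_i into one message-level score:
///   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
/// The products are accumulated in log space to avoid underflow on long
/// messages. Returns 0.5 when there is no evidence at all.
fn combine(probs: &[f64]) -> f64 {
    if probs.is_empty() {
        return 0.5;
    }
    let mut log_spam = 0.0;
    let mut log_ham = 0.0;
    for &p in probs {
        // Clamp so that a single token can never force the score to 0 or 1
        // (the bounds are an assumption for this sketch).
        let p = p.clamp(0.01, 0.99);
        log_spam += p.ln();
        log_ham += (1.0 - p).ln();
    }
    // Algebraically equal to S / (S + H) with S = exp(log_spam), H = exp(log_ham).
    1.0 / (1.0 + (log_ham - log_spam).exp())
}

/// Flag the message as likely spam when the combined score exceeds the threshold.
fn is_spam(probs: &[f64]) -> bool {
    combine(probs) > SPAM_THRESHOLD
}
```

For example, two tokens that each have a spam probability of 0.9 combine to 0.81 / (0.81 + 0.01), roughly 0.988, which is above the threshold. Two tokens at 0.1 combine to roughly 0.012, well below it.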

Documentation

Learn more about Bayespam here: https://docs.rs/bayespam.

Usage

Add to your Cargo.toml manifest:

[dependencies]
bayespam = "1.1.0"

Use a pre-trained model

Add a model.json file to your package root. Then, you can use it to score and identify messages:

extern crate bayespam;

use bayespam::classifier;

fn main() -> Result<(), std::io::Error> {
    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier::score(spam)?;
    let is_spam = classifier::identify(spam)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier::score(ham)?;
    let is_spam = classifier::identify(ham)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    Ok(())
}

$> cargo run
0.9993
true
0.6311
false

Train your own model and save it as JSON into a file

You can train a new model from scratch and save it as JSON to reload it later:

extern crate bayespam;

use bayespam::classifier::Classifier;
use std::fs::File;

fn main() -> Result<(), std::io::Error> {
    // Create a new classifier with an empty model
    let mut classifier = Classifier::new();

    // Train the classifier with a new spam example
    let spam = "Don't forget our special promotion: -30% on men shoes, only today!";
    classifier.train_spam(spam);

    // Train the classifier with a new ham example
    let ham = "Hi Bob, don't forget our meeting today at 4pm.";
    classifier.train_ham(ham);

    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier.score(spam);
    let is_spam = classifier.identify(spam);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier.score(ham);
    let is_spam = classifier.identify(ham);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Serialize the model and save it as JSON into a file
    let mut file = File::create("my_super_model.json")?;
    classifier.save(&mut file, false)?;

    Ok(())
}

$> cargo run
0.9999
true
0.0001
false

$> cat my_super_model.json
{"token_table":{"forget":{"ham":1,"spam":1},"only":{"ham":0,"spam":1},"meeting":{"ham":1,"spam":0},"our":{"ham":1,"spam":1},"dont":{"ham":1,"spam":1},"bob":{"ham":1,"spam":0},"men":{"ham":0,"spam":1},"today":{"ham":1,"spam":1},"shoes":{"ham":0,"spam":1},"special":{"ham":0,"spam":1},"promotion:":{"ham":0,"spam":1}}}

Contribution

Contributions via issues or pull requests are appreciated.

License

Bayespam is distributed under the terms of the MIT License.
