In the interest of keeping file sizes small, we have not included the training data used to create the machine learning algorithm (~111 MB).
If you are interested in exploring the training data, you can find the dataset we used here.
Onion articles (and other satirical articles) are pretty funny. Even though they are written like news, we can usually tell that they are meant to be satire. But, we wondered, could a computer do the same? We used Kaggle, Keras, TensorFlow, Pandas, and other tools to find out.
You can find a link to our pre-trained weights here (link is currently restricted to U-M accounts).
Warning: The file download is quite large at ~140 MB.
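For orientation, here is a minimal sketch of a Keras text classifier of the general kind this project trains. The layer choices, sizes, and the two toy headlines are illustrative assumptions, not our actual architecture or data; see the project report for the real model.

```python
# Minimal sketch of a satire-vs-news text classifier (illustrative only).
import tensorflow as tf

headlines = ["Area Man Does Thing", "Markets Close Higher After Fed Meeting"]
labels = [1, 0]  # 1 = satire, 0 = real news

vectorizer = tf.keras.layers.TextVectorization(max_tokens=10_000, output_sequence_length=64)
vectorizer.adapt(headlines)

model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability that the article is satire
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(headlines), tf.constant(labels), epochs=2, verbose=0)

# To reuse the downloaded weights, rebuild the project's actual architecture
# and call model.load_weights() on the downloaded file.
```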
Our final project report includes more information on what this project does, how it works, and how it could be improved. It is included in our project as Final_Project_Report.pdf.
Our data collection file is onion_farmer.py.
APIs/websites used: CNN, The Onion, AP News, Clickhole, Pushshift API (for Reddit).
Data is stored in the database file static\onion_barn.db.
To limit data collection to 25 items per run, we scrape web pages one article at a time. For the Pushshift API, we used a recursive call so that we could gather more than 25 articles in total without ever exceeding the 25-results-per-call limit (see the sketch below).
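The sketch below illustrates that recursive pagination idea. The endpoint and parameter names follow the public Pushshift submission-search API, but the subreddit, fields, and helper function are illustrative assumptions, not necessarily what onion_farmer.py does.

```python
# Sketch of recursive Pushshift pagination: each call asks for at most 25
# submissions, then recurses with the oldest timestamp seen so far until
# enough articles have been collected.
import requests

PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"

def fetch_submissions(subreddit, wanted, before=None, collected=None):
    collected = collected if collected is not None else []
    params = {"subreddit": subreddit, "size": 25, "sort": "desc"}
    if before is not None:
        params["before"] = before
    batch = requests.get(PUSHSHIFT_URL, params=params).json().get("data", [])
    collected.extend(batch)
    if not batch or len(collected) >= wanted:
        return collected[:wanted]
    # The next call starts just before the oldest post in this batch.
    return fetch_submissions(subreddit, wanted, before=batch[-1]["created_utc"], collected=collected)

posts = fetch_submissions("TheOnion", wanted=100)
```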
Averages, percentages, and more are calculated from the data in visuals.py.
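As a rough illustration of those calculations, the sketch below pulls the scraped rows into pandas and computes a per-source average and percentage. The table and column names ("articles", "source", "num_comments") and the output path are assumptions; the real schema lives in static\onion_barn.db and the real logic in visuals.py.

```python
# Hedged sketch: load scraped rows from the SQLite database and compute
# averages and percentages with pandas. Table/column names are hypothetical.
import sqlite3
import pandas as pd

conn = sqlite3.connect("static/onion_barn.db")
df = pd.read_sql_query("SELECT source, num_comments FROM articles", conn)
conn.close()

avg_comments = df.groupby("source")["num_comments"].mean()        # average comments per source
source_share = df["source"].value_counts(normalize=True) * 100    # percent of articles per source

summary = pd.DataFrame({"avg_comments": avg_comments, "percent_of_articles": source_share})
summary.to_csv("static/example_calculations.csv")  # illustrative output path
```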
Our JOIN statement is used in most_commented on line 34 of visuals.py.
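For readers unfamiliar with the query, a most_commented-style JOIN might look roughly like the sketch below; the table and column names are assumptions, and the actual statement is the one on line 34 of visuals.py.

```python
# Hedged sketch of a JOIN that ranks articles by comment count.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect("static/onion_barn.db")
rows = conn.execute(
    """
    SELECT articles.title, comments.comment_count
    FROM articles
    JOIN comments ON comments.article_id = articles.id
    ORDER BY comments.comment_count DESC
    LIMIT 10
    """
).fetchall()
conn.close()
```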
The output from these calculations is written to static\caulculations.csv.
Our visualizations are stored in static\visuals.
We have also generated word clouds from our training datasets, found in static\visualizations.
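Generating one of those word clouds looks roughly like this with the `wordcloud` package; the input text and output filename are placeholders, not our actual training data.

```python
# Hedged sketch: build a word cloud image from headline text.
from wordcloud import WordCloud

headline_text = "Area Man Does Thing Markets Close Higher After Fed Meeting"
cloud = WordCloud(width=800, height=400, background_color="white").generate(headline_text)
cloud.to_file("static/visualizations/example_wordcloud.png")  # assumes the folder exists
```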
Our project report can be found at Final_Project_Report.pdf.