archive crawler with feature extraction

This code demonstrates a web crawler for a forum archive whose pages each contain several blog posts. Several features are extracted from the parsed post texts, and the correlations between those features are examined. With a larger dataset, the identified features could be used to train a machine learning model that predicts the popularity of new blog posts.
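The pipeline described above (parse pages, extract features, correlate them) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. The `<article>` markup, the sample `PAGES`, the feature names, and the helpers `PostParser`, `parse_posts`, `extract_features`, and `pearson` are all assumptions made for the example; the download step is left out, and the pages are stand-in strings.

```python
from html.parser import HTMLParser
from math import sqrt


class PostParser(HTMLParser):
    """Collects the text of every <article> element (assumed post markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []       # finished post texts
        self._chunks = None   # text pieces of the post being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._chunks is not None:
            # Join the pieces and collapse runs of whitespace.
            self.posts.append(" ".join(" ".join(self._chunks).split()))
            self._chunks = None


def parse_posts(html):
    """Return the plain text of each post found on one archive page."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.posts


def extract_features(text):
    """Compute a few simple, hypothetical text features for one post."""
    return {
        "n_words": len(text.split()),
        "n_chars": len(text),
        "n_questions": text.count("?"),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one feature is constant
    return cov / sqrt(var_x * var_y)


# Stand-ins for downloaded archive pages (hypothetical content).
PAGES = [
    "<article><h2>First</h2><p>Short post.</p></article>"
    "<article><h2>Second</h2><p>Is this a longer post? Yes, it is.</p></article>",
    "<article><h2>Third</h2><p>A post with a few more words in it than the first.</p></article>",
]

posts = [text for page in PAGES for text in parse_posts(page)]
features = [extract_features(text) for text in posts]
r = pearson([f["n_words"] for f in features], [f["n_chars"] for f in features])
print(f"{len(posts)} posts, corr(n_words, n_chars) = {r:.3f}")
```

In a real crawl, each entry of `PAGES` would instead come from fetching an archive URL (for example with `urllib.request.urlopen`), and a popularity measure such as a comment count would be added as a further feature to correlate against.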

If the Jupyter Notebook does not render, please refer to the PDF version.