Archive crawler with feature extraction

This code demonstrates a web crawler for a forum whose archive spans multiple pages of blog posts. Several features are extracted from the parsed text, and the correlations between these features are examined. With a larger dataset, the identified features could be used to train a machine learning model to predict the popularity of new blog posts.
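As a rough illustration of the feature-extraction and correlation step, here is a minimal sketch using only the Python standard library. The feature set (`n_words`, `n_sentences`, `avg_word_len`), the function names, and the sample posts are all assumptions made up for this example; they are not taken from the notebook, which may use different features and libraries.

```python
import math
import re


def extract_features(text):
    """Return a few simple numeric features for one post.

    The feature set is hypothetical and chosen only for illustration.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_word_len": avg_word_len,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one feature is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder posts; a real run would use the crawled texts.
posts = [
    "Short post.",
    "A slightly longer post. It has two sentences.",
    "This one is longer still. It has three sentences. Here is the third.",
]

features = [extract_features(p) for p in posts]
r = pearson(
    [f["n_words"] for f in features],
    [f["n_sentences"] for f in features],
)
print(f"correlation(n_words, n_sentences) = {r:.3f}")
```

With only three toy posts the coefficient carries no statistical weight; the point is only to show the shape of the pipeline from raw text to a feature table to pairwise correlations.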

If the Jupyter notebook does not render, please refer to the PDF.