This code accompanies a blog post at http://www.oblomovka.com/wp/2016/02/17/are-email-subject-lines-getting-longer/
Summary: I noticed that the subject lines from my inbox circa 1998 seem, on average, shorter than the subject lines of emails I receive today (2016).
In my email corpus, there seems to be a general upward trend in subject length: on average, subject lines grow by more than one character per year.
This means that by the year 2036, the average email subject line will be eighty characters long and will no longer fit across an 80-column VT100 terminal. Chaos will ensue.
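The 2036 date reads as a straight-line extrapolation: fit length = slope * year + intercept to the data, then solve for the year where the line reaches 80. Here is a minimal sketch of that arithmetic in plain Python, assuming you already have per-message (year, subject length) pairs. `fit_trend` and `year_reaching` are hypothetical names, not functions from this repository.

```python
def fit_trend(years, lengths):
    """Ordinary least-squares fit of: length = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(lengths) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, lengths))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # characters gained per year
    intercept = mean_y - slope * mean_x
    return slope, intercept


def year_reaching(target, slope, intercept):
    """Solve target = slope * year + intercept for year (assumes slope > 0)."""
    return (target - intercept) / slope


# Hypothetical usage: years/lengths would come from your own mail database.
# slope, intercept = fit_trend(years, lengths)
# print(year_reaching(80, slope, intercept))
```

Ordinary least squares over individual messages is the simplest choice. Fitting yearly averages instead would give every year equal weight, so years with only a handful of messages would pull on the line much harder.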
Here are some tools I wrote to help me conduct this vital analysis:
Program | Function |
---|---|
bring_me_emails.py | Reads emails stored in mbox, Maildir or notmuch and saves their stats to a sqlite3 database. |
strip_ids.py | Removes personal info (message-ids) from a sqlite3 database created by bring_me_emails.py. |
Subject line growth-sqlite-scatter.ipynb | Creates a scatter plot of email subject line length (IPython/Jupyter notebook). |
plotlengths.py | Plots the scatter plot from the command line and saves it as a PNG file. |
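The source of bring_me_emails.py isn't reproduced here. What follows is a minimal sketch of the same kind of collection using only the standard library's `mailbox`, `email` and `sqlite3` modules. It pulls decoded subject lengths out of an mbox or Maildir and stores them in sqlite3. The table name `subjects`, its columns, and the helper names are hypothetical, not the schema the real script writes. notmuch support is not covered.

```python
import mailbox
import sqlite3
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime


def subject_text(raw):
    """Decode RFC 2047 encoded words (e.g. =?utf-8?q?...?=) into plain text."""
    if raw is None:
        return ""
    try:
        text = str(make_header(decode_header(raw)))
    except (HeaderParseError, LookupError, UnicodeError):
        text = str(raw)  # malformed or unknown encoding: count the raw header
    # Remove the line breaks used to fold long headers across several lines.
    return text.replace("\r\n", "").replace("\n", "")


def message_year(msg):
    """Year from the Date: header, or None if it is missing or unparseable."""
    raw = msg["Date"]
    if raw is None:
        return None
    try:
        return parsedate_to_datetime(str(raw)).year
    except (TypeError, ValueError):
        return None


def collect(box, db_path):
    """Store (message_id, year, subject length) for every dated message in box.

    box is any mailbox.Mailbox, e.g. mailbox.mbox(path) or mailbox.Maildir(path).
    The 'subjects' table layout is a hypothetical choice for this sketch.
    """
    rows = []
    for msg in box:
        year = message_year(msg)
        if year is None:
            continue  # skip undated messages rather than guess a year
        mid = msg["Message-ID"]
        rows.append((str(mid) if mid is not None else None,
                     year, len(subject_text(msg["Subject"]))))
    conn = sqlite3.connect(db_path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE IF NOT EXISTS subjects "
                     "(message_id TEXT, year INTEGER, length INTEGER)")
        conn.executemany("INSERT INTO subjects VALUES (?, ?, ?)", rows)
    conn.close()
    return len(rows)
```

Subjects are decoded before measuring so that an encoded header such as `=?utf-8?q?Caf=C3=A9?=` counts as four characters ("Café"), not twenty-one.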
You'll need matplotlib to use plotlengths.py, and Jupyter/IPython to run the notebook. To read mail stored in notmuch, you'll also need notmuch's Python bindings.
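For readers who just want the plot, here is a minimal matplotlib sketch of the command-line approach. It is not the code of plotlengths.py. It assumes the statistics live in a hypothetical sqlite3 table named `subjects` with `year` and `length` columns (the real database may be laid out differently), and it renders headlessly with the Agg backend. The dashed 80-character reference line is an illustrative addition.

```python
import sqlite3

import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. over ssh
import matplotlib.pyplot as plt


def plot_lengths(db_path, png_path):
    """Scatter subject length against year and save the figure as a PNG."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT year, length FROM subjects").fetchall()
    conn.close()
    years = [year for year, _ in rows]
    lengths = [length for _, length in rows]
    fig, ax = plt.subplots(figsize=(10, 6))
    # Small, translucent markers keep dense years readable.
    ax.scatter(years, lengths, s=4, alpha=0.3)
    ax.axhline(80, color="red", linestyle="--", label="80 columns")
    ax.set_xlabel("Year")
    ax.set_ylabel("Subject line length (characters)")
    ax.legend()
    fig.savefig(png_path, dpi=100)
    plt.close(fig)  # free the figure when called repeatedly
```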
An example email database (115 MB, stripped of personal info) is available separately as a torrent.
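The listing above says only that strip_ids.py removes message-ids from the database. It does not say how. One minimal way to do it, assuming a hypothetical `subjects` table with a `message_id` column, is to overwrite that column with NULL and then run VACUUM. SQLite can leave old values behind in free pages until the file is rebuilt, and VACUUM does that rebuild.

```python
import sqlite3


def strip_message_ids(db_path):
    """Blank out message IDs while keeping the year/length statistics.

    Table and column names are hypothetical; adjust to the real schema.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits the UPDATE, or rolls it back on error
        conn.execute("UPDATE subjects SET message_id = NULL")
    # VACUUM rebuilds the file, discarding free pages that may still hold the
    # old IDs; it cannot run inside a transaction, so it comes after the commit.
    conn.execute("VACUUM")
    conn.close()
```

If you still need to tell messages apart (to deduplicate, say), replacing each ID with a salted hash is an alternative. Nulling the column is just the simplest option.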
Questions, queries? Mail me at [email protected] with the subject line "Hello Danny, I do hope this subject line won't ruin your future averages for any subsequent analysis that you have planned". Thanks!