Small project to analyze various news sites with the goal of evaluating the gender diversity of the people they mention.
The method for now is quite simple:
- Parse the page and collect the article titles
- Extract anything that looks like a name using the Stanford NER (Named Entity Recognizer)
- Identify the probable gender of each name
- Source page: total citations, history per day, word cloud?
- Contact (page ok, need to send email)
- Add prev/next day to dashboard
- Analysis detail page (with prev/next)
- Let's start in French, but make it translatable from the start?
- Add text to dashboard
- Home: Explain what this is about
- Sources: Pick one for detail, see stats (today?)
- Dashboard
- Store all titles (to do some keywords analysis after?), not only those with identified names
- Could create a "re-analyze" action based on the saved HTML
- Add links to the articles/titles found, to allow "gendered news"
- For fun - remove male news from a site?
- Add sentiment analysis on titles?
- Check the Hugging Face interface/API for easy module switching?
- Add Django translations
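
The three-step method described at the top (collect titles, extract names, guess gender) could be sketched roughly as below. This is only an illustration under several assumptions, not the project's actual code: titles are assumed to live in `<h2>` tags, a naive capitalized-words regex stands in for the Stanford NER step, and `FIRST_NAME_GENDER` is a tiny hypothetical lookup table. All function and class names here are made up for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical lookup table; a real run would need a much larger first-name dataset.
FIRST_NAME_GENDER = {"marie": "female", "anne": "female", "paul": "male", "jean": "male"}

# Naive stand-in for the NER step: two or more consecutive capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-zà-ÿ]+(?: [A-Z][a-zà-ÿ]+)+\b")


class TitleParser(HTMLParser):
    """Collects the text of <h2> elements, assumed here to hold article titles."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_title:
            self._in_title = False
            self.titles.append("".join(self._buffer).strip())


def extract_titles(html):
    """Step 1: parse the page and return the article titles."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def extract_names(title):
    """Step 2: return anything in the title that looks like a name."""
    return NAME_PATTERN.findall(title)


def guess_gender(name):
    """Step 3: guess the gender from the first name, or 'unknown'."""
    first = name.split()[0].lower()
    return FIRST_NAME_GENDER.get(first, "unknown")


def analyze(html):
    """Run the three steps and count names per guessed gender."""
    counts = Counter()
    for title in extract_titles(html):
        for name in extract_names(title):
            counts[guess_gender(name)] += 1
    return counts
```

Calling `analyze()` on a page containing the titles "Marie Curie wins prize" and "Paul Durand and Anne Martin debate" would count two female names and one male name. The regex will also pick up capitalized phrases that are not people (place names, for instance), which then fall into the "unknown" bucket; that is one reason a proper NER model is the better choice for the real pipeline.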