This repository contains the supporting code and data for the paper "Semantic Clustering of Italian Political News on Facebook: Comparing text-embedding-3-large and UmBERTo Embeddings using HDBSCAN and K-means."

fabiogiglietto/Semantic-Clustering-Italian-News

Semantic Clustering of Italian Political News on Facebook

This repository contains the code and data supporting the working paper "Semantic Clustering of Italian Political News on Facebook: Comparing text-embedding-3-large and UmBERTo Embeddings using HDBSCAN and K-means".

Overview

This study compares OpenAI's text-embedding-3-large model with the BERT-based UmBERTo model for clustering Italian political news content. We use two distinct datasets of political news stories that circulated on Facebook before the 2018 and 2022 Italian elections.

Repository Contents

  • /: R and Python scripts for data processing, embedding generation, clustering, and analysis

  • rawdata/: Titles and descriptions of 35,795 links circulated on Facebook prior to the 2018 and 2022 Italian elections. A sample of link pairs coded for thematic coherence by human experts, in JSONL format

  • output/: Empty output folders

  • output/: Empty data folders

  • LICENSE: License information for the project
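
Since rawdata/ includes JSONL data, a small loader may be a convenient starting point before generating embeddings. The sketch below is illustrative only and is not taken from the repository's scripts: the helper names `read_jsonl` and `link_text`, and the field names `title` and `description`, are assumptions based on the description above, so check them against the actual files.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


def link_text(record):
    """Join a link's title and description into one string.

    The keys "title" and "description" are assumed field names,
    not confirmed against the repository's data.
    """
    parts = (record.get("title"), record.get("description"))
    return " ".join(p.strip() for p in parts if p and p.strip())
```

With a hypothetical file name, usage could look like `texts = [link_text(r) for r in read_jsonl("rawdata/links.jsonl")]`; the resulting strings would then be passed to whichever embedding model is being evaluated.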

Contact

For questions or feedback, please open an issue in this repository or contact Fabio Giglietto.
