Text summarization of news articles using spaCy, gensim, the Text-to-Text Transfer Transformer (T5), and a T5 implementation through Spark NLP.

dhannywi/News_Summarization

Text Summarization through Transformers

Text summarization is the process of distilling the most important information from a longer document into a short, accurate, and fluent summary. Automatic text summarization is a common problem in machine learning and natural language processing (NLP). With an ever-growing amount of text data generated daily, automatic summarization methods are needed to help users find and consume relevant information more quickly.

In this project, we developed a model that, given a piece of English text, generates a summary in the same language. We started by gathering the dataset from Kaggle. We then preprocessed the data, performed exploratory data analysis to gain insights, produced embeddings, and applied several tools and models to the summarization task: spaCy, gensim, the Text-to-Text Transfer Transformer (T5), and a T5 implementation through Spark NLP. The text summarization model was trained on a Kaggle news dataset containing 4,515 news articles along with their summaries.
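To make the extractive side of the task concrete, here is a minimal sketch of frequency-based sentence scoring, a common baseline that libraries such as spaCy are often used to implement. It is not the project's code: it uses only the Python standard library, and the names `STOP_WORDS`, `split_sentences`, `tokenize`, and `summarize` are hypothetical, as are the tiny stop-word list and the naive sentence splitter. The T5 models named above are abstractive and work differently.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
# A real pipeline would use a library-provided list instead.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "that", "this", "for", "with", "as", "by", "at", "be",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    freqs = Counter(tokenize(text))
    if not freqs:
        # No content words at all: fall back to the leading sentences.
        return " ".join(sentences[:num_sentences])

    max_freq = max(freqs.values())
    # Score each sentence by the sum of its normalized word frequencies.
    scored = [
        (sum(freqs[w] / max_freq for w in tokenize(sentence)), index)
        for index, sentence in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    chosen = sorted(index for _, index in top)
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    article = (
        "The city council approved a new transit plan on Monday. "
        "The transit plan adds three bus lines and extends the rail network. "
        "Officials said the weather was pleasant. "
        "Funding for the transit plan comes from a regional transport grant."
    )
    print(summarize(article, num_sentences=2))
```

Because scores are summed rather than averaged, this sketch favors longer sentences. Dividing each score by the sentence's token count is one way to reduce that bias.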

More details about the project and its results are available in the PDF report and presentation slides.

Authors

Dhanny Indrakusuma
Manasvini Karthikeyan
Shashwat Jyotishi

Additional Resources
