Extractive text summarization using genetic algorithms.

thangld201/GA-Text-Summarization

 
 


GA-Text-Summarization

Extractive text summarization using genetic algorithms

If you find this code useful for your research, please cite it using the following BibTeX entry.

@misc{chen2021genetic,
      title={Genetic Algorithms For Extractive Summarization}, 
      author={William Chen and Kensal Ramos and Kalyan Naidu Mullaguri},
      year={2021},
      eprint={2105.02365},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Install Dependencies

pip install -r requirements.txt

Format Dataset

This splits each document of the corpus in the stories folder into a body (the actual article) and highlights (the reference summary). It does not split the dataset into training and test sets.

cd src
python dataset.py
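
To illustrate the kind of separation this step performs, here is a minimal sketch. It is not the repository's dataset.py. It assumes the stories follow the CNN/DailyMail convention, in which the article text comes first and each summary sentence is preceded by a line containing `@highlight`. The function name `split_story` and the `marker` parameter are hypothetical.

```python
def split_story(text, marker="@highlight"):
    """Return (body, highlights) for the raw text of one story.

    Assumption: everything before the first marker is the article body,
    and each chunk after a marker is one highlight sentence.
    """
    parts = text.split(marker)
    # Collapse the body into one string, dropping blank lines.
    body = " ".join(line.strip() for line in parts[0].splitlines() if line.strip())
    # Normalize whitespace in each highlight and skip empty chunks.
    highlights = [" ".join(chunk.split()) for chunk in parts[1:] if chunk.strip()]
    return body, highlights
```

If the corpus uses a different marker or layout, the `marker` argument and the splitting logic would need to change accordingly.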

Split into Train and Test

The program assumes that the dataset has already been split into training and test sets, laid out as shown below. No script is included for automatic splitting.

GA-Text-Summarization\src\dataset\train\body\sample.txt
GA-Text-Summarization\src\dataset\train\highlights\sample.txt

GA-Text-Summarization\src\dataset\test\body\sample.txt
GA-Text-Summarization\src\dataset\test\highlights\sample.txt
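
Since no splitting script is provided, the layout above can be produced with a small helper such as the following sketch. It assumes the formatted files sit in `body` and `highlights` folders under a single source directory and that paired files share the same file name. The function name `split_dataset`, its parameters, and the 80/20 default are hypothetical choices, not part of the repository.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_root, dst_root, test_fraction=0.2, seed=0):
    """Copy paired body/highlights files into train/ and test/ folders.

    Assumption: src_root contains body/ and highlights/ folders whose
    paired files share the same file name.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    # Keep only names that have both a body and a highlights file.
    names = sorted(
        p.name
        for p in (src_root / "body").iterdir()
        if p.is_file() and (src_root / "highlights" / p.name).is_file()
    )
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_fraction)
    splits = {"test": names[:n_test], "train": names[n_test:]}
    for split, split_names in splits.items():
        for part in ("body", "highlights"):
            out_dir = dst_root / split / part
            out_dir.mkdir(parents=True, exist_ok=True)
            for name in split_names:
                shutil.copy2(src_root / part / name, out_dir / name)
    return {split: len(split_names) for split, split_names in splits.items()}
```

The files are copied rather than moved, so the source folders stay intact and the split can be regenerated with a different seed or fraction.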

Train Model

cd src
python main.py

Test Model

cd src
python test.py
