Author: Yu Lou
Written for Python 3.
Preprocess text.
- hlm.txt: original text
- preprocessing.txt: preprocessed text.
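The exact cleaning rules applied to hlm.txt are not documented above; a minimal sketch, assuming preprocessing means keeping only CJK characters and dropping punctuation, whitespace, and Latin text:

```python
import re

def preprocess(text):
    # Keep only CJK characters; punctuation, whitespace, and Latin
    # text are dropped. This is an illustrative guess at the cleaning
    # rules, not the documented behavior of the original script.
    return "".join(re.findall(r"[\u4e00-\u9fff]+", text))

# The result would be written to preprocessing.txt.
```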
Split text into chapters and preprocess them.
- hlm.txt: original text
- chapters(folder): preprocessed text. One file per chapter, numbered starting from "1.text".
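A possible chapter splitter, assuming each chapter in hlm.txt opens with a heading of the form "第…回" (the real heading pattern may differ):

```python
import re

def split_chapters(text):
    # re.split with a capturing group alternates prose and headings,
    # so pair each heading with the text that follows it.
    # The heading regex is an assumption about hlm.txt's format.
    parts = re.split(r"(第[一二三四五六七八九十百]+回)", text)
    return [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
```

Each returned chapter would then be written to "1.text", "2.text", ... inside the chapters folder.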
Create dictionary.
- preprocessing.txt: preprocessed text.
- dict.csv: dictionary.
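The criterion the dictionary builder uses is not described; one common unsupervised stand-in is to count every short substring and keep the frequent ones as candidate words:

```python
from collections import Counter

def build_dict(text, max_len=4, min_count=2):
    # Count every substring of length 1..max_len and keep those
    # occurring at least min_count times as candidate words.
    # A stand-in only; the original criterion is undocumented.
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {w: c for w, c in counts.items() if c >= min_count}
```

The resulting word/count pairs would be written out as dict.csv.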
Split words apart.
- preprocessing.txt: preprocessed text.
- dict.csv: dictionary.
- word_split.text: split text.
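One standard dictionary-based segmenter is forward maximum matching; the original splitter may use a different strategy, so treat this as an assumed sketch:

```python
def split_words(text, words, max_len=4):
    # Forward maximum matching: at each position take the longest
    # dictionary entry that matches, falling back to a single
    # character when nothing in the dictionary fits.
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in words:
                out.append(text[i:i + n])
                i += n
                break
    return out
```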
Split words apart in all chapters.
- preprocessing.txt: preprocessed text.
- dict.csv: dictionary.
- chapter(folder): preprocessed text for all chapters.
- chapter_split(folder): split text. One file per chapter, numbered starting from "1.text".
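The per-chapter variant applies the same segmentation to every chapter file. A sketch operating on in-memory chapter texts (file reading/writing omitted; the segmentation strategy is assumed, as above):

```python
def split_all_chapters(chapter_texts, words, max_len=4):
    # chapter_texts: {chapter number: chapter text}. Returns
    # {chapter number: token list}, one entry per chapter file,
    # using forward maximum matching (assumed algorithm).
    def segment(text):
        out, i = [], 0
        while i < len(text):
            for n in range(min(max_len, len(text) - i), 0, -1):
                if n == 1 or text[i:i + n] in words:
                    out.append(text[i:i + n])
                    i += n
                    break
        return out
    return {k: segment(t) for k, t in chapter_texts.items()}
```

Each token list would then be written to the matching "1.text", "2.text", ... in chapter_split.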
Count words.
- word_split.text: split text.
- word_count.csv: counting result, sorted by number of occurrences.
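The counting step can be sketched as:

```python
from collections import Counter

def count_words(tokens):
    # Sort word counts with the most frequent first, matching the
    # ordering described for word_count.csv.
    return sorted(Counter(tokens).items(), key=lambda kv: kv[1], reverse=True)
```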
Count words in each chapter.
- chapter_split(folder): split text for each chapter.
- word_count_chapters.csv: counting result. One line per word and one chapter per column.
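A sketch of the word-by-chapter table (one row per word, one column per chapter), taking already-split token lists as input:

```python
from collections import Counter

def count_by_chapter(chapter_tokens):
    # chapter_tokens: one token list per chapter. Returns
    # {word: [count in chapter 1, count in chapter 2, ...]}.
    counters = [Counter(toks) for toks in chapter_tokens]
    vocab = sorted(set().union(*counters))
    return {w: [c[w] for c in counters] for w in vocab}
```

Each row of the returned mapping corresponds to one line of word_count_chapters.csv.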
Do PCA analysis. Show result on screen.
"sklearn", "numpy" and "matplotlib" is needed to run this program.
- word_count_chapters.csv: word counting result for each chapters.
- components.csv: weights for each components.
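Since sklearn is named as a dependency, the PCA step presumably looks roughly like this (the matrix orientation and component export are assumptions; matplotlib plotting is left out):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_chapters(matrix, n_components=2):
    # matrix: one row per chapter, one column per word (the transpose
    # of word_count_chapters.csv). Returns the projected chapter
    # coordinates and the per-component word weights that would be
    # written to components.csv.
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(matrix)
    return coords, pca.components_
```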
Library for suffix tree.
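The library itself is not shown; as an illustration of the data structure, a naive suffix trie (not a compressed suffix tree, and O(n^2) space, so only suitable for short texts) that counts substring occurrences:

```python
class SuffixTrie:
    # Naive suffix trie: insert every suffix of the text. Each node
    # stores its children plus an occurrence counter under the
    # reserved key "#" (so the text must not contain "#" itself).
    # This is a sketch, not the library used by the original code.
    def __init__(self, text):
        self.root = {}
        for i in range(len(text)):
            node = self.root
            for ch in text[i:]:
                node = node.setdefault(ch, {"#": 0})
                node["#"] += 1

    def count(self, s):
        # Number of occurrences of the non-empty substring s.
        node = self.root
        for ch in s:
            if ch not in node:
                return 0
            node = node[ch]
        return node["#"]
```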
Calculate the correctness of word splitting algorithm.
- *_answer.txt: answer.
- *_result.txt: result of the program.
("*" is file prefix)