📰 BigData Project : Arxiv_Analysis (Computer Science)

Explore interesting trends at the academic frontier of computer science by analysing a large number of papers on arXiv.

Group Leader: Jianshu Zhang

Group Members: Yanfu Kai; Ziheng Peng

Adviser: Prof. Run Wang

Report: Report.pdf | Presentation: presentation.pptx

Our work

Arxiv_Analysis Project Structure

This project is structured as follows:

./crawler_utils: Contains utilities for crawling data from Arxiv.
./dataset: To replicate the whole project, first download bert-base-uncased. All the CSV files can be reproduced by running ./crawler_utils/crawl.py, ./dataset/prepocess.py, and ./dataset/trans_to_bert.py.
./results: The results of the data analysis.
./tools: Contains all the tools used for the analysis; their outputs are saved in ./visualization. The scripts and their purposes are listed below:

cata_kmeans.py: Performs K-Means clustering on the dataset to identify distinct groups based on characteristics.

cata_num_rank.py: Ranks the categories by number of papers from 11/30/2022 to 12/01/2023.

cata_rela_cs.py: Analyzes the relationships between different categories.

cata_rela_sum.py: Summarizes the relationships between categories as a network.

cata_wordcloud.py: Generates a word cloud from categorical data to visualize the frequency or importance of categories.

month_inter.py: Looks for statistical regularities in the interval between a paper's first and last submission.

month_statistic.py: Analyzes monthly data to identify trends or patterns over time.

rela_cs_radar.py: Creates a radar chart showing the relationship between cs and other categories.

year_statistic.py: Calculates yearly statistics to provide insights into long-term trends.
./visualization: Figures visualizing the results of the data analysis.
./test_if_spark_can_work.py: Test the Spark environment setup.
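The ranking done by cata_num_rank.py comes down to counting category labels inside a date window. A minimal standard-library sketch follows; it assumes, hypothetically, that each paper is a `(published_date, categories)` pair where `categories` is an arXiv-style space-separated string such as "cs.LG stat.ML". The real CSV columns produced by the crawler may differ, and `rank_categories` is an illustrative name, not a function from the repository.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def rank_categories(
    papers: list[tuple[date, str]], start: date, end: date
) -> list[tuple[str, int]]:
    """Count each category over papers published in [start, end], most frequent first.

    Hypothetical input: (published date, space-separated arXiv categories).
    """
    counts: Counter[str] = Counter()
    for published, categories in papers:
        if start <= published <= end:
            # dict.fromkeys drops duplicate labels while keeping their order, so a
            # paper cross-listed in several categories counts once for each of them.
            for category in dict.fromkeys(categories.split()):
                counts[category] += 1
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        (date(2023, 1, 5), "cs.LG stat.ML"),
        (date(2023, 3, 2), "cs.CV cs.LG"),
        (date(2021, 7, 1), "cs.CL"),  # outside the window, so it is ignored
    ]
    for category, n in rank_categories(sample, date(2022, 11, 30), date(2023, 12, 1)):
        print(f"{category}: {n}")
```

Counting a cross-listed paper once per category is one reasonable convention; counting only each paper's primary category would give a different ranking.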
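cata_rela_cs.py and cata_rela_sum.py study how categories relate to each other. One common way to quantify this, assumed here rather than confirmed from the scripts, is co-occurrence: two categories are linked whenever they are listed on the same paper, and the number of such papers becomes the edge weight of a category network. The sketch below uses only the standard library; the input format (one space-separated category string per paper) and the names `cooccurrence_edges` and `neighbours_of` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cooccurrence_edges(category_lists: list[str], archive_only: bool = False) -> Counter[tuple[str, str]]:
    """Count unordered pairs of categories listed together on the same paper."""
    edges: Counter[tuple[str, str]] = Counter()
    for categories in category_lists:
        labels = categories.split()
        if archive_only:
            # Collapse subcategories to their archive, e.g. "cs.LG" -> "cs".
            labels = [label.split(".")[0] for label in labels]
        # Sorting the de-duplicated labels makes (a, b) and (b, a) the same edge.
        edges.update(combinations(sorted(set(labels)), 2))
    return edges


def neighbours_of(edges: Counter[tuple[str, str]], focus: str) -> dict[str, int]:
    """Edge weight between `focus` and every category it co-occurs with."""
    result: dict[str, int] = {}
    for (a, b), weight in edges.items():
        if focus == a:
            result[b] = weight
        elif focus == b:
            result[a] = weight
    return result


if __name__ == "__main__":
    papers = ["cs.LG stat.ML", "cs.CV cs.LG", "cs.LG math.OC stat.ML", "cs.CL"]
    print(cooccurrence_edges(papers).most_common(3))
    print(neighbours_of(cooccurrence_edges(papers, archive_only=True), "cs"))
```

With `archive_only=True`, subcategories such as cs.LG and cs.CV collapse into their archive (cs), so `neighbours_of(edges, "cs")` yields one weight per other archive, the kind of per-axis value a radar chart like the one from rela_cs_radar.py could plot. The weighted pairs can also be fed to a graph library such as networkx (`Graph.add_edge(a, b, weight=w)`) to draw the network.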
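month_inter.py looks at the interval between a paper's first and last submission. Assuming, hypothetically, that the first and last version dates of each paper are available as `datetime.date` values, the interval can be measured in calendar months and tallied into a histogram, as in this standard-library sketch. Month granularity and the function names are assumptions, not details taken from the script.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def months_between(first: date, last: date) -> int:
    """Calendar-month difference between two dates (the day of month is ignored)."""
    return (last.year - first.year) * 12 + (last.month - first.month)


def interval_histogram(versions: list[tuple[date, date]]) -> dict[int, int]:
    """Map each interval in months to the number of papers, ordered by interval."""
    counts = Counter(months_between(first, last) for first, last in versions)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    versions = [
        (date(2023, 1, 5), date(2023, 1, 20)),   # revised within the same month
        (date(2023, 1, 5), date(2023, 3, 1)),
        (date(2022, 12, 31), date(2023, 1, 1)),  # crosses a month boundary
    ]
    print(interval_histogram(versions))
```

Because only the year and month are compared, Dec 31 to Jan 1 counts as one month while Jan 1 to Jan 31 counts as zero; a day-based measure would avoid that edge effect if it matters for the analysis.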
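cata_kmeans.py performs K-Means clustering. The sketch below shows the standard Lloyd iteration behind K-Means (assign every point to its nearest centroid, move each centroid to the mean of its points, repeat until the assignments stop changing) on plain numeric vectors. Which features the script actually clusters is not described here, so the input vectors, the `kmeans` name, and the naive choice of the first k points as initial centroids are assumptions for illustration.

```python
from __future__ import annotations

import math


def kmeans(points: list[list[float]], k: int, max_iter: int = 100) -> tuple[list[int], list[list[float]]]:
    """Lloyd's K-Means: returns (cluster label per point, final centroids).

    Illustrative sketch: initial centroids are simply the first k points.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
    print(labels, centroids)
```

In practice, `sklearn.cluster.KMeans` or Spark MLlib's `pyspark.ml.clustering.KMeans` provide the same algorithm with smarter initialisation; the hand-rolled version only makes the two alternating steps explicit.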

Tasks

Algorithm Design

A few examples of our visualizations
