This project reads and analyzes CSV files. Based on example source code written in Lua, we implemented the functions listed below in Python. To support these functions, we defined 5 classes with specific methods, as described below.
git clone https://github.com/yzhu27/CSVAnalyser.git
cd ./CSVAnalyser
python ./main.py -e ALL
*Notice: run main.py directly from the root directory.
Read CSV
- Import the input file into a dictionary line by line, splitting each line on the given separator.
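
As a rough illustration of the reading step, the sketch below loads lines into a dictionary keyed by line number. The names `read_csv` and `coerce` are hypothetical and not taken from the repository, and converting cells to numbers is an assumption about how the data is prepared for later statistics.

```python
import io


def coerce(text):
    """Convert a cell to int or float when possible, else keep the stripped string."""
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv(stream, sep=","):
    """Read a text stream line by line into {line_number: [cells]}.

    Hypothetical helper: the real project may key or shape the dictionary differently.
    """
    rows = {}
    for number, line in enumerate(stream):
        line = line.strip()
        if line:  # skip blank lines
            rows[number] = [coerce(cell) for cell in line.split(sep)]
    return rows


# Made-up sample data standing in for a real file such as data/auto93.csv.
sample = io.StringIO("Cyl,Weight,origin\n8,3504,usa\n4,2130.5,japan\n")
table = read_csv(sample)
```

With a real file, the same function could be called as `read_csv(open(path), sep)`, since it only needs an iterable of lines.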
CLI
- Update information through the command line. The help string is printed when the program is run with "-h".
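
One common way to get this behaviour in Python is the standard `argparse` module, which generates the `-h` help text automatically. The sketch below is an assumption about how such a CLI could look: only `-e` and `-h` appear in this README, so the long option names, the `-f` and `-s` options, and all defaults are hypothetical.

```python
import argparse


def build_parser():
    """Build a command-line parser; argparse adds -h/--help by itself."""
    parser = argparse.ArgumentParser(description="Read and summarize a CSV file.")
    # -e is the only option shown in this README; the rest are illustrative.
    parser.add_argument("-e", "--eg", default="nothing",
                        help="name of the example to run, or ALL")
    parser.add_argument("-f", "--file", default="data/auto93.csv",
                        help="path to the input CSV file (hypothetical option)")
    parser.add_argument("-s", "--separator", default=",",
                        help="field separator (hypothetical option)")
    return parser


# Parse an explicit argument list, mirroring "python ./main.py -e ALL".
args = build_parser().parse_args(["-e", "ALL"])
```

Options that are not given on the command line keep their defaults, so the parsed values can be used to update the program's settings in one place.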
Generate Statistical Summaries
- This function works on column data. For each column, the data is either numeric (denoted by a leading upper case letter in the column name) or symbolic (denoted by a leading lower case letter). Different statistical variables are used to describe the two types of data.
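
The naming rule above can be sketched in a few lines. The helper `is_numeric_column` and the sample header are hypothetical and only illustrate the upper case versus lower case convention.

```python
def is_numeric_column(name):
    """Return True when the column name starts with an upper case letter."""
    return name[:1].isupper()


# Made-up header row for illustration.
header = ["Cyl", "Weight", "origin"]
kinds = {
    name: "numeric" if is_numeric_column(name) else "symbolic"
    for name in header
}
```

Each column could then be routed to a numeric or a symbolic summary depending on its entry in `kinds`.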
Cols
- Record column names and variables, differentiating dependent variables and independent variables by leading letters of column names.
Rows
- Record data by row.
Num
- The Num class calculates features of numeric data. It provides the methods add, mid and div: mid is the middle value of the sorted data (the median), while div is the standard deviation of the column of numbers.
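
A minimal sketch of such a class, built on the standard `statistics` module, might look like the following. It is not the project's implementation: storing every value in a list and using the sample standard deviation are assumptions made for brevity.

```python
import statistics


class Num:
    """Summarize a column of numbers (illustrative sketch only)."""

    def __init__(self):
        self.values = []

    def add(self, x):
        """Record one numeric value."""
        self.values.append(x)

    def mid(self):
        """Middle value of the sorted data (the median)."""
        return statistics.median(self.values)

    def div(self):
        """Sample standard deviation; 0.0 when fewer than two values exist."""
        if len(self.values) < 2:
            return 0.0
        return statistics.stdev(self.values)


num = Num()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    num.add(x)
```

For the eight values above, `num.mid()` is 4.5, the average of the two central values in the sorted list.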
Sym
- The Sym class calculates features of symbolic data. It provides the methods add, mid and div: mid is the most common symbol in the set (the mode), and div is the entropy of the symbols.
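
The same three methods can be sketched for symbols with a `collections.Counter`. This is an illustration under assumptions, not the project's code: entropy is computed here in bits (log base 2), and ties for the most common symbol are resolved by whichever symbol was seen first.

```python
import math
from collections import Counter


class Sym:
    """Summarize a column of symbols (illustrative sketch only)."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0

    def add(self, x):
        """Record one symbol."""
        self.counts[x] += 1
        self.n += 1

    def mid(self):
        """Most common symbol (the mode)."""
        return self.counts.most_common(1)[0][0]

    def div(self):
        """Entropy of the symbol distribution, in bits."""
        return -sum(
            (count / self.n) * math.log2(count / self.n)
            for count in self.counts.values()
        )


sym = Sym()
for s in ["a", "a", "b", "c"]:
    sym.add(s)
```

Here "a" has probability 0.5 and "b" and "c" have 0.25 each, so `sym.mid()` is "a" and `sym.div()` is 1.5 bits.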
Data
The test cases are taken from the Lua source code and use https://github.com/yzhu27/CSVAnalyser/blob/main/data/auto93.csv as input data. Test coverage is: