Log Analysis Scripts

This is a collection of Python scripts that work with log files of the sort generated by Concord Consortium tools.

Prerequisites

Python 3 is assumed.

Installation

Clone this repository and cd into its directory.

Recommended: create a virtual environment for python:

python3 -m venv venv
source venv/bin/activate

The first line is one-time setup; the second "source" command must be repeated each time you open a new terminal window. You should see "venv" in your prompt when the virtual environment is activated.

Install the required Python packages:

pip install -r requirements.txt

Scripts included

For all scripts:

cd to the application directory
Make sure the virtual environment has been activated (source venv/bin/activate, as above)

Command line argument -v or --verbose will print out additional information.

Command line argument -h or --help will show all available command line options and usage information.

`analyze-json-column.py`

Analyzes columns of a CSV log file that contain JSON data, and lists all of the keys that occur in the JSON.

You must supply the name of the CSV file and the heading of the column that contains the JSON data (generally parameters or extras for CC logs), eg

./src/analyze-json-column -c parameters my-data-file.csv

The output will be a list of keys, one per line, using dots to separate levels of hierarchy.
So if the JSON data only included:

{ "role": "student",
  "page": { "number": 1, "title: "Introduction" } }

The output would be:

role
page.number
page.title

This is mostly useful so that you know what keys can be used with the next script.

`expand-json-fields.py`

Extract fields from a JSON column of a CSV file into their own columns.

You supply the heading of JSON column and one or more fields that exist in that column (dot-separated, as above), and this script will extract the values of those fields in each row to their own columns. These columns will be added after all existing columns. The JSON column will be removed.

Example:

./src/expand-json-fields.py -c parameters -f problem -f role my-data-file.csv > new-file.csv

`deidentify-columns.py`

Replace the values in one or more columns with opaque identifiers.

You specify the names of one or more columns (eg, student name), and each unique value will be replaced with an anonymous identifier (specifically, this is a short uuid built from the original value).

A file is also written out with the mapping of original values to hashed values.

Example:

./src/deidentify-columns.py -c student_name -c school -m mapping.csv my-data-file.csv > new-file.csv

License

All content is (c) The Concord Consortium and licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Log Analysis Scripts

Prerequisites

Installation

Scripts included

`analyze-json-column.py`

`expand-json-fields.py`

`deidentify-columns.py`

License

About

Releases

Packages

Languages

concord-consortium/log-analysis-scripts

Folders and files

Latest commit

History

Repository files navigation

Log Analysis Scripts

Prerequisites

Installation

Scripts included

analyze-json-column.py

expand-json-fields.py

deidentify-columns.py

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`analyze-json-column.py`

`expand-json-fields.py`

`deidentify-columns.py`

Packages