Log Analysis Scripts

This is a collection of Python scripts that work with log files of the sort generated by Concord Consortium tools.

Prerequisites

Python 3 is assumed.

Installation

Clone this repository and cd into its directory.

Recommended: create a virtual environment for python:

python3 -m venv venv
source venv/bin/activate

The first line is one-time setup; the second "source" command must be repeated each time you open a new terminal window. You should see "venv" in your prompt when the virtual environment is activated.

Install the required Python packages:

pip install -r requirements.txt

Scripts included

For all scripts:

  • cd to the application directory
  • Make sure the virtual environment has been activated (source venv/bin/activate, as above)

Command line argument -v or --verbose will print out additional information.

Command line argument -h or --help will show all available command line options and usage information.

analyze-json-column.py

Analyzes columns of a CSV log file that contain JSON data, and lists all of the keys that occur in the JSON.

You must supply the name of the CSV file and the heading of the column that contains the JSON data (generally parameters or extras for CC logs), e.g.

./src/analyze-json-column.py -c parameters my-data-file.csv

The output will be a list of keys, one per line, using dots to separate levels of hierarchy.
So if the JSON data only included:

{ "role": "student",
  "page": { "number": 1, "title": "Introduction" } }

The output would be:

role
page.number
page.title

This is mostly useful so that you know what keys can be used with the next script.
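The key-flattening idea can be sketched in a few lines of Python. This is a minimal illustration of dot-separated path collection, not the script itself (the actual script also handles reading the CSV and merging keys across rows):

```python
import json

def json_keys(obj, prefix=""):
    """Recursively collect dot-separated key paths from nested JSON."""
    keys = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                keys.extend(json_keys(value, path))
            else:
                keys.append(path)
    return keys

data = json.loads('{"role": "student", "page": {"number": 1, "title": "Introduction"}}')
print(json_keys(data))
# ['role', 'page.number', 'page.title']
```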

expand-json-fields.py

Extract fields from a JSON column of a CSV file into their own columns.

You supply the heading of the JSON column and one or more fields that exist in that column (dot-separated, as above), and this script will extract the values of those fields in each row to their own columns. These columns will be added after all existing columns. The JSON column will be removed.

Example:

./src/expand-json-fields.py -c parameters -f problem -f role my-data-file.csv > new-file.csv
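The expansion step works roughly like this sketch (operating on rows as dicts, as Python's csv.DictReader would produce them; the real script's internals may differ):

```python
import json

def expand_json_fields(rows, json_col, fields):
    """Replace json_col in each row with one column per dot-separated field path."""
    for row in rows:
        data = json.loads(row.pop(json_col) or "{}")
        for field in fields:
            value = data
            for part in field.split("."):
                value = value.get(part, "") if isinstance(value, dict) else ""
            row[field] = value
    return rows

rows = [{"id": "1", "parameters": '{"problem": "p1", "role": "student"}'}]
print(expand_json_fields(rows, "parameters", ["problem", "role"]))
# [{'id': '1', 'problem': 'p1', 'role': 'student'}]
```

Fields missing from a given row come out as empty strings, which keeps every row the same width for CSV output.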

deidentify-columns.py

Replace the values in one or more columns with opaque identifiers.

You specify the names of one or more columns (e.g., student name), and each unique value will be replaced with an anonymous identifier (specifically, a short UUID built from the original value).

A file is also written out with the mapping of original values to hashed values.

Example:

./src/deidentify-columns.py -c student_name -c school -m mapping.csv my-data-file.csv > new-file.csv
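One way to build a deterministic "short uuid" from a value is a name-based UUIDv5, as in this sketch (the script's exact scheme may differ; the namespace choice here is an assumption for illustration):

```python
import uuid

def deidentify(values, namespace=uuid.NAMESPACE_URL):
    """Map each unique value to a short deterministic identifier
    (first 8 hex characters of a UUIDv5 of the value)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = uuid.uuid5(namespace, value).hex[:8]
    return mapping

mapping = deidentify(["Alice Smith", "Bob Jones", "Alice Smith"])
# the same input value always maps to the same identifier
assert mapping["Alice Smith"] == deidentify(["Alice Smith"])["Alice Smith"]
```

Because the identifier is derived from the value, re-running the script on the same data yields the same mapping, and the mapping dict is what gets written out to the -m file.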

License

All content is (c) The Concord Consortium and licensed under the MIT License.
