This is a collection of Python scripts that work with log files of the sort generated by Concord Consortium tools.
Python 3 is assumed.
Clone this repository and cd
into its directory.
Recommended: create a virtual environment for python:
python3 -m venv venv
source venv/bin/activate
The first line is one-time setup; the second "source" command must be repeated each time you open a new terminal window. You should see "venv" in your prompt when the virtual environment is activated.
Install the required Python packages:
pip install -r requirements.txt
For all scripts:
cd
to the application directory- Make sure the virtual environment has been activated (
source venv/bin/activate
, as above)
Command line argument -v
or --verbose
will print out additional information.
Command line argument -h
or --help
will show all available command line options and usage information.
Analyzes columns of a CSV log file that contain JSON data, and lists all of the keys that occur in the JSON.
You must supply the name of the CSV file and the heading of the column that contains the JSON data (generally parameters
or extras
for CC logs), eg
./src/analyze-json-column -c parameters my-data-file.csv
The output will be a list of keys, one per line, using dots to separate levels of hierarchy.
So if the JSON data only included:
{ "role": "student",
"page": { "number": 1, "title: "Introduction" } }
The output would be:
role
page.number
page.title
This is mostly useful so that you know what keys can be used with the next script.
Extract fields from a JSON column of a CSV file into their own columns.
You supply the heading of JSON column and one or more fields that exist in that column (dot-separated, as above), and this script will extract the values of those fields in each row to their own columns. These columns will be added after all existing columns. The JSON column will be removed.
Example:
./src/expand-json-fields.py -c parameters -f problem -f role my-data-file.csv > new-file.csv
Replace the values in one or more columns with opaque identifiers.
You specify the names of one or more columns (eg, student name
), and each unique value will be replaced with an anonymous identifier
(specifically, this is a short uuid built from the original value).
A file is also written out with the mapping of original values to hashed values.
Example:
./src/deidentify-columns.py -c student_name -c school -m mapping.csv my-data-file.csv > new-file.csv
All content is (c) The Concord Consortium and licensed under the MIT License.