Cyrille Rossant edited this page Sep 25, 2013 · 36 revisions

IMPORTANT NOTICE: this file format specification is not finalized yet

Decisions after e-mail discussions late August

  • KWA becomes the central file of the specification and needs a name change (probably KWIK)
  • It's a JSON file, containing ALL metadata, scientific and non-scientific, about all files, and all stages, and aesthetic information as well.
  • A data set is useless without this central file (like the XML in the Klusters file format).
  • There is no data duplication in the metadata, as there is no metadata whatsoever in any of the other files (KWX, KWD...)
  • The new file formats are saved in a subfolder by default. This subfolder will contain KWA, KWX, KWD.
  • The PRM, PRB files are necessary when creating the files at first, and do not reside in the subfolder.

Versions

  • File format version 1: mid-August, released in KlustaViewa 0.2.0 and KwikSkope 0.1.0
  • File format version 2: late September, work in progress

Overview

This file format deals with multichannel extracellular recordings. The recordings made during an experiment are saved in a small set of files that contain unprocessed (raw) and processed data as well as detected and sorted spikes. These files are handled by all programs of the suite: KwikKonvert, KwikSkope, SpikeDetekt, KlustaKwik and KlustaViewa.

The files are either binary HDF5 files, or text-based files written in a human-readable structured format.

Conversion tools from the previous file format (supported in NDManager, NeuroScope and Klusters) to the new format are provided.

File format specification

  • All data corresponding to a given experiment are stored in two kinds of HDF5-based files and a JSON file:

    • the KWIK file contains the metadata
    • the KWX file contains the spiking data
    • the KWD files contain the (un)filtered data
  • Non-scientific visualization-related metadata is also stored in the KWIK file (a JSON text-based file).

  • All files contain a version number in / (VERSION attribute), an integer currently equal to 2.

  • The input files the user provides to the programs to generate these data are:

    • the raw data coming out from the acquisition system, in any proprietary format (NS5, etc.)
    • processing parameters (PRM file) and description of the probe (PRB file)

Software-generated files

These files should not be edited in normal circumstances. They are generated by the programs as a function of the user inputs. All these files have a common filename prefix, e.g. myexperiment20130801.kwx, myexperiment20130801.raw.kwd, and so on.
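
As a minimal sketch of the shared-prefix convention, the helper below builds the file names expected for one data set. The function name `dataset_filenames` is hypothetical, and the lowercase `.kwik` extension is an assumption; only the `.kwx` and `.raw.kwd` names are given explicitly in this page.

```python
def dataset_filenames(prefix):
    """Return the file names expected for one data set (hypothetical helper)."""
    return {
        "kwik": prefix + ".kwik",          # ASSUMPTION: extension of the JSON metadata file
        "kwx": prefix + ".kwx",            # spiking data
        "raw.kwd": prefix + ".raw.kwd",    # raw data
        "high.kwd": prefix + ".high.kwd",  # high-pass filtered data
        "low.kwd": prefix + ".low.kwd",    # low-pass filtered data
    }


names = dataset_filenames("myexperiment20130801")
print(names["kwx"])
print(names["raw.kwd"])
```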

The KWX and KWD files are HDF5 files. They are handled by the Python programs through the PyTables library. They can be read by any HDF5/NetCDF4 library or viewer in any programming language.

Any HDF5 file contains a hierarchy of groups (folder-like) and datasets (file-like). A group can contain sub-groups and datasets (heterogeneous tables or multidimensional arrays). Any group can have attributes (metadata), which are simple key-value associations. Any group or dataset has a POSIX-like path, the root being /.

We describe here in detail the internal structure of the files.

KWX

The HDF5 KWX file contains all spiking information.

  • /shanks/shankX/ (X being the shank index, starting from 1): group with the spikes detected on that shank.

  • /shanks/shankX/spikes: table, one row = one spike, and the following columns:

    • time: UInt64, spike time, in number of samples (max ~ 10^19)
    • features: Float32(nfet,), a vector with the spike features, typically nfet=nchannels*fetdim+nextrafet, with fetdim the number of principal components per channel
    • masks: UInt8(nfet,), a vector with the masks, 0=totally masked, 255=unmasked
    • cluster_auto: UInt32, the cluster number (max ~ 4 × 10^9), obtained after the automatic clustering stage
    • cluster_manual: UInt32, the cluster number (max ~ 4 × 10^9), obtained after the manual stage
  • /shanks/shankX/waveforms: table, one row = one spike, and the following columns:

    • waveform_filtered: Int16(nsamples*nchannels,), a vector with the high-pass filtered spike waveform. Stride order: sample first, channel second.
    • waveform_unfiltered: Int16(nsamples*nchannels,), a vector with the raw spike waveform.
  • /shanks/shankX/clusters: table, one row = one cluster, and the following columns:

    • cluster: UInt32, the cluster number
    • group: UInt8, the cluster group (max = 255)
  • /shanks/shankX/groups_of_clusters: table, one row = one cluster group, and the following columns:

    • group: UInt8, the group number (convention in KlustaViewa: 0=Noise, 1=MUA, 2=Good, 3=Unsorted)
    • name: String(64), the group name
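
The vector sizes and encodings above can be sketched with a few small helpers. All function names are hypothetical. Two points are assumptions rather than part of the specification: that intermediate mask values scale linearly between 0 and 255, and that "sample first, channel second" means all channels of sample 0 are stored first, then all channels of sample 1, and so on.

```python
def n_features(nchannels, fetdim, nextrafet=0):
    """Length of the `features` and `masks` vectors of one spike."""
    return nchannels * fetdim + nextrafet


def mask_to_fraction(mask_byte):
    """Map a stored UInt8 mask to [0, 1] (ASSUMPTION: linear scaling)."""
    if not 0 <= mask_byte <= 255:
        raise ValueError("mask must fit in UInt8")
    return mask_byte / 255.0


def waveform_index(sample, channel, nchannels):
    """Position of (sample, channel) in the flat waveform vector.

    ASSUMPTION: the channels of one sample are contiguous.
    """
    return sample * nchannels + channel


print(n_features(32, 3, nextrafet=1))
print(mask_to_fraction(0), mask_to_fraction(255))
print(waveform_index(2, 5, nchannels=32))
```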

KWD

The HDF5 KWD files contain all non-spiking (raw or filtered) information.

  • .raw.KWD:

    • /data_raw: EArray(Int16, (duration*freq, nchannels)) with the raw data on all channels
  • .high.KWD:

    • /data_high: high-pass filtered data
  • .low.KWD:

    • /data_low: low-pass filtered data
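
To give a feel for the size of the raw array, here is a small sketch that computes the (duration*freq, nchannels) shape and the uncompressed size in bytes for Int16 samples. The helper names and the 60-second duration are illustrative assumptions.

```python
def raw_data_shape(duration_s, freq_hz, nchannels):
    """Shape of /data_raw: one row per sample, one column per channel."""
    nsamples = int(round(duration_s * freq_hz))
    return (nsamples, nchannels)


def raw_data_nbytes(shape, bytes_per_value=2):
    """Uncompressed size in bytes; Int16 values take 2 bytes each."""
    return shape[0] * shape[1] * bytes_per_value


shape = raw_data_shape(60, 20000., 32)  # 60 s at 20 kHz on 32 channels
print(shape)
print(raw_data_nbytes(shape))
```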

KWIK

This JSON text file contains all metadata related to the experiment, and aesthetic information about channel and cluster colors, the channel positions and scaling, etc.

{
    "channel_height": 0.25,
    "lastviewed_x": 123.4,

    "channels":
        [
             {"channel": 0, "group": 0, "name": "ch0"},
             {"channel": 1, "group": 0, "name": "ch1", "channel_height": 0.1},
             {"channel": 2, "group": 0, "name": "LFP", "color": 12},
             {"channel": 3, "group": 0},
             {"channel": 4, "group": 0},
             {"channel": 5, "group": 0},
             {"channel": 6, "group": 0},
             {"channel": 7, "group": 1, "name": "ch7", "ypos": 0.25},
             {"channel": 8, "group": 2, "name": "Sync Pulse", "visible": false}
        ],
        
            # ALTERNATIVELY: {"0": {"group": 0, ...}}, i.e. hash table instead of list, key=channel index
            
    "groups_of_channels":
        [
             {"group": 0, "name": "Hippocampus", "color": 14},
             {"group": 1, "name": "Prefrontal Cortex", "visible": false, "color": 13},
             {"group": 2, "name": "Auxiliary Data", "color": 2}
        ],

    "shanks": 
        {   
            "shank_index": 1,
            
            "clusters":
                [
                    {"cluster": 0, "color": 3},
                    {"cluster": 1, "color": 4}
                ],

            "groups_of_clusters":
                [
                     {"group": "0", "color": 1},
                     {"group": "1", "color": 2},
                     {"group": "2", "color": 3},
                     {"group": "3", "color": 4}
                ]
        }
}
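
The example suggests that a per-channel entry can override a file-wide value (channel 1 sets its own "channel_height"). A minimal sketch of such a lookup follows, using a shortened excerpt of the example. The fallback rule and the function name `channel_setting` are assumptions, not part of the specification.

```python
import json

# Shortened excerpt of the KWIK example.
kwik_text = """
{
    "channel_height": 0.25,
    "channels": [
        {"channel": 0, "group": 0, "name": "ch0"},
        {"channel": 1, "group": 0, "name": "ch1", "channel_height": 0.1}
    ]
}
"""


def channel_setting(kwik, channel_index, key, default=None):
    """Return the per-channel value of `key`, else the file-wide value.

    ASSUMPTION: per-channel entries take precedence over top-level ones.
    """
    for entry in kwik.get("channels", []):
        if entry["channel"] == channel_index and key in entry:
            return entry[key]
    return kwik.get(key, default)


kwik = json.loads(kwik_text)
print(channel_setting(kwik, 0, "channel_height"))  # falls back to the file-wide value
print(channel_setting(kwik, 1, "channel_height"))  # uses the per-channel override
```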

QUESTION: where do we put the probe graph + geometry? KWIK file?

User-provided files

These files are provided by the user to the programs, which use them to process the data and optionally integrate them into the program-generated files.

PRB

This JSON text file describes the probe used for the experiment: its geometry, its topology, and the dead channels.

{
    "dead_channels": [2, 6],
    "shanks": 
        [
            {
                "shank_index": 1,
                "channels": [0, 1, 2, 3],
                "graph": [[0, 1], [2, 3], ...],
                "geometry": {"0": [0.1, 0.2], "1": [0.3, 0.4], ...}
            },
            {
                "shank_index": 2,
                "channels": [4, 5, 6, 7],
                "graph": [[4, 5], [6, 7], ...],
                "geometry": {"4": [0.1, 0.2], "5": [0.3, 0.4], ...}
            }
        ]
}
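
As a sketch of how a program might consume this file, the code below loads a shortened version of the example (the elided entries are simply left out) and, for each shank, lists the channels and graph edges that remain once dead channels are removed. The function name `live_channels_and_edges` is hypothetical, and dropping every edge that touches a dead channel is an assumption about how dead channels should be handled.

```python
import json

# Shortened PRB example, without the elided "..." entries.
prb_text = """
{
    "dead_channels": [2, 6],
    "shanks": [
        {"shank_index": 1, "channels": [0, 1, 2, 3],
         "graph": [[0, 1], [2, 3]],
         "geometry": {"0": [0.1, 0.2], "1": [0.3, 0.4]}},
        {"shank_index": 2, "channels": [4, 5, 6, 7],
         "graph": [[4, 5], [6, 7]],
         "geometry": {"4": [0.1, 0.2], "5": [0.3, 0.4]}}
    ]
}
"""


def live_channels_and_edges(prb):
    """Map shank index -> (live channels, edges between live channels)."""
    dead = set(prb.get("dead_channels", []))
    result = {}
    for shank in prb["shanks"]:
        channels = [c for c in shank["channels"] if c not in dead]
        # ASSUMPTION: an edge is dropped if either endpoint is dead.
        edges = [(a, b) for a, b in shank["graph"]
                 if a not in dead and b not in dead]
        result[shank["shank_index"]] = (channels, edges)
    return result


probe = live_channels_and_edges(json.loads(prb_text))
print(probe[1])
print(probe[2])
```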

PRM

This text file (written in a tiny subset of Python) contains all parameters necessary for the programs to process, open and display the data. Each line is either a comment (starting with #) or a VARNAME = VALUE where VARNAME is the variable name, and VALUE is either a number, a string (within quotes), or a list of those. This structure ensures that it's easy to read/write this file programmatically.

This file is converted into JSON before being saved within the KWIK file.

EXPERIMENT_NAME = 'myexperiment'
RAW_DATA_FILES = ['n6mab041109blahblah1.ns5', 'n6mab041109blahblah2.ns5']
PRB_FILE = 'buzsaki32.probe'
NCHANNELS = 32
SAMPLING_FREQUENCY = 20000.
IGNORED_CHANNELS = [2, 5]
NBITS = 16
VOLTAGE_GAIN = 10.
WAVEFORMS_NSAMPLES = 20  # or a dictionary {shank: nsamples}
FETDIM = 3  # or a dictionary {shank: fetdim}
# ...

# SpikeDetekt parameters file
# ...
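
Because the PRM syntax is a subset of Python, one way to read it without executing arbitrary code is to parse it with the standard `ast` module and evaluate only literal values. The sketch below does this and then serializes the result to JSON. The function name `parse_prm` is hypothetical, and this is not necessarily how the programs of the suite read the file.

```python
import ast
import json


def parse_prm(text):
    """Parse `VARNAME = VALUE` lines into a dict without executing code."""
    params = {}
    for node in ast.parse(text).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError("unsupported PRM statement on line %d" % node.lineno)
        # literal_eval accepts only numbers, strings, lists, dicts, etc.
        params[node.targets[0].id] = ast.literal_eval(node.value)
    return params


prm_text = """
EXPERIMENT_NAME = 'myexperiment'
NCHANNELS = 32
SAMPLING_FREQUENCY = 20000.
IGNORED_CHANNELS = [2, 5]
WAVEFORMS_NSAMPLES = 20  # or a dictionary {shank: nsamples}
# ...
"""

params = parse_prm(prm_text)
print(json.dumps(params, sort_keys=True, indent=4))
```

Comments are discarded by the Python parser, so they do not need special handling. Anything other than a plain assignment of a literal is rejected.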

Relation with the old file formats

Where do the old files go?

  • CLU ==> KWX
  • RES ==> KWX
  • FET ==> KWX
  • SPK ==> KWX
  • DAT ==> .raw.KWD
  • FIL ==> .high.KWD
  • EEG ==> .low.KWD
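
The same mapping written as a lookup table, which a conversion tool could use to decide where the content of an old file goes. The names `OLD_TO_NEW` and `new_file_for` are hypothetical.

```python
# Old file type -> new file that receives its content.
OLD_TO_NEW = {
    "CLU": ".KWX",
    "RES": ".KWX",
    "FET": ".KWX",
    "SPK": ".KWX",
    "DAT": ".raw.KWD",
    "FIL": ".high.KWD",
    "EEG": ".low.KWD",
}


def new_file_for(old_type):
    """Return the new file suffix for an old file type such as 'dat' or '.DAT'."""
    return OLD_TO_NEW[old_type.strip(".").upper()]


print(new_file_for("dat"))
print(new_file_for(".CLU"))
```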

Conversion

KwikKonvert

The KwikKonvert tool converts from the old formats to the new ones.

  • With a PRM file:

    # Write a .raw.KWD file.
    kwikkonvert myexperiment.PRM
    
  • With command-line arguments (low priority):

    # Write a .raw.KWD file.
    kwikkonvert mydatablah1.ns5 mydatablah2.ns5 [--name mydata] [--probe myprobe.probe] [--x myxml.xml] [--nchannels 32] [--freq 20000] [--nbits 16] [--ignore-channels 1,2,3]
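
The command line above could be parsed along the following lines. This is only a sketch built with the standard `argparse` module from the flags listed in the usage line; it is not the actual KwikKonvert implementation, and the destination names and helper functions are assumptions.

```python
import argparse


def parse_channel_list(text):
    """Turn '1,2,3' into [1, 2, 3]."""
    return [int(item) for item in text.split(",") if item]


def build_parser():
    parser = argparse.ArgumentParser(prog="kwikkonvert")
    parser.add_argument("raw_files", nargs="+", help="raw data files, e.g. NS5")
    parser.add_argument("--name")
    parser.add_argument("--probe")
    parser.add_argument("--x", dest="xml")
    parser.add_argument("--nchannels", type=int)
    parser.add_argument("--freq", type=float)
    parser.add_argument("--nbits", type=int)
    parser.add_argument("--ignore-channels", type=parse_channel_list, default=[])
    return parser


args = build_parser().parse_args([
    "mydatablah1.ns5", "mydatablah2.ns5",
    "--nchannels", "32", "--freq", "20000", "--ignore-channels", "1,2,3",
])
print(args.raw_files, args.nchannels, args.freq, args.ignore_channels)
```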
    

KlustaViewa

In KlustaViewa, the user can select a set of files in the old format: in this case, they will be converted into the new file format. The old .probe file is then mandatory for the creation of the HDF5 file.

Export

Some programs may need to export the data in old or open formats to simplify subsequent analysis.

KwikSkope

Automatically save in:

  • .KWIK: aesthetic information about channel groups and colors
  • .PROBE: dead channels (or in KWIK??)

SpikeDetekt

Automatically save in:

  • .KWX: spikes and trivial clusters (cluster = 2 for all spikes)
  • .low.KWD, .high.KWD: filtered data
  • .RES: spike times

KlustaKwik

  • .KWX: load spikes, save cluster_auto

KlustaViewa

Automatically save in:

  • .KWX: clusters, group of each cluster, group names
  • .KWIK: aesthetic information, cluster colors, group colors
  • .CLU: clusters, with noisy clusters merged in clusters 0 and 1. WARNING: there shouldn't be good clusters 0 and 1!
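
The CLU export rule can be sketched as follows. It assumes that clusters in the Noise group (0) are merged into cluster 0 and clusters in the MUA group (1) into cluster 1; the page only says that noisy clusters are merged into clusters 0 and 1, so this split is an assumption, as is the function name `clu_export`.

```python
NOISE_GROUP, MUA_GROUP = 0, 1  # group numbers from the KlustaViewa convention


def clu_export(spike_clusters, cluster_groups):
    """Return the per-spike cluster numbers to write in a CLU file.

    spike_clusters: cluster number of each spike.
    cluster_groups: dict mapping cluster number -> group number.
    """
    exported = []
    for cluster in spike_clusters:
        group = cluster_groups[cluster]
        if group == NOISE_GROUP:
            exported.append(0)  # ASSUMPTION: noise goes to cluster 0
        elif group == MUA_GROUP:
            exported.append(1)  # ASSUMPTION: MUA goes to cluster 1
        elif cluster in (0, 1):
            # Clusters 0 and 1 are reserved for the merged noisy clusters.
            raise ValueError("cluster %d is not noisy but uses a reserved number" % cluster)
        else:
            exported.append(cluster)
    return exported


spike_clusters = [5, 7, 9, 5, 12]
cluster_groups = {5: 2, 7: 0, 9: 1, 12: 0}
exported = clu_export(spike_clusters, cluster_groups)
print(exported)
```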

Workflow

The idea is that the PRM file contains all information required for the programs to run correctly.

  • Step 1: create a PRM text file, with the input files (in NS5 or anything), the probe files, various parameters...
  • Step 2: run KwikKonvert to create a .KWD file.
  • Step 3: run KwikSkope to view the raw data, and tag bad channels. The program requires the PRM file to know where to save the dead channels (in the PROBE file specified in the PRM file).
  • Step 4: run SpikeDetekt (using the PRM file, linking to the PROBE file and the KWD files).
  • Step 5: run KlustaKwik.
  • Step 6: run KlustaViewa.