Data access and format

Data access

We share our data through two services: our public Globus.org endpoint and our webshare: buzsakilab.nyumc.org. Each public session in the databank has a direct link to the dataset via our Globus endpoint and our webshare. A subset of the datasets is also available at CRCNS.org. If you have an interest in a dataset that is not listed or is lacking information, please contact us at buzsaki-databank@googlegroups.com.

Data path and naming convention

The databank uses one main path for each session, called the basepath. Each session is further identified by a sessionName, also referred to as the basename. The basepath should contain the raw ephys data. The data in the basepath follow this naming convention: sessionName.*, e.g. sessionName.dat (raw ephys data) and sessionName.lfp (low-pass filtered and down-sampled ephys data). The clustered data (e.g. the files created by KiloSort) can be located either in the basepath or in a subdirectory relative to it, defined in session.spikeSorting.relativePath. All files generated by CellExplorer are saved to the basepath as well.
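
As a hedged illustration of the naming convention above, the following Python sketch builds the expected file locations for one session. The basepath, the session name, and the helpers session_file and clustering_path are hypothetical and not part of CellExplorer.

```python
from pathlib import PurePosixPath

def session_file(basepath, basename, suffix):
    """Return basepath/basename.suffix, following the sessionName.* convention."""
    return PurePosixPath(basepath) / f"{basename}.{suffix}"

def clustering_path(basepath, relative_path=""):
    """Folder with the clustered data; an empty relative path means the basepath itself."""
    return PurePosixPath(basepath) / relative_path

# Hypothetical session
basepath = "/data/rat01/rat01_session1"
basename = "rat01_session1"

raw_file = session_file(basepath, basename, "dat")              # raw ephys data
lfp_file = session_file(basepath, basename, "lfp")              # low-pass filtered, down-sampled data
session_mat = session_file(basepath, basename, "session.mat")   # session metadata
sorting_dir = clustering_path(basepath, "Kilosort_output")      # hypothetical relativePath

print(raw_file)
print(sorting_dir)
```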

Data structures

Each type of data is saved in its own MATLAB structure, and a subset of the structures is inherited from buzcode. Please see the list of data containers in the Data containers section. CellExplorer is fully compatible with the buzcode toolbox repository.

Session metadata

A MATLAB struct session stored in a .mat file: sessionName.session.mat. The session struct contains all session-level metadata. It can be generated using sessionTemplate.m and inspected with gui_session.m. The sessionName.session.mat file should be stored in the basepath. The struct is organized as defined below:

general
- name : name of the session
- investigator : investigator of the session
- projects : projects the session belongs to
- date : the date the session was recorded
- time : start time of the session
- location : location where the session took place
- experimenters : who performed the experiments
- duration : the total duration of the session (seconds)
- sessionType : type of session (chronic/acute)
- notes : any notes
animal
- name : name of the animal
- sex : sex of the animal
- species : species of animal
- strain : the strain of animal
- geneticLine : genetic line of animal
epochs
- name
- behavioralParadigm
- builtMaze
- mazeType
- manipulations
- startTime
- stopTime
extracellular
- equipment : the hardware used to acquire the data
- fileFormat : format of the raw data
- sr : sampling rate
- nChannels : number of channels
- nSamples : number of samples
- nElectrodeGroups : number of electrode groups
- electrodeGroups (struct) : struct with the definition of electrode groups (1-indexed)
- nSpikeGroups : number of spike groups
- spikeGroups (struct) : struct with the definition of spike groups (1-indexed)
- precision : e.g. signed int16.
- leastSignificantBit : range/precision in µV. Intan system: 0.195 µV/bit
- srLFP : sampling rate of the LFP file
- electrode : struct with implanted electrodes
  - siliconProbes : name of the probe
  - company : company producing the probe
  - nChannels : number of channels
  - nShanks : number of shanks
  - AP_coordinates : Anterior-Posterior coordinates (mm)
  - ML_coordinates : Medial-Lateral coordinates (mm)
  - depth : implant depth (mm)
  - brainRegions : implant brain region acronym (Allen Institute Atlas)
brainRegions
- regionAcronym : e.g. CA1 or HIP, Allen Institute Atlas
  - brainRegion
  - channels : list of channels
  - electrodeGroups : list of electrode groups
channelTags
- tagName (e.g. Theta, Cortical, Ripple, Bad)
  - channels : list of channels (1-indexed)
  - electrodeGroups : list of electrode groups (1-indexed)
behavioralTracking
- equipment : the hardware used to acquire the data
- filenames : file names containing the tracking
- framerate : frame rate of the tracking
- notes
inputs
- inputTag : unique name, e.g. temperature, stimPulses, OptitrackTTL
  - equipment : the hardware used to acquire the data
  - inputType : adc, aux, dat, dig …
  - channels : list of channels (1-indexed)
  - description
analysisTags
- tagName: the numeric or string values saved in the tag
spikeSorting
- method : KiloSort, KiloSort2, SpyKING CIRCUS, Klustakwik, MaskedKlustakwik, MountainSort, IronClust, MClust, UltraMegaSort2000
- format : Phy, KiloSort, SpyKING CIRCUS, Klustakwik, KlustaViewa, Neurosuite, MountainSort, IronClust, ALF, AllenSDK, MClust, UltraMegaSort2000
- relativePath : relative to basepath
- channels : list of channels selected.
- spikeSorter : person who performed the manual spike sorting
- notes
- cellMetrics : (boolean) whether the cell metrics have been computed
- manuallyCurated : (boolean) if manual curation has been performed
timeseries
- typeTag : unique type (adc, aux, dat, dig …)
  - fileName : file name
  - precision : e.g. int16
  - nChannels : number of channels
  - sr : sampling rate
  - nSamples : number of samples
  - leastSignificantBit : range/precision in µV. Intan system: 0.195 µV/bit
  - equipment : the hardware used to acquire the data
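
To make the layout above concrete, here is a minimal Python sketch that mirrors a few of the listed fields as a nested dict and derives two quantities from them. All values are hypothetical. It assumes that the recording length follows from nSamples and sr, and that leastSignificantBit is the µV-per-bit scale factor for raw integer samples.

```python
# Hypothetical values; only a few of the fields listed above are shown.
session = {
    "general": {"name": "rat01_session1", "sessionType": "chronic"},
    "animal": {"name": "rat01", "species": "rat"},
    "extracellular": {
        "sr": 20000,                   # sampling rate (Hz)
        "nChannels": 64,
        "nSamples": 72_000_000,
        "precision": "int16",
        "leastSignificantBit": 0.195,  # µV per bit (Intan system)
        "srLFP": 1250,                 # sampling rate of the LFP file (Hz)
    },
}

ext = session["extracellular"]

# Recording length in seconds, derived from sample count and sampling rate.
duration_s = ext["nSamples"] / ext["sr"]

def to_microvolts(raw_value, lsb=ext["leastSignificantBit"]):
    """Scale a raw integer sample to microvolts (assumed linear scaling)."""
    return raw_value * lsb

print(duration_s)
print(to_microvolts(1000))
```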

Spikes

A MATLAB struct spikes stored in a .mat file: sessionName.spikes.cellinfo.mat. It can be generated with loadSpikes.m. The processing module ProcessCellMetrics.m uses loadSpikes.m to automatically load spike data from KiloSort, Phy, or Neurosuite and save it to a spikes struct. sessionName.spikes.cellinfo.mat is saved to the basepath. The struct has the following fields:

ts: a 1xN cell-struct for N units each containing a 1xM vector with M spike events in samples.
times: a 1xN cell-struct for N units each containing a 1xM vector with M spike events in seconds.
cluID: a 1xN vector with inherited IDs from the applied clustering algorithm.
UID: a 1xN vector with values 1:N.
shankID: a 1xN vector containing the corresponding shank/electrode group for each unit (1-indexed).
maxWaveformCh: a 1xN vector with the channel for the maximum waveform for the units (0-indexed)
maxWaveformCh1: a 1xN vector with the channel for the maximum waveform for the units (1-indexed)
total: a 1xN vector with the total number of spikes for each unit.
peakVoltage: a 1xN vector with spike waveform amplitude (µV).
filtWaveform: a 1xN cell-struct with spike waveforms from maxWaveformCh (µV).
filtWaveform_std: a 1xN cell-struct with the std of the spike waveforms (µV).
rawWaveform: a 1xN cell-struct with raw spike waveforms (µV).
rawWaveform_std: a 1xN cell-struct with std of the raw spike waveforms (µV).
timeWaveform: a 1xN cell-struct with spike timestamps for the waveforms (ms).
maxWaveform_all: a 1xN vector with channel indexes for the _all waveforms for the units (1-indexed)
rawWaveform_all: a 1xN cell-struct with raw spike waveforms from maxWaveform_all (µV).
filtWaveform_all: a 1xN cell-struct with filtered spike waveforms from maxWaveform_all (µV).
timeWaveform_all: a 1xN cell-struct with spike timestamps for the _all waveforms (ms).
numcells: the number of cells.
sessionName: name of the session (string).
spindices: a Kx2 matrix where the first column contains the K spike times for all units and the second column contains the unit index for each spike.
processinginfo: a substruct with information about how the spikes were generated including the name of the function, version, date, and the parameters.

Any extra field can be added with info about the units, e.g. the theta phase of each spike for the units, or the position/speed of the animal for each spike.
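
The relationships between several of these fields can be sketched in Python as follows. The spike data and sampling rate are hypothetical. Using the UID as the unit index in spindices and sorting its rows by time are assumptions made for this illustration.

```python
sr = 20000  # hypothetical sampling rate (Hz), as in session.extracellular.sr

# Spike events in samples for N = 2 hypothetical units.
ts = [[2000, 30000, 50000], [10000, 40000]]

# times holds the same events in seconds.
times = [[sample / sr for sample in unit] for unit in ts]

# total is the spike count per unit; UID runs from 1 to N.
total = [len(unit) for unit in ts]
uid = list(range(1, len(ts) + 1))

# maxWaveformCh is 0-indexed, maxWaveformCh1 is the same channel 1-indexed.
max_waveform_ch = [12, 40]
max_waveform_ch1 = [ch + 1 for ch in max_waveform_ch]

# spindices: one row per spike, [spike time, unit index], sorted by time here.
spindices = sorted(
    [t, unit_id] for unit_id, unit_times in zip(uid, times) for t in unit_times
)
print(spindices)
```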

Cell metrics

The cell metrics are kept in a cell_metrics struct as described here. The cell metrics are stored in: sessionName.cell_metrics.cellinfo.mat.

Firing-rate maps

This is a data container for firing-rate map data. A MATLAB struct ratemap containing 1D or linearized firing rate maps, stored in a .mat file: sessionName.ratemap.firingRateMap.mat. The firing rate maps have the following fields:

map: a 1xN cell-struct for N units each containing a KxL matrix, where K corresponds to the bin count and L to the number of states. States can be trials, manipulation states, left-right states, etc.
x_bins: a 1xK vector with K bin values used to generate the firing rate map.
state_labels: a 1xL vector with char labels describing the states.
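
The KxL layout of map can be illustrated with a small Python sketch that bins the spikes of one unit by position and state and divides by occupancy. This is only one possible way to compute such a map, not necessarily the one used by the pipeline. The function rate_map, the treatment of x_bins as bin centers, and all data values are assumptions.

```python
def rate_map(spike_pos, spike_state, occupancy, x_bins, n_states):
    """Return a K x L nested list of firing rates (Hz) for one unit.

    spike_pos, spike_state: linearized position and state index of each spike.
    occupancy: K x L nested list with the time (s) spent in each bin and state.
    x_bins: K bin centers; each spike is assigned to the nearest center.
    """
    k = len(x_bins)
    counts = [[0] * n_states for _ in range(k)]
    for pos, state in zip(spike_pos, spike_state):
        nearest = min(range(k), key=lambda i: abs(x_bins[i] - pos))
        counts[nearest][state] += 1
    # Bins that were never visited get NaN instead of a rate.
    return [
        [counts[i][j] / occupancy[i][j] if occupancy[i][j] > 0 else float("nan")
         for j in range(n_states)]
        for i in range(k)
    ]

x_bins = [10, 30, 50]                               # K = 3 bins
state_labels = ["left", "right"]                    # L = 2 states
occupancy = [[2.0, 2.0], [1.0, 0.5], [0.0, 4.0]]    # seconds per bin and state

unit_map = rate_map([12, 28, 31, 49], [0, 0, 1, 1], occupancy, x_bins, len(state_labels))
ratemap = {"map": [unit_map], "x_bins": x_bins, "state_labels": state_labels}
print(ratemap["map"][0])
```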

Events

This is a data container for event data. A MATLAB struct eventName stored in a .mat file: sessionName.eventName.events.mat with the following fields:

timestamps: Px2 matrix with intervals for the P events in seconds.
peaks: Event time for the peak of each event in seconds (Px1).
amplitude: amplitude of each event (Px1).
amplitudeUnits: specify the units of the amplitude vector.
eventID: numeric ID for classifying various event types (Px1).
eventIDlabels: cell array with labels for classifying various event types defined in eventID (cell array, Px1).
eventIDbinary: boolean specifying if eventID should be read as binary values (default: false).
center: center time-point of event (in seconds; calculated from timestamps; Px1).
duration: duration of event (in seconds; calculated from timestamps; Px1).
detectorinfo: info about how the events were detected.

The *.events.mat files should be stored in the basepath. Any events files located in the basepath will be detected in the pipeline ProcessCellMetrics.m and an average PSTH will be generated.
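
Since center and duration are calculated from timestamps, a short Python sketch may help. It assumes that center is the midpoint of each interval and duration its length. The event intervals and labels are hypothetical.

```python
# Hypothetical events: P = 3 intervals [start, stop] in seconds.
timestamps = [[1.00, 1.25], [4.50, 4.75], [9.00, 9.50]]

# Midpoint and length of each interval.
center = [(start + stop) / 2 for start, stop in timestamps]
duration = [stop - start for start, stop in timestamps]

events = {
    "timestamps": timestamps,
    "center": center,
    "duration": duration,
    "eventID": [1, 1, 2],
    "eventIDlabels": ["spontaneous", "spontaneous", "evoked"],
}
print(events["center"])
print(events["duration"])
```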

Manipulations

This is a data container for manipulation data. A MATLAB struct manipulationName stored in a .mat file: sessionName.manipulationName.manipulation.mat with the following fields:

timestamps: Px2 matrix with intervals for the P events in seconds.
peaks: Event time for the peak of each event in seconds (Px1).
amplitude: amplitude of each event (Px1).
amplitudeUnits: specify the units of the amplitude vector.
eventID: numeric ID for classifying various event types (Px1).
eventIDlabels: cell array with labels for classifying various event types defined in eventID (cell array, Px1).
eventIDbinary: boolean specifying if eventID should be read as binary values (default: false).
center: center time-point of event (in seconds; calculated from timestamps; Px1).
duration: duration of event (in seconds; calculated from timestamps; Px1).
detectorinfo: info about how the events were detected.

The *.manipulation.mat files should be stored in the basepath. Any manipulation files located in the basepath will be detected in the pipeline (ProcessCellMetrics.m) and an average PSTH will be generated. Events and manipulation files are similar in content, but only manipulation intervals are excluded in the pipeline.
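
Excluding manipulation intervals can be pictured as dropping every spike that falls inside one of the intervals in timestamps. The Python sketch below shows the idea. The function outside_intervals and the data are hypothetical, and treating the interval bounds as inclusive is an assumption rather than documented pipeline behavior.

```python
def outside_intervals(spike_times, intervals):
    """Keep spike times (s) that fall outside every [start, stop] interval."""
    return [
        t for t in spike_times
        if not any(start <= t <= stop for start, stop in intervals)
    ]

manipulation_timestamps = [[10.0, 12.0], [30.0, 31.5]]  # hypothetical intervals (s)
spike_times = [1.2, 10.5, 11.9, 15.0, 30.0, 40.3]

print(outside_intervals(spike_times, manipulation_timestamps))  # [1.2, 15.0, 40.3]
```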

Channels

This is a data container for channel-wise data. A MATLAB struct ChannelName stored in a .mat file: sessionName.ChannelName.channelinfo.mat with the following optional fields:

channel: a 1xQ vector containing a list of Q channel indexes (0-indexed).
channelClass: a 1xQ cell with classification assigned to each channel (char).
processinginfo: a struct with information about how the mat file was generated including the name of the function, version, date, and parameters.
detectorinfo: If the channelinfo struct is based on determined events, detectorinfo contains info about how the event was processed.

The *.channelinfo.mat files should be stored in the basepath.

Channel coordinates (chanCoords): a struct describing the probe layout, with the x and y position of each recording channel, saved to sessionName.chanCoords.channelinfo.mat with the following fields:

x : x position of each channel (µm).
y : y position of each channel (µm).

This works as a simple 2D representation of recordings and will help you determine the location of your neurons. It is also used to determine the spike amplitude length constant of the spike waveforms across channels.
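
As a small illustration of working with this layout, the Python sketch below computes the distance from a reference channel (for example a unit's peak channel) to every other channel. Distances like these are the kind of input a fit of spike amplitude against distance would need. The probe layout and the helper channel_distances are hypothetical.

```python
import math

# Hypothetical layout: four channels on one shank with 20 µm vertical spacing.
chan_coords = {"x": [0.0, 0.0, 0.0, 0.0], "y": [0.0, -20.0, -40.0, -60.0]}

def channel_distances(coords, ref_channel):
    """Distance (µm) from ref_channel (1-indexed) to every channel."""
    x0 = coords["x"][ref_channel - 1]
    y0 = coords["y"][ref_channel - 1]
    return [math.hypot(x - x0, y - y0) for x, y in zip(coords["x"], coords["y"])]

print(channel_distances(chan_coords, ref_channel=2))  # [20.0, 0.0, 20.0, 40.0]
```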

Allen Institute’s Common Coordinate Framework (ccf): the position of each recording channel in the Allen Institute’s Common Coordinate Framework, saved to sessionName.ccf.channelinfo.mat with the following fields:

ap : the anterior-posterior position of each channel (µm).
dv : the dorsal-ventral position of each channel (µm).
lr : the left-right position of each channel (µm).

The Allen Institute’s Common Coordinate Framework allows you to visualize your cells in a standardized mouse atlas.

Time series

This is a data container for other time-series data (check other containers for specific formats like intracellular). A MATLAB struct timeserieName stored in a .mat file: sessionName.timeserieName.timeseries.mat with the following fields:

data : a [nSamples x nChannels] matrix with time-series data.
timestamps : a [nSamples x 1] vector with timestamps.
precision : e.g. int16.
units : e.g. mV.
nChannels : number of channels.
channelNames : struct with names of channels.
sr : sampling rate.
nSamples : number of samples.
leastSignificantBit : range/precision in µV. Intan system: 0.195 µV/bit.
equipment : hardware used to acquire the data.
notes : Human-readable notes about this time series data.
description : Description of this time series data.
processinginfo : a struct with information about how the .mat file was generated including the name of the function, version, date, source file, and parameters.
- sourceFileName : filename.

Any other field can be added to the struct containing time series data. The *.timeseries.mat files should be stored in the basepath.
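
The dimensions of data, timestamps, nSamples, and nChannels have to agree with each other. The Python sketch below builds a small hypothetical time series and checks this. The helper validate_timeseries is not part of CellExplorer, and generating timestamps as sample index divided by sr assumes the series starts at t = 0.

```python
def validate_timeseries(ts):
    """Return a list of inconsistencies between the fields of a time-series dict."""
    problems = []
    if len(ts["data"]) != ts["nSamples"]:
        problems.append("data rows != nSamples")
    if any(len(row) != ts["nChannels"] for row in ts["data"]):
        problems.append("data columns != nChannels")
    if len(ts["timestamps"]) != ts["nSamples"]:
        problems.append("timestamps length != nSamples")
    return problems

sr = 4          # hypothetical sampling rate (Hz)
n_samples = 8

timeseries = {
    "data": [[20.0 + i, 0.5 * i] for i in range(n_samples)],  # nSamples x nChannels
    "timestamps": [i / sr for i in range(n_samples)],          # assumes start at t = 0
    "sr": sr,
    "nSamples": n_samples,
    "nChannels": 2,
    "units": "mV",
}
print(validate_timeseries(timeseries))  # []
```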

States

This is a data container for brain states data. A MATLAB struct states stored in a .mat file: sessionName.statesName.states.mat. States can contain multiple temporal states defined by intervals, e.g. sleep/wake states (awake/nonREM/REM) and cortical states (Up/Down). It has the following fields:

ints: a struct containing intervals (start and stop times) for each state (required).
- .stateName: start/stop time for each instance of state stateName (required).
processinginfo: a struct with information about how the .mat file was generated including the name of the function, version, date and parameters.
detectorinfo: a struct with information about how the states were detected.

Optional fields

idx: a struct containing timestamps for each state.
- .states: a [t x 1] vector giving the state at each point in time (t: number of timestamps).
- .timestamps: a [t x 1] vector with timestamps.
- .statenames: {Nstates} cell array for the name of each state.

Any other field can be added to the struct containing states data. The *.states.mat files should be stored in the basepath.
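
The two representations, ints (intervals) and idx (a state value per timestamp), carry the same information. The Python sketch below converts from idx to ints. The function idx_to_ints and the state names are hypothetical, and the convention that 0 means "no state" and that state numbers index into statenames starting at 1 is an assumption.

```python
def idx_to_ints(states, timestamps, statenames):
    """Convert a per-timestamp state vector into [start, stop] intervals per state.

    states: state number at each timestamp (1-indexed into statenames; 0 = no state).
    Each interval runs from the first to the last timestamp of a run of equal states.
    """
    ints = {name: [] for name in statenames}
    run_start = 0
    for i in range(1, len(states) + 1):
        # Close the current run at the end of the vector or when the state changes.
        if i == len(states) or states[i] != states[run_start]:
            if states[run_start] > 0:
                name = statenames[states[run_start] - 1]
                ints[name].append([timestamps[run_start], timestamps[i - 1]])
            run_start = i
    return ints

idx = {
    "states": [1, 1, 1, 2, 2, 0, 3, 3],
    "timestamps": [0, 1, 2, 3, 4, 5, 6, 7],
    "statenames": ["WAKE", "NREM", "REM"],
}
print(idx_to_ints(idx["states"], idx["timestamps"], idx["statenames"]))
```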

Behavior

This is a data container for behavioral tracking data. A MATLAB struct behaviorName stored in a .mat file: sessionName.behaviorName.behavior.mat with the following fields:

timestamps: an array of timestamps that match the data subfields (in seconds).
sr: sampling rate (Hz).
SpatialSeries: several options as defined below, each with optional subfields:
- units: defines the units of the data.
- resolution: The smallest meaningful difference (in specified unit) between values in data.
- referenceFrame: description defining what the zero-position is.
- coordinateSystem: position: cartesian [default] or polar. orientation: Euler or quaternion [default].
position: spatial position defined by x, x/y, or x/y/z axes (default units: meters).
speed: a 1D representation of the running speed (cm/s).
orientation: .x, .y, .z, and .w (default units: radians)
pupil: pupil-tracking data: .x, .y, .diameter.
linearized: a projection of spatial parameters into a 1-dimensional representation:
- position: a 1D linearized version of the position data.
- speed: behavioral speed of the linearized behavior.
- acceleration: behavioral acceleration of the linearized behavior.
events: behaviorally derived events, e.g. when an animal passes a specific position or consumes a reward.
epochs: behaviorally derived epochs.
trials: behavioral trials defined as intervals or continuous vector with numeric trial numbers.
states: e.g. spatially defined regions like the central arm or waiting area in a maze. Can be binary or numeric.
stateNames: names of the states.
timeSeries: can contain any derived time traces projected into the behavioral timestamps e.g. temperature, oscillation frequency, power, etc.
notes: Human-readable notes about this TimeSeries dataset.
description: Description of this TimeSeries dataset.
processinginfo: a struct with information about how the .mat file was generated, including the name of the function, version, date, and parameters.

Any other field can be added to the struct containing behavior data. The *.behavior.mat files should be stored in the basepath.
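
As an example of deriving one field from others, the Python sketch below computes a running speed in cm/s from x/y positions given in meters. The function running_speed and the tracking data are hypothetical. It uses a plain sample-to-sample difference without smoothing, which is a simplification, and returns one value fewer than there are timestamps.

```python
import math

def running_speed(x, y, timestamps):
    """Speed in cm/s between consecutive samples; x and y in meters, timestamps in seconds."""
    speed = []
    for i in range(1, len(timestamps)):
        dist_m = math.hypot(x[i] - x[i - 1], y[i] - y[i - 1])
        speed.append(100.0 * dist_m / (timestamps[i] - timestamps[i - 1]))
    return speed

behavior = {
    "timestamps": [0.0, 0.5, 1.0, 1.5],
    "sr": 2,  # Hz
    "position": {
        "x": [0.0, 0.375, 0.375, 0.0],
        "y": [0.0, 0.5, 0.5, 0.0],
        "units": "meters",
    },
}

pos = behavior["position"]
print(running_speed(pos["x"], pos["y"], behavior["timestamps"]))
```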

Trials

A MATLAB struct trials stored in a .mat file: sessionName.trials.behavior.mat. The trials struct is a special behavior struct centered around behavioral trials. trials has the following fields:

start: trial start times in seconds.
end: trial end times in seconds.
nTrials: the number of trials.
states: e.g. spatially defined regions like a central arm or waiting area in a maze, stimulation trials, error trials. Must be binary or numeric.
stateNames: names of the states.
timeSeries: can contain any derived time traces averaged per trial, e.g. temperature. Use nan values for undefined trials.
processinginfo: a struct with information about how the .mat file was generated including the name of the function, version, date, and parameters.

Any other field can be added to the struct containing trial-specified data. The trials.behavior.mat files should be stored in the basepath. Trial-wise data should live in this container, while trial-intervals can be stored in other behavior structs.
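
Trials can be expressed either as intervals (start and end) or as a continuous vector of trial numbers. The Python sketch below converts the interval form into the vector form for a given set of timestamps. The function trial_numbers and the data are hypothetical, and marking samples outside all trials with NaN is an assumption.

```python
import math

def trial_numbers(timestamps, start, end):
    """Return the 1-indexed trial number at each timestamp, or NaN outside all trials."""
    numbers = []
    for t in timestamps:
        number = math.nan
        for k, (t0, t1) in enumerate(zip(start, end), start=1):
            if t0 <= t <= t1:
                number = k
                break
        numbers.append(number)
    return numbers

trials = {"start": [1.0, 5.0], "end": [3.0, 8.0], "nTrials": 2}
timestamps = [0.0, 1.0, 2.0, 4.0, 6.0, 9.0]

print(trial_numbers(timestamps, trials["start"], trials["end"]))
```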

Intracellular time series

This is a data container for intracellular recordings. A MATLAB struct intracellularName containing intracellular data, stored in a .mat file: sessionName.intracellularName.intracellular.mat. It contains the fields inherited from the time series container, plus intracellular-specific fields:

data : a [nSamples x nChannels] matrix with time-series data.
timestamps : a [nSamples x 1] vector with timestamps.
precision : e.g. int16.
units : e.g. mV.
nChannels : the number of channels.
channelNames : struct with names of channels.
sr : sampling rate.
nSamples : the number of samples.
leastSignificantBit : range/precision in µV. Intan system: 0.195 µV/bit.
equipment : the hardware used to acquire the data.
notes : Human-readable notes about this time series data.
description : Description of this time series data.
intracellular : Intracellular specific fields
- clamping : clamping method: current, voltage.
- type : recording type (Patch or Sharp).
- solution : Glass pipette solution (string describing the solution).
- bridgeBalance : bridge balance (M ohm).
- seriesResistance : Series resistance (M ohm).
- inputResistance : Input resistance (M ohm).
- membraneCaparitance : Membrane capacitance.
- electrodeMaterial : glass, tungsten, etc.
- groundElectrode : Description of the ground electrode.
- referenceElectrode : Description of the reference electrode.
processinginfo : a struct with information about how the .mat file was generated including:
- function : which function was used to generate the struct.
- version : the version of the function, if any.
- date : date of processing.
- parameters : input parameters when the struct was created.
- sourceFileName : filename of the original source file.

Any other field can be added to the struct containing intracellular time series data. The *.intracellular.mat files should be stored in the basepath.

Data containers

The data is organized into data-type specific containers, a concept introduced by buzcode:

sessionName.session.mat: session-level metadata.
sessionName.*.lfp.mat: derived ephys signals including theta-band filtered lfp.
sessionName.*.cellinfo.mat: spike-derived data, including spikes, cell_metrics, and mono_res.
sessionName.*.firingRateMap.mat: firing rate maps. Derived from behavior and spikes, e.g. ratemap.
sessionName.*.events.mat: events data, including ripples and SWR.
sessionName.*.manipulation.mat: manipulation data.
sessionName.*.channelinfo.mat: channel-wise data, including impedance.
sessionName.*.timeseries.mat: other time-series data.
sessionName.*.behavior.mat: behavior data, including position tracking.
sessionName.*.states.mat: brain states derived data including SWS/REM/awake and up/down states.
sessionName.*.intracellular.mat: intracellular data.
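
Because every container follows the same file-name pattern, the data name and container type can be recovered from a file name once the basename is known. The Python sketch below shows one way to do this. The function parse_container_file and the file names are hypothetical.

```python
def parse_container_file(filename, basename):
    """Split 'basename[.name].container.mat' into (name, container).

    name is None for files without a name part, e.g. basename.session.mat.
    """
    prefix, suffix = basename + ".", ".mat"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"{filename!r} does not belong to session {basename!r}")
    middle = filename[len(prefix):-len(suffix)]
    # The last dot-separated part is the container type, the rest is the data name.
    name, _, container = middle.rpartition(".")
    return (name or None, container)

basename = "rat01_session1"  # hypothetical session name
print(parse_container_file("rat01_session1.ripples.events.mat", basename))   # ('ripples', 'events')
print(parse_container_file("rat01_session1.spikes.cellinfo.mat", basename))  # ('spikes', 'cellinfo')
print(parse_container_file("rat01_session1.session.mat", basename))          # (None, 'session')
```

Passing the basename explicitly avoids guessing where the session name ends when it contains dots itself.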