Why deconvolution
Inscopix provides image analysis software, Mosaic, which is pretty good at motion correction, and identifying ROIs. For Mosaic's signal detection, it identifies events using the simple criterion:
ΔF / F > F0 + 3SDThis works adequately for low frequency events on a low background, but completely ignores the magnitude of events, or what happens when events occur in quick succession. While I've done imaging before, I've never done deconvolution, so I thought it would be fun to try it out.
To get sample GCaMP6 imaging data, I turned to the CRCNS, which has a bunch of neuroscience datasets. The most relevant dataset was from the original GCaMP6 paper wherein they recorded calcium fluorescence in parallel with loose seal electrophysiology, to allow comparisons between calcium and spikes. For this example, I loaded the processed data from the GCaMP6s imaging dataset. If you don't care about loading data in python, you can skip the next section and head straight to Deconvolution time!
Loading Svoboda lab data into python
The data is stored in .mat files using the Svoboda lab data format. Since understanding someone else's data structure is as difficult as understanding their musical taste, I'll walk through how to load the data in python. To load the data, you can simply use the scipy.io.loadmat function, with the following arguments:
import scipy.io as scio
svoboda_data = scio.loadmat( 'data_20120521_cell5_003', squeeze_me = True, struct_as_record = False)
It is critical to use the squeeze_me and struct_as_record flags! If you don't, python will load a weird object that will crash your python instance. This took me an embarrassingly long time to figure out.
svoboda_data is a dictionary with four fields; the only one we care about is the 'obj' field, so we can extract it:
svoboda_obj = svoboda_data['obj']
svoboda_obj has a bunch of fields as well; the most important one is timeSeriesArrayHash.value, which contains the data. The value field is an array of five structs, number 0-4, which have:
0: average fluorescence of an ROI
1: average fluorescence of the surrounding neuropil
2: raw e-phys trace from the cell
3: high-pass filtered e-phys
4: detected spikes
The value struct has two fields we care about, .time (which has the... time) and .valueMatrix (which has the values of the fluorescence or voltage). Knowing this, we can load the fluorescence and spike data:
ca_data = svoboda_obj.timeSeriesArrayHash.value[0].valueMatrix
ca_time = svoboda_obj.timeSeriesArrayHash.value[0].time
spike_data = svoboda_obj.timeSeriesArrayHash.value[4].valueMatrix
spike_time = svoboda_obj.timeSeriesArrayHash.value[4].time
I wrote a small wrapper function in python to load full Svoboda lab experiments. You can then plot the calcium fluorescence and spike times on each other:
Fluorescence (blue trace, DF / F) and spike data (green) from the GCaMP6s data set, recorded May 15, 2012, cell 1. |
Deconvolution time!
Now that the data is loaded, we can deconvolve it. Thankfully, Alistair Muldal has implemented the fast non-negative deconvolution described in Vogelstein, 2010 (PyFNND) . To be honest, I only partially understand the deconvolution algorithm. It starts by assuming that calcium can be modeled by spikes that return to baseline with exponential decay, but how they turn that generative model into a fit went beyond me.
In any case, we can run the PyFNND deconvolution on our calcium trace (I've calculated the ΔF / F separately), and get both a fit of the calcium trace, and an imputed spike train:
from pyfnnd import deconvolve, plotting
n_best, c_best, LL, theta_best = deconvolve(
df_f.reshape, dt=0.0166, verbosity=1, learn_theta=(0, 1, 1, 1,0) )
plotting.plot_fit(df_f.reshape(-1, 1).T, n_best, c_best, theta_best, 0.0166)
I played around with the theta parameters to see if I could get a better fit, but the best results I got were with the default ones above.
To get a spike train, you need to threshold the spike heuristic n_hat. Here I used a threshold of 0.1 (to avoid getting false positive spikes), and plotted the imputed spike train, real spike train, and calcium signal.
The imputed spike train (red) for a threshold of 0.1, and the real spikes (green). |
To get a sense for how accurately the deconvoluted spike train matched the real spike train, I calculated the Victor-Purpura distance (VP distance) between the two. The VP distance calculates how many times you would have to insert, delete, or move a spike to turn one spike train into another. VP was handily implemented by Nicolas Jimenez in the python module fit_neuron. (Note, if you want to use this module you should download the .tar.gz from Github, as the pip install has a bug at the moment.) I also wanted to use this metric to get a better sense of the optimal threshold for n_hat, so I ran the VP distance for thresholds of n_hat ranging from 0 to 1. I used a cost of q=5, so that spikes would be inserted or deleted only if there was not a nearby spike within 200 ms.
from fit_neuron.evaluate import spkd_lib as spkd
vp_cost = 5
spk_times = ca_data.spike_time[ca_data.spike_train>0.5]
vp_dist = spkd.victor_purpura_dist(ca_data.fluor_time[n_best >thresh], spk_times, vp_cost)
from fit_neuron.evaluate import spkd_lib as spkd
vp_cost = 5
spk_times = ca_data.spike_time[ca_data.spike_train>0.5]
vp_dist = spkd.victor_purpura_dist(ca_data.fluor_time[n_best >thresh], spk_times, vp_cost)
Similarity between the imputed spike train and real spike train. The normalized VP distance is the VP distance divided by the total number of real spikes. The threshold is the threshold for n_hat. |
A normalized VP distance of 0 means all the spikes were the same for each spike trains; and a distance of 2 means all the spikes from one train had to be deleted and re-inserted to recreate the other spike train. I think a distance of 0.8 means that just over half the real spikes were shared with the imputed spikes. I ran the deconvolution for all 9 neurons in the GCaMP6s dataset, and the minimum normalized distance ranged from 0.73-1. Some cells were undermined by bursts of firing, which caused a lot of false positives in the deconvolution. Others were difficult due to random non-spike related calcium fluctuations. I tried some filtering or baseline subtraction to make the results better but could not do significantly better than the default settings
A few concluding thoughts. Python is sweet, and people have implemented a lot of useful algorithms in it. This allowed me to try things out that would have been otherwise impossible. In the future I will try to clean up python posts by using iPython notebooks.
Deconvolution works pretty well at turning large calcium events into number of spikes. The timing won't be exact, but if you're doing calcium imaging, precise timing (<50ms) doesn't matter anyway.
[The following comments reflect my understanding after a few days of coding / playing. If I make any mistakes here, hopefully my commenters can correct me!] Finally, GCaMP6s was sold as having single spike resolution, but in my hands it does not, at least not for single trials. In Fig. 3F of the GCaMP6 paper they claim to detect 99% of single spikes in visual cortex with a 1% false positive rate. When I read that, I thought they had done something similar to what I just did: given a calcium fluorescence trace, and no prior knowledge, try to identify as many action potentials as you can, and compare to ground truth. What they actually did was answer the question: if you know what a single spike's fluorescent trace is, can you tell the difference between known spike events and random noise? In real imaging, of course, you don't know what spike calcium events look like (it's certainly not a simple exponential!). Also, that false positive rate, while sounding stringent, is quite permissive: if you image at 60Hz, with a 1% false positive rate, you will detect false spikes at a rate of 0.6Hz; if your cell fires at 0.5Hz, as their example cells do, that's a problem.
Having said that, using deconvolution with GCaMP6 works well depending on the type of data you're interested in. If you are recording from low frequency neurons where each action potential can cause spiking in downstream neurons, missing half the spikes is important. But if your neuron fires has a high baseline firing rate, and individual spikes aren't too important, the combination works well.
A few concluding thoughts. Python is sweet, and people have implemented a lot of useful algorithms in it. This allowed me to try things out that would have been otherwise impossible. In the future I will try to clean up python posts by using iPython notebooks.
Deconvolution works pretty well at turning large calcium events into number of spikes. The timing won't be exact, but if you're doing calcium imaging, precise timing (<50ms) doesn't matter anyway.
[The following comments reflect my understanding after a few days of coding / playing. If I make any mistakes here, hopefully my commenters can correct me!] Finally, GCaMP6s was sold as having single spike resolution, but in my hands it does not, at least not for single trials. In Fig. 3F of the GCaMP6 paper they claim to detect 99% of single spikes in visual cortex with a 1% false positive rate. When I read that, I thought they had done something similar to what I just did: given a calcium fluorescence trace, and no prior knowledge, try to identify as many action potentials as you can, and compare to ground truth. What they actually did was answer the question: if you know what a single spike's fluorescent trace is, can you tell the difference between known spike events and random noise? In real imaging, of course, you don't know what spike calcium events look like (it's certainly not a simple exponential!). Also, that false positive rate, while sounding stringent, is quite permissive: if you image at 60Hz, with a 1% false positive rate, you will detect false spikes at a rate of 0.6Hz; if your cell fires at 0.5Hz, as their example cells do, that's a problem.
Having said that, using deconvolution with GCaMP6 works well depending on the type of data you're interested in. If you are recording from low frequency neurons where each action potential can cause spiking in downstream neurons, missing half the spikes is important. But if your neuron fires has a high baseline firing rate, and individual spikes aren't too important, the combination works well.