Commit aba5a3a

Improve tutorials (#568)

* minor improvements to tutorials
* fix dataPipe tutorial
* improve intro tutorial
* improve experimenter name formatting

1 parent 5396243 · commit aba5a3a

8 files changed: +216 -302 lines changed

tutorials/dataPipe.m (+15 -9)
@@ -65,7 +65,7 @@
 % scenario. The following code utilizes DataPipe’s default chunk size:
 %

-fData=randi(250, 1000, 1000); % Create fake data
+fData = randi(250, 100, 1000); % Create fake data

 % create an nwb structure with required fields
 nwb = NwbFile( ...
@@ -77,7 +77,9 @@

 fdataNWB=types.core.TimeSeries( ...
 'data', fData_compressed, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 nwb.acquisition.set('data', fdataNWB);

@@ -110,8 +112,8 @@
 % To demonstrate, we can create a nwb file with a compressed time series data:
 %%

-dataPart1 = randi(250, 10000, 1); % "load" 1/4 of the entire dataset
-fullDataSize = [40000 1]; % this is the size of the TOTAL dataset
+dataPart1 = randi(250, 1, 10000); % "load" 1/4 of the entire dataset
+fullDataSize = [1 40000]; % this is the size of the TOTAL dataset

 % create an nwb structure with required fields
 nwb=NwbFile( ...
@@ -123,12 +125,14 @@
 fData_use = types.untyped.DataPipe( ...
 'data', dataPart1, ...
 'maxSize', fullDataSize, ...
-'axis', 1);
+'axis', 2);

 %Set the compressed data as a time series
 fdataNWB = types.core.TimeSeries( ...
 'data', fData_use, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 nwb.acquisition.set('time_series', fdataNWB);

@@ -141,7 +145,7 @@

 % "load" each of the remaining 1/4ths of the large dataset
 for i = 2:4 % iterating through parts of data
-dataPart_i=randi(250, 10000, 1); % faked data chunk as if it was loaded
+dataPart_i=randi(250, 1, 10000); % faked data chunk as if it was loaded
 nwb.acquisition.get('time_series').data.append(dataPart_i); % append the loaded data
 end
 %%
@@ -155,7 +159,7 @@
 % Following is an example of how to compress and add a timeseries
 % to an NWB file:

-fData=randi(250, 10000, 1); % create fake data;
+fData=randi(250, 1, 10000); % create fake data;

 %assign data without compression
 nwb=NwbFile(...
@@ -178,7 +182,9 @@
 % Assign the data to appropriate module and write the NWB file
 fdataNWB=types.core.TimeSeries( ...
 'data', fData_compressed, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 ephys_module.nwbdatainterface.set('data', fdataNWB);
 nwb.processing.set('ephys', ephys_module);
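
Pieced together, the hunks above implement the tutorial's iterative-write pattern: write one quarter of the data as a compressed, appendable dataset, then re-open the file and append the remaining quarters along axis 2. A minimal end-to-end sketch; the NwbFile session fields below are illustrative placeholders, only the remaining lines come from the diff:

% Sketch of the updated iterative-write pattern from this commit.
% The session metadata values are placeholders, not part of the diff.
nwb = NwbFile( ...
    'session_description', 'DataPipe iterative write demo', ...  % placeholder
    'identifier', 'DataPipeTutorial', ...                        % placeholder
    'session_start_time', datetime('now'));                      % placeholder

dataPart1 = randi(250, 1, 10000);  % "load" 1/4 of the entire dataset
fullDataSize = [1 40000];          % size of the TOTAL dataset

% Compressed dataset that can grow along axis 2 up to maxSize
fData_use = types.untyped.DataPipe( ...
    'data', dataPart1, ...
    'maxSize', fullDataSize, ...
    'axis', 2);

fdataNWB = types.core.TimeSeries( ...
    'data', fData_use, ...
    'data_unit', 'mV', ...
    'starting_time', 0.0, ...
    'starting_time_rate', 30.0);

nwb.acquisition.set('time_series', fdataNWB);
nwbExport(nwb, 'DataPipeTutorial_iterate.nwb');

% Re-open the partial file and append the remaining three quarters
nwb = nwbRead('DataPipeTutorial_iterate.nwb', 'ignorecache');
for i = 2:4
    dataPart_i = randi(250, 1, 10000); % faked chunk as if freshly loaded
    nwb.acquisition.get('time_series').data.append(dataPart_i);
end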

tutorials/ecephys.mlx (-128 Bytes)

Binary file not shown.

tutorials/html/dataPipe.html (+36 -24)
@@ -6,7 +6,7 @@
 <!--
 This HTML was auto-generated from MATLAB code.
 To make changes, update the MATLAB code and republish this document.
---><title>Neurodata Without Borders (NWB) advanced write using DataPipe</title><meta name="generator" content="MATLAB 9.11"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="DC.date" content="2022-01-04"><meta name="DC.source" content="dataPipe.m"><style type="text/css">
+--><title>Neurodata Without Borders (NWB) advanced write using DataPipe</title><meta name="generator" content="MATLAB 9.14"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="DC.date" content="2024-06-12"><meta name="DC.source" content="dataPipe.m"><style type="text/css">
 html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,big,cite,code,del,dfn,em,font,img,ins,kbd,q,s,samp,small,strike,strong,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset,form,label,legend,table,caption,tbody,tfoot,thead,tr,th,td{margin:0;padding:0;border:0;outline:0;font-size:100%;vertical-align:baseline;background:transparent}body{line-height:1}ol,ul{list-style:none}blockquote,q{quotes:none}blockquote:before,blockquote:after,q:before,q:after{content:'';content:none}:focus{outine:0}ins{text-decoration:none}del{text-decoration:line-through}table{border-collapse:collapse;border-spacing:0}

 html { min-height:100%; margin-bottom:1px; }
@@ -80,7 +80,7 @@
 <tr><td><em>chunkSize</em></td><td>Sets chunk size for the compression. Must be less than maxSize.</td></tr>
 <tr><td><em>compressionLevel</em></td><td>Level of compression ranging from 0-9 where 9 is the highest level of compression. The default is level 3.</td></tr>
 <tr><td><em>offset</em></td><td>Axis offset of dataset to append. May be used to overwrite data.</td></tr></table>
-</p><h2 id="6">Chunking</h2><p>HDF5 Datasets can be either stored in continuous or chunked mode. Continuous means that all of the data is written to one continuous block on the hard drive, and chunked means that the dataset is automatically split into chunks that are distributed across the hard drive. The user does not need to know the mode used- HDF5 handles the gathering of chunks automatically. However, it is worth understanding these chunks because they can have a big impact on space used and read and write speed. When using compression, the dataset MUST be chunked. HDF5 is not able to apply compression to continuous datasets.</p><p>If chunkSize is not explicitly specified, dataPipe will determine an appropriate chunk size. However, you can optimize the performance of the compression by manually specifying the chunk size using <i>chunkSize</i> argument.</p><p>We can demonstrate the benefit of chunking by exploring the following scenario. The following code utilizes DataPipe&#146;s default chunk size:</p><pre class="codeinput">fData=randi(250, 1000, 1000); <span class="comment">% Create fake data</span>
+</p><h2 id="6">Chunking</h2><p>HDF5 Datasets can be either stored in continuous or chunked mode. Continuous means that all of the data is written to one continuous block on the hard drive, and chunked means that the dataset is automatically split into chunks that are distributed across the hard drive. The user does not need to know the mode used; HDF5 handles the gathering of chunks automatically. However, it is worth understanding these chunks because they can have a big impact on space used and read and write speed. When using compression, the dataset MUST be chunked. HDF5 is not able to apply compression to continuous datasets.</p><p>If chunkSize is not explicitly specified, DataPipe will determine an appropriate chunk size. However, you can optimize the performance of the compression by manually specifying the chunk size using the <i>chunkSize</i> argument.</p><p>We can demonstrate the benefit of chunking by exploring the following scenario. The following code utilizes DataPipe&#146;s default chunk size:</p><pre class="codeinput">fData = randi(250, 100, 1000); <span class="comment">% Create fake data</span>

 <span class="comment">% create an nwb structure with required fields</span>
 nwb = NwbFile( <span class="keyword">...</span>
@@ -92,7 +92,9 @@

 fdataNWB=types.core.TimeSeries( <span class="keyword">...</span>
 <span class="string">'data'</span>, fData_compressed, <span class="keyword">...</span>
-<span class="string">'data_unit'</span>, <span class="string">'mV'</span>);
+<span class="string">'data_unit'</span>, <span class="string">'mV'</span>, <span class="keyword">...</span>
+<span class="string">'starting_time'</span>, 0.0, <span class="keyword">...</span>
+<span class="string">'starting_time_rate'</span>, 30.0);

 nwb.acquisition.set(<span class="string">'data'</span>, fdataNWB);

@@ -101,8 +103,8 @@
 <span class="string">'data'</span>, fData, <span class="keyword">...</span>
 <span class="string">'chunkSize'</span>, [1, 1000], <span class="keyword">...</span>
 <span class="string">'axis'</span>, 1);
-</pre><p>This change results in the operation completing in 0.7 seconds and resulting file size of 1.1MB. The chunk size was chosen such that it spans each individual row of the matrix.</p><p>Use the combination of arugments that fit your need. When dealing with large datasets, you may want to use iterative write to ensure that you stay within the bounds of your system memory and use chunking and compression to optimize storage, read and write of the data.</p><h2 id="9">Iterative Writing</h2><p>If experimental data is close to, or exceeds the available system memory, performance issues may arise. To combat this effect of large data, <tt>DataPipe</tt> can utilize iterative writing, where only a portion of the data is first compressed and saved, and then additional portions are appended.</p><p>To demonstrate, we can create a nwb file with a compressed time series data:</p><pre class="codeinput">dataPart1 = randi(250, 10000, 1); <span class="comment">% "load" 1/4 of the entire dataset</span>
-fullDataSize = [40000 1]; <span class="comment">% this is the size of the TOTAL dataset</span>
+</pre><p>This change results in the operation completing in 0.7 seconds and a resulting file size of 1.1MB. The chunk size was chosen such that it spans each individual row of the matrix.</p><p>Use the combination of arguments that fits your needs. When dealing with large datasets, you may want to use iterative write to ensure that you stay within the bounds of your system memory, and use chunking and compression to optimize storage, read, and write of the data.</p><h2 id="9">Iterative Writing</h2><p>If experimental data is close to, or exceeds, the available system memory, performance issues may arise. To combat this effect of large data, <tt>DataPipe</tt> can utilize iterative writing, where only a portion of the data is first compressed and saved, and then additional portions are appended.</p><p>To demonstrate, we can create a nwb file with a compressed time series data:</p><pre class="codeinput">dataPart1 = randi(250, 1, 10000); <span class="comment">% "load" 1/4 of the entire dataset</span>
+fullDataSize = [1 40000]; <span class="comment">% this is the size of the TOTAL dataset</span>

 <span class="comment">% create an nwb structure with required fields</span>
 nwb=NwbFile( <span class="keyword">...</span>
@@ -114,24 +116,26 @@
 fData_use = types.untyped.DataPipe( <span class="keyword">...</span>
 <span class="string">'data'</span>, dataPart1, <span class="keyword">...</span>
 <span class="string">'maxSize'</span>, fullDataSize, <span class="keyword">...</span>
-<span class="string">'axis'</span>, 1);
+<span class="string">'axis'</span>, 2);

 <span class="comment">%Set the compressed data as a time series</span>
 fdataNWB = types.core.TimeSeries( <span class="keyword">...</span>
 <span class="string">'data'</span>, fData_use, <span class="keyword">...</span>
-<span class="string">'data_unit'</span>, <span class="string">'mV'</span>);
+<span class="string">'data_unit'</span>, <span class="string">'mV'</span>, <span class="keyword">...</span>
+<span class="string">'starting_time'</span>, 0.0, <span class="keyword">...</span>
+<span class="string">'starting_time_rate'</span>, 30.0);

 nwb.acquisition.set(<span class="string">'time_series'</span>, fdataNWB);

 nwbExport(nwb, <span class="string">'DataPipeTutorial_iterate.nwb'</span>);
-</pre><p>To append the rest of the data, simply load the NWB file and use the append method:</p><pre class="codeinput">nwb = nwbRead(<span class="string">'DataPipeTutorial_iterate.nwb'</span>); <span class="comment">%load the nwb file with partial data</span>
+</pre><p>To append the rest of the data, simply load the NWB file and use the append method:</p><pre class="codeinput">nwb = nwbRead(<span class="string">'DataPipeTutorial_iterate.nwb'</span>, <span class="string">'ignorecache'</span>); <span class="comment">%load the nwb file with partial data</span>

 <span class="comment">% "load" each of the remaining 1/4ths of the large dataset</span>
 <span class="keyword">for</span> i = 2:4 <span class="comment">% iterating through parts of data</span>
-dataPart_i=randi(250, 10000, 1); <span class="comment">% faked data chunk as if it was loaded</span>
+dataPart_i=randi(250, 1, 10000); <span class="comment">% faked data chunk as if it was loaded</span>
 nwb.acquisition.get(<span class="string">'time_series'</span>).data.append(dataPart_i); <span class="comment">% append the loaded data</span>
 <span class="keyword">end</span>
-</pre><p>The axis property defines the dimension in which additional data will be appended. In the above example, the resulting dataset will be 4000x1. However, if we set axis to 2 (and change fullDataSize appropriately), then the resulting dataset will be 1000x4.</p><h2 id="13">Timeseries example</h2><p>Following is an example of how to compress and add a timeseries to an NWB file:</p><pre class="codeinput">fData=randi(250, 10000, 1); <span class="comment">% create fake data;</span>
+</pre><p>The axis property defines the dimension along which additional data will be appended. In the above example, the resulting dataset will be 1x40000. However, if we set axis to 1 (and change fullDataSize appropriately), then the resulting dataset will be 40000x1.</p><h2 id="13">Timeseries example</h2><p>Following is an example of how to compress and add a timeseries to an NWB file:</p><pre class="codeinput">fData=randi(250, 1, 10000); <span class="comment">% create fake data;</span>

 <span class="comment">%assign data without compression</span>
 nwb=NwbFile(<span class="keyword">...</span>
@@ -154,14 +158,16 @@
 <span class="comment">% Assign the data to appropriate module and write the NWB file</span>
 fdataNWB=types.core.TimeSeries( <span class="keyword">...</span>
 <span class="string">'data'</span>, fData_compressed, <span class="keyword">...</span>
-<span class="string">'data_unit'</span>, <span class="string">'mV'</span>);
+<span class="string">'data_unit'</span>, <span class="string">'mV'</span>, <span class="keyword">...</span>
+<span class="string">'starting_time'</span>, 0.0, <span class="keyword">...</span>
+<span class="string">'starting_time_rate'</span>, 30.0);

 ephys_module.nwbdatainterface.set(<span class="string">'data'</span>, fdataNWB);
 nwb.processing.set(<span class="string">'ephys'</span>, ephys_module);

-<span class="comment">%write the file</span>
+<span class="comment">% write the file</span>
 nwbExport(nwb, <span class="string">'Compressed.nwb'</span>);
-</pre><p class="footer"><br><a href="https://www.mathworks.com/products/matlab/">Published with MATLAB&reg; R2021b</a><br></p></div><!--
+</pre><p class="footer"><br><a href="https://www.mathworks.com/products/matlab/">Published with MATLAB&reg; R2023a</a><br></p></div><!--
 ##### SOURCE BEGIN #####
 %% Neurodata Without Borders (NWB) advanced write using DataPipe
 % How to utilize HDF5 compression using dataPipe
@@ -230,7 +236,7 @@
 % scenario. The following code utilizes DataPipe’s default chunk size:
 %

-fData=randi(250, 1000, 1000); % Create fake data
+fData = randi(250, 100, 1000); % Create fake data

 % create an nwb structure with required fields
 nwb = NwbFile( ...
@@ -242,7 +248,9 @@

 fdataNWB=types.core.TimeSeries( ...
 'data', fData_compressed, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 nwb.acquisition.set('data', fdataNWB);

@@ -275,8 +283,8 @@
 % To demonstrate, we can create a nwb file with a compressed time series data:
 %%

-dataPart1 = randi(250, 10000, 1); % "load" 1/4 of the entire dataset
-fullDataSize = [40000 1]; % this is the size of the TOTAL dataset
+dataPart1 = randi(250, 1, 10000); % "load" 1/4 of the entire dataset
+fullDataSize = [1 40000]; % this is the size of the TOTAL dataset

 % create an nwb structure with required fields
 nwb=NwbFile( ...
@@ -288,12 +296,14 @@
 fData_use = types.untyped.DataPipe( ...
 'data', dataPart1, ...
 'maxSize', fullDataSize, ...
-'axis', 1);
+'axis', 2);

 %Set the compressed data as a time series
 fdataNWB = types.core.TimeSeries( ...
 'data', fData_use, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 nwb.acquisition.set('time_series', fdataNWB);

@@ -302,11 +312,11 @@
 % To append the rest of the data, simply load the NWB file and use the
 % append method:

-nwb = nwbRead('DataPipeTutorial_iterate.nwb'); %load the nwb file with partial data
+nwb = nwbRead('DataPipeTutorial_iterate.nwb', 'ignorecache'); %load the nwb file with partial data

 % "load" each of the remaining 1/4ths of the large dataset
 for i = 2:4 % iterating through parts of data
-dataPart_i=randi(250, 10000, 1); % faked data chunk as if it was loaded
+dataPart_i=randi(250, 1, 10000); % faked data chunk as if it was loaded
 nwb.acquisition.get('time_series').data.append(dataPart_i); % append the loaded data
 end
 %%
@@ -320,7 +330,7 @@
 % Following is an example of how to compress and add a timeseries
 % to an NWB file:

-fData=randi(250, 10000, 1); % create fake data;
+fData=randi(250, 1, 10000); % create fake data;

 %assign data without compression
 nwb=NwbFile(...
@@ -343,12 +353,14 @@
 % Assign the data to appropriate module and write the NWB file
 fdataNWB=types.core.TimeSeries( ...
 'data', fData_compressed, ...
-'data_unit', 'mV');
+'data_unit', 'mV', ...
+'starting_time', 0.0, ...
+'starting_time_rate', 30.0);

 ephys_module.nwbdatainterface.set('data', fdataNWB);
 nwb.processing.set('ephys', ephys_module);

-%write the file
+% write the file
 nwbExport(nwb, 'Compressed.nwb');
 ##### SOURCE END #####
 --></body></html>
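
The chunking discussion in the tutorial text above comes down to one optional DataPipe argument. A short sketch of the two variants, assembled from the diff's code lines; the timing and file-size figures quoted in the tutorial are its own measurements and are not reproduced here:

fData = randi(250, 100, 1000);  % fake data, as in the updated tutorial

% Variant 1: default chunking; DataPipe picks a chunk size automatically
fData_compressed = types.untyped.DataPipe('data', fData);

% Variant 2: manual chunking; a [1, 1000] chunk spans one full row of
% the 100x1000 matrix, which the tutorial reports is faster and smaller
fData_compressed = types.untyped.DataPipe( ...
    'data', fData, ...
    'chunkSize', [1, 1000], ...
    'axis', 1);

Either variant is then wrapped in a types.core.TimeSeries and written with nwbExport, exactly as in the hunks above.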
