I have a dataset of electrophysiological recordings in an HDF5 file, stored (as far as I understand) as NumPy arrays, and I am trying to access it in an efficient, fast way.
Let me explain: the dataset is a list of arrays (a 2D array?); each array contains some number of channels (recording sites), around 32-64.
The problem is the following: there are millions of arrays, and it's taking forever to loop through every individual array. Moreover, I have to loop through each channel in each array in order to retrieve the values.
Here is my code:

import h5py

f_kwd = h5py.File("experiment1_100.raw.kwd", "r")  # opens the HDF5 file
dset_data = f_kwd['recordings/0/data']
print(len(dset_data))  # prints 31646700
print(dset_data[0])
# prints the following:
# [   94  1377   208   202   246   387  1532  1003   460   665   810   638
#    223   363   990    78  -139   191    63   630   763    60   682  1025
#    472  1113  -137   360  1216   297   -71   -35  -477  -498  -541  -557
#  27776  2281 -11370 32767 -28849 -30243]

list_value = []
for t_stamp in dset_data:
    for value in t_stamp:
        if value > 400:
            list_value.append(value)
Is there a way to make this a lot more efficient and quick? Do I have to use NumPy, and if so, how can I make that happen? I feel like I am doing something wrong here.
EDIT: here is some additional info. The first array in the dataset has the following attributes:
.shape -> (42,)
.itemsize -> 2
.dtype -> int16
.size -> 42
.ndim -> 1
EDIT 2: ...and the dataset itself:
.shape -> (31646700, 42)
.dtype -> int16
.size -> 1329161400
If, as I guess, t_stamp is a 1d array of varying length, you can collect the elements greater than 400 with:
list_value = []
for t_stamp in dset_data:
    list_value.append(t_stamp[t_stamp > 400])  # or: list_value.extend(...)
Use append if you want to collect the values in sublists; use extend if you want one flat list.
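As a quick sketch of the append/extend difference, using a tiny synthetic array in place of the real (31646700, 42) dataset:

```python
import numpy as np

# Small stand-in for dset_data; the real dataset is (31646700, 42) int16.
dset_data = np.array([[100, 500, 900],
                      [401, 50, 1200]], dtype=np.int16)

sublists = []
flat = []
for t_stamp in dset_data:
    selected = t_stamp[t_stamp > 400]   # boolean mask per row
    sublists.append(selected)           # one array per row
    flat.extend(selected.tolist())      # one flat list of plain ints

print(flat)  # [500, 900, 401, 1200]
```

Here sublists keeps the row structure (two arrays), while flat collapses everything into a single list.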
This still iterates over the 'rows' of dset_data, but the selection for each row is faster than a Python loop over the individual values.
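A middle ground, assuming the full array (~2.6 GB of int16) might not fit comfortably in memory, is to slice the dataset in large blocks of rows; slicing an h5py dataset returns a NumPy array, so the Python loop then runs thousands of times instead of 31 million. A minimal sketch with a small synthetic stand-in:

```python
import numpy as np

# Stand-in for the h5py dataset; slicing an h5py Dataset also returns
# a NumPy array, so the same loop works on the real file.
dset_data = (np.arange(42 * 100, dtype=np.int16) - 500).reshape(100, 42)

chunk = 30  # rows per read; use something like 100000 on the real dataset
parts = []
for start in range(0, len(dset_data), chunk):
    block = dset_data[start:start + chunk]  # one bulk read per chunk
    parts.append(block[block > 400])        # vectorized selection per chunk
list_value = np.concatenate(parts)
```

The chunk size here is a hypothetical tuning knob: bigger chunks mean fewer reads but more memory per step.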
If the rows are all 42 long, dset_data.value (deprecated in recent h5py; dset_data[...] is the equivalent) loads the whole thing as a 2d numpy array, and indexing that array with a boolean mask:

data = dset_data[...]
list_value = data[data > 400]

will give a flat array of the selected values.
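Putting it together, a minimal sketch of the fully vectorized version, using a random array in place of the real HDF5 dataset and the same threshold of 400:

```python
import numpy as np

# Stand-in for dset_data[...]; the real dataset is (31646700, 42) int16.
rng = np.random.default_rng(0)
data = rng.integers(-600, 1600, size=(1000, 42), dtype=np.int16)

# One boolean mask over the whole 2d array; no Python-level loops.
list_value = data[data > 400]  # flat 1d array of the selected values

# Same result as the original nested loop, just far faster:
looped = [v for row in data for v in row if v > 400]
assert np.array_equal(list_value, np.array(looped, dtype=np.int16))
```

The mask is evaluated in C for the entire array at once, which is why this is orders of magnitude faster than iterating value by value in Python.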