speech recognition - Spectrograms generated using Librosa don't look consistent with Kaldi?


I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, using 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram appears below, visualized via the MATLAB imagesc function:

[Figure: Kaldi "seven" spectrogram]

I am experimenting with using Librosa as an alternative to Kaldi. I set up the code below using the same number of bins, sampling rate, and window length / shift as above.

    import librosa
    import numpy as np

    # 25 ms window = 500 samples and 10 ms shift = 200 samples at 20 kHz
    time_series, sample_rate = librosa.core.load("7a.wav", sr=20000)
    spectrogram = librosa.feature.melspectrogram(time_series, sr=20000, n_mels=23, n_fft=500, hop_length=200)
    log_s = librosa.core.logamplitude(spectrogram)
    np.savetxt("7a.txt", log_s.T)
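For reference, the n_fft=500 and hop_length=200 values follow from the 25 ms window and 10 ms shift at a 20 kHz sampling rate. A minimal sketch of that arithmetic, where `ms_to_samples` is a hypothetical helper and not part of Librosa or Kaldi:

```python
def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000.0))


sample_rate = 20000                           # 20 kHz, as in the question
n_fft = ms_to_samples(25, sample_rate)        # 25 ms analysis window
hop_length = ms_to_samples(10, sample_rate)   # 10 ms frame shift

print(n_fft, hop_length)
```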

However, when I visualize the resulting Librosa spectrogram of the same wav file, it looks different:

[Figure: Librosa "seven" spectrogram]

Can someone please help me understand why these are different? Across the other wav files I've tried, I notice that with the Librosa script above, fricatives (like the /s/ in "seven" in the example above) are being cut off, and this is affecting digit classification accuracy. Thank you!

Kaldi applies a lifter by default on the DCT output, which is why the upper coefficients are attenuated. See details here.
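To make the liftering step concrete, here is a rough sketch of a sinusoidal cepstral lifter of the HTK style, which rescales each DCT coefficient by a weight that depends on its index. It assumes the weight has the form 1 + (Q/2)·sin(πi/Q) with Q = 22; both the formula and that default are assumptions to check against the Kaldi version you are running. The function names are hypothetical and are not Kaldi or Librosa APIs.

```python
import numpy as np


def lifter_coeffs(num_ceps: int, q: float = 22.0) -> np.ndarray:
    """Per-coefficient lifter weights (assumed form: 1 + Q/2 * sin(pi * i / Q))."""
    i = np.arange(num_ceps)
    return 1.0 + 0.5 * q * np.sin(np.pi * i / q)


def apply_lifter(cepstra: np.ndarray, q: float = 22.0) -> np.ndarray:
    """Scale the last axis (cepstral index) of a frames x coefficients array."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra * lifter_coeffs(cepstra.shape[-1], q)


def remove_lifter(cepstra: np.ndarray, q: float = 22.0) -> np.ndarray:
    """Undo apply_lifter; valid while every weight is non-zero (true for i < Q)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra / lifter_coeffs(cepstra.shape[-1], q)


# Example: weights for 13 cepstral coefficients
print(lifter_coeffs(13))
```

If the goal is to compare features from the two toolkits, one option is to apply the same weighting to the Librosa cepstra, or to switch liftering off on the Kaldi side if your configuration allows it.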

