c++ - How to find silent parts in audio track
I have the following code, which stores the raw audio data of a WAV file in a byte buffer:
    byte header[74];
    fread(&header, sizeof(byte), 74, inputfile);

    byte * sound_buffer;
    dword data_size;
    fread(&data_size, sizeof(dword), 1, inputfile);

    sound_buffer = (byte *)malloc(sizeof(byte) * data_size);
    fread(sound_buffer, sizeof(byte), data_size, inputfile);
Is there an algorithm to determine when the audio track is silent (literally no sound) and when there is some sound level?
Well, your "sound" is an array of values, whether integer or real - it depends on your format.
For the file to be silent or "have no sound", the values in that array have to be zero, or very close to zero, or in the worst case scenario - if the audio has a bias - the value will stay the same instead of fluctuating around to produce sound waves.
You can write a simple function that returns the delta for a range, in other words the difference between the largest and the smallest value; the lower the delta, the lower the sound volume.
Or alternatively, you can write a function that returns the ranges in which the delta is lower than a given threshold.
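To make "depends on your format" concrete, here is a minimal sketch of turning the raw byte buffer into sample values. It assumes the data chunk holds 16-bit signed little-endian PCM, which is common for WAV files but is not stated in the question; `bytes_to_samples` is a hypothetical helper name, not something from the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the buffer holds 16-bit signed little-endian PCM samples.
// bytes_to_samples is a hypothetical helper, not part of the original code.
std::vector<std::int16_t> bytes_to_samples(const unsigned char* bytes,
                                           std::size_t byte_count) {
    std::vector<std::int16_t> samples;
    samples.reserve(byte_count / 2);
    // Combine each pair of bytes, low byte first; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < byte_count; i += 2) {
        const std::uint16_t raw =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<std::int16_t>(raw));
    }
    return samples;
}
```

For 8-bit, 24-bit or floating point data the decoding step would look different, so check the format fields in the WAV header before relying on this.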
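As a rough sketch of such a delta function: the helper below returns the difference between the largest and smallest sample in a range. The name `peak_to_peak` is made up for illustration, and the result is widened to `double` on the assumption that the sample type may be a narrow integer where the difference would not fit.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: peak-to-peak range of `count` samples.
// Returns 0 for an empty range. The result is widened to double so that
// e.g. 32767 - (-32768) does not overflow a 16-bit sample type.
template <typename T>
double peak_to_peak(const T* data, std::size_t count) {
    if (count == 0) return 0.0;
    const auto mm = std::minmax_element(data, data + count);
    return static_cast<double>(*mm.second) - static_cast<double>(*mm.first);
}
```

Because only the spread matters, a biased but flat signal such as 1000, 1000, 1001, 1000 gives a delta of 1 and still counts as quiet, even though none of its values are near zero.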
For the sake of toying, I wrote a nifty class:
    #include <limits>
    #include <utility>
    #include <vector>

    using uint = unsigned int;

    template<typename T>
    class silencefinder {
    public:
        // data: mono samples, size: number of samples, samples: sample rate (samples per second)
        silencefinder(T * data, uint size, uint samples)
            : d(data), sbegin(0), s(size), samp(samples), status(undefined) {}

        // Returns the silent regions as (start, end) pairs in whole seconds.
        std::vector<std::pair<uint, uint>> find(const T threshold, const uint window) {
            auto r = findsilence(d, s, threshold, window);
            regionstotime(r);
            return r;
        }

    private:
        enum Status { silent, loud, undefined };

        // Remember where a silent run starts and emit a region when it ends.
        void togglesilence(Status st, uint pos, std::vector<std::pair<uint, uint>> & res) {
            if (st == silent) {
                if (status != silent) sbegin = pos;
                status = silent;
            } else {
                if (status == silent) res.push_back(std::pair<uint, uint>(sbegin, pos));
                status = loud;
            }
        }

        // Close a silent region that is still open when the data runs out.
        void end(uint pos, std::vector<std::pair<uint, uint>> & res) {
            if (status == silent) res.push_back(std::pair<uint, uint>(sbegin, pos));
        }

        // Peak-to-peak range of one window, widened to double so that the
        // difference cannot overflow a narrow integer sample type.
        static double delta(const T * data, const uint window) {
            T min = std::numeric_limits<T>::max(), max = std::numeric_limits<T>::lowest();
            for (uint i = 0; i < window; ++i) {
                T c = data[i];
                if (c < min) min = c;
                if (c > max) max = c;
            }
            return static_cast<double>(max) - static_cast<double>(min);
        }

        std::vector<std::pair<uint, uint>> findsilence(T * data, const uint size, const T threshold, const uint win) {
            std::vector<std::pair<uint, uint>> regions;
            if (win == 0) return regions;   // a zero window would never advance
            status = undefined;             // allow find() to be called more than once
            uint pos = 0;
            while (pos < size) {
                // the last window may be shorter than the rest
                const uint window = (size - pos < win) ? size - pos : win;
                Status st = (delta(data + pos, window) < threshold) ? silent : loud;
                togglesilence(st, pos, regions);
                pos += window;
            }
            end(size, regions);
            return regions;
        }

        // Convert sample positions to seconds (integer division, so whole seconds only).
        void regionstotime(std::vector<std::pair<uint, uint>> & regions) {
            for (auto & r : regions) {
                r.first /= samp;
                r.second /= samp;
            }
        }

        T * d;
        uint sbegin, s, samp;
        Status status;
    };
I haven't tested it, but it looks like it should work. However, it assumes a single audio channel, so you will have to extend it in order to work with and across multichannel audio. Here is how you use it:
    silencefinder<audiodatatype> finder(audiodataptr, sizeofdata, samplerate);
    auto res = finder.find(threshold, scanwindow);
    // and output the silent regions
    for (auto r : res) std::cout << r.first << " " << r.second << std::endl;
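One possible way to handle the multichannel case, sketched under the assumption that the samples are interleaved frame by frame (L R L R ... for stereo), is to copy each channel into its own buffer and scan it as a mono signal. `extract_channel` is a hypothetical helper, not part of the class above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copy one channel out of interleaved frames
// (L R L R ... for stereo) so that it can be scanned as a mono signal.
// Returns an empty vector if `channel` is out of range.
template <typename T>
std::vector<T> extract_channel(const T* interleaved, std::size_t frame_count,
                               std::size_t channel_count, std::size_t channel) {
    std::vector<T> mono;
    if (channel >= channel_count) return mono;
    mono.reserve(frame_count);
    for (std::size_t f = 0; f < frame_count; ++f)
        mono.push_back(interleaved[f * channel_count + channel]);
    return mono;
}
```

You could then run the finder on each channel separately and treat a stretch as silent only where the silent regions of all channels overlap.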
Also notice that the way it is implemented right now, the "cut" to silent regions is very abrupt; such "noise gate" type of filters usually come with attack and release parameters, which smooth out the result. For example, there might be 5 seconds of silence with just a tiny pop in the middle. Without attack and release parameters, you will get those 5 seconds split in two, and the pop will actually remain, but using them you can implement varying sensitivity as to when to cut it off.
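A true attack/release envelope is more involved, but a simplified stand-in can be sketched as a post-processing pass over the detected regions: bridge loud gaps that are shorter than some limit, then drop silent regions that are too short. The function name and both parameters below are assumptions made for illustration, and the regions are expected to be sorted, non-overlapping and expressed in sample positions (that is, before any conversion to seconds).

```cpp
#include <utility>
#include <vector>

using Region = std::pair<unsigned, unsigned>;  // (start, end), end exclusive

// Hypothetical post-processing step, not a real attack/release envelope:
//  - max_gap: loud gaps no longer than this between two silent regions are bridged
//  - min_length: merged regions shorter than this are discarded
// Regions are assumed to be sorted and non-overlapping.
std::vector<Region> smooth_regions(const std::vector<Region>& regions,
                                   unsigned max_gap, unsigned min_length) {
    std::vector<Region> merged;
    for (const Region& r : regions) {
        if (!merged.empty() && r.first - merged.back().second <= max_gap)
            merged.back().second = r.second;  // bridge the short pop
        else
            merged.push_back(r);
    }
    std::vector<Region> result;
    for (const Region& r : merged)
        if (r.second - r.first >= min_length) result.push_back(r);
    return result;
}
```

With `max_gap` set to roughly the length of a pop, the 5 seconds of silence from the example above would come out as one region instead of two.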