xml - Advanced boolean search of JSON files containing speech-to-text data?


I have hundreds of automatic machine transcripts of video and audio files. I have every transcript in 5 formats: JSON, XML, SRT, VTT, and TXT. (Click here to see example files.) The JSON and XML files contain comprehensive data, including speaker ID, confidence level, and timecodes.

I am looking for a way to mine or search this data to find words and phrases. I need to be able to submit a boolean search query, click on a result, and play the video/audio file at the timecode of the text result. The necessary boolean operators are NOT, AND, and OR (just like an online search engine). Example search: ("baseball bat" AND park) OR soccer

I'm thinking of a simple interface.

Basic options:

  • search box
  • minimum confidence level slider

Ideas for advanced options:

  • speaker: "bob,joe,bill" (that is, the speaker must be one of these)
  • maximum time allowed between words in an AND search: x.x seconds
  • maximum time allowed between words in exact phrase search: x.x seconds
  • words in exact phrase search must have same speaker: on/off
  • words between AND must have the same speaker: on/off
  • words between OR must have the same speaker: on/off
  • words between AND must be found in chronological order: on/off
  • ignore punctuation: on/off
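
To make these options concrete, a rough Python sketch of the exact-phrase check might look like the following. It assumes each transcript word has already been loaded into a record with the hypothetical fields `text`, `start`, `end`, `speaker`, and `confidence`; the real JSON layout may well differ, and `Word`, `find_phrase`, and the parameter names are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record layout; adapt to the real transcript JSON.
    text: str
    start: float       # seconds from the start of the media file
    end: float         # seconds from the start of the media file
    speaker: str
    confidence: float  # assumed to be in the range 0.0 .. 1.0


def find_phrase(words, phrase, min_confidence=0.0, speakers=None,
                max_gap=None, same_speaker=False):
    """Return the start times at which `phrase` occurs as consecutive words."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    for i in range(len(words) - len(terms) + 1):
        window = words[i:i + len(terms)]
        if [w.text.lower() for w in window] != terms:
            continue
        # Minimum confidence level slider.
        if any(w.confidence < min_confidence for w in window):
            continue
        # Speaker must be one of the listed speakers.
        if speakers is not None and any(w.speaker not in speakers for w in window):
            continue
        # All words in the phrase must share one speaker.
        if same_speaker and len({w.speaker for w in window}) > 1:
            continue
        # Maximum pause allowed between neighbouring words of the phrase.
        if max_gap is not None and any(
                b.start - a.end > max_gap for a, b in zip(window, window[1:])):
            continue
        hits.append(window[0].start)
    return hits
```

Each returned value is a timecode that a player could seek to, for example `find_phrase(words, "baseball bat", max_gap=0.5, same_speaker=True)`.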

Simply put, I need an agent to ransack the timecodes and, if possible, the miscellaneous options. I know this is a specific and complex request. :) Can anyone give me leads on this idea? I don't want to reinvent the wheel. What software/command-line program/engine comes closest to being able to do this? Perhaps I can adapt from there.

Thanks!

You can implement such a system on top of Solr/Lucene (http://lucene.apache.org/solr); however, you will need some experience to implement the required features.
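
As a hedged illustration: the example query from the question is already valid in Lucene's standard query syntax, so a request to Solr's select handler could be built roughly as follows. The core name `transcripts`, the fields `text`, `confidence`, and `speaker`, and the idea of indexing one document per transcript segment are all assumptions about a schema you would define yourself; `build_solr_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_solr_url(base_url, core, query, min_confidence=None, speakers=None):
    """Build a Solr select URL for a boolean query plus optional filters."""
    params = [
        ("q", query),     # main query in standard Lucene syntax
        ("df", "text"),   # default field to search (assumed field name)
        ("wt", "json"),   # ask for a JSON response
    ]
    if min_confidence is not None:
        # Range filter on an assumed numeric `confidence` field.
        params.append(("fq", f"confidence:[{min_confidence} TO *]"))
    if speakers:
        # Speaker must be one of the listed names (assumed `speaker` field).
        params.append(("fq", "speaker:(" + " OR ".join(speakers) + ")"))
    return f"{base_url}/{core}/select?" + urlencode(params)


url = build_solr_url(
    "http://localhost:8983/solr",   # default local Solr address
    "transcripts",                  # hypothetical core name
    '("baseball bat" AND park) OR soccer',
    min_confidence=0.8,
    speakers=["bob", "joe"],
)
print(url)
```

If each indexed document also stored the start timecode of its segment, the search results would carry the position to seek to in the player.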

For an open source implementation of speech archival and indexing, you can check Matterhorn.

You can find details on Matterhorn speech indexing in this presentation.

However, this is not the only way to implement such functionality; you can proceed with your language of choice and simple tools. Ruby/PHP or Node.js will work here.
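
As a rough illustration of the simple-tools route, here is a minimal sketch in Python (my choice for the example, not one of the languages named above). It parses NOT/AND/OR queries with a small recursive-descent parser and evaluates them against one transcript. The transcript is assumed to be a list of dicts with hypothetical keys `word` and `start`; every function name here is made up for the example.

```python
import re

TOKEN_RE = re.compile(r'"([^"]+)"|(\(|\))|([^\s()"]+)')


def tokenize(query):
    tokens = []
    for phrase, paren, word in TOKEN_RE.findall(query):
        if phrase:
            tokens.append(("TERM", phrase.lower()))
        elif paren:
            tokens.append((paren, paren))
        elif word.upper() in ("AND", "OR", "NOT"):
            tokens.append((word.upper(), word.upper()))
        else:
            tokens.append(("TERM", word.lower()))
    return tokens


def parse(query):
    """Parse into nested tuples; precedence is NOT, then AND, then OR."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        if peek() == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if peek() == "TERM":
            return ("TERM", advance()[1])
        raise ValueError("expected a search term")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % (tokens[pos][1],))
    return tree


def phrase_times(words, phrase):
    """Start times where the word or exact phrase occurs."""
    terms = phrase.split()
    if not terms:
        return []
    texts = [w["word"].lower() for w in words]
    return [words[i]["start"] for i in range(len(texts) - len(terms) + 1)
            if texts[i:i + len(terms)] == terms]


def matches(tree, words):
    op = tree[0]
    if op == "TERM":
        return bool(phrase_times(words, tree[1]))
    if op == "NOT":
        return not matches(tree[1], words)
    if op == "AND":
        return matches(tree[1], words) and matches(tree[2], words)
    return matches(tree[1], words) or matches(tree[2], words)


def positive_terms(tree):
    """Terms that are not under a NOT; these supply the timecodes."""
    if tree[0] == "TERM":
        return [tree[1]]
    if tree[0] == "NOT":
        return []
    return positive_terms(tree[1]) + positive_terms(tree[2])


def search(query, words):
    """Return sorted timecodes to jump to, or [] if the transcript fails the query."""
    tree = parse(query)
    if not matches(tree, words):
        return []
    times = set()
    for term in positive_terms(tree):
        times.update(phrase_times(words, term))
    return sorted(times)
```

Calling `search('("baseball bat" AND park) OR soccer', words)` on each transcript gives the timecodes to offer as clickable results. This scans every word on every query, which is probably fine for a few hundred transcripts; a larger archive would likely want an inverted index or an engine such as Solr.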

