
[marf-dev] [Help] RE: Pronounciation detection

  • SourceForge.net
    Message 1 of 4 , Jul 28, 2010
      The following forum message was posted by shantrip at http://sourceforge.net/projects/marf/forums/forum/213052/topic/3782565:

      Hello Serguei

      Thanks for your reply and for providing a definite direction to proceed.
      I will have to study the paper you provided and read some more before I can come up with specific project-related questions. I still had a few doubts, so I decided to ask.

      >>[quote]you just have to change categories from speakers to words [/quote]
      Here, do you mean a change in just "speakers.txt", or is some change in the code required to change the category?

      Also, in SpeakerIdentApp, matches were found based on the sound features of the speaker and were independent of what was being said and how. Whereas in the pronunciation system, if we substitute "words" in place of "speakers" in the dictionary, wouldn't it become specific to the speaker of the words?
      Yes, I agree that some experimentation is required to find the right combination of algorithms, though for comparison I think distance-based algorithms would be better for the required accuracy. The problem here is that if we are too accurate, it becomes speaker or word identification software, and if we are too fuzzy, it would accept not just the right pronunciation but anything remotely near it. I guess some optimization would be required.
      Once again, thanks for providing a lead. Do reply...
      bye and tc


      ------------------------------------------------------------------------------
      The Palm PDK Hot Apps Program offers developers who use the
      Plug-In Development Kit to bring their C/C++ apps to Palm for a share
      of $1 Million in cash or HP Products. Visit us here for more details:
      http://p.sf.net/sfu/dev2dev-palm
      _______________________________________________
      marf-devel mailing list
      marf-devel@...
      https://lists.sourceforge.net/lists/listinfo/marf-devel
    • SourceForge.net
      Message 2 of 4 , Aug 1, 2010
        The following forum message was posted by mokhov at http://sourceforge.net/projects/marf/forums/forum/213052/topic/3782565:

        [quote][quote]you just have to change categories from speakers to words[/quote]
        Here do you mean change in just the "speakers.txt" or some change in code is required for changing the category??
        [/quote]

        No code changes are required, just speakers.txt. You list one word per line, giving it a numeric ID and all the recorded files associated with it. The format is described in the manual, but it is essentially a CSV file, and the filenames for training and testing are separated by the vertical bar "|".
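
        To make the file layout above concrete, here is a minimal Java sketch that reads one such line. It is not MARF code: the class CategoryEntry and the exact column order (ID, label, training files, testing files) are assumptions based on the description above, so check the manual for the authoritative format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of MARF. Assumed line layout:
//   id,label,train1.wav|train2.wav,test1.wav|test2.wav
final class CategoryEntry {
    final int id;
    final String label;
    final List<String> trainingFiles;
    final List<String> testingFiles;

    CategoryEntry(int id, String label, List<String> trainingFiles, List<String> testingFiles) {
        this.id = id;
        this.label = label;
        this.trainingFiles = trainingFiles;
        this.testingFiles = testingFiles;
    }

    // Parses a single comma-separated line into an entry.
    static CategoryEntry parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 comma-separated fields: " + line);
        }
        return new CategoryEntry(
            Integer.parseInt(fields[0].trim()),
            fields[1].trim(),
            splitFiles(fields[2]),
            splitFiles(fields[3]));
    }

    // Splits a "a.wav|b.wav" field on the vertical bar, skipping empty names.
    private static List<String> splitFiles(String field) {
        List<String> files = new ArrayList<>();
        for (String name : field.split("\\|")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                files.add(trimmed);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        CategoryEntry e = parse("1,Hello,hello-a.wav|hello-b.wav,hello-test.wav");
        System.out.println(e.id + " " + e.label
            + " train=" + e.trainingFiles + " test=" + e.testingFiles);
    }
}
```

        The point is only that a word entry has the same shape as a speaker entry; nothing in the layout itself is speaker-specific.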

        Now, if you also want the output to say "Word" instead of "Speaker", etc., or if the filename "speakers.txt" is too confusing, you'd need to change the code to fix the output messages, but this is a cosmetic change.

        [quote]Also,in SpeakerIdentApp matches were found based on sound features of speaker and was independent of what was being said and how.Whereas in the pronunciation system,if we substitute "words" in place of "speakers" in the dictionary, wouldn't it become specific for the speaker of the words ??[/quote]

        No, the features extracted would be per the categories that you pick. They are clustered (grouped) as such, so the training sets (your problem models) would be different from those for speakers. Basically, that means you cannot reuse my .gzbin files, which correspond to the speaker clusters; you will have to train your own based on your own dictionary, independent of the speaker. The best algorithm combination(s) you find (those that give you the highest accuracy) will likely also be different from those for speaker identification.

        [quote]Yes ,I agree some experimentation is required to find right combination of algorithms ,though for comparison I think distance based algos would be better for the required accuracy .Here the problem is if we are too accurate it becomes a speaker or word identification software and if we are too fuzzy it wouldn't just check for the right pronunciation but any thing remotely near it ,guess some optimization would be required. [/quote]

        In general, in my recent work I applied MARF's pipeline not only to audio, but also to forensic file type analysis, natural language tasks (text analysis), writer recognition (handwriting attribution of scanned documents in images), and others. There are corresponding publications for all of those. I guess it'd make sense to finally update the web page and list them there... Anyway, the approach is the same for all, but the selection of algorithms differs depending on the task at hand, and so does the dictionary of categories, etc. I was sometimes caught by surprise as to which combinations were the best for a task, so the experiments are definitely required.

        In your particular case, my feeling (supported by the gender classification task in that article) is that you'd probably get better accuracy if you group word pronunciations by gender/age group at the low level of the .txt file and then match them up to one word category later in the application. This way, I hypothesize, it'd be more robust and gender/age group independent. Specifically, in the .txt file you would have, e.g.:

        ...
        1,Hello_adult_female,...
        2,Hello_adult_male,...
        3,Hello_child,...
        ...

        and then in the application you match up IDs 1, 2, and 3 all to "Hello". It may turn out that this step is not necessary, but I can't tell for sure offhand. It requires experiments both with and without such separation.
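
        The match-up step could be sketched roughly as follows. This is an illustration only, not existing MARF code: WordResolver is a hypothetical class, and it assumes that the labels follow the Word_group naming used in the example above and that the application already has the numeric ID of the best-matching category in hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical post-processing step, NOT part of MARF: collapses
// per-group category IDs (e.g. "Hello_adult_female") into one word.
final class WordResolver {
    private final Map<Integer, String> wordById = new HashMap<>();

    // Registers a category; the word is assumed to be everything
    // before the first underscore of the label.
    void register(int id, String subcategoryLabel) {
        int cut = subcategoryLabel.indexOf('_');
        String word = cut < 0 ? subcategoryLabel : subcategoryLabel.substring(0, cut);
        wordById.put(id, word);
    }

    // Maps the ID reported by the classifier back to its word.
    String resolve(int identifiedId) {
        String word = wordById.get(identifiedId);
        if (word == null) {
            throw new IllegalArgumentException("Unknown category ID: " + identifiedId);
        }
        return word;
    }

    public static void main(String[] args) {
        WordResolver resolver = new WordResolver();
        resolver.register(1, "Hello_adult_female");
        resolver.register(2, "Hello_adult_male");
        resolver.register(3, "Hello_child");
        // Whichever of the three IDs is identified, the word is the same.
        System.out.println(resolver.resolve(2));
    }
}
```

        Running the experiments with and without this grouping then only means swapping the .txt file and skipping the resolve step.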

        It may also turn out that a new algorithm needs to be implemented to do a better job than the existing ones.

        If I ever get time, I'll make something like that up myself as a demo app for MARF, but don't hold your breath.

        On the other hand, if you require any adjustments to MARF to make your life easier, let me know. If you come up with something new and would like to contribute it to the project, you are most welcome to. :)

        Let me know how it goes.

        -s

