The README needs a hello_world example #275

Open
petercwallis opened this issue Aug 8, 2022 · 17 comments
Comments

@petercwallis

The instructions at:
https://cmusphinx.github.io/wiki/tutorialpocketsphinx
are now past their use-by date. The README.md is fine on Linux, but for those of us who know what a lib file is, could we have a hello_ps.c, please? Using C again reminds me why we all switched to Java back in the dark ages...

@dhdaines
Contributor

dhdaines commented Aug 9, 2022

Hi! Thanks for pointing this out, as pocketsphinx_continuous.exe is, quite clearly, gone. And it was never useful for building applications in the first place.

I will fix this documentation as soon as possible, for now it will just be removed to avoid further confusion :)

@dhdaines
Contributor

dhdaines commented Aug 9, 2022

In any case the preferred way to use the library will be through Python. Java is just as bad as C in my opinion ;-)

@petercwallis
Author

petercwallis commented Aug 9, 2022 via email

@dhdaines
Contributor

dhdaines commented Aug 9, 2022

Ah, good to know. The C API certainly won't go away... my plan is to integrate the WebRTC VAD code since it's the standard and its licence is compatible. The problem is that pocketsphinx_continuous existed as example code which never really worked well, but worked enough that people tried to build things with it, and then instead of doing live ASR correctly, it was decided to just keep hacking on the existing toy code.

Coqui has a lot of good examples of, in my opinion, the right way to do streaming ASR: https://github.com/coqui-ai/STT-examples.

For Java is it preferable to use SWIG or just JNI directly? I removed the SWIG code because with SWIG it was too difficult to make a good Python API, and other languages like Ruby weren't actually using it. Originally the SWIG wrapper was just there to support Java on Android. I certainly won't support anything Java as I'm already spending too much of my time on PocketSphinx which I consider to be obsolete in general...

Another long-standing problem is that the API isn't really designed correctly for callbacks. This is one of the reasons why I removed the audio code, as it was based around the thoroughly obsolete assumption that one gets audio by opening /dev/audio and doing blocking read() calls on it.
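
To make the callback point concrete, here is a minimal sketch using only the Python standard library. It shows the shape most current audio APIs impose: the audio layer calls you with a buffer, and the recognizer consumes buffers from a queue on its own thread instead of doing blocking read() calls on a device. The names `on_audio` and `consume` and the fake buffers are hypothetical and not part of PocketSphinx; a real callback would be registered with whatever audio library you use.

```python
import queue
import threading

audio_q = queue.Queue()  # hands buffers from the audio thread to the decoder thread

def on_audio(buf):
    """Hypothetical callback: the audio library calls this with raw bytes."""
    audio_q.put(buf)  # return quickly, never decode inside the callback

def consume(handle, results):
    """Pull buffers until a None sentinel arrives, passing each to `handle`."""
    while True:
        buf = audio_q.get()
        if buf is None:
            break
        results.append(handle(buf))

# Stand-in for feeding a decoder: just record the size of each buffer.
sizes = []
worker = threading.Thread(target=consume, args=(len, sizes))
worker.start()
for fake in (b"\x00" * 320, b"\x00" * 320, b"\x00" * 160):
    on_audio(fake)      # simulated callbacks from the audio layer
on_audio(None)          # sentinel: end of stream
worker.join()
print(sizes)
```

In a real program `handle` would pass each buffer to the decoder; the queue is what decouples the audio thread's timing from the recognizer's.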

@petercwallis
Author

petercwallis commented Aug 9, 2022 via email

@dhdaines
Contributor

dhdaines commented Aug 9, 2022

Actually, now that I think of it, the preferred option for the microphone on Unix and possibly also Windows is just to popen() sox, as it is nearly always there, usually works, and can do various other things too.
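
A rough sketch of the popen() idea in Python, assuming sox is installed and on the PATH. The flags request mono 16-bit signed PCM written to stdout; the 16000 Hz default rate and the helper names `sox_command`, `read_chunks` and `open_microphone` are assumptions for illustration, so check the flags against your own sox build. The read loop is exercised on an in-memory stream so it runs without a microphone.

```python
import io
import subprocess

def sox_command(rate=16000):
    """Argument list asking sox for mono 16-bit signed PCM on stdout."""
    return ["sox", "-q", "-d",            # quiet, read from default audio device
            "-r", str(rate), "-c", "1",   # sample rate, one channel
            "-b", "16", "-e", "signed-integer",
            "-t", "raw", "-"]             # headerless output to stdout

def read_chunks(stream, chunk_bytes):
    """Yield fixed-size chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk

def open_microphone(rate=16000):
    """Start sox; the caller reads proc.stdout and terminates proc when done."""
    return subprocess.Popen(sox_command(rate), stdout=subprocess.PIPE)

# Exercise the read loop without a microphone, using an in-memory stream.
demo = list(read_chunks(io.BytesIO(b"\x00" * 1000), 320))
print([len(c) for c in demo])
```

With a real microphone you would iterate `read_chunks(open_microphone().stdout, n)` and hand each chunk to the decoder.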

@jsalsman
Contributor

jsalsman commented Oct 11, 2022 via email

@smbika007

Vis a vis instructions, an example on how to run pocketsphinx.exe in "live" mode (presumably a microphone though I have no idea why the word microphone doesn't seem to appear anywhere in the code or documentation) would be useful including the command line parameters necessary to specify the lm and hmm...

The huge number of command-line switches is rather daunting too. The bare minimum (language model and ancillary files) would be helpful.

Thanks

@dhdaines
Contributor

Most of the command line switches are not useful to you, and I think this is mentioned in the documentation, but I will mention it quite a lot louder :-)

Microphone input is not an easy thing, and a lot of trouble came from giving people the impression that it was. The Python module makes everything quite simple in any case:

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech():
    print(phrase)
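
On the "bare minimum" question, a small sketch of which switches typically matter: the acoustic model (`-hmm`), the language model (`-lm`) and the pronunciation dictionary (`-dict`). The helper name and the paths below are hypothetical, and an installation with default models may not need any of them, so treat this as an illustration, not the documented invocation.

```python
def minimal_model_args(hmm, lm, dictionary):
    """Return the three model-related switches as a flat argument list."""
    return ["-hmm", hmm,          # acoustic model directory
            "-lm", lm,            # language model file
            "-dict", dictionary]  # pronunciation dictionary

# Hypothetical paths; substitute wherever your model files live.
args = minimal_model_args("model/en-us", "model/en-us.lm.bin",
                          "model/cmudict-en-us.dict")
print(" ".join(args))
```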

@dhdaines
Contributor

And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows?

Please let me know if this is actually a useful thing. I suspect it isn't.

@smbika007

smbika007 commented Oct 20, 2022

> And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows?
>
> Please let me know if this is actually a useful thing. I suspect it isn't.

Well, you already know my opinion :-) although I might be the only one on the planet who does...LOL

Cheers!

@dhdaines
Contributor

Actually you're not the only one! But what you need, if I'm not mistaken, is what pocketsphinx_continuous was originally intended to be: example code which you can incorporate into your application.

It seems that sox doesn't do microphone input on Windows, either. PortAudio is a pretty good solution, and actually quite simple to implement. You can either include portaudio_static.lib and portaudio.h in your project directly, or you can "install" it somewhere, then set the CMAKE_PREFIX_PATH environment variable for your CMake build to point to that location. I'll publish some instructions on https://cmusphinx.github.io/ shortly. Perhaps we can add it as a git submodule (it isn't very big).

The original ad_win32.c code can also be used - it uses the oldest and most awful of the many awful (they are all awful) Windows audio APIs but doesn't require any external dependencies. I'll put together an example of it as well.

@dhdaines
Contributor

The example using PortAudio can be seen here: https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_portaudio.c

@smbika007

smbika007 commented Oct 20, 2022

> Actually you're not the only one! But what you need, if I'm not mistaken, is what pocketsphinx_continuous was originally intended to be: example code which you can incorporate into your application.
>
> It seems that sox doesn't do microphone input on Windows, either. PortAudio is a pretty good solution, and actually quite simple to implement. You can either include portaudio_static.lib and portaudio.h into your project directly, or you can "install" it somewhere, then set the CMAKE_PREFIX_PATH environment variable for your CMake build to point to that location. I'll publish some instructions on https://cmusphinx.github.io/ shortly. Perhaps we can add it as a git submodule (it isn't very big)
>
> The original ad_win32.c code can also be used - it uses the oldest and most awful of the many awful (they are all awful) Windows audio APIs but doesn't require any external dependencies. I'll put together an example of it as well.

Thanks again! I will check out portaudio and see if I can use that instead. The phrase "quite simple to implement" is a very nice thing to see 👍 ! And I look forward to the example for ad_win32.c although it's probably what I use now. And, yes, pocketsphinx_continuous is where I got the guts of my code for our app...

@dhdaines
Contributor

The ad_win32.c code actually has a number of problems and can be simplified for the new live speech API... particularly if it doesn't have to fit into an existing framework. This is one of the reasons I removed the libsphinxad library: PortAudio or OpenAL do a better job of being a cross-platform library, and if you are targeting a particular platform it's probably better to go straight to the platform's API.

@petercwallis
Author

petercwallis commented Oct 21, 2022 via email

@dhdaines
Contributor

There are now examples for portaudio, pulseaudio, and Win32 wave input, see #319

I will however leave this issue open as we can always use more examples!
