
transcribe_streaming speech example does not respond after 10-30secs #530


Closed
sngc1 opened this issue Sep 16, 2016 · 14 comments

@sngc1

sngc1 commented Sep 16, 2016

After a few successful streaming recognitions, transcribe_streaming.py stops responding without any message; in many cases it hangs and I have to kill Python. It is not quite clear why, but it seems to occur after 1-10 recognitions or 10-30 seconds.

Is there any way to investigate what is going on, e.g. by showing logs/messages? It is not clear whether the cause is this sample code or the GCP Speech service.

@puneith puneith added the ML label Sep 16, 2016
@bprasana

I am facing a similar issue too. I tried changing DEADLINE_SECS, hoping to get better results, but it just freezes.

@puneith
Contributor

puneith commented Sep 17, 2016

@sngc1 A similar issue has been reported in #515, whose PR is in flight. In the meantime, can you check whether #527 works for you?

@jerjou
Contributor

jerjou commented Sep 19, 2016

I just pushed a not-insignificant update to the streaming sample. Would you mind trying that and seeing if it helps? (It's not necessarily related, but it might be.)

@sngc1
Author

sngc1 commented Sep 20, 2016

@jerjou As a quick answer, commit 5fca324 still hangs after a few recognitions (or a few tens of seconds?) and Python needs to be killed (on macOS + built-in microphone, Japanese recognition).

@puneith Thanks, the analysis in #515 makes sense; I will investigate different RATE/CHUNK settings.
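
A minimal sketch of the kind of RATE/CHUNK experiment I mean, assuming PyAudio with 16-bit mono input (the values here are just examples, not the sample's defaults):

import pyaudio

RATE = 16000   # sample rate in Hz -- value to experiment with
CHUNK = 1024   # frames per buffer read -- value to experiment with

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,  # 16-bit linear PCM, as the Speech API expects
    channels=1,              # mono
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK,
)

# Read a few chunks to confirm the microphone delivers audio at this rate.
for _ in range(10):
    data = stream.read(CHUNK)
    print(len(data))  # should be CHUNK * 2 bytes for 16-bit samples

stream.stop_stream()
stream.close()
pa.terminate()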

BTW, I am now working on Japanese recognition (language_code='ja-JP'), which seems too slow to return a result with is_final: true compared to language_code='en-US' (which is much faster than the Japanese recognition and seems to work fine). Maybe this is another issue, but is it relevant?

@jerjou
Contributor

jerjou commented Sep 20, 2016

Re: japanese vs english

Hm... I took a clean repo, modified the language to be ja-JP, and ran it, and it seemed (subjectively) to perform comparably to English recognition, although I admit I only know a handful of short Japanese phrases, so perhaps that skews my test. After setting interim_results to True, I was also able to see interim transcriptions at about the same rate I'm used to seeing them in English.

Hm.. some things for you to try:

  • Try using an external microphone - even the microphone on a webcam will usually capture better audio than the built-in microphone
  • Have you taken a look at the best practices page? See if any of that is relevant to your setup.

Re: hanging after a few recognitions

  • How much time has elapsed since you started your request? The API currently limits streaming recognition to 1 minute, so it will stop transcribing after a minute (see #517: grpc streaming example crashes or hangs due to deadline seconds).
  • Another possibility: I see you've modified the sample code to add interim_results=True. Did you also set single_utterance=True? In that case it will stop transcribing when it detects the end of an utterance. (See the sketch just below this list.)
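
A minimal sketch of those two flags, assuming the newer google.cloud.speech types (the streaming sample builds these messages via the raw gRPC stubs, so this is illustrative, not the sample's exact code, and the field values are examples):

from google.cloud.speech import enums, types

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='ja-JP',
)
streaming_config = types.StreamingRecognitionConfig(
    config=config,
    interim_results=True,    # stream partial hypotheses as they arrive
    single_utterance=False,  # keep transcribing past the first detected utterance
)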

Miscellany

I added SIGINT handling to the sample code, so you should be able to use Ctrl-C to stop the script instead of having to kill it (though I've been having a bit of trouble handling the case where an unexpected exception is thrown).
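
Roughly the shape of it, as a minimal sketch rather than the sample's exact code (shutdown_requested is just an illustrative name):

import signal
import threading

shutdown_requested = threading.Event()

def _request_shutdown(signum, frame):
    # The audio generator can check this flag and stop yielding chunks,
    # which lets the gRPC stream wind down instead of leaving the process hung.
    shutdown_requested.set()

signal.signal(signal.SIGINT, _request_shutdown)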

@sngc1
Author

sngc1 commented Sep 22, 2016

With an external (USB) microphone the sample works better, though it still stops responding sometimes.
I followed the best practices; checked that my microphone works fine with the PyAudio settings in transcribe_streaming.py (tested with a simple PyAudio script, without the GCP Speech API); and confirmed that single_utterance=True is not set.

One thing I have found: with language_code='en-US' it works fine, while with 'ja-JP' it sometimes stops recognizing (a response is never returned) after a couple of recognitions. When this happens is not clear.
DEADLINE_SECS does not affect this issue; the transcription always finishes after 60 seconds with a 'Client GRPC deadline too short' message, which I believe is the proper behavior. The issue above happens before this message.
Possibly a GCP-side issue with ja-JP?

BTW, is audio_stream thread-safe? Many times I still cannot shut down the transcription: I suspect the audio_stream object shared by fill_buffer_thread and the main thread prevents shutdown when audio_stream.stop_stream() is called while buff.put(audio_stream.read(chunk)) is still in progress.
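
The kind of guard I have in mind, as a rough sketch (fill_buffer, buff, and chunk are from the sample; the stop_requested flag is my own addition):

import threading

stop_requested = threading.Event()

def fill_buffer(audio_stream, buff, chunk):
    # Check the flag before every blocking read, so the main thread never
    # calls stop_stream() while another read is about to start.
    while not stop_requested.is_set():
        buff.put(audio_stream.read(chunk))
    buff.put(None)  # sentinel so the consumer knows the stream has ended

# In the main thread, on shutdown:
#   stop_requested.set()
#   fill_buffer_thread.join()     # let the in-flight read finish
#   audio_stream.stop_stream()
#   audio_stream.close()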

@thecloudist

I have working code now, using the transcribe_streaming_thread.py version that was shared.
Should I fetch and sync with the latest repo updates to python-docs-samples?
That Python file does not exist there, and I have also made some changes to listen_print_loop; I now call mine listen_transcribe_loop.

I have not tried the out-of-the-box version of transcribe_streaming.py again to see whether I had cleared up something else after you provided the threaded version.

By the way, it was a little tricky to convert the transcription strings into dictionaries that I pass to my DriveTo method, which moves the robo-car according to voice commands. But it does work.

The main puzzle to solve now is how to expand my overall control code without having to put everything inside the listen_transcribe_loop.

How do I call the transcribe loop without getting stuck there?

Suggestions?

@puneith
Contributor

puneith commented Sep 22, 2016

@thecloudist I would recommend you get the updated sample code, since it has the final fixes.

Re: "How do I call the transcribe loop without getting stuck there?"
If I understand correctly what you mean, you need to either put the transcribed responses in a queue or create a stream that the output is written to and read from.
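
Something along these lines, as a rough sketch (listen_transcribe_loop here stands in for your modified loop as a hypothetical generator of final transcripts, and drive_to is a hypothetical stand-in for your DriveTo handler):

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

transcripts = queue.Queue()

def transcription_worker():
    # The listen/transcribe loop only pushes final transcripts onto the
    # queue instead of acting on them directly.
    for transcript in listen_transcribe_loop():  # hypothetical generator
        transcripts.put(transcript)

worker = threading.Thread(target=transcription_worker)
worker.daemon = True
worker.start()

# The main control loop stays free to do other work, and polls the queue
# whenever it is ready for the next voice command.
while True:
    try:
        command = transcripts.get(timeout=0.1)
    except queue.Empty:
        continue
    drive_to(command)  # hypothetical stand-in for DriveTo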

@thecloudist

The queue idea is good, but the way transcribe_print_loop() was written, I cannot exit from that method until a terminal condition such as 'quit' is reached.

I am going to have to break that method apart so I can ask for a transcription when my main code loop is ready, rather than having it loop all the time. The app is essentially an asynchronous command-and-control app for a robot car.

Thanks!

@sngc1
Author

sngc1 commented Oct 7, 2016

Regarding the recognition hanging issue: after dozens of trials, language_code='ja-JP' recognition is still unstable; on average the recognition hangs after 3 utterances. I confirmed it happens BEFORE the 60-second deadline.

Having run out of time to investigate, my workaround is to use single_utterance=True and to continuously terminate and restart recognition for each utterance.
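
Roughly what the workaround looks like, as a sketch only (recognize_one_utterance is a hypothetical wrapper around one streaming request built with single_utterance=True; it is not the sample's code):

def recognize_one_utterance():
    # Hypothetical wrapper: open one streaming request with
    # single_utterance=True and return the final transcript, or None if
    # the stream ends without a result.
    raise NotImplementedError

def recognition_loop():
    # Tear the stream down after every utterance, so a hang in one
    # request cannot block the next one.
    while True:
        transcript = recognize_one_utterance()
        if transcript:
            print(transcript)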

@jerjou
Contributor

jerjou commented Oct 10, 2016

Hello!

Sorry for the long delay. The backend folks have confirmed that this is probably an issue on their end that they're actively working on. If you wouldn't mind providing a link to a sample of audio that stops transcribing after a while, it would help them fix the issue.

Thanks!

@theacodes
Contributor

(Closing due to inactivity, feel free to comment and we'll re-open)

@khaledAX

khaledAX commented Sep 25, 2018

I'm facing this issue severely today. The code isn't new; it has been running every day for a couple of months. But today it's failing, not for all files but for some random ones, even smaller files.
A sample audio file is attached, zipped.
speech-recognition-freeze.zip

Please re-open this issue and check from your side.
Pinging @theacodes and @jerjou.

Application version

$ pip freeze | grep google
gapic-google-cloud-speech-v1==0.15.3
google-api-python-client==1.6.4
google-auth==1.1.1
google-auth-httplib2==0.0.2
google-cloud-core==0.27.1
google-cloud-speech==0.29.0
google-gax==0.15.15
googleapis-common-protos==1.5.2
proto-google-cloud-speech-v1==0.15.3

Executing code segment

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
                                     sample_rate_hertz=8000,
                                     language_code='en-US',
                                     )

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=300)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    transcript = ''

    for result in response.results:
        if result.alternatives and result.alternatives[0] and result.alternatives[0].transcript:
            # The first alternative is the most likely one for this portion.
            transcript = transcript + result.alternatives[0].transcript
            # print('Confidence: {}'.format(result.alternatives[0].confidence))
            # print(u'Transcript: {}'.format(result.alternatives[0].transcript))
    return transcript

Stacktrace after CTRL+C:

  "selfLink": "https://www.googleapis.com/storage/v1/b/xyz_4819894648180946073.mp4.flac",
  "size": "10130"
}
gs://4bfee283169b-ax-aud-transcription/temp/1537464300950-1537464296043_4819894648180946073.mp4.flac
Transcribing file ...
Waiting for operation to complete...
\

Traceback (most recent call last):
  File "./app.py", line 387, in <module>
    schedule.run_pending()
  File "/usr/local/lib/python2.7/dist-packages/schedule/__init__.py", line 493, in run_pending
    default_scheduler.run_pending()
  File "/usr/local/lib/python2.7/dist-packages/schedule/__init__.py", line 78, in run_pending
    self._run_job(job)
  File "/usr/local/lib/python2.7/dist-packages/schedule/__init__.py", line 131, in _run_job
    ret = job.run()
  File "/usr/local/lib/python2.7/dist-packages/schedule/__init__.py", line 411, in run
    ret = self.job_func()
  File "./app.py", line 267, in start_service
    trans_txt = upload_and_transcribe(bucket=GCP_EOD_BUCKET, filename=flac_file_name)
  File "/home/ubuntu/webservice/google_transcribe.py", line 169, in upload_and_transcribe
    transcribed = transcribe_gcs(filepath)
  File "/home/ubuntu/webservice/google_transcribe.py", line 87, in transcribe_gcs
    response = operation.result(timeout=300)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/__init__.py", line 595, in result
    if not self._poll(timeout).HasField('response'):
  File "/usr/local/lib/python2.7/dist-packages/google/gax/__init__.py", line 705, in _poll
    return retryable_done_check()
  File "/usr/local/lib/python2.7/dist-packages/google/gax/retry.py", line 121, in inner
    return to_call(*args)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/retry.py", line 68, in inner
    return a_func(*updated_args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/__init__.py", line 669, in _done_check
    if not self.done():
  File "/usr/local/lib/python2.7/dist-packages/google/gax/__init__.py", line 620, in done
    return self._get_operation().done
  File "/usr/local/lib/python2.7/dist-packages/google/gax/__init__.py", line 662, in _get_operation
    self._operation.name, self._call_options)
  File "/usr/local/lib/python2.7/dist-packages/google/gapic/longrunning/operations_client.py", line 213, in get_operation
    return self._get_operation(request, options)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/api_callable.py", line 452, in inner
    return api_caller(api_call, this_settings, request)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/api_callable.py", line 438, in base_caller
    return api_call(*args)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/api_callable.py", line 376, in inner
    return a_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/retry.py", line 121, in inner
    return to_call(*args)
  File "/usr/local/lib/python2.7/dist-packages/google/gax/retry.py", line 68, in inner
    return a_func(*updated_args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 506, in __call__
    credentials)
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 500, in _blocking
    _handle_event(completion_queue.poll(), state,
  File "src/python/grpcio/grpc/_cython/_cygrpc/completion_queue.pyx.pxi", line 121, in grpc._cython.cygrpc.CompletionQueue.poll (src/python/grpcio/grpc/_cython/cygrpc.c:10611)
KeyboardInterrupt
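
A variant I might try so that backend slowness shows up in my logs rather than looking like a hang (wait_with_logging is my own helper; operation.done() and operation.result() are the same gax future methods that appear in the stacktrace above):

import time

def wait_with_logging(operation, max_wait=300, poll_interval=10):
    # Poll the long-running operation explicitly, so a slow backend produces
    # log lines instead of a silent block inside operation.result().
    waited = 0
    while not operation.done():
        if waited >= max_wait:
            raise RuntimeError('Transcription still not done after %ss' % max_wait)
        print('Still waiting... %ss elapsed' % waited)
        time.sleep(poll_interval)
        waited += poll_interval
    return operation.result(timeout=poll_interval)

# Usage, replacing the blocking call in transcribe_gcs:
#   response = wait_with_logging(operation)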

@khaledAX

khaledAX commented Sep 25, 2018

For me, it has now completed once.
It looks like no specific file was responsible; rather, the service output was unusually slow and unpredictable.

(screenshot: gcp-speech-latency)
