Commit 494b689

Add speech streaming recognition.
1 parent 3de5bc0 commit 494b689

6 files changed: +478 additions, −18 deletions

docs/speech-usage.rst

Lines changed: 79 additions & 0 deletions

@@ -151,5 +151,84 @@ words to the vocabulary of the recognizer.
         transcript: Hello, this is a test
         confidence: 0.81
+
+Streaming Recognition
+---------------------
+
+The :meth:`~google.cloud.speech.Client.streaming_recognize` method converts
+speech data to possible text alternatives on the fly.
+
+.. note::
+
+    Streaming recognition requests are limited to 1 minute of audio.
+
+    See: https://cloud.google.com/speech/limits#content
+
+.. code-block:: python
+
+    >>> from google.cloud import speech
+    >>> client = speech.Client()
+    >>> with open('./hello.wav', 'rb') as stream:
+    ...     sample = client.sample(content=stream,
+    ...                            encoding=speech.Encoding.LINEAR16,
+    ...                            sample_rate=16000)
+    ...     responses = list(client.streaming_recognize(sample))
+    ...     print(responses[0].transcript)
+    ...     print(responses[0].confidence)
+    hello
+    0.973458576
+
+By default the recognizer performs continuous recognition (continuing to
+process audio even if the user pauses speaking) until the client closes the
+output stream or the maximum time limit has been reached.
+
+If you only want to recognize a single utterance, set ``single_utterance``
+to ``True`` and only one result will be returned.
+
+See: `Single Utterance`_
+
+.. code-block:: python
+
+    >>> with open('./hello_pause_goodbye.wav', 'rb') as stream:
+    ...     sample = client.sample(content=stream,
+    ...                            encoding=speech.Encoding.LINEAR16,
+    ...                            sample_rate=16000)
+    ...     responses = client.streaming_recognize(sample,
+    ...                                            single_utterance=True)
+    ...     results = list(responses)
+    ...     print(results[0].transcript)
+    ...     print(results[0].confidence)
+    hello
+    0.96523453546
+
+If ``interim_results`` is set to ``True``, interim results (tentative
+hypotheses) may be returned as they become available.
+
+.. code-block:: python
+
+    >>> from google.cloud import speech
+    >>> client = speech.Client()
+    >>> with open('./hello.wav', 'rb') as stream:
+    ...     sample = client.sample(content=stream,
+    ...                            encoding=speech.Encoding.LINEAR16,
+    ...                            sample_rate=16000)
+    ...     for response in client.streaming_recognize(sample,
+    ...                                                interim_results=True):
+    ...         print('=' * 20)
+    ...         print(response[0].transcript)
+    ...         print(response[0].confidence)
+    ====================
+    he
+    None
+    ====================
+    hell
+    None
+    ====================
+    hello
+    0.973458576
+
+.. _Single Utterance: https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1beta1#streamingrecognitionconfig
 .. _sync_recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/syncrecognize
 .. _Speech Asynchronous Recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/asyncrecognize
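The interim-results loop above can be sketched without the live API by faking the response stream. The ``Hypothesis`` tuple, ``fake_stream``, and ``consume`` below are illustrative stand-ins, not part of the library:

```python
from collections import namedtuple

# Illustrative stand-in for one streaming hypothesis (not a real library type).
Hypothesis = namedtuple('Hypothesis', ['transcript', 'confidence', 'is_final'])


def fake_stream():
    """Simulate interim hypotheses arriving before the final result."""
    yield Hypothesis('he', None, False)
    yield Hypothesis('hell', None, False)
    yield Hypothesis('hello', 0.973458576, True)


def consume(stream):
    """Collect interim transcripts; return them with the final result."""
    interim = []
    for result in stream:
        if result.is_final:
            return interim, (result.transcript, result.confidence)
        interim.append(result.transcript)
    return interim, None


interim, final = consume(fake_stream())
print(interim)  # ['he', 'hell']
print(final)    # ('hello', 0.973458576)
```

The same consumption pattern applies to the real generator: interim hypotheses carry ``confidence`` of ``None`` until the final result arrives.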

speech/google/cloud/speech/_gax.py

Lines changed: 89 additions & 0 deletions

@@ -101,6 +101,95 @@ def async_recognize(self, sample, language_code=None,
 
         return Operation.from_pb(response, self)
 
+    def streaming_recognize(self, sample, language_code=None,
+                            max_alternatives=None, profanity_filter=None,
+                            speech_context=None, single_utterance=False,
+                            interim_results=False):
+        """Streaming speech recognition.
+
+        .. note::
+
+            Streaming recognition requests are limited to 1 minute of audio.
+            See: https://cloud.google.com/speech/limits#content
+
+        Yields :class:`~streaming_response.StreamingSpeechResponse` instances
+        containing results and metadata from the streaming request.
+
+        :type sample: :class:`~google.cloud.speech.sample.Sample`
+        :param sample: Instance of ``Sample`` containing audio information.
+
+        :type language_code: str
+        :param language_code: (Optional) The language of the supplied audio
+                              as a BCP-47 language tag. Example: ``'en-GB'``.
+                              If omitted, defaults to ``'en-US'``.
+
+        :type max_alternatives: int
+        :param max_alternatives: (Optional) Maximum number of recognition
+                                 hypotheses to be returned. The server may
+                                 return fewer than ``max_alternatives``.
+                                 Valid values are 0-30. A value of 0 or 1
+                                 returns a maximum of 1. Defaults to 1.
+
+        :type profanity_filter: bool
+        :param profanity_filter: (Optional) If ``True``, the server attempts
+                                 to filter out profanities, replacing all but
+                                 the initial character in each filtered word
+                                 with asterisks, e.g. ``'f***'``. If ``False``
+                                 or omitted, profanities are not filtered out.
+
+        :type speech_context: list
+        :param speech_context: (Optional) A list of strings (max 50)
+                               containing word and phrase "hints" so that the
+                               speech recognition is more likely to recognize
+                               them. This can be used to improve the accuracy
+                               for specific words and phrases, and to add new
+                               words to the vocabulary of the recognizer.
+
+        :type single_utterance: bool
+        :param single_utterance: (Optional) If ``False`` or omitted, the
+                                 recognizer performs continuous recognition
+                                 (continuing to process audio even if the
+                                 user pauses speaking) until the client
+                                 closes the output stream (gRPC API) or the
+                                 maximum time limit has been reached.
+                                 Multiple ``SpeechRecognitionResult``
+                                 messages with the ``is_final`` flag set to
+                                 ``True`` may be returned. If ``True``, the
+                                 recognizer detects a single spoken
+                                 utterance. When it detects that the user
+                                 has paused or stopped speaking, it returns
+                                 an ``END_OF_UTTERANCE`` event and ceases
+                                 recognition. It returns no more than one
+                                 ``SpeechRecognitionResult`` with the
+                                 ``is_final`` flag set to ``True``.
+
+        :type interim_results: bool
+        :param interim_results: (Optional) If ``True``, interim results
+                                (tentative hypotheses) may be returned as
+                                they become available (these interim results
+                                are indicated with the ``is_final=False``
+                                flag). If ``False`` or omitted, only
+                                ``is_final=True`` result(s) are returned.
+
+        :raises: :class:`ValueError` if the sample's stream has been closed.
+
+        :rtype: :class:`~google.cloud.grpc.speech.v1beta1\
+                .cloud_speech_pb2.StreamingRecognizeResponse`
+        :returns: ``StreamingRecognizeResponse`` instances.
+        """
+        if sample.content.closed:
+            raise ValueError('Stream is closed.')
+
+        requests = _stream_requests(sample, language_code=language_code,
+                                    max_alternatives=max_alternatives,
+                                    profanity_filter=profanity_filter,
+                                    speech_context=speech_context,
+                                    single_utterance=single_utterance,
+                                    interim_results=interim_results)
+        api = self._gapic_api
+        responses = api.streaming_recognize(requests)
+        return responses
+
     def sync_recognize(self, sample, language_code=None, max_alternatives=None,
                        profanity_filter=None, speech_context=None):
         """Synchronous Speech Recognition.
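The `_stream_requests` helper used above follows the standard gRPC streaming-recognition pattern: the first request carries only the configuration, and every subsequent request carries a chunk of audio. A minimal sketch of that pattern, using plain dicts as stand-ins for the real `StreamingRecognizeRequest` protobuf messages (the names `stream_requests` and `CHUNK_SIZE` are illustrative, not from the library):

```python
import io

# Bytes per audio chunk; real code would use a much larger size.
CHUNK_SIZE = 4


def stream_requests(config, audio_stream):
    """Yield a config-only request first, then one request per audio chunk."""
    yield {'streaming_config': config}
    while True:
        chunk = audio_stream.read(CHUNK_SIZE)
        if not chunk:
            break
        yield {'audio_content': chunk}


requests = list(stream_requests(
    {'encoding': 'LINEAR16', 'sample_rate': 16000},
    io.BytesIO(b'12345678')))
print(len(requests))  # 3: one config request plus two audio chunks
```

Because the audio is read lazily from the stream, the generator can feed arbitrarily long input to the transport without buffering it all in memory, which is why the closed-stream check happens before the generator is handed to the gAPIC layer.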

speech/google/cloud/speech/client.py

Lines changed: 85 additions & 0 deletions

@@ -159,6 +159,91 @@ def speech_api(self):
             self._speech_api = _JSONSpeechAPI(self)
         return self._speech_api
 
+    def streaming_recognize(self, sample, language_code=None,
+                            max_alternatives=None, profanity_filter=None,
+                            speech_context=None, single_utterance=False,
+                            interim_results=False):
+        """Streaming speech recognition.
+
+        .. note::
+
+            Streaming recognition requests are limited to 1 minute of audio.
+            See: https://cloud.google.com/speech/limits#content
+
+        Yields lists of :class:`~google.cloud.speech.alternative.Alternative`
+        containing results and metadata from the streaming request.
+
+        :type sample: :class:`~google.cloud.speech.sample.Sample`
+        :param sample: Instance of ``Sample`` containing audio information.
+
+        :type language_code: str
+        :param language_code: (Optional) The language of the supplied audio
+                              as a BCP-47 language tag. Example: ``'en-GB'``.
+                              If omitted, defaults to ``'en-US'``.
+
+        :type max_alternatives: int
+        :param max_alternatives: (Optional) Maximum number of recognition
+                                 hypotheses to be returned. The server may
+                                 return fewer than ``max_alternatives``.
+                                 Valid values are 0-30. A value of 0 or 1
+                                 returns a maximum of 1. Defaults to 1.
+
+        :type profanity_filter: bool
+        :param profanity_filter: (Optional) If ``True``, the server attempts
+                                 to filter out profanities, replacing all but
+                                 the initial character in each filtered word
+                                 with asterisks, e.g. ``'f***'``. If ``False``
+                                 or omitted, profanities are not filtered out.
+
+        :type speech_context: list
+        :param speech_context: (Optional) A list of strings (max 50)
+                               containing word and phrase "hints" so that the
+                               speech recognition is more likely to recognize
+                               them. This can be used to improve the accuracy
+                               for specific words and phrases, and to add new
+                               words to the vocabulary of the recognizer.
+
+        :type single_utterance: bool
+        :param single_utterance: (Optional) If ``False`` or omitted, the
+                                 recognizer performs continuous recognition
+                                 (continuing to process audio even if the
+                                 user pauses speaking) until the client
+                                 closes the output stream (gRPC API) or the
+                                 maximum time limit has been reached.
+                                 Multiple ``SpeechRecognitionResult``
+                                 messages with the ``is_final`` flag set to
+                                 ``True`` may be returned. If ``True``, the
+                                 recognizer detects a single spoken
+                                 utterance. When it detects that the user
+                                 has paused or stopped speaking, it returns
+                                 an ``END_OF_UTTERANCE`` event and ceases
+                                 recognition. It returns no more than one
+                                 ``SpeechRecognitionResult`` with the
+                                 ``is_final`` flag set to ``True``.
+
+        :type interim_results: bool
+        :param interim_results: (Optional) If ``True``, interim results
+                                (tentative hypotheses) may be returned as
+                                they become available (these interim results
+                                are indicated with the ``is_final=False``
+                                flag). If ``False`` or omitted, only
+                                ``is_final=True`` result(s) are returned.
+
+        :raises: :class:`EnvironmentError` if gRPC is not enabled.
+        """
+        if not self._use_gax:
+            raise EnvironmentError('gRPC is required to use this API.')
+
+        responses = self.speech_api.streaming_recognize(sample, language_code,
+                                                        max_alternatives,
+                                                        profanity_filter,
+                                                        speech_context,
+                                                        single_utterance,
+                                                        interim_results)
+        for response in responses:
+            results = getattr(response, 'results', [])
+            if results or interim_results:
+                for result in results:
+                    yield [Alternative.from_pb(alternative)
+                           for alternative in result.alternatives]
+
     def sync_recognize(self, sample, language_code=None,
                        max_alternatives=None, profanity_filter=None,
                        speech_context=None):
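The loop at the end of the client method flattens the raw protobuf stream: responses with no results are skipped, and each result is turned into a list of its alternatives. That filtering can be sketched in isolation with plain namedtuples standing in for the protobuf types (`Alt`, `Result`, `Response`, and `filter_responses` below are illustrative, not library names):

```python
from collections import namedtuple

# Stand-ins for the protobuf response types (illustrative only).
Alt = namedtuple('Alt', ['transcript', 'confidence'])
Result = namedtuple('Result', ['alternatives'])
Response = namedtuple('Response', ['results'])


def filter_responses(responses):
    """Mimic the client loop: skip result-less responses and flatten
    each result into its list of alternatives."""
    for response in responses:
        results = getattr(response, 'results', [])
        for result in results:
            yield list(result.alternatives)


stream = [
    Response(results=[]),  # e.g. a metadata-only event with no hypotheses
    Response(results=[Result([Alt('hello', 0.97)])]),
]
out = list(filter_responses(stream))
print(len(out))  # 1: the empty response was skipped
```

This keeps the public generator yielding only lists of alternatives, so callers never see transport-level responses that carry no transcription data.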

speech/google/cloud/speech/sample.py

Lines changed: 2 additions & 2 deletions

@@ -47,8 +47,8 @@ class Sample(object):
     default_encoding = Encoding.FLAC
     default_sample_rate = 16000
 
-    def __init__(self, content=None, source_uri=None,
-                 encoding=None, sample_rate=None):
+    def __init__(self, content=None, source_uri=None, encoding=None,
+                 sample_rate=None):
 
         no_source = content is None and source_uri is None
         both_source = content is not None and source_uri is not None
