Embedding Google speech recognition in a Python desktop application without limits.


Google offers a speech recognition API for desktop applications, but you need a key for it, and the free key only covers 60 minutes of audio per month. The API itself is easy to use if your usage is small.
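The free tier is measured in minutes of audio, so it can help to check how much audio you are about to send. Below is a minimal standard-library sketch that assumes your audio is saved as WAV files. The names `FREE_QUOTA_SECONDS`, `wav_duration_seconds`, `remaining_quota` and `fits_in_quota` are hypothetical. The 60-minute figure comes from this post, not from current Google pricing.

```python
import wave

# Hypothetical constant: the free tier described above, 60 minutes of audio per month.
FREE_QUOTA_SECONDS = 60 * 60


def wav_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds (frame count / frames per second)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def remaining_quota(used_seconds: float, quota_seconds: float = FREE_QUOTA_SECONDS) -> float:
    """Seconds of free recognition left this month, never below zero."""
    return max(0.0, float(quota_seconds - used_seconds))


def fits_in_quota(paths, used_seconds: float = 0.0) -> bool:
    """True if transcribing every WAV file in `paths` stays within the free quota."""
    total = used_seconds + sum(wav_duration_seconds(p) for p in paths)
    return total <= FREE_QUOTA_SECONDS
```

For example, `remaining_quota(45 * 60)` returns `900.0`, which means 15 minutes are left. The sketch does not model how the service rounds billed audio.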

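import speech_recognition as sr
from mtranslate import translate

r = sr.Recognizer()

def run():
    with sr.Microphone() as source:
        # loop instead of calling run() recursively, which would
        # eventually exceed Python's recursion limit
        while True:
            print('listening')
            audio = r.listen(source)
            try:
                result = r.recognize_google(audio, language='en-US')
            except sr.UnknownValueError:
                # the speech could not be understood; listen again
                continue
            print(result)             # recognized text
            print(translate(result))  # translated with mtranslate

run()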

It's just a couple of lines of code, but as I mentioned, it's limited: you can only convert 60 minutes of audio to text.
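If you want to end a dictation session by voice instead of killing the script, the listen/recognize cycle can stop when a chosen word is heard. In the sketch below, `dictate_until` is a hypothetical helper. Its `next_phrase` argument stands in for one listen-and-recognize round, which keeps the logic testable without a microphone.

```python
from typing import Callable, List, Optional


def dictate_until(stop_word: str,
                  next_phrase: Callable[[], Optional[str]],
                  max_phrases: int = 100) -> List[str]:
    """Collect recognized phrases until one contains `stop_word`.

    `next_phrase` stands in for one listen + recognize_google round and
    returns None when the audio could not be understood.
    """
    transcript: List[str] = []
    for _ in range(max_phrases):  # hard cap on the number of listening rounds
        phrase = next_phrase()
        if phrase is None:
            continue  # unintelligible audio: just listen again
        if stop_word.lower() in phrase.lower():
            break  # the spoken stop word ends the session
        transcript.append(phrase)
    return transcript
```

With the speech_recognition library, `next_phrase` could be a small function that calls `r.listen(source)` and `r.recognize_google(audio, language='en-US')`, and returns `None` when `sr.UnknownValueError` is raised.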


Here starts the main part of the tutorial. Google's JavaScript speech API is completely free and has no limits, so we are going to use it instead. We will use https://www.google.com/intl/en/chrome/demos/speech.html as the server that takes our requests. Speech recognition through Google's JavaScript API only works in Google Chrome, so we have to stick with Chrome. To embed Chrome in our desktop application, we need Selenium to automate the browser and interact with the DOM elements.
Install Selenium with pip install selenium and download chromedriver.exe. Then place chromedriver.exe in your Python Scripts folder (python>scripts) if you want to avoid passing its path to Selenium. When you run the code from the interpreter, a black console window for chromedriver pops up. Then the browser starts, and you can begin dictating. From that moment you have 4 seconds to say everything you want. After that the page recognizes the speech and the script prints the result. You can then rework this code and embed it however you like. This shows the general idea.
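Before you run the script, you can confirm that chromedriver is reachable. Selenium looks for it on PATH when `webdriver.Chrome()` gets no driver path. The python>scripts trick works because the Windows installer usually adds that folder to PATH when you choose "Add Python to PATH". In the sketch below, `find_chromedriver` is a hypothetical helper built on the standard `shutil.which`.

```python
import shutil
from typing import Optional


def find_chromedriver(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the chromedriver executable, or None if it is missing.

    With search_path=None this searches the PATH environment variable; otherwise
    it searches the given os.pathsep-separated list of folders. On Windows,
    shutil.which also tries the PATHEXT extensions, so 'chromedriver' matches
    'chromedriver.exe'.
    """
    return shutil.which("chromedriver", path=search_path)
```

Calling `find_chromedriver()` with no argument returns `None` when the driver's folder is not on PATH. In that case, either move the driver or pass its location to Selenium explicitly.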


from time import sleep

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

print('initializing')
option = webdriver.ChromeOptions()
option.add_argument("--incognito")
# allow microphone access without showing Chrome's permission prompt
option.add_argument("--use-fake-ui-for-media-stream")
# no driver path given: my chromedriver lies in the python>scripts folder, which is on PATH
browser = webdriver.Chrome(options=option)
browser.get("https://www.google.com/intl/en/chrome/demos/speech.html")
# choose entry 11 in the demo's language dropdown, then let the page refresh its dialect list
browser.execute_script('document.getElementById("select_language").selectedIndex = 11')
browser.execute_script('updateCountry()')
print('listening')
# the microphone button toggles recognition: click to start, wait 4 seconds, click to stop
browser.execute_script('document.getElementById("start_img").click()')
sleep(4)
browser.execute_script('document.getElementById("start_img").click()')
# the final transcript can arrive shortly after stopping, so wait up to 10 seconds for it
# (raises TimeoutException if nothing was recognized)
text = WebDriverWait(browser, 10).until(
    lambda driver: driver.execute_script("return document.getElementById('final_span').innerText"))
print(text)
browser.quit()
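The hard-coded `selectedIndex = 11` depends on the order of the demo page's language dropdown. The sketch below selects a language by its visible label instead. It assumes that the option texts of `select_language` are the language names shown on the page. `language_index`, `select_language`, `LIST_LANGUAGES_JS` and `SELECT_LANGUAGE_JS` are hypothetical names. The only Selenium call it uses is `execute_script`, which passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
from typing import Sequence

# JavaScript run inside the demo page. The element id and updateCountry() come from the
# page (they appear in the script above); reading labels this way is an assumption.
LIST_LANGUAGES_JS = (
    "return Array.from(document.getElementById('select_language').options,"
    " function (o) { return o.text; });"
)
SELECT_LANGUAGE_JS = (
    "document.getElementById('select_language').selectedIndex = arguments[0];"
    "updateCountry();"
)


def language_index(labels: Sequence[str], wanted: str) -> int:
    """Position of `wanted` among the dropdown labels, ignoring case and surrounding spaces."""
    normalized = [label.strip().lower() for label in labels]
    try:
        return normalized.index(wanted.strip().lower())
    except ValueError:
        raise ValueError(f"{wanted!r} is not in the language dropdown") from None


def select_language(browser, wanted: str) -> int:
    """Select a language by its visible label instead of a hard-coded index."""
    labels = browser.execute_script(LIST_LANGUAGES_JS)
    index = language_index(labels, wanted)
    browser.execute_script(SELECT_LANGUAGE_JS, index)  # index arrives in JS as arguments[0]
    return index
```

With a live driver, `select_language(browser, "English")` would replace the two `execute_script` lines that set index 11 and call `updateCountry()`. If the label is not found, it raises a `ValueError` instead of silently picking the wrong language.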

I know this tutorial isn't polished. I didn't have much time and was rushing, but I wanted to share it.

