Python Speech Software for Mac

View 65 business solutions

Browse free open source Python Speech Software for Mac and projects below. Use the toggles on the left to filter open source Python Speech Software for Mac by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    DeepSpeech

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 2
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip install SpeechRecognition. The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library. PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    VoiceCode is an Open Source initiative started by the National Research Council of Canada, to develop a programming by voice toolbox. The aim of the project is to make programming through voice input as easy and productive as with mouse and keyboard. For install, Use subversion, as described in this page: http://sourceforge.net/apps/mediawiki/voicecode/index.php?title=VCode_1_Doc/InstallationManual.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    FM2TXT

    FM2TXT

    RtlSdr listen to radio, recognize audio, and writes text file log

    Just log your favorite FM station speech to a text file using rtl-sdr dongle and speech recognition. Cross-platform tool. Follow the README on the download page for Windows installation. https://sourceforge.net/projects/fm2txt-rtlsdr/files/ If you prefer GitHub source, not SF: https://github.com/randaller/fm2txt For those, who want to recognize from soundcard, not from rtl-sdr (this allows to transcribe NFM etc): https://github.com/randaller/souncard2txt
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    Yet Another Audio Feature Extractor is a toolbox for audio analysis. Easy to use and efficient at extracting a large number of audio features simultaneously. WAV and MP3 files supported, or embedding in C++, Python or Matlab applications.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Performs actions on detected volume threshold Examples : - Launch music on clap - Launch speech recording when you start speaking - Launch guard webcam when a significant sound is detected - Increase or decrease headphones volume when ambient noise pass
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    QWave: Qt-based waveform display and audio playback class library.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 10
    ASR-Builder provides an easy-to-use interface to the HTK toolkit, that allows users to build ASR systems. ASR-Builder provides a platform that performs house-keeping tasks when using HTK and also provides default training/testing/recognition scripts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    AarTon
    AarTon is an automated text-to-speech application. It allows user to enter text in a web-based front-end and render these texts via a multi-channel sound card.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    AGTK is a suite of software components for building tools for annotating linguistic signals, time-series data which documents any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Eve is a AI project written in python that takes commands verbally or textually to control the computer and eveyday functions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    InproTK

    InproTK

    An Incremental Spoken Dialogue Processing Toolkit

    InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour. Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Moshi

    Moshi

    A speech-text foundation model for real time dialogue

    Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80ms, the frame size), yet performs better than existing, non-streaming, codecs like SpeechTokenizer (50 Hz, 4kbps), or SemantiCodec (50 Hz, 1.3kbps). Moshi models two streams of audio: one corresponds to Moshi, and the other one to the user. At inference, the stream from the user is taken from the audio input, and the one for Moshi is sampled from the model's output. Along these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter codebook dependencies for a given time step, while a large, 7B parameter Temporal Transformer models the temporal dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    PhoneBlogger allows you to post to a weblog by phone. PhoneBlogger is written in VoiceXML, Python, and JavaScript.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    The PyGE (Python Gutenberg E-text) project is a suite of GUI desktop utilities written in Python to promote and facilitate awareness and enjoyment of works of literature that are available from the archives of Project Gutenberg.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition. full installation and usage instructions given at http://sourceforge.net/p/rnnl/wiki/Home/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    This is a fast C implementation of Arturo Camacho's SWIPE' pitch extraction algorithm. See the project homepage for more about the advantages of the SWIPE' algorithm. swipe-1.0.tar.gz contains the current source, which should compile quite neatly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    A collection of tools for generating audio and visual (PNG/HTML/WAVE) for use in web sites including CAPTCHA challenges and PNG image creation tools with Javascript mouse tracking support.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    SoccerPhone provides lives soccer scores by phone. The only league currently supported is US Major League Soccer. Support for Soccernet is under development. SoccerPhone is written in VoiceXML, Python, and JavaScript.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Speect
    Speect is a multilingual TTS system. It offers a full text-to-speech system with various API's, as well as an environment for research and development of TTS systems and voices. It is written in ANSI C and uses a plug-in mechanism for extensions. Speect also includes an extensive set of Python bindings for quick implementation of new ideas, these bindings are derived from SWIG interface files and can easily be extended for other languages supported by SWIG. Speect is free and open source software. As a collection it is distributed under a MIT license.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Steel TTS

    A cross-platform wrapper for common text-to-speech engines in Python

    Steel is a cross-platform package for using common text-to-speech (speech synthesis) engines in Python. Steel currently supports the following TTS software: - Microsoft Speech API 5 (SAPI5) - eSpeak - NS Speech Synthesis - FreeTTS Documentation: http://sourceforge.net/p/steeltts/wiki/ Bug Tracker: http://sourceforge.net/p/steeltts/tickets/ If you are interested in contributing to the Steel TTS codebase, or would like to make a feature-request, please contact the lead developer, Jasper Danielson, at jrd4@rice.edu.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Software to fit whole-sentence language models using the principle of maximum entropy. For developers of speech recognizers, text prediction interfaces, OCR, machine translation software.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    pyespeak

    Python to eSpeak speech synthesis

    ctypes Python module for eSpeak http://espeak.sf.net speech synthesis
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB