voicechat2 - Open source voice chat infra that rivals GPT-4o

in steemhunt •  6 months ago 



Hunter's comment

AI voice chat infrastructure that uses WebSockets. It can achieve voice-to-voice latency as low as 300ms (comparable to GPT-4o) without a unified voice codec. Everything runs on a single high-end consumer GPU.

On a 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1 second range with:

Whisper large-v2 (Q5)
Llama 3 8B (Q4_K_M)
tts_models/en/vctk/vits (Coqui TTS default VITS models)
On a 4090, using Faster Whisper with faster-distil-whisper-large-v2, we can cut the latency down to as low as 300ms.
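To make the latency figures concrete, here is a minimal Python sketch of how a voice-to-voice number could be broken down across a speech-to-text, language-model, and text-to-speech chain. The three stage functions (`transcribe`, `generate_reply`, `synthesize`) and `timed_pipeline` are hypothetical placeholders, not voicechat2's actual code, and the stages run strictly in sequence. A streaming setup would more likely measure time to first audio, which this sketch does not model.

```python
import time
from typing import Dict, Tuple

# Hypothetical stand-ins for the three stages. A real deployment would call
# a speech-to-text model, an LLM, and a text-to-speech model here.
def transcribe(audio: bytes) -> str:
    return "hello"

def generate_reply(text: str) -> str:
    return f"you said: {text}"

def synthesize(text: str) -> bytes:
    return text.encode("utf-8")

def timed_pipeline(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run the stages in order and record wall-clock seconds for each."""
    timings: Dict[str, float] = {}
    stages = [("stt", transcribe), ("llm", generate_reply), ("tts", synthesize)]
    value = audio
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    # Sequential total: the sum of the per-stage times.
    timings["voice_to_voice"] = sum(timings.values())
    return value, timings

if __name__ == "__main__":
    reply_audio, timings = timed_pipeline(b"\x00\x01")
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.3f} ms")
```

Swapping a stub for a real model call leaves the timing logic unchanged, which makes it easy to see which stage dominates on a given GPU.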
These installation instructions are for Ubuntu LTS and assume you've already set up ROCm or CUDA.

I recommend you use conda or (my preference) mamba for environment management. It will make your life easier.
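As a rough illustration of that recommendation, the Python sketch below picks whichever environment manager is on the `PATH`, preferring mamba, and builds the command that would create a fresh environment. The environment name `voicechat2` and the Python version `3.11` are assumptions for the example, not values taken from the project's instructions. The helper names are hypothetical too.

```python
import shutil
from typing import List, Optional

def pick_env_manager() -> Optional[str]:
    """Return 'mamba' if installed, else 'conda', else None."""
    for candidate in ("mamba", "conda"):
        if shutil.which(candidate):
            return candidate
    return None

def create_command(manager: str, env_name: str, python_version: str) -> List[str]:
    """Build the argument list for creating a new environment."""
    return [manager, "create", "-y", "-n", env_name, f"python={python_version}"]

if __name__ == "__main__":
    manager = pick_env_manager()
    if manager is None:
        print("Neither mamba nor conda was found on PATH.")
    else:
        # The name and version below are illustrative assumptions.
        cmd = create_command(manager, "voicechat2", "3.11")
        print("Would run:", " ".join(cmd))
```

The script only prints the command rather than running it, so it is safe to try before committing to an environment layout.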


Link

https://github.com/lhl/voicechat2?ref=producthunt



Steemhunt.com

This is posted on Steemhunt - A place where you can dig products and earn STEEM.

Nice Open source voice chat infra that rivals GPT-4o.

Very cool Open source voice chat infra that rivals GPT-4o.

Upvoted! Thank you for supporting witness @jswit.

Congratulations!

We have upvoted your post for your contribution within our community.
Thanks again and look forward to seeing your next hunt!
