Found 20 results for “Speech Recognition”
OpenGPT is a platform for building API-based ChatGPT applications. It offers multilingual support, instant messaging, speech recognition, and natural language processing, and provides reference application examples and open-source code.
GPUX.AI provides resource services for GPU computing tasks, supports running various GPU applications in Docker containers, and offers automatically scalable inference capabilities.
RunPod is a GPU cloud service for AI and high-performance computing scenarios, providing on-demand rental, serverless GPU computing, managed AI endpoints, and Jupyter Notebook capabilities.
AI Depot is a platform that aggregates various types of artificial intelligence tools, covering areas such as text analysis, speech recognition, image recognition, and predictive analytics, helping users find suitable machine learning capabilities for different types of applications.
NeuroSpell is a deep learning-based spelling and grammar auto-correction tool that supports more than 30 languages and provides capabilities such as speech-to-text, OCR error correction, and customizable terminology training.
Miniapps.ai is a website that aggregates a variety of free AI mini apps and tools, covering areas such as health, social media, and SEO, and also supports exploring and creating simple AI applications for quick and easy use.
Supertranslate is a video subtitle tool that can automatically transcribe videos in more than 100 languages and generate English subtitles, suitable for creators and teams that need to distribute content across languages.
Wisecut is an online AI video editing tool that uses speech recognition to automatically process video content, remove pauses, generate subtitles, and add background music. It is suitable for quickly organizing talking-head, interview, and podcast videos.
AI models for transcribing and understanding speech
iFLYTEK Meeting is an intelligent video conferencing application from iFLYTEK that offers high-definition, low-latency, stable audio and video for multi-party collaboration. It supports screen sharing, real-time multilingual subtitles, automatic meeting record generation, and AI noise reduction. Users can join from PCs, mobile phones, and smart screens for remote collaboration and meetings.
Feishu Minutes offers intelligent meeting notes and fast AI speech-to-text transcription.
Your free portable translator
Qimiaoyuan is an AI digital human solution for short videos and livestreaming launched by Mobvoi. On this platform, users can create their own digital avatars and livestream through them. Qimiaoyuan currently offers more than 100 digital humans and more than 1,000 3D digital assets to choose from.
Cohere is a platform that provides large language models, helping developers and enterprises build high-performance AI products. The platform mainly offers AI-powered text search services (multilingual embeddings, neural search, search ranking), text classification, and text generation, helping enterprises quickly deploy conversational AI chatbots, generative search engines, text summarization, and enhanced vector retrieval.
Label Studio is an open-source data labeling platform launched by HumanSignal (formerly Heartex). The project has nearly 14,000 stars on GitHub and can help developers fine-tune large language models, prepare training data, or validate AI models.
ElevenLabs is an AI text-to-speech platform that provides realistic voice synthesis solutions for developers, creators, and enterprises. Its core products include text-to-speech (29+ languages, including Chinese, and 10,000+ voices), AI dubbing, voice cloning, music generation, and more.
Zidong Taichu is a multimodal large model jointly launched by the Institute of Automation, Chinese Academy of Sciences, and the Wuhan Institute of Artificial Intelligence. It is the upgraded 2.0 version built on the 100-billion-parameter multimodal large model “Zidong Taichu 1.0.” The Zidong Taichu large model supports comprehensive question-answering tasks such as multi-turn Q&A, text creation, image generation, 3D understanding, and signal analysis. It has strong cognitive, comprehension, and creative capabilities, and can deliver a brand-new interactive experience.
Deepgram is a platform that provides advanced AI speech recognition and natural language processing technology. Its core products are powerful Speech-to-Text (STT) and Text-to-Speech (TTS) APIs, enabling developers to quickly integrate voice transcription and understanding capabilities into their own applications and services.
SoundView is an AI video localization tool that supports video dubbing and video translation. It integrates multilingual translation, speech synthesis, speech recognition, and large-model technology to simplify and accelerate the creation of product marketing videos. SoundView supports dubbing and subtitle editing in 100 languages and, according to the product, increases video production efficiency tenfold and reduces video translation costs by 90%.
iFLYTEK Simultaneous Interpretation is a professional AI simultaneous interpretation product launched by iFLYTEK. Based on its world-leading intelligent speech and language technologies, it provides integrated simultaneous interpretation services for multiple scenarios and languages, including real-time transcription and translation, simultaneous interpretation, live subtitle display, and meeting record sharing.
