Audio & Video
Audio, speech, music, and video tools for generation, editing, dubbing, and production.
Wondershare Filmora 2023 is an easy-to-use, feature-rich video editor from the Chinese developer Wondershare. It supports one-click import of SRT subtitles and offers a simple, stylish interface, flexible timeline editing, and an abundant library of effects and assets.
MyVocal.ai is a tool that provides voice synchronization and voice cloning features. Users can synchronize their own voice with popular music and complete voice cloning in a relatively short time.
Pod Genie is an AI podcast tool that can convert RSS feeds into personalized podcast content, and provides customized news broadcasts, newsletters, and summary services, making it convenient for users to access audio information based on their interests.
Lovo is an AI voice generation and text-to-speech tool that supports converting text into natural speech, suitable for audio content production, voiceover, and various creative scenarios, helping reduce manual recording costs and time investment.
YouWhisper is a machine-learning-based video production and editing tool for users who need to quickly process video footage, offering multiple editing options to help create higher-quality video content.
Mubert is an AI music generation tool that provides royalty-free tracks for content creators and app developers, and can generate music by style, mood, use case, and duration.
Muse.ai is an ad-free video hosting platform that offers 4K playback, an embedded player, video analytics, and AI search, suitable for video publishing, management, and content retrieval.
Koe Recast is an AI voice conversion tool that can transform a user's voice into different styles, suitable for scenarios such as voice chat, virtual social interaction, and real-time interaction.
Ad Auris is a tool that lets you create article playlists on Spotify, Apple Podcasts, and Google Podcasts.
imajinn.ai provides an AI-based couple portrait generation service. After users upload photos of both people, it can generate romantic-style portrait artworks and can also be used for canvas or poster production.
Genmo AI is an AI-based video generation tool focused on quickly turning ideas into visually expressive content, and it also provides community-generated works for browsing and reference.
AI Voice Detector is an audio authenticity detection tool used to identify whether speech is generated by AI. Users can upload audio files for verification, making it suitable for scenarios involving evidence review, media judgment, and authenticity analysis in customer communications.
MusicLM is a music generation model proposed by the Google Research team that can generate high-fidelity music from text descriptions. It is mainly used to demonstrate research capabilities in text-to-music generation and examples of generated results.
Soundful is an AI music generation platform that lets content creators and music artists create an unlimited number of tracks and monetize them.
Lemonaid Music is an AI tool that provides creative inspiration for musicians. It can generate MIDI music elements such as melodies, chords, and drum patterns, helping users quickly start arranging and composing.
2short.ai is a short-form video generation tool for YouTube content creators that can extract highlight clips from long videos and convert them into Shorts, while also supporting automatic subtitle export.
Article.Audio is an online service that converts article content into spoken audio, supporting the transformation of text articles into listenable audio for convenient access to information when reading is inconvenient.
Reachout.ai is an AI video outreach tool that helps entrepreneurs and sales teams create personalized talking videos for email outreach and prospect communication, and supports video performance tracking and analysis.
Vidon.ai covers two capabilities: AI video generation and video analytics. On one hand, it can turn blog content into videos; on the other hand, it can be used for analytics scenarios such as visitor counting, queue monitoring, VIP recognition, and age and gender estimation of crowds.
AnthemScore is an automatic music transcription software that uses AI to convert MP3 and WAV audio files into sheet music or guitar tablature, and supports Windows, macOS, and Linux.
Transkribieren is an audio transcription tool that accepts multiple audio formats and provides speech-to-text on mobile, in the browser, and for meetings.
Visla's AI video generator is a video tool for presentation and narrative content creation. It can quickly generate and edit videos with AI, and provides features such as transcription and asset suggestions to help users achieve professional expression.
HeyGen is an online AI video generation tool that supports creating talking avatar videos and provides customizable avatars and voiceover features, suitable for content production scenarios such as training, teaching, explanations, and marketing.
Adobe Speech Enhancer is an AI audio enhancement tool for improving the quality of voice recordings. It can reduce background noise and highlight voices, making ordinary spoken recordings sound clearer and closer to a studio effect.
Playlistable is an AI playlist generation tool that creates personalized playlists based on mood, occasion, and music preferences, and supports listening through Spotify integration.
Nuro.video is an AI video editing tool that can automatically transcribe, analyze, and organize long raw video footage into finished videos with titles, transitions, and animations.
Hit'n'Mix is an audio processing tool that can be used to remove vocals, create tracks, mix, and repair audio. It offers a 21-day free trial.
Inksprout Video is a tool that uses AI to turn blog content into social videos, automatically generates captions, and combines images, music, and other elements to create video content better suited for social media distribution.
Magenta Studio is a set of music creation plugins built on Magenta open-source models, mainly used to generate, complete, and transform MIDI music clips with AI, helping users explore melody and rhythm ideas.
Novels AI is a tool for generating personalized audio adventure stories, allowing users to customize characters and plot choices and experience AI-driven immersive story content in audiobook form.
Brain.fm is an AI music app aimed at improving focus, meditation, and sleep experiences, helping users maintain attention or relax through background music with rhythmic pulses.
Fathom.fm is an AI podcast player that supports smart recommendations, clip sharing, podcast content search, transcript reading, and chapter navigation, making it easier to listen to and find podcast content.
Podsqueeze is an AI content repurposing tool for podcast creators that can generate supporting content such as show descriptions, timestamps, and newsletters around podcast audio, helping improve post-production organization efficiency.
Murf AI is an AI voice generation tool that converts text into natural, lifelike human speech, suitable for creating podcasts, video voiceovers, presentation narration, and other audio content.
NarrationBox is an AI voice generation tool that offers more than 700 AI narrator voices for creating audio content such as podcasts, audiobooks, and dubbing.
Latte is a video editing tool that can edit videos into the formats required by short-video platforms.
Revocalize AI is an AI voice synthesis tool that supports voice cloning, voice protection, and voice creation, offers multilingual voice options, and is suitable for audio content production and personalized voice applications.
Krisp is an AI-based noise cancellation app mainly used to improve the quality of online meetings and voice communication. It supports Mac and Windows, and offers voice productivity-related features and a free version.
Moises App is an AI music tool that can adjust a song's key and speed, separate vocals and instruments, and provide mastering and audio extraction.
Hify is a video messaging service that helps salespeople create polished sales videos in the browser using dozens of personalized templates. It also offers automation, delivery, and CTA features, plus a help desk with articles on video production, lead generation, and getting started. It is also listed on DoMore.ai as a sales video tool for companies of all sizes.
Groot Music is a music bot that runs on Discord, supports multilingual use, and provides advanced features including AI tools, making it suitable for voice interaction and music playback needs in communities.
Voiceful provides game character voice generation and speech synthesis demos, and supports integration into Unity via SDK, making it suitable for development and testing scenarios that require character voice capabilities.
FineShare FineVoice is an AI real-time voice changer that supports instant voice adjustment and personalized processing during meetings, live streams, chats, and gaming.
Songmastr is an AI-based online music mastering tool that can process users' uploaded songs by matching them to a reference track's loudness, spectral characteristics, peak amplitude, and stereo width, supporting MP3 and WAV formats.
Steve AI is an AI video creation tool for social media operations and content marketing scenarios, capable of quickly converting scripts, blogs, or text into short videos and animated content.
Genmo is an AI tool that generates videos from text. After users enter text content, they can quickly create videos with style and music options, lowering the professional skill requirements for video creation.
Harmonai is a community-driven open-source generative audio project dedicated to providing AI tools for music creation, enabling more people to participate in sound and music generation practice.
Doctor Mix AI Synth is a software tool for the AX73 synthesizer, described as synthesizer software written by AI. Its site provides feature descriptions, a download link, and usage tutorials.
Listener.fm is an AI-assisted editing tool for podcast creators that can automatically generate podcast titles, content descriptions, and show notes, helping simplify post-production organization and publishing workflows.
Replica Studios is an AI voice generation tool for creative projects, offering virtual voice actors with emotional expression that can be used to create more natural voice performance content.
Free Text to Speech Generator is an online TTS tool that supports multiple languages, multiple dialects, and mixed Chinese-English reading, and can convert text into speech and export MP3 files.
Flowjin is an AI tool that automatically converts long-form audio content into short video clips. It is suitable for turning interviews, podcasts, or Spaces content into shareable short-form content, and it also supports maintaining an online homepage and profile information.
FolkTalk is an AI video dubbing platform that supports multilingual dubbing, helping creators and organizations distribute video content to audiences in different languages while preserving the original expression style as much as possible.
Omniverse Audio2Face is an AI facial animation tool launched by NVIDIA that can automatically generate matching character facial expressions and lip-sync animation from audio, suitable for real-time and traditional character production workflows.
LALAL.AI is an audio separation tool that can extract vocals or multiple instrument tracks from songs, supports high-quality audio processing, and is suitable for music editing, practice, and asset creation.
Splashmusic is a project that aims to make music creation accessible to everyone. It provides easy-to-use production tools for creating, recording, and sharing music, and is aimed at both enthusiasts and professional musicians.
Summarize Tech is an AI-based video summarization tool mainly used to generate concise text summaries for long videos, helping users quickly understand key video content and reduce the time required for full viewing.
Tapesearch is an AI-powered search engine for quickly searching podcast transcripts. Transcripts can also be downloaded.
Taption is a video and audio transcription tool that supports automatic subtitle generation and translation, covers more than 40 languages, and provides built-in editing features for organizing transcription content.
Voicemod is a real-time voice changer with custom sound effects that works with desktop applications and games such as Discord, Zoom, Google Meet, and Minecraft.
Koolio.ai is a web-based AI podcast creation and collaboration tool that helps users quickly complete podcast content from ideation to editing. It supports audio transcription, collaborative editing, automatic matching of sound effects or music, and common audio processing operations.
Natural Language Playlist is an AI tool that generates music playlists through natural language. Users only need to enter a single description to get playlist recommendations matching a corresponding style or mood.
NaturalReader is a tool that provides AI text-to-speech services, supporting online use, mobile apps, commercial licensing, and educational scenarios, suitable for converting text content into audio for listening.
Sonify is a provider of tools and solutions focused on combining audio and data, with a core direction of turning data into sound to help users understand, analyze, and experience information more intuitively through hearing.
SpeechEasy provides high-quality text-to-speech services.
VEED is an online video editing tool offering subtitles, editing, encoding, a stock library, music, and waveforms, along with video recording, storage, and sharing in one place. It is designed to be usable without complex training, supports one-click editing, and can be tried without signing up. It suits users such as marketing teams, HR professionals, and content creators.
Gling is an AI tool for video content creators, aimed at YouTube production. It automatically cuts silences and mistakes out of footage, reducing tedious post-production work.
Nonoisy is an audio post-processing tool mainly used to remove background noise, optimize audio quality, and adjust volume. It can also be used to improve audio performance in videos.
Riffusion is an AI project that generates music based on text prompts, using the Stable Diffusion model approach to convert text input into audio content and helping users quickly try music creation.
This is a website that provides AI-powered rapid short video creation, but currently does not have any content.
Voicera is an article-to-speech tool that can automatically detect content and generate a playable audio version. It supports multiple languages and voice options, making it convenient for users to access information by listening.
Timebolt Static Filter is a tool for automatically editing videos. It can identify and remove silent and pause segments, and supports quick editing of scenes or words, making it suitable for post-production organization of long videos.
Tavus is an AI personalized video generation tool for product, marketing, and sales teams. It can mass-produce customized videos for different audiences based on templates, and use voice variables to deliver communication content that better matches the recipient.
Musico is an AI music generation engine that can automatically create royalty-free music in a variety of styles and respond in real time to gestures, movements, code, or other audio inputs.
Audioshake is an audio separation and processing tool that can extract track elements such as rhythm, harmony, and melody, making remixing, sampling, restoration, and recreation easier. It is suitable for music production and copyright-related workflows.
Podium is a tool for organizing podcast content. It can generate transcripts, chapters, show notes, and content clips, and improve podcast post-production and distribution efficiency through simple drag-and-drop operations.
Pop2Piano is a research project that generates piano versions of pop songs from audio input, providing papers, demo videos, and sample data that can be used to understand progress in music generation and automatic piano arrangement.
Soundraw is an AI music generator built for creators that can quickly generate background music based on user-set parameters such as genre, mood, instruments, and duration. Users can select music styles such as pop, hip-hop, and classical, and adjust tempo, volume, and instrument combinations to generate music clips that meet their needs.
AI models for transcribing and understanding speech
IBM Watson Text to Speech
An AI video color grading tool also used by Hollywood
AI one-click video background removal
AI automatically removes video backgrounds
Xunfei Zhizuo is a one-stop AIGC content creation platform launched by iFLYTEK, providing services such as text-to-speech and virtual digital human video production based on artificial intelligence technology. Users can easily achieve rapid generation of audio and video content and create high-quality media works without professional skills.
AI text-to-speech generation tool
AI real-time voice changing tool
AI digital human talking-head video generation tool
Wondershare Virbo is an AI digital human talking-video marketing tool launched by Wondershare Technology, focused on providing video creators and cross-border e-commerce practitioners with a full-chain AIGC creation experience. The software uses advanced AI technology to allow users to quickly generate HD videos containing digital human characters, dynamic scenes, and precise backgrounds through simple text input or voice files.
Miaochuang (formerly "Yizhen Miaochuang") is an intelligent AI content generation platform based on the Miaochuang AIGC engine, providing creators and organizations with AI generation services including text continuation, text-to-speech, text-to-image, and image-and-text-to-video. Miaochuang intelligently analyzes copy, assets, AI voice, subtitles, and more to quickly produce finished videos, enabling zero-threshold video creation.
Uberduck is an open-source community for AI voice generation and synthesis. The platform offers more than 5,000 voices to help users create AI dubbing and speech, and you can even use your own custom voice clone for synthesis.
Moyin Workshop is a professional AI voiceover tool with more than 800 voices and over 1,000 styles, meeting a wide range of needs from video dubbing to audiobooks. Moyin Workshop offers rich features, including speech rate adjustment, polyphonic character selection, and pause control, ensuring realistic and natural text-to-speech results. Users can easily download lossless audio files and enjoy a convenient voiceover experience.
Qimiaoyuan is an AI digital human short-video and livestreaming solution launched by Mobvoi. With this digital avatar creation and livestreaming platform, users can create their own digital avatars and conduct livestreaming activities through them. The Qimiaoyuan platform currently has more than 100 digital humans and more than 1,000 3D digital assets, providing users with a wide range of choices.
beatoven.ai is an AI music generation platform designed to provide royalty-free background music for video, podcast, and game creators. Users only need to enter their music ideas to quickly generate music in more than 250 styles. The platform supports personalized customization, including music length, style, mood, and instrument selection, to meet different creative needs.
ElevenLabs is an AI text-to-speech platform that provides realistic voice synthesis solutions for developers, creators, and enterprises. Its core products include text-to-speech (supporting 29+ languages including Chinese and 10,000+ voices), AI dubbing, voice cloning, music generation, and more.
Pika is an AI video generation and editing tool from the AI startup Pika Labs. Users enter text or images to quickly generate videos in styles such as 3D animation, anime, cartoon, and cinematic visuals.
Sora is an AI video generation model developed by OpenAI, capable of converting text descriptions into video and creating video scenes that are both realistic and imaginative. The model focuses on simulating motion in the physical world, aiming to help people solve problems that require interaction with the real world. Sora can generate videos up to one minute long while maintaining visual quality and a high degree of fidelity to user input.
Deepgram is a platform that provides advanced AI speech recognition and natural language processing technology. Its core products are powerful Speech-to-Text (STT) and Text-to-Speech (TTS) APIs, enabling developers to quickly integrate voice transcription and understanding capabilities into their own applications and services.
Youyan is a one-stop AIGC video creation and 3D digital human generation platform launched by Mocap Technology, helping users create videos without real people appearing on camera by providing a large number of ultra-realistic 3D virtual human characters.
Spikes Studio is an AI-powered video auto-editing tool that can automatically analyze and summarize long videos, extract key clips, and generate multiple short videos, aiming to simplify the editing workflow for video content creators. It is especially suitable for fast-paced social media platforms.
Chanjing is an AI digital human short video and livestreaming platform launched under the marketing data analysis platform Chanmama. Through ultra-fast cloning technology and an efficient content production workflow, it enables users to quickly create and publish digital human short videos.
Xiling Digital Human is a digital human platform launched by Baidu AI Cloud based on artificial intelligence technology, providing enterprises and individual developers with high-performance, easy-to-integrate, and diverse digital human component capabilities. The platform supports digital human avatar customization, video synthesis, interactive dialogue, livestreaming, and other multi-scenario applications to meet the needs of different industries.
Xunfei Huijing (formerly Xinghuo Huijing) is an AI short-video creation platform launched by iFLYTEK that can automatically convert user-entered text descriptions into video content (such as short dramas, trailers, and MVs), including generating video scripts and storyboards, ultimately forming complete short videos.
KreadoAI is an AIGC digital marketing video creation platform focused on using artificial intelligence technology to simplify and optimize the video content creation process. Users only need to enter text or keywords, and KreadoAI can create video content with real or virtual people.
Wondercraft is a versatile AI audio content creation platform that uses generative AI voice technology to allow users to quickly convert text content into podcasts, audiobooks, ads, and other audio formats.
Bairimeng AI is an AI video creation platform launched by Guangmo Technology. Using natural language processing, it lets users input text and quickly generate videos up to 6 minutes long. The platform supports text-to-video, dynamic visuals, AI character generation, and other features, while maintaining consistency in characters and scenes.
LangLang Voiceover is an intelligent text-to-speech tool that provides voice synthesis services. It supports more than 30 languages, including Chinese, English, German, and French, as well as more than 10 emotional styles such as happy, sad, and excited. The platform is feature-rich and easy to use, supporting SSML tags to enable advanced functions such as polyphonic character handling and multi-speaker dubbing.
Jurilu is a one-stop AI anime video creation platform that uses natural language processing and image generation technologies to help users turn text into comic videos and short videos with coherent plots. Jurilu supports one-stop production from copywriting to video, offers multiple art style options and dubbing/background music services, and is suitable for various types of creators.
SoundView is an AI video localization tool that supports video dubbing and video translation. It integrates multilingual translation, speech synthesis, speech recognition, and large-model technology to simplify and accelerate the creation of product marketing videos. SoundView supports dubbing and subtitle editing in 100 languages and is claimed to increase video production efficiency tenfold and reduce video translation costs by 90%.
JoyPix is an AI creation tool focused on digital humans and speech synthesis. Users can create personalized virtual avatars by uploading photos, with support for voice conversations with virtual avatars.
Pollo AI is an all-in-one AI image and video creation platform. The platform integrates a variety of advanced AI models, such as Pollo 2.5 and Veo 3, and supports multiple functions including text-to-video, image-to-video, and video style transfer.
Keevx is an AI digital human video creation tool that helps users quickly generate multilingual, high-quality video content. Keevx offers a wide range of features, including viral video repurposing, URL-to-video, video translation, AI script generation, PPT/PDF-to-video, and more. Users can use 210+ native overseas digital human avatars and 40+ video templates, combined with 70+ languages and 180+ accents, to easily produce videos that meet their needs.
Xmov Nebula is an embodied intelligent 3D digital human open platform launched by Xmov Technology, dedicated to upgrading AI from “having a brain” to “having a body” to enable natural expression and interaction. Based on text input, Xmov Nebula can generate a 3D digital human’s voice, expressions, and movements in real time, supporting multimodal generation, low-cost operation, low-latency interaction, and multi-terminal adaptation.
Yunmu Tongsheng is a professional AI video translation tool that aims for quality close to the original voice, suitable for short drama overseas expansion, cross-border e-commerce, and other fields. It combines AI voice cloning (with a stated 98% voice restoration), precise audio-video synchronization algorithms, and an AI vocal separation model to preserve background music and emotional details, so that translated videos sound as natural as the original.
Vemus is Tencent Music's first one-stop AI music creation tool, offering zero-threshold multimodal music creation so everyone can make music. It compresses "songwriting" into three steps: enter a sentence, an image, or a humming clip, and AI automatically completes lyric writing, composition, arrangement, and singing within seconds, with instant switching across styles such as pop, Chinese style, and electronic.
