China’s $9 AI Video Tool Kling 2.1 Adds Audio—Can It Beat Google’s $250 Veo 3?

In brief

- Chinese AI tool Kling 2.1 now generates videos with synchronized sound, including footsteps, rain, and ambient effects.

- At just $9 a month, Kling undercuts Google's Veo 3 by more than 20 times.
- We tested both tools head-to-head: Kling shines on price and flexibility, but Veo still leads in dialogue and sound quality.

Chinese short video platform Kuaishou has added a sound generation feature to Kling 2.1, its AI-powered video creation tool, allowing users to produce clips with synchronized sound effects such as footsteps, rain, and environmental noise.

The feature, which was quietly launched last week, is available in Kling's image-to-video mode, where users upload a still image and the platform animates it with AI-generated motion and sound.

The timing puts it up against Google's Veo 3, which has offered integrated audio since day one.

Early users on X praised Kling's seamless audio-visual synchronization, with creator Roberto Nickson calling it "one of the most useful models on the market" for generative video production.

The feature is free during the initial rollout and is available through Kling's website and mobile apps.

Kling 2.1 one of the most useful models on the market

– Roberto Nickson (@rpnickson) June 12, 2025

Kling 2.1 generates clips of 5 to 10 seconds at 1080p resolution, using what the company describes as "3D spatial attention mechanisms" to synchronize sounds with visuals.

The audio tool currently generates only sound effects, with no dialogue or music. When it comes to speech, it produces something resembling a Southeast Asian language: very tonal and completely unintelligible. But that is not enough to crown Google the undisputed king of generative video.

We tested Kling 2.1's new audio features against Google's Veo 3 to see how it stacks up.

Pricing

The price gap between the two platforms turns out to be massive.

The Kling 2.1 audio feature works with the Standard version, not the higher-tier Master edition. Even so, at Veo 3's regular pricing, users can generate more than 20 Kling videos for every single Veo 3 generation.

For example, on Freepik's credit system, one Google Veo 3 generation is currently on sale for 4,000 credits (down from the usual 8,000 credits per video), while Kling 2.1 costs 300 credits per video.

On Google's side, the model is available only through its $250-per-month Ultra subscription. Kling is available on its official site, which offers some free generations, with subscriptions starting at around $9 a month.

Even with Google's current promotional pricing, Veo 3 remains more than ten times as expensive as Kling.
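The ratios above follow from simple division of the quoted prices. The short Python sketch below reproduces them; the constants are the Freepik credit prices and subscription prices reported in this test at the time of writing (they change often), and the helper function names are hypothetical, written only for illustration and not part of any Freepik, Kling, or Google API.

```python
# Per-video credit costs on Freepik, as reported in this test.
# Assumption: these are snapshot prices and may have changed since.
VEO3_SALE_CREDITS = 4_000
VEO3_REGULAR_CREDITS = 8_000
KLING21_CREDITS = 300

# Monthly subscription entry prices quoted in the article (USD).
VEO3_ULTRA_USD = 250
KLING_STARTER_USD = 9


def kling_videos_per_veo_video(veo_credits: int, kling_credits: int = KLING21_CREDITS) -> int:
    """Whole Kling 2.1 generations that fit in the credits of one Veo 3 generation."""
    return veo_credits // kling_credits


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs, rounded to one decimal place."""
    return round(expensive / cheap, 1)


if __name__ == "__main__":
    print("Kling clips per Veo clip (sale price):", kling_videos_per_veo_video(VEO3_SALE_CREDITS))
    print("Kling clips per Veo clip (regular price):", kling_videos_per_veo_video(VEO3_REGULAR_CREDITS))
    print("Credit cost ratio (sale):", cost_ratio(VEO3_SALE_CREDITS, KLING21_CREDITS))
    print("Subscription price ratio:", cost_ratio(VEO3_ULTRA_USD, KLING_STARTER_USD))
```

At the sale price, one Veo 3 clip costs as much as about 13 Kling clips (a ratio of roughly 13.3); at the regular 8,000-credit price it is about 26, which is where the "more than 20" figure comes from. The subscription entry prices differ by a factor of roughly 27.8.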

For creatives who know that video generation involves plenty of trial and error, with failure rates that frustrate even patient users, Kling's economics make experimentation feasible.

Kling's Premium plan unlocks 1080p resolution, improving overall video quality while keeping the cost advantage.

Audio capabilities

But you get what you pay for. Veo 3 offers sophisticated sound generation, accurately synthesizing speech and matching complex audio elements to the visual scene.

Its understanding of spatial audio and contextual sound surpasses Kling's offering by a wide margin.

While Kling 2.1 can't quite compete, it focuses on something different: ambient sounds and background effects, with no dialogue and no music. So forget about those viral AI street interviews for now. Attempts to generate speech produce only gibberish.

Still, for scenes or videos that need atmospheric sound, its results were helpful.

2. Off-road SUV drives through rocky, muddy, wet forest terrain.

You hear the crunch, the splash, the rumble of the engine. It felt like real footage. pic.twitter.com/s0gvhcaqjk

– Zoya ✪ (@zoya_ai) June 12, 2025

The platform's new ability to add sound effects to existing silent videos gives it an advantage that Veo 3 can't match.

Users can upload finished videos and have them scored with matching soundtracks, a workflow the Google model doesn't support. Strangely, Veo can create videos, but it can't add sound to existing ones.

In addition to generating sound for silent videos, Kling also offers a lip-sync feature.

Users can upload a photo and a speech or dialogue track separately, and the model will produce a video in which the subjects talk naturally, as if speaking to each other in time with the uploaded audio.

【Kling AI (@Kling_ai)】 Lip sync update!! 📢
Added a lip sync editing function that lets you select the characters appearing in the video, choose which person speaks, and adjust the audio timing. … pic.twitter.com/brvguoglks

– Seiiir😈 Video Generation AI × After Effects (@seiiiiiiiiiir) June 10, 2025

The twenty-to-one generation ratio means creators can experiment with different audio approaches on Kling, while Veo 3 users have to nail their sound design in fewer attempts.

For hobbyists and those learning generative video, Kling's approach offers more room for trial and error.

But professional creatives who need precise audio-visual synchronization and dialogue will find Veo 3's sophisticated sound engine worth the premium.

Video generation quality

Video quality testing yielded unexpected results. In a test scene of a woman fleeing a giant spider, the Standard version of Kling 2.1 outperformed both Veo 3 and Kling's own Master edition.

The Standard model accurately depicted the scene's dynamics, showing fluid motion and correct directional movement. Veo 3 inexplicably generated the woman running toward the spider instead of away from it.

The Master edition usually produces sharper, crisper visuals, but the Standard version showed a better grasp of the scene and smoother motion.

This is unusual, since higher resolution should generally translate into better results. The difference may come down to compromises made to speed up generation, technical problems, or simply bad luck with that particular generation.

Overall, Kling 2.1 Standard with 1080p generations is a great model that holds its own against Google's Veo 3 here.

Platform workflows and limitations

Platform limitations shape each tool's workflow differently. Kling 2.1's audio feature works only with image-to-video generation, not text-to-video, which remains exclusive to the Master edition and has no audio support. It's an odd choice, but that's how it is.

The best workaround is to use Kolors, Kuaishou's image generator, to create initial frames before turning them into videos with synchronized sound. Kolors produces highly realistic images that serve as an excellent starting point for a video.

However, you may find that models such as Reve, Midjourney, Recraft, Flux, and even ChatGPT are easier to use.

Veo 3 took the opposite approach, offering only text-to-video generation with no image-to-video option.

This forces users to rely entirely on prompt engineering, with no way to control the opening frame.

Google's decision seems especially strange considering that the previous Veo 2 actually supports image-to-video through Google's Flow platform.

This lack of visual control means users have to generate videos blindly, hoping their text prompts produce the desired opening frames.

Content moderation approaches

Content moderation revealed contrasting philosophies. Veo 3 uses aggressive keyword filtering and post-generation checks, blocking content that violates Google's policies.

The system flags potentially problematic prompts before generation and analyzes finished videos for policy violations.

Kling applies more liberal restrictions, allowing content that Veo would block outright.

However, the model's training data naturally excluded explicit content: it generates figures without anatomical detail and violence without gore.

As a result, users can generate certain types of content that would trip keyword filters elsewhere, while safety boundaries stay in place.

Both platforms refund credits when moderation blocks a video after generation, but Kling's lighter touch allows more creative freedom within those boundaries.

Conclusion

Veo 3 may still be king, but Kling 2.1 is making a credible populist bid to overthrow the monarchy.

Its audio feature is quite revolutionary when you consider a $9 tool competing against a $250 subscription.

The atmospheric sounds work: rain sounds like rain, footsteps match the movement most of the time, and you can generate twenty attempts while Veo users carefully craft a single shot.

The post-production feature, which adds sound to finished videos, is something Google doesn't offer, and it's genuinely useful for rescuing silent clips.

Things look entirely different if your main goal is speech. Kling's gibberish won't fool anyone.

For that specific need, Google Veo 3 is the obvious and only choice. The king is (almost) dead. Long live Kling!

Edited by Josh Quittner and Sebastian Sinclair
