Testing open-source AI lip-sync solutions: aigcpanel with wav2lip, LatentSync, and MuseTalk.

Let’s talk about AI deployment

LatentSync 1.5 has a one-click Windows executable.

I tested it on an internet cafe machine with an RTX 4070 graphics card and on my own P106-100 6GB graphics card, and it worked perfectly, with reasonably good performance.

MuseTalk is a newer model that both xAI and ChatGPT recommended to me, and it is said to be better than LatentSync.

However, MuseTalk has no Windows executable, let alone a one-click package.

I tried deploying it from the terminal with Python and conda on Windows, and I also tried Python on Ubuntu Server. The conda deployment failed repeatedly for various inexplicable reasons, just throwing errors. I have no idea how the people online who claim to have deployed it successfully managed it. After all, I’m a programmer with years of experience, yet I couldn’t get it to work.
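In hindsight, before blaming the model code, it’s worth running a quick environment check; a mismatched PyTorch/CUDA build is a common cause of this kind of inexplicable conda failure. A minimal sanity-check sketch, assuming PyTorch is installed (MuseTalk is PyTorch-based):

```python
# Minimal GPU/CUDA sanity check before debugging a failed deployment.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))
    # The P106-100 is Pascal (compute capability 6.1): CUDA works,
    # but there are no tensor cores, hence the slow inference below.
    print("compute capability:", f"{props.major}.{props.minor}")
```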

I didn’t look into wav2lip in detail; by the time I got around to it, I had already discovered aigcpanel. It integrates the mainstream large models, works offline, and has a GUI, so I used aigcpanel to test wav2lip, LatentSync, and MuseTalk. Unfortunately, my graphics card’s processing power is too weak; a video of just a few seconds usually takes several hours to process.

GitHub address: https://github.com/modstart-lib/aigcpanel

Actual results

LatentSync performs the best, but it’s also the most demanding: if any frame lacks a face or is too blurry at 480p, it throws an error and stops processing immediately (see the pre-screening sketch after these scores). It scores 85 points.

MuseTalk performs similarly to LatentSync, processing 480p video well. It scores around 75 points.

Wav2lip performs the worst: the teeth come out unnaturally white and the mouth is out of sync with the audio, which feels very unnatural. It scores around 35-50 points.
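Since LatentSync aborts partway through, one bad frame can waste hours of processing. Before committing to a long run, it may be worth pre-screening the clip for frames without a detectable face. Below is a minimal sketch using OpenCV’s bundled Haar detector as a rough proxy; LatentSync’s own face detector may behave differently, and the input file name is hypothetical.

```python
# Flag frames with no detectable face before starting a multi-hour job.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("input_480p.mp4")  # hypothetical input clip

idx, bad = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        bad.append(idx)  # likely to make LatentSync abort
    idx += 1
cap.release()

print(f"{len(bad)} frames without a detectable face:", bad[:20])
```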

Why did I mention 480p processing?

Because my graphics card is a P106-100 6GB: it’s ancient, has no dedicated AI acceleration (it’s a Pascal card with no tensor cores), and its VRAM is limited.

A 1080p 18-second video takes MuseTalk 8 hours to process.

A 480p video takes MuseTalk 3 hours to process (afterwards, I can use an AI video super-resolution tool like Video2X Qt6 to upscale the 480p output to 1080p).

I’ve tested this: my computer takes 1 hour to upscale an 18-second 480p video to 1080p.
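For reference, here is what frame-by-frame super-resolution looks like in code. This is only an illustration of the idea using OpenCV’s dnn_superres module (opencv-contrib-python), not Video2X’s actual pipeline; the ESPCN model file is downloaded separately, the file names are hypothetical, and audio would still need to be remuxed afterwards (e.g. with ffmpeg).

```python
# Upscale a clip 2x, one frame at a time, with a pretrained ESPCN model.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x2.pb")      # pretrained model, fetched separately
sr.setModel("espcn", 2)          # 2x; Video2X can target 1080p directly

cap = cv2.VideoCapture("lipsync_480p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) * 2
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) * 2
out = cv2.VideoWriter("lipsync_2x.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(sr.upsample(frame))  # note: output video has no audio track

cap.release()
out.release()
```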

So I have two options:

One is a 1080p 18-second video (8 hours);

The other is a 480p 18-second video (3 hours, plus 1 hour of super-resolution to 1080p), totaling 4 hours.

For a 1-minute video, multiply all of these times by roughly 3 (see the quick calculation below).
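To make the trade-off concrete, here is the back-of-the-envelope arithmetic in plain Python, using the timings measured above:

```python
# Compare the two pipelines, scaled from an 18-second clip to 1 minute.
measured_clip = 18                # seconds of the test clip
target = 60                       # seconds of the final video
scale = target / measured_clip    # ~3.3x, the "multiply by 3" above

direct_1080p = 8 * scale          # MuseTalk straight at 1080p
pipeline_480p = (3 + 1) * scale   # MuseTalk at 480p + 1 h super-resolution

print(f"1080p direct:   {direct_1080p:.1f} h")   # ~26.7 h
print(f"480p + upscale: {pipeline_480p:.1f} h")  # ~13.3 h
```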

Basically, I just let the lip-sync and super-resolution run overnight, and I have a 1-minute 1080p video the next day.

Once I start earning revenue from YouTube, I’ll switch to a 3060; a machine with 16GB of RAM and a 4070 16G graphics card could increase processing speed by 5-10 times.

By the way, processing an 18-second 480p video with wav2lip takes only 6 minutes, but the quality really is hard to accept.

Some ideas

I spent several days trying to deploy MuseTalk; thankfully I found aigcpanel, which let me use it directly through a GUI on Windows.

This also unexpectedly let me test wav2lip at the same time, without deploying it separately or hunting for a pre-built package.

I don’t need to ask AI tools, browse forums, or dig through search engines to find out which model performs best, because there are too many scams and misleading claims online to trust them anyway.

Commercial software, on the other hand, curates the mainstream models for you: LatentSync, MuseTalk, wav2lip, and Heygem are all carefully selected by companies.

Secondly, as for which model performs best: the paid ones generally do, like the VIP models shown in the image.

Take Heygem: nobody is willing to pay for a model that performs poorly, so selection by money is the most reliable signal of quality.

For now, I haven’t paid to try Heygem, partly because my graphics card is underpowered, and partly because I’m already very satisfied with MuseTalk’s free, offline, local results.

Tips

The above are the tests and results from my video lip-syncing experiments.
