Generating Lip Sync

Learn how to generate AI lip sync videos using URLs or direct file uploads.

CHAMELAION provides two endpoints for generating lip sync videos: one accepts media URLs and another accepts direct file uploads. Both create an asynchronous job and return a request_id for tracking.

Option 1: Generate from URLs

Use POST /v1/lipsync/generate when your video and audio are hosted at publicly accessible URLs. The examples in this guide call it at https://api.chamelaion.com/api/v1/lipsync/generate. This is the most common integration pattern.

Request format

{
  "reference_id": "normal-lipsync-demo",
  "disable_active_speaker_detection": false,
  "inputs": [
    {
      "type": "video",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4"
    },
    {
      "type": "audio",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav"
    }
  ]
}

Parameters

Parameter	Type	Required	Description
`inputs`	array	Yes	Exactly two items: one `video` and one `audio` input (order doesn’t matter)
`inputs[].type`	string	Yes	Either `"video"` or `"audio"`
`inputs[].url`	string	Yes	Publicly accessible URL to the media file
`reference_id`	string	No	Your own identifier for this request, useful for linking to your systems
`disable_active_speaker_detection`	boolean	No	Set to `true` to skip active speaker detection and use max-face mode (default: `false`)
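
The rules in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any CHAMELAION SDK: the function name `validate_generate_payload` and its error messages are hypothetical, and the server may apply additional checks that are not described here.

```python
from typing import Any


def validate_generate_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a URL-based request body (empty if none)."""
    problems: list[str] = []

    inputs = payload.get("inputs")
    if not isinstance(inputs, list) or len(inputs) != 2:
        return ["`inputs` must be an array of exactly two items"]

    # Order does not matter, so compare the sorted types.
    types = sorted(str(item.get("type")) for item in inputs if isinstance(item, dict))
    if types != ["audio", "video"]:
        problems.append("`inputs` must contain one `video` and one `audio` item")

    for item in inputs:
        url = item.get("url") if isinstance(item, dict) else None
        if not isinstance(url, str) or not url:
            problems.append("every input needs a non-empty string `url`")
            break

    # The flag is optional; when present it should be a real boolean.
    flag = payload.get("disable_active_speaker_detection", False)
    if not isinstance(flag, bool):
        problems.append("`disable_active_speaker_detection` must be a boolean")

    return problems
```

An empty list means the body satisfies the documented constraints; it does not confirm that the URLs are reachable.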

cURL example

curl -X POST https://api.chamelaion.com/api/v1/lipsync/generate \
  -H "Authorization: Bearer $CHAMELAION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "reference_id": "normal-lipsync-demo",
    "inputs": [
      {"type": "video", "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4"},
      {"type": "audio", "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav"}
    ]
  }'

Python example

import requests
import os

response = requests.post(
    "https://api.chamelaion.com/api/v1/lipsync/generate",
    headers={
        "Authorization": f"Bearer {os.environ['CHAMELAION_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "reference_id": "normal-lipsync-demo",
        "inputs": [
            {"type": "video", "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4"},
            {"type": "audio", "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav"},
        ],
    },
)

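# Raise an exception on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()
result = response.json()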
print(f"Request ID: {result['request_id']}")

TypeScript example

const response = await fetch(
  "https://api.chamelaion.com/api/v1/lipsync/generate",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CHAMELAION_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      reference_id: "normal-lipsync-demo",
      inputs: [
        {
          type: "video",
          url: "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4",
        },
        {
          type: "audio",
          url: "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav",
        },
      ],
    }),
  },
);

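// Stop early on 4xx/5xx responses instead of parsing an error body.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();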
console.log(`Request ID: ${result.request_id}`);

Response

{
  "status": "success",
  "request_id": "6f82a2d8-a6d4-4e8a-a0fa-e8b09823a2d8"
}
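
Because the job runs asynchronously, the request_id is the only handle you get back, so it is worth extracting it defensively. The sketch below assumes that any `status` other than "success" indicates a failure; the shape of error responses is not described in this guide, and `extract_request_id` is a hypothetical helper name.

```python
from typing import Any


def extract_request_id(body: dict[str, Any]) -> str:
    """Return the request_id from a parsed response body, or raise if it is missing."""
    # Assumption: anything other than "success" means the job was not created.
    if body.get("status") != "success":
        raise RuntimeError(f"Unexpected response: {body!r}")

    request_id = body.get("request_id")
    if not isinstance(request_id, str) or not request_id:
        raise RuntimeError("Response did not include a request_id")
    return request_id
```

Store the returned value alongside your own record so you can look the job up later.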

Option 2: Generate from uploaded files

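Use POST /v1/lipsync/generate-with-media when you need to upload files directly rather than providing URLs. The examples in this guide call it at https://api.chamelaion.com/api/v1/lipsync/generate-with-media. This endpoint accepts multipart/form-data.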

Parameters

Field	Type	Required	Description
`video`	file	Yes	Source video file (MP4)
`audio`	file	Yes	Target audio file (WAV or MP3)
`model`	string	No	Model to use (currently `"lipsync-2"`)
`reference_id`	string	No	Your own identifier for this request
`disable_active_speaker_detection`	boolean	No	Set to `true` to skip speaker detection (default: `false`)
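
The table above lists MP4 for video and WAV or MP3 for audio. A small pre-flight check can catch an obviously wrong file before it is uploaded. This is a sketch under assumptions: it looks only at the file extension, `upload_content_types` is a hypothetical helper, and how the server itself validates uploads is not described here.

```python
from pathlib import Path

# Formats listed in the parameters table, mapped to their standard MIME types.
VIDEO_TYPES = {".mp4": "video/mp4"}
AUDIO_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def upload_content_types(video_path: str, audio_path: str) -> tuple[str, str]:
    """Return (video MIME type, audio MIME type), or raise ValueError for other formats."""
    video_ext = Path(video_path).suffix.lower()
    audio_ext = Path(audio_path).suffix.lower()

    if video_ext not in VIDEO_TYPES:
        raise ValueError(f"video must be an MP4 file, got {video_path!r}")
    if audio_ext not in AUDIO_TYPES:
        raise ValueError(f"audio must be a WAV or MP3 file, got {audio_path!r}")

    return VIDEO_TYPES[video_ext], AUDIO_TYPES[audio_ext]
```

The returned MIME types can be passed as the third element of each `files` tuple when uploading with `requests`.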

cURL example

curl -X POST https://api.chamelaion.com/api/v1/lipsync/generate-with-media \
  -H "Authorization: Bearer $CHAMELAION_API_KEY" \
  -F "video=@/path/to/source-video.mp4" \
  -F "audio=@/path/to/target-audio.wav" \
  -F "reference_id=upload-demo-01"

Python example

import requests
import os

with open("source-video.mp4", "rb") as video, open("target-audio.wav", "rb") as audio:
    response = requests.post(
        "https://api.chamelaion.com/api/v1/lipsync/generate-with-media",
        headers={"Authorization": f"Bearer {os.environ['CHAMELAION_API_KEY']}"},
        files={
            "video": ("video.mp4", video, "video/mp4"),
            "audio": ("audio.wav", audio, "audio/wav"),
        },
        data={
            "reference_id": "upload-demo-01",
        },
    )

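# Raise an exception on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()
result = response.json()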
print(f"Request ID: {result['request_id']}")

TypeScript example (Node.js)

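import { readFile } from "node:fs/promises";

// Node's built-in FormData (Node 18+) accepts Blob values, not read streams,
// so each file is read into memory and wrapped in a Blob with a filename.
const formData = new FormData();
formData.append(
  "video",
  new Blob([await readFile("source-video.mp4")], { type: "video/mp4" }),
  "source-video.mp4",
);
formData.append(
  "audio",
  new Blob([await readFile("target-audio.wav")], { type: "audio/wav" }),
  "target-audio.wav",
);
formData.append("reference_id", "upload-demo-01");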

const response = await fetch(
  "https://api.chamelaion.com/api/v1/lipsync/generate-with-media",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CHAMELAION_API_KEY}`,
    },
    body: formData,
  },
);

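// Stop early on 4xx/5xx responses instead of parsing an error body.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();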
console.log(`Request ID: ${result.request_id}`);

The response format is identical to the URL-based endpoint.

Active speaker detection

Active speaker detection (ASD) decides which visible face is currently speaking before applying lip sync. When ASD is enabled, CHAMELAION analyzes the video and audio together, chooses the active speaker, and syncs only that person’s mouth. This is the default behavior.

ASD is useful for:

Multi-person interview or conversation videos
Videos with background faces or audience members
News broadcasts with an anchor and on-screen graphics

Example with active speaker detection

Omit disable_active_speaker_detection or set it to false to use ASD. This example uses the active-speaker test media:

{
  "reference_id": "with-asd-demo",
  "disable_active_speaker_detection": false,
  "inputs": [
    {
      "type": "video",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4"
    },
    {
      "type": "audio",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav"
    }
  ]
}

Example without active speaker detection

Set disable_active_speaker_detection to true to skip ASD and use max-face mode instead. In max-face mode, CHAMELAION syncs the largest detected face in the video. This can be faster and works well for simple single-speaker videos where the main face is clearly the target.

{
  "reference_id": "without-asd-demo",
  "disable_active_speaker_detection": true,
  "inputs": [
    {
      "type": "video",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Video.mp4"
    },
    {
      "type": "audio",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Audio.wav"
    }
  ]
}
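
The two request bodies above differ only in the ASD flag, so a single builder can produce both. The following sketch uses a hypothetical helper, `build_lipsync_payload`, with a hypothetical `use_asd` argument; it sends the flag only when opting out, relying on the documented default of `false`.

```python
from typing import Any, Optional


def build_lipsync_payload(
    video_url: str,
    audio_url: str,
    *,
    use_asd: bool = True,
    reference_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a URL-based request body, with or without active speaker detection."""
    payload: dict[str, Any] = {
        "inputs": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},
        ]
    }
    if not use_asd:
        # Omitting the flag keeps the default (ASD on); send it only for max-face mode.
        payload["disable_active_speaker_detection"] = True
    if reference_id is not None:
        payload["reference_id"] = reference_id
    return payload
```

Calling it with `use_asd=False` and `reference_id="without-asd-demo"` yields a body equivalent to the max-face example, apart from key order.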

Using reference IDs

The reference_id field lets you tag requests with your own identifiers. This is useful for:

Linking CHAMELAION requests to your internal database records
Retrieving requests by your own ID instead of the CHAMELAION UUID
Batch tracking and reporting

{
  "reference_id": "order-12345-japanese-dub",
  "inputs": [
    {
      "type": "video",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.mp4"
    },
    {
      "type": "audio",
      "url": "https://storage.googleapis.com/chamelaion-test-media/Example_Active_Speaker_Detection.wav"
    }
  ]
}

You can then retrieve the request using:

curl https://api.chamelaion.com/api/v1/lipsync/requests/order-12345-japanese-dub \
  -H "Authorization: Bearer $CHAMELAION_API_KEY"

Or filter your request list:

curl "https://api.chamelaion.com/api/v1/lipsync/requests?reference_id=order-12345-japanese-dub" \
  -H "Authorization: Bearer $CHAMELAION_API_KEY"
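
If your reference IDs are generated from user data, they may contain characters that need escaping in a URL. The sketch below builds both lookup URLs with the standard library; the helper names are hypothetical, and whether the API accepts reference IDs containing such characters is an assumption this guide does not confirm.

```python
from urllib.parse import quote, urlencode

REQUESTS_URL = "https://api.chamelaion.com/api/v1/lipsync/requests"


def request_lookup_url(identifier: str) -> str:
    """URL for fetching one request by its request_id or reference_id."""
    # safe="" also escapes "/" so the identifier stays a single path segment.
    return f"{REQUESTS_URL}/{quote(identifier, safe='')}"


def request_filter_url(reference_id: str) -> str:
    """URL for listing requests filtered by reference_id."""
    return f"{REQUESTS_URL}?{urlencode({'reference_id': reference_id})}"
```

For a plain ID such as `order-12345-japanese-dub`, both functions return the same URLs used in the cURL commands above.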

Which endpoint should I use?

Use URL-based generation when…

Your media is already hosted (S3, GCS, CDN, etc.)
You’re building a server-side pipeline
You want to avoid upload overhead
You’re processing many files in a batch workflow

Use file upload when…

You’re working with local files
Your media isn’t publicly accessible
You’re building a desktop or CLI tool
You need to process files before they’re stored permanently
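
The choice above can be roughed out in code for tools that accept either kind of input. This is only a sketch: `choose_endpoint` is a hypothetical helper, and an `http(s)` scheme does not prove that a URL is publicly accessible, so private URLs still need to be downloaded and uploaded as files.

```python
URL_ENDPOINT = "/v1/lipsync/generate"
UPLOAD_ENDPOINT = "/v1/lipsync/generate-with-media"


def _is_url(source: str) -> bool:
    return source.lower().startswith(("http://", "https://"))


def choose_endpoint(video: str, audio: str) -> str:
    """Pick an endpoint path based on whether the sources are URLs or local paths."""
    video_is_url, audio_is_url = _is_url(video), _is_url(audio)

    # Each endpoint takes both media items the same way, so mixed inputs
    # need to be normalised (for example, by downloading the hosted file) first.
    if video_is_url != audio_is_url:
        raise ValueError("video and audio must both be URLs or both be local files")

    return URL_ENDPOINT if video_is_url else UPLOAD_ENDPOINT
```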
