✍️ Editing Transcripts Before Rendering

Auto-transcription is accurate, but it won't always get brand names, people's names, or industry-specific terms right. This guide walks through editing the transcript before the video is rendered, so corrections appear in the final output without any post-production cleanup.

Flow overview

POST /videos/{videoId}/task   (autoApprove: false)

GET /videos/{videoId}/task/{taskId} (poll until transcriptionCompleted)

Download transcript JSON from the `transcript` URL

Edit the word entries in your app/UI

PUT /videos/{videoId}/task/{taskId}/transcript (save edits)

POST /videos/{videoId}/task/{taskId}/approve-transcript (trigger render)

Common mistake

POST /approve-transcript does not accept a request body. It only triggers rendering with whatever transcript is currently stored on the task. If you skip the PUT step and send your edits to approve-transcript instead, they are silently ignored and the video renders with the original, unedited transcript.

Step 1: Create the task with autoApprove: false

See API Reference: Create Video Task

curl -X POST "https://api.zapcap.ai/videos/YOUR_VIDEO_ID/task" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "YOUR_TEMPLATE_ID",
    "autoApprove": false,
    "language": "en"
  }'

Save the returned taskId.

Step 2: Poll until transcription completes

curl -X GET "https://api.zapcap.ai/videos/YOUR_VIDEO_ID/task/YOUR_TASK_ID" \
  -H "x-api-key: YOUR_API_KEY"

Wait for status to become transcriptionCompleted. A 2–5 second poll interval is fine.

The response includes a transcript field pointing to a signed URL where the generated transcript JSON can be downloaded.
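A polling loop with no upper bound can hang if a task stalls. Below is a minimal sketch of a bounded poll. The names `waitForStatus` and `getTask` and the option names are hypothetical helpers, not part of the ZapCap API, and the `failed` status is taken from the Node.js example in this guide.

```javascript
// Hypothetical helper: polls getTask() until the task reaches `target`.
// getTask is any async function that returns the task object, for example
// a wrapper around GET /videos/{videoId}/task/{taskId}.
async function waitForStatus(getTask, target, { intervalMs = 3000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const task = await getTask();
    if (task.status === target) return task;
    if (task.status === "failed") throw new Error("Task failed");
    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for status "${target}"`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The same helper could be reused later with a target of `completed` to wait for the render.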

Step 3: Download and edit the transcript

Download the transcript JSON from the URL returned in Step 2. It's an array of word entries:

[
  {
    "text": "Acme",
    "type": "word",
    "start_time": 0.12,
    "end_time": 0.48,
    "confidence": 0.82
  },
  {
    "text": "Corp",
    "type": "word",
    "start_time": 0.52,
    "end_time": 0.86,
    "confidence": 0.77
  }
]

Each entry has:

  • text (string) — the word itself
  • type ("word" | "punctuation")
  • start_time / end_time (number, seconds)
  • emoji (string, optional)
  • important (boolean, optional — flags the word for highlight rendering)
  • fontId (string, optional — per-word font override; see Custom Fonts)

Pass this array to your editing UI. Users typically change only text, but every field except confidence (the score shown in the sample above) can be updated.
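As an illustration of a text-only edit, the sketch below replaces misheard words using a lookup table. `applyCorrections` and the `corrections` map are hypothetical; only the entry fields come from the transcript format above.

```javascript
// Hypothetical helper: returns a new transcript array with corrected `text`.
// Only entries of type "word" are touched; timings and other fields are kept.
function applyCorrections(transcript, corrections) {
  // Lower-case the keys once so lookups are case-insensitive.
  const lookup = new Map(
    Object.entries(corrections).map(([wrong, right]) => [wrong.toLowerCase(), right])
  );
  return transcript.map((entry) => {
    if (entry.type !== "word") return entry;
    const fixed = lookup.get(entry.text.toLowerCase());
    return fixed === undefined ? entry : { ...entry, text: fixed };
  });
}

const sample = [
  { text: "Acme", type: "word", start_time: 0.12, end_time: 0.48, confidence: 0.82 },
  { text: "Corp", type: "word", start_time: 0.52, end_time: 0.86, confidence: 0.77 },
];
const edited = applyCorrections(sample, { acme: "ACME" });
console.log(edited.map((e) => e.text).join(" ")); // prints "ACME Corp"
```

Returning a new array instead of mutating the input keeps the downloaded transcript available for a diff or an undo in the editing UI.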

Step 4: PUT the edited transcript

See API Reference: Update Transcript

Send the edited array back as the request body:

curl -X PUT "https://api.zapcap.ai/videos/YOUR_VIDEO_ID/task/YOUR_TASK_ID/transcript" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @edited-transcript.json

Entries must stay time-ordered and non-overlapping: each end_time >= start_time, and each entry's start_time must be >= the previous entry's end_time.
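Checking these rules on the client before the PUT can surface problems while the user is still editing. The following is a minimal sketch of such a check; `validateTranscript` is a hypothetical helper that encodes only the two ordering rules stated above, and the server may enforce more.

```javascript
// Hypothetical helper: returns a list of human-readable problems.
// An empty list means both ordering rules are satisfied.
function validateTranscript(entries) {
  const problems = [];
  entries.forEach((entry, i) => {
    // Rule 1: an entry may not end before it starts.
    if (entry.end_time < entry.start_time) {
      problems.push(`entry ${i} ("${entry.text}") ends before it starts`);
    }
    // Rule 2: an entry may not start before the previous one ends.
    if (i > 0 && entry.start_time < entries[i - 1].end_time) {
      problems.push(`entry ${i} ("${entry.text}") overlaps entry ${i - 1}`);
    }
  });
  return problems;
}
```

A caller might run this on the edited array and block the save when the returned list is non-empty.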

Step 5: Approve and render

See API Reference: Approve Transcript

curl -X POST "https://api.zapcap.ai/videos/YOUR_VIDEO_ID/task/YOUR_TASK_ID/approve-transcript" \
  -H "x-api-key: YOUR_API_KEY"

No body. This promotes the task out of the transcriptionCompleted state and kicks off rendering with the transcript you just saved. Poll GET /videos/{videoId}/task/{taskId} again until status === 'completed', then download from downloadUrl.

Node.js example

const BASE = "https://api.zapcap.ai";
const API_KEY = process.env.ZAPCAP_API_KEY;

// Calls the API and throws on non-2xx responses instead of failing silently.
async function api(method, path, body) {
  const headers = { "x-api-key": API_KEY };
  // Only declare a JSON body when one is actually sent.
  if (body !== undefined) headers["Content-Type"] = "application/json";
  const res = await fetch(`${BASE}${path}`, {
    method,
    headers,
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${method} ${path} failed: HTTP ${res.status}`);
  return res;
}

async function captionWithEdits(videoId, templateId, editFn) {
  // 1. Create task with autoApprove off so rendering waits for approval
  const { taskId } = await api("POST", `/videos/${videoId}/task`, {
    templateId,
    autoApprove: false,
    language: "en",
  }).then((r) => r.json());
  const taskPath = `/videos/${videoId}/task/${taskId}`;

  // 2. Poll until transcribed
  let task;
  while (true) {
    task = await api("GET", taskPath).then((r) => r.json());
    if (task.status === "transcriptionCompleted") break;
    if (task.status === "failed") throw new Error("Transcription failed");
    await new Promise((r) => setTimeout(r, 3000));
  }

  // 3. Download transcript from the signed URL
  const res = await fetch(task.transcript);
  if (!res.ok) throw new Error(`Transcript download failed: HTTP ${res.status}`);
  const transcript = await res.json();

  // 4. Let the caller edit it
  const edited = await editFn(transcript);

  // 5. PUT edits (skipping this renders the original transcript)
  await api("PUT", `${taskPath}/transcript`, edited);

  // 6. Approve (no request body)
  await api("POST", `${taskPath}/approve-transcript`);

  return taskId;
}

Tips

  • Pre-seed corrections. If you already know which brand names and jargon are often transcribed incorrectly, pass them via the dictionary field on POST /task to improve first-pass accuracy. This reduces how much your reviewers need to edit.
  • Bring your own transcript. If you already have word-level timing from another system, pass it via the transcript field on POST /task and skip transcription entirely.
  • Per-word styling. Setting important: true on a word makes the template apply its highlight style. Per-word fontId also works for mixing fonts within a single caption.
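The per-word styling tip can be applied in bulk before the PUT. The sketch below flags every occurrence of a set of terms; `highlightTerms` is a hypothetical helper, and it relies only on the important field described in Step 3.

```javascript
// Hypothetical helper: sets `important: true` on words that match `terms`.
// Matching is case-insensitive; other entries are returned unchanged.
function highlightTerms(transcript, terms) {
  const wanted = new Set(terms.map((t) => t.toLowerCase()));
  return transcript.map((entry) =>
    entry.type === "word" && wanted.has(entry.text.toLowerCase())
      ? { ...entry, important: true }
      : entry
  );
}
```

For example, `highlightTerms(transcript, ["Acme"])` would mark each "Acme" word entry for the template's highlight style and leave all timings untouched.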