Skip to main content

⚙️ Advanced Options

This guide covers advanced features for power users who need more control over transcription, translation, and export settings.

Translation

Translate your video's subtitles into a different language using the translateTo parameter.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"translateTo": "es"
}

The API will transcribe your video in its original language, then translate the subtitles to the target language.

Supported Languages

Both language (source) and translateTo (target) support the following language codes:

af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh, yue

Dictionary

Improve transcription accuracy for domain-specific terms using the dictionary parameter. This is useful for brand names, product names, technical terms, or any words the transcription engine might not recognize correctly.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"dictionary": ["ZapCap", "OpenAI", "GPT-4", "Kubernetes"]
}

The transcription engine will use these words as hints to improve recognition accuracy.

Reference Transcript

Use a reference transcript for spelling correction when you have phrase-level timing (like an SRT file) but need word-level timestamps.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"referenceTranscript": [
{
"text": "Hello world, how are you?",
"start_time": 1.5,
"end_time": 3.2
},
{
"text": "Welcome to ZapCap.",
"start_time": 3.5,
"end_time": 5.0
}
]
}

The API will transcribe the video to generate word-level timing, then use your reference phrases to correct spellings via matching. The timing bounds improve matching confidence.

tip

If you already have word-level timing, use the transcript parameter instead to bypass transcription entirely.

Bring Your Own Transcript (BYOT)

Skip transcription entirely by providing your own word-level transcript using the transcript parameter.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"transcript": [
{
"text": "Hello",
"type": "word",
"start_time": 1.5,
"end_time": 1.8
},
{
"text": "world",
"type": "word",
"start_time": 1.9,
"end_time": 2.3
}
]
}

Word Entry Properties

PropertyTypeRequiredDescription
textstringYesThe subtitle text
typestringYesEntry type (use "word")
start_timenumberYesStart time in seconds
end_timenumberYesEnd time in seconds
emojistringNoEmoji representation of the word
importantbooleanNoMark word as important for highlighting
fontIdstringNoCustom font ID for this word

See the API Reference for the full list of word entry properties.

Export Settings

Control the output quality and format using exportSettings.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"exportSettings": {
"fps": 60,
"quality": "quadHD",
"speed": "fast"
}
}

Properties

PropertyTypeOptionsDefaultDescription
fpsnumber12120 (e.g. 23.976, 25, 29.97, 30, 59.94, 60)30Frames per second. Non-integer rates are supported for NLE timelines.
qualitystring"standard", "quadHD", "ultraHD""standard"Output resolution (1080p, 2K, 4K). Ignored when outputMode is "transparent" and explicit width/height are provided.
speedstring"standard", "fast", "ultraFast""standard"Export speed (each level ~2x faster)
outputModestring"composited", "greenScreen", "transparent""composited"Controls how captions are composited. See Output Modes.
outputCodecstring"prores4444", "vp9""prores4444"Codec for transparent renders only. See Transparent Overlay Renders.
widthnumber163840(from source)Required when outputMode is "transparent". Caps at 4K.
heightnumber163840(from source)Required when outputMode is "transparent". Caps at 4K.
greenScreenbooleantrue, falsefalseLegacy flag for chroma-key green output. Prefer outputMode: "greenScreen" going forward.

Quality Levels

  • standard: 1080p resolution
  • quadHD: 2K resolution
  • ultraHD: 4K resolution

Output Modes

outputMode controls how captions are composited into the rendered file:

  • "composited" (default) — captions are burned onto the source video. Output is .mp4.
  • "greenScreen" — captions on a #04F404 green canvas, no source video. Drop into your editor and chroma-key the green out. Output is .mp4.
  • "transparent" — captions on a fully transparent canvas (real alpha channel), no source video. Drop directly on a track above your editor's source clip and the captions composite cleanly without any keying. See below.

Transparent Overlay Renders

Transparent renders produce a caption-only video with a real alpha channel, suitable for use as an overlay layer in any NLE (Premiere, DaVinci Resolve, Final Cut, After Effects, Avid). The original clip stays untouched on its own track — there's no transcode of the source video.

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"exportSettings": {
"outputMode": "transparent",
"outputCodec": "prores4444",
"fps": 23.976,
"width": 1920,
"height": 1080
}
}

Codec choice:

CodecContainerNLE supportFile size (60s 1080p)When to use
prores4444.movNative in Premiere, Resolve, FCP, AE, Avid~5GBDefault. Editor handoff.
vp9.webmNative in browsers; Premiere needs an importer~150MBWeb overlay, browser preview, bandwidth-sensitive workflows

Required when transparent:

  • width and height — the rendered overlay's pixel dimensions. Match your NLE sequence to avoid scaling at composite time.

Recommended:

  • fps — match your NLE sequence's frame rate exactly (23.976, 25, 29.97, etc). A mismatched fps will drift over long clips.

Source uploads:

  • For overlay use cases, you can upload audio-only files (e.g. .mp3, .wav) to skip the video upload entirely — the transparent render only needs the audio for transcription and the dimensions/fps you supply in exportSettings. Audio-only uploads are only valid with outputMode: "transparent".
Cost Impact

Higher resolution, FPS, and the transparent codecs increase processing costs. See Billing for the multiplier table.

Complete Example

Here's an example combining multiple advanced options:

POST /videos/{videoId}/task
Content-Type: application/json
x-api-key: YOUR_API_KEY

{
"templateId": "a51c5222-47a7-4c37-b052-7b9853d66bf6",
"autoApprove": true,
"language": "en",
"translateTo": "es",
"dictionary": ["ZapCap", "API"],
"exportSettings": {
"fps": 60,
"quality": "quadHD",
"speed": "fast"
}
}

This creates a task that:

  1. Transcribes the video in English
  2. Uses the dictionary to improve recognition of "ZapCap" and "API"
  3. Translates the subtitles to Spanish
  4. Exports at 60fps in 2K resolution with faster processing