This API facilitates audio transcription by submitting audio data to the OpenAI Whisper backend. The following documentation details the messaging sequence and data structures involved in requesting and receiving transcriptions.
To initiate a transcription, the client sends a TranscriptionRequest message.
{
mt: "TranscriptionRequest",
lang: "de",
task: "transcribe",
prompt: "Please write words exactly as spoken. Mixed languages possible.",
response_format: "verbose_json",
timestamp_granularities[]: "word",
src: "src0"
}
| string mt | message type [mandatory] |
| string lang | the language of the audio. If not provided or set to "no-entry", Whisper will automatically (not always correctly) detect the language [optional]. Supported values: "ca", "cs", "de", "en", "es", "eu", "fr", "it", "nl", "pl", "pt", "ru", "si", "tr", "no-entry" |
| string task | the given task. Defaults to "transcribe". Other option: "translate" (transcribes the audio and outputs the text translated into English) [optional] |
| string response_format | the format of the returned transcription. Defaults to "text". Other possible formats are json, text, srt, verbose_json, vtt, or diarized_json [optional]. For more information, see the Whisper API documentation |
| string timestamp_granularities[] | the timestamp detail level for timestamped formats (ignored for pure text responses) [optional]. For more information, see the Whisper API documentation |
| string src | client-defined unique identifier for this request. Used to coordinate messages when multiple requests run in parallel [mandatory] |
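Only mt and src are mandatory. A minimal request therefore looks like the following; the service then auto-detects the language and falls back to the defaults task "transcribe" and response_format "text":
{
mt: "TranscriptionRequest",
src: "src0"
}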
The service answers with a TranscriptionRequestResult message that contains the URL to which the audio data must be uploaded.
{
mt: "TranscriptionRequestResult",
url: "http://localhost/transcriptions/?SessionId=3435407065703356068&TranscriptionId=4089590383170956585",
src: "src0"
}
| string mt | message type "TranscriptionRequestResult" |
| string url | the upload endpoint to which the client sends the audio data via HTTP POST. It includes both the service-assigned SessionId and TranscriptionId. SessionId identifies the current session; TranscriptionId refers to the specific audio stream. |
| string src | same source identifier as in the request |
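After receiving the TranscriptionRequestResult, the client uploads the raw audio data to the returned url via HTTP POST. A minimal sketch, assuming the runtime provides the fetch API and that audioBlob holds the audio to transcribe (the complete XMLHttpRequest-based example at the end of this page shows the same step in context):
// Minimal upload sketch; audioBlob is assumed to contain the audio data
async function uploadAudio(uploadUrl, audioBlob) {
    const response = await fetch(uploadUrl, { method: "POST", body: audioBlob });
    if (!response.ok) {
        console.error("Upload failed:", response.status);
    }
}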
Once the audio has been uploaded, the service returns the transcription in one or more TransferTranscribedAudio messages.
{
mt: "TransferTranscribedAudio",
SessionID: 3435407065703356068,
TranscriptionID: 4089590383170956585,
DataComplete: true,
TranscribedAudioData: "Happy transcriptions\n",
DataLength: 21,
src: "src0"
}
| string mt | message type "TransferTranscribedAudio" |
| ulong64 SessionID | matches SessionID from the request result |
| ulong64 TranscriptionID | matches TranscriptionID from the request result |
| boolean DataComplete | indicates whether the transcription is complete. False means additional chunks will follow; true marks the final chunk |
| string TranscribedAudioData | the transcribed text (format depends on response_format) |
| ulong64 DataLength | the length (in bytes) of TranscribedAudioData. This value represents the raw byte length of the TranscribedAudioData field, whether the content is plain text or a structured format such as JSON |
| string src | same source identifier as in the request |
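Since a transcription can arrive in several TransferTranscribedAudio messages, a client typically appends the TranscribedAudioData of each message to a buffer until DataComplete is true. A minimal sketch (the handler name and the transcript variable are illustrative, not part of the API):
let transcript = "";   // accumulates the chunks of one transcription

// Illustrative handler: call it for every received TransferTranscribedAudio message
function onTransferTranscribedAudio(msg) {
    transcript += msg.TranscribedAudioData;   // append the current chunk
    if (msg.DataComplete) {
        console.log("Transcription finished:", transcript);   // final chunk received
    }
}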
If an error occurs, the service sends a final TransferTranscribedAudio message indicating failure.
{
mt: "TransferTranscribedAudio",
SessionID: 3435407065703356068,
TranscriptionID: 4089590383170956585,
DataComplete: true,
TranscribedAudioData: "",
DataLength: 0,
error: "HTTP_AUTHENTICATION_FAILED",
src: "src0"
}
| string mt | message type; remains "TransferTranscribedAudio" for consistency |
| ulong64 SessionID | matches SessionID from the request result |
| ulong64 TranscriptionID | matches TranscriptionID from the request result, identifying the failed transcription |
| boolean DataComplete | always true for errors |
| string TranscribedAudioData | empty |
| ulong64 DataLength | always 0 for errors |
| string error | human-readable reason for the failure |
| string src | same source identifier as in the request |
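A client can detect the failure by checking the error property of the final message. A minimal sketch (the isTranscriptionError helper is illustrative, not part of the API):
// Returns true and logs the reason if a received message signals a failed transcription
function isTranscriptionError(msg) {
    if (msg.mt === "TransferTranscribedAudio" && msg.error) {
        console.error("Transcription failed:", msg.error);   // e.g. "HTTP_AUTHENTICATION_FAILED"
        return true;   // DataComplete is true, no further chunks will follow
    }
    return false;
}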
In short: messages belonging to one transcription are identified by SessionId & TranscriptionId. DataComplete = false means streaming is in progress; DataComplete = true marks the final message.
You can consume the API com.innovaphone.transcriptions and send requests to it. Inside the callback you can access the received message via recv.msg and check its type via recv.msg.mt.
var transcriptionsApi = start.consumeApi("com.innovaphone.transcriptions");

// file: a Blob/File containing the audio data to transcribe
function requestTranscription(file) {
    transcriptionsApi.sendSrc(
        { mt: "TranscriptionRequest", lang: "de", src: "src0" },
        transcriptionsApi.providers[0],
        (recv) => onRequest(recv, file)
    );
}

// Callback for TranscriptionRequest: receives the TranscriptionRequestResult first,
// then the TransferTranscribedAudio message(s) carrying the transcribed text
function onRequest(recv, file) {
    const msg = recv.msg;
    if (msg.mt === "TranscriptionRequestResult") {
        // Upload the audio file via HTTP POST to the returned URL
        const postUrl = msg.url;
        const httpReq = new XMLHttpRequest();
        httpReq.open("POST", postUrl, true);
        httpReq.onload = () => {
            if (httpReq.status === 200) {
                console.log("Audio uploaded successfully.");
            } else {
                console.error("Upload failed:", httpReq.statusText);
            }
        };
        httpReq.onerror = () => console.error("Network error during upload.");
        httpReq.send(file);
    }
    else if (msg.mt === "TransferTranscribedAudio") {
        console.log("Transcribed text:", msg.TranscribedAudioData);
        if (msg.DataComplete) {
            console.log("Transcription complete.");
        }
    }
}

requestTranscription(file);