com.innovaphone.transcriptions

This API facilitates audio transcription by submitting audio data to the OpenAI Whisper backend. The following documentation details the messaging sequence and data structures involved in requesting and receiving transcriptions.

Messages

TranscriptionRequest
TranscriptionRequestResult
TransferTranscribedAudio

TranscriptionRequest

To initiate a transcription, the client sends a TranscriptionRequest message.

{
    mt:                         "TranscriptionRequest",
    lang:                       "de",
    task:                       "transcribe",
    prompt:                     "Please write words exactly as spoken. Mixed languages possible.",
    response_format:            "verbose_json",
    timestamp_granularities[]:  "word",
    src:                        "src0"
}

Parameters

string mt message type, "TranscriptionRequest" [mandatory]
string lang the language of the audio. If omitted or set to "no-entry", Whisper detects the language automatically, which is not always correct [optional]. Supported values: "ca", "cs", "de", "en", "es", "eu", "fr", "it", "nl", "pl", "pt", "ru", "si", "tr", "no-entry"
string task the task to perform. Defaults to "transcribe". The other option is "translate", which transcribes the audio and outputs translated text in the target language [optional]
string response_format the format of the returned transcription. Defaults to "text". Possible formats are json, text, srt, verbose_json, vtt, and diarized_json [optional]. For more information, see WhisperAPI
string timestamp_granularities[] the timestamp detail level for timestamped formats (ignored for pure text responses) [optional]. For more information, see WhisperAPI
string src client-defined unique identifier for this request. Used to correlate messages when multiple requests run in parallel [mandatory]
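
The parameter rules above can be sketched as a small helper that assembles a TranscriptionRequest. The function name buildTranscriptionRequest and the client-side validation are assumptions made for illustration. They are not part of the API, and the service may well perform its own checks.

```javascript
// Language codes taken from the lang parameter description above.
const SUPPORTED_LANGS = ["ca", "cs", "de", "en", "es", "eu", "fr", "it",
                         "nl", "pl", "pt", "ru", "si", "tr", "no-entry"];

// Hypothetical helper (not part of the API): builds a TranscriptionRequest
// message and rejects a missing src or an unlisted language code.
function buildTranscriptionRequest(src, options = {}) {
    if (typeof src !== "string" || src.length === 0) {
        throw new Error("src is mandatory");
    }
    const msg = { mt: "TranscriptionRequest", src: src };
    if (options.lang !== undefined) {
        if (!SUPPORTED_LANGS.includes(options.lang)) {
            throw new Error("unsupported lang: " + options.lang);
        }
        msg.lang = options.lang;
    }
    // Optional fields are copied only when set, so the documented defaults
    // ("transcribe", "text") apply otherwise.
    for (const key of ["task", "prompt", "response_format"]) {
        if (options[key] !== undefined) {
            msg[key] = options[key];
        }
    }
    return msg;
}
```

For example, buildTranscriptionRequest("src0", { lang: "de" }) yields a message containing only mt, src and lang.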

Response

{
    mt:      "TranscriptionRequestResult",
    url:     "http://localhost/transcriptions/?SessionId=3435407065703356068&TranscriptionId=4089590383170956585",
    src:     "src0"
}

Parameters

string mt message type "TranscriptionRequestResult"
string url the upload endpoint. Includes both the service-assigned SessionId and TranscriptionId. SessionId identifies the current session. TranscriptionId refers to the specific audio stream.
string src same source identifier as in the request.
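
If a client wants to match later messages against the identifiers in url, it can read them from the query string. The following is a minimal sketch. The helper name parseUploadUrl is hypothetical, and it assumes the query parameters are always named SessionId and TranscriptionId as in the example above. The identifiers are kept as strings because values of this size exceed Number.MAX_SAFE_INTEGER and would lose precision as JavaScript numbers.

```javascript
// Hypothetical helper (not part of the API): extracts the identifiers
// from the url field of a TranscriptionRequestResult.
function parseUploadUrl(url) {
    const params = new URL(url).searchParams;
    const sessionId = params.get("SessionId");
    const transcriptionId = params.get("TranscriptionId");
    if (sessionId === null || transcriptionId === null) {
        throw new Error("url lacks SessionId or TranscriptionId");
    }
    // Returned as strings to preserve all digits of the 64-bit values.
    return { sessionId: sessionId, transcriptionId: transcriptionId };
}
```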

Response

{
    mt:                     "TransferTranscribedAudio",
    SessionID:              3435407065703356068,
    TranscriptionID:        4089590383170956585,
    DataComplete :          true,
    TranscribedAudioData :  "Happy transcriptions\n",
    DataLength :            21,
    src:                    "src0"
}

Parameters

string mt message type "TransferTranscribedAudio"
ulong64 SessionID matches the SessionId contained in the url of the request result
ulong64 TranscriptionID matches the TranscriptionId contained in the url of the request result
boolean DataComplete indicates whether the transcription is complete. False means additional chunks will follow; true marks the final chunk
string TranscribedAudioData the transcribed text (format depends on response_format)
ulong64 DataLength the raw length in bytes of the TranscribedAudioData field, whether the content is plain text or a structured format such as JSON
string src same source identifier as in the request
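
Because a transcription may arrive in several TransferTranscribedAudio messages, a client usually has to collect the chunks until DataComplete is true. The sketch below shows one way to do this. The name createTranscriptCollector is hypothetical, and two assumptions are made that the description above does not confirm: chunks are concatenated in arrival order, and DataLength counts UTF-8 bytes.

```javascript
// Hypothetical helper (not part of the API): accumulates the chunks of one
// transcription and reports when the final chunk has arrived.
function createTranscriptCollector() {
    const encoder = new TextEncoder();
    let text = "";
    return function collect(msg) {
        const chunk = msg.TranscribedAudioData || "";
        // Assumption: DataLength is the UTF-8 byte length of this chunk.
        const lengthOk = encoder.encode(chunk).length === msg.DataLength;
        text += chunk;
        return { done: msg.DataComplete === true, lengthOk: lengthOk, text: text };
    };
}
```

Each call returns the text collected so far; once done is true, text holds the full transcription in the requested response_format.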

Response

If an error occurs, the service sends a final TransferTranscribedAudio message indicating failure.

{
    mt:                     "TransferTranscribedAudio",
    SessionID:              3435407065703356068,
    TranscriptionID:        4089590383170956585,
    DataComplete :          true,
    TranscribedAudioData :  "",
    DataLength :            0,
    error:                  "HTTP_AUTHENTICATION_FAILED",
    src:                    "src0"
}

Parameters

string mt same message type as for successful results, "TransferTranscribedAudio"
ulong64 SessionID matches SessionID from the request result
ulong64 TranscriptionID matches TranscriptionID from the request result, identifying the failed transcription
boolean DataComplete always true for errors
string TranscribedAudioData empty
ulong64 DataLength always 0 for errors
string error human-readable reason
string src same source identifier as in the request
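
Since a failure uses the same message type as a regular result, the presence of the error field is what distinguishes the two. A minimal sketch follows, assuming that successful messages never carry an error field. The helper name getTranscriptionError is hypothetical.

```javascript
// Hypothetical helper (not part of the API): returns the error reason of a
// failed TransferTranscribedAudio message, or null for any other message.
function getTranscriptionError(msg) {
    if (msg.mt !== "TransferTranscribedAudio") {
        return null;
    }
    // Assumption: only failed transcriptions contain a non-empty error field.
    if (typeof msg.error === "string" && msg.error.length > 0) {
        return msg.error;
    }
    return null;
}
```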

Typical Flow

  1. Client to Service: TranscriptionRequest
  2. Service to Client: TranscriptionRequestResult (contains SessionId & TranscriptionId)
  3. Client: Uploads audio to the provided URL
  4. Service to Client: TransferTranscribedAudio (one or more messages)
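
When several requests run in parallel, the src value is what ties each incoming message to its request. The flow above can be sketched as a small per-request state tracker. The class TranscriptionTracker and its state names are hypothetical and only illustrate the message sequence; sending messages and uploading the audio are left out.

```javascript
// Hypothetical tracker (not part of the API): follows each request through
// the flow "requested" -> "uploading" -> "receiving" -> "done" or "failed".
class TranscriptionTracker {
    constructor() {
        this.pending = new Map(); // src -> state of the running request
    }

    // Step 1: call when a TranscriptionRequest with this src has been sent.
    start(src) {
        this.pending.set(src, { state: "requested", url: null, text: "", error: null });
    }

    // Steps 2 and 4: feed every message received from the service.
    handle(msg) {
        const entry = this.pending.get(msg.src);
        if (!entry) {
            return null; // no running request with this src
        }
        if (msg.mt === "TranscriptionRequestResult") {
            entry.state = "uploading"; // step 3: the client now uploads to entry.url
            entry.url = msg.url;
        } else if (msg.mt === "TransferTranscribedAudio") {
            entry.text += msg.TranscribedAudioData || "";
            if (msg.error) {
                entry.state = "failed";
                entry.error = msg.error;
            } else {
                entry.state = msg.DataComplete ? "done" : "receiving";
            }
            if (entry.state === "done" || entry.state === "failed") {
                this.pending.delete(msg.src); // request finished
            }
        }
        return entry;
    }
}
```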

Example

You can consume the API com.innovaphone.transcriptions and send requests to it. Inside the callback, recv.msg.mt identifies the message type, and the transcribed text is available in recv.msg.TranscribedAudioData.


    var transcriptionsApi = start.consumeApi("com.innovaphone.transcriptions");

    function requestTranscription(file) {
        transcriptionsApi.sendSrc({ mt: "TranscriptionRequest", lang: "de", src: "src0" }, transcriptionsApi.providers[0], (recv) => onRequest(recv, file));
    }

    // Callback for TranscriptionRequest
    function onRequest(recv, file) {
        const msg = recv.msg;

        if (msg.mt === "TranscriptionRequestResult") {
            // Create and send the audio file via HTTP POST to the given URL
            const postUrl = msg.url;
            const httpReq = new XMLHttpRequest();
            httpReq.open("POST", postUrl, true);
            httpReq.onload = () => {
                if (httpReq.status === 200) {
                    console.log("Audio uploaded successfully.");
                } else {
                    console.error("Upload failed:", httpReq.statusText);
                }
            };
            httpReq.onerror = () => console.error("Network error during upload.");
            httpReq.send(file);
        }

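        else if (msg.mt === "TransferTranscribedAudio") {
            if (msg.error) {
                // On failure, TranscribedAudioData is empty and error holds the reason
                console.error("Transcription failed:", msg.error);
                return;
            }
            console.log("Transcribed text:", msg.TranscribedAudioData);
            if (msg.DataComplete) {
                console.log("Transcription complete.");
            }
        }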
    }

    // file is the audio data to transcribe, provided by the application (for example a File or Blob)
    requestTranscription(file);