June 9, 2025

Stream multi-channel audio to Amazon Transcribe using the Web Audio API

Multi-channel transcription is a feature of Amazon Transcribe that in many cases can be used from a web browser. Creating this kind of audio source has its challenges, but with the JavaScript Web Audio API, you can connect and combine audio sources such as videos, audio files, or input devices like microphones to obtain transcripts.

In this post, we walk you through how to use two microphones as audio sources, merge them into a single dual-channel audio stream, perform the required encoding, and stream it to Amazon Transcribe. A Vue.js application source code sample is provided, which requires two microphones connected to your browser. However, the versatility of this approach extends beyond this use case – you can adapt it to accommodate a wide range of devices and audio sources.

With this approach, you can transcribe two sources within a single Amazon Transcribe streaming session, which offers cost savings and other benefits compared to using a separate session for each source.

Challenges when using two microphones

In our use case, using a single-channel stream for the two microphones and enabling speaker labels so Amazon Transcribe can identify the speakers might be sufficient, but there are a few considerations:

  • Speaker labels are randomly assigned at the start of the session, which means you will need to map the results in your application after the stream has started
  • Speakers with similar voice tones can be misidentified, which is difficult to distinguish even for a human
  • Voice overlap can occur when two speakers talk at the same time over a single audio source

Using two microphones as audio sources, you can address these concerns by making sure each transcription comes from a fixed input source. By assigning a device to a speaker, our application knows in advance which transcript to use. You might still encounter voice overlap if two nearby microphones pick up multiple voices. This can be mitigated by using directional microphones, managing volume levels, and using Amazon Transcribe word-level confidence scores.

Solution overview

The following diagram illustrates the solution workflow.

Diagram of the application for two microphones

We use two audio inputs with the Web Audio API. With this API, we can merge the two inputs, Mic A and Mic B, into a single audio data source, with the left channel representing Mic A and the right channel representing Mic B.

Then, we convert this audio source to PCM (pulse-code modulation) audio. PCM is a common format for audio processing, and it is one of the formats required by Amazon Transcribe for audio input. Finally, we stream the PCM audio to Amazon Transcribe for transcription.

Prerequisites

You must have the following prerequisites in place. In particular, the AWS credentials used by the app need an IAM policy that allows starting a streaming transcription session over WebSocket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DemoWebAudioAmazonTranscribe",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
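With a policy like this attached to your credentials, the browser app can create a streaming client. The following is a minimal sketch, assuming static credentials loaded from the .env file described below (the region and credential values are placeholders):

import { TranscribeStreamingClient } from '@aws-sdk/client-transcribe-streaming'

// Sketch only: the demo loads these values from its .env file.
// Never hard-code real credentials in production code.
const client = new TranscribeStreamingClient({
  region: 'us-east-1', // placeholder region
  credentials: {
    accessKeyId: 'YOUR_ACCESS_KEY_ID', // placeholder
    secretAccessKey: 'YOUR_SECRET_ACCESS_KEY', // placeholder
  },
})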

Start the app

Complete the following steps to start the app:

  1. Go to the root directory where you downloaded the code.
  2. Create a .env file to set your AWS access keys, based on the env.sample file.
  3. Install the packages by running bun install (if you are using Node.js, run npm install).
  4. Start the web server by running bun dev (if you are using Node.js, run npm run dev).
  5. Open your browser at http://localhost:5173/.
    Application running at http://localhost:5173

    Application running at http://localhost:5173 with two microphones connected
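For reference, the .env file from step 2 might look something like the following. The variable names here are hypothetical; use the exact names defined in env.sample:

# Hypothetical variable names - check env.sample for the real ones
VITE_AWS_REGION=us-east-1
VITE_AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
VITE_AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY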

Code walkthrough

In this section, we review the important parts of the implementation code:

  1. The first step is to list the connected microphones using the browser API navigator.mediaDevices.enumerateDevices():
const devices = await navigator.mediaDevices.enumerateDevices()
return devices.filter((d) => d.kind === 'audioinput')
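Note that most browsers only expose meaningful device labels after the user has granted media permissions, so you may want to request microphone access once before enumerating. A small sketch (the getMicrophones helper name is ours, not from the demo):

// Request permission first so enumerateDevices() returns useful labels
const getMicrophones = async (): Promise<MediaDeviceInfo[]> => {
  await navigator.mediaDevices.getUserMedia({ audio: true })
  const devices = await navigator.mediaDevices.enumerateDevices()
  return devices.filter((d) => d.kind === 'audioinput')
}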
  2. Next, you need to get a MediaStream object for each of the connected microphones. This can be done using the navigator.mediaDevices.getUserMedia() API, which allows access to user media devices (such as cameras and microphones). You can then get a MediaStream object representing the audio or video data from those devices:
const streams: MediaStream[] = []
// Iterate over the microphones found in the previous step
for (const device of microphones) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      deviceId: device.deviceId,
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  })
  if (stream) streams.push(stream)
}
  3. To combine the audio from the multiple microphones, you need to create an AudioContext audio-processing interface. Within this AudioContext, you can use a ChannelMergerNode to merge the audio streams from the different microphones. The arguments of the connect(destination, src_idx, ch_idx) method are:
    • destination – The connection destination, in our case the merger node.
    • src_idx – The source channel index, in our case both 0 (because each microphone provides a single-channel audio stream).
    • ch_idx – The destination channel index, in our case 0 and 1 respectively, to create a stereo output.
// instance of audioContext
const audioContext = new AudioContext({
       sampleRate: SAMPLE_RATE,
})
// this is used to process the microphone stream data
const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {...})
// microphone A
const audioSourceA = audioContext.createMediaStreamSource(mediaStreams[0]);
// microphone B
const audioSourceB = audioContext.createMediaStreamSource(mediaStreams[1]);
// merger node for the two inputs
const mergerNode = audioContext.createChannelMerger(2);
// connect the audio sources to the mergerNode destination
audioSourceA.connect(mergerNode, 0, 0);
audioSourceB.connect(mergerNode, 0, 1);
// connect our mergerNode to the AudioWorkletNode
mergerNode.connect(audioWorkletNode);
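The options object is elided ({...}) in the snippet above. Based on the processorOptions attribute discussed in the worklet deep dive later in this post, it might look something like the following sketch (the exact option names are in the demo source):

// Sketch of the elided options object
const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {
  processorOptions: {
    // maxFrameCount is described later in this post: it controls how
    // often the worklet emits a message payload (~0.4 s of audio here)
    maxFrameCount: (SAMPLE_RATE * 4) / 10,
  },
})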
  4. The microphone data is processed in an AudioWorklet that emits data messages every defined number of recording frames. These messages contain audio data encoded in PCM format to send to Amazon Transcribe. Using the p-event library, you can iterate asynchronously over the events emitted by the worklet. A more in-depth description of this worklet is given later in this post.
import { pEventIterator } from 'p-event'
...

// Register the worklet
try {
  await audioContext.audioWorklet.addModule('./worklets/recording-processor.js')
} catch (e) {
  console.error('Failed to load audio worklet')
}

//  An async iterator 
const audioDataIterator = pEventIterator<'message', MessageEvent>(
  audioWorkletNode.port,
  'message',
)
...

// AsyncIterableIterator: Every time the worklet emits an event with the message `SHARE_RECORDING_BUFFER`, this iterator will return the AudioEvent object that we need.
const getAudioStream = async function* (
  audioDataIterator: AsyncIterableIterator<MessageEvent>,
) {
  for await (const chunk of audioDataIterator) {
    if (chunk.data.message === 'SHARE_RECORDING_BUFFER') {
      const { audioData } = chunk.data
      yield {
        AudioEvent: {
          AudioChunk: audioData,
        },
      }
    }
  }
}
  5. To start streaming the data to Amazon Transcribe, you can use the iterator created earlier, and set NumberOfChannels: 2 and EnableChannelIdentification: true to enable dual-channel transcription. For more information, refer to the AWS SDK StartStreamTranscriptionCommand documentation.
import {
  LanguageCode,
  MediaEncoding,
  StartStreamTranscriptionCommand,
} from '@aws-sdk/client-transcribe-streaming'

const command = new StartStreamTranscriptionCommand({
    LanguageCode: LanguageCode.EN_US,
    MediaEncoding: MediaEncoding.PCM,
    MediaSampleRateHertz: SAMPLE_RATE,
    NumberOfChannels: 2,
    EnableChannelIdentification: true,
    ShowSpeakerLabel: true,
    AudioStream: getAudioStream(audioIterator),
  })
  6. After you send the request, a WebSocket connection is created to exchange the streaming audio data and the Amazon Transcribe results:
const data = await client.send(command)
for await (const event of data.TranscriptResultStream) {
    for (const result of event.TranscriptEvent.Transcript.Results || []) {
        callback({ ...result })
    }
}

The result object includes a ChannelId property that you can use to identify your microphone source, such as ch_0 and ch_1 respectively.
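Because each channel is wired to a known device, a small mapping is enough to attribute each transcript to its speaker. The following is a minimal sketch (the speakerByChannel mapping and handleResult helper are our own names, not part of the demo):

import type { Result } from '@aws-sdk/client-transcribe-streaming'

// Hypothetical mapping: channel 0 was wired to Mic A, channel 1 to Mic B
const speakerByChannel: Record<string, string> = {
  ch_0: 'Speaker A (Mic A)',
  ch_1: 'Speaker B (Mic B)',
}

const handleResult = (result: Result) => {
  // Skip partial results; they are still being revised
  if (result.IsPartial) return
  const speaker = speakerByChannel[result.ChannelId ?? ''] ?? 'unknown'
  const transcript = result.Alternatives?.[0]?.Transcript ?? ''
  console.log(`${speaker}: ${transcript}`)
}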

Deep dive: Audio worklet

Audio worklets can run in a separate thread to provide very low-latency audio processing. The implementation and demo source code can be found in the public/worklets/recording-processor.js file.

In our case, we use the worklet to perform two main tasks:

  1. Process the mergerNode audio in an iterative way. This node carries both of our audio channels and is the input to our worklet.
  2. Encode the data bytes of the mergerNode output into PCM signed 16-bit little-endian audio format. We do this on each iteration, or when required to emit a message payload to our application.

The overall structure of the code to implement this is as follows:

class RecordingProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super()
  }
  process(inputs, outputs) {...}
}

registerProcessor('recording-processor', RecordingProcessor)

You can pass custom options to this worklet instance using the processorOptions attribute. In our demo, we defined maxFrameCount: (SAMPLE_RATE * 4) / 10 as a bitrate guide to determine when to emit a new message payload. An example message:

this.port.postMessage({
  message: 'SHARE_RECORDING_BUFFER',
  buffer: this._recordingBuffer,
  recordingLength: this.recordedFrames,
  audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)), // PCM encoded audio format
})
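For context, the process() method in the demo follows the usual worklet accumulation pattern. The following is a simplified reconstruction under that assumption, not the exact file contents; see public/worklets/recording-processor.js for the real implementation:

// Inside class RecordingProcessor extends AudioWorkletProcessor.
// Assumes this._recordingBuffer holds one preallocated Float32Array
// per channel and this.maxFrameCount comes from processorOptions.
process(inputs, outputs) {
  const input = inputs[0] // our mergerNode output: [left, right]
  if (input.length > 0) {
    // Append this 128-frame render quantum to each channel buffer
    for (let ch = 0; ch < input.length; ch++) {
      this._recordingBuffer[ch].set(input[ch], this.recordedFrames)
    }
    this.recordedFrames += input[0].length
  }
  // Emit a message once enough frames have accumulated
  if (this.recordedFrames >= this.maxFrameCount) {
    this.port.postMessage({
      message: 'SHARE_RECORDING_BUFFER',
      audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)),
    })
    this.recordedFrames = 0
  }
  return true // keep the processor alive
}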

PCM encoding for two channels

One of the most important parts is how to encode PCM audio for two channels. Following the AWS documentation in the Amazon Transcribe API reference, the AudioChunk size is determined by: Duration (s) * Sample Rate (Hz) * Number of Channels * 2. For two channels, 1 second at 16,000 Hz is: 1 * 16000 * 2 * 2 = 64,000 bytes. Our encoding function should then look like this:

// Notice that input is an array, where each element is a channel with Float32 values between -1.0 and 1.0 from the AudioWorkletProcessor.
const pcmEncodeArray = (input: Float32Array[]) => {
  const numChannels = input.length
  const numSamples = input[0].length
  const bufferLength = numChannels * numSamples * 2 // 2 bytes per sample per channel
  const buffer = new ArrayBuffer(bufferLength)
  const view = new DataView(buffer)

  let index = 0

  for (let i = 0; i < numSamples; i++) {
    // Encode for each channel
    for (let channel = 0; channel < numChannels; channel++) {
      const s = Math.max(-1, Math.min(1, input[channel][i]))
      // Convert the 32 bit float to 16 bit PCM audio waveform samples.
      // Max value: 32767 (0x7FFF), Min value: -32768 (-0x8000) 
      view.setInt16(index, s < 0 ? s * 0x8000 : s * 0x7fff, true)
      index += 2
    }
  }
  return buffer
}
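As a quick sanity check of the sizing formula above, encoding two channels of 6,400 frames (the 0.4 seconds of audio at 16 kHz implied by maxFrameCount) should produce 6400 * 2 * 2 = 25,600 bytes:

// Sanity check: 0.4 s of stereo audio at 16 kHz
const frames = (16000 * 4) / 10 // 6400 frames
const left = new Float32Array(frames) // silence
const right = new Float32Array(frames) // silence
const encoded = pcmEncodeArray([left, right])
console.log(encoded.byteLength) // 25600 = 0.4 * 16000 * 2 * 2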

For more information on how the audio data blocks are handled, see the AudioWorkletProcessor process() method in the worklet source. For more information on PCM format encoding, see Multimedia Programming Interface and Data Specifications 1.0.

Conclusion

In this post, we explored the details of implementing a web application that uses the browser's Web Audio API and Amazon Transcribe streaming to enable real-time dual-channel transcription. Using the combination of AudioContext, ChannelMergerNode, and AudioWorklet, we were able to seamlessly process and encode the audio data from two microphones before sending it to Amazon Transcribe for transcription. The use of AudioWorklet in particular allowed us to achieve low-latency audio processing, providing a smooth and responsive user experience.

You can build on this demo to create more advanced real-time transcription applications for a wide range of use cases, from meeting recordings to voice-controlled interfaces.

Try the solution for yourself, and leave your feedback in the comments.


About the author

Jorge Lanzotti is a Sr. Prototyping SA at Amazon Web Services (AWS), based in Tokyo, Japan. He helps customers in the public sector by creating innovative solutions to challenging problems.
