Juliano Sirtori - Capture, Record, and Send Audio in the Browser

I am currently working on a proof of concept (POC) to validate a system that interacts with artificial intelligence, something similar to this site.

In this article, I will demonstrate how I implemented microphone capture and used the MediaRecorder class to store the audio and send it in a RESTful request for the POC I'm developing.

Capturing Audio from the Microphone

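First, we need to check whether the browser exposes the microphone capture API. Note that navigator.mediaDevices is only defined in secure contexts (HTTPS or localhost), so we use optional chaining to avoid a TypeError when it is missing:

if (navigator.mediaDevices?.getUserMedia) {
	// getUserMedia is supported
} else {
	// getUserMedia is not supported
}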
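The check above only tells us whether the API exists. As a hedged sketch, the helper below also distinguishes an insecure context, where browsers leave navigator.mediaDevices undefined, from a browser that lacks the API altogether. The name micSupport and its return values are hypothetical and not part of the original code. It takes a window-like object as a parameter so it can be exercised outside a page.

```javascript
// Hypothetical helper (not from the original article).
// `env` is a window-like object: { isSecureContext, navigator }.
function micSupport(env) {
	// getUserMedia is only exposed on secure origins (HTTPS or localhost)
	if (env.isSecureContext === false) {
		return "insecure-context";
	}
	// Optional chaining avoids a TypeError when mediaDevices is undefined
	if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
		return "unsupported";
	}
	return "ok";
}

// In a page, pass the real window object
if (typeof window !== "undefined") {
	console.log("Microphone support:", micSupport(window));
}
```

Returning a reason instead of a boolean makes it easier to show the user a useful message, for example asking them to open the page over HTTPS.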

Next, we will call the getUserMedia method with the constraint { audio: true }. The browser will then ask the user for permission to capture audio only (no video) from the microphone. If the user denies permission, the returned promise is rejected with an error. If permission is granted, the promise resolves with a MediaStream object. Our code will look like this:

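if (!navigator.mediaDevices?.getUserMedia) {
	// getUserMedia is not supported (or the page is not in a secure context).
	// We throw because `return` is only valid inside a function.
	throw new Error("getUserMedia is not supported in this browser.");
}

navigator.mediaDevices.getUserMedia({ audio: true }).then(
	(stream) => {
		// User granted permission
	},
	(err) => {
		// Permission was denied or no microphone is available
		console.error("The following error occurred: " + err);
	}
);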
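The error callback above receives different errors depending on why capture failed. As a minimal sketch, the hypothetical helper describeMicError (a name introduced here for illustration) maps the standard error names that getUserMedia rejects with to readable messages. The message texts are assumptions, not part of the original code.

```javascript
// Hypothetical helper: turns a getUserMedia rejection into a readable message.
function describeMicError(err) {
	switch (err?.name) {
		case "NotAllowedError":
			// The user (or a browser policy) refused the permission prompt
			return "Microphone permission was denied.";
		case "NotFoundError":
			// No audio input device matches the requested constraints
			return "No microphone was found.";
		case "NotReadableError":
			// The device exists but could not be opened, e.g. it is in use
			return "The microphone could not be read.";
		default:
			return "Could not access the microphone: " + (err?.message ?? String(err));
	}
}
```

It can be called from the error callback, for example console.error(describeMicError(err)).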

Storing Audio with `MediaRecorder`

Once we have permission to capture the audio and have received the MediaStream, we will use MediaRecorder to record the data coming from the stream.

According to the MDN documentation, this interface provides several methods and events. For our example, we will use only the start and stop methods together with the dataavailable event, which delivers the recorded audio once stop is called. We can implement a button to trigger start and stop. For more details on MediaRecorder, you can check out the documentation: MediaStream Recording API. Our code will look like this:

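if (!navigator.mediaDevices?.getUserMedia) {
	// getUserMedia is not supported (or the page is not in a secure context).
	// We throw because `return` is only valid inside a function.
	throw new Error("getUserMedia is not supported in this browser.");
}

navigator.mediaDevices.getUserMedia({ audio: true }).then(
	(stream) => {
		// User granted permission
		const mediaRecorder = new MediaRecorder(stream);
		// start() is called without a timeslice, so this fires once,
		// after stop(), with the whole recording in e.data
		mediaRecorder.ondataavailable = async (e) => {
			// here we will call our request
		};

		const micButton = document.querySelector(".mic-button");
		micButton.onclick = () => {
			// aria-pressed tracks whether we are currently recording
			const pressed = micButton.getAttribute("aria-pressed") === "true";
			micButton.setAttribute("aria-pressed", String(!pressed));

			if (!pressed) {
				mediaRecorder.start();
				return;
			}
			mediaRecorder.stop();
		};
	},
	(err) => {
		// Permission was denied or no microphone is available
		console.error("The following error occurred: " + err);
	}
);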
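The container and codec that MediaRecorder produces depend on the browser, so it can help to choose one explicitly. The sketch below uses the static MediaRecorder.isTypeSupported method and the mimeType constructor option. The helper names pickMimeType and createRecorder and the candidate list are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the first candidate accepted by `isSupported`,
// or undefined when none is supported.
function pickMimeType(candidates, isSupported) {
	return candidates.find((type) => isSupported(type));
}

// Hypothetical helper: builds a recorder with an explicit format when possible.
// The candidate list is an assumption; adjust it to what your backend accepts.
function createRecorder(stream) {
	const mimeType = pickMimeType(
		["audio/webm;codecs=opus", "audio/mp4", "audio/ogg;codecs=opus"],
		(type) => MediaRecorder.isTypeSupported(type)
	);
	// Fall back to the browser default when no candidate is supported
	return new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
}
```

Inside the success callback, const mediaRecorder = createRecorder(stream) could replace the direct constructor call, and mediaRecorder.mimeType then reports the format actually in use.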

Sending Audio to an API

Now we just need to send the audio in a request. The dataavailable event delivers the recording as a Blob in e.data; we attach it to a FormData object and send it in a POST request. This finalizes our code:

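if (!navigator.mediaDevices?.getUserMedia) {
	// getUserMedia is not supported (or the page is not in a secure context).
	// We throw because `return` is only valid inside a function.
	throw new Error("getUserMedia is not supported in this browser.");
}

navigator.mediaDevices.getUserMedia({ audio: true }).then(
	(stream) => {
		// User granted permission
		const mediaRecorder = new MediaRecorder(stream);
		mediaRecorder.ondataavailable = async (e) => {
			// e.data is already a Blob in the recorder's own format (typically
			// WebM, Ogg, or MP4). MediaRecorder does not encode MP3, and
			// relabeling the blob as "audio/mp3" would not convert it.
			const type = e.data.type || mediaRecorder.mimeType;
			const extension = type.includes("mp4") ? "mp4" : type.includes("ogg") ? "ogg" : "webm";
			const formData = new FormData();
			formData.append("audio", e.data, `recording.${extension}`);
			const response = await fetch("http://localhost:3001/transcribe", {
				method: "POST",
				body: formData,
			});
			// rest of the implementation...
		};

		const micButton = document.querySelector(".mic-button");
		micButton.onclick = () => {
			// aria-pressed tracks whether we are currently recording
			const pressed = micButton.getAttribute("aria-pressed") === "true";
			micButton.setAttribute("aria-pressed", String(!pressed));

			if (!pressed) {
				mediaRecorder.start();
				return;
			}
			mediaRecorder.stop();
		};
	},
	(err) => {
		// Permission was denied or no microphone is available
		console.error("The following error occurred: " + err);
	}
);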
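The request above does not yet check whether the server accepted the upload. As a hedged sketch, the hypothetical helper uploadRecording wraps the same POST request and rejects on a non-2xx status. The endpoint URL is the one used in this article. The JSON response body, the filename parameter, and the injectable fetchImpl parameter (added so the helper can be exercised without a server) are assumptions.

```javascript
// Hypothetical helper: uploads a recorded Blob and returns the parsed response.
// Assumption: the server replies with a JSON body.
async function uploadRecording(
	blob,
	url = "http://localhost:3001/transcribe",
	filename = "recording",
	fetchImpl = fetch
) {
	const formData = new FormData();
	formData.append("audio", blob, filename);

	const response = await fetchImpl(url, { method: "POST", body: formData });
	// fetch only rejects on network failures, so HTTP errors are checked here
	if (!response.ok) {
		throw new Error("Upload failed with status " + response.status);
	}
	return response.json();
}
```

Inside ondataavailable it could be called as const result = await uploadRecording(e.data), wrapped in try/catch to report failures to the user.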

Conclusion

We have demonstrated how to capture, record, and send audio. You can view the complete code at this link: poc-talk-ai. If you have any questions or suggestions, feel free to leave them in the comments below.

Until next time, and thanks for all the fish.

Mar 03, 2024 • 3 min read
