A couple of months ago I was lucky enough to meet Senator Ed Markey while he was visiting Silicon Valley. It was fascinating to talk to him, and I learned that he was one of the driving forces behind laws mandating closed captions on TV shows, starting as far back as 1990. I use captions myself, and I’m not alone, with over 50% of Americans using them most of the time. They’ve also had the unexpected benefit of providing great training material for speech-to-text models, by pairing audio with ground truth transcriptions. I told Ed he should consider himself one of the driving forces behind AI, thanks to the contribution video captions have made to voice AI!
Outside of YouTube, most pre-recorded videos on the web don’t offer captions, which is a shame, but understandable because adding them isn’t easy. The gold standard for captioning is having a person listen and manually type out what they’re hearing. This is a time-consuming process, and costs money that many organizations don’t have. Even Google relies on machine-generated captions for the vast majority of YouTube videos. It’s also not straightforward to add captions as an option to web videos even if you have created a transcript.
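To give a sense of the work involved even after a transcript exists, here is a minimal sketch of the usual manual route: turning timed transcript lines into a WebVTT file and attaching it to a video as a caption track. The function names and the `{start, end, text}` cue shape are my own choices for illustration, and the sketch assumes you already have a start and end time in seconds for every line, which is often the hardest part to produce by hand.

```javascript
// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file from an array of {start, end, text} cues.
// The cue shape is hypothetical; times are in seconds.
function cuesToWebVtt(cues) {
  const blocks = cues.map(
    (cue) =>
      `${toVttTimestamp(cue.start)} --> ${toVttTimestamp(cue.end)}\n${cue.text}`
  );
  // A WebVTT file starts with the "WEBVTT" header, and cues are
  // separated from it and from each other by blank lines.
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

// Browser only: attach the captions to a <video> element as a selectable track.
function attachCaptions(video, vttText, language = "en") {
  const track = document.createElement("track");
  track.kind = "captions";
  track.label = "Captions";
  track.srclang = language;
  track.src = URL.createObjectURL(new Blob([vttText], { type: "text/vtt" }));
  video.appendChild(track);
  return track;
}
```

In a page you would call something like `attachCaptions(video, cuesToWebVtt(cues))` once the video element exists. The sketch does no escaping, so cue text containing `-->` or blank lines would need cleaning up first.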
All this is why I’m excited to announce the public launch of MoonshineJS. This is an in-browser implementation of our lightweight speech to text models, and while you can do a lot of different things with the library, one of my favorite use cases is adding captions to videos. Here’s how you can do that with Moonshine in only five lines of code:
import * as Moonshine from "https://cdn.jsdelivr.net/npm/@moonshine-ai/moonshine-js@latest/dist/moonshine.min.js";
const video = document.getElementById("video");
const videoCaptioner = new Moonshine.VideoCaptioner(video, "model/base", false);
video.addEventListener("play", () => {
  videoCaptioner.start();
});
You can see the result as a screen recording at the top of this post, try a live example for yourself, and see the complete page and script on GitHub.
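If you want a little more control over when captioning runs, the play handler can be wrapped in a small helper. This is only a sketch under assumptions: the snippet above shows just `start()`, so the `stop()` method used here is assumed rather than confirmed, and `captionWhilePlaying` is a name I made up for the example. It starts the captioner when playback begins, stops it when the video pauses or ends, and guards against calling either method twice in a row.

```javascript
// Start captioning when playback begins and stop it when playback halts.
// `captioner` is any object with start() and stop() methods. stop() is an
// assumption here, since only start() is shown in the snippet above.
function captionWhilePlaying(video, captioner) {
  let running = false;

  const onPlay = () => {
    if (!running) {
      running = true;
      captioner.start();
    }
  };

  // Browsers can fire both "pause" and "ended" when a video finishes,
  // so the flag makes sure stop() is only called once per run.
  const onHalt = () => {
    if (running) {
      running = false;
      captioner.stop();
    }
  };

  video.addEventListener("play", onPlay);
  video.addEventListener("pause", onHalt);
  video.addEventListener("ended", onHalt);

  // Return a cleanup function that removes the listeners again.
  return () => {
    video.removeEventListener("play", onPlay);
    video.removeEventListener("pause", onHalt);
    video.removeEventListener("ended", onHalt);
  };
}
```

With the variables from the snippet above, usage would be `captionWhilePlaying(video, videoCaptioner)`. Keeping the helper independent of the library also makes it easy to exercise with a stand-in object that just records calls.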
I know from talking to people in the Deaf community and others who rely on captions that machine-generated transcripts in general are lower quality than human-written versions, so I don’t see this approach replacing high-quality manual subtitles. What I am hoping is that websites that currently don’t have any captions at all can add them, making the web a little more accessible.
If you’re a developer you can learn more at dev.moonshine.ai, and we’ve open sourced the code and models. We support English and Spanish, with more languages arriving soon, along with accuracy improvements across the board. Since everything is running client side, there’s no account signup, credit card, or access token needed to get started and no API usage fees. You also don’t have to worry about the service vanishing since you can keep everything you need locally, forever.
If you do use Moonshine, I’d love to hear your thoughts and feedback, so please do get in touch.