Video Transcription using Cloudinary Add-on in a Nuxt 3 Application


Video transcription is the process of converting the spoken words in a video into written text. This can be done automatically using speech recognition technology or manually by transcribers. The transcribed text is usually synchronised with the video, allowing viewers to read along as they watch.
There are a variety of use cases for video transcription, including:

  • Accessibility: Transcription can make videos more accessible to people with hearing impairments and to non-native speakers who find written text easier to follow.

  • Searchability: Transcribed videos can be searched and indexed, making it easier for viewers to find specific information within the video.

  • Subtitling: Transcription can be used to create subtitles for videos, making them more accessible to a broader audience.

  • Language Learning: Transcription can be used as a tool for language learning, allowing students to read along with native speakers and improve their listening and comprehension skills.

These use cases make transcription an important part of publishing accessible video. As a result, several SaaS platforms, Cloudinary among them, offer ways to transcribe videos automatically.

Cloudinary, a cloud-based platform for managing and optimising media assets such as images and videos, also offers a video transcription add-on. The add-on uses Google's Cloud Speech API to convert the audio in our videos to text automatically. The generated transcript can then be overlaid on the video as subtitles using Cloudinary video transformations.

In this article, we will build a Nuxt application that uploads a video to Cloudinary and applies the Cloudinary transcription add-on on upload. Then we will use the Cloudinary transformation API to overlay the AI-generated text on the video. Finally, we will make the transformed video downloadable.

Application Repository and Demo

Repo: https://github.com/yhuakim/nuxi-transcription
Demo: https://nuxi-transcription.netlify.app/

Prerequisites

  • A Cloudinary free tier account and valid API keys.

  • Good knowledge of Nuxt version 3.

Registering the Google AI Video Transcription Addon

After creating a free tier account, enable the Google AI video transcription addon to use the Cloudinary transcription endpoint. We can do this by following the steps below:

  1. Navigate to the addon tab, then click the Google AI video transcription card as shown in the image below:

    Cloudinary Addon setup; step1

  2. The preceding step will take us to the screen below, where we can select the Free plan link. This will enable our free plan.

    Cloudinary Addon setup; step2

Now, we have successfully registered the Google AI video transcription addon.

Setting Up a Development Environment

To set up a development environment, we will need to run the following commands in the terminal:

npx nuxi init video-transcription && cd video-transcription

This command creates a Nuxt project in a directory called video-transcription and navigates into the directory.

Next, we need to install the Nuxt3 dependencies by running the following command in the terminal:

yarn install

Installing Dependencies
After initializing a Nuxt project, we will add other dependencies necessary for the current project. These dependencies include:

Cloudinary: A Node.js SDK for media management. We can add this package by entering the following command:

yarn add cloudinary

File saver (file-saver): A package for saving the transcribed video to the user's device. We can add it using the following command:

yarn add file-saver

TailwindCSS: A CSS framework to assist with the page styles. To use this, we can run the following command:

yarn add -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

Then, follow the official TailwindCSS installation guide for Nuxt to finish setting up TailwindCSS in the Nuxt 3 application.

Building Out The User Interface

To build the UI of this project, we will split it into two main sections:

  • The Form handling section: This will handle the user’s input, the video file and the upload to our Cloudinary endpoint.

  • The Transcribed video preview: Here, the returned transcribed video is displayed to the user with a download button.

Before we start writing the code, let's adjust our file structure as follows:

  1. Navigate to the project directory and create a pages folder.

  2. Then, open the app.vue file, replace its contents with the <NuxtPage /> component as shown below, and run yarn dev -o to start the development server.

<template>
    <div>
       <NuxtPage />
    </div>
</template>

Form Handling
Now let's create a file called index.vue in the pages folder and add the following snippet:

<!-- pages/index.vue -->
    <template>
      <div class="max-w-[60rem] grid justify-center items-center mx-auto py-4">
        <header class="text-4xl font-bold text-gray-700 text-center">
          <h1>Video Transcription App</h1>
        </header>
        <main class="py-5">
          <section class="py-5 flex">
            <form @submit="handleSubmit">
              <div class="upload">
                <label
                  for="video-file"
                  class="block mb-2 text-sm font-medium text-gray-500"
                  >Upload a Video file</label
                >
                <input
                  type="file"
                  accept="video/*"
                  @change="handleChange"
                  name="video-file"
                  id="video-file"
                  class="bg-gray-50 border border-gray-300 text-gray-900 text-sm
                  rounded-lg focus:ring-blue-500 focus:border-blue-500 block w-full p-2.5"
                />
              </div>
              <button
                v-if="selected !== null"
                type="submit"
                class="text-white bg-gradient-to-r from-blue-500 via-blue-600 
                to-blue-700 hover:bg-gradient-to-br focus:ring-4 focus:outline-none 
                focus:ring-blue-300 dark:focus:ring-blue-800 font-medium rounded-lg 
                text-sm px-5 py-2.5 text-center mr-2 mb-5 mt-2"
              >
                Upload
              </button>
            </form>
          </section>
        </main>
      </div>
    </template>

The snippet above is the markup for our form, which consists of a file input and a submit button. The submit button is only rendered once the user has selected a file, that is, once the selected state is no longer null.

Next, let's create the handleSubmit and handleChange methods referenced in the markup above. Add the following snippet just above the template markup:

<!-- pages/index.vue -->
    <script setup>
    const selected = useState("selected", () => null);
    const fileName = useState("fileName", () => "");
    const videoUrl = useState("videoUrl", () => "");
    const downloaded = useState("downloaded", () => false);

    // Read the chosen video as a base64 data URL and keep its original name
    const handleChange = (e) => {
      const file = e.target.files && e.target.files[0];
      if (!file) return;
      fileName.value = file.name;
      const reader = new FileReader();
      reader.onload = () => {
        selected.value = reader.result; // "data:video/mp4;base64,..."
      };
      reader.readAsDataURL(file);
    };

    // Send the file name and data URL to our /transcribe server route
    const handleSubmit = async (e) => {
      e.preventDefault();
      try {
        // $fetch serializes the object body as JSON
        const response = await $fetch("/transcribe", {
          method: "POST",
          body: { name: fileName.value, path: selected.value },
        });
        // The server returns the transformed video URL as a JSON string
        videoUrl.value = JSON.parse(response.data);
        downloaded.value = false;
      } catch (error) {
        console.error(error);
      }
    };
    </script>

The snippet above manages our state, processes the user's input in the handleChange method, and defines what happens when the user submits the form. To manage state, we use Nuxt's built-in useState composable; see the Nuxt documentation for useState to learn more.

In the handleChange method, we convert the user-selected video file into a data URL using JavaScript's built-in FileReader API and assign it to the selected state. We also store the original file name in the fileName state, because the server route uses it to name the uploaded video.

Then, in the handleSubmit method, we:

  • prevent the default action of the submit event by calling preventDefault()

  • send a POST request to our server route (/transcribe) using Nuxt's $fetch helper, with the file name and the data URL as the request body

  • assign the returned video URL to our videoUrl state
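FileReader.readAsDataURL produces a string of the form data:<mime-type>;base64,<payload>, and that string is what the server route later passes to Cloudinary's uploader as path. The sketch below reproduces the format in Node.js using Buffer so it can be checked outside the browser. toDataUrl and parseDataUrl are hypothetical helpers for illustration; they are not part of the application code.

```javascript
// Hypothetical helper: builds the same "data:<mime>;base64,<payload>" string
// that FileReader.readAsDataURL() returns in the browser.
function toDataUrl(bytes, mimeType) {
  const payload = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${payload}`;
}

// Hypothetical helper: splits a base64 data URL back into MIME type and bytes.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) throw new Error("Not a base64 data URL");
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

const dataUrl = toDataUrl(Buffer.from("hi"), "video/mp4");
console.log(dataUrl); // data:video/mp4;base64,aGk=
console.log(parseDataUrl(dataUrl).bytes.toString()); // hi
```

Because base64 encoding inflates the payload by roughly a third, sending the whole video as a JSON string is best suited to short clips; for larger files, a multipart upload may be a better fit.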

Creating the Server Route
The snippet above makes a POST request to the /transcribe route, which we have not created yet. Nuxt registers the files in the server/routes folder as server routes, so let's navigate to the project directory, create a folder named server, add a folder called routes inside it, and create a file called transcribe.js there. Requests to /transcribe will then be handled by this file.

In the transcribe.js file, we will handle two main pieces of logic:

  • Upload and Transcription: Here we will use the Cloudinary upload API to upload the user-selected video file. Also, we will ask Cloudinary to transcribe this video using the transcription addon.

  • Video Transformation: Here we will use the Cloudinary video transformation API to overlay the generated text on the video.

First, let's add the following snippet to server/routes/transcribe.js:

    // server/routes/transcribe.js
    import { v2 as Cloudinary } from 'cloudinary';
    import { parse } from 'path';

    export default defineEventHandler(async (event) => {
        // The client sends the original file name and the video as a data URL
        const { name, path } = await readBody(event);
        // Strip the extension: "my-clip.mp4" -> "my-clip"
        const filename = parse(name).name;

        try {
            Cloudinary.config({
                cloud_name: process.env.CLOUD_NAME,
                api_key: process.env.API_KEY,
                api_secret: process.env.API_SECRET,
                secure: true
            });

            // Upload the video and request SRT and VTT transcripts
            // from the Google AI video transcription add-on
            await Cloudinary.uploader.upload(path, {
                resource_type: "video",
                public_id: `demo/${filename}`,
                raw_convert: "google_speech:srt:vtt"
            });

        } catch (error) {
            console.log(error);
        }
    })

In the snippet above, we

  • imported the cloudinary and path packages

  • created a handler function for our server logic which takes an event parameter.

  • read the name and path fields from the request body using the readBody helper, which Nitro auto-imports, passing in the event parameter.

  • trimmed off the file extension from the user-selected file name using the parse method we imported, by passing in the file name and then retrieving only the name attribute.

  • configured the Cloudinary SDK with our cloud_name, api_key and api_secret, read from environment variables, inside a try-catch statement.

  • used the uploader.upload method to upload the video stored in our path variable under the public ID demo/<filename>. We also set the raw_convert option to google_speech:srt:vtt, which asks the Google AI transcription add-on to transcribe the video and generate SRT and VTT subtitle files. The transcription is processed asynchronously, so the subtitle files may not be ready at the exact moment the upload call returns.
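To make the upload request easier to read and test, the options passed to uploader.upload can be built by a small function. The sketch below is hypothetical: buildUploadOptions and subtitlePublicId are not part of the Cloudinary SDK. They only reproduce the public ID and raw_convert values used in transcribe.js, plus the subtitle public ID naming that this article's overlay step expects.

```javascript
const path = require("node:path");

// Hypothetical helper mirroring the options used in server/routes/transcribe.js.
function buildUploadOptions(fileName, folder = "demo") {
  // path.parse() drops only the last extension: "clip.mp4" -> "clip"
  const { name } = path.parse(fileName);
  return {
    resource_type: "video",
    public_id: `${folder}/${name}`,
    // Request SRT and VTT transcripts from the Google AI transcription add-on
    raw_convert: "google_speech:srt:vtt",
  };
}

// Hypothetical helper: the subtitle public ID referenced by the overlay step,
// following the naming used in this article's transformation code.
function subtitlePublicId(publicId, language = "en-US") {
  return `${publicId}.${language}.srt`;
}

const options = buildUploadOptions("interview.final.mp4");
console.log(options.public_id); // demo/interview.final
console.log(subtitlePublicId(options.public_id)); // demo/interview.final.en-US.srt
```

Note that spaces and other special characters in the original file name end up in the public ID, so you may want to sanitize the name before uploading.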

Once the upload request completes, we have one more task: overlaying the generated subtitle file on our video. To do this, add the following snippet just below the upload call, inside the try block.

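    // server/routes/transcribe.js (inside the try block, after the upload)
    // Build a delivery URL for the uploaded video with the subtitles overlaid
    const transcribedVideo = Cloudinary.url(`demo/${filename}`, {
        resource_type: "video",
        loop: false,
        controls: true,
        autoplay: true,
        fallback_content: "Your browser does not support HTML5 video tags",
        transformation: [
            {
                // Use the generated SRT file as a subtitles layer
                overlay: {
                    resource_type: "subtitles",
                    public_id: `demo/${filename}.en-US.srt`
                }
            },
            // Apply the layer to the video
            { flags: "layer_apply" }
        ]
    });

    // Send the URL back to the client as a JSON string
    return {
        statusCode: 200,
        data: JSON.stringify(transcribedVideo)
    };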

In the above snippet, we:

  • used the Cloudinary url method to build a delivery URL for the uploaded video from its public ID, with an overlay transformation applied. The overlay has a resource_type of subtitles, its public_id option points to the generated subtitle file, and the layer_apply flag applies the overlay to the video.

  • returned the generated URL to the client side as a JSON string, together with a statusCode of 200.
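To see what the transformation does, it helps to look at the kind of URL Cloudinary.url produces. The sketch below assembles a comparable delivery URL by hand, assuming the standard https://res.cloudinary.com/<cloud_name>/video/upload/ path, where a subtitles overlay is written as l_subtitles:<public_id> (with / in the overlay's public ID replaced by :) followed by a separate fl_layer_apply component. subtitledVideoUrl and the cloud name my-cloud are hypothetical, and the real SDK may add components such as a version or URL-encode special characters, so treat this as an illustration rather than a replacement for Cloudinary.url.

```javascript
// Illustrative only: hand-builds a delivery URL similar to what Cloudinary.url()
// returns for a video with a subtitles overlay. Assumes the default
// https://res.cloudinary.com/<cloud_name>/video/upload/ delivery path.
function subtitledVideoUrl(cloudName, videoPublicId, subtitlePublicId) {
  // Overlay public IDs use ":" instead of "/" as the folder separator
  const overlayId = subtitlePublicId.replace(/\//g, ":");
  const components = [
    `l_subtitles:${overlayId}`, // add the subtitles layer
    "fl_layer_apply",           // apply the layer to the video
  ];
  return `https://res.cloudinary.com/${cloudName}/video/upload/${components.join("/")}/${videoPublicId}`;
}

const deliveryUrl = subtitledVideoUrl("my-cloud", "demo/clip", "demo/clip.en-US.srt");
console.log(deliveryUrl);
// https://res.cloudinary.com/my-cloud/video/upload/l_subtitles:demo:clip.en-US.srt/fl_layer_apply/demo/clip
```

Because the URL carries no file extension, the client appends .webm, .mp4 or .ogv to request a specific output format.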

Now that our server logic can receive the upload and return a transformed video URL, let's display the transcribed video to the user.

Displaying The Transcribed Video and Adding a Download Button
In this section, we will display the transcribed video returned by our server and provide a download button so the user can save the video to their device.

Now let’s navigate to our pages/index.vue file and add the following snippet just below the Form section.

    <section class="transcribed-preview">
      <!-- Preview of the transcribed video -->
      <div v-if="videoUrl === ''">
        Your transcribed video will appear here
      </div>
      <div v-else class="max-w-[35rem] shadow-2xl">
        <video autoplay controls class="mb-5 w-full">
          <source
            :src="videoUrl ? `${videoUrl}.webm` : ''"
            type="video/webm"
          />
          <source :src="videoUrl ? `${videoUrl}.mp4` : ''" type="video/mp4" />
          <source :src="videoUrl ? `${videoUrl}.ogv` : ''" type="video/ogg" />
        </video>
      </div>
      <button
        v-if="videoUrl !== ''"
        @click="handleDownload"
        class="text-white bg-gradient-to-r from-blue-500 via-blue-600 to-blue-700
        hover:bg-gradient-to-br focus:ring-4 focus:outline-none focus:ring-blue-300
        dark:focus:ring-blue-800 font-medium rounded-lg text-sm px-5 py-2.5
        text-center mr-2 mb-2"
        :disabled="downloaded"
      >
        Download Video
      </button>
    </section>

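In this snippet, the video player and the download button only render once videoUrl contains the URL returned by our server; until then, a placeholder message is shown. Now let's handle the download button logic by adding the following snippet inside our script element, with the import placed at the top of the script setup block.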

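    // pages/index.vue (inside <script setup>; place the import at the top)
    import { saveAs } from "file-saver";

    const handleDownload = () => {
      if (videoUrl.value !== "") {
        // Request the MP4 rendition and save it with a matching extension
        saveAs(`${videoUrl.value}.mp4`, "transcribed-video.mp4");
        downloaded.value = true;
      }
    };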

The method above uses the saveAs function imported from file-saver to start a download when the button is clicked. saveAs takes two arguments: the URL (or Blob) of the file, and the name the file will be saved under.
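The URL returned by our server has no file extension, and Cloudinary uses the extension on a delivery URL to decide the output format. That is why the video element can offer .webm, .mp4 and .ogv sources from the same URL. If you want users to pick one of those formats for the download, a small helper can derive both the download URL and a matching file name. downloadTarget is a hypothetical helper for illustration; it is not part of FileSaver or Cloudinary.

```javascript
// Hypothetical helper: choose a delivery format for the transformed video and
// derive a matching file name to pass to saveAs(url, fileName).
const SUPPORTED_FORMATS = ["mp4", "webm", "ogv"];

function downloadTarget(videoUrl, format = "mp4", baseName = "transcribed-video") {
  if (!videoUrl) throw new Error("No video URL to download");
  if (!SUPPORTED_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  return { url: `${videoUrl}.${format}`, fileName: `${baseName}.${format}` };
}

const target = downloadTarget("https://res.cloudinary.com/my-cloud/video/upload/demo/clip");
console.log(target.url);      // https://res.cloudinary.com/my-cloud/video/upload/demo/clip.mp4
console.log(target.fileName); // transcribed-video.mp4
```

Inside handleDownload, the call would then become saveAs(target.url, target.fileName).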

Finally, we will have the following screen once we have successfully executed all these procedures.

Demo recording: video-transcription-demo.mp4

Conclusion

This article showed how to implement video transcription with Cloudinary in a Nuxt 3 application. It used a server route to keep our Cloudinary API keys off the client, and Nuxt's reactive useState composable to manage our application state.
