At Vibbio we transcode a lot of files. We need to make versions of the videos that will play nicely in web browsers, and we need to generate thumbnails.

What we had

Our first solution was to use FFmpeg, an incredibly powerful tool, running on App Engine in Google Cloud.

Since transcoding is not the core of what we do, we preferred to use an external solution for it. Transcoding is not easy when you have multiple sources, so leaving the heavy lifting of codecs, infrastructure and scaling in someone else's hands is a big win for us.

Choosing a solution

After some research, our VP of Product, Alexandra Leisse, suggested going with a cloud transcoding solution. Coconut seemed to fit our needs. There were other options, like Zencoder, but they were similar enough and Coconut was cheaper by our calculations.

Flow

We had a Java backend that ran FFmpeg for transcoding the videos and generating thumbnails. The new solution should integrate easily with the Java backend, transcode the video, generate the thumbnail, and upload everything to Google Cloud Storage.

Since we had already decided to move the backend from Java to Node.js, we didn't want to put the full responsibility of the HTTP call to Coconut on the Java backend.

We also wanted to automate the transcoding as much as possible, and a future step would be to automatically transcode the videos once they are uploaded to Storage.

The obvious solution was a Google Cloud Function. It could be triggered now by an HTTP request from the Java backend and, once ready, we could trigger it directly from the upload trigger on Storage with few code changes.

To decouple the transcoding pipeline we wanted to change as little as possible. That meant surgically removing the transcoding process and replacing it with an HTTP call to the Function.

Looking at the database and how our data is structured, the minimum information that needs to be passed to the Function is the file ID. That keeps the HTTP call from the Java backend simple.
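
So the body of that HTTP call ends up being just the ID, the same shape you'll see in the cURL example at the end of the post:

{
  "vibbioFileId": 1971
}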

Coconut is really simple. You make an HTTP request with the configuration of the transcoding (source, destination, and output settings like format, resolution and bitrate) and you're good to go. Since the process can take some time, you can also give Coconut a webhook URL that will be called once the process is done.

To simplify maintenance and deployment, we decided to implement a single GC Function with three methods. There's just one endpoint, but we can do the routing internally.

Google Cloud Functions

There are lots of resources on how to get started with GC Functions, so I'm not going to go through that step by step, but I'll describe what we're doing and how we did it.

So, as I said before, we’re going with just one function that will contain three methods. To do the routing we just use <trigger URL>/method.

Since Functions uses Express internally but handles the routing to the function itself, we can't use Express routing properly, so we had to go with a not-so-elegant solution: splitting the URL ourselves.

exports.transcodingpipeline = (req, res) => {
  var pathElements = req.originalUrl.split('/')
  var method = pathElements[1]

  switch (method) {
    case 'setupjob':
      // Set up the transcoding job
      break

    case ...

    default:
      res.sendStatus(404)
  }
}

Since we’re going to have only three methods, it’s simple enough to follow.

setupjob method

setupjob is our first method. It is called from the Java service and receives the ID of the file to be transcoded. It reads all the information about the file from the database and calls Coconut to start the transcoding.
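
Before going into each step, here's a rough sketch of the whole setupjob flow. The helper names (generateSignedUrl, createCoconutJob) and the path property on the file record are just illustrative; the real code for each step is shown in the sections below.

// Rough sketch of the setupjob flow (helper names are illustrative)
async function setupJob (req, res) {
  // 1. Get the file ID from the request
  var vibbioFileId = req.body.vibbioFileId

  // 2. Read the file information from the MySQL database
  var vibbioFile = await vibbioFilesTable.read(vibbioFileId)

  // 3. Sign the Google Cloud Storage URL so Coconut can download the source
  var signedUrl = await generateSignedUrl(vibbioFile.path)

  // 4. Create the transcoding job in Coconut
  createCoconutJob(signedUrl)

  res.sendStatus(200)
}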

Accessing the database

Our backend uses a MySQL database. I wrote a really basic MySQL ORM, MySQLTable, and we're using it to read the file record from the database. It just “connects” to a table and has a few methods for easy CRUD.

var MySQLTable = require('@olavgm/mysqltable')
var vibbioFilesTable = new MySQLTable(<MySQL configuration object>, 'vibbiofiles', 'id')

// Inside the (async) setupjob handler:

// Get the fileId from the request
var vibbioFileId = req.body.vibbioFileId

try {
  var vibbioFile = await vibbioFilesTable.read(vibbioFileId)
} catch (error) {
  ...
}

With this, we have the path to the file in the Google Cloud Storage bucket.

Signing the URL

Since the source videos are stored in GC Storage and are not available publicly, we have to generate a signed URL to pass to Coconut so it can download the video file for processing.

Signing the URL is a very straightforward process. There's a getSignedUrl method on the File class in the @google-cloud/storage npm package, but it needs a bit of setup.

We need credentials to access the URL signing API. That's easy to get: go to your Google Cloud Platform Console, APIs & Services, Credentials (lazy link), create a Service account key and export it as JSON.

First, you need to import the library and create the client using the project ID and the service account JSON file you just exported:

const { Storage } = require('@google-cloud/storage')

const storage = new Storage({
  projectId: config.googlecloudstorage.projectId,
  keyFilename: <path to JSON file>
})

Now connect to the file (filePath was passed to this method):

const bucket = storage.bucket(<your bucket name>)
const file = bucket.file(filePath)

Then we have to prepare the access configuration for the URL. We use read-only access for one week, just in case the Coconut service is down and takes some time to fetch the file.

var oneWeekInTheFuture = new Date()
oneWeekInTheFuture.setDate(oneWeekInTheFuture.getDate() + 7)

const signedUrlConfig = {
  action: 'read',
  expires: oneWeekInTheFuture
}

And now we get the signed URL (this function returns a promise):

// Wrap the callback-based getSignedUrl call in a promise
return new Promise((resolve, reject) => {
  file.getSignedUrl(signedUrlConfig, (err, url) => {
    if (err) {
      reject(err)
      return
    }

    resolve(url)
  })
})

Setting up the job in Coconut

Coconut offers an npm module for integrations, so this becomes a very easy step.

Let’s get the module:

var coconut = require('coconutjs')

After setting up your account in Coconut you get an API key (lazy link). You need that to make requests.

We need to create an object with the parameters of the Coconut job:

var createJobParameters = {
  'api_key': '<Your Coconut API key>',
  'source': '<Signed URL we obtained before>',
  'webhook': '<Webhook URL>',
  'outputs': {
    'mp4:0x540': '<Destination of the output video file>',
    'jpg:x180': '<Destination of the thumbnail>'
  }
}

The webhook will be the same URL as this method's, but replacing setupjob with jobcompleted.
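
In practice that's just a string replacement. A minimal sketch, assuming we rebuild the function's own URL from the incoming request:

// Build the webhook URL by swapping the method name in the function's own URL
var webhookUrl = 'https://' + req.get('host') + req.originalUrl.replace('setupjob', 'jobcompleted')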

The keys of the outputs (mp4:0x540, jpg:x180) use the Coconut notation for output files. There's a lot of information in the documentation. In our case, it's an MP4 file 540 pixels high, with whatever width is needed to maintain the aspect ratio. Same-ish for the JPG thumbnail.

Now we just need to call the createJob method and we’re good to go!

coconut.createJob(createJobParameters, (job) => {
  if (job.status === 'ok') {
    console.log('Created job in Coconut successfully:', job)
    // Flag the video as sent to Coconut
    ...
  } else {
    // Flag the video as failed to send to Coconut
    ...
  }
})

jobcompleted method

jobcompleted is the method we use for the Coconut webhook. There’s a lot of great documentation here.

I’m not going to go deeper into this because this method only flags the job as done and stores the received metadata of the video and thumbnail in the database.
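
Just to give an idea of its shape, here's a minimal sketch. The Coconut webhook payload and the update call on our ORM are simplified; the real handler maps the payload fields to our own columns.

// Rough sketch of the jobcompleted handler (payload handling simplified)
async function jobCompleted (req, res) {
  // payload holds the metadata Coconut sends about the generated outputs
  var payload = req.body

  // Store the metadata of the video and thumbnail and mark the job as done
  await vibbioFilesTable.update(<our file ID>, {
    status: 'DONE',
    ...
  })

  res.sendStatus(200)
}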

These are the statuses we use for the jobs just in case you’re curious:

  • NOT_STARTED
  • STORAGE_UPLOADING
  • STORAGE_UPLOAD_FINISHED
  • TRANSCODING_UPLOADING
  • TRANSCODING_UPLOAD_FINISHED
  • FAILED_TRANSCODING_UPLOAD
  • FAILED_TRANSCODING
  • DONE

resetfailedjobs method

I'm not going to go deep into this either because it's really simple. We have this method prepared so we can trigger it manually in case individual Coconut jobs fail, but we haven't run into problems yet.

This method just gets a list of all the jobs with FAILED_TRANSCODING_UPLOAD or FAILED_TRANSCODING status and sends them to the setupjob method again.
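
A sketch of that logic, where the query helper on the ORM and the httpPost function are both stand-ins for whatever you use to run the query and make the request:

// Rough sketch of resetfailedjobs (query helper and httpPost are illustrative)
async function resetFailedJobs (req, res) {
  var failedFiles = await vibbioFilesTable.query(
    "status IN ('FAILED_TRANSCODING_UPLOAD', 'FAILED_TRANSCODING')"
  )

  for (var file of failedFiles) {
    // Call our own setupjob endpoint again for each failed file
    await httpPost(<trigger URL> + '/setupjob', { vibbioFileId: file.id })
  }

  res.sendStatus(200)
}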

As I said, no jobs have failed so far, but we have a plan in case it happens often. We will modify the database to keep track of how many attempts have been made, incrementing the number on each attempt. If it goes above a value we consider too high (3? 4?), we will notify ourselves to manually have a look at the file and the logs.
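
If we get there, it would probably look something like this; the column name, the threshold and the notification helper are all hypothetical at this point:

// Hypothetical attempt counter (column name and threshold not decided yet)
var MAX_ATTEMPTS = 3 // or 4

async function shouldRetry (vibbioFile) {
  var attempts = (vibbioFile.transcoding_attempts || 0) + 1
  await vibbioFilesTable.update(vibbioFile.id, { transcoding_attempts: attempts })

  if (attempts > MAX_ATTEMPTS) {
    // Too many failures: notify ourselves to look at the file and logs manually
    await notifyTeam(vibbioFile.id)
    return false
  }

  return true
}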

Environment configuration

GC Functions does not support changing the value of the NODE_ENV environment variable. Its value is always production, so we cannot use the config npm package to handle the three environments we have (local, test, production).

The thing is that other environment variables can be set during deployment of the function, so we settled on FUNC_ENV as the environment variable to use in all our GC Functions.

We have three JSON files in the config folder of the function, and we load them depending on the FUNC_ENV value. For that, we use a very simple config.js file that we load whenever needed. This is the code of the config.js file:

if (!process.env.FUNC_ENV) {
  console.error('FUNC_ENV not set')
  process.exit(1)
}

module.exports = require('./config/' + process.env.FUNC_ENV + '.json')

And just like that we load the different configurations for the different environments.
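
For reference, each of those files is just a plain JSON object with the settings for that environment. A config/test.json could look roughly like this (only googlecloudstorage.projectId is actually used in the snippets above; the other keys are examples):

{
  "googlecloudstorage": {
    "projectId": "<your project ID>",
    "bucketName": "<your bucket name>"
  },
  "coconut": {
    "apiKey": "<your Coconut API key>"
  }
}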

Deployment

Deploying to Google Cloud Functions is very easy. There’s great documentation here.

This is the command we use for deploying this function:

gcloud beta functions deploy transcodingpipeline-test \
  --entry-point=transcodingpipeline \
  --memory=128MB \
  --trigger-http \
  --runtime=nodejs8 \
  --set-env-vars FUNC_ENV=test

We have to use the beta command because the --set-env-vars parameter only works there. At first we used beta because we needed the nodejs8 runtime (docs), but that runtime seems to be out of beta now, or at least it doesn't need the beta parameter anymore.

The default memory allocation is 256MB; that's why we specify 128MB, which is the minimum.

Local deployment

There’s a great emulator for running GC Functions on your machine. Docs.

Once installed and running, deploying to the emulator is as simple as this:

functions deploy transcodingpipeline-test \
  --entry-point=transcodingpipeline \
  --trigger-http \
  --set-env-vars FUNC_ENV=test

It works the same way, but we don't need the beta, memory and runtime parameters. Keep in mind that it will run with whatever Node.js runtime you have installed on your machine.

Once you deploy, you get something like this:

┌─────────────┬──────────────────────────────────────────────────────────────────────────────────┐
│ Property    │ Value                                                                            │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Name        │ transcodingpipeline-test                                                         │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Entry Point │ transcodingpipeline                                                              │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Trigger     │ HTTP                                                                             │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Resource    │ http://localhost:8010/<Your project ID>/us-central1/transcodingpipeline-test     │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Timeout     │ 60 seconds                                                                       │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Local path  │ /...                                                                             │
├─────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Archive     │ file:///var/folders/kj/svgnh92x29d2mdk36ts50f5m0000gn/T/tmp-9026D6F1zTjR9H4W.zip │
└─────────────┴──────────────────────────────────────────────────────────────────────────────────┘

You can easily trigger the function with cURL:

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"vibbioFileId":1971}' \
  http://localhost:8010/<Your project ID>/us-central1/transcodingpipeline-test/setupjob