We recently upgraded the scrmhub content publishing platform to support video uploads. The team has wanted to do this for a while, and we wanted to do it as well as possible within the AWS platform.
There are various parts to handling video, and the first, most obvious one is the upload stage. This is the make-or-break phase that sets the tone for the whole user experience. Make it too slow, or prone to failing on poor connections, and users will move away.
The team also wanted to test out some ideas they've had on leveraging the browser for more heavy lifting and also utilising more of what the AWS platform has to offer.
You can find the source code on our GitHub page, along with instructions to get started.
The first and most obvious issue with uploading large files is the risk of the connection failing and the upload having to start again from scratch. This happens less often these days, but it does still happen. Add to that the fact that people are now more mobile, and it becomes even more important that a dropped connection does not kill the upload. Otherwise people get charged for unnecessary bandwidth caused by us - not a good look for a SaaS business!
The second problem is that as our client base grows, we want to be thinking about the time and data costs for us and our customers. An obvious saving is to avoid moving files between servers. It's easy to move files, but each transfer takes time and costs us money in data between the server and the bucket.
We were also concerned about how our infrastructure would scale. This affects us both locally and globally, so we are always looking at how we can leverage systems that already exist so that we don't need to worry about things like this.
Our requirements came down to:
- Timeframe to upload
- Upload large files to distributed storage accessible by microservices
- Fast, reliable user experience
- Scalable and cheap
After scoping out the requirements we decided that we wanted to go with a solution that:
- Supports chunked uploading and its many benefits, including error handling and faster uploads (if the connection is fast enough)
- Uploads directly to our S3 bucket and not via a backend server (saving cost and time)
- Leverages AWS services as much as possible (more $$$ saving)
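To make the chunking idea concrete, here is a minimal sketch of how a file could be divided into parts before upload. The function name `plan_parts` and the 8 MiB default are assumptions for illustration and are not taken from our library; the 5 MiB minimum part size and the 10,000-part limit are S3's documented multipart constraints.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


@dataclass(frozen=True)
class Part:
    number: int   # 1-based, as S3 part numbers are
    start: int    # first byte offset (inclusive)
    end: int      # last byte offset (exclusive)


def plan_parts(file_size: int, part_size: int = 8 * MIB) -> list:
    """Return the byte ranges to upload. The default part size is an assumption."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the file would otherwise need too many parts.
    while -(-file_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    for number, start in enumerate(range(0, file_size, part_size), start=1):
        parts.append(Part(number, start, min(start + part_size, file_size)))
    return parts
```

Because each part is an independent byte range, a failed part can be re-sent on its own without restarting the whole upload.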
We first looked at the setup we have and realised that, being based on Amazon's AWS services, there is already a wealth of ways to streamline services like this; you just have to find the best way to do it. The trick is that you are putting the file in one place, but you need to generate the URL for that place. This is where the server-side code comes in. It generates the URL for each file chunk so that the chunk is put in the correct place, and then tells AWS to put the file back together.
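The "put it back together" step corresponds to S3's CompleteMultipartUpload request, whose body lists every part number with the ETag S3 returned for that part. The sketch below only builds that body; the function name `build_complete_body` is hypothetical, and in practice an AWS SDK would normally construct and sign this request for you.

```python
import xml.etree.ElementTree as ET


def build_complete_body(etags: dict) -> str:
    """Build the XML body for an S3 CompleteMultipartUpload request.

    `etags` maps each part number to the ETag S3 returned when that
    part was uploaded.
    """
    if not etags:
        raise ValueError("at least one uploaded part is required")
    root = ET.Element("CompleteMultipartUpload")
    # S3 expects the parts in ascending part-number order, even though
    # the chunks may have finished uploading in any order.
    for number in sorted(etags):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etags[number]
    return ET.tostring(root, encoding="unicode")
```

Sorting at this point means the uploader is free to send chunks out of order and simply record each ETag as it arrives.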
Another key thing we looked at is the browser's behaviour and how we can leverage solutions as simple as asynchronous uploads. Not a new trick, but not an easy one to implement against AWS S3, especially directly from a browser.
The flow we ended up implementing uses our server to generate the upload URLs required by the browser, and then the browser sends each chunk to AWS using the generated URL. This obviously still requires a server, but that is unavoidable: we need to use the account secret, and we don't want that made public. That looks something like this:
[Figure: Sequence flow of upload]
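The reason the secret can stay on the server is that a signed URL carries proof that the server approved one specific request, without revealing the key itself. The sketch below illustrates the idea with a toy HMAC scheme. It is not AWS Signature Version 4, and `SERVER_SECRET`, `sign_part_url` and `verify_part_url` are hypothetical names; only the `partNumber` and `uploadId` query parameters mirror the real S3 part-upload request.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: a placeholder secret. In the real flow this is the AWS
# secret key, and S3 checks an AWS Signature Version 4 signature.
SERVER_SECRET = b"not-a-real-secret"


def _sign(message: str) -> str:
    return hmac.new(SERVER_SECRET, message.encode(), hashlib.sha256).hexdigest()


def sign_part_url(key: str, upload_id: str, part_number: int,
                  ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Server side: issue a short-lived URL for exactly one chunk."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    path = f"/{key}?partNumber={part_number}&uploadId={upload_id}&expires={expires}"
    return f"{path}&signature={_sign(path)}"


def verify_part_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the URL only if it is untampered and unexpired."""
    path, _, signature = url.rpartition("&signature=")
    if not hmac.compare_digest(_sign(path), signature):
        return False
    expires = int(path.rpartition("&expires=")[2])
    return (time.time() if now is None else now) < expires
```

The browser only ever sees the finished URL, so it can upload the chunk it was given but cannot mint URLs for other parts or other files.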
Our first bit of inspiration was from the awesome flow.js library. From all the research we did, this is an outstanding library that allows you to upload a file in parts to a server. OK, it does a lot more than just that: pausing, resuming, fault tolerance, etc.
But, if you read through the comments on their site, there's one thing people ask for again and again... Direct to S3 uploads. The reason they haven't added it is that it relies on a server to generate the URLs.
The second bit of inspiration came from some work done by ienzam on direct to S3 chunk uploads. In his own words: "The codes are not well tested, poorly written, and kind of a mess. You should get inspiration (!) from the code and make your own version." We found this out rather quickly, but also that he had broken the back of the problem using the AWS SDKs.
We were quite pleased with the results of our first version of the chunk uploader. It's extremely responsive, provides real-time feedback on upload progress, and has good fault tolerance.
As for the upload speed, that's still at the mercy of the user's connection. We have tried playing with the number of simultaneous connections and the chunk size, and found that the default size we use works the most consistently for us. There is a slight improvement with more simultaneous uploads, but we found that errors also went up on slower connections.
And when we started out, we'd always planned to write it as a standalone library with separation and open sourcing in mind, so it was relatively straightforward to share it on GitHub.
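The trade-off between simultaneous uploads and error rate can be sketched as a small worker pool with per-part retries. This is an illustrative Python version rather than our browser code: `upload_all` and `send_part` are hypothetical names, and the defaults of three workers and four attempts are assumptions, not the values our library ships with.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def upload_all(part_numbers: list,
               send_part: Callable[[int], str],
               max_workers: int = 3,
               max_attempts: int = 4,
               base_delay: float = 0.5) -> dict:
    """Upload every part, retrying failures, and return {part number: ETag}."""

    def send_with_retry(number: int) -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return send_part(number)
            except OSError:
                if attempt == max_attempts:
                    raise  # give up on this part; the error reaches the caller
                # Back off exponentially so a struggling connection
                # is not hit again straight away.
                time.sleep(base_delay * 2 ** (attempt - 1))
        raise AssertionError("unreachable")

    # max_workers caps how many parts are in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        etags = pool.map(send_with_retry, part_numbers)
        return dict(zip(part_numbers, etags))
```

Keeping `max_workers` small limits how many requests compete for a slow connection, while the retry loop means a single dropped request costs one chunk rather than the whole file.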
There is still plenty we would like to do next:
- Migrate the server code to Node.js for an even more seamless connection between server and client
- Move the final stitching call to the browser (this can take a few seconds for a big file)
- Remove the jQuery dependency. As much as we love the work jQuery has done, there's really no reason to make it a requirement for something like this.
- Leverage other services such as Lambda to make it faster and predictively generate the URLs for the next set of uploads before they happen, reducing some of the lag and pauses in the browser.