Open Sourcing our Client-side to AWS S3 chunk uploader

We recently upgraded the scrmhub content publishing platform to support video uploads. This is something we've wanted to do as a team for a while, and something we wanted to ensure was done in the best way possible within the AWS platform.

There are various parts to handling video, and the first, most obvious one is the upload stage. This is the make-or-break phase that sets the tone for the whole user experience. Make it too slow or prone to failing on poor connections and users will move away.

The team also wanted to test out some ideas they've had on leveraging the browser for more heavy lifting and also utilising more of what the AWS platform has to offer.

You can find the source code on our GitHub page, along with instructions to get started.

Background

The first, most obvious issue with uploading large files is the risk of the connection failing and the upload having to start again from scratch. Whilst this happens less these days, it still happens. Add to that the fact that people are now more mobile, and it becomes even more important that a dropped connection does not kill the upload; otherwise people get charged for unnecessary bandwidth caused by us - not a good look for a SaaS business!

The second problem is that, as our client base grows, we want to be thinking about the time and data costs for us and our customers. An obvious saving is to avoid moving files between servers: each file takes time to transfer and costs us money in data between the server and the bucket.

We were also concerned about how our infrastructure will scale. This affects us both locally and globally, so we are always looking at how we can leverage existing systems so we don't have to worry about things like this ourselves. In short, we needed:

  • A short timeframe to upload
  • Large files uploaded into distributed storage accessible by microservices
  • A fast, reliable user experience
  • Scalable and cheap

The solution

After scoping out the requirements we decided that we wanted to go with a solution that:

  • Supports chunked uploading, with its many benefits including error handling and faster uploading (if the connection is fast enough)
  • Uploads directly to our S3 bucket rather than via a backend server (saving cost and time)
  • Leverages AWS services as much as possible (more $$$ saved)

Architecture

We first looked at our existing setup and realised that, being based on Amazon's AWS services, there is already a wealth of ways to streamline a service like this; you just have to find the best one. The trick is that while you are putting the file in one place, you need to generate the URL for that place. This is where the server-side code comes in: it generates each URL so the browser can put each file chunk in the correct place, and then tells AWS to put the chunks back together.
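As a sketch of the chunking step (the function name here is ours for illustration, not the library's actual API): S3 multipart uploads require every part except the last to be at least 5 MiB, and cap an upload at 10,000 parts, so the browser first works out the byte ranges to slice the File into:

```javascript
// Minimum S3 part size is 5 MiB (only the last part may be smaller);
// S3 also caps a multipart upload at 10,000 parts.
const MIN_PART_SIZE = 5 * 1024 * 1024;

// Compute the [start, end) byte ranges for each chunk of a file.
// In the browser, file.slice(start, end) then yields the Blob to PUT.
function chunkRanges(fileSize, partSize = MIN_PART_SIZE) {
  if (partSize < MIN_PART_SIZE) {
    throw new Error('S3 rejects parts smaller than 5 MiB');
  }
  const ranges = [];
  for (let start = 0; start < fileSize; start += partSize) {
    ranges.push({
      partNumber: ranges.length + 1, // S3 part numbers start at 1
      start,
      end: Math.min(start + partSize, fileSize),
    });
  }
  return ranges;
}
```

Each range maps one-to-one onto an S3 UploadPart call, which is what makes pausing and retrying individual chunks possible.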

Another key thing we looked at is the browser's behaviour and how we can leverage solutions as simple as asynchronous uploads. Not a new trick, but not an easy one to implement directly against AWS S3, especially from a browser.

The flow we ended up implementing uses our server to generate the upload URLs required by the browser, and the browser then sends each chunk to AWS using its generated URL. This obviously still requires a server, but that is unavoidable: the URLs are signed with the account secret, and we don't want that made public. The flow looks something like this:

Sequence flow of upload
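The sequence above can be sketched as a small orchestrator, with the AWS and server calls hidden behind an `api` object. The names (`initUpload`, `signPart`, `putPart`, `completeUpload`) are illustrative stand-ins, not the library's actual API; they correspond to S3's CreateMultipartUpload, pre-signed UploadPart PUTs, and CompleteMultipartUpload:

```javascript
// Sketch of the upload flow. The `api` object abstracts:
//   api.initUpload(name)                -> server: CreateMultipartUpload, returns an uploadId
//   api.signPart(uploadId, partNumber)  -> server: pre-signed URL for that part
//   api.putPart(url, blob)              -> browser: PUT the chunk to S3, returns its ETag
//   api.completeUpload(uploadId, parts) -> server: CompleteMultipartUpload
async function uploadFile(file, ranges, api) {
  const uploadId = await api.initUpload(file.name);
  const parts = [];
  for (const { partNumber, start, end } of ranges) {
    const url = await api.signPart(uploadId, partNumber);
    const etag = await api.putPart(url, file.slice(start, end));
    parts.push({ PartNumber: partNumber, ETag: etag });
  }
  // S3 stitches the chunks back together in part-number order.
  return api.completeUpload(uploadId, parts);
}
```

This sequential version is the simplest form; chunks can also be signed and uploaded concurrently, since each pre-signed URL is independent.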

Inspiration

Our first bit of inspiration was the awesome flow.js library. From all the research we did, it is an outstanding library that allows you to upload a file in parts to a server. OK, it does a lot more than just that: pausing, resuming, fault tolerance, and so on.

But if you read through the comments on their site, there's one thing people ask for again and again: direct-to-S3 uploads. And the reason it hasn't been added is that it relies on a server to generate the URLs.

The second bit of inspiration came from some work done by ienzam on direct-to-S3 chunk uploads. In his own words: "The codes are not well tested, poorly written, and kind of a mess. You should get inspiration (!) from the code and make your own version." We found this out rather quickly, but also that he had broken the back of the problem using the AWS SDKs.

The results

We were quite pleased with the results of our first version of the chunk uploader. It's extremely responsive, provides real-time feedback on upload progress, and has good fault tolerance.

As for the upload speed, that's still at the mercy of the user's connection. We have tried playing with the number of simultaneous connections and the chunk size, and found that our default settings work the most consistently. More simultaneous uploads give a slight improvement, but we also found that error rates went up on slower connections.
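The fault tolerance above comes down to retrying individual chunks rather than the whole file. A minimal sketch of the kind of retry wrapper involved (the name, attempt count, and delays here are illustrative, not the library's actual API):

```javascript
// Retry a failing async task a few times, backing off a little
// longer after each failure, before giving up for good.
async function withRetries(task, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Linear backoff: wait longer after each consecutive failure.
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * (attempt + 1)));
    }
  }
  throw lastError; // all attempts failed; surface the last error
}
```

Because each chunk is small and independently addressable, a dropped connection only costs one chunk's worth of bandwidth rather than restarting the entire upload.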

And when we started out, we'd always planned to write it as a standalone library with separation and open sourcing in mind, so it was relatively straightforward to share it on GitHub.

Future Plans

  • Migrate the server code to Node.js for an even more seamless connection between server and client
  • Move the final stitching call to the browser (this can take a few seconds for a big file)
  • Remove the jQuery dependency - as much as we love the work the jQuery team has done, there's really no reason to make it a requirement for something like this
  • Use JavaScript Promises more to improve the flow of data within the client-side code and increase error tolerance further
  • Leverage other services such as Lambda to make it faster, and predictively generate the URLs for the next set of uploads before they happen, reducing some of the lag and pauses in the browser

We are hiring

We are an Artificial Intelligence marketing technology startup that is growing quickly and working globally to deliver the next generation of tools and services. Our platform is pushing into new, bigger markets, and we're looking for engineers who are after their next challenge: building a multi-lingual, multi-regional, real-time platform built on big data and machine learning.

To find out more about your next company and see the current opportunities, visit our careers page https://u.scrmhub.com/joinus

If this kind of work excites you, let's have a chat over coffee.

scrmhub, Bringing Machine Learning to Marketing

Posted by

Gregory Brine
CPO & Co-Founder

Greg has a passion for what AI and Deep Learning can bring to the MarTech stack and how small and medium businesses can benefit from these new technologies. He has over 20 years' experience as an engineer and product developer, having worked for significant global marketing agencies including Razorfish and We Are Social.

Posted by

Johnson Lin
Partnerships Director & Co-Founder

Johnson has 20 years of senior engineering experience, including a significant background in projects driving integration with third parties and in managing the teams that bring these projects to successful outcomes. His specific interest and experience in content management, developed in a large-data VOD environment, has also helped guide our content management and analysis, setting Metigy up to tackle many of our content-related AI challenges.

Posted by

David Fairfull
CEO & Co-Founder

David has developed deep marketing domain expertise and is passionate about shaping the role technology plays in the marketing function. Formerly a Managing Partner of We Are Social (the largest global social agency), Regional Director APAC for McCann Erickson WorldGroup's digital business, and Managing Director of The Brave Group (an early pioneer in digital marketing), he is driven by the idea that technology can make marketing fun.