Amazon Rekognition Video Adds Media Analysis to Automate Processing

Amazon Rekognition Video is a machine learning (ML) based service that can analyse videos to detect objects, people, faces, text, scenes and activities. Amazon has now added new analysis functionality that detects key content characteristics for working with video. It is now possible to automate four common media analysis tasks (detection of black frames, end credits, shot changes and colour bars) using ML-powered APIs from Amazon Rekognition Video.

Videos often contain a short run of empty black frames without audio to demarcate ad insertion slots or the end of a scene. Using Amazon Rekognition Video, you can detect those indicators and use them to automate ad insertion, or to package content for VOD by removing unwanted segments.
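
As a rough illustration of that idea, the sketch below picks out black-frame detections as candidate ad slots. It assumes segments shaped like the entries of a Rekognition Video segment detection response (`Type`, `TechnicalCueSegment`, `StartTimestampMillis`, `EndTimestampMillis`); the function name, the confidence threshold and the sample data are hypothetical.

```python
def black_frame_slots(segments, min_confidence=80.0):
    """Return sorted (start_ms, end_ms) pairs for black-frame technical cues."""
    slots = []
    for seg in segments:
        cue = seg.get("TechnicalCueSegment", {})
        is_black = seg.get("Type") == "TECHNICAL_CUE" and cue.get("Type") == "BlackFrames"
        # The 80% threshold is an assumption, not a recommended value.
        if is_black and cue.get("Confidence", 0.0) >= min_confidence:
            slots.append((seg["StartTimestampMillis"], seg["EndTimestampMillis"]))
    return sorted(slots)


# Hypothetical sample data, not real service output.
sample_segments = [
    {"Type": "SHOT", "StartTimestampMillis": 0, "EndTimestampMillis": 4000,
     "ShotSegment": {"Index": 0, "Confidence": 99.0}},
    {"Type": "TECHNICAL_CUE", "StartTimestampMillis": 600000, "EndTimestampMillis": 602000,
     "TechnicalCueSegment": {"Type": "BlackFrames", "Confidence": 97.5}},
    {"Type": "TECHNICAL_CUE", "StartTimestampMillis": 300000, "EndTimestampMillis": 301500,
     "TechnicalCueSegment": {"Type": "BlackFrames", "Confidence": 60.0}},
    {"Type": "TECHNICAL_CUE", "StartTimestampMillis": 120000, "EndTimestampMillis": 121000,
     "TechnicalCueSegment": {"Type": "BlackFrames", "Confidence": 91.0}},
]

print(black_frame_slots(sample_segments))
```

The low-confidence detection is dropped, and the remaining slots come back in playback order, ready to hand to an ad insertion or trimming step.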

Performing these detection analyses also helps users execute workflows such as content preparation and ad insertion, and add ‘binge-markers’ to content, at scale in the cloud. You can insert interactive viewer prompts as well, such as ‘Next Episode’ in VOD applications, by identifying the exact frames where the closing credits start and end in a video.

Detect segments automatically and capture frame accurate start and end timecodes.

For viewers who want to watch multiple episodes of a specific TV program in a row, binge markers are used to locate and omit the opening and end credits or the ‘previously on’ segment of a TV program, allowing a smooth transition between episodes. Because they limit the manual steps in switching between episodes, markers make it more likely that a viewer’s attention stays on the content. OTT and VOD providers have already started adding such features, and TV providers may want to as well.
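
For the end-credits case, a binge marker could be derived along the following lines. This is a minimal sketch, assuming segment entries that carry `StartTimecodeSMPTE` and `EndTimecodeSMPTE` fields as in a Rekognition Video segment detection response; the function name, threshold and sample data are hypothetical, and it covers end credits only, not opening credits or ‘previously on’ segments.

```python
def end_credits_window(segments, min_confidence=80.0):
    """Return (start_smpte, end_smpte) of the last end-credits cue, or None."""
    credits = [
        seg for seg in segments
        if seg.get("Type") == "TECHNICAL_CUE"
        and seg.get("TechnicalCueSegment", {}).get("Type") == "EndCredits"
        and seg["TechnicalCueSegment"].get("Confidence", 0.0) >= min_confidence
    ]
    if not credits:
        return None
    # If several cues qualify, assume the latest one is the closing credits.
    last = max(credits, key=lambda seg: seg["StartTimestampMillis"])
    return last["StartTimecodeSMPTE"], last["EndTimecodeSMPTE"]


# Hypothetical sample data, not real service output.
episode_segments = [
    {"Type": "TECHNICAL_CUE", "StartTimestampMillis": 0, "EndTimestampMillis": 2000,
     "StartTimecodeSMPTE": "00:00:00:00", "EndTimecodeSMPTE": "00:00:02:00",
     "TechnicalCueSegment": {"Type": "BlackFrames", "Confidence": 99.0}},
    {"Type": "TECHNICAL_CUE", "StartTimestampMillis": 2520000, "EndTimestampMillis": 2580000,
     "StartTimecodeSMPTE": "00:42:00:00", "EndTimecodeSMPTE": "00:43:00:00",
     "TechnicalCueSegment": {"Type": "EndCredits", "Confidence": 95.0}},
]

print(end_credits_window(episode_segments))
```

A player could show the ‘Next Episode’ prompt at the first timecode and treat the second as the point where the episode's content is fully over.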

Viewer-Focussed Workflows

Another useful function in Amazon Rekognition Video is the detection of shot changes, the points where a scene cuts from one camera to another. Using this information, you can create promotional videos from selected shots, generate good-looking preview thumbnails by choosing the best frames in shots, and insert ads without disrupting the viewer experience, for example, by avoiding the middle of a shot when someone is speaking.
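
To show how shot boundaries might guide ad placement, the sketch below snaps a desired ad time to the nearest cut. It assumes shot entries with `Type` set to `SHOT` and an `EndTimestampMillis` field, as in a Rekognition Video segment detection response; the function name and sample data are hypothetical.

```python
def nearest_shot_boundary(segments, target_ms):
    """Return the shot end time (ms) closest to target_ms, or None if no shots."""
    boundaries = [
        seg["EndTimestampMillis"] for seg in segments if seg.get("Type") == "SHOT"
    ]
    if not boundaries:
        return None
    return min(boundaries, key=lambda boundary: abs(boundary - target_ms))


# Hypothetical sample data, not real service output.
shots = [
    {"Type": "SHOT", "StartTimestampMillis": 0, "EndTimestampMillis": 4000},
    {"Type": "SHOT", "StartTimestampMillis": 4000, "EndTimestampMillis": 9500},
    {"Type": "SHOT", "StartTimestampMillis": 9500, "EndTimestampMillis": 15000},
]

# An ad requested at 10 s is moved to the closest cut between shots.
print(nearest_shot_boundary(shots, target_ms=10000))
```

Placing the break on a cut, not at the requested time, avoids interrupting a shot part-way through.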

Editors like making soft transitions from one camera to another to produce a video that is easier and more pleasant to watch than a series of hard cuts. But transitions make it awkward to work with the video frames later on. When Amazon Rekognition Video detects a soft transition, it omits the transition altogether to make sure that shot start and end times don’t include sections without actual content.

It’s also useful to detect sections of video that display SMPTE (Society of Motion Picture and Television Engineers) colour bars, either to remove them from VOD content or, more important, to detect issues such as loss of broadcast signals in a recording. In those cases, colour bars may be shown continuously as the default signal.
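
One way to separate the two colour-bar cases is by duration. The sketch below is an illustration only: it assumes segment entries with `DurationMillis` and `StartTimecodeSMPTE` fields as in a Rekognition Video segment detection response, and the 30-second threshold, the labels, the function name and the sample data are all hypothetical.

```python
def classify_colour_bars(segments, signal_loss_ms=30_000):
    """Label each colour-bar cue as a short leader or a possible signal loss."""
    report = []
    for seg in segments:
        cue = seg.get("TechnicalCueSegment", {})
        if seg.get("Type") != "TECHNICAL_CUE" or cue.get("Type") != "ColorBars":
            continue
        # Assumption: long continuous bars suggest a lost signal, short ones a leader.
        long_run = seg["DurationMillis"] >= signal_loss_ms
        label = "possible signal loss" if long_run else "leader"
        report.append((seg["StartTimecodeSMPTE"], seg["DurationMillis"], label))
    return report


# Hypothetical sample data, not real service output.
recording_segments = [
    {"Type": "TECHNICAL_CUE", "StartTimecodeSMPTE": "00:00:00:00", "DurationMillis": 10000,
     "TechnicalCueSegment": {"Type": "ColorBars", "Confidence": 98.0}},
    {"Type": "TECHNICAL_CUE", "StartTimecodeSMPTE": "00:31:12:05", "DurationMillis": 240000,
     "TechnicalCueSegment": {"Type": "ColorBars", "Confidence": 96.0}},
    {"Type": "TECHNICAL_CUE", "StartTimecodeSMPTE": "00:58:00:00", "DurationMillis": 60000,
     "TechnicalCueSegment": {"Type": "EndCredits", "Confidence": 93.0}},
]

for row in classify_colour_bars(recording_segments):
    print(row)
```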

Reducing the Workload

With these APIs, you can analyse large volumes of videos stored in Amazon S3 and extract SMPTE timecodes and timestamps for each detection, even without machine learning experience. The APIs are documented and managed, presenting a consistent interface that allows developers to work with digital assets without having to understand the complexity of the underlying systems. The interface reduces the specific expertise needed to build new applications and so gives more developers a chance to innovate further in less time.
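
In practice, the workflow against a video in Amazon S3 might look like the sketch below. It assumes a client object exposing `start_segment_detection` and `get_segment_detection`, as the boto3 Rekognition client does; the helper name, the simple polling loop and the bucket and key in the usage comment are hypothetical, and error handling is kept to a minimum.

```python
import time


def detect_segments(client, bucket, key, poll_seconds=5.0):
    """Start a segment detection job and return all detected segments."""
    job = client.start_segment_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        SegmentTypes=["TECHNICAL_CUE", "SHOT"],
    )
    job_id = job["JobId"]

    segments, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_segment_detection(**kwargs)

        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)  # job still running, wait and ask again
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Segment detection ended with status {status}")

        segments.extend(page.get("Segments", []))
        token = page.get("NextToken")  # results may span several pages
        if not token:
            return segments


# Usage sketch (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   segments = detect_segments(boto3.client("rekognition"), "my-bucket", "episode.mp4")
```

Polling keeps the example short. For large volumes of video, a notification-based approach would likely be preferable to repeatedly asking for the job status.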

The SMPTE timecodes returned to the user are frame accurate, which means that Amazon Rekognition Video supplies the exact frame number when it detects a relevant segment of video. Under the hood, it also handles various video frame rate formats, such as drop frame and the fractional frame rates used in some countries. Drop-frame timecode compensates for the small difference between the actual 29.97 fps frame rate and the nominal 30 fps numbering by periodically skipping timecode numbers, not video frames, so that the timecode stays in step with clock time.
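
The drop-frame arithmetic can be illustrated with the widely used conversion from a frame count to a 29.97 fps drop-frame timecode. This is a worked example of the general technique, not Amazon's implementation, and the function name is hypothetical. Two timecode numbers are skipped at the start of every minute except each tenth minute, and the semicolon before the frame field is the usual way of marking drop-frame timecode.

```python
def frames_to_dropframe(frame_number):
    """Convert a 29.97 fps frame count to a drop-frame timecode string."""
    drop = 2                   # timecode numbers skipped per affected minute
    frames_per_min = 1798      # 60 * 30 - 2, a minute that skips two numbers
    frames_per_10min = 17982   # 10 * 60 * 30 - 9 * 2, nine of ten minutes skip

    tens, rest = divmod(frame_number, frames_per_10min)
    skipped = drop * 9 * tens
    if rest > drop:
        skipped += drop * ((rest - drop) // frames_per_min)
    numbered = frame_number + skipped  # position in nominal 30 fps numbering

    ff = numbered % 30
    ss = (numbered // 30) % 60
    mm = (numbered // 1800) % 60
    hh = numbered // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"


for n in (0, 1799, 1800, 17982, 107892):
    print(n, frames_to_dropframe(n))
```

Frame 1800 maps to 00:01:00;02 because numbers ;00 and ;01 are skipped at the start of minute one, while frame 107892, which is one hour of real time at 29.97 fps, maps to exactly 01:00:00;00.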

You can use the frame-accurate metadata from Amazon Rekognition Video either to automate operational tasks completely or to reduce the review workload of trained human operators enough to run media analysis workflows rapidly and at scale in the cloud.

Segmenting at Synchronized

Synchronized develops an AI engine that understands the content and context of a video and enriches it with metadata. Instead of leaving the video as a simple linear sequence, its team can then add interactivity, in a manner similar to hypertext, so that the content works more effectively in digital environments.

Television channels are now interested in adapting traditional, long-form content produced for linear TV into short-form segments that suit online consumption. Segmenting and clipping content editorially lets viewers directly access the parts that interest them. The Synchronized platform automates the video segmenting, clipping and distribution workflow for broadcasters.

Until now, accurate, automatic transformation of audiovisual content into editorial segments has been a complex task requiring a number of specialised techniques. By combining Amazon Rekognition Video with its own segmentation service, Synchronized can speed up the preparation and delivery of accurate clips to TV editorial teams.

The editors can then manipulate the segments without specialists, and distribute them immediately. The level of automation makes the process scalable, and the ability to automatically detect end credits means their customers also have the means to add features such as ‘Next Episode’ buttons to their content.

Rekognition Video users pay only for the minutes of video they analyse, without minimum fees, licenses or upfront commitments. So far, media analysis features for Amazon Rekognition Video are only available in AWS Regions supported by Amazon Rekognition.