Amazon video segmentation

Amazon Rekognition video segment detection, an API released in 2020, uses machine learning (ML) to automatically detect frame-accurate end credits, black frame segments, shot changes and colour bars in video files. It now supports four new segment types: opening credits, content segments, slates and studio logos. It also detects credits and shots more accurately, and has new filters to control black frame detection.

Operations and media supply chain teams need to perform segment detection as part of content preparation and QC for video on demand (VOD) applications. Amazon developed the segment detection API because detecting and handling video segments is generally a time-consuming, tedious manual task, yet it is necessary for much of the work involved in extracting the full value from video material.

Automated Segmentation

Examples of such tasks include adding markers like ‘Next Episode’ and ‘Skip Intro’, detecting pre-inserted ad break markers indicated by silent black frames, and removing unwanted sections such as slates and colour bars before publishing. With these steps automated, broadcasters and streaming platforms can create interactive ‘skip’ prompts to keep viewers engaged, make ad insertion more precise and less intrusive, and free their operators for higher-value work.

Amazon Rekognition helps to automate and speed up these processes. Amazon says that by automating video segment detection, users can rapidly prepare large volumes of archival or third-party content for streaming, and reduce the cost of manual asset review operations by a factor of two to five.


A typical timeline for a video asset in the media supply chain showing colour bars at the start, black frames throughout and credits at the end.

Detection itself produces useful information: frame-accurate start and end timestamps, SMPTE timecodes and frame numbers for each segment. For instance, by using Amazon Rekognition Video to identify the exact frames where the opening and closing credits of a movie or TV show start and end, users can automatically generate binge markers that omit credits and intros, or interactive viewer prompts such as ‘Next Episode’ or ‘Skip Intro’ in VOD applications. Rekognition uses machine learning to handle varied opening and end credit styles, ranging from rolling credits to credits alongside content, credits over scenes, and stylised credits in anime content.
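As a rough illustration of how those timecodes might drive viewer prompts, the Python sketch below turns a list of detected segments into ‘Skip Intro’ and ‘Next Episode’ markers. The segment dictionaries follow the shape that the GetSegmentDetection response is understood to use (TechnicalCueSegment, StartTimecodeSMPTE and so on), but the sample values, the helper names find_cue and build_markers, and the 80% confidence floor are assumptions made for this example only.

```python
from typing import Dict, List, Optional

# Hypothetical sample data; every value here is invented for illustration.
SEGMENTS: List[dict] = [
    {"Type": "TECHNICAL_CUE",
     "StartTimecodeSMPTE": "00:00:00:00", "EndTimecodeSMPTE": "00:00:05:00",
     "TechnicalCueSegment": {"Type": "ColorBars", "Confidence": 99.0}},
    {"Type": "TECHNICAL_CUE",
     "StartTimecodeSMPTE": "00:00:30:00", "EndTimecodeSMPTE": "00:01:15:00",
     "TechnicalCueSegment": {"Type": "OpeningCredits", "Confidence": 93.5}},
    {"Type": "TECHNICAL_CUE",
     "StartTimecodeSMPTE": "00:42:10:00", "EndTimecodeSMPTE": "00:43:00:00",
     "TechnicalCueSegment": {"Type": "EndCredits", "Confidence": 96.2}},
]


def find_cue(segments: List[dict], cue_type: str,
             min_confidence: float = 80.0) -> Optional[dict]:
    """Return the first technical cue of cue_type at or above the confidence floor."""
    for seg in segments:
        cue = seg.get("TechnicalCueSegment")
        if cue and cue["Type"] == cue_type and cue["Confidence"] >= min_confidence:
            return seg
    return None


def build_markers(segments: List[dict]) -> Dict[str, dict]:
    """Derive viewer-prompt markers from opening and end credit segments."""
    markers: Dict[str, dict] = {}
    opening = find_cue(segments, "OpeningCredits")
    if opening:
        # Show the prompt when the intro starts and jump to where it ends.
        markers["skip_intro"] = {"show_at": opening["StartTimecodeSMPTE"],
                                 "jump_to": opening["EndTimecodeSMPTE"]}
    closing = find_cue(segments, "EndCredits")
    if closing:
        # Offer the next episode as soon as the end credits begin.
        markers["next_episode"] = {"show_at": closing["StartTimecodeSMPTE"]}
    return markers


if __name__ == "__main__":
    print(build_markers(SEGMENTS))
```

In practice the confidence floor would be tuned per catalogue, and low-confidence segments routed to an operator for review rather than published automatically.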

Slates, Content and Other Segment Types

Typically seen at the beginning of a video, slates contain text metadata about the episode, studio, video format, audio channels and so on. Amazon Rekognition can identify the start and end of such slates so that operators can use the text metadata for further analysis, or simply remove the slate when preparing streaming content. The API can also identify studio logo sequences, which operators can review to determine which studios were involved.

Content refers only to the actual program portions of the TV show or movie, without silent black frames, credits, colour bars, slates or studio logos. Amazon Rekognition detects the start and end of each content segment in the video, which helps determine the program runtime or demarcate important program segments. For example, a quick recap of the previous episode at the beginning of the video counts as content, as does bonus post-credit material that follows the end credits.
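Determining the program runtime from content segments comes down to summing their durations. The following is a minimal Python sketch under the assumption that each segment carries StartTimestampMillis and EndTimestampMillis fields and a TechnicalCueSegment type of "Content"; the sample timings and the function names are hypothetical.

```python
from typing import List

# Hypothetical segments; timings are invented for illustration.
SEGMENTS: List[dict] = [
    {"StartTimestampMillis": 0, "EndTimestampMillis": 60_000,
     "TechnicalCueSegment": {"Type": "Slate", "Confidence": 98.0}},
    {"StartTimestampMillis": 60_000, "EndTimestampMillis": 1_500_000,
     "TechnicalCueSegment": {"Type": "Content", "Confidence": 97.0}},
    {"StartTimestampMillis": 1_500_000, "EndTimestampMillis": 1_560_000,
     "TechnicalCueSegment": {"Type": "EndCredits", "Confidence": 95.0}},
    {"StartTimestampMillis": 1_560_000, "EndTimestampMillis": 1_620_000,
     "TechnicalCueSegment": {"Type": "Content", "Confidence": 91.0}},
]


def content_runtime_millis(segments: List[dict]) -> int:
    """Sum the durations of all segments whose technical cue type is Content."""
    total = 0
    for seg in segments:
        cue = seg.get("TechnicalCueSegment", {})
        if cue.get("Type") == "Content":
            total += seg["EndTimestampMillis"] - seg["StartTimestampMillis"]
    return total


def format_hms(millis: int) -> str:
    """Format a millisecond duration as HH:MM:SS, dropping any fractional second."""
    seconds = millis // 1000
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    print(format_hms(content_runtime_millis(SEGMENTS)))  # 00:25:00
```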

Some videos also include, gathered at the end, the shots and sequences that originally played with overlaid text such as lower thirds, repeated with the text removed. Having these clean ‘textless’ versions allows operators to internationalise the content by overlaying text in another language.

Once all the content segments are detected with Amazon Rekognition Video, extra domain-specific rules such as ‘my videos always start with a recap’ can be applied to pick out the key segments that need review or follow-up action.
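A rule of that kind can be expressed as a small post-processing step over the detected segments. The Python sketch below is one possible reading, not part of the Rekognition API: it labels the first content segment as a recap when the caller asserts that rule, and labels any content that begins after the end credits as post-credit material. The label names, the function label_content and the sample timings are all assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical segments; timings are invented for illustration.
SEGMENTS: List[dict] = [
    {"StartTimestampMillis": 5_000, "EndTimestampMillis": 65_000,
     "TechnicalCueSegment": {"Type": "Content"}},
    {"StartTimestampMillis": 65_000, "EndTimestampMillis": 95_000,
     "TechnicalCueSegment": {"Type": "OpeningCredits"}},
    {"StartTimestampMillis": 95_000, "EndTimestampMillis": 1_400_000,
     "TechnicalCueSegment": {"Type": "Content"}},
    {"StartTimestampMillis": 1_400_000, "EndTimestampMillis": 1_460_000,
     "TechnicalCueSegment": {"Type": "EndCredits"}},
    {"StartTimestampMillis": 1_460_000, "EndTimestampMillis": 1_500_000,
     "TechnicalCueSegment": {"Type": "Content"}},
]


def cue_type(segment: dict) -> str:
    """Return the technical cue type of a segment, or an empty string."""
    return segment.get("TechnicalCueSegment", {}).get("Type", "")


def label_content(segments: List[dict],
                  starts_with_recap: bool = True) -> List[Tuple[str, int, int]]:
    """Apply simple domain rules to content segments, in timeline order."""
    content = sorted((s for s in segments if cue_type(s) == "Content"),
                     key=lambda s: s["StartTimestampMillis"])
    # Latest point at which any end-credits segment finishes, if one exists.
    credits_end = max((s["EndTimestampMillis"] for s in segments
                       if cue_type(s) == "EndCredits"), default=None)
    labelled = []
    for index, seg in enumerate(content):
        start, end = seg["StartTimestampMillis"], seg["EndTimestampMillis"]
        if index == 0 and starts_with_recap and len(content) > 1:
            label = "recap"          # domain rule: videos always open with a recap
        elif credits_end is not None and start >= credits_end:
            label = "post_credit"    # content that follows the end credits
        else:
            label = "program"
        labelled.append((label, start, end))
    return labelled


if __name__ == "__main__":
    for row in label_content(SEGMENTS):
        print(row)
```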


Black Frame Filters

Amazon has also added new filters to manage the detection of silent black frames in a way that allows users to work with media files from different sources with varying levels of quality and colour range support. For example, a file digitised from tape may have different black levels and more noise than modern digitally produced files. To address the varying requirements, the appropriate Black Frame filters can be specified in the API request.
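As a sketch of what specifying these filters might look like, the Python below builds the keyword arguments for a StartSegmentDetection call with a BlackFrame filter nested under TechnicalCueFilter. The request layout and the default values of 0.2 and 99 reflect the AWS documentation as understood at the time of writing and should be checked against the current API reference; the bucket and object names are placeholders, and build_segment_request is a hypothetical helper.

```python
def build_segment_request(bucket: str, key: str,
                          max_pixel_threshold: float = 0.2,
                          min_coverage_percentage: float = 99.0) -> dict:
    """Build keyword arguments for a segment detection job with black frame filters."""
    if not 0.0 <= max_pixel_threshold <= 1.0:
        raise ValueError("max_pixel_threshold must be between 0 and 1")
    if not 0.0 <= min_coverage_percentage <= 100.0:
        raise ValueError("min_coverage_percentage must be between 0 and 100")
    return {
        "Video": {"S3Object": {"Bucket": bucket, "Name": key}},
        "SegmentTypes": ["TECHNICAL_CUE", "SHOT"],
        "Filters": {
            "TechnicalCueFilter": {
                "BlackFrame": {
                    "MaxPixelThreshold": max_pixel_threshold,
                    "MinCoveragePercentage": min_coverage_percentage,
                }
            }
        },
    }


# A tape transfer with raised black levels might warrant looser settings
# (these particular numbers are illustrative, not recommendations).
request = build_segment_request("example-bucket", "archive/episode01.mp4",
                                max_pixel_threshold=0.3,
                                min_coverage_percentage=95.0)

# With AWS credentials configured, the job could then be submitted via boto3:
#   import boto3
#   job = boto3.client("rekognition").start_segment_detection(**request)
```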

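MaxPixelThreshold is a threshold used to determine the maximum luminance value for a pixel to be considered black. In a full colour range video, luminance values range from 0 to 255. A pixel value of 0 is pure black, so a threshold that admits only that value is the strictest filter. According to the AWS documentation, the threshold is given as a fraction of the luminance range, and the cut-off is computed as max_black_pixel_value = minimum_luminance + MaxPixelThreshold × luminance_range. A threshold of 0.1 on full-range video, for example, gives 0 + 0.1 × 255 = 25.5.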

MinCoveragePercentage is the minimum percentage of pixels in a frame that need to have a luminance below the maximum black pixel value (max_black_pixel_value, derived from MaxPixelThreshold) for the frame to be considered a black frame. Luminance is calculated using the BT.709 matrix.
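To make the interaction of the two filters concrete, here is a small Python sketch of the decision they describe. It is an illustration of the documented logic, not Amazon's implementation: it assumes full-range 8-bit RGB pixels, uses the standard BT.709 luma coefficients (0.2126, 0.7152, 0.0722), treats the threshold as a fraction of the luminance range, and counts a pixel as black when its luminance is at or below the cut-off (the inclusive comparison is an assumption).

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def max_black_pixel_value(max_pixel_threshold: float,
                          min_luma: float = 0.0,
                          max_luma: float = 255.0) -> float:
    """Luminance cut-off: minimum_luminance + MaxPixelThreshold * luminance_range."""
    return min_luma + max_pixel_threshold * (max_luma - min_luma)


def luma_bt709(r: int, g: int, b: int) -> float:
    """Luminance of an RGB pixel using the BT.709 coefficients."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_black_frame(pixels: Iterable[Pixel],
                   max_pixel_threshold: float = 0.2,
                   min_coverage_percentage: float = 99.0) -> bool:
    """True when enough of the frame's pixels fall at or below the cut-off."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("frame has no pixels")
    limit = max_black_pixel_value(max_pixel_threshold)
    dark = sum(1 for r, g, b in pixels if luma_bt709(r, g, b) <= limit)
    coverage = 100.0 * dark / len(pixels)
    return coverage >= min_coverage_percentage


# A noisy tape transfer: dark grey rather than pure black (invented values).
noisy_frame = [(30, 30, 30)] * 100

if __name__ == "__main__":
    print(is_black_frame(noisy_frame, max_pixel_threshold=0.1))  # cut-off 25.5
    print(is_black_frame(noisy_frame, max_pixel_threshold=0.2))  # cut-off 51.0
```

With the stricter threshold of 0.1 the grey pixels sit above the cut-off and the frame is rejected, while at 0.2 the same frame qualifies as black, which is the kind of adjustment a tape-sourced archive might need.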