Managing the Data Lifecycle in Dynamic Environments

Media organizations are now generating and acquiring huge volumes of data, media and assets that exist as digital files. Storing and managing those assets efficiently is key to keeping track of the data, and to taking advantage of the digital format to discover its value and repurpose it.

[Image: Avid collaborative storage]

Many hardware and software products have been developed for different applications and styles of data management, but the individual products themselves have become less relevant than the rules, processes and services that tie them together into a controllable system, whether you are managing content for a single project or across an organization. Data that can't be found cannot be re-used. Data that occupies storage space without a return on investment becomes a cost.

In February, digital media systems supplier Digistor held a workshop on Media Asset Management and Data Lifecycle in Sydney, focusing on data lifecycles and the systems that help you manage data from the time files are acquired or created until they are archived. At the event, Patrick Trivuncevic, Solutions Architect at Digistor, and Steve Baird, Channel Business Manager at Intraware, presented ideas on managing data through its lifecycle in a facility, and on asset management systems and integrated storage.

Lifecycles, Systems and Your Data

A dynamic data lifecycle management (DLM) system supports a wide range of activity inside an organization, including its workflows.

Once data is generated through capture, creation or repurposing, it is generally ingested into a storage system that supports sharing files online without actually moving any files between storage and workstations. This makes data available for review and approval, and management tasks like tagging and rights control. The system also supports versioning so that you can update files without losing earlier iterations, and allows access to files through searches.

A management system often keeps a single master file of each digital asset, and contains a transcode engine for derivative formats needed for clients or projects. Transcoding may also be required for distribution, the final stage of a project, and later for re-purposing and monetization. Finally, archiving backs up assets and maintains their original integrity.
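As a rough, vendor-neutral sketch of that structure (all names here are illustrative, not a specific product's schema), an asset record might track one master file plus the derivative renditions produced by the transcode engine:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rendition:
    """A derivative format generated from the master, e.g. a proxy or delivery file."""
    path: str
    codec: str      # e.g. "h264"
    purpose: str    # e.g. "proxy", "delivery", "archive"

@dataclass
class Asset:
    """Minimal record: one master file per asset, with renditions and tags hung off it."""
    asset_id: str
    master_path: str
    renditions: List[Rendition] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def add_rendition(self, path: str, codec: str, purpose: str) -> None:
        # Register a derivative produced by the transcode engine against the master.
        self.renditions.append(Rendition(path, codec, purpose))
```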

[Image: CatDV]

Two major advantages of setting up a DLM system are, first, gaining a much better knowledge of your assets through managing their data and, second, understanding the connected or integrated nature of file-based operations. Knowledge of assets can include formats and resolutions as well as content, and understanding integrations between data and the system’s applications and storage can lead to better workflows, faster access and the ability to set up an automated production and/or archiving system.

What Data?

Patrick Trivuncevic said, “DLM is not a product but an approach to managing data that keeps evolving, starting with procedures and practices that define how data is categorised and located inside the system at any lifecycle stage. At Digistor, we have a questionnaire for clients to start them thinking about their own data and where problems lie within the processes they have to carry out. How do you use your data? How does it fit into equipment implementation? What kind of data do you have and how much, and how can it be used by other organisations?

“In short, putting a system in place in an organization for managing data through its lifecycle allows you to quantify and qualify the data in your assets and understand their value. Most of the associated benefits relate to efficiency, reducing production cost and time by automating manual processes and capturing metadata consistently through the lifecycle. Archived assets can be discovered and utilized in a timely manner.”

Specific reasons for setting up DLM systems range from day-to-day workflow efficiency to monetization. Handling, managing and storing very large quantities of data does bring a cost, beyond the cost of regular delivery workflows, but that cost can largely be recovered through better utilization of data over a longer period of time.

Once data is managed this way, you can quickly find out where it is, access and browse it, and re-use it. For example, automatically generating proxy files and placing them in the cloud immediately makes content visible to all kinds of users: not only the artists who need it during production, but also marketing and sales staff who see its longer-term value and need to reference it later on.
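A minimal sketch of that kind of automation, assuming ffmpeg is installed and using a hypothetical cloud bucket name, might generate and publish a proxy like this:

```python
import subprocess

import boto3  # assumes AWS credentials are already configured

def publish_proxy(master_path: str, proxy_path: str, bucket: str = "example-proxies") -> None:
    """Generate a low-bitrate H.264 proxy with ffmpeg, then upload it to cloud storage."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", master_path,
         "-vf", "scale=-2:540",            # reduced-resolution proxy
         "-c:v", "libx264", "-b:v", "2M",
         "-c:a", "aac",
         proxy_path],
        check=True,
    )
    # Once in the bucket, the proxy is visible to production, marketing and sales users alike.
    boto3.client("s3").upload_file(proxy_path, bucket, proxy_path)

# publish_proxy("masters/interview_01.mov", "interview_01_proxy.mp4")
```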

Metadata and Automation

Most of the lifecycle stages, and an asset’s intrinsic value, rely on metadata: data about your data that makes it possible to identify an asset with clip names or timecode, describe it with keywords or tags, and classify it so it can be sorted and searched.


The metadata associated with modern media is substantial, and is usually stored automatically in the header of the data file, either at the time of creation or at ingest into the system. But some older formats were not designed to carry as much metadata, and in any case users may want to supplement it at different stages of the lifecycle, either manually or automatically. After all, the more metadata available in each file and across the system, the more effective your searches, automated processes, management and accessibility will be.

Consequently, a DLM system should be able to read and understand metadata and support as many metadata schemas as possible. This increases its value to you, now and in the future, and to other organizations. For example, during production the time your video was shot, its frame rate and codec are critical to the production team, but once the project is complete the same metadata takes on a different kind of value for different users, and the asset must be identifiable by them as well.
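As an illustration, the technical metadata a production team relies on can usually be read straight from the file header; a rough sketch using ffprobe (assuming it is installed):

```python
import json
import subprocess

def read_technical_metadata(path: str) -> dict:
    """Pull codec, frame rate and resolution from a media file's header via ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    video = next(s for s in json.loads(result.stdout)["streams"]
                 if s["codec_type"] == "video")
    return {
        "codec": video["codec_name"],
        "frame_rate": video["r_frame_rate"],              # e.g. "25/1"
        "resolution": f'{video["width"]}x{video["height"]}',
    }
```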

Automation, which is crucial to saving time and cost when working with data, and metadata work hand in hand, because metadata allows automated processes to find and identify the data they need to work on. Steve said, “With very large amounts of data and files, if a task cannot be automated it may not be worth doing. A lot of automation has been devoted to ingest workflows involving transcoding engines, for example. As data is ingested into a MAM, it goes first into the transcode engine, then back out in the required format or formats, and is recorded in the MAM. I feel the automation of very detailed, customised ingest will make a huge change to our ability to manage data.”
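A minimal sketch of that kind of ingest automation, assuming a watch folder and a purely hypothetical MAM REST endpoint (the transcode step itself is omitted; see the proxy sketch above):

```python
import time
from pathlib import Path

import requests  # the /api/assets endpoint below is illustrative, not a real product's API

WATCH_FOLDER = Path("/ingest/incoming")
MAM_URL = "https://mam.example.com/api/assets"
seen = set()

def ingest_loop() -> None:
    """Poll a watch folder; register each new clip and its proxy in the MAM."""
    while True:
        for clip in WATCH_FOLDER.glob("*.mov"):
            if clip in seen:
                continue
            proxy = clip.with_suffix(".proxy.mp4")   # produced by the transcode engine
            requests.post(MAM_URL, json={
                "master": str(clip),
                "proxy": str(proxy),
                "status": "ingested",
            }, timeout=30)
            seen.add(clip)
        time.sleep(10)   # simple polling interval
```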

Future-Proofing

Making the effort to future-proof the system is important because these systems are costly and time-consuming to put in place. Since it would be too difficult and expensive to set up a MAM that has, from the start, every feature you will ever need, future-proofing is best approached by working from problem to problem and considering your current investments.

Therefore, a system architecture that supports open standards may give you more agility and flexibility as you develop the system and the business grows, while a proprietary architecture tends to place a single supplier or vendor in control of its functionality. The system also has to be scalable in terms of capacity, number of users and integrated functionality, and should support customization through open APIs for workflows.

Tiered Storage – a Balancing Act

Automation and metadata have another key role in tiered storage management. Tiered storage systems automatically place files into types of storage with different prices, maintenance and levels of performance, depending on how readily the files need to be accessed, and by which applications. DLM’s ability to integrate with and manage shared, tiered workflow storage can help lower total storage costs by optimizing the process of moving the desired data into the most appropriate tier of storage, at the optimal time.
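As a rough illustration of such a policy, with hypothetical paths and thresholds, files that have not been touched for a set number of days could be demoted from online to nearline storage:

```python
import shutil
import time
from pathlib import Path

TIER1 = Path("/storage/online")     # high-performance SAN/NAS volume
TIER2 = Path("/storage/nearline")   # higher-capacity SATA-based volume
DEMOTE_AFTER_DAYS = 30

def demote_cold_files() -> None:
    """Move files not accessed recently from Tier 1 to Tier 2, preserving folder structure."""
    cutoff = time.time() - DEMOTE_AFTER_DAYS * 86400
    for f in TIER1.rglob("*"):
        if f.is_file() and f.stat().st_atime < cutoff:
            target = TIER2 / f.relative_to(TIER1)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))   # the MAM would record the new location
```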

[Image: Avid collaborative storage]

Tiered storage is a balancing act between capacity and performance: as performance increases so does the cost, while capacity often decreases. Higher capacity brings its own costs, and the balance keeps shifting as new storage systems are developed.

Tier 1 is the most expensive level in the system, typically a NAS or SAN system that may involve storage arrays with SAS connections and solid state drives. It holds the files an organization accesses most often for use in applications that need the highest performance and fastest access: real-time collaborative editing and content creation, DI, ingest and playout.

Tier 2 storage, sometimes called nearline, is less expensive with greater capacity but lower performance, as it is often based on storage arrays with SATA disks. It is still useful for commonly accessed assets, but it suits less specialized, less interactive applications such as offline editing. Through automation, its extra capacity could be used to hold assets that need to be accessed later by Tier 1 applications.

Tier 3 is mostly used as offline storage for archiving, with high capacity, long term protection and lower cost but the least accessibility because files take time to restore for use online. There are also options to use the cloud or store copies of tapes offsite.
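Taken together, the three tiers can be summarised as a simple policy table; a sketch like the following (values drawn from the descriptions above) could drive the demotion logic sketched earlier:

```python
# Tier characteristics as described above; a tiering policy can key off this table.
STORAGE_TIERS = {
    1: {"media": "SAS/SSD SAN or NAS", "cost": "highest", "capacity": "lowest",
        "typical_use": ["real-time collaborative editing", "DI", "ingest", "playout"]},
    2: {"media": "SATA nearline arrays", "cost": "medium", "capacity": "higher",
        "typical_use": ["offline editing", "staging assets for later Tier 1 use"]},
    3: {"media": "LTO tape, cloud or offsite copies", "cost": "lowest", "capacity": "highest",
        "typical_use": ["archive", "long-term protection"]},
}
```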

The challenge of choosing an archive system is that, from a user’s point of view, archiving is better defined by its restore process than the initial archive process. Restoring is often a time-pressured task carried out to meet a client request, whereas archiving itself is usually done after a project is complete and delivered. You need to understand the processes and maintenance costs, and match the archive hardware and software to your own storage and distribution, now and further down the track.

Dynamic Storage

Examples of Tier 1 online shared storage are the Ethernet-based SAN systems built by DynamicDrivePool (DDP) to overcome certain limitations in SAN and NAS systems. To run at speed, digital media software such as video editing applications needs direct, high-speed shared access to large amounts of data on storage drives or volumes. For this purpose, traditional SAN or NAS shared storage systems sometimes encounter capacity and IP bandwidth limitations, and need secure backup.

[Image: DDP collaborative storage]

However, by using its AVFS/iSCSI protocol, DynamicDrivePool can combine the performance of a SAN with the intelligence of a NAS. The iSCSI protocol is an alternative to Fibre Channel for block-based storage area networking and accelerates data access. While a combined SAN and NAS on Fibre Channel would need two networks, AVFS [Ardis Virtual File System] allows DDP to work over a single network infrastructure.

Thus, DDP systems work from one device on one Ethernet network as an IP-SAN, and allow project-level and file-level sharing for workgroups, using almost all of the available Ethernet bandwidth. The drives behave in the same way as local drives with no latency and, because they are designed as a pool, the systems are scalable.

Further along the lifecycle, an archive storage example is the Q24 LTO tape library, a network-based, scalable, high performance system from Qualstar. It records and manages metadata, proxies and the complete data lifecycle within a MAM, moving content according to ID-number-based policies. The hardware is a rackmountable 2U enclosure with a half-height LTO-6 or LTO-7 tape drive and a single-slot I/O port, holding 24 tapes for a capacity of up to 144 TB native, or 360 TB with LTO-7 compression.
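Those capacity figures follow from the per-tape numbers: an LTO-7 cartridge holds 6 TB native, or roughly 15 TB at the commonly quoted 2.5:1 compression, so a 24-slot library works out to:

```python
# Capacity check for a 24-slot library loaded with LTO-7 cartridges.
slots = 24
native_tb_per_tape = 6        # LTO-7 native capacity
compressed_tb_per_tape = 15   # at ~2.5:1 compression

print(slots * native_tb_per_tape)       # 144 TB native
print(slots * compressed_tb_per_tape)   # 360 TB compressed
```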

The built-in graphical operator control panel for configuring, managing and diagnosing the library is menu-driven, and the same controls are also accessible remotely over the Internet through an integrated browser-based interface.

Digital Asset Management

A dynamic DLM is also composed of applications: both those an organization already has that need to be integrated into the management system, such as transcoding engines, archive systems, and distribution and delivery software, and others devoted specifically to management, such as a production asset management (PAM) or media asset management (MAM) system.

[Image: video editor]

PAM software is designed mainly for the quick movement of assets in production workflows (in this case, video production) for films, television content, games and animation, and includes revision control and team collaboration. PAM systems tend to integrate with NLEs, supporting content that changes frequently as editing proceeds and maintaining multiple versions. Avid Interplay PAM is known for its integration with Media Composer, but two examples that integrate with other NLEs as well are FLAVOURSYS Strawberry and VSNExplorer. EditShare Flow and Grass Valley GV STRATUS are other examples of PAM.

MAM applications are mostly concerned with ingest, delivery, sharing and ultimately archiving, specifically of large media files such as high quality video and other digital material destined for secure, long-term preservation. A MAM should include functionality for handling media through its full lifecycle, from ingest through delivery to archive.

Because the original MAM systems, such as Dalet’s Galaxy, were developed for broadcast, until recently they have tended to be broadcast-specific and fairly expensive. But more recent MAM examples such as Cantemo Portal and CatDV are flexible enough to support post production, general content producers and other media-based workflows as well as broadcast, at a lower cost and with more scalability in functionality and size. Cantemo Portal, for example, can be scaled up in terms of tools by purchasing apps, and scaled out to accommodate more users.

Customising Management

A great deal of value can be gained by using a MAM system as a front end to an archive system. As content is completed, it is ingested into the MAM and archived. The MAM can then be used to search, find and verify that a particular file is what the user wants; only then is the restoration process started. Integration of this type not only helps retrieve the right data for a specific use, it can also be designed to move data into the optimum tier.
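A hedged sketch of that integration, with hypothetical REST endpoints standing in for the MAM and the archive controller: search the MAM, let the user verify the result via its proxy, and only then trigger a restore into the desired tier.

```python
import requests  # the endpoints below are illustrative, not a specific product's API

MAM_URL = "https://mam.example.com/api"
ARCHIVE_URL = "https://archive.example.com/api"

def find_and_restore(query: str, target_tier: str = "tier1") -> None:
    """Search the MAM, preview proxies, then request a restore of the chosen asset."""
    hits = requests.get(f"{MAM_URL}/search", params={"q": query}, timeout=30).json()
    for hit in hits:
        print(hit["asset_id"], hit["proxy_url"])   # user verifies against the proxy first
    chosen = input("Asset ID to restore: ")
    requests.post(f"{ARCHIVE_URL}/restore",
                  json={"asset_id": chosen, "destination": target_tier},
                  timeout=30)
```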

[Image: eMAM user interface]

Steve is most excited about the current interest in merging PAM and MAM functions, mainly because the cloud and the internet generally are making it possible to involve a much wider range of users. “A hybrid category also exists here, because some MAM products today are starting to overlap into production with tools that add value for content creatives,” he said.

“At the same time, some PAM functions are capable of external access, delivery and content sharing beyond the production’s creators. Some MAM products are also being developed that add panels to the UIs of integrated NLE software, so that an editor can search for stock footage, Word documents or spreadsheets stored in the MAM without leaving the editor.”

A Hybrid System

An example of a hybrid PAM/MAM system demonstrated at the workshop is eMAM Vault. The software combines DAM functions, transcoding and LTO library management into a scalable system with one eMAM browser interface that manages storage, processing and archive. Users can search and preview media using online proxies, while the original high resolution content is secured in an LTO archive. High speed transcoding generates the formats the organization uses for its work, and delivers the files to editing systems, playout servers, websites, mobile devices and so on.

Nearline and archived content can be accessed from any web browser, and searched through a search engine that uses embedded and customized metadata. Downloads and other functions can be restricted with permissions. Basic workflow tools support online collaboration and review, rough cuts, approvals and so on.

eMAM Vault is made for M&E companies, corporate marketing and communications organisations, and houses of worship. Larger enterprise and workgroup versions add servers, external review and approval, a timeline and other functions, and can be managed through Adobe Anywhere or Premiere Pro. All versions have on-premises and cloud storage support, internal transcoding, and archive software to control online and offline LTO tapes.