
Configurable Media Codec Framework: A Stepping Stone for Fast and Stable Codec D


Abstract

Recent advances in reconfigurable computing have led to new ways of implementing complex algorithms while maintaining reasonable throughput. Video codecs are becoming more complex in order to provide efficient compression for video of ever-increasing resolution. This problem is compounded by the fact that the spectrum of video decoding devices has widened in the move from traditional TV to cable and satellite TV, IPTV, mobile TV, and Internet media. MPEG is tackling this problem with a reconfigurable video coding (RVC) framework and is standardizing a modular definition of tools and connections. MPEG's work started with video coding and has recently been extended to graphics data coding. RVC will also be supported by non-MPEG standards such as the Chinese audio-video standard (AVS). This article gives a brief background to the reconfigurable codec framework. The keys to this framework are reconfigurability and reduced granularity, which exposes commonality between different standards.

Keywords

MPEG; reconfigurable coding; RVC; RMC

1 Introduction

The Moving Picture Experts Group (MPEG) has created many audio-visual coding standards, including MP3, MPEG-2, and MPEG-4 AVC/H.264 [1]. MPEG's multimedia coding standards have been central in the shift from the analog to the digital paradigm. Video coding standards have been developed for specific applications: MPEG-1 for video CD, MPEG-2 for digital TV and DVD, MPEG-4 Part 2 Visual for mobile video, and MPEG-4 AVC/H.264 for DMB and the Internet. It has been 20 years since MPEG-1 was standardized, and there are now many video coding standards, some from MPEG and others from non-MPEG organizations. However, competition between standards makes it difficult to develop video devices because such devices must support an ever-increasing number of codecs.

It could be argued that there should be only one or two generic video codecs and that each standard should be unique. This may seem idealistic, considering that a huge amount of content has already been created using various standards. However, if we consider how media coding is actually done, a generic video coding standard is not impossible. Most media coding standards rely on the same basic processes: prediction, quantization, transform, and entropy coding. If a decoder is componentized into modules, it may be possible for one video coding standard to reuse modules from another. The size of a module determines its granularity, and the hardware and software realizations of a module may vary in size, performance, and cost. The reusability of a module greatly increases when the right granularity is found for a given architecture.
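
To make the idea of module reuse concrete, the following minimal Python sketch (not from any MPEG specification; all names and stage definitions are illustrative) composes two toy decoders from shared quantization and transform modules, varying only the entropy-decoding and prediction modules:

```python
def inverse_quantize(coeffs, qp):
    """Shared inverse-quantization stage (toy uniform scaling)."""
    return [c * qp for c in coeffs]

def inverse_transform(coeffs):
    """Shared inverse-transform stage (placeholder: identity)."""
    return coeffs

def build_decoder(entropy_decode, predict):
    """Compose a decoder from interchangeable stage modules."""
    def decode(bitstream, qp):
        coeffs = entropy_decode(bitstream)
        residual = inverse_transform(inverse_quantize(coeffs, qp))
        prediction = predict(len(residual))
        return [p + r for p, r in zip(prediction, residual)]
    return decode

# Two hypothetical "standards" reuse the same quantization and
# transform modules, differing only in entropy decoding and prediction.
decoder_a = build_decoder(lambda bs: list(bs), lambda n: [0] * n)
decoder_b = build_decoder(lambda bs: [b - 128 for b in bs], lambda n: [128] * n)

print(decoder_a([1, 2, 3], qp=2))   # [2, 4, 6]
```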

The granular design of a codec can be used to describe a media coding standard, from bitstream syntax parsing to the reconstruction of pixels or audio samples. Not all media coding standards can be merged into one, but they can all be described in a generic coding framework. In 2003, MPEG began standardizing a reconfigurable video coding (RVC) framework. The RVC framework can be considered a configurable media codec (CMC) framework that encompasses not only video coding but also audio and graphics coding.

MPEG first took the CMC framework approach in the area of video coding. This is a fast-evolving area because more demanding video services require more efficient coding standards. MPEG and ITU-T’s Video Coding Experts Group (VCEG) have joined together to standardize high-efficiency video coding (HEVC), which aims to provide the most efficient compression. HEVC is expected to be more complex than MPEG-4 AVC/H.264. Very recently, Internet video coding (IVC) and web video coding (WVC) have also been proposed as royalty-free coding standards for Internet applications. Such diversity in video coding standards calls for CMC to be considered.

One of the main objectives of CMC is to narrow the gap between the design and implementation of algorithms. Generally speaking, designers of video coding algorithms do not take implementation into consideration when weighing the merits of one algorithm against another. They instead design algorithms according to compression efficiency, the first requirement of video coding. Preferred algorithm designs are often complex and difficult to implement. Designing algorithms with implementation in mind has been tried, but it is difficult because architectures vary widely: hardware versus software, single core versus multicore, and floating-point versus fixed-point arithmetic. Algorithm-architecture co-design has only recently been acknowledged as an important next step in research [2].

The idea of modularizing the codec with common tools came about by first considering how a module is constituted. A module is a functional unit (FU) comprising input, output, and internal processing. An FU can be realized as a function call in a program, a logic unit in a chip, or a thread running in a parallel computing environment. An FU is designed to provide an abstract form of a function that can be implemented in different environments. MPEG's FU design is similar to the black-box approach, although this was not clearly stated in the RVC standard. As long as the input and output behavior of an FU implementation conforms to the standard, the internal implementation of the FU is left open.

A decoder can be viewed as a single FU with one input (for example, a bitstream) and three outputs (for example, YUV). However, the granularity of such a large FU does not conform to the goal of the RVC framework, which is to define a toolbox of FUs that can be reused in many coding standards. FU granularity is key in determining how efficient the RVC framework is. The FUs standardized in the video tool library (ISO/IEC 23002-4) have not been thoroughly verified in terms of whether they are segmented or divided with optimal granularity. The initial goal of RVC standardization was to design a proper framework for configuring FUs to form a decoder network.

FUs must be configured and connected in such a way as to form a decoder network that is interoperable across different implementations. The model of computation (MoC) of the resulting decoder network is a dataflow model in which the inputs and outputs of the FUs are called tokens. An FU executes when its input tokens are available, consuming them to produce output tokens; connections between FUs are therefore data-driven. The dataflow model is a significant departure from the traditional model of computation based on signal flow. Most signal-processing algorithms can be modeled as a signal-flow graph, in which there is no room for functions or computations at an individual node. In a dataflow graph, additions, multiplications, and cosine functions can be hidden in a node, and inputs and outputs are described as input and output edges carrying tokens.
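
The data-driven firing rule can be sketched in a few lines of Python. This is an illustration of the general dataflow MoC, not the normative MPEG RVC model; the class, method, and FU names are invented for the example:

```python
from collections import deque

class FU:
    """Toy functional unit: fires only when every input queue has a token."""
    def __init__(self, name, n_inputs, action):
        self.name = name
        self.inputs = [deque() for _ in range(n_inputs)]
        self.action = action          # consumes one token per input port
        self.targets = []             # (downstream FU, input port) pairs

    def connect(self, fu, port):
        self.targets.append((fu, port))

    def try_fire(self):
        if not all(self.inputs):      # token availability drives execution
            return False
        tokens = [q.popleft() for q in self.inputs]
        result = self.action(*tokens)
        for fu, port in self.targets: # push the output token downstream
            fu.inputs[port].append(result)
        return True

# Example network: scale -> add (add also takes an external token).
scale = FU("scale", 1, lambda x: 2 * x)
add = FU("add", 2, lambda a, b: a + b)
scale.connect(add, 0)

scale.inputs[0].append(3)             # tokens arrive on external edges
add.inputs[1].append(10)
while any(fu.try_fire() for fu in (scale, add)):
    pass                              # FUs fire purely on data availability
```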

A dataflow-based MoC is a simplified description of a decoder network (FU network). The remaining implementation details, such as buffer management, timing, and data precision, are left unspecified so that implementations can be flexible. This is why there are two standard specifications, one for the framework (ISO/IEC 23001-4) and one for the toolbox (ISO/IEC 23002-4). The framework standard contains the decoder description language, used to describe the FU network and the bitstream syntax parsing. The toolbox standard contains video coding FUs and a simulation model with several decoder configurations for existing video coding standards.

The RVC framework is intended to cover not only video coding but also audio and graphics coding. The MPEG graphics community recently started work on a reconfigurable graphics coding (RGC) framework that is similar in principle to the RVC framework. The main goal of the RGC framework is to construct a toolbox of MPEG graphics coding tools [3]. Activities relating to the RGC framework also serve to confirm the RVC approach for any-media coding. GPUs are heavily used in graphics applications, and the modular design of the RGC framework helps in implementing FUs that are well suited to such applications.

The CMC approach looks promising, and modular design in parallel computing is attracting interest in areas where multicores and GPUs can accelerate computing. MPEG is not the only group interested in CMC. The Audio Video Standard (AVS) Group in China shares MPEG’s vision of CMC and has been developing its own FUs to support AVS codecs.

This paper takes the MPEG RVC framework as a representative example of the CMC approach. A few years have passed since CMC was first conceived by MPEG, but there is still much room for improvement in both the technology and the standards.

2 The CMC Framework

There are two main issues in the modular design of CMC: how to define a module and how to connect modules. In MPEG RVC, a module is called a functional unit (FU). Its input and output behavior is normatively defined, and its internal processing is left open and implementation-specific.

2.1 Module Design Philosophy

When designing a module in CMC, implementation and granularity, testability, and interoperability should be taken into account.

2.1.1 Implementation and Granularity

A module should be implementable on diverse platforms: hardware or software, single core or multicore. Abstract modeling is often preferred to physical implementation because it increases flexibility when implementing modules on various platforms. A module in MPEG RVC is designed using an abstract definition of the FU in the text specification and an exemplar implementation of the FU in the RVC-CAL language. In the text specification, each FU is viewed as a black box; in the RVC-CAL implementation, each FU is viewed as a white box. Module granularity is included in the FU definition and directly affects the reusability and reconfigurability of modules. For this reason, both implementation and granularity should be clearly defined.

2.1.2 Testability

Efficient testing and debugging of an implemented module is one of the goals of CMC. Adequate module granularity helps reduce testing and debugging work. In MPEG RVC black-box testing, golden responses are generated by analyzing the corner cases of a given FU. The black-box approach is taken to ensure that different module implementations can be tested in a standard way.
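
A hedged sketch of this kind of black-box conformance test in Python: input token sequences are fed to an FU implementation, and its output tokens are compared with stored golden responses. The test-vector format, the FU interface, and the clip_fu example are assumptions for illustration, not taken from the RVC standard:

```python
def test_fu(fu, test_vectors):
    """Black-box check: the FU's output tokens must match golden responses.

    test_vectors is a list of (input_tokens, golden_output_tokens) pairs.
    """
    for inputs, golden in test_vectors:
        produced = fu(inputs)  # implementation under test
        assert produced == golden, f"got {produced}, expected {golden}"

# Example: a toy clip-to-[0, 255] FU tested against golden tokens,
# including corner cases at the range boundaries.
clip_fu = lambda tokens: [max(0, min(255, t)) for t in tokens]
test_fu(clip_fu, [
    ([-1, 0, 255, 256], [0, 0, 255, 255]),   # boundary corner cases
    ([17, 128], [17, 128]),                  # interior values
])
```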

2.1.3 Interoperability

The standard definition of a media codec has so far been confined to the bitstream syntax and the parsing and decoding algorithms; how the algorithms are implemented in a codec is unspecified. Industry fills this gap by enhancing the compression efficiency of encoding algorithms (through, for example, efficient mode decision) and by realizing the encoding and decoding algorithms in cost-effective implementations. An implementation designer can create customized algorithms, for example, a combined implementation of quantization and transform. Using CMC, interoperability between modules of different implementations may be possible if every implemented module conforms to the input and output behavior of the abstract module definition. It would therefore be possible to produce a decoder comprising a combination of modules from different implementations, which has not been possible with conventional decoder implementation. In a multimedia framework such as DirectShow, the only visible component has been the decoder, not the modules used to build it. In MPEG RVC, module-level interoperability is not yet supported because the first goal of MPEG RVC is to provide the framework, not the modules.

2.2 Case Study: MPEG Video Tool Library

The MPEG video tool library (VTL) (ISO/IEC 23002-4) is a collection of FUs and is part of the MPEG RVC standard. The tools (or FUs) available in the MPEG VTL are supported by the MPEG codec configuration representation (CCR) standard. Fig. 1 shows the abstract definition of an inverse-scan FU used in MPEG-4 AVC/H.264. The two important fields in the abstract definition are input and output: the input is a 4 × 4 BLOCK token, and the output is also a BLOCK token. The description field contains a brief description of what the FU does internally with the input and output tokens; the exact behavior is not explicitly described, so there can be various implementations of the FU. Fig. 2 shows a reference description and implementation in RVC-CAL.

The FU testing implied by Fig. 1 is very much like black-box testing, whereas the testing implied by Fig. 2 is white-box testing. MPEG RVC is not yet clear on this issue, but the most important thing is that the input and output behavior is preserved by any implementation.
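
As an illustration of what such an FU might compute internally, the following Python sketch performs an inverse zig-zag scan of a 4 × 4 block. The zig-zag order is the standard 4 × 4 pattern; the list-based token representation is an assumption, and a conforming AVC implementation would also handle other scan modes:

```python
# Zig-zag scan order for a 4x4 block: ZIGZAG_4x4[i] is the raster
# position of the i-th coefficient in scan order.
ZIGZAG_4x4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]

def inverse_scan(scanned):
    """Map 16 coefficients in zig-zag order back to a 4x4 raster block."""
    block = [0] * 16
    for scan_pos, raster_pos in enumerate(ZIGZAG_4x4):
        block[raster_pos] = scanned[scan_pos]
    return [block[i:i + 4] for i in range(0, 16, 4)]   # 4x4 BLOCK token

# The i-th coefficient read from the bitstream lands at ZIGZAG_4x4[i]:
assert inverse_scan(list(range(16)))[0] == [0, 1, 5, 6]
```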

2.3 Module Connections

Once modules are defined, connections between them must be made in order to form a module network. When connecting the modules, every data transaction between modules should be defined clearly enough that any implementation can follow the specification. In MPEG RVC, the input and output data of FUs are called tokens; these are the basic elements for connecting FUs into an FU network. In a packet-switched network such as the Internet, a datagram is similar to a token. However, a token differs from a datagram in that the size and format of tokens may vary from one to another. This variety of token types influences the design of interconnections between modules: if too much information is carried in a token type, modularization may not be done with optimal granularity. Connections and traffic between modules should therefore be minimized when defining modules.

In CMC, connections are described in a readable format because they can be essential information for implementations. The following information should be described: the connections between module input and output ports, the token type of each connection (e.g., 8 × 8 block, pixel, macroblock, 1-bit flag), the token sequence or order, and any parameters for specific implementations.
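
This information can be captured as plain data. The following Python sketch models it with dataclasses; the field names, FU names, and token-type strings are illustrative, not FNL syntax:

```python
from dataclasses import dataclass

@dataclass
class Connection:
    src_fu: str
    src_port: str
    dst_fu: str
    dst_port: str
    token_type: str  # e.g., "BLOCK_4x4", "PIXEL", "MB", "FLAG_1BIT"

# Declared token type of each (fu, port) pair.
fu_ports = {
    ("parser", "coeffs"): "BLOCK_4x4",
    ("inv_scan", "in"): "BLOCK_4x4",
    ("inv_scan", "out"): "BLOCK_4x4",
    ("inv_quant", "in"): "BLOCK_4x4",
}

network = [
    Connection("parser", "coeffs", "inv_scan", "in", "BLOCK_4x4"),
    Connection("inv_scan", "out", "inv_quant", "in", "BLOCK_4x4"),
]

# A configuration tool can statically check that the two ports sharing
# a connection agree on the token type before generating any code.
for c in network:
    assert (fu_ports[(c.src_fu, c.src_port)]
            == fu_ports[(c.dst_fu, c.dst_port)]
            == c.token_type)
```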

In MPEG RVC, the module network is described with an XML-like description called the FU network description (FND). The rules for describing connections are defined by the FU network language (FNL) in the MPEG RVC standard. A diagram is commonly used to depict module connections. Fig. 3 shows an FU network in MPEG RVC. There can be up to four different token types, that is, two external and two internal. Input and output ports that share the same connection must support the same token type.

The diagram helps the implementation designer understand the modular network, but it is also desirable to describe the network in language format. Fig. 4 shows an FND written in FNL.

2.4 Syntax Parser

The bitstream syntax and parsing process is unique to each codec and usually includes entropy decoding (variable-length decoding, arithmetic decoding). Unlike the other modules in the CMC approach, the syntax parser is highly codec-dependent and less likely to be reused by other codecs. The parser is usually the first module to process the bitstream. In MPEG RVC, the bitstream parser description (BSD) is part of the decoder description: each decoder description contains an FND and a BSD, and a parser module can be generated from the BSD. The BSD format is the RVC bitstream syntax description language (RVC-BSDL), a variant of XML.

A syntax parser can run without necessarily engaging the decoding process. This means that syntax parsing and entropy decoding can be detached from the rest of the decoding process, and the conformance of a bitstream's syntax to its semantics can be checked independently. In MPEG RVC, automatic generation of the bitstream parser from the BSD is still an unresolved issue because a parser that includes the entropy decoder is difficult to describe in XML.
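
To illustrate the sequential, procedural nature of parsing that resists a declarative XML description, here is a Python sketch of an unsigned Exp-Golomb decoder (the ue(v) entropy code used in AVC), written to run stand-alone without the rest of the decoder; the bit-list interface is an assumption for the example:

```python
def read_ue(bits, pos):
    """Decode one unsigned Exp-Golomb (ue(v)) syntax element.

    bits is a list of 0/1 values; returns (value, new_pos).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == 0:   # count the zero prefix
        leading_zeros += 1
    pos += leading_zeros + 1                # skip the terminating 1 bit
    info = 0
    for _ in range(leading_zeros):          # read the info bits
        info = (info << 1) | bits[pos]
        pos += 1
    return (1 << leading_zeros) - 1 + info, pos

# Codewords: '1' -> 0, '010' -> 1, '011' -> 2, '00100' -> 3, ...
assert read_ue([1], 0) == (0, 1)
assert read_ue([0, 1, 1], 0) == (2, 3)
assert read_ue([0, 0, 1, 0, 0], 0) == (3, 5)
```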

Fig. 5 shows an FND that includes all the FUs needed to form an MPEG-4 simple-profile decoder. Each box is an FU. The FU on the far left is the syntax parser, which receives a bitstream and produces output tokens (e.g., entropy-decoded semantic data) for the other FUs.

CMC has two parts: framework and toolbox. Fig. 6 shows how different toolboxes can be used to generate a decoder based on the MPEG RVC framework. Toolboxes other than toolbox 1 may be proprietary or defined by non-MPEG standards. This opens the way for non-MPEG organizations to use the RVC framework for their own codec implementations as well as for MPEG codec implementations such as decoder 1 and decoder 2. The AVS group in China supports RVC and multiple toolboxes.

The toolbox approach also extends to other types of media coding, such as graphics coding. In reconfigurable graphics coding (RGC), the RVC framework supports graphics coding tools. Graphics coding is an area that can benefit from RVC. Many graphics applications are multimedia applications that encompass not only geometry data processing but also audio, image, and video data processing. To view a movie, two bitstreams are needed: one for video and another for audio. For graphics applications such as games, many data sets need to be processed as components that include encoded graphics content. Many graphical object types share common primitives such as coordinates, colors, and normals, and, as with many object-type compression methods, graphics data compression involves compressing these primitives. For this reason, dividing codecs into modules comes naturally in graphics coding.

Fig. 7 shows an FND for MPEG scalable complexity 3D mesh coding (SC-3DMC). Many FUs are reused in decoding attributes such as coordinates, colors, normals, and texture.

3 Future Research Directions

Although many years have been spent researching and standardizing CMC, the field is still relatively young, and there is much room for improvement. This is one reason why MPEG's RVC work continues. This section describes issues that are open for future research.

3.1 Model of Computation

Coding tools are usually represented by algorithms, reference implementations, and textual specifications. In any representation format, the MoC is implicitly defined; otherwise, it would be hard to understand how a coding tool operates for a given functionality. The MoC may differ from implementation to implementation. If there are three consecutive statements, with no branch or loop, in C code, the three statements are executed in sequence. Sequential execution may not be guaranteed if the implementation is done in hardware: parallel execution of the three statements is possible if the statements are independent of each other. The choice of MoC directly affects implementation complexity, so the MoC must be chosen carefully.
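
The point about independent statements can be illustrated with a small Python sketch: the same three statements run under a sequential schedule and under a concurrent one. The threads here serve only to show that a concurrent schedule yields the same observable result; on hardware, three parallel units could execute the statements simultaneously. All function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def f(x): return x + 1
def g(x): return x * 2
def h(x): return x - 3

x = 10

# Sequential MoC: one statement after another, as in plain C code.
a, b, c = f(x), g(x), h(x)

# Parallel MoC: the three statements are mutually independent, so
# they admit a concurrent schedule with the same observable result.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fn, x) for fn in (f, g, h)]
    a2, b2, c2 = (fut.result() for fut in futures)

assert (a, b, c) == (a2, b2, c2)
```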

During the development of MPEG RVC, there have been many discussions about how to define the MoC. The consensus is that the reference implementation language, RVC-CAL, should be used as the model for understanding the MoC in MPEG RVC. To confirm this recommendation, more experiments should be conducted on how to describe a network of modules, how input and output tokens behave in the network, and how a generic description can be guaranteed across different implementations.

3.2 Parser Generation

Bitstream syntax parsing, including entropy decoding, usually consumes 20 to 40 percent of the decoding time, which makes the parser one of the most time-consuming modules in the decoder. Because the parsing process is inherently sequential, it is difficult to design a parallel algorithm to speed up the syntax parser; this is not the case for most other modules. It is also difficult to subdivide the bitstream syntax parser, which is likely to be the largest module in CMC and the one that outputs the largest number of tokens.

Despite the importance of parser generation, it is not yet an automatic process. In MPEG RVC, the decoder description contains a BSD; however, the BSD is not directly used to generate the parser module. Instead, the parser is built in (the built-in approach). While a built-in parser can support existing codecs, it is less flexible when new codecs need to be generated. One reason parser generation is not automatic is that the parser includes entropy decoding algorithms, such as variable-length decoding and arithmetic decoding. Entropy decoding requires a complex procedural description, and it may be difficult to define a generic description that suits every implementation. Future research on automatic parser generation is necessary.

3.3 Design-Time versus Run-Time Generation of the Decoder

There are two distinct approaches in CMC: design-time codec configuration and run-time codec configuration. Most effort so far has been focused on design-time configuration. Run-time codec configuration is a challenging issue because of its timing requirements: in design-time configuration, the defined modules may be complex in terms of implementation and computation, whereas in run-time configuration the decoder is assembled while the system runs, so reasonable performance is expected.

3.4 Granularity of Modules

One of the frontier research areas in CMC is defining the proper granularity when designing a module. A decoder can itself be regarded as a module, and dividing a decoder into modules is only beneficial if the divide-and-conquer strategy yields a gain; that is, the total cost of all the modules in a decoder should be less than or equal to the cost of the monolithic decoder. This problem is challenging because the cost differs from implementation to implementation, from one division into modules to another, and from one platform to another. To date, more research has focused on the framework than on the modules.

3.5 Evolution of the Media Codec

One unrealized but very interesting objective of CMC is the evolution of media codecs through module upgrades. There has been recent discussion within MPEG of a royalty-free video coding standard for Internet applications. To date, it has been very difficult to create a royalty-free standard because standards depend on patent holders. If even one algorithm is not royalty-free, there are only a few limited ways to make the entire codec free: wait for up to 20 years until the patent has expired, or design a codec standard that circumvents the patented algorithm. Both scenarios are very costly and seldom chosen. With a CMC framework, it is possible to pinpoint the patented algorithm in a module or set of modules of a decoder; a bypass standard then only has to include a new set of modules that avoid the patented algorithm. If this approach becomes common in standardization, the number of codecs will grow quickly, and it will be necessary to keep track of tools and their configurations. Although interesting, this idea has yet to be tested and implemented.

4 Conclusion

Standardization of the CMC framework has mostly been the work of MPEG. There are still many issues to be resolved before a dependable framework can be created and modules can be properly defined. MPEG’s research is important for fast and stable codecs in the future.

References

[1] MPEG homepage [Online]. Available: http:///

[2] Y.-K. Chen, G.-G. Lee, M. Mattavelli, and E. S. Jang, "Algorithm/Architecture Co-Exploration of Visual Computing on Emerging Platforms," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 11, pp. 1573-1575, Nov. 2009.

[3] S. Lee, T. Lim, E. S. Jang, J. H. Lee, and S. Lee, "MPEG Reconfigurable Graphics Coding Framework: Overview and Design of 3D Mesh Coding," in Proc. IEEE Visual Communications and Image Processing (VCIP), Taiwan, Nov. 2011.

Manuscript received: January 26, 2012

Biography

Euee S. Jang received his BS degree from Jeonbuk National University, Korea, and his PhD degree from the State University of New York (SUNY), Buffalo. He is currently a professor at the College of Engineering, Hanyang University, Seoul. His research interests include image/video coding, reconfigurable video coding, and computer graphics objects. He has authored more than 150 MPEG papers and more than 30 journal and conference papers. He also has 35 patents, some of which are pending, and has contributed chapters to two books. He has received three ISO/IEC Certificates of Appreciation for his contributions to MPEG-4 development. He also received a Presidential Award from the Korean government for his contributions to MPEG standardization. Professor Jang is an IEEE Senior Member.