Audio Video Standard

From Wikipedia, the free encyclopedia

Audio Video Coding Standard (AVS) refers to a series of digital audio and digital video compression standards formulated by China's Audio and Video Coding Standard Workgroup in accordance with open international rules. Two generations of AVS standards have been completed to date.[1]

The first-generation AVS standard comprises “Information Technology, Advanced Audio Video Coding, Part 2: Video” (AVS1 for short) and “Information Technology, Advanced Audio Video Coding, Part 16: Radio Television Video” (AVS+ for short).

The second-generation AVS standard, referred to as AVS2, primarily targets Ultra HD (Ultra High Definition) video, supporting efficient compression of ultra-high-resolution (4K and above) and HDR (High Dynamic Range) video. It has been submitted to the IEEE for adoption as an international standard (Standard No.: IEEE 1857.4).

The “AVS Patent Pool” provides one-stop licensing for the AVS standard. It charges only a small royalty on terminal products, while content providers and operators pay nothing; the royalty for the first-generation AVS standard is one yuan per terminal.[2]

To promote the development and adoption of the AVS standard, Huawei, TCL, Skyworth and other companies established the Zhongguancun Audiovisual Industry Technology Innovation Alliance (AVS Industry Alliance for short).[3]

Related organizations

AVS Workgroup

The AVS Workgroup, short for the Digital Audio and Video Coding Standard Workgroup, was founded in June 2002. Its mission is to work with domestic enterprises and scientific research institutions to formulate (and revise) common technical standards for the compression, decompression, processing and representation of digital audio and video, in response to the needs of the information industry. These standards aim to provide efficient and economical coding and decoding technologies for digital audio and video devices and systems, serving major information industry applications such as high-resolution digital broadcasting, high-density digital laser storage media, wireless broadband multimedia communication and Internet broadband streaming media. The workgroup currently has 81 member units drawn from universities, enterprises and scientific research institutions. It is headed by Gao Wen, an academician of the Chinese Academy of Engineering, professor and doctoral supervisor at Peking University, and deputy director of the National Natural Science Foundation of China. The workgroup is organized into a requirements group, systems group, video group, audio group, test group, intellectual property group and other subgroups. Since its establishment, it has consistently formulated the AVS series of standards in accordance with open international rules, and two generations of AVS standards have been completed so far.

AVS Patent Pool Management Committee

For intellectual property management, AVS established a "patent pool" mechanism. The pool is managed and licensed by the “AVS Patent Pool Management Committee”, founded on September 20, 2004, which was the first patent pool management institution in China. The committee operates under the “Beijing Haidian District Digital Audio and Video Standard Promotion Center”, an independent corporate association registered with the Civil Affairs Bureau of Haidian District, Beijing, and serves as the center's expert committee and main business decision-making body. It has set out one-stop, low-cost licensing principles and management rules[4] for the patented technologies included in the standard. The royalty for the first-generation AVS standard is only one yuan per terminal, and the second generation will follow the same model: a small royalty charged only on terminals, with no royalty on content or on Internet software services.

AVS Industry Alliance

The AVS Industry Alliance is short for the Zhongguancun Audiovisual Industry Technology Innovation Alliance. In May 2005, twelve enterprises and institutions (TCL Group Co., Ltd., Skyworth Group Research Institute, Huawei Technologies Co., Ltd., Hisense Group Co., Ltd., Haier Group Co., Ltd., Beijing Haier Guangke Co., Ltd., Inspur Group Co., Ltd., Joint Source Digital Audio Video Technology (Beijing) Co., Ltd., Pudong New Area Mobile Communication Association, Sichuan Changhong Co., Ltd., Shanghai SVA (Group) Central Research Institute, ZTE Communication Co., Ltd. and the Zhongguancun Hi-Tech Industry Association) voluntarily founded the alliance in Beijing. Their aims were to accelerate the industrialization of AVS, build a complete industrial chain with a multi-vendor supply environment, and give strong impetus to the development of China's audio and video industry. The organization's English name is "AVS Industry Alliance" ("AVSA" for short). Together with the "AVS Workgroup" and the "AVS Patent Pool Management Committee", it forms the mutually independent yet mutually supportive "Three Carriages". AVSA is committed to building a complete digital audio and video industry chain, “technology→patent→standard→chip and software→whole machine and system manufacturing→digital media operation and culture industry”, in order to achieve breakthroughs in standard formulation, rapid technological progress and leapfrog industrial development, lift the digital AV industry as a whole, and form a group of digital AV enterprises with significant global influence. The alliance currently has 117 members, including 81 standard members and 36 industrial promotion members.

The first generation AVS standard

The first-generation AVS standard comprises the Chinese national standards “Information Technology, Advanced Audio Video Coding, Part 2: Video” (AVS1 for short, GB number: GB/T 20090.2-2006) and “Information Technology, Advanced Audio Video Coding, Part 16: Radio Television Video” (AVS+ for short, GB number: GB/T 20090.16-2016). In AVS video tests hosted by the Radio and Television Planning Institute of SARFT (State Administration of Radio, Film, and Television), AVS1 reached "excellent" coding quality for both standard-definition and high-definition content at half the bitrate of MPEG-2, and still reached "good" to "excellent" quality at less than one third of the MPEG-2 bitrate. The video part of AVS1 was promulgated as a Chinese national standard in February 2006.

At the fourth meeting of the ITU-T (ITU Telecommunication Standardization Sector) IPTV Focus Group, held May 7-11, 2007, AVS1 was confirmed as one of the video coding standards available for IPTV, alongside MPEG-2, H.264 and VC-1. On June 4, 2013, the AVS1 video part was published by the IEEE (Institute of Electrical and Electronics Engineers), a leading professional organization in the field of electronics and information, as IEEE 1857-2013, an important step toward the internationalization of the AVS series of standards.

AVS+ is both the radio, film and television industry standard GY/T 257.1-2012 “Advanced Audio Video Coding for Radio and Television, Part 1: Video”, issued by SARFT on July 10, 2012, and an enhanced version of AVS1, with compression efficiency on par with the High Profile of the comparable international standard H.264/AVC.[citation needed] The AVS standard has so far been adopted in Sri Lanka, Laos, Thailand, Kyrgyzstan and other countries, and thousands of HD programs coded with AVS+ have been broadcast over satellite channels worldwide.

The second generation AVS standard

The second-generation AVS standard comprises the Chinese national standard series “Information Technology, Efficient Multi Media Coding” (AVS2 for short). AVS2 mainly targets the transmission of ultra HD TV programs, aims to lead the development of the digital media industry over the next five to ten years, and seeks to play a key role in the formulation of related international standards. While the first-generation standard was being promoted and deployed, work on evolving AVS technology continued, and development of the second-generation AVS2 technology has been completed. SARFT issued AVS2 video as an industry standard in May 2016, and it became a Chinese national standard on December 30, 2016. It has been submitted to the IEEE for adoption as an international standard (Standard Number: IEEE 1857.4).

Tests by authoritative institutions show that AVS2 more than doubles the coding efficiency of AVS+ and achieves higher compression than the latest international standard, HEVC (H.265). Compared with the first-generation AVS standard, AVS2 can halve the transmission bandwidth, which will support the rollout of ultra HD TV in the coming years.

AVS2 features[5]

AVS2 adopts a hybrid coding framework. The coding process includes intra-frame prediction, inter-frame prediction, transform, quantization, inverse quantization and inverse transform, loop filtering and entropy coding modules. Its main technical features are as follows:

  • Flexible Coding Structure Partition
    • To meet the compression-efficiency requirements of HD and Ultra HD video, AVS2 adopts a quadtree-based block partition structure comprising the CU (Coding Unit), PU (Prediction Unit) and TU (Transform Unit). A picture is divided into LCUs (Largest Coding Units) of fixed size, each of which is recursively split into a series of CUs in quadtree form. Each CU contains one luminance coding block and two corresponding chrominance coding blocks (block sizes below refer to the luminance coding block). Compared with the traditional macroblock, the quadtree-based structure is more flexible, with CU sizes ranging from 8×8 to 64×64.
    • The PU specifies all prediction modes of a CU and is the basic unit of prediction, both intra-frame and inter-frame. A PU may not be larger than the CU it belongs to. In addition to the square intra-frame prediction blocks of AVS1, AVS2 adds non-square intra-frame prediction block partitions. For inter-frame prediction, AVS2 adds 4 asymmetric partitions to the symmetric prediction block partitions.
    • Besides the CU and PU, AVS2 defines a transform unit (TU) for transforming and quantizing the prediction residual. The TU is the basic unit of transform and quantization and, like the PU, is defined within a CU. Its size selection depends on the shape of the corresponding PU: if the current CU is partitioned into non-square PUs, a non-square partition is applied to the corresponding TU; otherwise, a square partition is applied. Note that a TU can be larger than a PU, but not larger than the CU it belongs to.
  • Intra Prediction Coding
    • Going beyond AVS1 and H.264/AVC, AVS2 provides 33 intra-frame prediction modes for luminance blocks: DC prediction, plane prediction, bilinear prediction and 30 angular prediction modes. Chrominance blocks have 5 modes: DC, horizontal prediction, vertical prediction, bilinear interpolation and the newly added luminance-derived mode (DM).
  • Inter Prediction Coding
    • Compared with AVS1, AVS2 raises the maximum number of candidate reference frames to 4 to accommodate multi-level reference frame management, which also makes full use of spare buffer capacity.
    • To meet the needs of multiple reference frame management, AVS2 adopts a multi-level reference frame management scheme, in which the frames in each GOP (Group of Pictures) are assigned to multiple levels according to the reference relationships between frames.
  • Inter Prediction Mode
    • On top of the three picture types I, P and B of AVS1, AVS2 adds the forward multi-hypothesis prediction picture F to meet application requirements. For video surveillance, scene playback and other specific applications, AVS2 also defines scene frames (G and GB pictures) and the reference scene frame S.
    • For B frames, a new symmetric mode is added alongside the traditional forward, backward, bidirectional and skip/direct modes. In symmetric mode, only the forward motion vector is encoded; the backward motion vector is derived from it.
    • To make fuller use of the skip/direct mode in B frames, AVS2 retains the original skip/direct mode and adds multi-directional skip/direct modes: bidirectional, symmetric, backward and forward skip/direct. In these four modes, the neighboring blocks are searched for one coded with the same prediction mode as the current block, and the motion vectors of the first such neighbor found are used as those of the current block.
    • In F frames, a coding block can refer to two forward reference blocks, which is equivalent to double-hypothesis prediction for P frames.
    • AVS2 divides multi-hypothesis prediction into two categories: temporal and spatial multi-hypothesis modes.
    • In temporal double-hypothesis mode, the weighted average of two prediction blocks is used as the prediction of the current block, but only one MVD (Motion Vector Difference) and one reference picture index are coded; the other motion vector and reference index are derived by linear scaling based on temporal distance.
    • Spatial double-hypothesis prediction, also called DMH (Directional Multi-Hypothesis), fuses two prediction points around an initial prediction point, with the initial point lying on the line between them. At a given distance there are 8 candidate points around the initial point, and each is fused only with the point lying on the same straight line through the initial point, which gives four directions. The four directional modes are evaluated at both 1/2-pixel and 1/4-pixel distances and, together with the initial prediction point alone, give 9 modes in total, from which the optimal prediction mode is selected.
    • The scene frame is AVS2's surveillance video coding method based on background modeling. When the surveillance tool is disabled, an I frame serves as a reference only for pictures before the next random access point. When the tool is enabled, AVS2 uses a frame of the video as the scene frame G, which can serve as a long-term reference for subsequent pictures.
    • AVS2 can also generate a scene frame GB from frames of the video; the GB frame can likewise serve as a long-term reference.
    • To simplify motion compensation, AVS2 adopts a DCT-based 8-tap interpolation filter that requires only a single filtering pass and supports motion vector accuracy finer than 1/4 pixel.
  • Transformation
    • Transform coding in AVS2 mainly uses the integer DCT, applied directly to transform blocks of size 4×4, 8×8, 16×16 and 32×32.
    • For the largest (64×64) transform blocks, a logical transform (LOT) first performs a wavelet transform, followed by the integer DCT.
    • After the DCT, AVS2 applies a secondary 4×4 transform to the 4×4 block of low-frequency coefficients, further reducing the correlation between coefficients and concentrating the energy.
  • Entropy Coding
    • AVS2 entropy coding first divides the transform coefficients into 4×4 CGs (Coefficient Groups) and then performs zigzag scanning and encoding CG by CG.
    • Coefficient coding first encodes the position of the CG containing the last non-zero coefficient and then encodes each CG in turn until all coefficients are coded, so that zero coefficients are grouped together during encoding.
    • Context-based binary arithmetic coding and context-based two-dimensional variable-length coding are still used in AVS2.
  • Loop Filter
    • The loop filter of AVS2 has three parts: the deblocking filter, sample adaptive offset (SAO) and the adaptive loop filter (ALF).
    • The deblocking filter operates on an 8×8 block grid, filtering vertical edges first and then horizontal edges. A different filtering method is chosen for each edge according to its filtering strength.
    • After deblocking, sample adaptive offset compensation is applied to further reduce distortion.
    • After deblocking and sample offset compensation, AVS2 applies an adaptive loop filter: a centrosymmetric Wiener filter whose shape combines a 7×7 cross with a 3×3 square. The encoder computes the least-squares filter coefficients from the original undistorted picture and the reconstructed picture, and the decoded reconstructed picture is then filtered with them, reducing compression distortion in the decoded picture and improving the quality of the reference picture.
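The quadtree CU partition in the feature list can be pictured as a recursive split of the LCU. The sketch below is illustrative, not the reference encoder: `should_split` is a hypothetical stand-in for the encoder's mode decision (in practice a rate-distortion test), and the 64×64 LCU and 8×8 minimum CU follow the sizes quoted in the list.

```python
from typing import Callable, List, Tuple

# A CU is described by its top-left corner (x, y) and its square size.
CU = Tuple[int, int, int]

def partition_lcu(x: int, y: int, size: int, min_size: int,
                  should_split: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively split an LCU into CUs along a quadtree.

    `should_split` is a hypothetical callback standing in for the encoder's
    mode decision; it is not part of AVS2 itself.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus: List[CU] = []
        # Visit the four quadrants in Z order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(partition_lcu(x + dx, y + dy, half, min_size, should_split))
        return cus
    return [(x, y, size)]

# Example decision: split the 64x64 LCU once, then split only its top-left 32x32.
def demo_split(x: int, y: int, size: int) -> bool:
    return size == 64 or (size == 32 and x == 0 and y == 0)

cus = partition_lcu(0, 0, 64, 8, demo_split)
# cus holds four 16x16 CUs followed by three 32x32 CUs.
```

Because every leaf is a square of side 64/2^k, the CUs always tile the LCU exactly, which is what lets the partition be signalled as a series of split flags.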
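The inter PU partitions and the TU shape rule from the feature list can be laid out as rectangles inside a square CU of side 2N. This sketch assumes the four asymmetric partitions split the CU 1:3 or 3:1 and borrows the HEVC-style names `2NxnU`, `2NxnD`, `nLx2N` and `nRx2N`; both the names and the exact split ratio should be checked against the AVS2 specification.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the CU

def inter_pu_partitions(size: int) -> Dict[str, List[Rect]]:
    """Return inter PU layouts for a square CU of side `size` (= 2N).

    Mode names and the 1/4 : 3/4 asymmetric split are assumptions
    (HEVC-style naming), used here only for illustration.
    """
    n, q = size // 2, size // 4
    return {
        # Symmetric partitions
        "2Nx2N": [(0, 0, size, size)],
        "2NxN": [(0, 0, size, n), (0, n, size, n)],
        "Nx2N": [(0, 0, n, size), (n, 0, n, size)],
        "NxN": [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        # Asymmetric partitions: one quarter and three quarters of the CU
        "2NxnU": [(0, 0, size, q), (0, q, size, size - q)],
        "2NxnD": [(0, 0, size, size - q), (0, size - q, size, q)],
        "nLx2N": [(0, 0, q, size), (q, 0, size - q, size)],
        "nRx2N": [(0, 0, size - q, size), (size - q, 0, q, size)],
    }

def tu_split_shape(pu_mode: str) -> str:
    """TU rule from the feature list: non-square PUs get a non-square TU
    split, square PUs get a square TU split."""
    return "square" if pu_mode in {"2Nx2N", "NxN"} else "non-square"

layouts = inter_pu_partitions(32)
```

Every layout covers the whole CU, so the partition choice only changes how motion is shared across the block, never which samples are predicted.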
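Both the B-frame symmetric mode and the temporal double-hypothesis mode derive a second motion vector from a coded one by linear scaling with temporal distance. The sketch below shows that idea under stated assumptions: motion vectors are integer pairs in quarter-pel units, and scaling uses floating point with rounding, whereas a real AVS2 codec uses its own fixed-point formula and clipping (not reproduced here).

```python
from typing import Tuple

MV = Tuple[int, int]  # motion vector in quarter-pel units (assumption)

def scale_mv(mv: MV, dist_src: int, dist_dst: int) -> MV:
    """Linearly scale `mv`, measured over temporal distance `dist_src`,
    to a reference at temporal distance `dist_dst`.

    Floating-point sketch only; the standard specifies fixed-point arithmetic.
    """
    factor = dist_dst / dist_src
    return (round(mv[0] * factor), round(mv[1] * factor))

def symmetric_backward_mv(mv_fwd: MV, dist_fwd: int, dist_bwd: int) -> MV:
    """Symmetric mode: derive the backward MV from the coded forward MV.
    The backward reference lies on the opposite side in time, so the
    direction is flipped."""
    bx, by = scale_mv(mv_fwd, dist_fwd, dist_bwd)
    return (-bx, -by)

def second_hypothesis_mv(mv1: MV, dist1: int, dist2: int) -> MV:
    """Temporal double hypothesis: both references precede the current
    picture, so the second MV keeps the direction and is only rescaled."""
    return scale_mv(mv1, dist1, dist2)

mv_b = symmetric_backward_mv((8, -4), 2, 1)  # backward reference is closer
mv_2 = second_hypothesis_mv((8, -4), 1, 2)   # second reference is farther
```

Only one motion vector is written to the bitstream in both modes; the decoder repeats the same derivation, which is where the bit savings come from.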
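The nine DMH (Directional Multi-Hypothesis) modes described in the feature list can be enumerated directly. This sketch assumes motion vectors in quarter-pel units, takes the four directions to be horizontal, vertical and the two diagonals, and fuses the two predictions with a rounded average. These are illustrative choices, not the normative AVS2 definitions.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # quarter-pel units (assumption)

# Four directions through the initial point: horizontal, vertical, two diagonals.
DIRECTIONS: List[MV] = [(1, 0), (0, 1), (1, 1), (1, -1)]
# Two step lengths in quarter-pel units: 1/4 pel and 1/2 pel.
STEPS = [1, 2]

def dmh_candidates(mv: MV) -> List[Tuple[MV, MV]]:
    """List the 9 DMH modes for the initial motion vector `mv`.

    Mode 0 uses the initial point alone (both hypotheses equal). Every other
    mode pairs two points placed symmetrically about the initial point, so
    the initial point lies on the line between them.
    """
    modes: List[Tuple[MV, MV]] = [(mv, mv)]
    for step in STEPS:
        for dx, dy in DIRECTIONS:
            ox, oy = dx * step, dy * step
            modes.append(((mv[0] + ox, mv[1] + oy), (mv[0] - ox, mv[1] - oy)))
    return modes

def fuse(pred_a: List[int], pred_b: List[int]) -> List[int]:
    """Average two prediction blocks sample by sample, rounding half up."""
    return [(a + b + 1) >> 1 for a, b in zip(pred_a, pred_b)]

modes = dmh_candidates((4, 4))
```

An encoder would motion-compensate both points of each candidate, fuse them, and keep the mode with the lowest cost. Only the mode index then needs to be signalled, because the pair is fully determined by the initial vector.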
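The single-pass 8-tap interpolation mentioned in the feature list can be illustrated on one row of luma samples. The taps below are the half-pel coefficients of HEVC's DCT-based interpolation filter, used here as a stand-in. Whether AVS2 uses identical values is an assumption to verify against the standard. The edge-repeat border handling is likewise a simplification.

```python
from typing import List

# Half-pel luma taps of HEVC's DCT-based interpolation filter (DCT-IF),
# used as an illustrative stand-in for the AVS2 filter. They sum to 64.
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]

def interpolate_half_pel(row: List[int], bit_depth: int = 8) -> List[int]:
    """Return the half-pel sample between row[i] and row[i + 1] for each i.

    Each output comes from one 8-tap pass over row[i-3] .. row[i+4];
    samples outside the row repeat the edge value.
    """
    max_val = (1 << bit_depth) - 1
    padded = [row[0]] * 3 + row + [row[-1]] * 4
    out = []
    for i in range(len(row) - 1):
        window = padded[i:i + 8]  # corresponds to row[i-3] .. row[i+4]
        acc = sum(t * s for t, s in zip(HALF_PEL_TAPS, window))
        # Normalize by 64 with rounding, then clip to the valid sample range.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out

ramp = interpolate_half_pel([0, 16, 32, 48, 64, 80, 96, 112])
```

On a linear ramp the filter returns the exact midpoint (for example 56 between 48 and 64), and on a flat area it returns the flat value, as a well-designed interpolator should.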
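The wavelet step that the logical transform (LOT) applies to the largest transform blocks can be sketched as follows. This is one plausible reading, assumed rather than taken from the standard: a one-level 2D Haar decomposition whose low-low (LL) subband, 32×32 for a 64×64 block, is handed to the 32×32 integer DCT, with the high-frequency subbands ignored. The DCT stage itself is not shown.

```python
from typing import List

def haar_ll(block: List[List[float]]) -> List[List[float]]:
    """One-level 2D Haar analysis, keeping only the low-low (LL) subband.

    Each output sample combines a 2x2 input neighbourhood; dividing by 2
    gives the orthonormal 2D Haar scaling. A 64x64 block becomes 32x32.
    """
    h, w = len(block) // 2, len(block[0]) // 2
    return [[(block[2 * i][2 * j] + block[2 * i][2 * j + 1]
              + block[2 * i + 1][2 * j] + block[2 * i + 1][2 * j + 1]) / 2
             for j in range(w)]
            for i in range(h)]

# A smooth residual keeps its energy in LL; a checkerboard (pure high
# frequency) vanishes from it entirely.
flat_ll = haar_ll([[1.0] * 64 for _ in range(64)])
checker_ll = haar_ll([[(-1.0) ** (r + c) for c in range(64)] for r in range(64)])
```

The payoff of this design is that the codec only needs DCT kernels up to 32×32, while still covering 64×64 blocks whose residual is mostly low-frequency.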
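The coefficient-group (CG) organization used by AVS2 entropy coding can be sketched as below. The sketch assumes that zigzag order is used both to order the 4×4 CGs within the block and to order the 16 coefficients within each CG. It also assumes that, after the last non-zero CG position is signalled, CGs are coded from that one back toward the DC corner, as HEVC does. Binarization and context modelling are left out.

```python
from typing import List, Tuple

def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """Zigzag scan positions (row, col) for an n x n grid, starting at DC."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the traversal direction on successive anti-diagonals.
        order.extend(diag if s % 2 else reversed(diag))
    return order

def split_into_cgs(coeffs: List[List[int]]) -> List[List[int]]:
    """Split an N x N coefficient block into 4x4 CGs (CGs in zigzag order),
    scanning each CG's 16 coefficients in zigzag order as well."""
    n_cg = len(coeffs) // 4
    return [[coeffs[cg_r * 4 + r][cg_c * 4 + c] for r, c in zigzag_order(4)]
            for cg_r, cg_c in zigzag_order(n_cg)]

def last_nonzero_cg(cgs: List[List[int]]) -> int:
    """Scan index of the last CG holding a non-zero coefficient (-1 if none).
    This position is signalled before any CG content."""
    for i in range(len(cgs) - 1, -1, -1):
        if any(cgs[i]):
            return i
    return -1

def cg_coding_order(cgs: List[List[int]]) -> List[int]:
    """CGs coded from the last non-zero one back to the DC CG (assumed)."""
    return list(range(last_nonzero_cg(cgs), -1, -1))

block = [[0] * 8 for _ in range(8)]
block[0][0], block[5][1] = 5, -2  # DC and one coefficient in the lower-left CG
cgs = split_into_cgs(block)
```

Signalling the last non-zero CG first means every all-zero CG beyond it costs nothing, which is how the scheme keeps the zero coefficients "concentrated".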
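The adaptive loop filter in the feature list is described as a centrosymmetric Wiener filter over a 7×7 cross plus a 3×3 square, with least-squares coefficients. The sketch below builds that shape, which has 17 taps but only 9 independent coefficients once mirrored taps are tied together. It then solves the least-squares problem through the normal equations. Assumptions: coefficients are kept as floats, and the quantization and signalling a real encoder performs are omitted.

```python
import random
from typing import List, Tuple

Offset = Tuple[int, int]  # (dy, dx) relative to the filtered sample
Image = List[List[float]]

def alf_shape() -> List[Offset]:
    """Taps of a 7x7 cross combined with a 3x3 square, centred on (0, 0)."""
    cross = {(0, d) for d in range(-3, 4)} | {(d, 0) for d in range(-3, 4)}
    square = {(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
    return sorted(cross | square)

def symmetric_groups(shape: List[Offset]) -> List[List[Offset]]:
    """Pair each tap with its point reflection: under centrosymmetry both taps
    share one coefficient. The centre tap forms a group of its own."""
    groups: List[List[Offset]] = [[(0, 0)]]
    seen = {(0, 0)}
    for dy, dx in shape:
        if (dy, dx) not in seen:
            groups.append([(dy, dx), (-dy, -dx)])
            seen.update({(dy, dx), (-dy, -dx)})
    return groups

def features(img: Image, y: int, x: int, groups: List[List[Offset]]) -> List[float]:
    """Sum the reconstructed samples that share each coefficient."""
    return [sum(img[y + dy][x + dx] for dy, dx in g) for g in groups]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wiener_coefficients(orig: Image, rec: Image, groups: List[List[Offset]],
                        margin: int = 3) -> List[float]:
    """Least-squares coefficients minimising sum((orig - filtered rec)^2),
    computed from the normal equations (A^T A) c = A^T b."""
    k = len(groups)
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for y in range(margin, len(rec) - margin):
        for x in range(margin, len(rec[0]) - margin):
            f = features(rec, y, x, groups)
            for i in range(k):
                atb[i] += f[i] * orig[y][x]
                for j in range(k):
                    ata[i][j] += f[i] * f[j]
    return solve(ata, atb)

# Sanity check: if the reconstruction were already perfect, the best filter
# is the identity (centre coefficient 1, all others 0).
random.seed(0)
orig = [[float(random.randint(0, 255)) for _ in range(16)] for _ in range(16)]
coeffs = wiener_coefficients(orig, orig, symmetric_groups(alf_shape()))
```

Tying mirrored taps halves the number of coefficients the encoder must transmit, from 17 to 9, at the cost of restricting the filter to a symmetric response.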

AVS Implementation

uAVS2[6][7]

uAVS2 is the world’s first real-time HD encoder based on the AVS2 standard, developed by the Digital Media Research Center of Peking University Shenzhen Graduate School. Its performance substantially exceeds that of the x265 HEVC/H.265 encoder, removing technical obstacles to the industrial application of AVS2. An AVS2 Ultra HD real-time video encoder and a mobile HD encoder were subsequently released.

OpenAVS2[8]

OpenAVS2 is a mature, industrial-grade suite of AVS2-based tools for audio and video encoding, transcoding and decoding. It covers the mobile Internet, core Internet applications and vertical industries, offering one-stop AVS2 audiovisual industry solutions.

References

External links