WO2012149904A1

WO2012149904A1 - Modeling method and system based on context in transform domain of image/video

Info

Publication number: WO2012149904A1
Application number: PCT/CN2012/075052
Authority: WO
Inventors: 武筱林; 牛毅
Original assignee: Wu Xiaolin; Niu Yi
Priority date: 2011-05-04
Filing date: 2012-05-03
Publication date: 2012-11-08
Also published as: CN104094607B; CN104094607A

Abstract

A context-based modeling method and system in the transform domain of an image/video, in the technical field of image/video coding, decoding, and other processing. A natural image is approximated as a two-dimensional Markov random field that has both radial and anti-radial correlation in the DCT domain or another transform domain. This model is used to describe the directional correlation of the image signal in the two-dimensional transform domain, so that moving or still images can be compressed or otherwise processed.

Description

Context-based modeling method and system in image/video transform domain

Technical field

The invention relates to a method and a system in the technical field of image/video coding, decoding, and other processing, and in particular to a two-dimensional Markov random field modeling method, and a system thereof, that exploits the anisotropic correlation of natural images in a transform domain (such as the discrete cosine transform (DCT) domain).

Background technique

Current image/video processing techniques, such as compression, reconstruction, enhancement, and analysis, are mostly performed in a transform domain. Commonly used transforms include the discrete cosine transform (DCT), the discrete Fourier transform (DFT), and the Hadamard transform. The effect of the transform is to concentrate the energy of the image/video signal in a small number of transform coefficients, thereby significantly reducing the statistical redundancy in the signal. The academic community describes this process in various ways, such as energy packing, de-correlation, or sparse representation of image/video signals. But even in the transform domain, natural image/video signals are far from independent and identically distributed (i.i.d.) and should instead be treated as Markov random processes. Contextual statistical modeling of image/video signals in the transform domain is therefore a key component of widely used image/video processing systems. Here, contextual statistical modeling refers to a method or process for estimating the conditional probabilities of a Markov or approximately Markov signal.

In transform-based image/video compression systems such as MPEG, JPEG, JPEG2000, and H.264, contextual statistical modeling of image/video signals is critical to the rate-distortion performance of the system. The model predicts the conditional probability of each transform coefficient, and this probability drives an entropy coder (such as a context-based arithmetic coder). In the entropy coding process, any deviation in the predicted conditional probability directly degrades coding performance. To be precise, relative to the theoretical shortest code length, the redundant code length caused by a deviation of the predicted probability equals the relative entropy, or Kullback-Leibler (KL) distance, between the predicted and the true probability distributions. The accuracy of contextual statistical modeling therefore ultimately determines the compression performance of the system.
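The relation between prediction error and redundant code length can be checked numerically. The following is a minimal sketch, assuming an ideal entropy coder that spends exactly -log2 q bits on a symbol of modeled probability q; the two distributions are made-up illustrative values, not data from this document.

```python
import math

def entropy_bits(p):
    """Shortest achievable average code length (bits/symbol) for source p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average code length when symbols drawn from p are coded with model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence_bits(p, q):
    """Relative entropy (Kullback-Leibler distance) D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: a coefficient is zero 90% of the time,
# but the context model predicts zero only 50% of the time.
true_p = [0.9, 0.1]
model_q = [0.5, 0.5]

excess = cross_entropy_bits(true_p, model_q) - entropy_bits(true_p)
print(f"excess code length : {excess:.4f} bits/symbol")
print(f"KL distance        : {kl_divergence_bits(true_p, model_q):.4f} bits/symbol")
```

The two printed values agree, which is the identity stated in the paragraph above: the penalty for a mismatched model is exactly the KL distance from the true distribution to the predicted one.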

Existing context modeling methods often take advantage of the fact that natural images have a rapidly decaying power spectrum (the literature generally assumes an exponential decay). However, a rapidly decaying power spectrum alone cannot characterize the statistical properties of the image/video signal in the transform domain, because the signal energy of images and videos is not distributed along a single frequency dimension but over two or three dimensions of the transform domain.

In particular, for images with directional features such as edges and textures, the transform coefficients exhibit two-dimensional directional correlation in the two-dimensional transform domain. This directional correlation is especially significant when a two-dimensional discrete transform is applied to image pixel blocks containing edges. For image blocks with edges or regular textures, the signal energy is concentrated in directional subbands: Figures 1(a) and 1(c) show the results of the discrete cosine transform (DCT) of image pixel blocks containing edges of different orientations.

For a block of pixels containing smooth shading, the signal energy is concentrated in the low-frequency region, and the DCT coefficients decay radially at a nearly identical rate in all directions, as shown in Figure 1(b). It can thus be seen that a natural image can be approximated, in the discrete cosine transform (DCT) domain, by a two-dimensional Markov random field model with correlation in both the radial and the anti-radial direction.
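The two energy patterns described above can be reproduced with synthetic blocks. The sketch below is illustrative only: it uses a hand-written orthonormal 8×8 DCT-II, and the two test blocks (an ideal vertical edge and a gentle diagonal ramp) are assumptions chosen for clarity, not the blocks of Figure 1.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def dct2(block):
    """2-D DCT of a square block; result[u][v] has vertical frequency u
    and horizontal frequency v."""
    n = len(block)
    c = dct_matrix(n)
    # Transform along the vertical axis first (C * X) ...
    tmp = [[sum(c[u][i] * block[i][j] for i in range(n)) for j in range(n)]
           for u in range(n)]
    # ... then along the horizontal axis ((C * X) * C^T).
    return [[sum(tmp[u][j] * c[v][j] for j in range(n)) for v in range(n)]
            for u in range(n)]

def energy_fraction(coeffs, keep):
    """Fraction of the total squared magnitude at positions where keep(u, v)."""
    n = len(coeffs)
    total = sum(x * x for row in coeffs for x in row)
    kept = sum(coeffs[u][v] ** 2
               for u in range(n) for v in range(n) if keep(u, v))
    return kept / total

N = 8
# Assumed test blocks: a sharp vertical edge and a smooth diagonal ramp.
vertical_edge = [[0.0 if j < N // 2 else 255.0 for j in range(N)] for _ in range(N)]
smooth_ramp = [[100.0 + 2.0 * (i + j) for j in range(N)] for i in range(N)]

edge_coeffs = dct2(vertical_edge)
ramp_coeffs = dct2(smooth_ramp)

# The edge block puts its energy in one directional subband (row u = 0).
edge_in_row0 = energy_fraction(edge_coeffs, lambda u, v: u == 0)
# The smooth block puts its energy near the lowest frequencies.
ramp_in_low = energy_fraction(ramp_coeffs, lambda u, v: u + v <= 2)
print(edge_in_row0, ramp_in_low)
```

For the edge block essentially all energy lies in a single row of the coefficient array, while for the ramp it lies in the low-frequency corner, which matches the qualitative distinction drawn between Figures 1(a)/(c) and 1(b).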

The above analysis shows that the zigzag scanning of DCT coefficients commonly adopted by current international image/video compression standards, such as JPEG, MPEG, and H.264, is inherently flawed. Zigzag scanning sweeps back and forth along the anti-radial direction and completely ignores the statistical correlation of natural images in the radial direction. In fact, to compensate for this defect, the MPEG and H.264 standards each provide horizontal and vertical scanning modes as alternatives to zigzag scanning. However, switching among multiple scan modes is only a local, inflexible stopgap: it cannot model the correlation of the image/video signal along arbitrary directions in the transform domain, and signaling the scan mode incurs additional bit-rate overhead.
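To make the criticism concrete, the fixed zigzag order can be generated as follows. This is a small sketch using the JPEG-style convention with (row, column) indices; the function name is chosen here for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        diagonal = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        if s % 2 == 0:
            # Even anti-diagonals are walked from bottom-left to top-right.
            diagonal.reverse()
        order.extend(diagonal)
    return order

scan = zigzag_order(8)
print(scan[:10])
```

Every anti-diagonal (constant row + col) is exhausted before the scan moves one step outward, so neighbouring positions in the one-dimensional scan are neighbours along the anti-radial direction only. The order is also the same for every block, regardless of its content.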

A search of the prior art found Chinese Patent Document No. CN1741616, published 2006-03-01, which describes a context-based adaptive entropy coding method. When encoding, the method scans the quantized DCT coefficients of the current transform block to form a sequence of (level, run) pairs, and then entropy-codes the pairs in the reverse of the scan order. During encoding, the context statistical model is constructed dynamically and adaptively from the values of the pairs already coded in the current block, and a weighted fusion of context models is proposed to further improve the compression performance of the model. The resulting context statistical model is used to drive the entropy coder. The context-based adaptive entropy decoding method is the inverse of the encoding method. However, this technique has the following drawbacks: although it uses run-length coding to merge and encode the consecutive zero coefficients between two non-zero coefficients, it does not merge other typical groups of coefficients, and it still does not break away from simple zigzag scanning.

Chinese Patent Publication No. CN1431828, published 2003-07-23, describes an "optimal scanning method for encoding/decoding an image signal". In this method of encoding an image signal by discrete cosine transform, at least one of a plurality of reference blocks is selected, a scan order for the block to be encoded is generated from the selected reference block or blocks, and the block to be encoded is scanned in the resulting order. The at least one selected reference block is temporally or spatially adjacent to the block to be encoded. When scanning a block to be encoded, the probability of occurrence of a non-zero coefficient at each position is obtained from the selected reference block or blocks, and the scan order is determined in descending order of probability, starting from the highest. Where probabilities are equal, the zigzag order is used. However, this method still fails to fundamentally overcome the shortcomings of zigzag scanning: its adaptive scanning still operates on single coefficients, and coefficient blocks with similar properties are not merged. The coding efficiency of the method is therefore still unsatisfactory.
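The scan-order rule summarized above (descending non-zero probability, zigzag order on ties) can be sketched as below. This only illustrates the rule as described in this paragraph; the function names and the two 3×3 reference blocks are hypothetical, and the actual procedure of CN1431828 may differ in detail.

```python
def zigzag_rank(n):
    """Map each (row, col) position to its index in the zigzag scan."""
    rank = {}
    index = 0
    for s in range(2 * n - 1):
        diagonal = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        if s % 2 == 0:
            diagonal.reverse()
        for pos in diagonal:
            rank[pos] = index
            index += 1
    return rank

def probability_scan_order(reference_blocks):
    """Order positions by how often they are non-zero in the reference
    blocks (highest first); ties fall back to the zigzag order."""
    n = len(reference_blocks[0])
    rank = zigzag_rank(n)

    def nonzero_probability(pos):
        u, v = pos
        hits = sum(1 for block in reference_blocks if block[u][v] != 0)
        return hits / len(reference_blocks)

    return sorted(rank, key=lambda pos: (-nonzero_probability(pos), rank[pos]))

# Hypothetical quantized reference blocks with mostly vertical-frequency energy.
ref_a = [[5, 0, 0], [3, 0, 0], [1, 0, 0]]
ref_b = [[4, 1, 0], [2, 0, 0], [0, 0, 0]]
order = probability_scan_order([ref_a, ref_b])
print(order)
```

Note that the result is still a one-dimensional ordering of individual coefficients, which is the limitation the paragraph above points out.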

Summary of the invention

The present invention addresses the above-mentioned deficiencies of the prior art and provides a context-based modeling method and system in the transform domain of an image/video (Method and System for Context Modeling of Images and Videos in Transform Domains) based on adaptive block evolution (ABE). Unlike the existing zigzag, horizontal, or vertical scanning methods, this method does not use a fixed one-dimensional scanning order. Instead it adopts an adaptive two-dimensional scanning method and statistically models the radial and anti-radial correlations of the transform-domain coefficients simultaneously.

In addition to being an effective tool for image/video compression, the present invention can also be applied to other image/video processing tasks such as denoising, interpolation, classification, visual information retrieval and extraction, digital watermarking, information hiding, image retrieval, and steganalysis.

The invention is achieved by the following technical solutions:

The present invention relates to a contextual statistical modeling method in a transform domain, which adaptively constructs a two-dimensional Markov model to reflect the directional correlation of an image/video signal in a two-dimensional transform domain. The method specifically includes:

Combining multiple or individual transform coefficients into states in a two-dimensional Markov process;

Calculating the transfer probability between two adjacent Markov states online or offline;

Starting from the initial state, adapting single or multiple traversal directions according to the transfer probabilities.

The transform coefficients may or may not be quantized;

The state in the two-dimensional Markov process refers to one coefficient, or a set of adjacent and related coefficients, in the transform domain, the state being defined by a coefficient block of a typical pattern;

The initial state refers to a coefficient block formed, with the lowest frequency coefficient and/or the highest frequency coefficient as reference point, by a group of coefficients that have the same or similar values and close frequencies; the boundary of the coefficient group is described by a generalized radius.

The transfer probability $P(S_i \mid S_{i-1})$ refers specifically to the transfer probability in the two-dimensional Markov process, which can be calculated offline or online, wherein: $S_i$ is the next state, $S_{i-1}$ is the current state, and $i$ is the sequence number of the coefficient block currently being traversed, taking values from 1 to the total number of states in the two-dimensional Markov process.
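One simple way to obtain such transfer probabilities offline is to count transitions in training traversals. The sketch below is an assumption for illustration: the text does not specify the estimator, the state labels "N" (all non-zero block) and "Z" (all-zero block) are hypothetical, and the add-`alpha` smoothing is a common choice rather than something taken from this document.

```python
from collections import Counter, defaultdict

def estimate_transfer_probs(state_sequences, states, alpha=1.0):
    """Estimate P(next | current) from observed state sequences.

    alpha is an additive smoothing constant (an assumption of this sketch)
    so that unseen transitions keep a non-zero probability.
    """
    counts = defaultdict(Counter)
    for sequence in state_sequences:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current in states:
        total = sum(counts[current].values()) + alpha * len(states)
        probs[current] = {nxt: (counts[current][nxt] + alpha) / total
                          for nxt in states}
    return probs

# Hypothetical training traversals over block types.
training = [["N", "N", "N", "Z"], ["N", "N", "Z", "Z"]]
probs = estimate_transfer_probs(training, states=["N", "Z"])
print(probs["N"]["N"], probs["N"]["Z"])
```

An online variant would update the same counts after each coded state, so that encoder and decoder can track identical estimates without side information.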

The adaptation of the traversal direction refers to: taking as the next state the state that corresponds to the optimal value among the transfer probabilities between the current state and all of its possible next states; for two or more initial states, the above adaptive calculation is performed separately for each. The optimal value can be a maximum value, a minimum value, or another value that best meets the chosen criterion.

The invention relates to an image/video compression method based on transform domain context statistical modeling, comprising the following steps:

The first step is to perform coefficient transformation on the input image;

The second step is to perform context-based modeling on the transform coefficients and determine an initial state $S_0$;

In the third step, online or offline, the two-dimensional Markov process is used to calculate and compare all possible transfer probabilities $P(S_i \mid S_{i-1})$ from state $S_{i-1}$, and the next state $S_i$ is selected based on the optimal value;

In the fourth step, for the next state $S_i$ obtained in the third step, the entropy encoder is driven with the corresponding transfer probability $P(S_i \mid S_{i-1})$ and outputs the code for the transform coefficients corresponding to state $S_i$. The method then returns to the third step to recalculate and compare from the new state, until all transform coefficients have been traversed and a complete code stream composed of all output codes is obtained.
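The loop formed by the third and fourth steps can be sketched as a greedy traversal. Everything concrete below is an assumption made for illustration: the state names, the adjacency graph, the probability table, the choice of "maximum" as the optimal value, and the use of the ideal cost -log2 P in place of a real arithmetic coder.

```python
import math

def abe_traverse(neighbors, transfer_prob, initial):
    """Greedy traversal: from the current state, move to the reachable
    unvisited state with the highest transfer probability.

    Returns the visiting order and the ideal code length in bits that an
    entropy coder driven by those probabilities would spend.
    """
    order = [initial]
    seen = {initial}
    current = initial
    total_bits = 0.0
    while len(seen) < len(neighbors):
        # Candidate next states: unvisited states adjacent to a visited one.
        frontier = sorted({n for s in seen for n in neighbors[s]} - seen)
        if not frontier:
            break
        nxt = max(frontier, key=lambda s: transfer_prob(current, s))
        total_bits += -math.log2(transfer_prob(current, nxt))
        order.append(nxt)
        seen.add(nxt)
        current = nxt
    return order, total_bits

# Hypothetical four-state example.
neighbors = {"S0": ["A", "B"], "A": ["S0", "C"], "B": ["S0", "C"], "C": ["A", "B"]}
table = {("S0", "A"): 0.5, ("S0", "B"): 0.25,
         ("A", "B"): 0.5, ("A", "C"): 0.25, ("B", "C"): 0.5}

def transfer_prob(current, nxt):
    return table.get((current, nxt), 0.125)

order, bits = abe_traverse(neighbors, transfer_prob, "S0")
print(order, bits)
```

The sketch leaves out how the chosen transition is made known to the decoder; it only shows how high-probability transitions translate into a short ideal code length.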

The initial state is encoded by a generalized radius and comprises at least one coefficient or coefficient block in the lowest-frequency region, and/or at least one coefficient or coefficient block of value 0 in the highest-frequency region, and/or a coefficient block with distinct features in any part of the frequency domain.

The distinct features are: two or more coefficient pairs or coefficient blocks that are adjacent and have the same value or follow a typical distribution.

The invention relates to an image/video compression system based on transform domain context statistical modeling, comprising: a transform module, an adaptive block evolution (ABE) module and an entropy coding module, wherein:

The transform module performs coefficient transformation on the image;

The ABE module performs context-based modeling on the transform coefficients output by the transform module and, in an adaptive manner, sequentially outputs the transfer probabilities obtained in each step of the traversal to the entropy coding module;

The entropy coding module entropy encodes and outputs the transform coefficients according to the transfer probability output by the ABE module.

The context-based modeling refers to: using one coefficient, or a group of adjacent and related coefficients, in the transform domain as a state of the two-dimensional Markov process and, with the lowest frequency coefficient and/or the highest frequency coefficient as reference point, taking as the initial state of the model the coefficient block formed by the group of coefficients that have the same or similar values and close frequencies.

The adaptive manner refers to: starting from the initial state of the context-based model, calculating the transfer probabilities to all possible next states and taking the state corresponding to the highest value as the next state.

Compared with the existing H.264 compression method, the present invention improves coding efficiency; in particular, when the code rate rises to 0.45 or above, the code stream length of the ABE coding system can be reduced to nearly 90%.

Description of the drawings

FIG. 1 is a schematic diagram of the DCT transform coefficients of 16×16 pixel blocks from different images in Embodiment 1 (coefficient magnitude is indicated by the degree of shading).

FIG. 2 is a comparison of a conventional DCT coding system and the ABE coding system of the present invention.

FIG. 3 shows several Markov states formed from 8×8 DCT transform coefficient blocks, as indicated by the gray areas.

FIG. 4 is a schematic diagram of state transitions in the direction of high transfer probability, steps i to iv;

In the figure: (a) is a radial state change; (b) is a horizontal state change.

FIG. 5 is a schematic diagram of traversal scanning of the transform domain in one or two directions;

In the figure: (a) traversal starting from the low-frequency region of the transform domain as the initial state; (b) traversal starting from the high-frequency region of the transform domain as the initial state; (c) simultaneous traversal starting from both the low-frequency and the high-frequency region.

FIG. 6 is the set of images used in the test experiment to verify the coding performance of the present invention.

FIG. 7 shows the comparison between the coding performance of the present invention and that of H.264, the best existing coding system.

Detailed description

The embodiments of the present invention are described in detail below. The embodiments are implemented on the basis of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.

Embodiment 1 Two-dimensional Markov modeling

To further illustrate the innovation of the ABE method of the present invention, the following takes as an example the core processing step in many image/video compression standards and systems, namely entropy coding of the quantized two-dimensional DCT coefficients. A block diagram of a conventional DCT coding system and of the ABE coding system of the present invention is shown in FIG. 2. Like the conventional DCT coding system, the ABE system first applies DCT transformation and quantization to the input image. Unlike the zigzag scan used in the conventional system, however, the ABE system adopts a more flexible adaptive block evolution method, which achieves higher coding efficiency.

After quantization, most of the non-significant DCT coefficients become zero, while the remaining non-zero, significant coefficients tend to cluster along a certain direction. Conversely, the large number of non-significant (zero) coefficients spreads out in a fan shape along the direction from low frequency to high frequency. To improve compression efficiency, the ABE method uses an ordered two-dimensional Markov model over the sequence of adjacent coefficient blocks to separate the positions of zero and non-zero coefficients (for example, by generating a significance map). This two-dimensional Markov modeling process predicts the transfer probability $P(S_i \mid S_{i-1})$ from the current state $S_{i-1}$ to the next state $S_i$. A state refers to a set of adjacent and related coefficients in the transform domain. For example, a set of all-zero coefficients in the highest-frequency region constitutes one state (as shown in Fig. 4(a)), and a set of all non-zero coefficients in the lowest-frequency region constitutes another state (as shown in Fig. 4(b)). More generally, a state may be composed of any mixture of zero and non-zero coefficients (Fig. 3(c)).
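The three kinds of state mentioned above can be illustrated on a small quantized block. This is a minimal sketch: the 4×4 coefficient values, the chosen groups of positions, and the label strings are all hypothetical.

```python
def significance_map(quantized):
    """1 where a quantized coefficient is non-zero, 0 where it is zero."""
    return [[1 if c != 0 else 0 for c in row] for row in quantized]

def state_type(sig, positions):
    """Classify a group of coefficient positions as one Markov state type."""
    values = {sig[u][v] for u, v in positions}
    if values == {0}:
        return "all-zero"
    if values == {1}:
        return "all-nonzero"
    return "mixed"

# Hypothetical quantized 4x4 DCT block: energy clusters at low frequencies.
quantized = [[12, -3, 1, 0],
             [ 4,  2, 0, 0],
             [-1,  0, 0, 0],
             [ 0,  0, 0, 0]]
sig = significance_map(quantized)

low_block = [(0, 0), (0, 1), (1, 0), (1, 1)]
high_block = [(2, 2), (2, 3), (3, 2), (3, 3)]
middle_block = [(0, 2), (0, 3), (1, 2), (1, 3)]

print(state_type(sig, low_block), state_type(sig, high_block),
      state_type(sig, middle_block))
```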

As an entropy coding method for the quantized DCT coefficients, the ABE method scans the transform domain by traversing it through state transitions $S_{i-1} \to S_i$ that have a high transfer probability $P(S_i \mid S_{i-1})$. The initial state $S_0$ of the traversal may be a large coefficient block located in the high-frequency region and containing only zeros, or a large coefficient block located in the low-frequency region and containing only non-zero coefficients, as shown in Fig. 4(b) and Fig. 4(a), respectively. These all-zero or all-one initial states can be encoded simply by an integer-valued generalized radius, which is more efficient than the end-of-block (EOB) method widely used in existing compression standards.
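The idea of describing an initial state by a single integer can be sketched as follows. The text does not define the generalized radius precisely, so this sketch assumes one simple reading: the radius is measured as the anti-diagonal index row + col of the significance map. The function names and example maps are hypothetical.

```python
def diagonal(sig, s):
    """Values of the significance map on anti-diagonal row + col == s."""
    n = len(sig)
    return [sig[u][s - u] for u in range(n) if 0 <= s - u < n]

def zero_region_radius(sig):
    """Smallest r such that every position with row + col >= r is zero."""
    n = len(sig)
    radius = 2 * n - 1  # value meaning "no all-zero outer region"
    for s in range(2 * n - 2, -1, -1):
        if all(value == 0 for value in diagonal(sig, s)):
            radius = s
        else:
            break
    return radius

def nonzero_region_radius(sig):
    """Largest r such that every position with row + col < r is non-zero."""
    n = len(sig)
    radius = 0
    for s in range(2 * n - 1):
        if all(value == 1 for value in diagonal(sig, s)):
            radius = s + 1
        else:
            break
    return radius

clean = [[1, 1, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
ragged = [[1, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]

print(nonzero_region_radius(clean), zero_region_radius(clean))
print(nonzero_region_radius(ragged), zero_region_radius(ragged))
```

Under this reading, when the two radii coincide the whole significance map is described by the integers alone; otherwise only the anti-diagonals between the two radii remain to be traversed.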

An intermediate state during the traversal scan of the two-dimensional transform domain consists of consecutive zero or non-zero coefficients located at the front of the current scan region. By performing contextual statistical modeling of the transitions between adjacent states, the ABE method exploits the correlation along particular directions (mostly radial) of the transform domain.

Since the directionality of the transform coefficients is closely related to the directional features in the image block, when two adjacent states $S_i$ and $S_{i-1}$ are of the same type and have the same direction, the transfer probability $P(S_i \mid S_{i-1})$ tends to be higher. By exploiting this regularity, the ABE method achieves adaptive directional state transitions while traversing or compressing the transform coefficients, as shown in FIG. 4.

The direction of advance when traversing the transform coefficients may be unidirectional, from high frequency to low frequency or from low frequency to high frequency, or it may be bidirectional, advancing from the high-frequency and the low-frequency end at the same time and converging at the intermediate frequencies. Figure 5 is a schematic illustration of these three traversal modes.
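The three directions of advance can be illustrated on a simplified, one-dimensional list of frequency bands. This is only a sketch: treating the transform domain as bands indexed from 0 (lowest frequency) upward, and the mode names, are assumptions made here.

```python
def band_order(n_bands, mode):
    """Order in which frequency bands 0 (lowest) .. n_bands - 1 (highest)
    are visited for each traversal mode."""
    if mode == "low_to_high":
        return list(range(n_bands))
    if mode == "high_to_low":
        return list(range(n_bands - 1, -1, -1))
    if mode == "bidirectional":
        order = []
        low, high = 0, n_bands - 1
        # Advance from both ends and meet at the intermediate frequencies.
        while low <= high:
            order.append(low)
            if high != low:
                order.append(high)
            low += 1
            high -= 1
        return order
    raise ValueError(f"unknown traversal mode: {mode}")

print(band_order(5, "low_to_high"))
print(band_order(5, "high_to_low"))
print(band_order(5, "bidirectional"))
```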

By suitably selecting the Markov states and following high-probability transitions, the ABE method can be combined with a matching context-based entropy coder to encode the transform coefficients with a shorter code length.

Embodiment 2 Image Compression Application

This embodiment includes the following steps:

The first step is to perform coefficient transformation on the input image;

The coefficient transformation may be a DCT transform, a KLT (Karhunen-Loeve) transform, or another existing transform; combined with the traversal described below, experiments verify that each achieves an effect similar to that shown in FIG. 7.

The coefficient transformation may also be followed by a quantization process;

The second step is to perform context-based modeling on the transform coefficients and determine the initial state $S_0$. The initial state in this embodiment is encoded by a generalized radius and may be any of the following:

i) a number of coefficient blocks of value 1 gathered in the lowest-frequency region, as shown in Figure 5(a);

ii) a number of coefficient blocks of value 0 in the highest-frequency region, as shown in Figure 5(b);

iii) a combination of i) and ii) above, as shown in Figure 5(c).

In the third step, online or offline, all possible transfer probabilities $P(S_i \mid S_{i-1})$ from the current state $S_{i-1}$ are calculated and compared using the two-dimensional Markov process, and the next state $S_i$ is selected based on the maximum value.

In the fourth step, for the next state $S_i$ obtained in the third step, the entropy encoder is driven with the transfer probability $P(S_i \mid S_{i-1})$ and outputs the code for the transform coefficients corresponding to state $S_i$. The method then returns to the third step to recalculate and compare from the new state, until all transform coefficients have been traversed and a complete code stream of all output codes is obtained.

In the above steps, $i$ ranges from 1 to the total number of states in the two-dimensional Markov process. If there is more than one initial state, simultaneous traversal in two or more directions may be implemented as shown in FIG. 5(c) to increase the coding speed.

In this application, an implementation system related to the above method includes: a transform module, an adaptive block evolution (ABE) module, and an entropy encoding module, where:

The transform module performs a coefficient transformation, such as DCT or KLT, on the image;

The ABE module performs context-based modeling on the transform coefficients output by the transform module. In an adaptive manner, it selects step by step the state with the highest transfer probability as the next jump of the Markov process until all coefficients have been traversed, and during the traversal it outputs the transfer probabilities in order to the entropy coding module to drive the entropy encoder;

The entropy coding module entropy-encodes and outputs the transform coefficients according to the transfer probabilities output step by step by the ABE module.

The transform module may also include a quantization function;

The adaptive manner refers to: starting from the initial state of the context-based model, calculating the transfer probabilities to all possible next states and selecting the next state based on the highest value among them.

This embodiment uses the 38 common test images shown in FIG. 6 to verify the coding performance of the ABE system of Embodiment 1. Each image is first subjected to an 8×8 DCT transform, and the significance map of the DCT coefficients is then encoded under different quantization steps. For comparison, the same significance maps are encoded with the H.264 encoder, the best-performing existing coder. To compare the two methods objectively, the entropy coding module in both cases is CABAC, the default context-adaptive binary arithmetic coder of the H.264 encoder, and the initial probability of every context is set to 0.5. Note that in this example the ABE coding system uses three times as many contexts as H.264 (378 vs 126), so the ABE system suffers a heavier context dilution penalty. This initial setting therefore actually favors the H.264 coding system.
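The context dilution penalty mentioned above can be illustrated with a toy adaptive model. This sketch is not CABAC: it assumes a simple count-based probability estimator that starts at 0.5 and an ideal coder that spends -log2 p bits per symbol, and the binary data are made up.

```python
import math

def adaptive_code_length(bits):
    """Ideal code length (in bits) of a binary sequence coded with one
    adaptive context whose estimate starts at P(1) = 0.5."""
    zeros = ones = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 1) / (zeros + ones + 2)  # count-based estimate
        p = p_one if b else 1.0 - p_one
        total += -math.log2(p)
        if b:
            ones += 1
        else:
            zeros += 1
    return total

# Hypothetical data: 30 symbols, mostly zeros, identical statistics throughout.
pattern = [0] * 9 + [1]
data = pattern * 3

one_context = adaptive_code_length(data)
# The same symbols spread over three contexts, each learning from scratch.
three_contexts = sum(adaptive_code_length(pattern) for _ in range(3))
print(f"one context   : {one_context:.2f} bits")
print(f"three contexts: {three_contexts:.2f} bits")
```

With three contexts, each one sees a third of the samples and pays its own learning cost, so the total is longer even though the statistics are identical. A coder with more contexts must recover this cost through better-matched probabilities.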

We compare the relative difference in the lengths of the code streams generated by the two encoding methods: $\gamma = (L_{264} - L_{\mathrm{ABE}}) / L_{264}$, where $L_{\mathrm{ABE}}$ is the length of the code stream generated by the ABE coding system and $L_{264}$ is the length of the code stream generated by the H.264 system. The values of $\gamma$ for different images at different code rates are shown in Fig. 7. As can be seen from the figure, the ABE coding system is significantly more efficient than H.264, and its advantage becomes more pronounced as the code rate increases. In addition, tests have shown that when the ABE system is combined with other coefficient transforms, such as the KLT, and with other existing entropy coders, such as the MQ coder used in JPEG2000 or a Huffman coder, it achieves performance improvements similar to those in Figure 7.
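As a worked example of the measure γ, the sketch below evaluates the formula on made-up stream lengths; the numbers are purely illustrative and are not results from this document.

```python
def relative_saving(len_h264, len_abe):
    """gamma = (L_264 - L_ABE) / L_264: fraction of the H.264 stream length
    saved by the ABE stream (negative if the ABE stream is longer)."""
    if len_h264 <= 0:
        raise ValueError("reference stream length must be positive")
    return (len_h264 - len_abe) / len_h264

# Hypothetical stream lengths in bits.
gamma = relative_saving(len_h264=1000, len_abe=900)
print(f"gamma = {gamma:.2f}")
```

A positive γ means the ABE stream is shorter, γ = 0 means both streams have the same length.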

Claims

1. A context statistical modeling method in a transform domain, characterized in that a two-dimensional Markov model is adaptively constructed to reflect the directional correlation of an image/video signal in a two-dimensional transform domain;

The construction includes:

Combining multiple or individual transform coefficients into states in a two-dimensional Markov process;

Calculating the transfer probability between two adjacent Markov states online or offline;

Starting from the initial state, adapting single or multiple traversal directions by comparing the estimated transfer probabilities.

2. The method of claim 1, wherein the state in said two-dimensional Markov process refers to one coefficient, or a set of adjacent and related coefficients, in the transform domain, the state being defined by a coefficient block of a typical pattern.

3. The method according to claim 1, wherein said initial state is a coefficient block formed, with the lowest frequency coefficient and/or the highest frequency coefficient as reference point, by a group of coefficients having the same or similar values and close frequencies.

4. The method according to claim 1, wherein said transfer probability $P(S_i \mid S_{i-1})$ is specifically a transfer probability in the two-dimensional Markov process, calculated offline or online, where: $S_i$ is the next state, $S_{i-1}$ is the current state, and $i$ is the sequence number of the coefficient block currently being traversed, taking values from 1 to the total number of states in the two-dimensional Markov process.

5. The method according to claim 1, wherein the adaptation of the traversal direction refers to: taking as the next state the state corresponding to the optimal value among the transfer probabilities between the current state and all of its possible next states; the above adaptive calculation is performed separately for each of two or more initial states.

6. An image/video compression method based on transform domain context statistical modeling, comprising the following steps:

The first step is to perform coefficient transformation on the input image;

The second step is to perform context-based modeling on the transform coefficients and determine an initial state $S_0$;

In the third step, online or offline, the two-dimensional Markov process is used to calculate and compare all possible transfer probabilities $P(S_i \mid S_{i-1})$ from state $S_{i-1}$, and the next state $S_i$ is selected based on the optimal value;

In the fourth step, for the next state $S_i$ obtained in the third step, the entropy encoder is driven with the corresponding transfer probability $P(S_i \mid S_{i-1})$ and outputs the code for the transform coefficients corresponding to state $S_i$; the method then returns to the third step to recalculate and compare from the new state, until all transform coefficients have been traversed and a complete code stream composed of all output codes is obtained.

7. The image/video compression method according to claim 6, wherein said initial state is encoded by a generalized radius and comprises at least one coefficient or coefficient block in the lowest-frequency region, and/or at least one coefficient or coefficient block of value 0 in the highest-frequency region, and/or a coefficient block having distinct features in any part of the frequency domain.

8. The image/video compression method according to claim 7, wherein said distinct features refer to two or more coefficient pairs or coefficient blocks that are adjacent and have the same value or follow a typical distribution.

9. The image/video compression method according to claim 6 or 7, wherein the initial state is any one of the following:

i) a number of coefficient blocks of value 1 gathered in the lowest-frequency region;

ii) a number of coefficient blocks of value 0 in the highest-frequency region;

iii) a combination of i) and ii) above.

10. The image/video compression method according to claim 6, wherein said optimal value is a maximum value, a minimum value, or another value that best meets the chosen criterion.

11. An image/video compression system based on transform domain context statistical modeling, comprising: a transform module, an adaptive block evolution Markov modeling module, and an entropy coding module, wherein:

The transform module performs coefficient transformation on the image;

The adaptive block evolution Markov modeling module performs context-based modeling on the transform coefficients output by the transform module, and constructs and traverses the states of the Markov process step by step in an adaptive manner;

The entropy coding module entropy-encodes and outputs the transform coefficients according to the transfer probabilities output by the adaptive block evolution Markov modeling module.

12. The image/video compression system according to claim 6 or 11, wherein said context-based modeling means: using one coefficient, or a group of adjacent and related coefficients, in the transform domain as a state of the two-dimensional Markov process and, with the lowest frequency coefficient and/or the highest frequency coefficient as reference point, taking as the initial state of the model the coefficient block formed by the group of coefficients having the same or similar values and close frequencies.

13. The image/video compression system according to claim 11, wherein said adaptive manner means: starting from the initial state of the context-based model, calculating the transfer probabilities to all possible next states, and selecting as the next state the state corresponding to the highest value among them.