Localization plays an important role in human vision and in the design of image processing and compression algorithms [1]. To take advantage of local correlation and visual masking in an algorithm, the underlying image sequence representation itself must be spatially local. Exploitation of the bounded frequency response of the human visual system requires localization in the frequency domain. Representations which provide good localization in both domains are therefore promising candidates for compression systems.
Representations that provide this joint localization are referred to as spatial/spatial-frequency representations. The Gabor transform is one such representation.
The Gabor representation was originally proposed for one-dimensional signal analysis [2]. Extending it to two dimensions in a straightforward way, a 2-D signal can be represented as the weighted sum of 2-D Gabor functions of the form:
The real and imaginary parts of typical 2-D Gabor functions are shown in Figures 1 and 2.
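As a rough illustration of such a function, the sketch below evaluates one commonly used form of a 2-D Gabor function: a Gaussian envelope multiplied by a complex sinusoid. This form, the function names, and the parameters (sigma, u0, v0, and the grid size) are assumptions chosen for illustration. They are not necessarily the definition or the values used in this work.

```python
import cmath
import math


def gabor_2d(x, y, x0=0.0, y0=0.0, sigma=4.0, u0=0.125, v0=0.0):
    """Complex 2-D Gabor function (assumed form: Gaussian times complex sinusoid).

    (x0, y0) is the spatial center, sigma the envelope width in pixels,
    and (u0, v0) the center frequency in cycles/pixel. All values are
    illustrative assumptions.
    """
    dx, dy = x - x0, y - y0
    envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    carrier = cmath.exp(2j * math.pi * (u0 * dx + v0 * dy))
    return envelope * carrier


def sample_gabor(size=16, **params):
    """Sample the function on a size x size grid centered on the origin."""
    half = size // 2
    return [[gabor_2d(x - half, y - half, **params) for x in range(size)]
            for y in range(size)]


if __name__ == "__main__":
    grid = sample_gabor(size=16)
    # The real part is even-symmetric and the imaginary part odd-symmetric
    # about the center, which is what plots of the two parts would show.
    center_row = grid[8]
    print([round(g.real, 3) for g in center_row])
    print([round(g.imag, 3) for g in center_row])
```

The real part behaves like a windowed cosine and the imaginary part like a windowed sine, so each function is localized both in space (through the envelope) and in spatial frequency (through the carrier).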
The Gabor functions and associated representations have a number of interesting properties. The use of Gabor representations for low bit rate applications has been examined by Ebrahimi et al. [5,6]. The Gabor transform has also been used effectively for still image compression [7].
The difference images are constructed by taking the difference between the new input frame and the reconstructed previous frame, rather than the original previous frame, in order to control noise propagation in the reconstructed sequence. The 2-D Gabor transform of each frame difference image is computed, and the resulting coefficients are thresholded using an experimentally derived mask and then quantized.
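The role of the reconstructed previous frame can be illustrated with a minimal closed-loop differential coder. In this sketch the lossy stage is a plain uniform quantizer standing in for the Gabor transform, thresholding, and quantization. The function names, the step size, and the flat-list frame format are assumptions for illustration only.

```python
def quantize(values, step):
    """Stand-in for the lossy transform/threshold/quantize stage (assumption)."""
    return [step * round(v / step) for v in values]


def code_sequence(frames, step=8):
    """Closed-loop differential coding of frames given as flat pixel lists.

    The first frame is assumed to be sent separately and reproduced exactly.
    """
    recon_prev = list(frames[0])
    reconstructed = [recon_prev]
    for frame in frames[1:]:
        # Difference against the reconstructed previous frame, so the encoder
        # tracks exactly what the decoder has and errors do not accumulate.
        diff = [a - b for a, b in zip(frame, recon_prev)]
        coded_diff = quantize(diff, step)  # what the decoder would receive
        recon = [b + d for b, d in zip(recon_prev, coded_diff)]
        reconstructed.append(recon)
        recon_prev = recon
    return reconstructed


if __name__ == "__main__":
    frames = [[0, 0], [3, 5], [6, 10], [9, 15]]
    for original, recon in zip(frames, code_sequence(frames)):
        print(original, recon)
```

Because each difference is taken against what the decoder actually holds, the reconstruction error in any frame is bounded by the error of a single quantization step. Differencing against the original previous frame would instead let quantization errors add up from frame to frame.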
Figure 5 shows the structure of this threshold mask for a Gabor transform whose basis functions have centers of support on 8x8 pixel spatial centers.
Note that because the Gabor transform is conjugate symmetric for real data, the matrix exhibits symmetry so that only 34 of the 64 spatial frequencies need be transmitted. These are shown as the shaded regions in the figure. (0,0) and (7,7) are the lowest and highest frequencies, respectively.
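The count of 34 can be checked with a short sketch. It assumes DFT-style periodic indexing on an 8x8 frequency grid, in which the coefficient at (u, v) is the complex conjugate of the one at (-u mod 8, -v mod 8). This indexing is an assumption and may not match the labeling used in Figure 5. The function name is illustrative.

```python
def independent_frequencies(n=8):
    """Return one representative per conjugate-symmetric pair on an n x n grid."""
    kept = set()
    for u in range(n):
        for v in range(n):
            mirror = ((-u) % n, (-v) % n)
            # Keep a single member of each {(u, v), mirror} pair.
            kept.add(min((u, v), mirror))
    return sorted(kept)


if __name__ == "__main__":
    print(len(independent_frequencies(8)))
```

Under this assumption four frequencies are their own mirror images, (0,0), (0,4), (4,0), and (4,4), and the remaining 60 form 30 conjugate pairs, giving 30 + 4 = 34 frequencies to transmit.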
Figure 6 shows the experimentally derived threshold mask.
The values in the mask are the thresholds relative to the largest coefficient in the transformed difference frame. The real and imaginary parts of the thresholded coefficients are scaled by a quality factor, uniformly quantized, and the two least significant bits are dropped. All coefficients corresponding to thresholds of 1.0 are set to zero, so that only the shaded frequencies are transmitted.
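One possible reading of this step is sketched below. The details are assumptions made for illustration: a coefficient is kept when its magnitude is at least its mask value times the largest magnitude in the frame, the quality factor is applied as a multiplier, quantization rounds to the nearest integer, and the two least significant bits are dropped from the magnitude so that the sign is preserved. The names are hypothetical.

```python
def drop_two_lsbs(q):
    """Drop the two least significant bits of the magnitude, keeping the sign."""
    magnitude = abs(q) >> 2
    return -magnitude if q < 0 else magnitude


def threshold_and_quantize(coeffs, mask, quality=1.0):
    """Threshold complex coefficients against a relative mask, then quantize.

    coeffs: list of complex transform coefficients for one difference frame.
    mask:   list of relative thresholds in [0, 1], one per coefficient.
    Returns a list of (real, imag) integer pairs.
    """
    peak = max(abs(c) for c in coeffs)
    out = []
    for c, t in zip(coeffs, mask):
        # A threshold of 1.0 marks a frequency that is never transmitted.
        if t >= 1.0 or abs(c) < t * peak:
            out.append((0, 0))
            continue
        re = int(round(c.real * quality))  # uniform quantization (assumed)
        im = int(round(c.imag * quality))
        out.append((drop_two_lsbs(re), drop_two_lsbs(im)))
    return out


if __name__ == "__main__":
    coeffs = [100 + 0j, 10 + 10j, -40 + 20j, 50 + 50j]
    mask = [0.0, 0.5, 0.2, 1.0]
    print(threshold_and_quantize(coeffs, mask))
```

In this example the second coefficient falls below its relative threshold and the fourth lies at a frequency with threshold 1.0, so both are zeroed. The other two survive with reduced precision.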
In the experiments which follow, the lossless coder is replaced by calculation of the entropy of the thresholded and quantized transform coefficients.
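A first-order entropy estimate of this kind can be computed as in the following sketch. It treats the quantized values as independent symbols and measures their empirical entropy. The function names and the way symbols are pooled are assumptions, since the exact procedure is not specified here.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical first-order entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def bits_per_pixel(symbols, num_pixels):
    """Estimated bits per pixel if the symbols were ideally entropy coded."""
    return entropy_bits_per_symbol(symbols) * len(symbols) / num_pixels


if __name__ == "__main__":
    # Mostly-zero symbols, as produced by heavy thresholding, have low entropy.
    quantized = [0] * 60 + [1, -1, 2, 1]
    print(entropy_bits_per_symbol(quantized))
    print(bits_per_pixel(quantized, num_pixels=64))
```

The entropy is a lower bound on the average code length of an ideal memoryless lossless coder, so it serves as an estimate of what a practical entropy coder could approach.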
The results of applying this coding technique to the 128x128 pixel, 8 bit/pixel monochrome Miss America sequence (Figure 7) are shown in Figure 8.
The reconstructed sequence (not including the initial frame) has an average entropy of 0.024 bits/pixel, for a compression ratio of about 335:1. At 24 frames per second, this represents a bit rate of 9.4 kbits/sec. The initial frame of the sequence was compressed by a factor of 14, to 9.1 kbits. If the initial frame is retransmitted every 100 frames, the additional cost to the average bit rate is about 2 kbits/sec.
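The arithmetic behind these figures can be reproduced from the quantities quoted above, as in the sketch below. The variable names are illustrative. The computed values differ slightly from the quoted ones, presumably because the entropy is given to only two significant digits and because of the kbit convention used (an assumption: 128x128x8/14 bits is about 9.1 kbits only if 1 kbit is taken as 1024 bits).

```python
WIDTH, HEIGHT = 128, 128
BITS_PER_PIXEL = 8
FRAMES_PER_SECOND = 24
ENTROPY_BPP = 0.024          # average entropy quoted in the text
INITIAL_FRAME_FACTOR = 14    # compression factor of the initial frame
REFRESH_INTERVAL = 100       # frames between initial-frame retransmissions

pixels = WIDTH * HEIGHT
bits_per_frame = ENTROPY_BPP * pixels
bit_rate = bits_per_frame * FRAMES_PER_SECOND
compression_ratio = BITS_PER_PIXEL / ENTROPY_BPP

initial_frame_bits = pixels * BITS_PER_PIXEL / INITIAL_FRAME_FACTOR
refresh_period_s = REFRESH_INTERVAL / FRAMES_PER_SECOND
refresh_overhead = initial_frame_bits / refresh_period_s

if __name__ == "__main__":
    print(f"bits per difference frame: {bits_per_frame:.1f}")
    print(f"bit rate: {bit_rate:.0f} bits/sec")
    print(f"compression ratio: {compression_ratio:.0f}:1")
    print(f"initial frame: {initial_frame_bits:.0f} bits")
    print(f"refresh overhead: {refresh_overhead:.0f} bits/sec")
```

Each difference frame therefore costs roughly 390 bits, and retransmitting the initial frame about every four seconds adds on the order of 2 kbits/sec to the average rate.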
As would be expected at this bit rate, there are clearly visible artifacts. These artifacts are much less structured than typical blocking artifacts, however, and may therefore be less objectionable. They may also be reduced to some extent by postprocessing.
The coder above was a simple differential transform coder. An obvious question is whether motion compensation provides substantially improved performance. It is not clear, particularly at very low bit rates, that the reduction in the information in the difference frames will be sufficient to justify the overhead of transmitting motion vectors (and the increase in coder complexity). This point is currently under investigation.
In this demonstration, we have examined the use of the Gabor representation for the very low bit rate coding of sequences. We have shown that this approach can indeed lead to moderate image quality at very high compression ratios. Together with previous work involving high quality still images, this indicates that the Gabor transform can be used in coding applications with a wide range of quality/bit rate requirements.
[1] T.R. Reed. Local frequency representations for image sequence processing and coding. In A.B. Watson, editor, Digital Images and Human Vision - Proceedings of the National Academy of Sciences/National Research Council Committee on Vision Workshop on Visual Factors in Electronic Image Communications, pages 3-12, MIT Press, Cambridge, Massachusetts, 1993.
[2] D. Gabor. Theory of communication. Journal of the Institution of Electrical Engineers, 93(26):429-457, 1946.
[3] M. Porat and Y.Y. Zeevi. The generalized Gabor scheme of image representation in biological and machine vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):452-468, July 1988.
[4] J.G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2(7):1160-1169, July 1985.
[5] T. Ebrahimi, T.R. Reed, and M. Kunt. Sequence coding by Gabor decomposition. In J. Torres, E. Masgrau, and M.A. Lagunas, editors, Signal Processing V: Theories and Applications, Proceedings of EUSIPCO-90, pages 769-772, Elsevier Science Publishers B.V. (North-Holland), Barcelona, Spain, September 18-21 1990.
[6] T. Ebrahimi, T.R. Reed, and M. Kunt. Video coding using a pyramidal Gabor expansion. In Proceedings of Visual Communications and Image Processing '90, pages 489-502, Lausanne, Switzerland, October 2-4 1990.
[7] T.R. Reed. High-quality image compression using the Gabor transform. In Proceedings of SID'93, pages 792-795, Seattle, WA, May 18-20 1993.