Localization is an important process in visual perception. The fact that we assign surface characteristics (such as roughness) to specific objects and surfaces (regions in our field of view) implies that visual perception is jointly local: it localizes simultaneously in the spatiotemporal and feature domains.

We might refer to the representation of information underlying this process as a spatiotemporal/feature representation.

Localization also plays an important role in image communications, and in the design of image processing and compression algorithms. For an algorithm to take advantage of local correlation and visual masking, the underlying image sequence representation must itself be spatiotemporally local. Exploiting the bounded frequency response of the human visual system, in turn, requires localization in the frequency domain.

The ease with which surface characteristics and frequency content can be correlated, together with the success of frequency analysis in modeling certain aspects of the visual system, also makes some type of frequency decomposition attractive.

This leads us to consider local **spatiotemporal/spatiotemporal-frequency (st/stf) representations**.

Using st/stf representations implies processing the image sequence as a three-dimensional (spatiotemporal) volume of data (below).

Figure 1: An image sequence rendered as a volume via raytracing.

Although this approach is more complicated to implement than, e.g., motion-compensated prediction, it has some interesting features. In particular, it allows the frequency-domain characteristics of motion, as well as aspects of temporal perception, to be exploited.
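As an illustration of what the frequency-domain characteristics of motion look like, consider the standard textbook case (our notation, not the author's): a 2-D pattern s_0(x, y) translating at a constant velocity (d_x, d_y) in pixels per frame. With S and S_0 the Fourier transforms in angular frequency and δ the Dirac delta, the substitution x' = x − d_x t, y' = y − d_y t gives

```latex
s(x,y,t) = s_0(x - d_x t,\; y - d_y t)

S(u,v,w) = \iiint s_0(x - d_x t,\; y - d_y t)\, e^{-j(ux + vy + wt)}\, dx\, dy\, dt
         = S_0(u,v) \int e^{-j(w + u d_x + v d_y)\,t}\, dt
         = 2\pi\, S_0(u,v)\, \delta(w + u d_x + v d_y)
```

Uniform translation therefore confines the spectrum to the plane w = −(u d_x + v d_y). A representation that is local in (x, y, t) as well as in (u, v, w) can, in principle, capture different such planes in different regions of the sequence.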

In the following, we examine some of these issues using a widely known local frequency representation, the Gabor transform.

The Gabor representation was first introduced for the 1-D case [1], where the signal of interest is expressed as a weighted sum of Gaussian-windowed complex exponentials (Gabor elementary functions), each localized in both time and frequency.
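One common way to write this expansion is sketched below. The notation is ours and may differ from [1]: T and Ω are assumed time and angular-frequency spacings of the elementary functions, σ sets the width of the Gaussian window, a_{mn} are the expansion coefficients, and j = √−1.

```latex
s(t) = \sum_{m}\sum_{n} a_{mn}\, g_{mn}(t), \qquad
g_{mn}(t) = \exp\!\left(-\frac{(t - mT)^2}{2\sigma^2}\right) e^{\,j n \Omega t}
```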

Results by Ebrahimi et al. [8] indicate that the 2-D Gabor representation is useful for image sequence coding in cases where high compression ratios and modest image quality are required. The suitability of the 2-D Gabor transform for very high compression applications (bit rates below 10 kbit/s) is demonstrated in [9].

Particularly notable is the absence of blocking artifacts in images reconstructed after compression, owing to the inherent spatial locality of the Gabor functions.

An image sequence can be represented as a weighted sum of 3-D Gabor functions: Gaussian windows in x, y, and t that modulate complex exponentials with spatiotemporal frequencies (u, v, w).
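Written out under assumptions of our own (a separable Gaussian envelope and a carrier phase referenced to the function's center; the original equation may differ in these details), the expansion and its basis functions take the form below. Each index k selects a lattice point (x_k, y_k, t_k, u_k, v_k, w_k). With the parameters used later in this section, the centers lie on a lattice with spacing 8, the frequencies on a lattice with spacing π/4, and σ_s² = σ_t² = 5.66.

```latex
s(x,y,t) = \sum_{k} a_k\, g_k(x,y,t)

g_k(x,y,t) = \exp\!\left(-\frac{(x-x_k)^2 + (y-y_k)^2}{2\sigma_s^2} - \frac{(t-t_k)^2}{2\sigma_t^2}\right)
             \exp\!\Big(j\,\big[u_k (x-x_k) + v_k (y-y_k) + w_k (t-t_k)\big]\Big)
```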

The real and imaginary parts of a representative 3-D Gabor function are shown in Figures 2 and 3 (below).

Figure 2: Real part of a typical 3-D Gabor function.

Figure 3: Imaginary part of a typical 3-D Gabor function.
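Volumes like those in Figures 2 and 3 can be produced by sampling a single basis function on a grid and splitting it into real and imaginary parts. The sketch below assumes a separable Gaussian envelope with the variances quoted later in the text (5.66) and a carrier phase referenced to the center. The function names `gabor_3d` and `sample_parts` are hypothetical, not taken from the original work.

```python
import cmath
import math


def gabor_3d(x, y, t, center, freq, var_s=5.66, var_t=5.66):
    """Complex value of a 3-D Gabor function at sample (x, y, t).

    Assumed form: Gaussian envelope centred at center = (x0, y0, t0) times a
    complex exponential with angular frequencies freq = (u0, v0, w0); the
    carrier phase is measured relative to the centre (an assumption).
    """
    x0, y0, t0 = center
    u0, v0, w0 = freq
    dx, dy, dt = x - x0, y - y0, t - t0
    envelope = math.exp(-(dx * dx + dy * dy) / (2 * var_s) - dt * dt / (2 * var_t))
    carrier = cmath.exp(1j * (u0 * dx + v0 * dy + w0 * dt))
    return envelope * carrier


def sample_parts(size, center, freq):
    """Sample on a size^3 grid; return (real, imag) volumes indexed [t][y][x]."""
    vals = [[[gabor_3d(x, y, t, center, freq) for x in range(size)]
             for y in range(size)] for t in range(size)]
    real = [[[v.real for v in row] for row in plane] for plane in vals]
    imag = [[[v.imag for v in row] for row in plane] for plane in vals]
    return real, imag


# A function centred in a 9x9x9 volume, tuned to u0 = pi/4 (horizontal frequency only).
real, imag = sample_parts(9, (4, 4, 4), (math.pi / 4, 0.0, 0.0))
print(real[4][4][4], imag[4][4][4])  # 1.0 0.0 (envelope = 1, carrier phase = 0)
```

With the phase referenced to the center, the real part is even-symmetric about the center and the imaginary part is odd-symmetric, which is the usual cosine/sine pairing seen in such plots.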

Because the transform is six-dimensional (x, y, t, u, v, and w), it can be difficult to visualize. One way to do so is to project the 6-D space onto a 3-D volume.

There are a number of ways that this can be done. If coefficients corresponding to basis functions with the same spatiotemporal frequency are grouped together, maintaining the spatiotemporal organization of the original sequence within each group, the structure shown in Figure 4 results.

Figure 4: Organization of the transform coefficients: (a) the arrangement of spatiotemporal-frequency blocks; (b) the spatiotemporal arrangement within each block.

Figure 4a shows the arrangement of coefficient blocks at the same spatiotemporal frequency. Directions of increasing u, v, and w are indicated by the coordinate axes. Within each block, coefficients corresponding to different spatiotemporal locations are organized as shown in Figure 4b.

It should be emphasized that the blocks shown are purely an aid to visualization. There is no blocking applied to the sequence itself, since the basis functions are themselves local.
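The grouping described above amounts to an index mapping from the 6-D coefficient lattice to a 3-D volume. The sketch below assumes frequency indices ordered by increasing frequency and assumes that u, v, w map to the volume's first, second, and third axes; `block_position` is a hypothetical helper, not code from the original work.

```python
def block_position(coef_index, block_shape):
    """Map a 6-D coefficient index to a voxel of the 3-D visualization volume.

    coef_index  = (ix, iy, it, iu, iv, iw): lattice indices in x, y, t and in
                  u, v, w (frequency indices ordered by increasing frequency).
    block_shape = (nx, ny, nt): number of lattice positions per block in x, y, t.
    """
    ix, iy, it, iu, iv, iw = coef_index
    nx, ny, nt = block_shape
    # The frequency index selects the block; the spatiotemporal index places the
    # coefficient inside that block, preserving the original x, y, t arrangement.
    return (iu * nx + ix, iv * ny + iy, iw * nt + it)


# Hypothetical sizes: 32 x 32 x 3 positions per block, 8 frequencies per axis.
print(block_position((0, 0, 0, 0, 0, 0), (32, 32, 3)))    # (0, 0, 0)
print(block_position((31, 31, 2, 7, 7, 7), (32, 32, 3)))  # (255, 255, 23)
```

Because each coefficient lands in a distinct voxel, the mapping is only a rearrangement for display, consistent with the point that no blocking is applied to the sequence itself.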

In the following, a 3-D Gabor transform was used with basis functions spaced 8 samples apart in x, y, and t (pixels in space, frames in time) and π/4 apart in u, v, and w, with spatial and temporal variances set to 5.66. A 256 × 256 pixel, 24-frame sequence with an entropy of 7.61 bits/pixel was used as test data (Figure 5).

Figure 5: Frames from the original test sequence.
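The lattice parameters above can be checked with a little arithmetic. Assuming the frequency lattice spans one full period (2π) in each of u, v, and w, a spacing of π/4 gives 8 frequencies per axis, and the product of the spacings, 8 × π/4 = 2π, is the critical-sampling condition for a Gabor lattice in angular frequency. The sketch below (with a hypothetical helper `gabor_lattice_size`) counts the basis functions and compares that count with the number of pixels.

```python
import math


def gabor_lattice_size(nx, ny, nt, step_xyt=8, step_freq=math.pi / 4):
    """Count basis functions on a 3-D Gabor lattice.

    Assumes one lattice position every `step_xyt` samples in x, y and t, and
    frequencies covering one full period (2*pi) in u, v and w.
    """
    positions = (nx // step_xyt) * (ny // step_xyt) * (nt // step_xyt)
    freqs_per_axis = round(2 * math.pi / step_freq)  # 8 for a spacing of pi/4
    return positions * freqs_per_axis ** 3


n_coef = gabor_lattice_size(256, 256, 24)
n_pix = 256 * 256 * 24
print(n_coef, n_pix)  # 1572864 1572864
```

Under these assumptions there is exactly one (in general complex-valued) coefficient per pixel of the test sequence.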

To examine the distribution of coefficient energy in the transformed sequence, the magnitude of the transform was calculated and isosurfaces were rendered at 5% of the peak magnitude (Figure 6).

Figure 6: Magnitude of the transformed sequence (isosurfaces rendered at 5% of peak coefficient magnitude).
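The region enclosed by a 5% isosurface is simply the set of coefficients whose magnitude reaches 5% of the peak. The sketch below computes that set for a flat list of complex coefficients; `magnitude_mask` is a hypothetical helper, and extracting and rendering the surface itself (e.g. with a marching-cubes routine) is not shown.

```python
def magnitude_mask(coeffs, fraction=0.05):
    """Mark coefficients whose magnitude is at least `fraction` of the peak magnitude.

    Returns (mask, level). The boundary of the True region is what an isosurface
    at `level` would trace in the 3-D coefficient volume.
    """
    mags = [abs(c) for c in coeffs]
    level = fraction * max(mags)
    return [m >= level for m in mags], level


mask, level = magnitude_mask([10 + 0j, 0.4j, 3 - 4j, 0.1])
print(mask)  # [True, False, True, False]
```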

As expected, coefficient energy is highly concentrated in the low spatiotemporal-frequency coefficients. Furthermore, where significant energy exists at nonzero frequencies, it is concentrated at spatiotemporal locations where motion occurs. Details of the energy concentration at low frequencies can be seen in Figure 7.

Figure 7: Detail of the low spatiotemporal-frequency coefficients (5% isosurfaces).

As a preliminary estimate of the compression ratios this representation might achieve, coefficient entropies were calculated after applying various thresholds and then rounding to the nearest integer. Selected frames from the sequences reconstructed from unthresholded coefficients, and from coefficients thresholded at 0.5% and 1% of the peak-to-peak coefficient magnitude, are shown in Figures 8, 9, and 10, respectively.

Figure 8: Frames reconstructed from rounded coefficients.

Figure 9: Frames reconstructed from thresholded (0.5%), then rounded coefficients.

Figure 10: Frames from the reconstructed sequence with a 1% threshold.

Results for these (and other) thresholds are summarized in Table 1.
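The threshold-round-entropy procedure described above can be sketched as follows. This is an illustration under our own assumptions, not the author's code: coefficients are treated as real numbers (for example, real and imaginary parts handled as separate streams), the threshold is a fraction of the peak-to-peak range, and the entropy is the zeroth-order (histogram) entropy of the rounded values. The helper names are hypothetical.

```python
import math
from collections import Counter


def threshold_and_round(coeffs, fraction):
    """Zero coefficients below fraction * (peak-to-peak range), then round to integers.

    Assumes real-valued coefficients. Python's round() breaks exact .5 ties
    toward the even integer.
    """
    level = fraction * (max(coeffs) - min(coeffs))
    return [0 if abs(c) < level else round(c) for c in coeffs]


def entropy_bits(symbols):
    """Zeroth-order (histogram) entropy in bits per symbol."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


q = threshold_and_round([100.0, -100.0, 0.7, -1.4, 2.6], 0.005)
print(q)                # [100, -100, 0, -1, 3]
print(entropy_bits(q))  # five distinct symbols, so about log2(5) = 2.32 bits
```

Raising the threshold sets more small coefficients to zero, which tends to lower the entropy at the cost of reconstruction accuracy; the exact procedure behind Table 1 may differ from this sketch.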

St/stf representations are an interesting and powerful framework for the analysis, processing, and coding of image sequences. By combining spatiotemporal locality with spatiotemporal-frequency selectivity, they provide a means to account for phenomena in both domains. Because they can exploit local image correlation and localize edge-induced coding error without blocking artifacts, while maintaining locality in the spatiotemporal-frequency domain, representations of this type are promising candidates for future coding systems.

[1] D. Gabor. Theory of communication. Journal of the Institution of Electrical Engineers, Part III, 93(26):429-457, 1946.

[2] M. Porat and Y.Y. Zeevi. The generalized Gabor scheme of image representation in biological and machine vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):452-468, July 1988.

[3] J.G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2(7):1160-1169, July 1985.

[4] S. Marcelja. Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America, 70(11):1297-1300, November 1980.

[5] M.A. Webster and R.L. De Valois. Relationship between spatial-frequency and orientation tuning of striate-cortex cells. Journal of the Optical Society of America A, 2(7):1124-1132, July 1985.

[6] D.J. Field and D.J. Tolhurst. The structure and symmetry of simple-cell receptive-field profiles in the cat's visual cortex. Proceedings of the Royal Society of London B, 228(1253):379-400, September 1986.

[7] J.P. Jones and L.A. Palmer. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1233-1258, December 1987.

[8] T. Ebrahimi, T.R. Reed, and M. Kunt. Sequence coding by Gabor decomposition. In J. Torres, E. Masgrau, and M.A. Lagunas, editors, Signal Processing V: Theories and Applications, Proceedings of EUSIPCO-90, pages 769-772.

[9] A.E. Soohoo and T.R. Reed. Low-bit-rate coding of sequences using the Gabor transform. In Proceedings of the SID '94 International Symposium and Exhibition, Society for Information Display, Digest of Technical Papers, Vol. XXV, No. 28.3, pages 641-644, 1994.

Professor T.R. Reed / Department of Electrical and Computer Engineering / University of California / Davis, CA 95616 / trreed@ucdavis.edu