Geometric Analysis of the Conformal Camera for Intermediate-Level Vision and Perisaccadic Perception

Jacek Turski
University of Houston-Downtown
Department of Computer and Mathematical Sciences
V2 with figures, March 2009

Abstract

A binocular system developed by the author in terms of the projective Fourier transform (PFT) of the conformal camera, which numerically integrates the head, eyes, and visual cortex, is used to process visual information during saccadic eye movements. Although we make three saccades per second at the eyeball's maximum speed of 700 deg/sec, our visual system accounts for these incisive eye movements to produce a stable percept of the world. This visual constancy is maintained by neuronal receptive field shifts in various retinotopically organized cortical areas prior to saccade onset, giving the brain access to visual information from the saccade's target before the eyes' arrival. It integrates visual information acquisition across saccades. Our modeling utilizes basic properties of PFT. First, PFT is computable by FFT in complex logarithmic coordinates that approximate the retinotopy. Second, a translation in retinotopic (logarithmic) coordinates, modeled by the shift property of the Fourier transform, remaps the presaccadic scene into a postsaccadic reference frame. It also accounts for the perisaccadic mislocalization observed in human subjects in laboratory experiments. Because our modeling involves the cross-disciplinary areas of conformal geometry, abstract and computational harmonic analysis, computational vision, and visual neuroscience, we include the corresponding background material and elucidate how these different areas are interwoven in our modeling of primate perception. In particular, we present the physiological and behavioral facts underlying the neural processes related to our modeling. We also emphasize the conformal camera's geometry and discuss how it is uniquely useful in the intermediate-level vision computational aspects of natural scene understanding.

Keywords: the conformal camera, projective Fourier transform, complex projective geometry, intermediate-level vision, retinotopy, binocular vision, saccades, efference copy, predictive remapping, perisaccadic mislocalization

1 Introduction

In the last few years, we have developed projective Fourier analysis for computational vision in the framework of the representation theory of the semisimple Lie group SL(2, C) [58, 59, 60, 61, 62]. It was done by restricting the group representations to the image plane of the conformal camera—the camera with image projective transformations given by the action of SL(2, C). This analysis provides an efficient image representation and processing that is not only well adapted to the projective transformations of retinal images, but also to the retinotopic mappings of the brain's oculomotor and visual pathways. The latter assertion stems from the fact that the projective Fourier transform (PFT) is computable by a fast Fourier transform algorithm (FFT) in coordinates given by a complex logarithm that transforms PFT into the standard Fourier integral and at the same time approximates the retinotopic mappings [54].

However, the conformal camera is somewhat abstract and noticeably different from any other camera model used in computer vision. Nevertheless, its remarkable advantages are revealed every time we model specific physiological processes involved in visual perception.
For instance, one could reasonably expect that a stationary camera and a moving object are equivalent to a moving camera and a stationary object, because the relative position of the camera and the object could be the same in both cases. Remarkably, this equivalence fails in the primate visual system. In fact, when the image of a fast-moving object sweeps across a static retina, we are normally aware of its motion, yet we fail to detect the comparable motion of images as they sweep across the retina during fast eye movements. The computational modeling presented in this article demonstrates that the conformal camera naturally supports this asymmetry.

Recently, building on projective Fourier analysis of the conformal camera, a mathematical model integrating the head, eyes, and visual cortex into a single computational binocular system was introduced in [63], with particular focus on stereopsis. Here it is demonstrated that this integrated system may efficiently process visual information during the fast scanning eye movements called saccades, which are employed to build up an understanding of a scene despite the fact that the highest acuity is present only in the central foveal region of about 2 deg of visual angle. We make about three saccades per second at the eyeball's maximum speed of 700 deg/sec. Visual sensitivity is markedly reduced during saccades, as we do not see the moving retinal images. These fragmented pieces of visual information are sent to the cortical areas, with a minor part going to subcortical areas, where they are integrated into a stable, coherent percept of the 3D world despite the persistence of incisive eye movements. This constancy of vision is maintained by a widespread neural network with multiple mechanisms receiving inputs from several sources. Not surprisingly, in spite of significant recent progress, how this problem is solved by the brain has been the topic of many theories; see [69] for a recent review.

The modeling presented in this article, first proposed in [64], utilizes basic properties of PFT to capture some of the very first computational aspects of the neural processes during saccadic eye movements. First, because the PFT of an image can be efficiently computed by FFT in complex logarithmic coordinates that also approximate the retinotopy, the output from the inverse PFT resembles the cortical representation of the image. Second, a simple translation in retinotopic (logarithmic) coordinates, efficiently modeled here by the standard shift property of the inverse PFT when expressed in these coordinates, remaps the presaccadic scene in the reference frame centered on the fovea into a postsaccadic reference frame centered on the impending saccade target. Equivalently, it uniformly shifts images around the target in the cortical periphery to the cortical foveal location. Moreover, this shift, which takes place in retinotopic (logarithmic) coordinates, accounts for the perceptual space compression seen around the time of saccadic eye movements by human subjects in psychophysical laboratory experiments [34, 53].

The idea of remapping is supported by the fact that the neural correlates of a copy of the oculomotor command to move the eyes, known as efference copy or corollary discharge [24], have been found in the form of a neuronal receptive field shift about 50 ms before saccade onset in various retinotopically organized visual cortical areas [18, 37]. This shift points to the possibility that, prior to the eyes arriving at the target, the brain has access to visual information from that peripheral region.
In fact, in a recent experiment [31], when human subjects shifted fixation to a clock, their reported time was earlier than the actual time on the clock by about 40 ms. Remapping may integrate visual information from an object across saccades and, therefore, eliminate the need to start visual information processing anew three times per second at each fixation, speeding up the costly process of visual information acquisition [66]. It may also build up perceptual continuity across fixations [45].

The conformal camera was initially constructed for the purpose of developing projectively adapted image representation in the framework of the only well understood 'projective' Fourier analysis, formulated as a direction in the representation theory of semisimple Lie groups, a great achievement of 20th-century mathematics [29]. In the case of the conformal camera, it is the representation theory of the group SL(2, C), the group generating image projective transformations in a conformal geometry setting; see [61], where a brief introduction to the group representations is also given.

While writing this article, it became apparent that we should carefully set a stage for our modeling, which involves conformal projective geometry, abstract and computational harmonic analysis, image processing, and computational vision, including visual neuroscience and machine vision. Thus, the overarching aim of this article is to elucidate how these cross-disciplinary areas are interwoven in our modeling of primate perception.

To this end, the paper is organized as follows. In the next section, we include, in some detail, the physiological and behavioral facts that underlie the neural processes of human vision related to our computational modeling. In the following three sections, we lay down the background that explains the mathematical tools we use in modeling human vision processes. In Section 3, we introduce the conformal camera and discuss the image projective transformations. We end this section with the construction of the group of image projective transformations in the conformal camera. In Section 4, we review the geometry underlying the conformal camera and demonstrate that the fundamental properties of this geometry should be uniquely useful in the early- and intermediate-level vision computational aspects of natural scene understanding. In the last of these three sections, Section 5, we show that the conformal camera possesses its own harmonic analysis—projective Fourier analysis—which gives an efficient image representation well adapted to both the retinal image projective transformations and the retinotopy of the brain's visual pathways. Finally, in Section 6, we discuss some implementation issues when working with the discrete PFT. In particular, the binocular system with head, eyes, and visual cortex numerically integrated by PFT is discussed. Further, using this integrated binocular system, we model perisaccadic perception, including the perisaccadic mislocalizations observed in psychophysical laboratory experiments. This perisaccadic mislocalization, in the form of perceptual space compression around the saccade target, is simulated in the model by the standard shift property of the Fourier transform. Also, future directions in advancing our modeling and its implementation are discussed. The paper is summarized in the last section.
The research program presented here advances our mathematical modeling intended for computational vision, including visual neuroscience and machine vision systems. It is guided by a strategy important in contemporary neurocomputing research: linking known anatomical and physiological details with efficient computational modeling and engineering designs should be vital not only to the emerging field of neural engineering but also to interpreting relevant neurophysiological data.

2 Visual Neuroscience Background

2.1 Visual Perception is a Creative Process

When light reflected from objects in the 3D world impinges upon the retina, it activates the neuronal pathways, beginning with phototransduction by about 125 million photoreceptors. Next, the visual information passes through the multilayered circuitry of the retina, where substantial processing takes place. The only recently emerging picture [20] of retinal processing tells us that more than a dozen distinct visual recordings of the retinal image are extracted. For example, one recording emphasizes the boundaries between objects while another carries information about movement in specific directions. The result is that more than a dozen of the most essential features of the original retinal image are extracted in parallel and sent to the brain as a train of spikes along about 1.5 million axons of ganglion cells to more than 30 association cortex areas containing about 30 billion neurons, where the details—depth, texture, color, form, motion, etc.—are added and integrated into a coherent view of the 3D world.

This integration is entirely dependent upon visual experience; almost all higher-order features of vision are influenced by expectations based on past experience. Although such influences occasionally allow the brain to be fooled into misperception, as is the case with the optical illusion in Fig. 1, they also give us the ability to see and respond to the visual world quickly.

Figure 1: This illusion created by Adelson illustrates how perception may reflect the complex properties of the environment.

We see from this very brief description that visual perception is a creative process and, for this one reason alone, its quantitative modeling must be extremely difficult. Therefore, we try to develop a model that captures only some of the very first computational aspects of visual perception that take place in the first seconds after we open our eyes in daylight. Even with this limited goal, we find that those aspects are controlled by extremely sophisticated neural processes that involve nearly every level of the brain.

2.2 Early Visual Pathways

When humans open their eyes in daylight and direct their gaze to attend to a scene, they see with the highest clarity only the central part, of about 2 deg of visual angle. This region is projected onto the central fovea, where its image is sampled by the hexagonal mosaic of photoreceptors consisting mainly of cone cells, the color-selective type of photoreceptor responsible for sharp daylight vision. Visual acuity decreases rapidly away from the fovea because the distance between cones increases with eccentricity as they are outnumbered by rod cells, the photoreceptors for low-acuity, black-and-white night vision. Moreover, there is a gradual loss of hexagonal regularity of the photoreceptor mosaic. For example, at 2.5 deg radius, which corresponds to the most visually useful region of the retina, acuity drops by 50%.
The distribution of axons in the optic nerve, which carries the retinal processing output to the brain, is precisely organized, but varies along the visual pathways. One aspect of this organization, or the retinotopy, is that axons corresponding to neighboring places in the retina are positioned closely in the nerve bundle, with a notable exception along the vertical meridian. This exception stems from the fact that the output of each eye splits along the retinal vertical meridian when the axons originating from the nasal half of the retina cross at the optic chiasm to the contralateral brain hemisphere and join the temporal half, which remains on the same side as its eye of origin. This splitting and crossing reorganizes the retinal outputs so that the left-hemisphere destinations receive information from the right visual field, and the right-hemisphere destinations receive information from the left visual field. According to the split theory [39, 43], which provides a greater understanding of vision's cognitive processes than the bilateral theory of overlapping projections, there is a sharp foveal split along the vertical meridian of the hemispherical cortical projections. Although it is crucial for synthesizing a 3D representation from the binocular disparities in the pair of 2D retinal images, it presents a challenge in modeling retino-cortical image processing across visual hemifields.

2.3 Beyond Early Visual Pathways: Visuo-Saccadic Perception

One of the most important functions of any nervous system is sensing the external environment and responding in a way that maximizes immediate survival chances. For this reason, perception and action have evolved in mammals to support each other's functions. This functional link between visual perception and oculomotor action is well demonstrated in primates when they execute the eye-scanning movements (saccades) employed to overcome the eye's acuity limitation in building up scene understanding (see Fig. 2).

The saccadic eye movement is the most common bodily movement, since we make about three saccades per second at the eyeball's maximum speed of 700 deg/sec. The eyes remain relatively still between consecutive saccades for about 180-320 ms, depending on the task performed, while undergoing tremors, drifts, and microsaccades—miniature, random eye movements important for the proper functioning of the eyes [44]. During this time period, the image is processed by the retinal circuitry and sent mainly to the visual cortex (starting with the primary visual cortex, or V1, and reaching higher cortical areas, including cognitive areas), with a minor part going to oculomotor midbrain areas. The sequence of saccades, fixations, and, often, smooth-pursuit eye movements for tracking a slowly moving small object in the scene is called the scanpath, first studied in [71]. In Fig. 2, (b) shows a progressively blurred image from (a), simulating the progressive loss of acuity with eccentricity. In Fig. 2 (c) we depict the scanpath that the eyes might actually take to build up an understanding of the scene.

Although saccades are the simplest of bodily movements, they are controlled by a widespread neural network that involves nearly every level of the brain.
Most prominently, it includes the superior colliculus (SC) of the midbrain for representing possible saccade targets, and the parietal eye field (PEF) and frontal eye field (FEF) in the parietal and frontal lobes of the neocortex (which obtain inputs from many visual cortical areas) for assisting the SC in the control of involuntary (PEF) and voluntary (FEF) saccades. They also project to the simple neural circuits in the brainstem reticular formation of the midbrain that ensure the saccade's outstanding speed and precision.

Figure 2: (a) San Diego skyline and harbor. (b) Progressively blurred image from (a) simulating the progressive loss of retinal acuity with eccentricity. The circle C1 encloses the part of the scene projected onto the high-acuity fovea of a 2 deg diameter. The circle C2 encloses the part projected onto the visually useful foveal region of a 5 deg diameter. (c) A scanning path the eyes may take to build up the scene understanding. Adapted from [2].

Remarkably, many of the neural processes involved in saccade generation and control are amenable to precise quantitative studies, such that even questions regarding the operation of the whole structure can be addressed by building on the existing models [21]. This not only carries immense clinical significance [15], but also forms an essential preliminary stage in building our understanding of human vision, knowledge that will eventually be transferred to the emerging field of neural engineering.

Nevertheless, some neural processes of the visuo-saccadic system remain virtually unknown. Visual sensitivity is markedly reduced during saccadic movements, as we do not see the moving images on the retinas. This barely understood neural process is known as saccadic suppression. There is accumulating evidence that viewers integrate information poorly across fixations during tasks such as reading, visual search, and scene perception [50]. It means that, three times per second, there are instant large changes in the retinal images with almost no information consciously carried between images. Furthermore, because the next saccade target selection for voluntary saccades takes place in the higher cortical areas involving cognitive processes [22], the time needed for the oculomotor system to plan and execute the saccadic eye movement can be as long as 150 ms. Therefore, it is critical that visual information is efficiently acquired during each fixation period of about 300 ms, without repeating much of the whole process at each fixation, since that would require too many computational resources.

However, visual constancy—the fact that we are not aware of any discontinuity in scene perception when executing the scanpath—is not perfect. About 50 ms before the onset of a saccade, during the saccadic movement (~30 ms), and for about 50 ms after the saccade, perceptual space is transiently compressed around the saccade target [34, 53], a phenomenon called perisaccadic mislocalization. We continue this discussion in Section 6.5, where we present our modeling of perisaccadic perception based on the projective Fourier transform of the conformal camera.

3 The Conformal Camera

We model the imaging functions of the human eyes with the conformal camera, the name of which will be explained later. The camera has many remarkable properties, the first following directly from its construction: the group of image projective transformations in the conformal camera is generated internally and has the 'minimal' property, as explained in Fig. 3.
In the remaining pages of this article, the other properties will be carefully examined in their relation to many computational aspects of visual perception.

Figure 3: (a) Image projective transformations are generated by iterations of transformations covering translations 'h' and rotations 'k' of planar objects in the scene. (b) The 2D section of the conformal camera further explains how image projective transformations are generated and how the projective degrees of freedom are reduced in the camera; one image projective transformation in the conformal camera corresponds to different planar-object translations and rotations in the 3D world.

In the conformal camera, the retina is represented by the image plane $x_2 = 1$ with complex coordinates $x_3 + ix_1$, on which a 3D scene is projected under the mapping

$$j(x_1, x_2, x_3) = (x_3 + ix_1)/x_2. \quad (1)$$

The implicit assumption $x_2 \neq 0$ will be removed later. Next, we give the precise form of the 'k' and 'h' image transformations introduced in Fig. 3.

3.1 Basic Image Transformations

The image projective transformations in the conformal camera are generated by the following two transformations: (1) an image is projected by $(j|_{S^2_{(0,1,0)}})^{-1}$ onto the unit sphere $S^2_{(0,1,0)}$ centered at $(0, 1, 0)$, then the sphere is rotated and the (rotated) image is projected by $j$ back to the image plane; (2) the image is translated out of the image plane and then projected by $j$ back to the image plane. Transformations (1) and (2) result in the 'k' and 'h' mappings in Fig. 3, respectively. They are explicitly given as follows.

1. k transformations: $SU(2) = \left\{ \begin{pmatrix} \alpha & \beta \\ -\overline{\beta} & \overline{\alpha} \end{pmatrix} \right\}$ is the maximal compact subgroup of $SL(2, \mathbb{C})$, the group of $2 \times 2$ complex matrices of determinant 1. We let the group $SO(3)$ of three-dimensional rotations act on the sphere $S^2_{(0,1,0)}$ by rotating it about $(0, 1, 0)$. Furthermore, we parametrize $SO(3)$ by the Euler angles $(\psi, \phi, \psi')$, where $\psi$ is the rotation about the $x_2$-axis, followed by the rotation $\phi$ about the $y_3$-axis, which is parallel to the $x_3$-axis and passes through $(0, 1, 0)$, and finally by the rotation $\psi'$ about the rotated $x_2$-axis. Then, to each $R(\psi, \phi, \psi')$ in $SO(3)$ there correspond two elements of $SU(2)$,

$$k(\psi, \phi, \psi') = \pm \begin{pmatrix} e^{i(\psi+\psi')/2}\cos\frac{\phi}{2} & ie^{i(\psi-\psi')/2}\sin\frac{\phi}{2} \\ ie^{-i(\psi-\psi')/2}\sin\frac{\phi}{2} & e^{-i(\psi+\psi')/2}\cos\frac{\phi}{2} \end{pmatrix}, \quad (2)$$

such that $j \circ R(\psi, \phi, \psi') \circ (j|_{S^2_{(0,1,0)}})^{-1}(z) = k \cdot z$ is given by the linear-fractional mapping

$$k(\psi, \phi, \psi') \cdot z = \frac{\left(e^{-i(\psi+\psi')/2}\cos\frac{\phi}{2}\right) z + ie^{i(\psi-\psi')/2}\sin\frac{\phi}{2}}{\left(ie^{i(\psi-\psi')/2}\sin\frac{\phi}{2}\right) z + e^{i(\psi+\psi')/2}\cos\frac{\phi}{2}}. \quad (3)$$

2. h transformations: Similarly, for each translation vector $\vec{b} = (b_1, b_2, b_3)$ with $b_2 \neq -1$, acting on the image plane by $T_{\vec{b}}(x) = x + \vec{b}$, there are two elements of $SL(2, \mathbb{C})$,

$$h(b_1, b_2, b_3) = \pm \begin{pmatrix} (1+b_2)^{1/2} & 0 \\ (b_3+ib_1)(1+b_2)^{-1/2} & (1+b_2)^{-1/2} \end{pmatrix}, \quad (4)$$

such that $j \circ T_{\vec{b}} \circ (j|_{x_2=1})^{-1}(z) = h \cdot z$ is given by the corresponding linear-fractional mapping, acting as before,

$$h(b_1, b_2, b_3) \cdot z = \frac{(1+b_2)^{-1/2}\, z + (b_3+ib_1)(1+b_2)^{-1/2}}{(1+b_2)^{1/2}}. \quad (5)$$

Now, if $f(z)$ is an image intensity function and $g$ is either a k or an h mapping, the corresponding image transformation is $f(g^{-1} \cdot z)$.

Both the $k \cdot z$ and $h \cdot z$ mappings have the form of special linear-fractional transformations

$$g \cdot z = \frac{\alpha z + \beta}{\gamma z + \delta}; \quad \alpha\delta - \gamma\beta = 1.$$

These mappings are conformal; that is, they preserve the oriented angles between the tangent vectors $z'_k(t_0)$ of any two curves $z_k(t)$ ($k = 1, 2$) intersecting at the point $q = z(t_0)$. In fact,

$$\frac{d}{dt}\left.\left(\frac{\alpha z_k(t) + \beta}{\gamma z_k(t) + \delta}\right)\right|_{t=t_0} = \frac{z'_k(t_0)}{(\gamma q + \delta)^2} = \frac{e^{i\chi(q)}\, z'_k(t_0)}{|\gamma q + \delta|^2}; \quad k = 1, 2, \quad (6)$$

and both vectors $z'_1(t_0)$ and $z'_2(t_0)$ are rotated by the same angle $\chi(q)$.
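To make these transformations concrete, the following minimal numerical sketch (assuming Python with NumPy; the helper names j, mobius, and h_matrix are ours and not part of any published implementation) checks the h action (5) against its geometric definition—translate a scene point on the plane $x_2 = 1$ and reproject—and verifies that composing two linear-fractional actions agrees with the matrix product, which is why finite iterations of k's and h's close into a group (Section 3.2).

```python
import numpy as np

def j(x):
    """Projection (1): x = (x1, x2, x3) with x2 != 0 goes to (x3 + i*x1)/x2."""
    x1, x2, x3 = x
    return (x3 + 1j * x1) / x2

def mobius(g, z):
    """Action (8) of g = [[a, b], [c, d]] in SL(2, C): g . z = (d z + c)/(b z + a)."""
    (a, b), (c, d) = g
    return (d * z + c) / (b * z + a)

def h_matrix(b1, b2, b3):
    """Element (4) of SL(2, C) implementing translation by (b1, b2, b3), 1 + b2 > 0."""
    s = np.sqrt(1.0 + b2)
    return np.array([[s, 0.0], [(b3 + 1j * b1) / s, 1.0 / s]])

z = 0.7 - 0.3j                         # image point z = x3 + i*x1
b1, b2, b3 = 0.2, 0.5, -0.4            # translation vector

# Translating the scene point (x1, 1, x3) and reprojecting with j ...
x = np.array([z.imag, 1.0, z.real])
lhs = j(x + np.array([b1, b2, b3]))
# ... agrees with the linear-fractional action (5) of h(b1, b2, b3):
assert np.isclose(lhs, mobius(h_matrix(b1, b2, b3), z))

# Composition of actions is the action of the matrix product, so finite
# iterations of k's and h's stay inside SL(2, C):
k = np.array([[np.cos(0.4), 1j * np.sin(0.4)],
              [1j * np.sin(0.4), np.cos(0.4)]])   # k(0, 0.8, 0) in SU(2), eq. (2)
h = h_matrix(b1, b2, b3)
assert np.isclose(mobius(h @ k, z), mobius(h, mobius(k, z)))
```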
3.2 The Group of Image Projective Transformations

3.2.1 The PSL(2, C) Group

The group of image transformations in the conformal camera is generated by all finite iterations of the k and h mappings. To derive this group, we recall that $k \in SU(2)$ and note that $h \in AN \subset SL(2, \mathbb{C})$ if $1 + b_2 > 0$ and $h \in \varepsilon AN \subset SL(2, \mathbb{C})$ if $1 + b_2 < 0$, where

$$A = \left\{ \begin{pmatrix} \rho & 0 \\ 0 & \rho^{-1} \end{pmatrix} \right\}, \quad N = \left\{ \begin{pmatrix} 1 & 0 \\ \xi & 1 \end{pmatrix} \right\}, \quad \varepsilon = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}. \quad (7)$$

Now, it follows from the polar decomposition $SL(2, \mathbb{C}) = SU(2)\,A\,SU(2)$ that all these finite iterations result in the group $SL(2, \mathbb{C})$ acting by the linear-fractional mappings

$$SL(2, \mathbb{C}) \ni \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z = \frac{dz + c}{bz + a}; \quad z = x_3 + ix_1 \equiv (x_1, 1, x_3). \quad (8)$$

Because $\pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ have the same action, we need to identify matrices in $SL(2, \mathbb{C})$ that differ in sign. The result is the quotient group $PSL(2, \mathbb{C}) = SL(2, \mathbb{C})/\{\pm \mathrm{Id}\}$, where $\mathrm{Id}$ is the identity matrix, and the action (8) establishes a group isomorphism between the linear-fractional mappings and $PSL(2, \mathbb{C})$. Thus,

$$PSL(2, \mathbb{C}) \ni g = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \longmapsto f(g^{-1} \cdot z) = f\left(\frac{az - c}{-bz + d}\right) \quad (9)$$

gives the image projective transformations of the intensity function $f(z)$.

3.2.2 Conformality

As we showed in (6), the mappings in (8) are conformal. Because of this property, the camera is called 'conformal'. Although the conformal part of an image projective transformation can be removed at almost no computational cost, leaving only a perspective transformation of the image (see [61, 62]), conformality provides an advantage in imaging because conformal mappings rotate and dilate the image's infinitesimal neighborhoods and, therefore, locally preserve the image 'pixels'.

To complete the description of the conformal camera, we need to address some implicit assumptions, such as the restriction $-bz + d \neq 0$ in (9), which we have frequently made in this section.

4 Geometry of the Conformal Camera

In the homogeneous-coordinate framework of projective geometry [6], the conformal camera is embedded into the complex plane

$$\mathbb{C}^2 = \left\{ \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \,\middle|\, z_1 = x_2 + iy,\ z_2 = x_3 + ix_1 \right\}.$$

In this embedding, the 'slopes' $\xi$ of the complex lines $z_2 = \xi z_1$ are numerically identified with the points of the extended image plane $\widehat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, where $\infty$ corresponds to the line $z_1 = 0$. We note that if $x_2 \neq 0$ and $y = 0$, the slope $\xi$ corresponds to the point $x_3 + ix_1$ at which the ray (line) in $\mathbb{R}^3$ passing through the origin intersects the image plane of the conformal camera. Now, the standard action of the group $SL(2, \mathbb{C})$ on nonzero column vectors of $\mathbb{C}^2$,

$$\begin{pmatrix} z'_1 \\ z'_2 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} az_1 + bz_2 \\ cz_1 + dz_2 \end{pmatrix},$$

implies that the slope $\xi = z_2/z_1$ is mapped to the slope

$$\xi' = \frac{z'_2}{z'_1} = \frac{cz_1 + dz_2}{az_1 + bz_2} = \frac{c + d\xi}{a + b\xi},$$

agreeing with the linear-fractional mappings in (8). However, the action must be extended to include the line $z_1 = 0$ of 'slope' $\infty$ as follows:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \infty = d/b, \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot (-a/b) = \infty.$$

The stereographic projection $\sigma = j|_{S^2_{(0,1,0)}}$ (with $j$ as in (1)) maps $S^2_{(0,1,0)}$ bijectively onto $\widehat{\mathbb{C}}$, and $\sigma(0, 0, 0) = \infty$ gives a concrete meaning to the point $\infty$, such that it can be treated as any other point of $\widehat{\mathbb{C}}$. Thus, the geometry of the image plane $\widehat{\mathbb{C}}$ of the conformal camera, with the image projective transformations given by the group $PSL(2, \mathbb{C})$ acting by linear-fractional transformations, can be dually described as follows:
1. $\widehat{\mathbb{C}}$ is the complex projective line, i.e., $\widehat{\mathbb{C}} \cong P^1(\mathbb{C})$, where

$$P^1(\mathbb{C}) = \{\text{complex lines in } \mathbb{C}^2 \text{ through the origin}\},$$

with the group of projective transformations $PSL(2, \mathbb{C})$. Thus, the image projective transformations acting on the points of the extended image plane (or simply, the image plane) of the conformal camera can be identified with the projective geometry (containing Euclidean geometry as a sub-geometry) of the one-dimensional complex line [6].

2. $\widehat{\mathbb{C}}$ is the Riemann sphere, since under the stereographic projection $\sigma = j|_{S^2_{(0,1,0)}}$ we have the isomorphism $\widehat{\mathbb{C}} \cong S^2_{(0,1,0)}$. The group $PSL(2, \mathbb{C})$ acting on $\widehat{\mathbb{C}}$ consists of the bijective meromorphic mappings of $\widehat{\mathbb{C}}$ [32]. Thus, it is the group of holomorphic automorphisms of the Riemann sphere that preserve the intrinsic geometry imposed by the complex structure, known as Möbius geometry [27] or inversive geometry [13].

What we have just described shows the following fundamental property: the projective geometry underlying the conformal camera, also called Möbius or inversive geometry, and the holomorphic complex structure that provides the framework for the development of complex numerical analysis are, in fact, two faces—one 'geometric' and the other 'numerical'—of the same coin. We stress that the real projective geometry underlying the pinhole camera, usually employed in computer vision [49, 55], does not possess this fundamental property, which sets our modeling of primate visual perception apart from other approaches.

4.1 The Conformal Camera and Visual Perception

The image plane of the conformal camera does not admit a distance that is invariant under the image projective (that is, linear-fractional) transformations. Therefore, the geometry of the conformal camera does not possess a Riemannian metric; for instance, there is no curvature measure. As is customary in complex projective (Möbius or inversive) geometry, we consider a line as a circle passing through the point $\infty$. Then, the fundamental property of this geometry can be expressed as follows: linear-fractional mappings take circles to circles. Thus, circles can play the role of geodesics. Moreover, each circle carries a signature of curvature—the inverse of its radius. We showed before that linear-fractional mappings are conformal; we add here, for completeness, that the stereographic projection $\sigma = j|_{S^2_{(0,1,0)}}$ is also conformal and maps circles in the sphere $S^2_{(0,1,0)}$ onto circles in $\widehat{\mathbb{C}}$. In conclusion, circles play a crucial role in the geometry of the conformal camera, and this should be reflected in the psychological and computational aspects of natural scene understanding if this camera is relevant to modeling primate visual perception.
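The circle-preservation property is easy to check numerically through the cross-ratio, which is real precisely when four points are concyclic (or collinear) and is invariant under linear-fractional mappings. A small sketch, assuming Python with NumPy (the helper names are ours):

```python
import numpy as np

def mobius(g, z):
    """Action (8): g = [[a, b], [c, d]] sends z to (d z + c)/(b z + a)."""
    (a, b), (c, d) = g
    return (d * z + c) / (b * z + a)

def cross_ratio(z1, z2, z3, z4):
    """Real if and only if the four points lie on one circle (or line)."""
    return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))

# four points on the circle |z - (1 + i)| = 2
t = np.array([0.3, 1.1, 2.7, 4.0])
zs = (1 + 1j) + 2.0 * np.exp(1j * t)
assert abs(cross_ratio(*zs).imag) < 1e-12

# an arbitrary element of SL(2, C): det g = a*d - b*c = 1
a, b, c = 1.2, 0.5 - 0.1j, 0.3j
g = np.array([[a, b], [c, (1 + b * c) / a]])

# the images remain concyclic: the cross-ratio is unchanged, hence still real
ws = mobius(g, zs)
assert abs(cross_ratio(*ws).imag) < 1e-9
```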
Neurophysiological experiments demonstrate that the retina performs a filtering of the impinged images that extracts local contrast spatially and temporally. For instance, center-surround cells at the retinal processing stage are triggered by local spatial changes in intensity, referred to as edges or contours. This filtering is enhanced in the primary visual cortex, the first cortical area receiving, via the LGN, the retinal output, and itself a case study in the dense packing of overlapping visual submodalities: motion, orientation, frequency (color), and ocular dominance (depth). In psychological tests, humans easily detect a significant change in spatial intensity (low-level vision) and effortlessly and unambiguously group this usually fragmented visual information (contours of occluded objects, for example) into coherent, global shapes (intermediate-level vision). Considering its computational complexity, this is one of the most difficult problems that the primate visual system has to solve [65].

The Gestalt phenomenology and quantitative psychological measurements established the rules, summarized in the ideas of good continuation [35, 68] and the association field [19], that determine interactions between fragmented edges such that they extend along continuous contours, joining them in the way they would normally be grouped together to faithfully represent a scene. Evidence accumulated in psychological and physiological studies suggests that the human visual system utilizes a local grouping process (association field) with two very simple rules—collinearity and co-circularity—with underlying scale-invariant statistics for both geometric arrangements in natural scenes. These rules were confirmed in [56, 16] by statistical analysis of natural scenes. Two basic intermediate-level descriptors that the brain employs in grouping elements into global objects are the medial axis transformation [10], or symmetry structure [40, 41], and the curvature extrema [3, 30]. In fact, the medial axis, which the visual system extracts as a skeletal (intermediate-level) representation of objects [36], can be defined as the set of the centers of maximal circles inscribed inside the contour. The curvatures at the corresponding points of a contour are given by the inverse radii of these circles.

From the above discussion we see that, on one hand, co-circularity and scale invariance emerge as the most basic concepts used by intermediate-level vision in solving the difficult problems of grouping local elements into the individual objects of natural scenes. On the other hand, the non-metric projective geometry of the conformal camera that models the eye's imaging functions can be entirely constructed from circles, such that co-circularity is preserved by projective transformations. Thus, it seems that the conformal camera should be very useful in modeling the eye's imaging functions related to lower- and intermediate-level natural vision. Other characteristics of the conformal camera that are uniquely useful in modeling primate visual perception are discussed in the remaining part of this article. Next, we briefly review the unity of geometry and numerical methods by showing that the conformal camera has its own projective Fourier transform (PFT).

5 Projective Fourier Analysis

Projective Fourier analysis has been constructed by restricting geometric Fourier analysis of SL(2, C)—a direction in the representation theory of the semisimple Lie groups [33]—to the image plane of the conformal camera (see Section 5.1 in [62]). The resulting projective Fourier transform (PFT) of a given image intensity function $f(z) \in L^2(\mathbb{C})$ is

$$\widehat{f}(s, k) = \frac{i}{2}\int f(z)\,|z|^{-is-1}\left(\frac{z}{|z|}\right)^{-k} dz\,d\bar{z}, \quad (10)$$

where $(s, k) \in \mathbb{R} \times \mathbb{Z}$ and, if $z = x_3 + ix_1$, then $\frac{i}{2}dz\,d\bar{z} = dx_3\,dx_1$ is the Euclidean measure on the image plane. In this work, we consider only the noncompact picture of PFT; for a complete mathematical account that also includes the compact picture, we refer to [62]. The noncompact and compact pictures in the case of the Euclidean group correspond to the classical and spherical Fourier analyses, respectively (Section 3 in [61]). In the next remark we justify the name 'projective Fourier transform'; for a comprehensive account, we refer to [62].
Remark 1. The functions $\Pi_{s,k}(z) = |z|^{is}\left(\frac{z}{|z|}\right)^k$, $s \in \mathbb{R}$, $k \in \mathbb{Z}$, are all the one-dimensional unitary representations of the Borel subgroup $B = MAN$ of $SL(2, \mathbb{C})$, and they play in (10) the role that the complex exponentials play in the classical Fourier transform. These one-dimensional representations are all the finite-dimensional unitary representations of the Borel subgroup $B$, as opposed to the nontrivial unitary representations of $SL(2, \mathbb{C})$, which are all infinite-dimensional. Furthermore, the group $B$ 'exhausts' the projective group $SL(2, \mathbb{C})$ by the Gauss decomposition $SL(2, \mathbb{C}) \doteq NB$, where '$\doteq$' means that the equality holds up to a lower-dimensional subset, that is, almost everywhere, and $N$ in (7) represents Euclidean translations.

In log-polar coordinates $(u, \theta)$ given by $\ln re^{i\theta} = \ln r + i\theta = u + i\theta$, $\widehat{f}(s, k)$ has the form of the standard Fourier integral

$$\widehat{f}(s, k) = \int\!\!\int f(e^{u+i\theta})\,e^u e^{-i(us+\theta k)}\,du\,d\theta, \quad (11)$$

where we used $\frac{i}{2}dz\,d\bar{z} = e^{2u}du\,d\theta$. We see that a function $f$ that is integrable on $\mathbb{C}^* = \mathbb{C}\setminus\{0\}$ has finite PFT,

$$\left|\widehat{f}(s, k)\right| \leq \int_0^{2\pi}\!\!\int_{-\infty}^{u_1} f(e^{u+i\theta})\,e^u\,du\,d\theta = \int_0^{2\pi}\!\!\int_0^{r_1} f(re^{i\theta})\,dr\,d\theta < \infty. \quad (12)$$

Therefore, this $f$ can be extended to $\mathbb{C}$ by $f(0) = 0$. Thus, in spite of the logarithmic singularity of log-polar coordinates, the projective Fourier transform of integrable functions on $\mathbb{C}$ is finite. This observation will be crucial when we discretize the PFT in the next section. Inverting (11), which is done in the $(u, \theta)$-space, we get

$$e^u f(u, \theta) = \frac{1}{(2\pi)^2}\sum_{k=-\infty}^{\infty}\int \widehat{f}(s, k)\,e^{i(us+\theta k)}\,ds, \quad (13)$$

where $f(u, \theta) = f(e^{u+i\theta})$. We stress that although $f(e^{u+i\theta})$ and $f(u, \theta)$ are numerically equal, they are given on different spaces; $f(e^{u+i\theta})$ lives on the image plane in polar coordinates and $f(u, \theta)$ on the space defined by the rectangular $(u, \theta)$-coordinates. Finally, by expressing (13) in the $z$-variable, we obtain the inverse projective Fourier transform

$$f(z) = \frac{1}{(2\pi)^2}\sum_{k=-\infty}^{\infty}\int \widehat{f}(s, k)\,|z|^{is-1}\left(\frac{z}{|z|}\right)^k ds. \quad (14)$$

5.1 Discrete Projective Fourier Transform

To discretize the PFT we use the fact that $\widehat{f}(s, k)$ is finite for an integrable function $f$; see (12). By removing a disk $|z| \leq r_a$, we can assume that the support of $f(u, \theta)$ is contained within $(\ln r_a, \ln r_b) \times [0, 2\pi)$. We approximate the integral in (11) by a double Riemann sum,

$$\widehat{f}(2\pi m/T, n) \approx \frac{2\pi T}{NM}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} e^{u_k} f(e^{u_k}e^{i\theta_l})\,e^{-2\pi i(mk/M + nl/N)},$$

with the $M \times N$ partition points

$$(u_k, \theta_l) = (\ln r_a + kT/M,\ 2\pi l/N); \quad 0 \leq k \leq M-1,\ 0 \leq l \leq N-1,\ T = \ln(r_b/r_a). \quad (15)$$

Then, introducing

$$f_{k,l} = (2\pi T/MN)f(e^{u_k}e^{i\theta_l}) \quad \text{and} \quad \mathbf{f}_{k,l} = (2\pi T/MN)f(u_k, \theta_l) \quad (16)$$

and defining $\widehat{f}_{m,n}$ by

$$\widehat{f}_{m,n} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f_{k,l}\,e^{u_k}\, e^{-i2\pi mk/M} e^{-i2\pi nl/N}, \quad (17)$$

we obtain

$$\mathbf{f}_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \widehat{f}_{m,n}\,e^{-u_k}\, e^{i2\pi mk/M} e^{i2\pi nl/N}. \quad (18)$$

We note that $\widehat{f}_{m,n} \approx \widehat{f}(2\pi m/T, n)$ and refer to [28] for a discussion of the numerical aspects of the approximation. Both expressions (17) and (18) can be computed efficiently by FFT algorithms, since the exponents are taken at equidistant points. See the simulation for a bar pattern in Fig. 4.

Figure 4: Exp-polar sampling of a bar pattern on the retina (the distance between circles, partially displayed in the first quadrant, changes exponentially) is shown on the left. The bar pattern in the cortical space, rendered by the inverse DPFT computed with FFT, is shown on the right. The cortical uniform sampling grid, obtained by applying the complex logarithm to the exp-polar grid, is shown only in the upper left corner.
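The claim that (17) and (18) reduce to FFTs can be made explicit: the forward DPFT is a standard 2D FFT of the $e^{u_k}$-weighted samples, and the inverse weights the inverse FFT by $e^{-u_k}$. A minimal round-trip sketch, assuming Python with NumPy (the grid values here are arbitrary; only the weighting matters):

```python
import numpy as np

M, N = 37, 64                                  # rings x sectors
r_a, delta = 2.427, 0.098                      # inner radius and radial spacing
u = np.log(r_a) + delta * np.arange(M)         # u_k = ln r_a + k*T/M

rng = np.random.default_rng(0)
f = rng.random((M, N))                         # test samples f_{k,l} on the grid

# forward DPFT, eq. (17): 2D FFT of the e^{u_k}-weighted samples
f_hat = np.fft.fft2(f * np.exp(u)[:, None])

# inverse DPFT, eq. (18): e^{-u_k}-weighted inverse 2D FFT
f_back = np.exp(-u)[:, None] * np.fft.ifft2(f_hat)

assert np.allclose(f_back.real, f)             # exact round trip, up to float error
```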
Finally, on introducing $z_{k,l} = e^{u_k+i\theta_l}$ into (17) and (18), we arrive at the $(M, N)$-point discrete projective Fourier transform (DPFT) and its inverse:

$$\widehat{f}_{m,n} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f_{k,l}\left(\frac{z_{k,l}}{|z_{k,l}|}\right)^{-n}|z_{k,l}|^{-i2\pi m/T+1} \quad (19)$$

and

$$\mathbf{f}_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \widehat{f}_{m,n}\left(\frac{z_{k,l}}{|z_{k,l}|}\right)^{n}|z_{k,l}|^{i2\pi m/T-1}, \quad (20)$$

now with $f_{k,l} = (2\pi T/MN)f(z_{k,l})$. The projectively adapted characteristics of the discrete projective Fourier analysis can be expressed as follows:

$$f'_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \widehat{f}_{m,n}\left(\frac{z'_{k,l}}{|z'_{k,l}|}\right)^{n}|z'_{k,l}|^{i2\pi m/T-1}, \quad (21)$$

where $z'_{k,l} = g^{-1}\cdot z_{k,l}$, $g \in SL(2, \mathbb{C})$, and $f'_{k,l} = (2\pi T/MN)f(z'_{k,l})$.

Although the projective characteristics must be derived in the $z$-coordinates, in practical image processing (21) should be expressed in log-polar coordinates to be fast computable by FFT. To this end, let $(u'_{m,n}, \theta'_{m,n})$ denote the log-polar coordinates of $z'_{m,n} = e^{u'_{m,n}}e^{i\theta'_{m,n}}$. In these coordinates, (21) is given by the following expression (see [61, 62] for details):

$$f'_{m,n} = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} \widehat{f}_{k,l}\,e^{-u'_{m,n}}e^{i2\pi u'_{m,n}k/T}e^{i\theta'_{m,n}l}.$$

Thus, we can render image projective transformations in terms of the projective Fourier transform of the original image only.

6 DPFT in Computational Vision

We discussed before the relevance of the conformal camera to the intermediate-level vision task of grouping image elements into individual objects in natural scenes. Here we want to discuss the relevance of the data model of image representation based on projective Fourier analysis to image processing in computational vision, including visual neuroscience and biologically motivated machine vision systems.

6.1 Modeling the Retinotopy

The mappings $w = \ln(z \pm a) - \ln a$, with $a > 0$ and $\pm a$ indicating, for the different signs, the left or right brain hemisphere, are accepted approximations of the topographic structure of the primate primary visual cortex (V1) [54], where the parameter $a$ removes the singularity of the logarithm. However, the discrete projective Fourier transform (DPFT) that provides the data model for retinal image representation can be efficiently computed by FFT only in the log-polar coordinates given by the complex logarithm $w = \ln z$, the mapping with the distinctive rotational and zoom symmetries $\ln(e^{i\theta}z) = \ln z + i\theta$ and $\ln(\rho z) = \ln z + \ln\rho$. Thus, we see that the Schwartz model of the retina comes with drastic consequences: it destroys the rotation and zoom symmetries. We also recall that the PFT in log-polar coordinates does not have a singularity at the origin; see (12).

The following facts support our modeling with DPFT. First, for small $|z| \ll a$, $\ln(z \pm a) - \ln a$ is approximately linear, while for large $|z| \gg a$ it is dominated by $\ln z$. Second, to construct the discrete sampling for DPFT, the image was regularized by removing a disc representing the fovea (see the previous section). Third, there is accumulated evidence pointing to the fact that the fovea and periphery have different functional roles in vision [51, 52, 70] and likely involve different image processes. Finally, by the split theory of hemispherical image representation, which we mentioned before, the foveal region has a discontinuity along the vertical meridian, with each half processed in a different brain hemisphere [39]. We note that the two hemispheres are connected by a massive bridge of 500 million neuronal axons called the corpus callosum.

We conclude this discussion with the following remarks: both our model and Schwartz's model in [54] (see Fig. 5), as well as all other similar models, are in fact fovea-less models [67]. Furthermore, since the fovea is explicitly removed in our modeling, we expect to extend the present model to include a foveal representation in the next stage of this modeling. In fact, the lack of the fovea in our modeling is one of the challenges stalling the implementation of the model. We continue this discussion in Section 6.5.1.

Figure 5: (a) The Schwartz model of the retina: the strip of width 2a is removed and the two half-maps of ln z are shifted to meet along the vertical meridian. (b) Our model: the fovea is removed and the retina is split along the vertical meridian, conforming to the split theory of the retino-cortical projection.
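The symmetry claim that distinguishes $w = \ln z$ from the Schwartz mapping can be checked in a few lines; a sketch assuming Python with NumPy (values chosen to avoid the branch cut of the complex logarithm):

```python
import numpy as np

z, a = 0.5 + 0.8j, 0.7
theta, rho = 0.6, 1.9

# w = ln z: image rotation and zoom become cortical translations
assert np.isclose(np.log(np.exp(1j * theta) * z), np.log(z) + 1j * theta)
assert np.isclose(np.log(rho * z), np.log(z) + np.log(rho))

# w = ln(z + a) - ln a (Schwartz model): the same identities fail
w = lambda s: np.log(s + a) - np.log(a)
print(abs(w(np.exp(1j * theta) * z) - (w(z) + 1j * theta)))   # nonzero
print(abs(w(rho * z) - (w(z) + np.log(rho))))                 # nonzero
```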
6.2 On Numerical Implementation of DPFT

The DPFT approximation was obtained using the rectangular sampling grid $(u_k, \theta_l)$ in (15), corresponding, under the mapping $w_{k,l} = u_k + i\theta_l \longmapsto z_{k,l} = e^{u_k+i\theta_l} = r_k e^{i\theta_l}$, to a nonuniform sampling grid with equal sectors,

$$\alpha = \theta_{l+1} - \theta_l = \frac{2\pi}{N}, \quad l = 0, 1, ..., N-1, \quad (22)$$

and with ring radii increasing exponentially,

$$\rho_k = r_{k+1} - r_k = e^{u_{k+1}} - e^{u_k} = e^{u_k}(e^\delta - 1) = r_k(e^\delta - 1), \quad k = 0, 1, ..., M-1, \quad (23)$$

where $\delta = u_{k+1} - u_k$. The radii $r_k = r_0 e^{k\delta}$ are given in terms of the spacing $\delta = T/M$ and $r_0 = r_a$, where $r_a$ is the radius of the disc that has been removed to regularize the logarithmic singularity; see (15).

Let us assume that we have been given a picture of size $A \times B$ in pixel units, displayed with $K$ dots per unit length (dpl). Then the physical dimensions, in the chosen unit of length, of a pixel and of the picture are $1/K \times 1/K$ and $A/K \times B/K$, respectively. Also, we assume that the origin of the retinal coordinates (the fixation) is at the picture's center. The central disc of radius $r_0$ represents the fovea, with uniformly distributed grid points, and the number of foveal pixels $N_f$ is given by $\pi r_0^2 = N_f/K^2$. This means that the fovea does not increase the resolution, which is related to the distance of the picture from the eye. The number of sectors is obtained from the condition $2\pi(r_0 + r_1)/2 \approx N(1/K)$, which gives $N = [2\pi r_0 K + \pi]$. Here $[a]$ is the closest integer to $a$. To get the number of rings $M$, we assume that $\rho_0 = r_0(e^\delta - 1) = 1/K$ and $r_b = r_M = r_0 e^{M\delta}$. We can take either $r_b = (1/K)\min(A, B)/2$ or $r_b = (1/K)\sqrt{A^2 + B^2}/2$. Thus, $\delta = \ln(1 + 1/(r_0 K))$ and $M = (1/\delta)\ln(r_b/r_0)$.

Example 2. We let $A \times B = 512 \times 512$ and $K = 4$ per mm, so the physical dimensions are $128 \times 128$ mm and $r_b = 128\sqrt{2}/2 = 90.5$. Furthermore, we let $N_f = 296$, so $r_0 = 2.427$ and $N = 64$. Finally, $\delta = \ln(10.7084/9.7084) \approx 0.09804$ and $M = [(1/0.09804)\ln(90.5/2.427)] = 37$. The sampling grid consists of the points, in polar coordinates,

$$(r_k + \rho_k/2,\ \theta_l + \pi/64) = (2.552\,e^{0.09804k},\ (2l+1)\pi/64), \quad k = 0, 1, ..., 36,\ l = 0, 1, ..., 63.$$

In this example, the number of pixels in the original image is 262,144, whereas the foveal (uniform sampling) and peripheral (log-polar sampling) representations of the image together contain only 2,664 pixels.

We stress again that $f_{k,l}$ and $\mathbf{f}_{k,l}$ are discretizations of the same image in different planes; $f_{k,l}$ are the image samples in the image plane sampled on the nonuniform grid $e^{u_k}e^{i\theta_l}$, while the inverse DPFT output (18) gives the image samples $\mathbf{f}_{k,l}$ on the uniform grid $(u_k, \theta_l)$, where $u_k = \ln r_k$.

In summary, a simple description of the imaging model based on DPFT is as follows: an image (analog or digital) of a scene impinged on the retina is sampled on a nonuniform exp-polar grid $\{r_k e^{i\theta_l}\}_{M\times N}$ that approximates the density distribution of the retinal ganglion cells, giving the set of pixels $\{f_{k,l}\}_{M\times N}$. In this grid, the radial spacing changes exponentially, $r_k = r_a e^{\delta k}$, $k = 1, 2, ..., M$, and the angular spacing is constant, $\theta_l = \alpha l$, $l = 1, 2, ..., N$. As shown in Example 2, this sampling results in about 100 times fewer pixels than in the original image. To render $\{f_{k,l}\}_{M\times N}$, the DPFT is formed and computed by FFT in the log-polar coordinates $(u_k, \theta_l)$ obtained by applying the complex logarithm, $\ln(r_a e^{\delta k}e^{i\alpha l}) = \ln r_a + \delta k + i\alpha l = u_k + i\theta_l$, resulting in the set $\{\widehat{f}_{k,l}\}_{M\times N}$. Next, the inverse DPFT is assembled and computed, again by FFT, this time giving the image samples $\mathbf{f}_{k,l} = f_{k,l}$ rendered in cortical (log-polar) coordinates $(u_k, \theta_l)$, as shown in the sketch below.
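The grid bookkeeping of this section is reproduced by the following short sketch (Python with NumPy assumed; grid_parameters is our name, not from the paper). Running it with the data of Example 2 returns $r_0 \approx 2.427$, $N = 64$, $\delta \approx 0.098$, and $M = 37$, so the exp-polar representation has $MN + N_f = 2{,}664$ pixels.

```python
import numpy as np

def grid_parameters(A, B, K, Nf):
    """Exp-polar grid parameters of Section 6.2 for an A x B picture
    at K dots per unit length with Nf uniformly sampled foveal pixels."""
    r0 = np.sqrt(Nf / np.pi) / K                 # foveal radius: pi r0^2 = Nf/K^2
    N = int(round(2 * np.pi * r0 * K + np.pi))   # sectors: N = [2 pi r0 K + pi]
    rb = np.hypot(A, B) / (2 * K)                # outer radius (corner choice)
    delta = np.log(1 + 1 / (r0 * K))             # from rho_0 = r0 (e^delta - 1) = 1/K
    M = int(round(np.log(rb / r0) / delta))      # rings: M = (1/delta) ln(rb/r0)
    return r0, rb, delta, N, M

r0, rb, delta, N, M = grid_parameters(512, 512, 4, 296)
print(r0, N, delta, M, M * N + 296)   # ~2.427, 64, ~0.098, 37, 2664
```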
6.3 Relation to Other Numerical Approaches

Of the numerical approaches to foveate (or space-variant) vision, involving, for example, the Fourier-Mellin transform or the log-polar Hough transform, the most closely related to our work are the results reported by Schwartz's group at Boston University. We note that the approximation of the retinotopy by a complex logarithm (see Section 6.1) was first proposed by Eric Schwartz in 1977. This group introduced the fast exponential chirp transform (FECT) [11] in their attempt to develop numerical algorithms for space-variant image processing. Basically, both the FECT and its inverse were obtained by a change of variables in both the spatial and frequency domains of the standard Fourier integrals. The discrete FECT was introduced somewhat ad hoc, without reference to the numerical aspects of the approximation. Moreover, some basic components of Fourier analysis, such as the underlying geometry or the Plancherel measure, were not considered. In comparison, the projective Fourier transform (PFT) provides an efficient image representation well adapted to the projective transformations produced in the conformal camera by the group SL(2, C) acting on the image plane by linear-fractional mappings. Significantly, PFT can be obtained by restricting geometric Fourier analysis of the Lie group SL(2, C) to the image plane of the conformal camera. Thus, the conformal camera comes with its own harmonic analysis. Moreover, PFT is computable by FFT in the log-polar coordinates given by a complex logarithm that approximates the retinotopy. It implies that PFT can integrate the head, eyes, and visual cortex into a single computational system. This aspect is discussed, with special attention to perisaccadic perception, in the remaining part of the paper. Another advantage of PFT is the complex (conformal) geometric analysis underlying the conformal camera. We discussed, in Section 4.1, the relation of this geometry to the intermediate-level vision problem of grouping local contours into the individual objects and background of natural scenes.

The other approaches to space-variant vision use geometric transformations, mainly based on a complex logarithmic function, between the nonuniform (retinal) sampling grid and the uniform (cortical) grid for the purpose of developing computer programs. These approaches can be classified into two different groups. The first group of problems deals with visualizing and classifying large information data sets. We give two examples for this first group. The first deals with the problem of mapping an information space to the image space for navigation through complex two-dimensional data sets when viewing small details and, at the same time, the general overview [12]. The second gives model-based image processing in mathematical morphology for qualifying/segmenting/quantifying the topology of spots in genomic microarray-based data [1]. The second group of problems is related to robotic vision.
We give only a few examples of such problems, which include tracking [7], navigation [5], detecting salient regions [57], and disparity estimation [42]. However, it seems that they share one common problem: the high computational cost of the geometric transformation process.

In the next figure, we show a simulation applied to Fig. 2 (a) with software available over the Internet [8]. In Fig. 6, the San Diego skyline and harbor shown in (a) is sampled in retinal exp-polar coordinates (with the vertical meridian deleted according to the split theory discussed before) and mapped by a complex logarithm transformation to rectangular log-polar coordinates (b). The inverse geometric transformation shown in (c) results in the retinal image that simulates the sampling by the ganglion cell density as a function of eccentricity.

Figure 6: (a) San Diego skyline and harbor. (b) Its log-polar image, with the vertical meridian deleted, obtained by the geometric transformation of the polar samples—with the radial partition changing exponentially and a constant angular partition—into regular samples in the log-polar rectangular plane. (c) The inverse geometric transformation of (b).

We note that the image processing presented here (see the last paragraph of the previous section) differs from the above simulation in one crucial aspect: we use the projective Fourier analysis framework for image representation, which provides a low computational cost of the retino-cortical (logarithmic) transformation.

6.4 DPFT and Binocular Vision

In order to carry out numerical experiments with the discrete PFT, the conformal camera should work in the following setup: we get a set of samples $f_{k,l} = f(e^{u_k}e^{i\theta_l})$ of an image $f$ from a camera with anthropomorphic visual sensors [9] or from an 'exp-polar' scanner with a sampling geometry similar to the distribution density of the retinal ganglion cells. Next, we form the DPFT $\widehat{f}_{k,l}$ according to (17) and compute it with FFT. Then, we compute the inverse DPFT of $\widehat{f}_{k,l}$ given in (18), again with FFT. This output from the inverse DPFT renders the retinotopic image $\mathbf{f}_{k,l}$ of the retinal samples in cortical log-polar coordinates.

This setup provides an efficient model that integrates the head, eyes, and cortex into a single computational system, which is introduced next. We discuss this integrated system by assuming that the 3D scene consists of a gray square with a red bar located in front of it (see Fig. 7).

Figure 7: The scene consisting of a gray square with a red bar in front of it is seen by an observer. The visual pathway with the major cortical areas is shown.

The integrated binocular system, with the eyes modeled by conformal cameras, and this scene as seen from above, is shown in Fig. 8.

Figure 8: The head-eyes-visual cortex integrated system. Following from the fact that the eyes are modeled by conformal cameras, the theoretical horopters are conics that resemble the empirical horopters.

A simulation of the integrated binocular system with the gray-square-red-bar scene can be seen in Fig. 9. Each eye sees the scene from a different vantage point ((a) and (c) in Fig. 9), as the eyes are separated laterally. The retinal projections are sampled on the exp-polar grid with the meridian line removed, as implied by the split theory. The retinotopic images are simulated in Matlab using the program from [8], and cut-and-paste transformations are used to account for the global topology of the retinotopy. For example, the output from the FFT computing the inverse DPFT of the scene projected on the right eye and sampled by ganglion cells is shown in Fig. 10 (b).
The cut-and-paste operation is applied to the output in Fig. 10 (b) and to the corresponding output for the left eye to obtain (f) in Fig. 9.

6.5 Modeling Perisaccadic Perception with DPFT

Because of the acuity limitations of foveate vision, a sequence of fast eye rotations is necessary for processing the details of a scene, fixating the eyes consecutively on the targets of interest. The sequence of fixations, saccades, and smooth pursuits, called the scanpath, is the most basic feature of foveate vision (cf. Fig. 2). The fact that we do not see the moving images on the retinas points to a poor integration of visual information across fixations during tasks such as reading, visual search, or looking at a scene. Given the limited computational resources, it is critical that visual information is not only efficiently acquired during each fixation, but also that this is done without starting much of the acquisition process anew at each fixation.

Although we are not aware of discontinuities in scene perception when executing a scanpath, this visual constancy is not perfect. In psychophysical laboratory experiments, the phenomenon of perisaccadic compression is observed: before the onset of the saccade, brief flashes are perceived by human subjects to be compressed around the impending saccade target [34, 53]; see Fig. 11. However, perisaccadic perception experiments have revealed a multitude of mislocalization phenomena, pointing to the involvement of many different neural processes. Accordingly, many different theories have been proposed; see [26].

Figure 9: In (a) and (c), the 3D scene from Fig. 8 is seen from a different vantage point by each eye (i.e., each conformal camera) due to the eyes' lateral displacement. The Matlab-simulated right and left retinal projections and the retinotopic image can be seen in (b), (d), and (f), respectively.

Two computational theories of perisaccadic vision that have been proposed in visual neuroscience are related to our modeling. The first theory, suggested in [66], states that an efference copy generated by the SC—a copy of the oculomotor command to rotate the eyes in order to execute the saccade—is used to uniformly shift the cortical neural activity representing spatial locations around the saccade target toward the foveal representation. It was proposed that this shift is reflected on the neuronal level by a transient spatial remapping of the receptive fields in numerous retinotopically organized cortical areas [18, 37], including the superior colliculus (SC), the parietal eye field (PEF), and the frontal eye field (FEF). It can explain the perceived increase in spatial resolution around the saccade target, as more foveal neurons are available there to process the details of objects. Furthermore, because the shift occurs in logarithmic coordinates that approximate the retinotopy, the model can also explain the perceived perisaccadic compression.

The second theory [38] explains the perisaccadic compression by the directing of spatial attention to the target of a planned saccade. The proposed computational model assumes that the initial stimulus neuronal activity in the visual cortical area is distorted by feedback from the retinotopically organized activity hill of the saccade target in the oculomotor SC layer, which pushes the population response of the flashed stimulus in retinotopically organized cortical areas (including PEF and FEF) towards the saccade target.

Figure 10: (a) The simulation of the right eye's projected scene sampled by ganglion cells. (b) The retinotopic image of the sampled projection shown in (a); the vertical size corresponds to the length of the angular interval [−π, π].
This boost of performance at the target location of the saccade, occurring immediately before the saccade onset, increases spatial discrimination. The shift of the neuronal activity in logarithmic coordinates, and hence the perisaccadic compression, is a direct consequence of it.

Because the circuitry underlying receptive field remapping is widespread and not well understood, it cannot easily be decided whether saccadic remapping is the cause or the consequence of saccadic compression. For example, only recently was it reported in [46] that a phenomenon very similar to the remapping occurs in extrastriate (V4 and, though progressively weaker, V3, V2, and V1) cortical areas in humans. Remarkably, remapping in extrastriate cortex could be functionally related to the integration of visual information from a constant object across saccades [23].

In this section, we model perisaccadic perception using the integrated binocular system, addressing the presaccadic activity consisting of shifts of neurons' current receptive fields to their future postsaccadic locations, which is thought to underlie the scene remapping based on the anticipated saccadic eye movement (efference copy), with the accompanying perisaccadic compression of perceptual space. The postsaccadic activity, during which the actual integration of visual features takes place, will be considered in the next stage of our modeling. Although our modeling directly conforms to the theory in [66], it may also be useful, on the image processing level, in representing the receptive field shifts resulting from the 'attentional multiplicative gain field interaction' of [38], especially since the efficiency of the whole modeling, which must be repeated three times per second, was not addressed by the authors.

We start here by supplementing the integrated binocular system presented in Section 6.4 with the most important subcortical and cortical pathways of the visuo-saccadic neural processes. These pathways, depicted by arrowhead lines in Fig. 12, include the SC of the midbrain, which contains retinotopically organized visual and oculomotor layers, and the PEF and FEF in the parietal and frontal lobes of the neocortex (which themselves obtain inputs from many visual cortical areas), assisting the SC in the control of involuntary (PEF) and voluntary (FEF) saccades. We also include the interhemispheric pathways—the corpus callosum (about 500 million neuronal axons connecting the cerebral cortical hemispheres) and the intercollicular commissure—because the coordinated movement of the two eyes is a bihemispheric event. The motor commands that originate in the brain's major hemisphere (the left hemisphere for most right-handed people) travel across the corpus callosum to the minor hemisphere and then down to the brainstem, where part of the signal again crosses to the other side of the brain before both eyes are finally moved in coordination [17].

Figure 11: The spatial pattern of perisaccadic compression. Shown are experimental data of the absolute mislocalization (lower row), referenced to the true positions of a flashed dot randomly chosen from an array of 24 dots, for four different saccade amplitudes (upper row). Adapted from [38].
We believe that building on the existing models [21] and accelerating advances in visual neuroscience will soon allow the inclusion of these pathways, such that the operation of a more complete system of perisaccadic perception can be addressed in numerical modeling in a way that could be useful in neural engineering designs.

The course of events taking place during perisaccadic perception, shown in Fig. 12, is as follows. The eyes are fixated at F and a new stimulus appears at T. The SC population T′ at the retinotopic image of T (green spot in the left SC) calculates the position of the target T of the impending saccade. The SC also codes the motor command for the execution of the saccade. About 50 ms before the onset of the saccade, during the saccade (about 30 ms), and about 50 ms after the saccade, visual sensitivity is reduced and flashes (dark blue dots) around T are not perceived in veridical locations. Instead, a copy of the motor command (efference copy) is sent to translate the cortical image (light blue dots in V1) of the flashes, remapping it into a target-centered frame (red dots in V1). This internal remapping results in the illusory compression of flashes, shown by red arrows. The compression is perceived around the incoming target T even though the eyes' fixation is moving from F to T. The location of the cortical area of the neural correlates of remapping is uncertain; it is required that this area be retinotopically organized. Although it could be PEF/FEF, here, for simplicity, this area is represented by V1.

Figure 12: The description is given in the text of the article.

During the fixation of the eyes at F, lasting on average about 300 ms, the image is sampled by ganglion cells, $f_{k,l} = f(e^{u_k} e^{i\theta_l})$, and its DPFT $\hat{f}_{m,n}$ is computed by FFT in log-polar coordinates $(u_k, \theta_l)$, where $u_k = \ln r_k$. The inverse DPFT, computed again by FFT, gives the cortical image representation $f_{k,l} = f(u_k, \theta_l) = f(e^{u_k} e^{i\theta_l})$, where disparity-sensitive cells contribute to building a 3D understanding of the scene. In the same fixation period, the next saccade's target T is selected (PEF/FEF) and its position with respect to the fovea is calculated and converted into the motor command to move the eyes (SC). During that time interval of about 130 ms, while visual sensitivity is reduced, neural processes, using a copy of the eyes' motor command (efference copy), transiently shift the cortical image. In our modeling, this shift is generated using the shift property of the Fourier transform as follows:
$$
f(u_k + j\delta, \theta_l) = f_{k+j,l} = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} e^{i 2\pi m j / M}\, \hat{f}_{m,n}\, e^{-(u_k + j\delta)}\, e^{i 2\pi m k / M} e^{i 2\pi n l / N},
$$
where δ is the corresponding spacing. It brings the presaccadic scene at F in fovea-centered coordinates into the postsaccadic scene at T in target-centered coordinates. However,
$$
f(u_k + j\delta, \theta_l) = f(e^{u_k + j\delta} e^{i\theta_l}) = f(e^{j\delta} r_k e^{i\theta_l}),
$$
so the uniform translation in logarithmic coordinates acts as a scaling by $e^{j\delta}$ in the retinal plane and compresses perceptual space.
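To make the computational content of these formulas concrete, the following minimal NumPy sketch (not the author's Matlab code; the grid sizes, the synthetic Gaussian image, and the dropped constant measure factors are our illustrative assumptions) computes a DPFT-like transform by weighting the log-polar samples with $e^{u_k}$ before a standard FFT, and then applies the shift property above to translate the cortical image along the log-radial axis:

    import numpy as np

    def dpft(f, u):
        # DPFT of log-polar samples f[k, l]: an ordinary 2D FFT of the
        # samples weighted by exp(u_k). Constant measure factors are
        # dropped here; they cancel in the round trip below.
        return np.fft.fft2(f * np.exp(u)[:, None])

    def inverse_dpft_shifted(F, u, j, delta):
        # Inverse DPFT combined with the shift property: returns
        # f(u_k + j*delta, theta_l), i.e. the cortical image translated
        # by j samples along the log-radial (u) axis.
        M = F.shape[0]
        phase = np.exp(1j * 2 * np.pi * j * np.arange(M) / M)  # e^{i 2 pi m j / M}
        g = np.fft.ifft2(F * phase[:, None])       # cyclic shift of f * e^u
        return g.real * np.exp(-(u + j * delta))[:, None]  # weight e^{-(u_k + j delta)}

    # Synthetic log-polar image: a Gaussian blob on the (u, theta) grid.
    M, N = 128, 64                       # radial and angular samples
    delta = 0.05                         # spacing of u_k = ln r_k
    u = np.log(0.1) + delta * np.arange(M)
    theta = -np.pi + 2 * np.pi * np.arange(N) / N   # angular interval [-pi, pi)
    f = np.exp(-(u[:, None] - u[M // 2]) ** 2 / 0.1
               - theta[None, :] ** 2 / 0.5)

    F = dpft(f, u)
    f_shifted = inverse_dpft_shifted(F, u, j=10, delta=delta)

    # Away from the wrap-around at the grid boundary, f_shifted[k] = f[k + 10]:
    print(np.allclose(f_shifted[:M - 10], f[10:], atol=1e-10))  # True

Note that for the wrapped indices $k + j \geq M$ the exponential weights no longer cancel: the FFT is periodic while the weight $e^{-u}$ is not. This is precisely the quasi-periodicity problem discussed in Section 6.5.1 below.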
6.5.1 Challenges with Implementation

There are some problems that must be addressed before we can implement our modeling of primate perception. One problem is due to the fact that the model of retinotopy is fovea-less. The other is related to the global topology of retinotopy, and in particular, to the vertical meridian split of the retinas (and hence of the visual field) in the brain's hemispheric projections.

In order to address the first problem, we need to develop a model of retinotopy that includes both the foveal and the peripheral regions. Hence, the projective Fourier transform, which gives the extrafoveal image representation, must be complemented with a transform for the foveal image representation. Two different transforms, foveal and extrafoveal, would conform to the accumulated evidence indicating that the fovea and the periphery have different functional roles in vision and may differ in their visual processing [51, 52, 70]. Maybe the simplest way to construct the foveal image transform is to restrict the action of the group SL(2, C) (which gives both the image projective transformations and Möbius geometry) to its Euclidean or affine subgroups. We refer to Section 3.2 in [61], where the Euclidean Fourier transform is introduced in the framework of representation theory to motivate the construction of PFT; it can be seen as the 'restriction' of PFT to the Euclidean subgroup of SL(2, C). The affine subgroup could bring in the wavelet transform to supplement PFT.

The second problem involves two facts that are not compatible with each other: the computation of the DPFT of an image in log-polar coordinates by FFT, and the foveal split along the vertical meridian with the partial crossing that reorganizes the retinal outputs so that the left hemisphere destinations receive information from the right visual field, and the right hemisphere destinations receive information from the left visual field. The retina (that is, the image plane of the conformal camera) with the foveal disc removed has the visual field represented by an annulus, which under the complex logarithm w = ln z is mapped into a rectangle. In order to discretize PFT, this rectangle must be extended periodically, which forces a quasiperiodic extension of the annulus; see Eq. 21 in [58]. In our numerical experiments with the image translation performed by the corresponding shift property of DPFT, the image 'disappeared' into the foveal region of the cortical area (the foveal region of the retina) only to reappear from the opposite side of the rectangle (the opposite circular boundary of the annulus). Also, we need to modify the FFT to account for the global retinotopy simulated in Fig. 9 (f) by the cut-and-paste transformations. Clearly, the two problems are interdependent.
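As a toy illustration of the second problem, the following sketch routes each retinal point to one of two hemispheric log-polar maps split along the vertical meridian. The function name, the hemifield convention w = ln(±z), and the value of R_MIN are our simplifying assumptions; this is not the article's cut-and-paste transformation itself:

    import numpy as np

    R_MIN = 0.1  # radius of the excluded foveal disc in the fovea-less model

    def hemispheric_log_map(z):
        # Toy split retinotopy: the retina is divided along the vertical
        # meridian (the imaginary axis). The right visual hemifield
        # (Re z > 0) projects to the left cortical map via w = ln z; the
        # left hemifield projects to the right map via w = ln(-z). Either
        # way the angular coordinate Im w stays in (-pi/2, pi/2), so each
        # hemispheric map covers half of the full log-polar rectangle.
        if abs(z) < R_MIN:
            raise ValueError("inside the foveal disc: not covered by this model")
        if z.real >= 0:
            return "left map", np.log(z)
        return "right map", np.log(-z)

    # Two flashes that are close together on the retina but lie on opposite
    # sides of the vertical meridian land in different hemispheric maps:
    for z in (0.05 + 0.5j, -0.05 + 0.5j):
        hemi, w = hemispheric_log_map(z)
        print(hemi, w)

Points on opposite sides of the meridian thus end up in different FFT domains, which is why a single FFT taken over the full angular interval [−π, π], as in the DPFT computation above, must be modified before it can respect the split retinotopy.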
7 Conclusions

In this article we presented a comprehensive account of our approach to computational vision developed over the last decade. It was done by bringing together in one place the physiological and behavioral aspects of primate visual perception and the conformal camera's computational harmonic analysis with its underlying geometry. This allowed us to discuss the remarkable advantages that the conformal camera possesses over the other cameras used in computational vision. First, the conformal camera's geometry fully accounts for the basic concepts of co-circularity and scale invariance employed by the human visual system in solving the difficult intermediate-level vision problems of grouping local elements into the individual objects of natural scenes. Second, the conformal camera has its own harmonic analysis—projective Fourier analysis—for image representation and processing that is well adapted both to image projective transformations and to the retinotopic mapping of the brain's visual and oculomotor pathways. Projective Fourier analysis integrates the binocular model consisting of the head, the eyes (conformal cameras), and the visual cortex into a single computational system. Based on this binocular system, we proposed a model of perisaccadic perception, including the perisaccadic mislocalizations observed in laboratory experiments. More precisely, we modeled the presaccadic activity which, through shifts of neurons' current receptive fields to their future postsaccadic locations, is thought to underlie remapping based on the anticipated saccadic eye movement (efference copy). The postsaccadic activity, during which the actual integration of visual features takes place, will be considered in the next stage of our modeling.

Finally, we presented the main challenges in the implementation of our modeling. First, the fovea-less model of the retina, based on the discrete projective Fourier transform (DPFT) of an image, must be supplemented with a foveal image transform. Second, the computation of the DPFT with a fast Fourier transform algorithm (FFT) has to be modified in order to account for the global retinotopy of the brain's visual pathway.

It was observed that saccades cause not only a compression of space but also a compression of time [47]. In order to preserve visual stability during the saccadic scanpath, receptive fields undergo a fast remapping at the time of saccades. When the speed of this remapping approaches the physical limit of neural information transfer, relativistic-like effects are psychophysiologically observed and may cause space-time compression [14, 48]. Curiously, this suggestion can also be accounted for in our model based on projective Fourier analysis, since the group of image projective transformations of the conformal camera is the double cover of the group of Lorentz transformations of Einstein's special relativity.

References

[1] J. Angulo, Polar Modelling and Segmentation of Genomic Microarray Spots Using Mathematical Morphology, Image Analysis and Stereology, 27 (2008), pp. 107-124.

[2] S. Anstis, Picturing peripheral acuity, Perception, 27 (1998), pp. 817-825.

[3] F. Attneave, Some informational aspects of visual perception, Psychological Review, 61 (1954), pp. 183-193.

[4] H. Awater and M. Lappe, Mislocalization of perceived saccade target position induced by perisaccadic visual stimulation, Journal of Neuroscience, 26 (2006), pp. 12-20.

[5] G. Baratoff, C. Toepfer, and H. Neumann, Combined Space-Variant Maps for Optical Flow Based Navigation, Biological Cybernetics: Special Issue on Navigation in Biological and Artificial Systems, 83 (2000), pp. 199-209.

[6] M. Berger, Geometry I, Springer-Verlag, New York, 1987.

[7] A. Bernardino, J. Santos-Victor, and G. Sandini, Foveated active tracking with redundant 2D motion parameters, Robotics and Autonomous Systems, 39 (2002), pp. 205-221.

[8] A. Bernardino, LogPolar Wrapper and LogPolar Mapper, Matlab codes available over the Internet, 2002.

[9] F. Berton, G. Sandini, and G. Metta, Anthropomorphic Visual Sensors, in Encyclopedia of Sensors, Vol. X, C.A. Grimes, E.C. Dickey, and M.V. Pishko, eds., American Scientific Publishers, 2006, pp. 1-16.

[10] H. Blum, Biological shape and visual science, Journal of Theoretical Biology, 38 (1973), pp. 205-287.

[11] G. Bonmassar and E.L. Schwartz, Space-Variant Fourier Analysis: The Exponential Chirp Transform, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19 (1997), pp. 1080-1089.

[12] J. Böttger, M. Balzer, and O. Deussen, Complex Logarithmic Views for Small Details in Large Context, IEEE Transactions on Visualization and Computer Graphics, 12 (2006), pp. 845-852.

[13] D.A. Brannan, M.F. Esplen, and J.J. Gray, Geometry, Cambridge University Press, Cambridge, 1998.

[14] D. Burr and C. Morrone, Time Perception: Space-Time in the Brain, Current Biology, 16 (2006), pp. R171-R173.
[15] R.H.S. Carpenter, The Saccadic System: A Neurological Microcosm, Advances in Clinical Neuroscience and Rehabilitation, 4 (2004), pp. 6-7.

[16] C.C. Chow, D.Z. Jin, and A. Treves, Is the world full of circles?, Journal of Vision, 2 (2002), pp. 571-576.

[17] I. Derakhshan, How do eyes move together? New understandings help explain eye deviations in patients with stroke, Canadian Medical Association Journal: Neuroanatomy, 18 (2005), pp. 171-173.

[18] J.R. Duhamel, C.L. Colby, and M.E. Goldberg, The updating of the representation of visual space in parietal cortex by intended eye movements, Science, 255 (1992), pp. 90-92.

[19] D.J. Field, A. Hayes, and R.F. Hess, Contour Integration by the Human Visual System: Evidence for a Local 'Association Field', Vision Research, 33 (1993), pp. 173-193.

[20] G.D. Field and E.J. Chichilnisky, Information Processing in the Primate Retina: Circuitry and Coding, Annual Review of Neuroscience, 30 (2007), pp. 1-30.

[21] B. Girard and A. Berthoz, From brainstem to cortex: Computational models of saccade generation circuitry, Progress in Neurobiology, 77 (2005), pp. 215-251.

[22] P.W. Glimcher, Making choices: the neurophysiology of visual-saccadic decision making, Trends in Neurosciences, 24 (2001), pp. 654-659.

[23] J. Gottlieb, From a Different Point of View: Extrastriate Cortex Integrates Information Across Saccades. Focus on "Remapping in Human Visual Cortex", Journal of Neurophysiology, 97 (2007), pp. 961-962.

[24] O.J. Grüsser, On the history of the ideas of efference copy and reafference, Clio Medica, 33 (1995), pp. 35-55.

[25] F.H. Hamker, M. Zirnsak, D. Calow, and M. Lappe, The Peri-Saccadic Perception of Objects and Space, PLoS Computational Biology, 4 (2008), pp. 1-15.

[26] F.H. Hamker, M. Zirnsak, and M. Lappe, About the influence of post-saccadic mechanisms for visual stability on peri-saccadic compression of object location, Journal of Vision, 8 (2008), pp. 1-13.

[27] M. Henle, Modern Geometries. The Analytical Approach, Prentice Hall, Upper Saddle River, NJ, 1997.

[28] P. Henrici, Applied and Computational Complex Analysis, Vol. 3, Wiley, New York, 1986.

[29] R.A. Herb, Harish-Chandra and his work, Bulletin of the AMS, 25 (1991), pp. 1-17.

[30] D.D. Hoffman and W.A. Richards, Parts of recognition, Cognition, 18 (1984), pp. 65-96.

[31] A. Hunt and P. Cavanagh, Clocking saccadic remapping, Journal of Vision, 8 (2008), p. 818.

[32] G. Jones and D. Singerman, Complex Functions, Cambridge University Press, Cambridge, 1987.

[33] A.W. Knapp, Representation Theory of Semisimple Groups: An Overview Based on Examples, Princeton University Press, Princeton, NJ, 1986.

[34] M. Kaiser and M. Lappe, Perisaccadic mislocalization orthogonal to saccade direction, Neuron, 41 (2004), pp. 293-300.

[35] K. Koffka, Principles of Gestalt Psychology, Harcourt & Brace, New York, 1935.

[36] I. Kovacs and B. Julesz, Perceptual sensitivity maps within globally defined visual shapes, Nature, 370 (1994), pp. 644-646.

[37] B. Krekelberg, M. Kubischik, K.P. Hoffmann, and F. Bremmer, Neural correlates of visual localization and perisaccadic mislocalization, Neuron, 37 (2003), pp. 537-545.

[38] M. Lappe, H. Awater, and B. Krekelberg, Postsaccadic visual references generate presaccadic compression of space, Nature, 403 (2000), pp. 892-895.

[39] M. Lavidor and V. Walsh, The nature of foveal representation, Nature Reviews: Neuroscience, 5 (2004), pp. 729-735.
[40] M. Leyton, A theory of information structure I: General principles, Journal of Mathematical Psychology, 30 (1986), pp. 103-160.

[41] M. Leyton, A theory of information structure II: A theory of perceptual organization, Journal of Mathematical Psychology, 30 (1986), pp. 257-305.

[42] R. Manzotti, A. Gasteratos, G. Metta, and G. Sandini, Disparity Estimation on Log-Polar Images and Vergence Control, Computer Vision and Image Understanding, 83 (2001), pp. 97-117.

[43] C.D. Martin, G. Thierry, J.-F. Démonet, M. Roberts, and T. Nazir, ERP evidence for the split fovea theory, Brain Research, 1185 (2007), pp. 212-220.

[44] S. Martinez-Conde, S.L. Macknik, and D. Hubel, The Role of Fixational Eye Movements in Visual Perception, Nature Reviews: Neuroscience, 5 (2004), pp. 229-240.

[45] D. Melcher, Predictive remapping of visual features precedes saccadic eye movements, Nature Neuroscience, 10 (2007), pp. 903-907.

[46] E.P. Merriam, C.R. Genovese, and C.L. Colby, Remapping in human visual cortex, Journal of Neurophysiology, 97 (2007), pp. 1738-1755.

[47] M.C. Morrone, J. Ross, and D. Burr, Saccadic eye movements cause compression of time as well as space, Nature Neuroscience, 8 (2005), pp. 950-954.

[48] M.C. Morrone, J. Ross, and D. Burr, Keeping vision stable: Rapid updating of spatiotopic receptive fields may cause relativistic-like effect, in Problems of Space and Time in Perception, R. Nijhawan, ed., Cambridge, 2008.

[49] J.L. Mundy and A. Zisserman, eds., Geometric Invariance in Computer Vision, MIT Press, 1992.

[50] J. Najemnik and W.S. Geisler, Optimal eye movement strategies in visual search, Nature, 434 (2005), pp. 387-391.

[51] Y. Petrov, M. Carandini, and S. McKee, Two Distinct Mechanisms of Suppression in Human Vision, Journal of Neuroscience, 25 (2005), pp. 8704-8707.

[52] J. Prado, S. Clavagnier, H. Otzenberger, C. Scheiber, H. Kennedy, and M.-T. Perenin, Two Cortical Systems for Reaching in Central and Peripheral Vision, Neuron, 48 (2005), pp. 849-858.

[53] J. Ross, M.C. Morrone, and D.C. Burr, Compression of visual space before saccades, Nature, 386 (1997), pp. 598-601.

[54] E.L. Schwartz, Computational anatomy and functional architecture of striate cortex, Vision Research, 20 (1980), pp. 645-669.

[55] L.S. Shapiro, Affine Analysis of Image Sequences, Cambridge University Press, 1995.

[56] M. Sigman, G.A. Cecchi, C.D. Gilbert, and M.O. Magnasco, On a common circle: Natural scenes and Gestalt rules, Proceedings of the National Academy of Sciences of the United States of America, 98 (2001), pp. 1935-1940.

[57] N. Tamayo and V.J. Traver, Entropy-Based Saliency Computation in Log-Polar Images, Proceedings of the International Conference on Computer Vision Theory and Applications, 2008, pp. 501-506.

[58] J. Turski, Projective Fourier analysis in computer vision: Theory and computer simulations, in Vision Geometry VI, SPIE Vol. 3168, R.A. Melter, A.Y. Wu, and L.J. Latecki, eds., 1997, pp. 124-135.

[59] J. Turski, Harmonic analysis on SL(2, C) and projectively adapted pattern representation, Journal of Fourier Analysis and Applications, 4 (1998), pp. 67-91.

[60] J. Turski, Projective Fourier analysis for patterns, Pattern Recognition, 33 (2000), pp. 2033-2043.

[61] J. Turski, Geometric Fourier Analysis of the Conformal Camera for Active Vision, SIAM Review, 46 (2004), pp. 230-255.

[62] J. Turski, Geometric Fourier Analysis for Computational Vision, Journal of Fourier Analysis and Applications, 11 (2005), pp. 1-23.
[63] J. Turski, Computational Harmonic Analysis for Human and Robotic Vision Systems, Neurocomputing, 69 (2006), pp. 1277-1280.

[64] J. Turski, Harmonic Analysis for Cognitive Vision: Perisaccadic Perception, in Human Vision and Electronic Imaging XIV, SPIE Vol. 7240, B.E. Rogowitz and T.N. Pappas, eds., 2009, pp. 72401A-1 - 72401A-5.

[65] S. Ullman, Higher-Level Vision: Object Recognition and Visual Cognition, MIT Press, 1996.

[66] R. VanRullen, A simple translation in cortical log-coordinates may account for the pattern of saccadic localization errors, Biological Cybernetics, 91 (2004), pp. 131-137.

[67] C.F.R. Weiman, Log-polar Binocular Vision System, Transition Research Corporation: NASA Phase II SBIR Final Report, 1994.

[68] M. Wertheimer, Laws of Organization in Perceptual Forms, Harcourt, Brace & Jovanovitch, London, 1938.

[69] R.H. Wurtz, Neuronal mechanisms of visual stability, Vision Research, 48 (2008), pp. 2070-2089.

[70] J. Xing and D.J. Heeger, Center-surround interaction in foveal and peripheral vision, Vision Research, 40 (2000), pp. 3065-3072.

[71] A.L. Yarbus, Eye Movements and Vision, Plenum Press, New York, 1967.