A 3D Moviemap and a 3D Panorama

Copyright 1997 Society of Photo-Optical Instrumentation Engineers

This paper was published in SPIE Proceedings, Vol. 3012, San Jose, 1997, and is made available as an electronic reprint with permission of SPIE. Single print or electronic copies for personal use only are allowed. Systematic or multiple reproduction, or distribution to multiple locations through an electronic listserver or other electronic means, or duplication of any material in this paper for a fee or for commercial purposes is prohibited. By choosing to view or print this document, you agree to all the provisions of the copyright law protecting it.

A 3D Moviemap and a 3D Panorama
Michael Naimark
Interval Research Corporation
Palo Alto, California

ABSTRACT
Two immersive virtual environments produced as art installations investigate "sense of place" in different but complementary ways. One is a stereoscopic moviemap, the other a stereoscopic panorama. Moviemaps are interactive systems which allow "travel" along pre-recorded routes with some control over speed and direction. Panoramas are 360 degree visual representations dating back to the late 18th century but which have recently experienced renewed interest due to "virtual reality" systems. Moviemaps allow "moving around" while panoramas allow "looking around," but to date there has been little or no attempt to produce either in stereo from camera-based material.

"See Banff!" (1993-4) is a stereoscopic moviemap about landscape, tourism, and growth in the Canadian Rocky Mountains. It was filmed with twin 16mm cameras and displayed as a single-user experience housed in a cabinet resembling a century-old kinetoscope, with a crank on the side for "moving through" the material.

"BE NOW HERE (Welcome to the Neighborhood)" (1995-6) is a stereoscopic panorama filmed in public gathering places around the world, based upon the UNESCO World Heritage "In Danger" list. It was filmed with twin 35mm motion picture cameras on a rotating tripod and displayed using a synchronized rotating floor.

Keywords: moviemaps, panoramas, immersive virtual environments, art installations

1. CAMERA-BASED IMMERSION
"Immersion," in the context of media and virtual environments, is often defined as the feeling of "presence" or "being there," of being "inside" rather than "outside looking in." Various attempts have been made to taxonomize the elements required for presence, 1,2,3,4 but at the very least studies suggest that visual presence is directly related to field of view (FOV). 5,6 Stereopsis and orthoscopy (i.e., where the viewing FOV matches the recording FOV, thus maintaining proper scale) often enhance immersion. 7 A variety of special-venue film formats have been developed to deliver immersive experiences. 8 These experiences are camera-based and represent the physical world, but are linear and don't allow any interaction or navigability. Computer-based immersive virtual environments do allow interaction and navigability, but are restricted to whatever can be made into 3D computer models. Since such computer models are built from scratch, they often represent imaginary or fantasy environments. Making computer models of actual places has proven to be non-trivial, and even today's best models display unwanted artifacts.
The work described here represents something between filmic and computer graphic immersion. It is camera-based and the imagery is of the physical world (in the documentary or ethnographic tradition), but it also has elements of interactivity and navigability, albeit constrained ones.

2. MOVIEMAPS
Moviemaps allow virtual travel through pre-recorded spaces. Routes are pre-determined and filmed with a stop-frame camera triggered by distance rather than by time (typically done via an encoder on a wheel). Distance-triggering maintains constant speeds during playback at constant frame rates, which is often not practical or possible during production with a conventional (time-triggered) movie camera. The result, in a very real sense, is the transfer of speed control from the producer to the end-user, who has control over frame rate through an input device like a joystick or trackball.
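The distance-triggering idea can be illustrated with a minimal sketch. It assumes the wheel encoder has already been converted into cumulative distance readings in centimeters, sampled at regular time intervals; the function name and the sample data are hypothetical and are not taken from any of the actual camera rigs.

```python
def distance_triggered_frames(distances_cm, spacing_cm):
    """Return the indices of the samples at which the camera would fire.

    distances_cm: cumulative distance travelled at each time sample
                  (non-decreasing integers, hypothetical encoder output).
    spacing_cm:   desired distance between exposed frames.
    """
    frames = []
    next_trigger = 0
    for i, distance in enumerate(distances_cm):
        if distance >= next_trigger:
            frames.append(i)
            # Schedule the next exposure, skipping any intervals already passed.
            while next_trigger <= distance:
                next_trigger += spacing_cm
    return frames


# The rig moves quickly, then slowly, then quickly again (one sample per time step).
samples = [0, 50, 100, 150, 200, 220, 240, 260, 280, 300, 400]
fired = distance_triggered_frames(samples, 100)
print(fired)                         # [0, 2, 4, 9, 10]
print([samples[i] for i in fired])   # [0, 100, 200, 300, 400]
```

Although the exposures fall at irregular moments in time, the filmed positions are evenly spaced, so playback at a constant frame rate appears as travel at a constant speed.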
In addition to speed control, limited control of direction is possible by filming registered turns at intersections. By match-cutting between a straight sequence and a turn sequence, the user can "move" from one route to another. Care must be taken to minimize visual discontinuities such as sun position and transient objects (e.g., cars and people). The goal is to make the cuts appear as seamless as possible.

2.1 Past Moviemaps
The first interactive moviemap, of Aspen, Colorado, was produced at MIT in the late 1970s. A gyroscopic stabilizer with 16mm stop-frame cameras was mounted on top of a camera car, and a fifth wheel with an encoder triggered the cameras every 10 feet. Filming took place daily between 10 AM and 2 PM to minimize lighting discrepancies. The camera car carefully drove down the center of the street for registered match-cuts. In addition to the basic "travel" footage, panoramic camera experiments, thousands of still frames, audio, and data were collected. The playback system required several laserdisc players, a computer, and a touch screen display. Very wide-angle lenses were used for filming, and some attempts at orthoscopic playback were made. 9

The author has since conceived and directed several moviemap productions, each with its own unique playback configuration. The "Paris VideoPlan" (1986) was commissioned by the RATP (Paris Metro) to map the Madeleine district of Paris from the point-of-view of walking down the sidewalk. It was filmed with a stop-frame 35mm camera mounted on an electric cart, filming one frame every 2 meters. An encoder was attached to one of the cart's axles. Rather than filming all the turn possibilities at each intersection, a mime was employed to stand in each intersection and simply point in the possible turn directions. The idea was to substitute the perceptual continuity of actual match-cuts with cinematic continuity. The playback system was built in a kiosk and exhibited in the Madeleine Metro Station.

The "Golden Gate Videodisc Exhibit" (1987) was produced for San Francisco's Exploratorium as an aerial moviemap over a 10 by 10 mile grid of the Bay Area. It was filmed with a gyro-stabilized 35mm motion picture camera on a helicopter, which flew at a constant ground speed and altitude along one-mile grid lines determined by LORAN radio navigation technology, effectively filming one frame every 30 feet. The camera was always pointed at the center of the Golden Gate Bridge, hence no turn sequences were necessary since the images always matched at each intersection regardless of travel direction. The playback system used a trackball to control both speed and direction, with the feel of "tight linkage" to the laserdiscs. The result was the sensation of moving smoothly over the Bay Area at speeds much faster than normal.
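The relationship between frame spacing and apparent playback speed can be sketched as follows. The 30 foot spacing is from the text; the playback rate of 30 frames per second is an assumption made only for illustration, since the user controlled the actual rate with the trackball.

```python
FEET_PER_MILE = 5280


def apparent_speed_mph(frame_spacing_ft, playback_fps):
    """Apparent ground speed when frames filmed frame_spacing_ft apart
    are shown at playback_fps frames per second."""
    feet_per_second = frame_spacing_ft * playback_fps
    return feet_per_second * 3600 / FEET_PER_MILE


# 30 ft per frame is from the text; 30 frames/s playback is an assumption.
print(round(apparent_speed_mph(30, 30)))  # 614
```

Under that assumption the apparent speed is on the order of 600 miles per hour, which is consistent with the description of travel "much faster than normal."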

"VBK: A Moviemap of Karlsruhe" was commissioned by the Zentrum fur Kunst und Medientechnologie (ZKM). Karlsruhe, Germany, has a well-known tramway system, with over 100 kilometers of track snaking from the downtown pedestrian area out into the Black Forest. A 16mm stop-frame camera was mounted in front of a tram car and interfaced to the tram's odometer. Triggering was programmed to be at 2, 4, or 8 meter increments per frame depending on location. Filming on a track resulted in virtually perfect spatial registration. The playback system consisted of a pedestal with a throttle for speed control and 3 pushbuttons for choosing direction at intersections. The camera had a very wide-angle lens (85 degree horizontal FOV) and playback employed a 16 foot wide video projection. The input pedestal was strategically placed in front of the screen to achieve orthoscopically correct viewing, resulting in a strong sense of visual immersion.

2.2 The "See Banff!" Kinetoscope
But the immersive experience with the Karlsruhe Moviemap was monoscopic. One might argue that binocular disparity is not an important factor for landscape imagery (e.g., compared to infinity focus or motion disparity), but no one had yet made a stereoscopic moviemap.

In 1992, the author was working on field recording studies in the "Art and Virtual Environments" program at the Banff Centre for the Arts and, after several attempts at making computer models from camera-based imagery, reverted to using moviemap production techniques, but this time in stereo. 10 The concept was to film scenic routes in and around the Banff region of the Canadian Rocky Mountains.

2.2.1 Camera Design and Production
The camera rig had to be small, portable, and rugged. As attractive as gyro stabilization may have been, it would have been much too heavy to take down mountain trails, on glaciers, and over narrow bridges. Without such stabilization, the stability of the imagery would be at the mercy of the terrain and, to some extent, the skill of the operator.

The basis of the camera rig was a 3-wheeled "super jogger" baby carriage, reinforced for extra sturdiness and modified to hold a tripod (see Figure 1). An encoder was installed on one of the rear wheels, with electronics for triggering the cameras from 1 frame every centimeter on up. A custom mount was built to hold two 16mm stop-frame cameras in parallel so that they could be released for film loading but would mount back in precisely the same position.

The cameras were fitted with the 85 degree horizontal FOV wide-angle lenses, with the intention of making a wide-angle orthostereoscopic display system.

The cameras were always triggered in sync. The stop-frame motors completed one rotation in 1/8 second. The shutters were variable and modified to lock at 30 degrees, resulting in a shutter speed of 1/96 second, enough to freeze most motion if the rig moved at walking speed.
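The shutter arithmetic above can be checked with a short sketch: the exposure time is the fraction of one shutter rotation during which the shutter is open. The walking speed of 1.4 meters per second is an assumed figure, used only to estimate how far the rig travels during one exposure.

```python
from fractions import Fraction


def exposure_time(rotation_period_s, shutter_angle_deg):
    """Exposure time of a rotary shutter open for shutter_angle_deg
    out of each 360 degree rotation."""
    return Fraction(rotation_period_s) * Fraction(shutter_angle_deg, 360)


banff_exposure = exposure_time(Fraction(1, 8), 30)
print(banff_exposure)  # 1/96

# Assumed walking speed (not from the text), to estimate travel per exposure.
WALKING_SPEED_M_PER_S = 1.4
travel_cm = WALKING_SPEED_M_PER_S * float(banff_exposure) * 100
print(round(travel_cm, 2))  # about 1.46 cm
```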

Figure 1. The "See Banff!" camera rig.

After much theoretical debate about the optimal interocular distance between cameras, it turned out that the minimum practical distance at which the cameras could be mounted apart was about 8 inches, due to the size of the stop-frame motors. After reviewing several hundred turn-of-the-century landscape stereograms, it became clear that the exaggerated depth resulting from abnormally large interocular distances was the rule more than the exception, so no sleep was lost over our rig design.

The intent of production was to film a wide variety of unconnected routes without any intersections. The playback system would require speed control and route selection but not control over turns. This decision broadened the range of possible filming, since the notion of completion (i.e., covering all possibilities, filming turns, completing grids) wasn't necessary.

As it turned out, capturing the beauty of the landscape became overshadowed by the presence of tourists everywhere. Busload after busload appeared even in very remote areas. Dozens of people of all ages and cultures, clad in bright colors and toting cameras, wandered through the landscape. It became clear that, as an artwork, there was strength in counterpointing the beauty of the landscape with the actualities of tourism. The presence of tourists also created a lively foreground, giving the imagery a greater sense of 3D.

Filming took place during September 1993 in a wide range of locations. Stability ranged from finding smooth paved wheelchair trails to carrying the rig over rocky terrain. Frame rates were determined on-the-spot as a function of stability of the surface and distance of nearest objects, and ranged from 1 frame every centimeter to 1 frame every meter. Usually the cameras were pointed forward in the direction of movement but sometimes were pointed off to the side (and occasionally were stationary in a clock-driven timelapse mode). Over 120 scenes were filmed.

2.2.2 Playback System
Shortly after production, the film was transferred to videotape, edited, and transferred to 2 laserdiscs, one for each eye. A trackball-based interactive system was produced, with simple optics and mirrors arranged in a Wheatstone configuration for single-user stereoscopic viewing.

The first system approximated true orthoscopic viewing, with an 85 degree horizontal FOV. Such a wide FOV is considerably larger than sitting in the front row of the grandest of movie theaters, and the video resolution was extremely coarse spread out over such a large area. We backed off to a FOV closer to 60 degrees. It still looked very large and still appeared orthoscopically correct. (One is tempted to speculate that the human perceptual system acts like Saul Steinberg's "New Yorker's Map of the U.S.," that anything bigger or farther than that with which we are familiar appears so nonlinear that beyond some point it doesn't matter.)

The next step was to package it into a traveling exhibit. In roughing out a basic design - a podium-like box to house the hardware, an eyehood for single-user wide-angle orthostereo viewing, a one-dimensional input device - it became clear that a strikingly similar device had already been built. In April 1894, the Edison kinetoscope made its public debut. This was at a turbulent moment in the history of cinema, when the camera had already been actualized but projection had not. It was now exactly 100 years later, and the temptation to suggest an analogy was overwhelming (see Figure 2).

The final exhibit, called "See Banff!" (irony intentional), was built of walnut and brass in an authentic but exaggerated kinetoscope design, with a lever for selecting one of several scenes on exhibit and a crank for "moving through" the material. (History buffs will note that kinetoscopes, unlike mutoscopes, never had cranks, in part because Edison was peddling another of his inventions, electricity and electric motors.) The crank was equipped with a force-feedback brake which freezes movement when the user reaches the boundaries of each scene, simulating film mechanics. (Perhaps not surprisingly, it feels "not right" when the brake is disabled.)

Partly for practical and partly for aesthetic reasons, a single laserdisc was used with field-sequential stereoscopic video and 30 Hz LCD shutters built into the optics. The flicker is noticeable, but after all, it is a kinetoscope.

3. PANORAMAS
The word "panorama" was coined in 1792 in London to describe the first of what became a popular form of public entertainment: a large elevated cylindrical room entirely covered by a 360 degree painting. 11 From the beginning, a distinction was made between displaying a panoramic image all at once (circular or stationary panoramas) and over time (moving panoramas). This distinction has carried over to cinema as well.

3.1 "Moving Movies"
In 1977, the author began investigating what happens when a projected motion picture image physically moves the same way as the original camera movement. If the angular movement of the projector equals the angular movement of the camera, and if the FOVs are equal, spatial correspondence is maintained, and the result appears as natural as looking around a dark space with a flashlight. 12 A simple demonstration was made using a super-8 film camera and projector on a slowly rotating turntable. Later a more complex system which recorded the pan and tilt axes was built. Finally a series of art installations was realized using the simple turntable inside a space furnished to resemble a living room, whose entire contents were spray-painted white after filming to become a 3D relief projection screen for itself. 13 This phenomenon of motion picture projection physically moving around a playback space was named "moving movies."

Moving movies involve no lateral movement, only angular movement, which must be around the camera's nodal point. As such, they are strictly 2D spatial representations. Computer-based moving panoramas such as Apple Computer's QuickTime VR, Microsoft's Surround Video, and Omniview's Photobubbles (to name a few) are similar insofar as they rely on 2D representations from a single point of view, produced by tiling multiple images or by using a single fisheye lens. Making any 2D visual representation from more than one point of view results in distortion, by definition. (The essence of 2D photographic representation and indeed, photorealism, is strictly from a single point of view. Alternative forms like Cubism and David Hockney's photo-collages are fine counter-examples.)

A 2D panorama represents a single point of view, but stereopsis requires at least two points of view. Making stereoscopic panoramas with two 2D panoramic images is problematic. For example, if two panoramic images are taken with stationary cameras placed apart at an interocular distance, the disparity will vary as the user looks around (and will become zero on the axis between the 2 cameras). On the other hand, if the cameras move laterally during exposure to keep disparity constant, each camera's image will no longer represent a single point of view, resulting in distortion (e.g., circles become ovals, images appear fragmented, perspective becomes discontinuous).
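The variation in disparity for two stationary panoramic cameras can be illustrated with a simple geometric sketch. It assumes distant subject matter, so that disparity is roughly proportional to the component of the camera separation perpendicular to the viewing direction. The function name and the 8 unit separation are illustrative only.

```python
import math


def effective_baseline(camera_separation, view_angle_deg):
    """Component of the camera separation perpendicular to the viewing direction.

    view_angle_deg is measured from the axis joining the two cameras,
    so 0 looks along that axis and 90 looks perpendicular to it.
    """
    return camera_separation * abs(math.sin(math.radians(view_angle_deg)))


# Illustrative separation of 8 units; the effective baseline shrinks
# as the view swings toward the axis between the cameras.
for angle in (90, 60, 30, 0):
    print(angle, round(effective_baseline(8.0, angle), 2))
```

Under this model the effective baseline equals the full separation when looking perpendicular to the camera axis and falls to zero when looking along it, which matches the behavior described above.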

3.2 "Be Now Here"
In December 1994, the author was invited to produce an art installation for the Center for the Arts at the Yerba Buena Gardens in San Francisco to open in December 1995. The installation proposed was called "Be Now Here (Welcome to the Neighborhood)." Just as the Banff kinetoscope was an experiment in making a stereoscopic version of interactively moving around, "Be Now Here" was to complement it by making a stereoscopic version of interactively looking around.

The concept was to assemble an experimental camera system to film stereoscopic panoramas, then to go to public gathering places (commons) and film throughout the course of a day from a single position. The experience would be analogous to standing in a single place, with both eyes open, and being able to look around but not move from the spot.

Site selection was based on the "In Danger" list issued by the UNESCO World Heritage Centre in Paris. 14 Of the 440 UNESCO-designated World Heritage Sites, 17 had been further designated "In Danger." Of these, 4 are cities: Jerusalem, Dubrovnik (Croatia), Timbuktu (Mali), and Angkor (Cambodia). With assistance of UNESCO, the plan would be to visit each site, to work with local collaborators and to determine the most representative public commons and a single spot in which to set up the camera system. (Partly for art and partly for research reasons, going into interesting but fragile environments to make a statement about "place" seemed appropriate.)

3.2.1 Camera Design and Production
The camera design was based on 2 cameras (for stereo), 60 degree horizontal FOV lenses (for immersion), and a slowly rotating tripod (for panoramics), rotating once per minute (1 rpm). This is a compromise since it takes a minute to capture an entire 360 degree scene, but using multiple camera pairs to capture the entire scene at once was not practical. Sunlight variation was assumed to be negligible during the course of a minute, so using multiple images from the same scene for projection or panoramic tiling would have artifacts only from moving objects. A 1 rpm closed-loop crystal synchronized motor was mounted on a tripod.

The question of how to arrange the cameras with respect to the axis of rotation resulted in lively debate. Mimicking mammal head rotation suggested placing both cameras symmetrically in front of the axis of rotation. A colleague, John Woodfill, had a novel suggestion: rotate the camera pair around the nodal point of one of the cameras. Such a configuration would result in a perfect 2D panorama from one camera and would place all of the disparity difference in the other camera. As a vision researcher interested in the footage, he felt this could be useful. The "Woodfill Configuration" was deployed, with careful manual determination of one of the camera's nodal points. (One might speculate that looking at a stereoscopic panning movie where one eye sees no disparity and the other eye sees all the disparity would be noticeable, but one could counter-speculate that it's possible to sit on a rotating stool with one eye directly over the axis of rotation and conclude that there's nothing special about it.)

After much deliberation, it was decided to use 35mm motion picture film. The resolution would be 4 times greater than 16mm film and much greater than video, particularly with respect to dynamic range. It is also well-known that 35mm motion picture cameras are simple yet durable and time-tested, with less likelihood of failure than video in the field. Arriflex cameras with Zeiss lenses were selected. As with Banff, the size of the cameras made it difficult to obtain normal human interocular distance, so an exaggerated interocular distance of 8 inches was used.

It was further decided to film at a frame rate of 60 frames per second (fps). Such footage could be transferred to video with each film frame corresponding to a single video field (half-frame). The result would have the best qualities of both film and video: it maintains the high dynamic range of film while having the motion smoothness of video (which updates at 60 fps). A sync box made by Cinematography Electronics was used to synchronize both cameras to a single controller, which allowed syncing phase as well as frame rate. The shutters were closed down to 30 degrees, resulting in an exposure time of 1/720 second, fast enough to freeze most everything.
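The combination of a 1 rpm rotation, 60 fps filming, a 60 degree lens, and a 1/720 second exposure implies some simple pan geometry, sketched below. All four input values are from the text of this section; the function and the names of its results are hypothetical.

```python
from fractions import Fraction


def pan_geometry(rpm, fps, fov_deg, exposure_s):
    """Derive per-frame and per-exposure rotation for a steadily panning camera."""
    deg_per_second = Fraction(360 * rpm, 60)
    deg_per_frame = deg_per_second / fps
    return {
        "deg_per_frame": deg_per_frame,
        "frames_per_rev": 360 / deg_per_frame,
        # How far the camera turns while the shutter is open.
        "deg_during_exposure": deg_per_second * exposure_s,
        # How long a fixed point stays inside the horizontal field of view.
        "seconds_in_view": fov_deg / deg_per_second,
    }


geometry = pan_geometry(rpm=1, fps=60, fov_deg=60, exposure_s=Fraction(1, 720))
for name, value in geometry.items():
    print(name, value)
```

With these values the camera turns 1/10 degree per frame (3600 frames per revolution) and only 1/120 degree during each exposure, and a stationary object remains in view for 10 seconds as the frame sweeps past it.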

Daylight-balanced color negative film with an ASA of 50 was used. Since all filming was to take place during daylight, the low ASA coupled with the fast shutter speed wouldn't be a problem, with most filming possible at apertures between F4 and F11.

Using stereoscopic 35mm cameras with high quality lenses and low-speed film, running at 60 fps, and with synchronized shutter and rotational speeds would result in unrivaled fidelity. It would, at the very least, have 5 times the resolution of theatrical 35mm film (twice the spatial and 2.5 times the temporal resolution).

The complete camera rig, including cases, weighed 500 pounds but was built for travel (see Figure 3). All production took place during October 1995, including filming the Yerba Buena Gardens in San Francisco for counterpoint. A pro-DAT audio recorder with a shotgun microphone was used to collect sounds at each site for later mixing into 4 channel rotating sound. Enough stock to film 5 panoramas (10 reels of 400' film) was taken to each site. Miraculously, production stayed on schedule and everything came out. 15
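The film stock figures can be sanity-checked with a short calculation. The sketch assumes standard 4-perforation 35mm film at 16 frames per foot and one reel per camera per panorama; neither assumption is stated in the text.

```python
from fractions import Fraction

# Assumption: standard 4-perforation 35mm film, 16 frames per foot.
FRAMES_PER_FOOT = 16


def reel_run_time_s(reel_length_ft, fps):
    """Running time of one reel, in seconds, at the given frame rate."""
    return Fraction(reel_length_ft * FRAMES_PER_FOOT, fps)


def reels_needed(panoramas, cameras=2, reels_per_camera_per_panorama=1):
    """Reels required, assuming each camera uses one reel per panorama."""
    return panoramas * cameras * reels_per_camera_per_panorama


run_time = reel_run_time_s(400, 60)
print(round(float(run_time), 1))  # 106.7 seconds per reel
print(run_time >= 60)             # True: covers one 60 second revolution
print(reels_needed(5))            # 10
```

Under these assumptions a 400 foot reel at 60 fps runs for a little under two minutes, enough for one full revolution at 1 rpm with some margin, and two cameras filming 5 panoramas account for the 10 reels.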

Figure 3. The "Be Now Here" camera rig on location in Timbuktu.

3.2.2 Playback System
After production, the film was transferred to videotape, edited and mixed with the audio, then transferred to 2 laserdiscs. A simple input pedestal was made to allow site selection. Three scenes from each site were selected and aligned with each other such that perfectly registered time-of-day changes could be experienced within the same location.

Unlike Banff, Be Now Here would retain realtime motion and sacrifice browsability, enabling the user to control place and time but not speed. People movement would appear natural (not the case with See Banff) and coupled audio would be possible. This decision was based on the difference between footage of moving along long pathways (where browsability is desirable) and footage which, literally, goes around in circles.

A 12 by 16 foot highly reflective front projection screen would be used in conjunction with dual polarized video projection. The input pedestal would be strategically placed approximately 14 feet in front of the screen to entice the viewers to stand at the orthoscopically correct spot. The audience would wear inexpensive polarized glasses.
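The orthoscopically correct viewing distance follows from the screen width and the recording FOV: the viewer stands where the screen subtends the same horizontal angle as the camera lens. A minimal sketch, using the 16 foot screen width from this section and the 60 degree horizontal FOV of the camera lenses (the function name is hypothetical):

```python
import math


def orthoscopic_distance(screen_width, horizontal_fov_deg):
    """Viewing distance at which a flat screen of the given width
    subtends horizontal_fov_deg (same units as screen_width)."""
    half_angle = math.radians(horizontal_fov_deg) / 2
    return (screen_width / 2) / math.tan(half_angle)


print(round(orthoscopic_distance(16, 60), 1))  # 13.9
```

The result, about 13.9 feet, agrees with the pedestal placement of approximately 14 feet.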

The obvious way to achieve spatially correspondent playback would be to rotate the projected image around the viewing space, another moving movie. But this proved impractical given the need for wide-angle projection. The solution: rather than rotate the projection around the static audience, to rotate the audience inside a static projection.

A 16 foot diameter rotating platform was used, rotating at 1 rpm in sync with the imagery (see Figure 4). The audience, limited to 10 at a time, stands on it (standing seems more desirable for ambient rather than narrative media). A black tent-like cylindrical structure surrounded most of the viewing space to mask out the non-rotating world.

Figure 4. The "Be Now Here" installation.

The resulting effect is difficult to describe. Most viewers reported that after several seconds it felt like they were still and that the image was rotating around them. This effect is similar to the "moving train illusion," when a train sitting in the station pulls out and observers on the adjacent (non-moving) train believe their train is the moving one. Some viewers reported feeling the rotational force from the turntable, but most did not. (In a NASA design study on space colonies, it was determined that 1 rpm was the maximum rotational rate which would be undetectable by the general population. 16) Though one might argue that psychophysical ambiguity exists between such audio-visual and vestibular cues, one could equally argue that a conventional panning image viewed in a conventional (non-rotating) movie theater produces the same degree of ambiguity between audio-visual and vestibular senses.
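As a rough illustration of how small the rotational force is, the centripetal acceleration at the rim of the platform can be estimated. This is a back-of-the-envelope sketch, not a measurement from the installation; it uses the 16 foot diameter and 1 rpm rate described earlier, and the function name is hypothetical.

```python
import math

FEET_TO_METERS = 0.3048
STANDARD_GRAVITY = 9.80665  # m/s^2


def centripetal_g(radius_ft, rpm):
    """Centripetal acceleration at the given radius, as a fraction of g."""
    omega = 2 * math.pi * rpm / 60          # angular velocity in rad/s
    accel = omega ** 2 * radius_ft * FEET_TO_METERS
    return accel / STANDARD_GRAVITY


# Rim of a 16 foot diameter platform (8 foot radius) turning at 1 rpm.
print(round(centripetal_g(8, 1), 4))  # 0.0027
```

Even at the rim, the estimate is under three thousandths of g, which is at least consistent with most viewers not noticing the force.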

Most everyone reported feeling a strong visceral sense of place. And that's what the installation was about: conveying the feeling of presence by connecting our eyes and ears with the ground.

ACKNOWLEDGMENTS

The author wishes to express thanks and gratitude to the many collaborators 17,18 for See Banff and Be Now Here. See Banff was produced with the Banff Centre for the Arts. Be Now Here was produced for the Center for the Arts Yerba Buena Gardens in San Francisco with special thanks to the UNESCO World Heritage Centre in Paris. Both projects were entirely supported by Interval Research Corporation in Palo Alto.

REFERENCES

1. M. Naimark, "Elements of Realspace Imaging: a Proposed Taxonomy," SPIE Electronic Imaging Proceedings, vol. 1457, pp. 169-179, 1991.

2. T. B. Sheridan, "Musings on Telepresence and Virtual Presence," Presence, vol. 1, no. 1, pp. 120-126, 1992.

3. D. Zeltzer, "Autonomy, Interaction, and Presence," Presence, vol. 1, no. 1, pp. 127-132, 1992.

4. J. Steuer, "Defining Virtual Reality: Dimensions Determining Telepresence," Journal of Communication, vol. 42, no. 4, 1992.

5. T. Hatada, H. Sakata, H. Kusaka, "Psychophysical Analysis of the 'Sensation of Reality' Induced by a Visual Wide-Field Display," SMPTE Journal, vol. 89, pp. 560-569, 1980.

6. C. Hendrix and W. Barfield, "Presence within Virtual Environments as a Function of Visual Display Parameters," Presence, vol. 5, no. 3, pp. 274-289, 1996.

7. E. M. Howlett, "Wide Angle Orthostereo," SPIE Electronic Imaging Proceedings, vol. 1256, 1990.

8. M. Naimark, "Expo '92 Seville," Presence, vol. 1, no. 3, pp. 364-369, 1992.

9. R. Mohl, Cognitive Space in the Interactive Movie Map: An Investigation of Spatial Learning in Virtual Environments, PhD dissertation, Education and Media Technology, M.I.T., 1981.

10. M. Naimark, "Field Recording Studies," Immersed in Technology: Art and Virtual Environments, M. A. Moser, ed., pp. 299-302, MIT Press, Cambridge, 1996.

11. A. Miller, "The Panorama, the Cinema, and the Emergence of the Spectacular," Wide Angle, vol. 18, no. 2, pp. 34-69, 1996.

12. M. Naimark, "Spatial Correspondence in Motion Picture Display," SPIE Proceedings, vol. 462, pp. 78-81, 1984.

13. M. Naimark, "Moving Movie," Aspen Center for the Arts, 1980; "Movie Room," Center for Advanced Visual Studies, M.I.T., 1980; "Displacements," San Francisco Museum of Modern Art, 1984.

14. See: http://www.unesco.org/whc/list.htm .

15. See: http://www.naimark.net/writing/trips/bnhtrip.html.

16. R. D. Johnson and C. Holbrow (ed.), Space Settlements: A Design Study, NASA SP-413, p. 22, 1977.

17. See: http://www.naimark.net/projects/banff.html .

18. See: http://www.naimark.net/projects/benowhere.html .