A version of this paper is published in the proceedings of ISEA 2002,
the International Symposium on Electronic Art, Nagoya, Japan.
VR Webcams: Time Artifacts as Positive Features
Michael Naimark
www.naimark.net
[Image: a spatially contiguous triptych of three different times of day in Timbuktu]
Abstract
"Virtual Reality" and "webcams" are currently incompatible
suppositions, placing sensory richness in opposition to liveness. Large
immersive images, sent through a "narrow pipe" such as today's
Internet, must "accumulate" over time. Time artifacts result,
since not everything can be transmitted at the same time.
Such time artifacts were explored using visual material from a previous
art installation, filmed with a custom-built camera system, where such
factors as frame rate, lens angles, and panning speed were known. Though
the footage was pre-recorded, it approximated what a live "VR webcam"
could be.
Scenes of the same places at different times of day were combined in
various ways to simulate "narrow pipe" time artifacts. Studies
produced from this footage suggest that time artifacts, while reducing
the verisimilitude of the imagery, can increase its density or activity.
In such "hyper-real" images, "more" can "happen."
A "VR Webcam" is proposed.
Introduction
In 1560, Flemish painter Pieter Bruegel the Elder painted "Children's
Games," depicting, like much of Bruegel's work, everyday life
[1]. In it, over a hundred children can be seen actively playing dozens
of games in a village square. Though such a scene could have taken
place, we know it didn't. Too much is happening at once, and all the
action is perfectly composed. Not even Cecil B. DeMille could have staged
such a scene with a set and live actors. We assume that "Children's
Games" is a realistic representation of an unrealistic event, an
aggregate composition based on an "accumulation" of instances
in Bruegel's memory or imagination.
In 1979, American cartoonist Robert Crumb drew "A
Short History of America," depicting, in 12 frames, the progression
of a single place from an untouched meadowland to a frontier village
to an American street corner complete with convenience store and a clutter
of power lines [2]. Even as a cartoon, the details are comprehensive.
If a camera had existed in the early days of colonial America and had
been positioned motionless in the same place for 200 years, "A
Short History of America" could have been a time-lapse film.
Bruegel's painting and Crumb's cartoon are both place-based works depicting
"accumulated" views: Crumb's over time, in a progression of
frames, and Bruegel's all at once. In theory, the accumulated elements
could be stored as separate data and added or deleted interactively.
This class of "hyper-real" imagery may be a model for cameras
on the Internet.
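Storing accumulated elements as separate data can be sketched as a fixed background plate plus a set of independently removable layers. This is a minimal illustration under stated assumptions, not a description of any existing system: the class names, fields, and sample element names are hypothetical, and real image data is reduced to labels.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One separately stored event, e.g. a child rolling a hoop (hypothetical)."""
    name: str
    recorded_min: int  # minutes since midnight when the element was captured


@dataclass
class AccumulatedScene:
    """A fixed background plate plus elements that can be added or deleted."""
    background: str
    elements: dict = field(default_factory=dict)

    def add(self, element: Element) -> None:
        self.elements[element.name] = element

    def remove(self, name: str) -> None:
        # Deleting an element that is not present is a no-op, not an error
        self.elements.pop(name, None)

    def layers(self) -> list:
        """Back-to-front draw order: background first, then elements in the order added."""
        return [self.background] + list(self.elements)
```

Because each element keeps its own record, under- or over-populating the scene is just a matter of which layers are drawn.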
The VR/Webcam Dilemma
The dream of "virtual reality" and the reality of "webcams"
could hardly be further apart. We associate VR with multi-sensory,
high-bandwidth, immersive, interactive experiences, while webcams are
associated with postage-stamp size images that rarely update faster
than once per second. While the attraction of VR is sensory richness,
the attraction of webcams is liveness.
This dilemma exists for several reasons, such as the need for rich,
immersive source material and for immersive display technology,
but the most prominent is the narrow pipe of the Internet.
Consider that a good home Internet connection (e.g., DSL
or cable modem) rarely exceeds 1 megabit per second. Uncompressed
high definition television requires roughly one thousand times that, about 1 gigabit per
second, and Imax requires approximately ten times more than HDTV. The bottleneck
for an immersive "VR-like" webcam experience is the narrow
pipe of the Internet.
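The size of the gap can be made concrete with a little arithmetic. The sketch below uses only the round figures given above (about 1 Mbit/s for a home link, about 1 Gbit/s for uncompressed HDTV, Imax about ten times HDTV). It ignores compression and protocol overhead, so it shows orders of magnitude only.

```python
# Round figures from the text; order-of-magnitude only.
HOME_LINK_BPS = 1e6          # "rarely higher than 1 megabit per second"
HDTV_BPS = 1e9               # uncompressed HDTV, about 1 gigabit per second
IMAX_BPS = 10 * HDTV_BPS     # Imax, roughly ten times HDTV


def transmit_seconds(source_bps, duration_s, link_bps=HOME_LINK_BPS):
    """Seconds needed to send duration_s of uncompressed source over the link."""
    return source_bps * duration_s / link_bps


if __name__ == "__main__":
    print(transmit_seconds(HDTV_BPS, 1.0))   # 1000.0 (about 17 minutes)
    print(transmit_seconds(IMAX_BPS, 1.0))   # 10000.0 (nearly 3 hours)
```

At these rates, one second of uncompressed HDTV takes about 17 minutes to send over a home link, and one second of Imax takes nearly three hours.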
Even with a narrow pipe, it is possible to use a great deal of inexpensive
computational horsepower and digital memory at both ends. For example,
one could build an immersive camera system (e.g., high definition, stereoscopic,
panoramic) with a local host computer which stores short sequences and
transmits them slowly to remote destinations, where they are restored.
Since such a system cannot operate in real time, decisions about what
to transmit will be necessary. These decisions can be made at a finer granularity
than the whole frame. Consider, for example, having the ability
to transmit only the "interesting" elements from a common street
scene (lovers walking hand-in-hand, a dog jumping in the air, a bird
in flight), even if these events are not simultaneous.
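Choosing what to send at sub-frame granularity resembles a budgeted selection problem. The following is a minimal sketch. It assumes each candidate element has already been cut out of the frame, sized in bits, and given an "interest" score by some detector; the scores, sizes, and labels are all hypothetical. The function greedily sends the elements with the highest interest per bit until the link budget is used up. This is a common heuristic, not an optimal knapsack solution.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str       # hypothetical description of a cut-out element
    interest: float  # hypothetical score from some detector; higher is better
    bits: int        # size of the element once encoded at the local host


def choose_elements(candidates, budget_bits):
    """Greedily pick elements with the best interest-per-bit that fit the budget."""
    ranked = sorted(candidates, key=lambda c: c.interest / c.bits, reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if used + c.bits <= budget_bits:
            chosen.append(c.label)
            used += c.bits
    return chosen


if __name__ == "__main__":
    street = [
        Candidate("lovers walking", 5.0, 30_000_000),
        Candidate("dog jumping", 4.0, 10_000_000),
        Candidate("bird in flight", 2.5, 5_000_000),
        Candidate("parked car", 0.5, 40_000_000),
    ]
    # One minute of a 1 Mbit/s link
    print(choose_elements(street, 60_000_000))
```

With one minute of a 1 Mbit/s link (60,000,000 bits), the bird, the dog, and the lovers fit, and the parked car is skipped.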
Now imagine having a library of such events. One could under-populate
or over-populate the scene as one desires. (Imagine an interactive Bruegel!)
But the scene will never look perfect, in the sense of credible verisimilitude,
because of time artifacts. Events occurring even a few minutes apart
will often exhibit time artifacts because the sunlight changes. Such
artifacts are not the sort easily fixed in Photoshop; they require semantic
knowledge of the scene and its events. Indeed, transforming
an element recorded at night to appear during the day may never be truly
possible.
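One way to estimate whether events drawn from such a library will clash is to look at how far apart in time they were recorded. The Sun's hour angle advances 15 degrees per hour (0.25 degree per minute), so a separation of a few minutes already means a measurable change. The sketch below treats that spread as a rough proxy for lighting mismatch. Actual shadow direction also depends on latitude, season, and cloud cover. The 1-degree tolerance and the event times are hypothetical, and no angle measure can capture the night-to-day problem.

```python
# The Sun's hour angle advances 360 degrees per day, i.e. 0.25 degree per minute.
SUN_DEG_PER_MIN = 360 / (24 * 60)


def hour_angle_spread(times_min):
    """Spread of solar hour angle (degrees) across events recorded at the given
    minutes since midnight. A rough proxy for lighting mismatch only."""
    return (max(times_min) - min(times_min)) * SUN_DEG_PER_MIN


def select_events(library, wanted, tolerance_deg=1.0):
    """Pick events by name from {name: minute_recorded}.

    Returns (spread_deg, likely_mismatch), using a hypothetical tolerance."""
    times = [library[name] for name in wanted]
    spread = hour_angle_spread(times)
    return spread, spread > tolerance_deg
```

For example, events recorded three minutes apart give a spread of 0.75 degree and pass, while events forty minutes apart give 10 degrees and are flagged.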
Studies
What would such time artifacts look like? Will an image retain its wholeness
as a "hyper-representation?" Will the place represented retain
its "place-ness?"
These questions were addressed in a series of studies made from pre-existing
footage from one of my earlier installations, "Be
Now Here" [3]. Be Now Here was filmed in four "endangered"
cities on the UNESCO World Heritage list using a custom camera configuration.
It consisted of two synchronized motion picture cameras side by side
(for stereopsis), 60-degree (horizontal) wide-angle lenses (for immersion),
and a precision motorized tripod that rotated once per minute (for panoramics).
In the final installation, visitors wore inexpensive polarized glasses
for 3D and stood on a floor that rotated slowly in sync with the
image, resulting in the illusion that the movie was rotating around
the visitors. Four-channel sound was composed from asynchronous recordings
made at each location. (Notably, artificially accumulating sound elements
into a single composition often causes no loss of credibility.)
Five times of day were recorded at each of the four locations, as well as
in San Francisco.
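The camera parameters above fix the geometry of the footage. The sketch below assumes the tripod pans at a constant rate toward increasing azimuth and that each take's starting azimuth is known; both are assumptions made for illustration. One revolution per minute is 6 degrees per second, so a stationary point takes 10 seconds to cross a 60-degree frame. That 10-second figure is the offset used in the triptych study.

```python
PAN_PERIOD_S = 60.0     # tripod rotates once per minute
LENS_H_FOV_DEG = 60.0   # horizontal angle of view of each lens

PAN_RATE_DEG_S = 360.0 / PAN_PERIOD_S               # 6 degrees per second
FRAME_CROSSING_S = LENS_H_FOV_DEG / PAN_RATE_DEG_S  # 10 s for a fixed point to cross the frame


def view_center(t_s, start_deg=0.0):
    """Azimuth (0-360 degrees) at the centre of the frame t_s seconds into a take.

    Assumes a constant pan toward increasing azimuth and a known start azimuth."""
    return (start_deg + PAN_RATE_DEG_S * t_s) % 360.0


def view_edges(t_s, start_deg=0.0):
    """Left and right edge azimuths of the frame at t_s."""
    c = view_center(t_s, start_deg)
    half = LENS_H_FOV_DEG / 2
    return ((c - half) % 360.0, (c + half) % 360.0)
```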
Three studies were produced from the Be Now Here material to explore
time artifacts [4]. The first study involved "match-cutting"
three different times of day as the camera panned, starting with
one cut per second and increasing to faster rates. The results are ambiguous,
depending on what the viewer fixates on. When one fixates on transient
elements, such as people walking, the results are jarring. But when
one fixates on the non-transient elements, such as buildings, whose
color and shadows transform but remain stationary, the results appear
smooth.
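Every take pans at the same rate. If the takes were started at the same azimuth (an assumption), cutting from one time of day to another at the same moment into each take lands on the same view direction, and that is what makes the match-cut possible. The sketch below generates a hypothetical cut schedule that cycles through three takes. It starts at one cut per second and shrinks each interval by a fixed factor. The acceleration curve and the 24 frames-per-second floor are assumptions, not the values used in the study.

```python
def cut_schedule(duration_s, start_interval_s=1.0, shrink=0.8, min_interval_s=1 / 24):
    """Return (cut_time_s, take_index) pairs for match-cuts among three takes.

    The first cut interval is start_interval_s; each later interval is `shrink`
    times the previous one (hypothetical acceleration), never shorter than one
    film frame at an assumed 24 frames per second. Because all takes pan at the
    same rate, cutting at time t into each take keeps the view direction matched."""
    cuts = []
    t, interval, take = 0.0, start_interval_s, 0
    while t < duration_s:
        cuts.append((t, take))
        take = (take + 1) % 3          # cycle: time of day 0 -> 1 -> 2 -> 0 ...
        t += interval
        interval = max(interval * shrink, min_interval_s)
    return cuts
```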
The second study required only two
frames from the same location, with the camera pointing in the same
direction, at different times of day. A small circular mask was made
in Photoshop that allowed a portion of one image to appear through the
other. The mask could be moved in real time. The result was like an
interactive "hole in time," with all non-transient elements
(trees, buildings, etc.) staying perfectly registered. This simple effect
seemed magical to many viewers, who thought much more was occurring.
Anyone can replicate this effect with a camera, a tripod, and Photoshop.
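The "hole in time" can also be reproduced in code rather than in Photoshop. This is a minimal NumPy sketch that assumes two frames of identical size shot from the same tripod position. Pixels inside a circle come from the second frame; everything else comes from the first. To move the hole, call the function again with a new centre (`cx`, `cy`). The function name and parameters are illustrative.

```python
import numpy as np


def hole_in_time(frame_a, frame_b, cx, cy, radius):
    """Show frame_b through a circular hole in frame_a.

    Both frames must have the same shape, (H, W) or (H, W, C), and come from
    the same camera position so that static elements stay registered."""
    if frame_a.shape != frame_b.shape:
        raise ValueError("frames must have identical shapes")
    h, w = frame_a.shape[:2]
    yy, xx = np.ogrid[:h, :w]                          # row and column index grids
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    if frame_a.ndim == 3:
        inside = inside[:, :, None]                    # broadcast mask over color channels
    return np.where(inside, frame_b, frame_a)
```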
The third study was produced
by projecting three images side by side as a triptych. Given the properties
of the footage, several experiments were made. The most obvious was
to simply offset the same footage by ten seconds between adjacent screens,
recreating a spatially seamless 180-degree scene of the same place at
almost the same time. With no
transient elements, the scene looked virtually perfect, since the
sun and clouds did not change enough in ten seconds to be noticeable.
With transient elements in the scene, things became more complex. When
the scene contained slow, prominent moving elements (such as a camel
caravan in Timbuktu), the ten-second offset was enough to misalign the
moving elements between screens. When the scene contained fast,
prominent moving elements (such as a security truck in Jerusalem),
the repeated motion of the same element on all three screens was obvious.
When motion occurred at the edge of the frame (such as a little boy
standing still, then walking away at the instant his image exited the
frame), this too was very obvious.
But when many non-prominent transient elements appeared in the frame (such
as a crowd of people), the repetition on all three screens, offset by
ten seconds, was difficult to detect.
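The ten-second offset follows from the pan rate. At 6 degrees per second, the view one 60-degree screen to the side was filmed 10 seconds earlier or later. The sketch below maps a single take onto the three screens. It assumes the pan runs toward increasing azimuth, so the right-hand screen plays later footage, and that the take is long enough to cover the extra 20 seconds. The function names are hypothetical.

```python
PAN_RATE_DEG_S = 6.0                          # one revolution per minute
SCREEN_FOV_DEG = 60.0                         # each screen shows one 60-degree frame
OFFSET_S = SCREEN_FOV_DEG / PAN_RATE_DEG_S    # 10 s between adjacent screens


def triptych_times(t_s, screens=3):
    """Time into the single take to show on each screen, left to right.

    Assumes the pan runs toward increasing azimuth, so the view one screen
    to the right was filmed OFFSET_S seconds later."""
    return [t_s + k * OFFSET_S for k in range(screens)]


def triptych_azimuths(t_s, start_deg=0.0, screens=3):
    """Centre azimuth of each screen; adjacent screens differ by one screen width."""
    return [(start_deg + PAN_RATE_DEG_S * tt) % 360.0 for tt in triptych_times(t_s, screens)]
```

All three screens advance at the same rate, so the views stay edge to edge as the 180-degree panorama turns. Anything that moves within those 10 seconds produces the misalignment and repetition described above.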
Another three-screen experiment displayed the same place, spatially
synchronized, but at very different times of day, e.g., dawn, mid-day,
and early evening. In both a rural example (Angkor Wat) and an urban
one (Dubrovnik), time artifacts were obvious: shadows fell in different
directions, the sky and clouds changed, and the color temperature shifted.
Yet the triptych clearly still represented the same place:
"place-ness" appeared to be retained.
A final three-screen experiment displayed the same time, temporally
synchronized, but in different places. Sunrise
sequences were synchronized such that the sun appeared to move smoothly
across the frame in Jerusalem, then continued moving across the next
frame in Dubrovnik, then again in the next frame in Timbuktu. This sort
of continuity is difficult to describe. The triptych clearly showed
a continuity, but one more abstract than simple
spatial continuity. Some observers noticed the continuity
but couldn't identify what it was.
The Grounded VR Webcam
What makes such spatially coherent, accumulated images possible, in
the end, is a grounded camera. Being physically anchored to a particular
location, it enables perfect spatial registration of different image
elements. This camera can be big, with immersive optics and robotic
movement, and it can employ powerful computing. It can also be connected
to the Internet via a very high-bandwidth connection (allowing "wide
pipe" alternatives for destinations that also have such connectivity).
It could also serve as a local head-end for smaller wireless cameras.
Such an integrated system would be ideal not only for accumulated "hyper-images,"
but possibly for accumulated environmental data as well.
While much of the high-tech community is focusing on wireless, it is
also accepting the compromises such low-bandwidth access entails, such
as the loss of large-scale immersion. Such a loss too often strips imagery
of "sense of place." Grounded "VR webcams" offer
an alternative and complementary way of making and experiencing images,
particularly place-based ones.
References
[1] http://www.artchive.com/artchive/B/bruegel/bruegel_games.jpg.html
[2] http://www.crumbmuseum.com/history2.html
[3] http://www.naimark.net/projects/benowhere.html
[4] A web-based version of this publication, including video clips of
the studies, can be found at: http://www.naimark.net/writing/vrwebcam.html
The author gratefully acknowledges the support of the Artist Residency
Program of the Institute of Advanced
Media Arts and Sciences (IAMAS), Ogaki, Japan.