Movie Tagger

a method and system for parsing and richly tagging

every movie ever made

Michael Naimark, Research Associate Professor

Steve Anderson, Assistant Professor

Maya Churi, Graduate Student

Perry Hoberman, Research Associate Professor

Andres Kratky, Adjunct Professor and Visiting Scholar

Erik Loyer, Former Adjunct Professor and Consultant

Interactive Media Division

School of Cinematic Arts

University of Southern California

2007 USC Provost's Seed Grant Proposal for Teaching and Technology
2007 USC Stevens Institute for Innovation description

Abstract

"Movie Tagger" is conceived as an easy-to-use online system that allows a community of users to parse and add relevant keywords ("tags") to movies - scene by scene, shot by shot, and frame by frame. The tags can be unlimited in number but entirely searchable. They can be about the performers, action, dialogue, cinematics, mood, environment, or anything else.

The system is based on unique collaborative timeline and custom logging tools. The collaborative timeline tool is an interactive graphical interface for video and any other one-dimensional data sets. For any point or any interval on the timeline, every associated tag can be seen both as a list and as a graphic, displayed as nested brackets of in and out points. Tags can be ranked, searched, and filtered. In a wiki style, tags can be entered (and modified and contested) by the user community, along with attribution and discussion. The custom logging tool allows tags to be easily entered, along with in and out points, while watching a movie DVD on a personal computer. This metadata (without any image or sound) is uploaded to our shared database.

The Movie Tagger system can also be used for video other than feature length movies, for example, for such video databases as the USC Shoah Foundation Institute for Visual History and Education Archive. It can also be used for other one-dimensional data sets, such as historical timelines.

We conjecture that, because every movie and every scene has its scholars and fans, a massive online community will embrace Movie Tagger.

1. Introduction

1.1 Background

For the 2007 Academy Awards, Apple Computer produced a television ad for its iPhone called "Hello," which "remixed" 30 short clips from popular movies of well-known performers saying "hello" into a telephone, all in under 22 seconds. How did they find these clips? Through brute force and a lot of manual searching, since no database exists for popular movies with such shot-by-shot detail.

Artists also have the brute force and wherewithal to make such things, e.g., Christian Marclay's "Video Quartet" (4-screen video installation with short clips from over 100 movies) and Jennifer and Kevin McCoy's "Every Shot / Every Episode" (10,000 organized shots from the Starsky and Hutch T.V. series). And though several attempts are currently underway to automate such endeavors, there is an alternative approach which may be faster, more accurate, and richer.

Fans.

Every movie has fans, lots of fans. Movie Tagger taps this enormous, distributed resource. Looking at it from another angle, every "type" of shot has fans as well, e.g., people who know every Citroen car crash, every close up of toes, every compensated zoom, and every POV animal shot.

We believe that given the right toolset, a massive text-based descriptive database of all shots in all movies can emerge. Such a toolset would be part of a long, deep education movement based on "learning by doing." This movement (without splitting hairs) includes the Constructionism of Piaget and Papert, the One Laptop per Child initiative of MIT and the UN, the "DIY" (do it yourself) and "Make" communities, as well as bloggers, MySpace members, and YouTube contributors.

1.2 Rich Tags

Something magical happened to Flickr, the highly successful photo sharing website, around its second year. Initially, in 2004, they recommended "You can give your photos a 'tag', which is like a keyword or category label." The first year's worth of tags were sparse and basic, similar in style to those used in the National Geographic slide library.

National Geographic, it turns out, typically assigns 4-6 keywords per slide. A directory is used to agree on terms ("bluffs" see "cliffs"). Young, entry-level workers assign keywords to new photos. It's relatively easy to agree on basic common keywords.

Similarly, the Google Image Labeler pairs two people in real time to agree on tags for random images. It encourages the most common and basic tags.

But what about assigning a tenth, or twentieth, tag to the same image, after all of the most obvious tags have already been assigned? This is what happened, in spirit at least, to Flickr around its second year (Flickr allows up to 70 tags per photo). Tags started getting deeper and more abstract: "blue," "love," "cats in sinks," "circles in squares." This level of expression is highly creative and uniquely human.

1.3 Organizing Tags

One might have a tendency to make specific categories for tags. For example, in the case of movies, we can confidently say that some tags will be about the actors, some about the cinematic elements, some will be dialogue, etc.

Historically, two "religious denominations" have grown up around organizing keywords. One denomination believes that categories give order and make inputting and sorting easier. The other denomination worries less about a priori organization and relies on the power of search. This second approach was one of the basic tenets of Apple's HyperCard (which begat Sherlock which begat Spotlight). Bill Atkinson, HyperCard's designer, even advocated assigning arbitrary numbers to physical items such as books, then indexing each of them to a HyperCard loaded with keywords. More recently, the word "Folksonomy" (created in 2002) has become popular to emphasize a "user generated taxonomy."

We've adopted this second approach. We think it will encourage creativity, and we're confident that our community will normalize their usages and conventions organically.

2. Basic System Disclosure

Our grandest mission is to tap a massive community of Internet users to build and refine a massive database quickly, efficiently, and enjoyably. Examples already exist: Wikipedia, Flickr, Del.icio.us, Google Earth. We intend to do this for cinema.

The system is based on two components:

2.1 Collaborative Timeline Tool

The Collaborative Timeline Tool is conceived as an interactive graphical interface for video and any other one-dimensional data sets. For any point or any interval on the timeline, every associated tag can be seen both as a list and as a graphic, displayed as nested brackets of in and out points. Tags can be ranked, searched, and filtered. In a wiki style, tags can be entered (and modified and contested) by the user community, along with attribution and discussion.
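The point and interval lookups described above, and the nested-bracket display, can be illustrated with a minimal Python sketch. This is not the proposed Flash/PHP implementation; the names `Tag`, `tags_at`, `tags_overlapping`, and `bracket_rows` are hypothetical, and in and out points are assumed to be stored as inclusive frame numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One community-entered tag (hypothetical record layout)."""
    text: str
    in_frame: int    # first frame the tag applies to (inclusive)
    out_frame: int   # last frame the tag applies to (inclusive)
    author: str = "anonymous"


def tags_at(tags, frame):
    """Return every tag whose in/out range contains the given frame."""
    return [t for t in tags if t.in_frame <= frame <= t.out_frame]


def tags_overlapping(tags, start, end):
    """Return every tag whose range overlaps the interval [start, end]."""
    return [t for t in tags if t.in_frame <= end and t.out_frame >= start]


def bracket_rows(tags):
    """Assign each tag a display row so that overlapping brackets never
    share a row. Tags are placed in order of in point, longest first,
    so shorter tags end up drawn inside longer ones."""
    rows = []        # rows[i] holds the tags already placed in row i
    placement = {}   # tag -> row index
    ordered = sorted(tags, key=lambda t: (t.in_frame, t.in_frame - t.out_frame))
    for tag in ordered:
        for i, row in enumerate(rows):
            clear = all(tag.in_frame > o.out_frame or tag.out_frame < o.in_frame
                        for o in row)
            if clear:
                row.append(tag)
                placement[tag] = i
                break
        else:
            rows.append([tag])
            placement[tag] = len(rows) - 1
    return placement
```

A linear scan is enough for a sketch; a database holding many movies would more likely use an indexed query or an interval tree for the same lookups.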

Though several interactive timelines have begun to appear on the Internet, none currently combines open collaborative tagging with a timeline. This is our unique approach. Our preferred embodiment is designed from scratch, building on related interfaces and state-of-the-art technologies: Flash and Shockwave for the front end, and PHP and MySQL for an Internet server-based back end.

2.2 Logging Tool

While it's entirely possible for an impassioned cineaste to log an entire movie, shot by shot, using a pencil and clipboard while watching a DVD on a home television, it's hardly ideal. Our Logging Tool is conceived as an easy-to-use software tool that's integrated with video playback for entering tags and in / out points. It allows easy "scrubbing" through the movie to find particular in and out points, like a simplified version of a digital video editing logging process. A tag directory is always available to help keep tags in common when desired.
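One way such a tag directory might behave is sketched below in Python. It is an illustration under stated assumptions, not the proposed design: the class `TagDirectory` and its methods are hypothetical, and the "see" redirect borrows the "bluffs" see "cliffs" convention mentioned earlier. Free-form tags are still accepted; the directory only suggests and redirects.

```python
class TagDirectory:
    """Hypothetical shared directory of tags already in use."""

    def __init__(self):
        self.counts = {}   # canonical tag -> number of times it was used
        self.see = {}      # variant spelling or synonym -> preferred tag

    def add_redirect(self, variant, preferred):
        """Register a "variant: see preferred" entry."""
        self.see[variant.lower()] = preferred.lower()

    def canonical(self, tag):
        """Normalize case and spacing, then follow a redirect if one exists."""
        tag = " ".join(tag.lower().split())
        return self.see.get(tag, tag)

    def record(self, tag):
        """Count one use of a tag and return the form that was stored."""
        key = self.canonical(tag)
        self.counts[key] = self.counts.get(key, 0) + 1
        return key

    def suggest(self, prefix, limit=5):
        """Offer existing tags that start with what the user has typed,
        most frequently used first."""
        prefix = prefix.lower()
        matches = [t for t in self.counts if t.startswith(prefix)]
        matches.sort(key=lambda t: (-self.counts[t], t))
        return matches[:limit]
```

Because redirects are optional, a tagger who wants a novel or more abstract tag is never forced into an existing term, which keeps the approach consistent with the search-over-categories philosophy adopted above.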

A preferred embodiment relies on Internet-enabled computers, particularly laptops, with built-in DVD drives, using the computer's internal control code to control its DVD player application. When a movie DVD is loaded, its current Collaborative Timeline is downloaded from our server. If no data yet exists for that particular movie, a new, empty Collaborative Timeline appears. (This is analogous to what happens when a music CD is loaded and iTunes displays its track list.)
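The download-or-create behavior can be sketched as follows. This is a simplified illustration: a plain Python dictionary stands in for the shared server database, `disc_id` is a hypothetical opaque identifier for the loaded disc (how it would be derived is not specified here), and `load_timeline` is an invented name.

```python
def load_timeline(server_db, disc_id, title=None):
    """Return the shared timeline for a disc, creating an empty one
    the first time that disc is seen.

    server_db: dict standing in for the shared database (assumption)
    disc_id:   hypothetical opaque identifier for the loaded DVD
    """
    timeline = server_db.get(disc_id)
    if timeline is None:
        # No one has tagged this movie yet: start a new, empty timeline.
        timeline = {"disc_id": disc_id, "title": title, "tags": []}
        server_db[disc_id] = timeline
    return timeline
```

Only metadata is stored in the timeline record; no image or sound from the disc is involved.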

Two noteworthy priorities exist here. First, parsing and tagging must be frame accurate. The "shower scene" in Psycho, for example, has 90 separate shots in 45 seconds, a pace not uncommon throughout entire movies today (e.g., The Bourne Supremacy). Second, our system is for creating and sharing metadata, not audiovisual material directly. Our system will not require storing or recording any copyrighted movie content.
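To see why frame accuracy matters, consider the Psycho figures above: 90 shots in 45 seconds averages half a second per shot, so in and out points rounded to the nearest second could not separate adjacent shots. The Python sketch below works through that arithmetic and shows one possible frame-number and timecode conversion. It assumes an integer frame rate of 24 frames per second; actual DVD video rates differ (NTSC video, for instance, runs at about 29.97 frames per second), and the function names are hypothetical.

```python
def average_shot_frames(shots, seconds, fps=24):
    """Average shot length in frames for a sequence."""
    return seconds * fps / shots


def frame_to_timecode(frame, fps=24):
    """Convert a frame count to HH:MM:SS:FF (non-drop-frame, integer fps)."""
    total_seconds, ff = divmod(frame, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def timecode_to_frame(timecode, fps=24):
    """Convert HH:MM:SS:FF back to a frame count."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


# 90 shots in 45 seconds at an assumed 24 fps:
print(average_shot_frames(90, 45))   # 12.0 frames per shot on average
print(frame_to_timecode(1091))       # 00:00:45:11
```

Storing in and out points as frame counts, and converting to timecode only for display, avoids rounding errors when tags are compared or nested.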

3. Optional Features and Future Applications

3.1 Use for other video databases

Movie Tagger can be used for any video database, not just commercial movies. For example, the USC Shoah Foundation Institute for Visual History and Education Archive consists of nearly 52,000 testimonies of survivors and other witnesses of the Holocaust, almost 120,000 hours of video. It is the largest video database in the world. The Collaborative Timeline Tool can provide an intuitive front-end and the Logging Tool can open up the database to additional annotation.

3.2 Use for other one-dimensional data sets

Finally, we believe that the Movie Tagger system can add unique value to other one-dimensional data sets, such as historical timelines. We can envision that a Wikipedia-style "Timeline of World History" may result.

4. Some Conjectures

- There will be lots of people interested in tagging particular movies in full and tagging particular types of shots across movies just to do it, with no particular end use in mind. We have a serious joke: if we make Movie Tagger right and make it freely accessible, 98% of every movie ever made will be parsed and tagged in the first two months.

- One obvious end use is scholarship. Another obvious end use is visualization, both for research and for art. An amusing (and probably spectacular) end use is watching the timeline scroll and display tags while watching a DVD movie in real time.

- An inevitable end use is for remix, like the iPhone ad. On the one hand, we're carefully steering away from actually digitizing images and sound: this is a collection of community-made metadata. On the other hand, we're mindful that such a massive and powerful database is a proactive step toward a world where all movies are readily available for reuse when no one objects.
