Thumb-Frame Generator

“Thumb-Frame Generator”
A Motion Picture Logging Tool

Michael Naimark

michael@naimark.net

Abstract

“Thumb-Frame Generator” is a motion picture logging tool that adds “thumb-frames,” thumbnail images of motion picture frames, to the text keywords, or “tags,” stored in motion picture databases. These databases, such as USC’s “Movie Tagger,” may be shared and server based, with the intention of crowd-sourcing the massive task of parsing and richly tagging every movie ever made. Thumb-Frame Generator leverages the US courts’ 2003 decision that thumbnail images are considered “fair use” for Internet search engines. Two methods, each with a preferred embodiment, are disclosed: one relying on incremental selection of thumb-frames and the other relying on batch processing entire motion picture sequences, converting every consecutive frame – literally hundreds of thousands – into thumb-frames. Additional enhancements and features are described.

1. Introduction

1.1 “Movie Tagger”

“Movie Tagger: a method and system for parsing and richly tagging every movie ever made” is an early-stage project which has been on the books at the USC Stevens Center for Innovation since May 2007 (1). Movie Tagger is conceived as “an easy-to-use online system that allows a community of users to parse and add relevant keywords (“tags”) to movies–scene by scene, shot by shot, and frame by frame–with a custom logging tool. The tags can be about the performers, action, dialogue, cinematics, mood, environment, or anything else. They can be unlimited in number. This metadata (without any image or sound) is uploaded to a shared database, where it can be used to rank, search, filter, and visualize tags through a unique graphical timeline interface.”

The original idea was to use DVD movies as the source material, and with a custom logging tool, to only store in-points, out-points (both time-code-style numbers), and tags (text). Hence the metadata would be alphanumeric and very lean, and would generally not make much sense or have much utility without being used with the actual DVD movie.

The first disclosure of Movie Tagger was made in an email from Michael Naimark to USC faculty members Andreas Kratky, Steve Anderson, and Scott Fisher on 4 February 2007 (Section 6 below). No patent applications have been filed to date. Movie Tagger may be the first disclosure of using “crowd-sourcing” to undertake this massive task.

1.2 Thumbnail Images and Fair Use

Thumbnails are defined by Wikipedia as “reduced-size versions of pictures, used to help in recognizing and organizing them, serving the same role for images as a normal text index does for words. In the age of digital images, visual search engines and image-organizing programs normally use thumbnails, as do most modern operating systems or desktop environments” (2).

The US courts have decided that thumbnails do not infringe copyright. Also from Wikipedia: “In 2002, the court in the US case Kelly v. Arriba Soft Corporation ruled that it was “fair use” for Internet search engines to use thumbnail images to help web users to find what they seek” (3). Fair use is determined by an analysis of the four factors stated in section 107 of the United States Copyright Act (4). The analysis in the revised Ninth Circuit opinion of July 2003 for the Kelly v. Arriba Soft case found in favor of fair use, in particular “because the images were not being sold as pictures but rather were to facilitate the identification of the images in the search engine” and because, if “the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her.”

The size limits, i.e., what constitutes “reduced size,” were not determined in this case.

1.3 “Thumb-Frames”

Consider thumbnails of motion picture frames, which one might call “thumb-frames.” A typical feature-length film is 100 minutes in length; at 24 frames per second, it consists of 144,000 frames. If thumb-frames are made for fair use, how many would constitute “as much as necessary”?

Movie Tagger is based on the belief that single-frame accuracy is an absolute requirement for parsing and tagging. Scholars and fans of cinema share this belief as well. Consider, for example, that the “shower scene” in Psycho has 90 separate shots in 45 seconds, a pace not uncommon throughout entire movies today. Hence any logging tool for Movie Tagger must also be frame accurate.

It therefore makes sense that the answer to how many frames are “necessary” may be “all,” or potentially all. From a pedagogical point of view, one might say that “all frames are taggably equal.”

The actual numbers, in terms of computer storage and access, are not daunting. Thumbnail images on Google Image Search are typically on the order of 120 by 150 pixels and stored as 6 KB jpg images. A 100-minute movie in which each frame is made into an individual thumb-frame would require a total storage of 864 MB, less than a gigabyte.
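The arithmetic behind these figures can be checked with a short sketch. The inputs are the numbers quoted above (100 minutes, 24 frames per second, 6 KB per thumbnail); the function name is illustrative only, and 1 MB is taken as 1000 KB.

```python
# Back-of-the-envelope storage estimate for one movie's thumb-frames.
# The function name and defaults are illustrative; the defaults are the
# figures quoted in the text (24 fps, 6 KB per thumbnail).

def thumb_frame_storage(minutes, fps=24, kb_per_thumb=6):
    """Return (frame_count, total_megabytes), taking 1 MB = 1000 KB."""
    frames = minutes * 60 * fps
    total_mb = frames * kb_per_thumb / 1000
    return frames, total_mb


frames, megabytes = thumb_frame_storage(100)
print(frames, megabytes)  # 144000 864.0
```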

2. Description

2.1 Basics

Thumb-Frame Generator is an invention that converts motion picture frames into reduced-size images for the purpose of recognizing and organizing elements of the motion picture, primarily along the axis of time. These thumb-frames are used to help visually mark and index an “in-point” and an “out-point” associated with every text tag. They are generated and uploaded to a shared server and indexed by their corresponding time-code style index number.
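As a minimal sketch of the relationship just described, a tag can be modeled as a text label plus an in-point and an out-point, and the thumb-frames needed to mark a set of tags are simply the frames at those points. The class and function names, and the frame numbers in the example, are hypothetical and not part of Movie Tagger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A text tag bounded by an in-point and an out-point (hypothetical model)."""
    text: str
    in_point: int   # absolute frame index of the first tagged frame
    out_point: int  # absolute frame index of the last tagged frame


def thumbs_needed(tags):
    """Return the sorted frame indices whose thumb-frames mark some tag."""
    needed = set()
    for tag in tags:
        needed.update((tag.in_point, tag.out_point))
    return sorted(needed)


# Made-up frame numbers, for illustration only.
tags = [Tag("rain", 68000, 69080), Tag("close-up", 68500, 68540)]
print(thumbs_needed(tags))  # [68000, 68500, 68540, 69080]
```

Two tags that share an in-point or out-point share a thumb-frame, so the number of stored images grows more slowly than the number of tags.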

Thumb-frames may be generated in two different ways.

2.2 Incremental Generation

One way to generate thumb-frames is through incremental selection, one by one, either by hand or by an automated process, from the motion picture source. The motion picture source may be a physical medium such as a DVD, a stored video file such as mpg, or a video stream such as Flash video from YouTube. Individual frames may be selected by a human using known “scrubbing” tools to shuttle and jog through the video material to find the appropriate frames. They may also be selected using an automated process, e.g., using known techniques such as “cut detection” or increasingly sophisticated forms of image recognition. These frames are digitized and uploaded through known means with their associated time-code-style index number, e.g., as a single absolute number or as [hours:minutes:seconds:frames]. This incremental way of generating thumb-frames requires users to have access to the motion picture source as well as to the digitizing and uploading means in order to add new ones.
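The two index forms mentioned above, a single absolute number and [hours:minutes:seconds:frames], are interchangeable. The sketch below converts between them, assuming a constant integer frame rate (24 by default) and no drop-frame correction; the function names are illustrative.

```python
# Conversion between an absolute frame index and hours:minutes:seconds:frames.
# Assumes a constant integer frame rate and no drop-frame correction.

def frames_to_timecode(index, fps=24):
    """Format an absolute frame index as HH:MM:SS:FF."""
    seconds, frames = divmod(index, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode, fps=24):
    """Parse HH:MM:SS:FF back into an absolute frame index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# The last frame of a 100-minute, 24 fps movie (frames are counted from 0).
print(frames_to_timecode(143999))         # 01:39:59:23
print(timecode_to_frames("01:39:59:23"))  # 143999
```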

2.3 Batch Generation

A second way to generate thumb-frames is as a “batch process” for every consecutive frame. The motion picture source may be a physical medium such as a DVD, a stored video file such as mpg, or a video stream such as Flash video from YouTube. Every consecutive frame is digitized and uploaded through known means with its associated time-code-style index number. The batch process may be in real time or non-real time, using known techniques, and may upload the thumb-frames to the shared server either one at a time as they are generated or as a batch after all or some portion of all are completed. This batch process allows users to tag any sequence of a motion picture without needing to have the motion picture source or the digitizing and uploading means.
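The upload options described above (one at a time, in portions, or all at once) differ only in batch size. The following sketch groups consecutive thumb-frames, each paired with its frame index, into upload batches. It assumes the thumb-frames arrive as an ordinary Python iterable starting at frame 0; the function name is hypothetical, and the actual digitizing and network transfer are left out.

```python
from itertools import islice


def batched_uploads(thumb_frames, batch_size):
    """Yield lists of (frame_index, thumb_frame) pairs, at most batch_size long.

    batch_size=1 uploads each thumb-frame as it is generated; a batch_size at
    least as large as the movie uploads everything in one batch.
    """
    indexed = iter(enumerate(thumb_frames))
    while True:
        batch = list(islice(indexed, batch_size))
        if not batch:
            return
        yield batch


# Placeholder strings stand in for thumbnail image data.
for batch in batched_uploads(["t0", "t1", "t2", "t3", "t4"], batch_size=2):
    print(batch)
```

Because the function consumes its input lazily, the same code serves a real-time process that produces frames as they play and a non-real-time process that reads a stored file.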

2.4 Time Code and Frame Accuracy

Depending on the motion picture source, true frame-accurate time-code may or may not be possible. For example, “interframe” video compression techniques do not store every individual video frame but rely on storing frame differences, and in some highly compressed forms of playback, frame accuracy is not reliable. One solution is to rely on the absolute nature of “I-frames” (intra-coded fully specified frames interspersed in a compressed stream) in conjunction with an internal clock for frame accuracy. Another solution is to mark the beginning of a motion picture to use as a “slate” from which an internal clock can be used for frame accuracy.

2.5 Database Organization and Graphical Timeline Interface

Thumb-Frame Generator is a specific improvement over integrated movie tagging systems, by enabling the use of thumbnail frames stored in the server database along with the alphanumeric metadata. Issues of storing, cross-correlating, and organizing the tags database (e.g., hierarchical or “folksonomic”) and how the database is accessed after logging (e.g., using a graphical timeline interface) are outside of the scope of this disclosure.

3. Preferred Embodiments

3.1 For Incremental Generation

A preferred embodiment for thumb-frame incremental generation is using a high-quality, frame-accurate, easily “scrub-able” (control shuttle and jog) source such as DVDs together with Internet-connected hardware such as an Internet-connected DVD player (e.g., Blu-ray) or an Internet-connected computer (such as a laptop or PC) with a DVD drive.

Incremental generation requires that thumb-frames are explicitly entered. In addition to scrubbing tools to move through the video material, and in addition to a means for entering text for tags, a means for digitizing and uploading thumb-frames may also be required. The server may indicate whether a marked frame has already been uploaded. Uploading thumb-frames may take place frame by frame during the logging process, or the thumb-frames may be pre-stored and uploaded together.
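The server-side check mentioned above can be sketched as a lookup keyed by movie and frame index. This is an in-memory stand-in under assumed names (ThumbFrameIndex, movie_id); a real shared server would use persistent storage and a network protocol, neither of which is specified in this disclosure.

```python
class ThumbFrameIndex:
    """In-memory stand-in for the shared server's thumb-frame store (hypothetical)."""

    def __init__(self):
        self._thumbs = {}  # (movie_id, frame_index) -> thumbnail bytes

    def has(self, movie_id, frame_index):
        """Report whether a thumb-frame for this frame has already been uploaded."""
        return (movie_id, frame_index) in self._thumbs

    def upload(self, movie_id, frame_index, thumb_bytes):
        """Store a thumb-frame; return False if one was already present."""
        key = (movie_id, frame_index)
        if key in self._thumbs:
            return False
        self._thumbs[key] = thumb_bytes
        return True


index = ThumbFrameIndex()
print(index.upload("movie-1", 68000, b"jpeg data"))  # True: newly stored
print(index.has("movie-1", 68000))                   # True
print(index.upload("movie-1", 68000, b"jpeg data"))  # False: already uploaded
```

Checking before uploading lets a logging tool skip digitizing any frame that another user has already contributed.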

3.2 For Batch Generation

A preferred embodiment for thumb-frame batch generation is a more general-purpose tool that converts virtually any motion picture source into individual thumb-frames, one thumb-frame for every motion picture frame. Once, or as, a motion picture (in the broadest sense, meaning any video of any genre and of any duration) has been converted, the entire set of thumb-frames may be uploaded to the shared server.

As such, this embodiment is more flexible and open but less rigorous regarding quality and accuracy. It caters to a broader community beyond traditional cinema. It allows anyone to upload any batch and anyone to add tags whether or not they have (or have even seen) the motion picture source.

4. Additional Enhancements and Features

4.1 Intraframe Marking and Logging

The thumb-frame generator and logging tool currently supports only interframe logging, i.e., where tags are associated with a sequence of frames. An additional enhancement may be to add intraframe logging, whereby users can specify “zones” within the frame itself and add zone-specific tags.
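One possible shape for a zone-specific tag, offered only as a sketch, is an ordinary tag extended with a rectangle inside the frame. The rectangle representation, the normalized 0-to-1 coordinates, and all names below are assumptions; the disclosure does not specify how zones are drawn or stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneTag:
    """A tag limited to a rectangular zone of the frame (hypothetical model)."""
    text: str
    in_point: int   # first frame index the tag applies to
    out_point: int  # last frame index the tag applies to
    left: float     # zone edges as fractions of frame width and height, 0.0 to 1.0
    top: float
    right: float
    bottom: float

    def covers(self, frame, x, y):
        """True if the point (x, y) on the given frame lies inside this zone."""
        return (self.in_point <= frame <= self.out_point
                and self.left <= x <= self.right
                and self.top <= y <= self.bottom)


def tags_at(zone_tags, frame, x, y):
    """Return the text of every zone tag covering a point on a frame."""
    return [tag.text for tag in zone_tags if tag.covers(frame, x, y)]


zones = [ZoneTag("lamp", 100, 200, 0.0, 0.0, 0.5, 0.5),
         ZoneTag("window", 150, 300, 0.4, 0.0, 1.0, 0.6)]
print(tags_at(zones, 160, 0.45, 0.3))  # ['lamp', 'window']
```

Normalized coordinates keep a zone valid whether it is viewed on the full-size source frame or on its thumb-frame.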

4.2 Playing Thumb-Frames as a “Thumb-Movie”

Another enhancement may be to use known means to allow the thumb-frames to play back as a movie. If all of the frames exist as thumbnails (e.g., if batch generated), users could access a “thumb-movie,” played back in real time with the same frame rate as the source motion picture. Even if not all of the frames exist as thumbnails, real-time playback is still possible, like a dynamic storyboard. (“Play” is distinguished from “jog” and “shuttle” precisely because of its true real-time feature.)
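The storyboard-style playback of a sparse set of thumb-frames can be sketched as follows: at each moment of real-time playback, show the most recent thumb-frame at or before the current frame. The function names and the hold-the-last-frame policy are assumptions; other policies, such as showing the nearest frame, would serve equally well.

```python
from bisect import bisect_right


def storyboard_frame(available, playhead):
    """Return the latest available frame index at or before playhead, or None.

    available must be a sorted list of frame indices that have thumb-frames.
    """
    position = bisect_right(available, playhead)
    return available[position - 1] if position else None


def frame_at_time(available, seconds, fps=24):
    """Map elapsed playback time to the thumb-frame that should be on screen."""
    return storyboard_frame(available, int(seconds * fps))


available = [0, 48, 120]  # made-up frame indices that have thumb-frames
print([frame_at_time(available, t) for t in (0, 1, 2, 3, 4, 5)])
# [0, 0, 48, 48, 48, 120]
```

When every frame has a thumb-frame, the lookup always returns the playhead itself, and the result is the full “thumb-movie.”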

Whereas the current US precedent supports fair use of thumbnail images, accessing them as a motion picture may be supported as well.

4.3 “Thumb-Audio”

If real-time playback of thumb-frames may be considered fair use, an additional enhancement may be the addition of the equivalent of “reduced size” audio. This may manifest in two ways. One way may be to simply lower the audio quality or fidelity, perhaps even to the point of being just recognizable. Another way may be to keep the quality acceptable but reduce the duration allowed at any given time. Either way, using audio to help recognize and organize the contents of a motion picture may be as valid a fair use as thumbnails.

5. References

(1) http://stevens.usc.edu/IP/4003

(2) http://en.wikipedia.org/wiki/Thumbnail

(3) http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation

(4) http://www4.law.cornell.edu/uscode/17/107.html

6. First Disclosure

From: Michael Naimark <michael@naimark.net>
Date: Sun, 04 Feb 2007 17:42:14 -0500
To: Andreas Kratky <akratky@cinema.usc.edu>, Steve Anderson <sanderson@annenberg.edu>
Cc: Scott Fisher <sfisher@telepresence.com>
Conversation: the parse-and-richly-tag-all-movies-shot-by-shot project
Subject: the parse-and-richly-tag-all-movies-shot-by-shot project

OK, it needs a better name.

The idea is to set up a website and protocol by which a community of users can use text aligned to simple timecode to parse and richly tag every movie that’s available externally, without the need to have the movies on any servers. The common denominator is the timecode.

The inspiration is in part what happened to Flickr after a year or so, when tags went from predictable and sparse to eclectic and numerous. “Cats in sinks,” “Circles in squares,” “Love,” “Red,” etc.

Metadata is legally free and clear. Transcription of audio dialogue into script may be legally gray. Thumbnails for every shot is I think OK (though tedious to produce) based on legal precedents set by Google Image.

The funny thing is that I bet there would be producers of this information even if the reason for consumption isn’t clear, since every movie has fans. And many fans are Net savvy and have Netflix accounts.

Having this database would be a first step toward remix and reuse possibilities. Some movie producers (or directors) may not wish the integrity of their expression to be messed with, while others may embrace it.

Since the database can be text only, this should be easy, right? (Kidding.) Anyway, are you both interested enough to meet about it?

Thanks,

-M
