Michael Naimark
michael@naimark.net
“Thumb-Frame Generator” is a motion picture logging tool that adds
“thumb-frames,” thumbnail images of motion picture frames, to the text keywords, or
“tags,” stored in motion picture databases. These databases, such as USC’s “Movie
Tagger,” may be shared and server based, with the intention of crowd-sourcing
the massive task of parsing and richly tagging every movie ever made. Thumb-Frame
Generator leverages the US courts’ 2003 decision that thumbnail images are
considered “fair use” for Internet search engines. Two ways of generating
thumb-frames, each with a preferred embodiment, are disclosed: one relying on
incremental selection of thumb-frames
and the other relying on batch processing entire motion picture sequences,
converting every consecutive frame – literally hundreds of thousands
– into thumb-frames. Additional enhancements and features are described.
“Movie Tagger: a method and system for parsing and richly tagging every movie
ever made” is an early-stage project which has been on the books at the USC
Stevens Center for Innovation since May 2007 (1). Movie Tagger is conceived as
“an easy-to-use online system that allows a community of
users to parse and add relevant keywords (‘tags’) to movies–scene by
scene, shot by shot, and frame by frame–with a custom logging tool. The
tags can be about the performers, action, dialogue, cinematics, mood,
environment, or anything else. They can be unlimited in number. This metadata
(without any image or sound) is uploaded to a shared database, where it can be
used to rank, search, filter, and visualize tags through a unique graphical
timeline interface.”
The
original idea was to use DVD movies as the source material, and with a custom
logging tool, to only store in-points, out-points (both time-code-style
numbers), and tags (text). Hence the metadata would be alphanumeric and very
lean, and would generally not make much sense or have much utility without
being used with the actual DVD movie.
The
first disclosure of Movie Tagger was made in an email from Michael Naimark to
USC faculty members Andreas Kratky,
Steve Anderson, and Scott Fisher on 4 February 2007 (Section 6 below). No
patent applications have been filed to date. Movie
Tagger may be the first disclosure of using “crowd-sourcing” to undertake this
massive task.
1.2 Thumbnail Images and Fair Use
Thumbnails are defined by Wikipedia as “reduced-size versions of pictures,
used to help in recognizing and organizing them, serving the same role for
images as a normal text index does for words. In the age of digital images,
visual search engines and image-organizing programs normally use thumbnails, as
do most modern operating systems or desktop environments” (2).
The US courts
have decided that thumbnails do not infringe copyright. Also from Wikipedia: “In
2002, the court in the US case Kelly v. Arriba Soft Corporation ruled that it was
‘fair use’ for Internet search engines to use
thumbnail images to help web users to find what they seek” (3). Fair use is
determined by an analysis of the four factors stated in section 107 of the United
States Copyright Act (4). The analysis conducted in the revised Ninth Circuit
opinion of July 2003 for the Kelly v. Arriba Soft case came out in favor of fair
use, citing in particular “because
the images were not being sold as pictures but rather were to facilitate the
identification of the images in the search engine” and noting that if “the
secondary user only copies as much as is necessary for
his or her intended use, then this factor will not weigh against him or her.”
The size
limits, i.e., what constitutes “reduced size,” were not determined in this
case.
Consider
thumbnails of motion picture frames, which one might call “thumb-frames.” A
typical feature-length film is 100 minutes in length; at 24 frames per second,
it consists of 144,000 frames. If thumb-frames are made for fair use, how many
would constitute “as much as necessary”?
Movie Tagger
is based on the belief that single-frame accuracy is an absolute requirement
for parsing and tagging. Scholars and fans of cinema share this belief as well.
Consider, for example, that the “shower scene” in Psycho has 90 separate shots
in 45 seconds, a pace not uncommon throughout
entire movies today. Hence any logging tool for Movie Tagger must also be frame accurate.
It therefore
makes sense that the answer to how many frames are “necessary” may be “all,” or
potentially all. From a pedagogical point of view, one might say that “all
frames are taggably equal.”
The actual
numbers, in terms of computer storage and access, are not daunting. Thumbnail
images on Google Image Search are typically on the order of 120 by 150 pixels
and stored as 6 KB jpg images. A 100-minute movie in which each frame is made
into an individual thumb-frame would require a total storage of 864 MB, less
than a gigabyte.
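The arithmetic behind these figures can be checked with a short sketch. The function name and the use of decimal units (1 MB = 1000 KB) are assumptions made for illustration; the 24 frames per second and 6 KB per thumbnail are the values quoted in the text.

```python
def thumb_frame_storage(minutes, fps=24, kb_per_thumb=6):
    """Return (frame_count, total_megabytes) for one thumb-frame per film frame.

    Hypothetical helper: assumes a constant frame rate, a fixed average
    thumbnail size, and decimal units (1 MB = 1000 KB).
    """
    frames = minutes * 60 * fps          # every consecutive frame of the film
    total_kb = frames * kb_per_thumb     # one thumbnail per frame
    return frames, total_kb / 1000


frames, megabytes = thumb_frame_storage(100)
print(frames, megabytes)  # 144000 864.0
```

Under the same assumptions, doubling the thumbnail size to 12 KB would still keep a 100-minute film under 2 GB.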
Thumb-Frame
Generator is an invention that converts motion picture frames into reduced-size
images for the purpose of recognizing and organizing elements of the motion
picture, primarily along the axis of time. These thumb-frames are used to help visually
mark and index an “in-point” and an “out-point” associated with every text tag.
They are generated and uploaded to a shared server and indexed by their
corresponding time-code style index number.
Thumb-frames
may be generated in two different ways.
2.2 Incremental Generation
One way to
generate thumb-frames is through incremental selection, one by one, either by
hand or by an automated process, from the motion picture source. The motion
picture source may be a physical medium such as a DVD, a stored video file such
as an MPEG file, or a video stream such as Flash video from YouTube. Individual frames
may be selected by a human using known “scrubbing” tools to shuttle and jog
through the video material to find the appropriate frames. They may also be
selected using an automated process, e.g., using known techniques such as “cut
detection” or increasingly sophisticated forms of image recognition. These
frames are digitized and uploaded through known means with their associated
time-code-style index number, e.g., as a single absolute number or as [hours:minutes:seconds:frames].
This incremental way of generating thumb-frames requires users to have access
to the motion picture source as well as to the digitizing and uploading means
in order to add new ones.
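The two index formats mentioned above, a single absolute number and [hours:minutes:seconds:frames], can be converted into one another. The following is a minimal sketch, assuming an integer frame rate and non-drop-frame counting; both function names are illustrative and not part of the disclosure.

```python
def frames_to_timecode(frame_index, fps=24):
    """Convert an absolute frame number to 'HH:MM:SS:FF' (non-drop-frame)."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    seconds, frames = divmod(frame_index, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode, fps=24):
    """Convert 'HH:MM:SS:FF' back to an absolute frame number."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Last frame of a 100-minute film at 24 frames per second.
print(frames_to_timecode(143999))         # 01:39:59:23
print(timecode_to_frames("01:39:59:23"))  # 143999
```

Storing the absolute number and deriving the four-field form for display keeps the index lean, which fits the alphanumeric metadata described earlier.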
2.3 Batch Generation
A second way to
generate thumb-frames is as a “batch process” for every consecutive frame. The
motion picture source may be a physical medium such as a DVD, a stored video file
such as an MPEG file, or a video stream such as Flash video from YouTube. Every
consecutive frame is digitized and uploaded through known means with its
associated time-code-style index number. The batch process may run in real time or
non-real time, using known techniques, and may upload the thumb-frames to the
shared server either one at a time as they are generated or as a batch after
all or some portion of them are completed. This batch approach allows users
to tag any sequence of a motion picture without needing to have the motion
picture source or the digitizing and uploading means.
2.3 Time Code and Frame Accuracy
Depending on
the motion picture source, true frame-accurate time-code may or may not be
possible. For example, “interframe” video compression techniques do not store
every individual video frame but rely on storing frame differences, and in
some highly compressed forms of playback, frame accuracy is not reliable. One
solution is to rely on the absolute nature of “I-frames” (intra-coded fully
specified frames interspersed in a compressed stream) in conjunction with an
internal clock for frame accuracy. Another solution is to mark the beginning of
a motion picture to use as a “slate” from which an internal clock can be used
for frame accuracy.
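Both solutions reduce to the same calculation: take a frame whose absolute index is known (an I-frame, or the slate at frame 0) and count forward using elapsed clock time. The sketch below assumes a constant frame rate and a reliable clock; the function name, the parameter names, and the small tolerance for floating-point error are illustrative choices, not part of the disclosure.

```python
import math


def frame_from_clock(elapsed_seconds, fps=24.0, anchor_frame=0):
    """Estimate the absolute frame index from time elapsed since an anchor.

    anchor_frame is the known index of the reference point: 0 for a slate
    at the start of the motion picture, or the index of the last I-frame.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    # The tiny epsilon guards against results such as 0.9999999999 frames
    # caused by floating-point rounding.
    return anchor_frame + math.floor(elapsed_seconds * fps + 1e-9)


# Slate-based: 2.5 seconds after the marked beginning of the film.
print(frame_from_clock(2.5))                     # 60
# I-frame-based: half a second after an I-frame known to be frame 1200.
print(frame_from_clock(0.5, anchor_frame=1200))  # 1212
```

Anchoring on the nearest I-frame keeps any clock drift bounded by the spacing between I-frames, whereas a single slate accumulates drift over the whole film.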
2.4 Database Organization and Graphical Timeline Interface
Thumb-Frame
Generator is a specific improvement over integrated movie tagging systems, by
enabling the use of thumbnail frames stored in the server database along with
the alphanumeric metadata. Issues of storing, cross-correlating, and organizing
the tags database (e.g., hierarchical or “folksonomic”) and how the database is
accessed after logging (e.g., using a graphical timeline interface) are outside
of the scope of this disclosure.
3. Preferred Embodiments
3.1 For Incremental Generation
A preferred
embodiment for thumb-frame incremental generation is using a high-quality,
frame-accurate, easily “scrub-able” (control shuttle and jog) source such as
DVDs together with Internet-connected hardware such as an Internet-connected DVD
player (e.g., Blu-ray) or an Internet-connected computer (such as a laptop or
PC) with a DVD drive.
Incremental
generation requires that thumb-frames are explicitly entered. In addition to
scrubbing tools to move through the video material, and in addition to a means
for entering text for tags, a means for digitizing and uploading thumb-frames
may also be required. The server may indicate whether a marked frame has
already been uploaded. Uploading thumb-frames may take place frame by
frame during the logging process, or may be pre-stored and uploaded together.
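The server-side check and the two upload styles described above might be sketched as follows. This is an in-memory stand-in for the shared server, written only for illustration: the class, its method names, and the use of a movie identifier as part of the key are all assumptions.

```python
class ThumbFrameIndex:
    """In-memory stand-in for the shared server's thumb-frame store (hypothetical)."""

    def __init__(self):
        self._store = {}  # (movie_id, frame_index) -> thumbnail bytes

    def has(self, movie_id, frame_index):
        """Report whether a marked frame has already been uploaded."""
        return (movie_id, frame_index) in self._store

    def upload(self, movie_id, frame_index, data):
        """Store one thumb-frame; return False if it was already present."""
        key = (movie_id, frame_index)
        if key in self._store:
            return False
        self._store[key] = data
        return True

    def upload_many(self, movie_id, pending):
        """Upload pre-stored thumb-frames together; return how many were new."""
        return sum(self.upload(movie_id, i, d) for i, d in pending.items())


index = ThumbFrameIndex()
index.upload("psycho", 100, b"jpeg-bytes")          # frame by frame, while logging
new = index.upload_many("psycho", {100: b"a", 101: b"b", 102: b"c"})
print(index.has("psycho", 101), new)                # True 2
```

Checking before uploading lets a logging tool skip frames that another user has already contributed.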
3.2 For Batch Generation
A preferred
embodiment for thumb-frame batch generation is a more general-purpose tool that
converts virtually any motion picture source into individual thumb-frames, one
thumb-frame for every motion picture frame. Once, or as, a motion picture (in
the broadest sense, meaning any video of any genre and of any duration) has been
converted, the entire set of thumb-frames may be uploaded to the shared server.
As such, this
embodiment is more flexible and open but less rigorous regarding quality and
accuracy. It caters to a broader community beyond traditional cinema. It allows
anyone to upload any batch and anyone to add tags whether or not they have (or
have even seen) the motion picture source.
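The core of such a batch tool is a loop that reduces every consecutive frame and pairs it with its index number. The sketch below is deliberately library-free: frames are represented as nested lists of pixel values and reduced by nearest-neighbor sampling, both of which are assumptions made for illustration. A real tool would decode frames with a video library and write JPEG files.

```python
def downscale(frame, out_w, out_h):
    """Reduce a frame (list of pixel rows) by nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def batch_thumb_frames(frames, out_w, out_h):
    """Yield (frame_index, thumb_frame) for every consecutive source frame."""
    for index, frame in enumerate(frames):
        yield index, downscale(frame, out_w, out_h)


# Three tiny 4x4 test frames standing in for a decoded motion picture.
source = [[[f * 100 + y * 10 + x for x in range(4)] for y in range(4)]
          for f in range(3)]
for index, thumb in batch_thumb_frames(source, 2, 2):
    print(index, thumb)
```

Because the function is a generator, thumb-frames can be uploaded one at a time as they are produced or collected first and uploaded as a batch, matching the two options described in Section 2.3.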
4. Additional Enhancements and Features
4.1 Intraframe Marking and Logging
The
thumb-frame generator and logging tool currently supports only interframe
logging, i.e., where tags are associated with a sequence of frames. An
additional enhancement may be to add intraframe logging, whereby users can
specify “zones” within the frame itself and add zone-specific tags.
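One way to picture zone-specific tags is as ordinary tags with an optional rectangle attached. The record layout below is a hypothetical sketch: the field names, the inclusive out-point, and the use of normalized coordinates (0 to 1, so zones survive the reduction to thumbnail size) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tag:
    text: str
    in_frame: int    # in-point, as an absolute frame number
    out_frame: int   # out-point, inclusive
    # Optional zone as (x, y, width, height), normalized to the range 0..1.
    zone: Optional[Tuple[float, float, float, float]] = None


def tags_at(tags, frame, point=None):
    """Return the text of tags covering a frame and, if given, a point in it."""
    hits = []
    for tag in tags:
        if not (tag.in_frame <= frame <= tag.out_frame):
            continue
        if tag.zone is not None and point is not None:
            x, y, w, h = tag.zone
            px, py = point
            if not (x <= px <= x + w and y <= py <= y + h):
                continue
        hits.append(tag.text)
    return hits


tags = [Tag("shower", 100, 200), Tag("knife", 150, 160, (0.5, 0.0, 0.5, 0.5))]
print(tags_at(tags, 155, point=(0.1, 0.1)))    # ['shower']
print(tags_at(tags, 155, point=(0.75, 0.25)))  # ['shower', 'knife']
```

Tags without a zone apply to the whole frame, so existing interframe tags would remain valid if zones were added later.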
4.2 Playing Thumb-Frames as a “Thumb-Movie”
Another
enhancement may be to use known means to allow the thumb-frames to play back as
a movie. If all of the frames exist as thumbnails (e.g., if batch generated),
users could access a “thumb-movie,” played back in real time with the same
frame rate as the source motion picture. Even if all of the frames do not exist
as thumbnails, real time playback is still possible, like dynamic storyboards.
(“Play” is distinguished from “jog” and “shuttle” precisely because of its true
real-time feature.)
Whereas the
current US precedent supports fair use of thumbnail images, accessing them as a
motion picture may be supported as well.
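Real-time playback from an incomplete set of thumb-frames can be sketched as a lookup: for each moment of playback, show the most recent thumb-frame at or before the current frame position. The function name and the sorted-list representation are assumptions made for illustration.

```python
import bisect


def thumb_for_time(available_frames, seconds, fps=24):
    """Pick the thumb-frame to display at a given playback time.

    available_frames is a sorted list of frame numbers that exist as
    thumbnails. Returns the latest one at or before the playback position,
    or None if no thumb-frame exists yet at that point.
    """
    target = int(seconds * fps)
    pos = bisect.bisect_right(available_frames, target)
    return available_frames[pos - 1] if pos else None


available = [0, 48, 120]                # a sparse, storyboard-like set
print(thumb_for_time(available, 1.0))   # 0
print(thumb_for_time(available, 2.0))   # 48
print(thumb_for_time(available, 10.0))  # 120
```

When every frame exists as a thumbnail, the lookup returns the target frame itself and playback matches the source frame rate.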
4.3 “Thumb-Audio”
If real time
playback of thumb-frames may be considered fair use, an additional enhancement
may be the addition of the equivalent of “reduced size” audio. This may
manifest in two ways. One way may be to simply lower the audio quality or
fidelity, perhaps even to the point of being just-recognizable. Another way may
be to keep the quality acceptable but reduce the duration allowed at any given
time. Either way, using audio to help recognize and organize the contents of a
motion picture may be as valid a fair use as thumbnails.
5. References
(1) http://stevens.usc.edu/IP/4003
(2) http://en.wikipedia.org/wiki/Thumbnail
(3) http://en.wikipedia.org/wiki/Kelly_v._Arriba_Soft_Corporation
(4) http://www4.law.cornell.edu/uscode/17/107.html
6. First Disclosure
From: Michael Naimark <michael@naimark.net>
OK, it needs a better name.