FORM 2
The Patent Act 1970
(39 of 1970)
&
The Patent Rules, 2005
COMPLETE SPECIFICATION
(SEE SECTION 10 AND RULE 13)
TITLE OF THE INVENTION
A method and system for automated Storyboard generation
APPLICANTS:
Name: Indian Institute of Technology Bombay
Nationality: India
Address: Powai, Mumbai, Postcode 400076, Maharashtra, INDIA
The following specification particularly describes and ascertains the nature
of this invention and the manner in which it is to be performed:-
TECHNICAL FIELD
[001] The embodiments herein relate to a method of creating pre-visualization
videos and, more particularly, to automating the process of
creation of pre-visualization videos.
BACKGROUND
[002] The making of motion pictures, which subsume movies and serials,
comprises five main steps: script writing, pre-production, production, post-production
and distribution. Script writers construct the overall story of a
movie or an episode of a serial through scripts, which include places of
enactment, actors participating in scenes, speakers and their dialogues. Pre-production
subsumes deciding upon actors and actual set-ups for various
scenes. Actual shooting of a movie or an episode takes place during
production, whereas editing and other modifications take place during the post-production
step. The final step of distribution gives rise to the release of a
movie or the airing of an episode. Finally, subtitles, which are generally
written by other viewers and often made publicly available, are provided to
facilitate viewers.
[003] Before starting to shoot a particular scene of a motion
picture or an episode, it is important for actors and production crew to
visualize the requirements of a scene, which include camera placements,
lighting conditions and positioning of actors and their actions. One can
visualize a scene, about to be shot, by generating a storyboard video, which
is one of the most efficient ways of pre-visualizing motion pictures for new
scripts and has been used for a long time in movie making. The goal of a
storyboard is to provide a visual layout of underlying events in the way
they will be seen on the screen.
[004] Typically, storyboard videos are manually generated by
storyboard artists, who provide visualization of underlying stories through
sketches and drawings. This process typically involves creating rough
sketches based on the script of the motion picture, following which more
detailed images are created for depicting shots to express the movie or episode
director's intent. Generating storyboards in this manner is quite intricate,
time consuming and resource intensive. Given the typical short time-lines
associated with such productions, it is in general difficult to manage such
an intricate procedure.
[005] Another way of generating storyboard videos provides
animated cartoon strips for visualization; however, as in the previous case,
this approach does not provide realistic visualization. Specifically, such strips
lack many fine details about the screenplay, which could make such
videos more understandable and immersive. Again, generating storyboard
videos in this manner is a cumbersome task, as it requires a person to spend
a significant amount of time on a computer creating such videos. In other
words, this process is very inefficient, and constructing multiple such videos
can take a lot of human time and energy.
SUMMARY
[006] In view of the foregoing, an embodiment herein provides a
method for generating storyboard videos for a novel script. The method
comprises the steps of segmenting the input script into corresponding segments,
comparing information associated with segments of the novel script with
information of training segments, selecting video segments from the training
data which match the provided details of the novel script, and
concatenating the selected video segments to form a pre-visualization
video. The method initially segments the novel script, for which pre-visualization
videos are to be created, into multiple segments. The method
then compares information of the segments of the novel script with the
information of training segments. Further, based on the comparison, video
segments are selected from the training data whose information matches
the details of the novel script. The selected videos are then concatenated
to form a pre-visualization video corresponding to that particular script.
[007] These and other aspects of the embodiments herein will be
better appreciated and understood when considered in conjunction with the
following description and the accompanying drawings.
BRIEF DESCRIPTION OF THE FIGURES
[008] The embodiments herein will be better understood from the
following detailed description with reference to the drawings, in which:
[009] FIG. 1 illustrates a general block diagram of the pre-visualization
media as disclosed in the embodiments herein;
[0010] FIG. 2 illustrates a block diagram of the pre-visualization
server and its components as disclosed in the embodiments herein; and
[0011] FIG. 3 illustrates a flow diagram of processes involved in
creation of pre-visualization videos for a novel script as disclosed in the
embodiments herein.
DETAILED DESCRIPTION OF EMBODIMENTS
[0012] The embodiments herein and the various features and
advantageous details thereof are explained more fully with reference to the
non-limiting embodiments that are illustrated in the accompanying
drawings and detailed in the following description. Descriptions of well-known
components and processing techniques are omitted so as to not
unnecessarily obscure the embodiments herein. The examples used herein
are intended merely to facilitate an understanding of ways in which the
embodiments herein may be practiced and to further enable those of skill in
the art to practice the embodiments herein. Accordingly, the examples
should not be construed as limiting the scope of the embodiments herein.
[0013] The embodiments herein disclose a system for automatically
creating pre-visualization videos by accepting a new script as input.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 3,
where similar reference characters denote corresponding features
consistently throughout the figures, there are shown preferred
embodiments.
[0014] FIG. 1 illustrates a general block diagram of the pre-visualization
media as disclosed in the embodiments herein. The system
comprises an administrator 101, a pre-visualization server 102, a database
103 and a user 104. Administrator 101 is capable of providing a script as an
input to the system. In an embodiment, administrator 101 is also
responsible for access related mechanisms such as the list of people authorized
to access the system, the level of access permitted for each authorized person and so
on. Pre-visualization server 102 is the major component of the system,
which accepts a script as input and performs computations to identify multiple
video segments appropriate to the script. Database 103 contains video
segments and their corresponding textual information obtained from
training data containing videos, scripts and subtitles. In an embodiment,
training data can be past episodes of a serial or simply a collection of video
segments with corresponding textual descriptions as in scripts and subtitles.
Further, the video segments are short clips corresponding to various shots,
which may indicate sudden changes in camera angles and lighting
conditions. Consequently, each video shot represents a set of visually and
temporally coherent images. Further, each video shot has certain
information associated with it, which may comprise actors present in the
scene, speakers and corresponding dialogues, location of the scene and the
time period over which the shot is occurring. User 104 is any person related
to the production of the current motion picture, such as a director,
cameraman, lighting expert and so on. In another embodiment, user 104
may perform some or all functions of an administrator 101.
[0015] FIG. 2 illustrates the block diagram of a pre-visualization
server and its components as disclosed in the embodiments herein. The
system comprises a language detection module 201, a video analysis module
202 and a controller module 203. The language detection module 201
analyses a novel script and extracts details such as locations of various
scenes, actors involved in these scenes and dialogues spoken by speakers.
The script as well as the corresponding details extracted by the language
detection module 201 is passed to the video analysis module 202. The
video analysis module 202 compares details extracted by the language
detection module 201 with information available from the training data. Based
on comparisons between details extracted by the language detection module
and information available from the training data, the video analysis module 202
selects a video shot or a set of video shots which match the description of the
novel script. The controller module 203 receives data from the video
analysis module 202 and performs further actions on the received data. The
controller module 203 may merge more than one video shot into a
concatenated sequence. In an embodiment, the controller module 203
performs concatenation of video sequences depending on the options set by
a user or an administrator. Further, the controller module 203 may also
perform certain operations, such as adding subtitles, language translation
and so on, depending on the requirements.
[0016] FIG. 3 illustrates a flow diagram of the processes involved in
creation of pre-visualization videos as disclosed in the embodiments herein.
The user and/or the administrator feeds a novel script, for which the pre-visualization
videos are to be created, as the input to the system. In an
embodiment, each shot depicts a change in camera angles and lighting
conditions. Scripts may contain the scene information, speakers and
corresponding dialogues, and may or may not contain timing information.
Further, subtitles contain only dialogues and timing information and lack
other information present in corresponding scripts. Further, if timing
information is not present in the script, in order to form timed scripts, the
system synchronizes (301) scripts with corresponding subtitles. In an
embodiment, the synchronization can be performed using dynamic time
warping with Levenshtein distance or any such suitable method. Further,
an administrator and/or a user provides video segments corresponding to
scripts and subtitles to the system, which are then segmented into shots by
using similarity computed through certain image statistics such as color
histograms (302). Further, timing information from scripts and subtitles is
utilized to assign parameters comprising scene information, actors
participating in a scene, speakers and their dialogues and episode
information to each video shot. The extracted information from scripts and
subtitles forms (303) a database, which contains various segments, each
containing a set of video frames (a video shot) and corresponding textual
information. In an embodiment, the videos from the database are used to
generate videos for the input novel scripts.
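By way of a non-limiting illustration of the shot segmentation of step 302, the following sketch splits a frame sequence into shots by comparing color histograms of consecutive frames. The frame representation, the bin count and the boundary threshold are assumptions made solely for this example and are not prescribed by the specification.

```python
# Illustrative sketch: shot-boundary detection via color-histogram similarity
# between consecutive frames (cf. step 302). Frames are assumed to be HxWx3
# uint8 RGB arrays; bins and threshold are illustrative assumptions.
import numpy as np

def color_histogram(frame, bins=8):
    """Normalised joint RGB histogram of a frame given as an HxWx3 uint8 array."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def segment_into_shots(frames, threshold=0.5):
    """Start a new shot whenever histogram similarity to the previous frame drops."""
    shots, current = [], [0]
    prev_hist = color_histogram(frames[0])
    for i in range(1, len(frames)):
        hist = color_histogram(frames[i])
        similarity = np.minimum(prev_hist, hist).sum()  # histogram intersection in [0, 1]
        if similarity < threshold:                      # sudden change => shot boundary
            shots.append(current)
            current = []
        current.append(i)
        prev_hist = hist
    shots.append(current)
    return shots
```

In practice such a threshold would typically be tuned on the training data rather than fixed in advance.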
[0017] In an embodiment, a novel script contains scenes, speakers
and corresponding dialogues. However, the script may lack timing
information. Further, these scripts may lack segmentation in terms of
speakers and corresponding dialogues. For such a script, the number of
possible segmentations is exponentially large. Hence, searching for suitable
video segments for all possible segmentations of a script is prohibitive. In
order to effectively solve this issue of segmentation, the problem is
formulated as one of simultaneous segmentation and assignment, and an
appropriate optimization technique may be used without limiting the scope of
this invention. In an embodiment, the semi-Markov conditional random field
(semi-Markov CRF) formalism is employed in the present invention, which
provides a probability density function over all possible segmentations of a
given script. By virtue of the semi-Markov CRF formalism, the problem of
simultaneous segmentation and assignment reduces to maximizing the
probability measure P(s|x), where x is a given novel script and s is one of
the segmentations of x. A class conditional estimator as per the semi-Markov
CRF formalism is given as:

P(s \mid x, W) = \frac{1}{Z(x)} e^{W \cdot F(x, s)}    (1)

Here W denotes the weights attached to the components of F, which represents
a sum of features over the length of the segmentation, given by

F(x, s) = \sum_{i=1}^{|s|} f(i, x, s)    (2)

where f = (f_1, ..., f_K) is a vector of K features, and each component f_j(i, x, s)
is a mapping from an index i of (x, s) to a set of real numbers. Hence,
F(x, s) is a sum of all f_j's over the segment length |s|.
Z(x) = \sum_{s' \in S} e^{W \cdot F(x, s')} is a partition function, which is a
sum over all possible proper segmentations denoted by the set S. In an
embodiment, the semi-Markov CRF formalism allows non-Markovian behavior
inside an individual segment, as a result of which the semi-Markov CRF is
capable of incorporating varying length segments inside a segmentation. In an
embodiment, varying length segments refer to segments with different numbers
of dialogues. Since Z(x) is constant for all segmentations, Equation (1) can
now be modified as:

s_{opt} = \operatorname{argmax}_{s \in S} P(s \mid x, W) = \operatorname{argmax}_{s \in S} W \cdot F(x, s)    (3)

where s_{opt} is the optimal segmentation of the novel script that we
wish to achieve.
[0018] In an embodiment, in order to limit computational
complexity, a Markovian assumption may be made for the features utilized in
the process. Further, under the first order Markovian assumption, the computation
of all features f_j \in f, for the kth segment of a script, depends only upon the
(k-1)th segment's label l_{k-1}. Now, based upon the first order Markovian
assumption, Equation (3) can be written as:

s_{opt} = \operatorname{argmax}_{s \in S} W \cdot \sum_{k} f(t_k, u_k, l_k, l_{k-1}, x)    (4)

where t_k and u_k denote the start and end indices of the script and l_k is an
index of the segment of the training data which is assigned to the kth
segment of a script.
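As a non-limiting illustration of Equation (4), the following sketch scores one candidate segmentation as W . F(x, s) under the first-order Markovian assumption; the function and variable names are assumptions introduced only for this example.

```python
# Illustrative sketch (an assumption, not the claimed implementation): scoring a
# candidate segmentation as W . F(x, s) under the first-order Markovian assumption
# of Equation (4). Each feature function receives the current segment boundaries,
# the candidate training-segment label and the previously assigned label.
def score_segmentation(weights, feature_fns, segments, script):
    """segments is a list of (t_k, u_k, l_k); feature_fns is a list of f_j callables."""
    total = 0.0
    prev_label = None
    for (t_k, u_k, l_k) in segments:
        for w, f in zip(weights, feature_fns):
            total += w * f(t_k, u_k, l_k, prev_label, script)
        prev_label = l_k
    return total
```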
[0019] In an embodiment, the training data used during creation of
pre-visualization videos includes sets of various attributes such as
dialogues, speakers, scenes, actors, episode numbers and so on, extracted
from videos of past episodes or scenes and corresponding scripts and
subtitles. Let D be a set of N dialogues across the entire corpus. Each
dialogue D_i is a collection of words. Let R be a set of speakers such that the
R_i-th speaker speaks the D_i-th dialogue. Let C represent a set of scenes where
speakers speak these dialogues, such that the R_i-th speaker speaks the D_i-th
dialogue at the C_i-th place. Typically, in movies and episodes of a serial,
actions take place only at a few distinct places. That means C represents
categorical data with a limited number of categories. Similarly, let A
represent a set of actors participating in the scenes corresponding to C. Let E
represent a set of episode numbers from the corpus, starting from 1 and
going to the number of episodes considered for training. In another
embodiment, for a movie database this parameter can be ignored. The set {D,
R, C, A, E} is collectively referred to as the textual information. In an
embodiment, some of the set elements may be null.
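A minimal sketch of how the textual information {D, R, C, A, E} could be organized per dialogue is given below; the record layout and field names are assumptions made for illustration only and are not part of the specification.

```python
# Illustrative sketch: one possible per-dialogue record for the textual
# information {D, R, C, A, E}; field names are assumptions for this example.
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass
class DialogueRecord:
    dialogue: str                     # D_i: the words of the i-th dialogue
    speaker: Optional[str]            # R_i: speaker of the i-th dialogue
    scene: Optional[str]              # C_i: place where the dialogue is spoken
    actors: FrozenSet[str]            # A_i: actors participating in the scene
    episode: Optional[int] = None     # E_i: episode number (may be ignored for movies)

corpus = [
    DialogueRecord("Good evening.", "Alice", "kitchen", frozenset({"Alice", "Bob"}), 1),
    DialogueRecord("You are late again.", "Bob", "kitchen", frozenset({"Alice", "Bob"}), 1),
]
```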
[0020] In another embodiment, the system groups various types of
metadata extracted from the input script, such as speakers, dialogues
corresponding to speakers, scenes where the speakers speak these dialogues
and so on. An exemplary description of the corpuses is given below. The
terms used to represent the features in the description of corpuses and
proposed features are for the purposes of illustrating the application and do
not aim to limit the scope of the application. Further, these features may be
replaced or modified accordingly to suit the underlying application. All
such variations fall within the scope of this application.
[0021] Further, let M be the number of video shots present in the
database, represented as a set V = [V_1, V_2, ..., V_M]. In order to reduce
computational complexity for later processing, we assign a representative
frame for each video shot. Such a representative frame is assigned by
maximizing the similarity measure, defined over color histograms, across
all frames of a video shot. For each representative frame, we compute a color
histogram and gist descriptors. Let H = [H_1, H_2, ..., H_M] and G = [G_1, G_2,
..., G_M] denote the sets of color histograms and gist descriptors across the
corpus. In order to take cognizance of the clothing of actors, upper bodies of
actors are computed and their color histograms are referred to as P = [P_1,
P_2, ..., P_M]. In an embodiment, the choice of features such as color
histograms, gist, upper body histogram and so on is for the purpose of
illustration and does not limit the scope of the invention.
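The following sketch illustrates one plausible reading of the representative-frame selection described above: the frame whose color histogram has the highest aggregate similarity to the histograms of the other frames of the shot is chosen. The use of histogram intersection as the similarity measure is an assumption made for this example.

```python
# Illustrative sketch (an assumption about one reasonable reading of the text):
# pick the representative frame of a shot as the frame whose color histogram is
# most similar, on aggregate, to the histograms of all frames of that shot.
import numpy as np

def representative_frame(histograms):
    """histograms: list of normalised color histograms, one per frame of the shot."""
    H = np.asarray(histograms)
    # Pairwise histogram-intersection similarities between all frames of the shot.
    sims = np.minimum(H[:, None, :], H[None, :, :]).sum(axis=2)
    totals = sims.sum(axis=1)          # aggregate similarity of each frame to the rest
    return int(np.argmax(totals))      # index of the representative frame
```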
[0022] Further, the training segments form a set S = [S_1, S_2, ..., S_M],
and the lth segment of this set may be represented as S_l = (T_l, U_l, l), where T_l
and U_l represent the start and end indices of dialogues in D such that T_1 = 1,
U_M = N and T_{l+1} = U_l + 1 for 1 ≤ l ≤ M-1, where l represents the label of the
video shot V_l, which is associated with a color histogram H_l, gist descriptor
G_l and a color histogram of upper bodies P_l. Further, the zero-dialogue
segments, i.e. video segments which do not have any dialogues in them, can be
represented as T_l = U_l = U_{l-1}. Such zero-dialogue segments are handled
separately, and in the subsequent discussion we assume segments to have at
least one dialogue in them unless explicitly mentioned otherwise.
[0023] Further, consider a novel script (x) containing n dialogues,
for which a pre-visualization video is to be generated. For the given script,
d = [d_1, d_2, ..., d_n], r = [r_1, r_2, ..., r_n], c = [c_1, c_2, ..., c_n] and a = [a_1, a_2, ...,
a_n] represent dialogues, speakers, scene locations and actors participating in
a scene, respectively. A segmentation s containing m segments for this
script may be represented as s = [s_1, s_2, ..., s_m], where the kth segment of this
segmentation may be denoted as s_k = (t_k, u_k, l_k) : 1 ≤ k ≤ m, 1 ≤ l_k ≤ M.
Here l_k denotes an index of a segment of the training data, whose video
frames are to be assigned to this segment of the novel script. Further, l_k
indicates the label assigned to the segment, and l is used as a running index over the
segments of the training data. Further, upon the assignment of a training segment
to a segment of the novel script, the episode information, color histogram, gist
descriptor and upper-body color histogram of that training segment are assigned to
the segment and are denoted as e_k, h_k, g_k and p_k, respectively.
[0024] Further, the system compares (304) the details extracted
from the input novel script and the details from the corpuses of the training
data to identify the training segments suitable to the novel script. Based on
the comparison, the system selects (305) video shots from the training data
which are most appropriate for the novel script. Further, the video segments
selected from the training data are concatenated (306) to form pre-visualization
videos for the novel script.
[0025] Various actions in method 300 may be performed in the
order presented, in a different order or simultaneously. Further, in some
embodiments, some actions listed in FIG. 3 may be omitted.
[0026] In an embodiment, the system chooses a set of different
features so as to capture consistency in textual information and continuity
in selected video shots. These parameters are denoted as f_1, f_2, ..., f_K and are
together represented as f. An exemplary description of the process of
calculation of such features is given below. Further, the terms used to
represent the features in the description of corpuses and proposed features
are for the purposes of illustrating the application and do not aim to limit
the scope of the application. Further, these features may be replaced or
modified accordingly to suit the underlying application. All such variations
fall within the scope of this application.
[0027] Consider the case of speaker order within a script. Normally, the
camera remains in focus on the actor who delivers a dialogue. Based on
this premise, the speaker order in a chosen training segment should be similar
to the one in the novel script. This can be represented as:

f_1(t_k, u_k, l) = \sum_{\substack{i:\, t_k \ldots u_k \\ j:\, T_l \ldots U_l}} I[r_i = R_j]    (5)
where r_i represents the ith speaker, who speaks the d_i-th dialogue in
the novel script, and R_j represents the jth speaker, who speaks the D_j-th dialogue
in the corpus. I[.] represents an indicator function, which is 1 if and only if the
corresponding argument is true.
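A minimal sketch of the speaker-order feature f_1 of Equation (5) is given below; the pairing of the dialogue indices i and j in lock-step is an assumption about how the summation is intended to be read.

```python
# Illustrative sketch of the speaker-order feature f1 of Equation (5): counting
# positions at which the speaker sequence of the novel-script segment matches the
# speaker sequence of the candidate training segment. Variable names mirror the text.
def f1_speaker_order(r, t_k, u_k, R, T_l, U_l):
    """r: speakers of the novel script; R: speakers of the training corpus."""
    pairs = zip(range(t_k, u_k + 1), range(T_l, U_l + 1))
    return sum(1 for i, j in pairs if r[i] == R[j])
```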
[0028] In an embodiment, the Levenshtein distance is used for
capturing dialogue similarity; it essentially calculates the minimum
number of operations required to transform the dialogue in the first argument
into the dialogue in the second. Let ||D|| represent the number of characters in D; then

f_2(t_k, u_k, l) = \sum_{\substack{i:\, t_k \ldots u_k \\ j:\, T_l \ldots U_l}} \left( 1 - \frac{Lev(d_i, D_j)}{\max(\|d_i\|, \|D_j\|)} \right)    (6)

where d_i represents the ith dialogue of the novel script and D_j
represents the jth dialogue across the corpus. Lev(., .) represents the
Levenshtein distance between its two arguments.
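The following sketch illustrates the dialogue-similarity feature f_2 of Equation (6); the Levenshtein routine is a standard dynamic-programming implementation rather than code taken from the specification.

```python
# Illustrative sketch of the dialogue-similarity feature f2 of Equation (6); the
# Levenshtein routine below is a textbook dynamic-programming implementation.
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def f2_dialogue_similarity(d, t_k, u_k, D, T_l, U_l):
    """Sum of 1 - Lev(d_i, D_j)/max(|d_i|, |D_j|) over aligned dialogue pairs."""
    total = 0.0
    for i, j in zip(range(t_k, u_k + 1), range(T_l, U_l + 1)):
        denom = max(len(d[i]), len(D[j])) or 1
        total += 1.0 - levenshtein(d[i], D[j]) / denom
    return total
```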
[0029] Further, actors remain in focus for the duration they speak.
To reduce the mismatch between the length of a dialogue and the time for
which an actor speaks in a candidate segment, the similarity between the numbers
of words across dialogues is considered. The same can be calculated as:

f_3(t_k, u_k, l) = \sum_{\substack{i:\, t_k \ldots u_k \\ j:\, T_l \ldots U_l}} \frac{\min(\|d_i\|, \|D_j\|)}{\max(\|d_i\|, \|D_j\|)}    (7)

where, for this feature, ||.|| denotes the number of words in a dialogue.
[0030] By considering the speakers participating in a scene,
continuity of actors across video segments is captured by keeping track of
all actors present in shots. This feature may be represented as:

f_4(t_k, u_k, l) = \frac{|a_{t_k} \cap A_{T_l}|}{|a_{t_k} \cup A_{T_l}|}    (8)

where a_{t_k} refers to the set of actors present in the kth
segment of the novel script and A_{T_l} refers to the set of actors present in the lth
segment of the training corpus.
[0031] By placing a constraint on the episodes, it is indicated that a
shot should be picked in the vicinity of the shot chosen for the previous
segment, in order to compensate for changes in appearance of actors and
scenes. Similarly, this feature can be computed as:

f_5(t_k, l) = e^{-\frac{|e_{k-1} - E_{T_l}|}{\gamma}}    (9)

where \gamma is an appropriately chosen number, say \gamma = 36, e_{k-1} is the
episode number from which the training segment for the (k-1)th segment of the
novel script was taken and E_{T_l} represents the episode number of the lth
segment of the training corpus.
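A minimal sketch of the actor-continuity feature f_4 of Equation (8) and the episode-proximity feature f_5 of Equation (9) is given below; the parameter name gamma stands for the appropriately chosen constant mentioned above.

```python
# Illustrative sketch of the actor-continuity feature f4 (Equation (8), a Jaccard
# overlap of actor sets) and the episode-proximity feature f5 (Equation (9));
# gamma is the appropriately chosen constant mentioned in the text.
import math

def f4_actor_continuity(actors_novel_segment, actors_training_segment):
    """|intersection| / |union| of the two actor sets (0 if both are empty)."""
    union = actors_novel_segment | actors_training_segment
    if not union:
        return 0.0
    return len(actors_novel_segment & actors_training_segment) / len(union)

def f5_episode_proximity(prev_episode, candidate_episode, gamma=36.0):
    """Decays as the candidate shot moves away from the previously used episode."""
    return math.exp(-abs(prev_episode - candidate_episode) / gamma)
```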
[0032] The color histogram is used to capture visual similarity
across video segments. Color histograms of a previous segment of a novel
script and a candidate segment of the training data are compared. This
similarity is clipped to the score obtained between the video shot assigned
to the previous segment of a novel script and the video shot next to that
segment in the training data. This operation aids in generating the same
videos as in the training data for the training scripts. Computation performed in
this manner also aids other features to dominate when many shots have a
similar visual layout. In an embodiment, this feature can be calculated as:

f_6(t_k, l, l') = \min\left( \frac{h_{k-1} \cdot H_l}{\|h_{k-1}\|_2 \|H_l\|_2},\; \frac{H_{l'} \cdot H_{l'+1}}{\|H_{l'}\|_2 \|H_{l'+1}\|_2} \right)    (10)

[0033] where h_{k-1} is the histogram corresponding to the previous
segment of the novel script and ||.||_2 denotes the L2 norm of its argument.
H_{l'} denotes the histogram of the l'-th segment of the training corpus, which
was assigned to the (k-1)th segment of the novel script, i.e. h_{k-1} = H_{l'};
H_{l'+1} denotes the histogram of the shot that follows it in the training data,
and "." denotes the dot product between corresponding vectors.
[0034] The gist feature is used to capture visual continuity across
shots. In an embodiment, this feature can be calculated as:

f_7(t_k, l, l') = \min\left( \frac{g_{k-1} \cdot G_l}{\|g_{k-1}\|_2 \|G_l\|_2},\; \frac{G_{l'} \cdot G_{l'+1}}{\|G_{l'}\|_2 \|G_{l'+1}\|_2} \right)    (11)

where g_{k-1} is the gist descriptor of the (k-1)th segment of the
novel script and G_{l'} is the gist descriptor assigned to the (k-1)th segment of the
novel script, i.e. g_{k-1} = G_{l'}. These features need to generate suitable scores for
segments with varying lengths (numbers of dialogues). Features f_4, f_5, f_6 and f_7
described in Equations (8), (9), (10) and (11) generate scores in the [0, 1]
range, which is satisfactory for segments with one dialogue. However,
higher scores are to be generated for segments with more dialogues in
them. For this purpose, in one embodiment, each of the above mentioned
features is multiplied by the length of the segment, i.e. by the number of
dialogues in it.
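The following sketch illustrates the clipped similarity used by the features f_6 and f_7 of Equations (10) and (11); it assumes the descriptors are supplied as NumPy vectors and omits the multiplication by segment length discussed above.

```python
# Illustrative sketch of the clipped visual-similarity used by f6 and f7 of
# Equations (10) and (11): the cosine similarity between the previous assignment
# and a candidate shot is clipped to the similarity between the previous
# assignment and the shot that actually follows it in the training data.
import numpy as np

def cosine(u, v):
    """Dot product of u and v divided by the product of their L2 norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def clipped_visual_similarity(prev_descriptor, candidate_descriptor, next_in_training):
    """min( cos(prev, candidate), cos(prev, shot following prev in training data) )."""
    return min(cosine(prev_descriptor, candidate_descriptor),
               cosine(prev_descriptor, next_in_training))
```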
[0035] The upper-body color histogram is a feature with which
continuity in the clothing of actors is captured. This is done by using an upper-body
detector. As the camera can suddenly change focus to a different set of
actors in a scene, the similarity of clothing needs to be computed with respect
to the past shot having the maximum number of speakers in common with the
present segment of the novel script. We use the feature f_p for re-ranking the ten
best shots obtained from utilizing the features f = [f_1, f_2, ..., f_7], which
are described above. This feature can be computed as:

f_p(l, z) = \frac{p_z \cdot P_l}{\|p_z\|_2 \|P_l\|_2}    (12)

where

z = \operatorname{argmax}_{j = 1, \ldots, k-1} \frac{|a_{t_j} \cap A_{T_l}|}{|a_{t_j} \cup A_{T_l}|}    (13)

where p_z is the upper-body color histogram assigned to the zth
segment of the novel script and P_l is the color histogram of the upper bodies
of the lth segment of the training corpus.
[0036] In another embodiment, the system implements various
algorithms for carrying out simultaneous segmentation and assignment of
video shots through inference algorithms and for learning weights for the
features described above. In an embodiment, the inference algorithm used
may be a sequential segmentation algorithm, a Viterbi-based segmentation
algorithm or any other algorithm that can be used as an inference algorithm.
In another embodiment, the algorithm used for learning weights for features
may be a Perceptron-based algorithm or any suitable algorithm that can be
used for learning the weights of features.
[0037] In an embodiment, the sequential segmentation algorithm
used for performing simultaneous segmentation and assignment is a
forward classification mechanism which looks only in the forward direction
so as to assign a training segment to the segment of a novel script at each stage.
This algorithm is linear in time with respect to the number of dialogues in
the script. In an embodiment, the search can be restricted only to segments
with at most r dialogues. This restriction comes from the training
corpus, as in practice a large number of segments have only a few
dialogues in them, whereas only a few segments have a relatively large
number of dialogues in them. Let us assume that we have a segmentation up
to j dialogues of a novel script; then, in order to find the next segment, we
proceed in the following manner. Let s' = [s_1, s_2, ..., s_{k-1}] be a segmentation
such that t_1 = 1, u_{k-1} = j, t_{i+1} = u_i + 1 \forall i: 1 ≤ i ≤ k-2; then the kth
segment s_k can be obtained as:
s_k = (j+1,\; j+c_{opt},\; l_{opt})    (14)

where

(c_{opt}, l_{opt}) = \operatorname{argmax}_{\substack{c = 1, 2, \ldots, r \\ l:\; 1 \le l \le M,\; U_l - T_l + 1 = c}} \frac{W \cdot f(j+1, j+c, l, l', x)}{|c|}    (15)

where the right hand side of Equation (15) is divided by |c| to generate
scores in the [0, 1] interval for varying sizes of segments.
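A minimal sketch of the sequential segmentation of Equations (14) and (15) is given below; the score callable stands in for W . f(j+1, j+c, l, l', x) and the way candidates are enumerated is an assumption made for this illustration.

```python
# Illustrative sketch of the sequential (greedy, forward-only) segmentation of
# Equations (14) and (15): given a segmentation covering the first j dialogues,
# the next segment is the (length, training-segment) pair with the best
# length-normalised score. `score` plays the role of W . f(j+1, j+c, l, l', x).
def next_segment(j, n_dialogues, training_segments, prev_label, score, r_max):
    """training_segments: list of (T_l, U_l) index pairs; returns (t_k, u_k, l_opt)."""
    best, best_choice = float("-inf"), None
    for c in range(1, min(r_max, n_dialogues - j) + 1):
        for l, (T_l, U_l) in enumerate(training_segments):
            if U_l - T_l + 1 != c:            # candidate must cover exactly c dialogues
                continue
            value = score(j + 1, j + c, l, prev_label) / c
            if value > best:
                best, best_choice = value, (j + 1, j + c, l)
    return best_choice

def sequential_segmentation(n_dialogues, training_segments, score, r_max=10):
    """Greedily covers dialogues 1..n with segments, left to right."""
    segments, j, prev_label = [], 0, None
    while j < n_dialogues:
        seg = next_segment(j, n_dialogues, training_segments, prev_label, score, r_max)
        if seg is None:
            break
        segments.append(seg)
        j, prev_label = seg[1], seg[2]
    return segments
```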
[0038] In another embodiment, a Viterbi-based recursive algorithm
can be used to obtain the optimal segmentation of Equation (3). The Viterbi-based
recursive scheme looks in both forward and backward directions in
order to pick assignments. Due to this characteristic, the algorithm
succeeds in providing the optimal segmentation, at a computational cost
higher than the cost of employing the sequential segmentation algorithm.
Further, for partial segmentations from 1 to j-1 dialogues of a query script,
if the search is restricted to training segments having at most r dialogues, the
optimal segmentation s up to the jth dialogue of the query script can be given
as:

V(j, l) = \begin{cases} \max_{l':\; 1 \le l' \le M} \left( V(j-c, l') + W \cdot f(j-c+1, j, l, l', x) \right), & \text{if } j > 0 \\ 0, & \text{otherwise} \end{cases} \quad \text{with } c = U_l - T_l + 1    (16)

where V(j, l) is the score of the best segmentation of the first j
dialogues of a script x which concludes with a segment s_i \in s such that u_i = j
and l_i = l.
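The following sketch illustrates the Viterbi-style dynamic programme of Equation (16); dialogue indices start at 1 and the score callable stands in for W . f(...), both being assumptions made for this example.

```python
# Illustrative sketch of the dynamic programme of Equation (16): V[j][l] is the
# best score of a segmentation of the first j dialogues that ends with training
# segment l. Back-pointers allow recovery of the optimal label sequence.
def viterbi_segmentation(n_dialogues, training_segments, score):
    """training_segments: list of (T_l, U_l); score(t, u, l, l_prev) = W . f(...)."""
    M = len(training_segments)
    NEG = float("-inf")
    V = [[NEG] * M for _ in range(n_dialogues + 1)]
    back = [[None] * M for _ in range(n_dialogues + 1)]
    for j in range(1, n_dialogues + 1):
        for l, (T_l, U_l) in enumerate(training_segments):
            c = U_l - T_l + 1                 # this candidate covers exactly c dialogues
            if c > j:
                continue
            if j == c:                        # first segment of the query script
                V[j][l] = score(1, j, l, None)
                continue
            for l_prev in range(M):
                if V[j - c][l_prev] == NEG:
                    continue
                value = V[j - c][l_prev] + score(j - c + 1, j, l, l_prev)
                if value > V[j][l]:
                    V[j][l], back[j][l] = value, l_prev
    # Trace back from the best final label and recover the segment labels in order.
    l_best = max(range(M), key=lambda l: V[n_dialogues][l])
    labels, j, l = [], n_dialogues, l_best
    while j > 0:
        labels.append(l)
        c = training_segments[l][1] - training_segments[l][0] + 1
        j, l = j - c, back[j][l]
    return list(reversed(labels))
```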
[0039] In another embodiment, a Perceptron-based algorithm is
employed for finding the optimum weights W for the features f = [f_1, f_2, ..., f_K].
Further, the weights W of the various features are learned in order to improve the
performance of the inference algorithm by iteratively making corrections to
the weights W. For this purpose, segmentations achieved by the inference
algorithm are compared to the known ground truth segmentations {S_1,
S_2, ..., S_T} of the training scripts {x_1, x_2, ..., x_T}, where (x_t, S_t) represents a
tuple consisting of the tth training script and its true segmentation, respectively.
Let score(x, W; s) = W \cdot F(x, s) correspond to the score of the segmentation s.
Further, let {s_1, s_2, ..., s_K} be the K best segmentations achieved from the
inference algorithm in terms of scores achieved for the script x_t. Then, the
weights for the s_k-th segmentation of script x_t are updated as:

W_{t+1} = W_t + F(x_t, S_t) - F(x_t, s_k)    (17)

provided

score(x_t, W_t; s_k) > (1 - \beta) \cdot score(x_t, W_t; S_t)    (18)

In the described embodiment, we used K = 2, \beta = 0.05 and iterated
over all T = 24 scripts 10 times. Such iterations over all scripts are
reported to yield better performance in the semi-Markov CRF formalism.
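A minimal sketch of the weight update of Equations (17) and (18) is given below; F is assumed to return the feature vector F(x, s) as a list of floats, which is an assumption made for this illustration only.

```python
# Illustrative sketch of the perceptron-style weight update of Equations (17) and
# (18): when an inferred segmentation scores too close to (or above) the ground
# truth, the weights are nudged towards the ground-truth feature vector.
def perceptron_update(W, F, x_t, S_t, inferred_segmentations, beta=0.05):
    """W: current weights; S_t: ground-truth segmentation; inferred: K best guesses."""
    def score(s):
        return sum(w * f for w, f in zip(W, F(x_t, s)))
    for s_k in inferred_segmentations:
        if score(s_k) > (1.0 - beta) * score(S_t):          # condition of Equation (18)
            truth, guess = F(x_t, S_t), F(x_t, s_k)
            W = [w + ft - fk for w, ft, fk in zip(W, truth, guess)]  # Equation (17)
    return W
```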
[0040] The embodiments disclosed herein can be implemented
through at least one software program running on at least one hardware
device and performing network management functions to control the
network elements. The network elements shown in FIG. 2 include blocks
which can be at least one of a hardware device, or a combination of a
hardware device and a software module.
[0041] The embodiment disclosed herein specifies a system for
creation of pre-visualization videos. The mechanism allows automation of
the entire process of creation of pre-visualization videos and provides a
system thereof. Therefore, it is understood that the scope of the protection
extends to such a program and, in addition to a computer readable means
having a message therein, such computer readable storage means contain
program code means for implementation of one or more steps of the
method, when the program runs on a server or mobile device or any
suitable programmable device. The method is implemented in a preferred
embodiment through or together with a software program written in, e.g.,
Very high speed integrated circuit Hardware Description Language
(VHDL) or another programming language, or implemented by one or more
VHDL modules or several software modules being executed on at least one
hardware device. The hardware device can be any kind of device which can
be programmed, including, e.g., any kind of computer like a server or a
personal computer, or the like, or any combination thereof, e.g. one
processor and two FPGAs. The device may also include means which could
be, e.g., hardware means like an ASIC, or a combination of hardware
and software means, e.g. an ASIC and an FPGA, or at least one
microprocessor and at least one memory with software modules located
therein. Thus, the means are at least one hardware means and/or at least one
software means. The method embodiments described herein could be
implemented in pure hardware or partly in hardware and partly in software.
The device may also include only software means. Alternatively, the
invention may be implemented on different hardware devices, e.g. using a
plurality of CPUs.
[0042] The foregoing description of the specific embodiments will
so fully reveal the general nature of the embodiments herein that others can,
by applying current knowledge, readily modify and/or adapt for various
applications such specific embodiments without departing from the generic
concept, and, therefore, such adaptations and modifications should and are
intended to be comprehended within the meaning and range of equivalents
of the disclosed embodiments. It is to be understood that the phraseology or
terminology employed herein is for the purpose of description and not of
limitation. Therefore, while the embodiments herein have been described in
terms of preferred embodiments, those skilled in the art will recognize that
the embodiments herein can be practiced with modification within the spirit
and scope of the claims as described herein.
WE CLAIM:
1. A method for generating storyboard videos for a script, said method
comprising:
segmenting said script into at least one segment, wherein each segment
corresponds to one training shot;
comparing information of said segments of the novel script with
information of a plurality of training segments;
selecting video segments from said plurality of training video
segments, wherein information of said selected video segments matches
with the provided details of a novel script; and
concatenating said selected video segments to form a pre-visualization
video.
2. The method, as claimed in claim 1, wherein said method considers a
plurality of features, said features comprising
Speakers in said segment;
Order of said speakers in said segment;
Similarity of dialogues between said segment and said plurality of
training segments;
Similarity between number of words across dialogues;
Constraining said plurality of training segments to be compared;
Visual similarity between said segment and said plurality of training
segments;
Visual continuity across segments; and
Continuity in clothing of actors in said segment and said plurality of
training segments.
3. The method, as claimed in claim 2, wherein similarity of dialogues is
captured using Levenshtein distance.
4. The method, as claimed in claim 2, wherein visual similarity between
said segment and said plurality of training segments is checked using a
colour histogram.
5. The method, as claimed in claim 2, wherein continuity in clothing of
actors in said segment and said plurality of training segments is checked
using an upper body colour histogram.
6. The method, as claimed in claim 2, wherein weights are assigned to
said plurality of features.
7. The method, as claimed in claim 6, wherein a Perceptron-based
algorithm is used to find said weights.
8. The method, as claimed in claim 1, wherein said method further uses
inference algorithms.
9. The method, as claimed in claim 1, wherein said selected video
segments are displayed to a user for each segment of a novel script.
10. The method, as claimed in claim 9, wherein said selected video
segments are merged into at least one video sequence, before displaying to
said user.
11. The method as claimed in claim 1, wherein the said information is at
least one of textual information, visual information and combination
thereof.
12. The method as claimed in claim 1, wherein the said method is fully
automatic.
13. The method, as claimed in claim 1, wherein said method further
comprises segmenting said video segments into sub-segments, based on
said script.
14. The method, as claimed in claim 1, wherein a plurality of pre-visualization
videos are present for each segment.
15. A system performing a method as in at least one of preceding method
claims 1 to 14.
Dated September 30, 2011
Dr.Kalyan Chakravarthy
Patent Agent
ABSTRACT
The embodiments herein disclose a system and a method for
creating pre-visualization videos for a given script. Further, the proposed
system for creation of pre-visualization videos is fully automatic. The
proposed system takes a novel script for which pre-visualization videos are
to be constructed, as an input. The described system proposes several
features based on a novel script and training segments’ textual and visual
information to produce storyboard videos consistent with respect to the
textual information described in a novel script and consistent across
selected video shots. Further, the system makes use of semi-Markov
conditional random field (semi-Markov CRF) mechanism to perform
simultaneous segmentation and assignment of video shots. Further,
inference algorithms and a weight learning algorithm are used in the process
of creation of pre-visualization videos. Further, the proposed system is
capable of generating multiple storyboard videos for a given novel script.

FIG. 3