System For Hardware, Channel, Language And Ad Length Agnostic Detection Of Televised Advertisements

Abstract: The present disclosure provides a system for hardware, channel, language and ad length agnostic detection of multi-lingual televised advertisements. The detection is performed across live streams of media content of one or more broadcasted channels. The system selects a set of frames per second from a pre-defined set of frames. The system extracts a pre-defined number of keypoints from each selected frame and derives a pre-defined number of binary descriptors from the extracted keypoints. The system creates a special pyramid of the binary descriptors to obtain a first vocabulary and accesses a second vocabulary of binary descriptors. The system compares each spatially identifiable binary descriptor from the first vocabulary with spatially identifiable binary descriptors in clusters of the second vocabulary. The system progressively scores each selected frame and detects a first ad in the live streams of the media content. TO BE PUBLISHED WITH FIGURE 3

Patent Information

Application #

Filing Date

07 June 2017

Publication Number

50/2018

Publication Type

INA

Invention Field

COMMUNICATION

Status

gsdavar06@gmail.com

Parent Application

Applicants

SILVEREDGE TECHNOLOGIES PVT. LTD.

Plot No. 131, 2nd Floor, Sector 44, Gurgaon – 122002, Haryana, India.

Inventors

1. DEBASISH MITRA

Plot No. 131, 2nd Floor, Sector 44, Gurgaon – 122002, Haryana, India.

2. HITESH CHAWLA

Plot No. 131, 2nd Floor, Sector 44, Gurgaon – 122002, Haryana, India.

Specification

TECHNICAL FIELD
[0001] The present invention relates to the field of computer vision and machine learning and, in particular, relates to hardware, channel, language and ad length agnostic real time detection of televised video advertisements.
BACKGROUND
[0002] With the proliferation and penetration of the internet and televised media networks in recent years, the affordability of and access to media networks have concurrently grown. With such penetration and public viewership, advertisers have shifted to the internet space and televised media networks for promoting their products and services. The demand for ad space has increased in the televised media networks concurrently with an increase in the number of broadcasted channels. These advertisers, publishers and content distributors need to closely monitor the airing of advertisements across multiple broadcasted television channels and social networks. The need for faster ad analytics across multiple channels and platforms has grown with the adoption of competitive bidding between advertisers for booking ad space.
[0003] A television broadcast essentially consists of videos of scheduled programs and sponsored advertisements. Each advertisement video is generally scheduled to run for approximately 10 to 35 seconds on multiple channels at the same or different times. The advertisements are provided by advertisers to run in between the scheduled broadcast of the program on each channel. Traditionally, these advertisements are detected manually by assigning a user to each channel to record ad related data. In technological approaches, supervised detection of key features of the ad in the broadcast stream is performed. Such approaches focus on detection of advertisements by extracting and analyzing digital audio fingerprints. The audio fingerprints are probabilistically matched with similar fingerprints in a master database. The probabilistic match is analyzed for positive validation of the airing media as an ad.
[0004] The present solutions have several disadvantages. The present solutions are inefficient at removing redundant false positives in ad detection. The use of audio fingerprints for detecting the ad increases the error rate because similar audio tones and fingerprints occur in non-ad content. The time for detection of the advertisement also increases with the usage of computationally inefficient and error prone techniques of supervised audio fingerprinting. The increase in detection time affects the synchronization and operations of cross platform ad campaigns. These solutions detect the same ad aired in different languages on different native language channels as different ads. These solutions are neither language agnostic nor completely platform agnostic. In addition, these solutions lack the precision and accuracy to differentiate programs from advertisements.
[0005] In light of the above stated discussion, there is a need for a method and
system which overcomes the above stated disadvantages.
SUMMARY
[0006] In an aspect, the present disclosure provides a computer system. The computer system includes one or more processors and a memory coupled to the one or more processors. The memory stores instructions which cause the one or more processors to perform a method for hardware, channel, language and ad length agnostic real time detection of multi-lingual televised advertisements. The real time detection of the multi-lingual televised advertisements is performed across live streams of media content of one or more broadcasted channels. The method includes a first step of selecting a set of frames per second from a pre-defined set of frames. The set of frames is selected from the pre-defined set of frames in each second of each
live stream of the media content in each broadcasted channel of the one or more broadcasted channels. The method includes another step of extracting a pre-defined number of keypoints from each selected frame of the media content. The pre-defined number of keypoints is extracted on the basis of validation of at least one of a scale invariance criterion, a rotation invariance criterion and a Harris score criterion. The method includes yet another step of deriving a pre-defined number of binary descriptors from the pre-defined number of extracted keypoints. Each binary descriptor is characterized by a binary string with a length of 256 bits. Further, the method includes yet another step of creating a special pyramid of the pre-defined number of binary descriptors to obtain a pre-defined number of spatially identifiable binary descriptors. The special pyramid is created for obtaining a first vocabulary of binary descriptors of each selected frame. The method includes yet another step of accessing a second vocabulary of binary descriptors for a curated comprehensive repository of ad frames. The curated comprehensive repository of the ad frames corresponds to a comprehensive set of televised advertisements. The second vocabulary includes a set of tree structured clusters of binary descriptors. The method includes yet another step of comparing each spatially identifiable binary descriptor from the first vocabulary of each selected frame with a plurality of spatially identifiable binary descriptors in at least one or more clusters of the second vocabulary of the binary descriptors. The method includes yet another step of progressively scoring each selected frame from each live stream of the media content for validation of the selected frame as the ad frame of a first ad. The first ad is progressively scored for each positively validated frame to obtain a progressive ad score. The progressive score is calculated for the ad length independent detection of the first ad. 
The method includes yet another step of detecting the first ad in each live stream of the media content of the one or more broadcasted channels when the progressive score for the first ad is above a threshold score. The first ad is broadcasted by the one or more broadcasted channels in one or more languages. The first ad is detected independently of the one or more languages of the first ad broadcasted in the one or more broadcasted channels.
BRIEF DESCRIPTION OF THE FIGURES
[0007] FIG. 1A illustrates a system for hardware, channel, language and ad
length agnostic real time detection of multi-lingual televised advertisements, in accordance with various embodiments of the present disclosure;
[0008] FIG. 1B illustrates an example of creating a special pyramid for a selected
frame of the media content;
[0009] FIG. 1C illustrates an example of clustering binary descriptors into
a million clusters;
[0010] FIG. 1D illustrates an example of frame selections from the media
content for detecting ads and detection time thereof;
[0011] FIG. 2 illustrates a block diagram of an advertisement detection system,
in accordance with various embodiments of the present disclosure; and
[0012] FIG. 3 illustrates a block diagram of a computing device, in accordance
with various embodiments of the present disclosure.
DETAILED DESCRIPTION
[0013] FIG. 1A illustrates a system 100 for hardware, channel, language and ad
length agnostic real time detection of multi-lingual televised advertisements across live streams of media content of one or more broadcasted channels, in accordance with various embodiments of the present disclosure. The system 100 performs real time supervised detection of televised video advertisements across the live streams of the media content of the one or more broadcasted channels. The real time detection of televised video advertisements is significantly interrelated to the detection time for positive validation of airing content as a televised video ad. The system 100 implements sophisticated statistical learning and mature computer vision techniques on video frames of the media content.

The use of efficiently structured database management, feature extraction and
analysis provides robust and non-redundant detection of multi-lingual televised
video advertisements in the live streams of the media content. Moreover, the
system 100 describes a network and computational infrastructure for coherent,
concurrent and non-redundant reception, statistical analysis, recognition and
supervised detection of televised ads for one or more television channels. The
implementation of computer vision, feature extraction and statistical learning
techniques on real time collected and pre-processed statistical models of
advertisement videos facilitates the supervised detection of televised video
advertisements. The media content in the live stream of the channel is selectively
processed in units of frames and robust statistical learning models are developed
over the selectively processed frames of the media content. The robust statistical
learning models are parametrically analyzed and prominent and prospective
features are classified and compared for an improvement in time and error rate of
detection of the televised video ads. The detection time is significantly reduced
and ad analytics can be synchronized with social media or television ad
campaigns.
[0014] The system 100 includes one or more channel media reception devices 102 and an advertisement detection system 104. The above stated elements of the system 100 operate coherently and synchronously to detect televised ads in the media content of the one or more broadcasted channels. Each channel media reception device is a device for reception, authentication and decoding of the live stream of encoded media content in the form of a channel feed of each broadcasted channel. Each channel media reception device 102 is attached directly or indirectly to a receiving antenna or dish. The receiving antenna receives a broadcasted signal carrying one or more channel feeds. The one or more channel feeds are encoded in a pre-defined format. In addition, the one or more channel feeds have a set of characteristics. The set of characteristics includes but may not be limited to a frame rate, an audio sample rate, an encoding scheme, a transcoding scheme, a bit rate and a channel frequency band.
[0015] The broadcasted signal carrying the one or more channel feeds is initially transmitted from a transmission device. In an embodiment of the present disclosure, the broadcasted signal carrying the one or more channel feeds is a multiplexed MPEG-2 encoded signal having a constant bit rate. In another embodiment of the present disclosure, the broadcasted signal carrying the one or more channel feeds is a multiplexed MPEG-2 encoded signal having a variable bit rate. In yet another embodiment of the present disclosure, the broadcasted signal carrying the one or more channel feeds is any digital standard encoded signal. The bit rate is based on the complexity of each frame in each of the one or more channel feeds. The quality of the multiplexed MPEG-2 encoded signal will be reduced when the broadcasted signal is too complex to be coded at a constant bit rate. The bit rate of the variable bit rate MPEG-2 streams is adjusted dynamically as less bandwidth is needed to encode the images with a given picture quality. In addition, the broadcasted signal is encrypted for conditional access by a particular subscriber. The encrypted broadcast signal is uniquely decoded by the one or more channel media reception devices 102.
[0016] In an example, a digital TV signal is received on each channel media reception device as a stream of MPEG-2 data. The MPEG-2 data has a transport stream. The transport stream has a data rate of 40 megabits/second for a cable or satellite network. Each transport stream consists of a set of sub-streams. The set of sub-streams is defined as elementary streams. Each elementary stream includes an MPEG-2 encoded audio, an MPEG-2 encoded video and data encapsulated in an MPEG-2 stream. Moreover, each elementary stream includes a packet identifier (hereinafter "PID") that acts as a unique identifier for the corresponding elementary stream within the transport stream. The elementary streams are split into packets in order to obtain a packetized elementary stream (hereinafter "PES").
[0017] In an embodiment of the present disclosure, the one or more channel
media reception devices 102 are digital set top boxes. In another embodiment of the present disclosure, the one or more channel media reception devices 102 are hybrid set top boxes. In yet another embodiment of the present disclosure, the one or more channel media reception devices 102 are any standard broadcast signal processing device. Each channel media reception device may receive the broadcast signal from any broadcast signal medium. In an embodiment of the present disclosure, the broadcast signal medium is an Ethernet cable. In another embodiment of the present disclosure, the broadcast signal medium is a satellite dish. In yet another embodiment of the present disclosure, the broadcast signal medium is a coaxial cable. In yet another embodiment of the present disclosure, the broadcast signal medium is a telephone line having DSL connection. In yet another embodiment of the present disclosure, the broadcast signal medium is a broadband over power line (hereinafter "BPL"). In yet another embodiment of the present disclosure, the broadcast signal medium is wireless ordinary VHF or UHF bandwidth.
[0018] Each channel media reception device of the one or more channel media reception devices 102 primarily includes a signal input port, an audio output port, a video output port, a de-multiplexer, a video decoder, an audio decoder and a graphics engine. The broadcast signal carrying the one or more channel feeds is received at the signal input port. The broadcast signal carrying the one or more channel feeds is de-multiplexed by the de-multiplexer. The video decoder decodes the encoded video and the audio decoder decodes the encoded audio. The video and audio correspond to a channel selected in each channel media reception device. Each channel media reception device of the one or more channel media reception devices 102 carries the one or more channel feeds multiplexed to form a single transport stream.
[0019] The decoded audio and the decoded video for each live stream of each broadcasted channel are received at the audio output port and the video output port respectively. The decoded video is characterized by a set of features. The set of features includes but may not be limited to a frame height, a frame width, a frame rate, a video resolution, a bit rate and an encoding scheme. Further, the decoded video may be of any display resolution. Examples of the display resolution include but may not be limited to 144p, 240p, 360p, 480p, 720p, 1080p and 1080i. Here, p and i denote progressive scan and interlaced scan techniques respectively.
[0020] The decoded video for each broadcasted channel is transferred to the
advertisement detection system 104. The decoded video corresponds to the live stream of media content for each broadcasted channel. The live streams of the media content include ad streams and non-ad streams. The ad streams include one or more ads. The non-ad streams may correspond to televised published content, channel related data, subscription related data, operator messages and the like.
[0021] The media content is characterized by a pre-defined frame rate, a pre-defined number of frames and a pre-defined bit rate for a pre-defined interval of broadcast. The pre-defined frame rate is defined as a pre-defined set of frames streamed per second in the live streams of the media content. The pre-defined set of frames per second is often referred to as frames per second (FPS). Examples of the pre-defined set of frames per second include but may not be limited to 25 FPS, 30 FPS, 20 FPS and 35 FPS.
[0022] The advertisement detection system 104 selects a set of frames per second from the pre-defined set of frames. The pre-defined set of frames is streamed in each second of each live stream of the media content for each broadcasted channel of the one or more broadcasted channels. Consecutive selected frames in the selected set of frames are separated by a frame gap of at least 8 frames. In an embodiment of the present disclosure, the selected set of frames per second from the pre-defined set of frames is 3 per second. In another embodiment of the present disclosure, the selected set of frames per second from the pre-defined set of frames is more or less than 3 per second. For example, the advertisement detection system 104 may be configured to select every 8th frame from 25 frames per second of the media content. The selected set of frames may then be the 8th, 16th and 24th frames of the 25 frames in each second of the media content.
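The frame selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_frame_indices` and its parameters are hypothetical, and frames are counted from 1 within each second.

```python
def select_frame_indices(fps: int = 25, frame_gap: int = 8) -> list[int]:
    """Return the 1-based positions of the frames sampled within one second.

    Hypothetical helper: picks every `frame_gap`-th frame out of `fps` frames.
    """
    return list(range(frame_gap, fps + 1, frame_gap))


# With 25 FPS and a gap of 8 frames, three frames are kept per second.
print(select_frame_indices(25, 8))  # [8, 16, 24]
```

With these defaults the sketch keeps 3 frames per second, matching the embodiment in which the 8th, 16th and 24th frames are selected.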
[0023] The advertisement detection system 104 performs histogram normalization on each selected frame from the selected set of frames of the media content broadcasted on each broadcasted channel. The advertisement detection system 104 sets each pixel value in each frame of the media content to a standard pixel value range. Further, the advertisement detection system 104 scales each frame of the media content to a pre-defined scale. In an embodiment of the present disclosure, the pre-defined scale of each frame is 640 by 480. Each selected frame of the set of selected frames per second is characterized by a pre-defined number of pixels. Each pixel is fundamentally a combination of Red, Green and Blue colors (hereinafter "RGB"). Each color of the pixel (RGB) has a pre-defined value in a range of 0-255. The advertisement detection system 104 gray-scales each selected frame of the selected set of frames. The implementation of gray-scaling and histogram normalization on each selected frame removes any hardware dependency of the selected set of frames in the media content across the live streams of the one or more broadcasted channels. Moreover, the removal of the hardware dependency facilitates hardware agnostic supervised detection of the televised video advertisements across the live streams of the media content of the one or more broadcasted channels.
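A minimal NumPy sketch of this pre-processing is given below. It rests on several assumptions that the disclosure does not fix: histogram normalization is interpreted as histogram equalization, gray-scaling uses the ITU-R BT.601 luma weights, and scaling uses nearest-neighbor sampling. All function names are hypothetical.

```python
import numpy as np


def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB frame to uint8 gray (BT.601 weights, assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = frame_rgb.astype(np.float64) @ weights
    return np.clip(gray, 0, 255).round().astype(np.uint8)


def resize_nearest(gray: np.ndarray, width: int = 640, height: int = 480) -> np.ndarray:
    """Scale a gray frame to width x height with nearest-neighbor sampling."""
    rows = np.arange(height) * gray.shape[0] // height
    cols = np.arange(width) * gray.shape[1] // width
    return gray[rows[:, None], cols]


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Spread the gray levels of a uint8 frame over the full 0-255 range."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if gray.size == cdf_min:  # constant frame: nothing to equalize
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Gray-scale, scale to 640 by 480 and equalize one selected frame."""
    return equalize_histogram(resize_nearest(to_gray(frame_rgb)))
```

Whatever the resolution or color balance of the decoded input, every frame leaves `preprocess` as a 480 x 640 gray image spanning the same value range, which is the property the hardware agnostic detection relies on.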
[0024] The advertisement detection system 104 extracts a pre-defined number of keypoints from each selected frame of the selected set of frames per second of the media content. Each extracted keypoint is characterized by a spatial position in each selected frame. In general, each keypoint in any gray-scaled frame corresponds to a pixel whose neighborhood contains enough pixels with a gray level different from that of the pixel. Each keypoint is generally selected by choosing an arbitrary pixel point as center to form a circular area and assigning the circular area as the pixel point's neighborhood. The circular area of the selected keypoint is analyzed for satisfying the condition:
$|I_x - I_p| > I_T$
where $I_x$ is the intensity of the gray-scaled neighborhood pixels, $I_p$ is the intensity of the selected feature pixel and $I_T$ is the intensity threshold.
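The neighborhood test can be illustrated with the following sketch. The 16-pixel circle of radius 3, the threshold of 20 and the minimum count of 12 are assumptions borrowed from common corner-detector conventions, not values stated in the disclosure. The sketch also simplifies the test by counting differing pixels anywhere on the circle.

```python
import numpy as np

# Offsets (dy, dx) of the 16 pixels on a circle of radius 3 (assumed neighborhood).
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]


def is_keypoint_candidate(gray: np.ndarray, y: int, x: int,
                          threshold: int = 20, min_count: int = 12) -> bool:
    """Check whether enough circle pixels satisfy |I_x - I_p| > I_T.

    The caller must keep (y, x) at least 3 pixels away from the frame border.
    """
    i_p = int(gray[y, x])  # cast to int to avoid uint8 wrap-around
    differing = sum(
        abs(int(gray[y + dy, x + dx]) - i_p) > threshold for dy, dx in CIRCLE
    )
    return differing >= min_count
```

A pixel in a flat region fails the test because none of its neighbors differ by more than the threshold, whereas an isolated bright or dark pixel passes.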
[0025] The advertisement detection system 104 uses a Harris detector to select the pre-defined number of keypoints and sorts the selected keypoints. The pre-defined number of keypoints is extracted based on validation of at least one of a scale invariance criterion, a rotation invariance criterion, a Harris score criterion and a translational invariance criterion. The extracted keypoints are invariant to at least one of rotation, translation and scaling of objects in the selected frame in the set of frames of the media content. In an exemplary embodiment of the present disclosure, the pre-defined number of keypoints for each selected frame in the selected set of frames is 700. In another embodiment of the present disclosure, the pre-defined number of keypoints for each selected frame in the selected set of frames is more than 700. In yet another embodiment of the present disclosure, the pre-defined number of keypoints for each selected frame in the selected set of frames is less than 700. The number of keypoints selected depends on the scale of detection and the limited computational capacity of the advertisement detection system 104.
[0026] The advertisement detection system 104 calculates the direction of the corner at each keypoint. The directions, in the form of intensity vectors, are calculated using the intensity centroid technique. The advertisement detection system 104 derives a pre-defined number of binary descriptors from the pre-defined number of extracted keypoints. Each binary descriptor is characterized by a binary string with a length of 256 bits. Each 256-bit binary descriptor uniquely identifies a feature point in the selected frame of the media content. Each binary descriptor is derived after adaptive implementation of rotational, translational and scale invariance techniques. The number of binary descriptors depends on the number of extracted keypoints. In an exemplary embodiment of the present disclosure, the pre-defined number of binary descriptors is 700. In another embodiment of the present disclosure, the pre-defined number of binary descriptors is more than 700. In yet another embodiment of the present disclosure, the pre-defined number of binary descriptors is less than 700.
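As an illustration of how 256-bit binary descriptors can be handled, the sketch below stores each descriptor as 32 bytes and compares descriptors by Hamming distance, the measure used later for clustering. The storage layout and the function names are assumptions made for this example.

```python
import numpy as np


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the differing bits between two 256-bit descriptors (32 uint8 values each)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


def nearest_descriptor(query: np.ndarray, candidates: np.ndarray) -> int:
    """Return the row index of the candidate (shape m x 32) closest to the query."""
    distances = np.unpackbits(np.bitwise_xor(candidates, query), axis=1).sum(axis=1)
    return int(distances.argmin())
```

Two identical descriptors have a distance of 0 and two fully complementary descriptors have a distance of 256.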
[0027] The advertisement detection system 104 is a computing server with provisions for volatile memory in the form of random access memory (RAM) and permanent non-volatile storage space. The advertisement detection system 104 maintains one or more databases in the volatile memory and the permanent non-volatile storage space. The one or more databases in the advertisement detection system 104 include a first vocabulary database 104a, a second vocabulary database 104b, an ad frame repository 104c and an ad detection database 104d. The advertisement detection system 104 uses the one or more databases for storing the descriptor data and other statistical and detection data.
[0028] The advertisement detection system 104 stores the derived descriptor data in the first vocabulary database 104a. The advertisement detection system 104 assigns a weight value to each binary descriptor of the pre-defined number of binary descriptors in the first vocabulary database 104a. The assignment of the weight values to the binary descriptors is based on a term frequency and an inverse document frequency of the descriptors in the first vocabulary database 104a. For example, the weight value for a binary descriptor (B) with a higher term frequency will be less than the weight value for a binary descriptor (C) with a lower term frequency. The weight values of the binary descriptors are L1 normalized to lie in a range between 0 and 1. The normalized weight values for the binary descriptors are characterized by an arithmetic sum of 1.
[0029] Mathematically, if $W_i$ denotes the weight values of N descriptors, then the L1 normalization of the $W_i$ weights is defined as:
$$W_i^{(\mathrm{Normalized})} = \frac{W_i}{W_{\mathrm{Norm}}}, \quad \text{where} \quad W_{\mathrm{Norm}} = \sum_{i=1}^{N} W_i$$
[0030] The normalization of the weight values of the pre-defined number of the binary descriptors is performed to simplify calculation parameters and reduce statistical sparse distribution of the weight values. For example, an assigned weight value W1 is 30 for a specific term frequency and another weight value W2 is 150 for another term frequency. The L1 normalization of W1 and W2 will be 30/(30+150) and 150/(30+150), i.e. 0.167 and 0.833 respectively. The L1 normalized weights of the pre-defined number of binary descriptors are stored in the first vocabulary database 104a of the advertisement detection system 104.
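The worked example above can be reproduced with a short sketch. The function name `l1_normalize` is hypothetical; the weights 30 and 150 are the ones used in the example.

```python
def l1_normalize(weights: list[float]) -> list[float]:
    """Divide each weight by the sum of absolute weights so the results sum to 1."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("cannot L1 normalize weights that sum to zero")
    return [w / total for w in weights]


normalized = l1_normalize([30, 150])
print([round(w, 3) for w in normalized])  # [0.167, 0.833]
```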
[0031] The spatial information of the extracted keypoint and associated binary descriptor is lost with the derivation of the binary descriptors for the extracted keypoints from each selected frame of the media content. The spatial information is revalidated and identified by creating a special pyramid of binary descriptors (as shown in FIG. 1B). The advertisement detection system 104 creates the special pyramid of the pre-defined number of derived binary descriptors to obtain a pre-defined number of spatially identifiable binary descriptors. The special pyramid is a database of binary descriptors with three stages of each selected frame. The three stages are labeled for illustration in FIG. 1B. The three stages are stage 1, stage 2 and stage 3 respectively. The three stages are segregated into nine discrete levels for the same selected frame. The nine discrete levels are labeled in FIG. 1B as L1, L2, L3, L4, L5, L6, L7, L8 and L9 respectively. Each level segregates the keypoints and descriptors according to a specific sliced region in the selected frame. The first stage includes the entire selected frame and the binary descriptors as sparsely distributed over the entire frame in the first level (L1). The spatial information is uncertain in L1. The second stage includes a second level (L2) and a third level (L3) of the binary descriptors as slices of the selected frame into two equal rectangular halves. The second level (L2) maps to a first fraction of the binary descriptors and the third level (L3) maps to the remaining binary descriptors. The third stage includes six levels (L4, L5, L6, L7, L8 and L9) of the binary descriptors. The six levels (L4, L5, L6, L7, L8 and L9) are distributed over six slices of the selected frame. Each slice maps a specific fraction of the binary descriptors. Each binary descriptor can be mapped spatially in each stage and subsequent level of the special pyramid. The third stage is most significant for spatially identifying each binary descriptor in the pre-defined number of the binary descriptors. The number of spatially identifiable binary descriptors is thrice the number of the binary descriptors for each selected frame, since each binary descriptor is recorded once in each of the three stages. In an embodiment of the present disclosure, the pre-defined number of spatially identifiable binary descriptors is 2100 when the pre-defined number of binary descriptors is 700. The pre-defined number of spatially identifiable binary descriptors are updated, sorted and stored as records in the first vocabulary database 104a (as shown in FIG. 1B). Also, each spatially identifiable binary descriptor is tagged with a level identifier in the first vocabulary database 104a. Each weight value for each binary descriptor in the special pyramid of the pre-defined number of spatially identifiable binary descriptors is again L1 normalized. Each weight value in the special pyramid is L1 normalized such that the arithmetic sum of the weight values is 1. The advertisement detection system 104 finally obtains a first vocabulary of binary descriptors for the pre-defined number of spatially identifiable binary descriptors of each selected frame in the first vocabulary database 104a.
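The level tagging of the special pyramid can be sketched as follows. The slice geometry is an assumption: the disclosure states only that stage 2 uses two equal rectangular halves and stage 3 uses six slices, so this sketch assumes left and right halves for L2 and L3 and a grid of 3 columns by 2 rows for L4 to L9. The function names are hypothetical.

```python
def pyramid_levels(x: float, y: float, width: int = 640, height: int = 480) -> tuple[int, int, int]:
    """Return the level numbers (stage 1, stage 2, stage 3) for a keypoint at (x, y)."""
    stage1 = 1                                    # L1 covers the entire frame
    stage2 = 2 if x < width / 2 else 3            # L2 = left half, L3 = right half (assumed)
    col = min(int(3 * x / width), 2)              # 3 columns (assumed)
    row = min(int(2 * y / height), 1)             # 2 rows (assumed)
    stage3 = 4 + row * 3 + col                    # L4..L9
    return (stage1, stage2, stage3)


def tag_descriptors(keypoints: list[tuple[float, float]]) -> list[tuple[int, int]]:
    """Emit one (level, descriptor index) record per stage for every keypoint."""
    return [(level, index)
            for index, (x, y) in enumerate(keypoints)
            for level in pyramid_levels(x, y)]


print(pyramid_levels(320, 100))  # (1, 3, 5)
```

Because every keypoint yields one record per stage, 700 keypoints produce 2100 tagged records, in line with the embodiment described above.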
[0032] The detection of a first ad in the live stream of the media content is based on implementation of supervised detection techniques. The advertisement detection system 104 accesses a second vocabulary of binary descriptors in the second vocabulary database 104b. The second vocabulary of binary descriptors corresponds to a curated comprehensive repository of ad frames from a comprehensive set of televised advertisements. The curated comprehensive repository of ad frames from the comprehensive set of televised advertisements is stored in the ad frame repository 104c of the advertisement detection system 104. The second vocabulary of the binary descriptors is created prior to real time detection of the live stream of the media content. The second vocabulary of binary descriptors is elementally stored in the second vocabulary database 104b as spatially identifiable records with tagged levels of the special pyramid and normalized weight values. The advertisement detection system 104 creates the second vocabulary of binary descriptors from the comprehensive repository of ad frames. The advertisement detection system 104 extracts the pre-defined number of keypoints and the pre-defined number of binary descriptors from each frame in the repository of ad frames. The advertisement detection system 104 creates the special pyramid of the binary descriptors for each ad frame to obtain the pre-defined number of spatially identifiable binary descriptors.
[0033] The advertisement detection system 104 clusters the binary descriptors into a first set of clusters (as shown in FIG. 1C). The first set of clusters is created based on an evaluation of a minimum hamming distance between each binary descriptor in the pre-defined number of spatially identifiable binary descriptors. Each cluster in the first set of clusters is associated with a centroid. The centroid of each cluster is the middle of the cluster. The centroid is a vector which includes a number for each variable in each cluster of the first set of clusters. Each number is a mean of the variables in each cluster of the first set of clusters. Each centroid is a measure of location of each cluster in the set of clusters. In an embodiment of the present disclosure, the first set of clusters includes 10 centroids associated with 10 clusters (C1-C10). In another embodiment of the present disclosure, the first set of clusters includes more or fewer than 10 centroids associated with more or fewer than 10 clusters. Each cluster in the first set of clusters includes a fraction of the pre-defined number of spatially identifiable binary descriptors. Further, the advertisement detection system 104 iteratively clusters the binary descriptors in each cluster of the first set of clusters and each subsequent cluster for a pre-determined iteration level to obtain a set of tree structured clusters. The pre-determined iteration level for iteratively clustering the derived binary descriptors obtained from the repository of ad frames is 6 (as shown in FIG. 1C). Each cluster in the set of tree structured clusters is associated with a centroid. In an embodiment of the present disclosure, the set of tree structured clusters includes $10^6$ centroids in association with $10^6$ clusters (C1 ($10^5$) - C10 ($10^5$)) when the pre-determined iteration level is 6. Each binary descriptor in each cluster of the set of tree structured clusters is optimally clustered for a lowest hamming distance among the fraction of the binary descriptors.
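The iterative clustering can be sketched as a small recursive routine. This is a toy illustration and not the disclosed implementation: descriptors are held as rows of 0/1 values, the centroid is taken as the bitwise majority (the per-bit mean rounded to 0 or 1), the initial centers are random picks, and all names are hypothetical. With 10 clusters per level and 6 levels the tree has at most 10**6 leaf clusters; the demonstration uses 3 clusters and 2 levels to stay small.

```python
import numpy as np


def assign(bits: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Label each 0/1 row with the index of its nearest center (Hamming distance)."""
    dists = np.stack([np.count_nonzero(bits != c, axis=1) for c in centers], axis=1)
    return dists.argmin(axis=1)


def k_majority(bits: np.ndarray, k: int, iters: int = 5, seed: int = 0):
    """Cluster 0/1 rows into at most k groups; each centroid is the bitwise majority."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(bits), size=min(k, len(bits)), replace=False)
    centers = bits[picks].copy()
    for _ in range(iters):
        labels = assign(bits, centers)
        for j in range(len(centers)):
            members = bits[labels == j]
            if len(members):  # keep the old center if a cluster went empty
                centers[j] = (members.mean(axis=0) >= 0.5).astype(bits.dtype)
    return centers, assign(bits, centers)


def build_tree(bits: np.ndarray, k: int = 10, depth: int = 6) -> dict:
    """Recursively split the descriptors into a tree of clusters `depth` levels deep."""
    node = {"size": len(bits), "children": []}
    if depth == 0 or len(bits) <= 1:
        return node
    centers, labels = k_majority(bits, k)
    for j, center in enumerate(centers):
        members = bits[labels == j]
        if len(members):
            child = build_tree(members, k, depth - 1)
            child["centroid"] = center
            node["children"].append(child)
    return node


def leaves(node: dict) -> list[dict]:
    """Collect the leaf clusters of the tree."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


demo = np.random.default_rng(1).integers(0, 2, size=(200, 256), dtype=np.uint8)
tree = build_tree(demo, k=3, depth=2)
print(len(leaves(tree)), sum(leaf["size"] for leaf in leaves(tree)))
```

Every descriptor ends up in exactly one leaf cluster, so the leaf sizes always add up to the number of descriptors that were clustered.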
[0034] The advertisement detection system 104 assigns the weight value to each clustered binary descriptor in the set of tree structured clusters. Each weight value is evaluated based on optimal adjustment of the term frequency and the inverse document frequency. Further, each weight value is L1 normalized such that the evaluated arithmetic sum of weight values is 1. The binary descriptors and associated L1 normalized weight values in the set of tree structured clusters are optimally stored in a distributed chain of structured and sorted tables as a second vocabulary of binary descriptors. The second vocabulary of the binary descriptors is stored in the second vocabulary database 104b. Furthermore, the distributed chain of tables includes one or more descriptor tables and one or more look up tables. The one or more look up tables include records pointing to each cluster and the associated descriptors in each cluster of the set of tree structured clusters. The second vocabulary includes the set of tree structured clusters of binary descriptors as leaf nodes. Each leaf node is associated with normalized weight values. The second vocabulary of the binary descriptors is characterized as an n-ary tree data structure with the leaf nodes.
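The weighting step can be illustrated with a short sketch. The disclosure only states that weights depend on term frequency and inverse document frequency and are L1 normalized; the exact formula below (tf multiplied by the natural log of the inverse document frequency), the function name and the argument names are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_l1_weights(frame_words, frames_containing, total_frames):
    """Return {cluster id: weight} for one frame, with the weights summing to 1.

    frame_words: cluster ids hit by the frame's descriptors (hypothetical input).
    frames_containing: {cluster id: number of repository frames hitting it}.
    total_frames: number of frames in the repository.
    """
    counts = Counter(frame_words)
    raw = {}
    for word, count in counts.items():
        tf = count / len(frame_words)
        idf = math.log(total_frames / frames_containing[word])
        raw[word] = tf * idf
    total = sum(raw.values())
    if total == 0:
        # Every word occurs in every frame; fall back to uniform weights
        return {word: 1 / len(raw) for word in raw}
    # L1 normalization: divide by the arithmetic sum so the weights add up to 1
    return {word: value / total for word, value in raw.items()}

weights = tfidf_l1_weights(["c1", "c1", "c2", "c3"],
                           {"c1": 10, "c2": 100, "c3": 500}, 1000)
```

Clusters that are hit often by the frame but rarely across the repository receive the largest weights, which is the usual motivation for this kind of weighting.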
[0035] In an example, a large collection of televised video advertisements, say 2000, is stored in the permanent non-volatile storage of the advertisement detection system 104. Considering each ad to have an average length of 15 seconds, the advertisement detection system 104 may create the comprehensive repository of ad frames from the 2000 ads. If each ad is run at 25 FPS for 15 seconds, then the total number of frames for the ad will be 375 frames. The total number of frames for 2000 ads will be roughly 750000 frames at 25 FPS. Let us suppose that the advertisement detection system 104 extracts roughly 250000 prominent frames to create the comprehensive 250000 strong repository of ad frames.
[0036] Referring to the aforementioned example, the advertisement detection system 104 may extract, say, 700 keypoints for each ad frame of the 250000 ad frames in the repository of ad frames. The total number of keypoints for 250000 ad frames is 175 million keypoints. The advertisement detection system 104 derives 175 million binary descriptors against the 175 million keypoints. The advertisement detection system 104 performs iterative clustering on the 175 million binary descriptors. In the first iteration, the first set of clusters is obtained. Each cluster in the first set of clusters includes a fraction of the 175 million descriptors. Let us assume that the 1st cluster, 2nd cluster, 3rd cluster, 4th cluster, 5th cluster, 6th cluster, 7th cluster, 8th cluster, 9th cluster and 10th cluster include roughly 20, 22, 30, 25, 23, 22, 8, 4, 12 and 7 million binary descriptors respectively. Each cluster is obtained on the basis of minimum inter-binary descriptor hamming distance. Let us take three 8 bit binary numbers as an example for three 256 bit binary descriptors. The 8 bit descriptors are 01101010, 01011100 and 10101010 respectively. The hamming distance between the first descriptor and the second descriptor, the second descriptor and the third descriptor, and the third descriptor and the first descriptor is 4, 6 and 2 respectively. The advertisement detection system 104 segregates and clusters the binary descriptors with hamming distances of 4, 6 and 2 respectively.
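The hamming distances quoted for the three listed 8 bit descriptors can be checked with a few lines of code. Since only three descriptors are listed, the third distance is taken here between the third and the first descriptor; the function name is illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(x != y for x, y in zip(a, b))

d1, d2, d3 = "01101010", "01011100", "10101010"
distances = (hamming(d1, d2), hamming(d2, d3), hamming(d3, d1))
print(distances)  # (4, 6, 2)
```

The same function applies unchanged to 256 bit descriptors written as bit strings.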
[0037] In the second iteration, each cluster in the first set of clusters is again clustered into 10 different clusters. The total number of clusters obtained after the second iteration is 10^2. In the third iteration, each cluster in the 10^2 clusters is again clustered to obtain 10^3 clusters. In the fourth iteration and the fifth iteration, each cluster in the 10^3 and 10^4 clusters is again clustered. The iterative clustering is done till the sixth iteration and a total of 10^6 clusters are obtained. With 175 million binary descriptors spread over 10^6 clusters, each cluster contains approximately 175 binary descriptors on average. The 10^6 clusters are stored and structured as the n-ary tree data structure with each cluster at each leaf node of the n-ary tree. The advertisement detection system 104 creates the special pyramid of the binary descriptors in the 10^6 clusters for validating spatial information of the keypoints. The tree structured clusters are stored as the second vocabulary in the second vocabulary database 104b.
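As a quick check of the figures in this example, the cluster count at each iteration and the average leaf size follow directly from the branching factor of 10, the iteration level of 6, and the 250000 frames with 700 keypoints each. The constant names below are illustrative, and the even split across leaves is an assumption.

```python
BRANCHING = 10                      # clusters created from each parent cluster
ITERATION_LEVEL = 6                 # pre-determined iteration level
TOTAL_DESCRIPTORS = 250_000 * 700   # ad frames x keypoints per frame

clusters_per_level = [BRANCHING ** level
                      for level in range(1, ITERATION_LEVEL + 1)]
leaf_clusters = clusters_per_level[-1]
# Assumes descriptors are spread evenly; real clusters will vary in size
average_per_leaf = TOTAL_DESCRIPTORS / leaf_clusters

print(clusters_per_level)  # [10, 100, 1000, 10000, 100000, 1000000]
print(average_per_leaf)    # 175.0
```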
[0038] The advertisement detection system 104 compares each spatially identifiable binary descriptor in the first vocabulary with spatially identifiable binary descriptors in at least one or more clusters of the set of tree structured clusters. The set of tree structured clusters corresponds to the second vocabulary of the binary descriptors of the repository of the ad frames. The advertisement detection system 104 searches for a cluster in the set of tree structured clusters that includes binary descriptors with minimum hamming distances with the binary descriptor in the first vocabulary of the selected frame in the live stream of the media content. Each binary descriptor from the first vocabulary of the binary descriptors for the selected frame of the media content is matched with the binary descriptor in the searched cluster for the minimum hamming distance. The binary descriptors are matched frame by frame. The advertisement detection system 104 calculates a Bhattacharyya distance between weight values of each matched pair of the binary descriptors. The Bhattacharyya distance is defined as the square root of a product of a first weight value and a second weight value. The advertisement detection system 104 sums the calculated Bhattacharyya distance for each matched descriptor in the first vocabulary of the binary descriptors for the selected frame of the media content to obtain a summed feature value. The summed feature value lies in a normalized range of 0 to 1.
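The cluster search and descriptor matching described above might look like the sketch below. The node layout (dictionaries with `centroid`, `children` and `descriptors` keys), the greedy descent by nearest centroid and the toy two-cluster vocabulary are assumptions for illustration; the disclosure does not specify how the tree is traversed.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")

def find_leaf(node, query):
    """Descend from the root, following the child with the nearest centroid."""
    while "children" in node:
        node = min(node["children"],
                   key=lambda child: hamming(query, child["centroid"]))
    return node

def best_match(root, query):
    """Return the (descriptor, weight) entry closest to the query in its leaf."""
    leaf = find_leaf(root, query)
    return min(leaf["descriptors"], key=lambda entry: hamming(query, entry[0]))

# Hypothetical vocabulary with two leaf clusters of (descriptor, weight) pairs
tree = {"children": [
    {"centroid": 0b00001111,
     "descriptors": [(0b00001110, 0.4), (0b00011111, 0.1)]},
    {"centroid": 0b11110000,
     "descriptors": [(0b11110001, 0.3), (0b10110000, 0.2)]},
]}
descriptor, weight = best_match(tree, 0b11110011)
```

Each descent costs at most one centroid comparison per child per level, so a tree with branching 10 and depth 6 needs about 60 comparisons to reach a leaf instead of a scan over every stored descriptor.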
[0039] If W_i is the weight value for an ith binary descriptor in the first vocabulary and W_j is the weight value of the jth binary descriptor in the set of tree structured clusters, then the feature value (FV) is calculated as:
$$FV = \sum_{i,j=1}^{N} (W_i \cdot W_j)^{1/2}$$
where i and j denote an index of the binary descriptor in the first vocabulary and the second vocabulary of binary descriptors respectively, N denotes the number of the binary descriptors in the first vocabulary of the binary descriptors, and where:
$$D_{BH} \ \text{(Bhattacharyya distance)} = (W_i \cdot W_j)^{1/2}$$
[0040] For example, let us assume that the 700 binary descriptors in the first vocabulary of the selected frame are matched with binary descriptors in one of the clusters of the second vocabulary of the binary descriptors. Each pair of matched descriptors in the first vocabulary of the binary descriptors and the second vocabulary of the binary descriptors is characterized by the weight values. The advertisement detection system 104 calculates the Bhattacharyya distance between the weight values of the matched pair by taking the square root of the product of the weight values. The advertisement detection system 104 calculates the feature value (FV) by summing the Bhattacharyya distance between the matched pairs of the binary descriptors. The summed feature value lies in a range of 0 to 1.
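The feature value computation can be sketched in a few lines. The weight pairs below are hypothetical and the function name is illustrative. When each frame weight is paired with one vocabulary weight and both sets of weights are L1 normalized, the sum stays within 0 to 1 by the Cauchy-Schwarz inequality.

```python
import math

def feature_value(matched_weights):
    """Sum sqrt(w_i * w_j) over matched (frame weight, vocabulary weight) pairs."""
    return sum(math.sqrt(w_i * w_j) for w_i, w_j in matched_weights)

# Hypothetical weights for three matched descriptor pairs
pairs = [(0.5, 0.02), (0.3, 0.03), (0.2, 0.0)]
fv = feature_value(pairs)
```

A descriptor whose matched vocabulary weight is zero contributes nothing, so frames that resemble no stored ad frame end up with feature values close to 0.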
[0041] The advertisement detection system 104 progressively scores each selected frame across each live stream of the media content for validation of the selected frame as an ad frame of the first ad. The first ad is assumed to be present in the live stream of the one or more broadcasted channels. Moreover, the first ad is broadcasted by the one or more broadcasted channels in one or more languages. The one or more languages of the first ad are present as an audio stream in the first ad. The advertisement detection system 104 adopts video processing techniques for supervised detection of the first ad and avoids the processing of the language dependent audio stream of the first ad. The implementation of supervised video processing techniques facilitates language independent detection of the first ad across the live streams of the one or more broadcasted channels.
[0042] The detection of the first ad in the live streams of the media content is based on progressive evaluation of the summed feature values for each selected frame in the set of frames of the media content. Each selected frame for the first ad in the live streams of the media content is progressively scored in at least one or more steps. The ad may be positively evaluated at any of the steps in the one or more steps. The one or more steps include a first step of comparison test. The advertisement detection system 104 compares the summed feature value for each selected frame in the set of frames per second of the media content with a first threshold value in the comparison test. The summed feature value is compared for progressive validation of the selected frame as the ad frame. In an embodiment of the present disclosure, the first threshold value for validation of the selected frame as ad frame is 0.02. The advertisement detection system 104 evaluates a ratio test between selected frames in the selected set of frames of the media content. The ratio test is evaluated for determination of a degree of difference in consecutively selected frames in the set of frames per second of the media content. The ratio test is evaluated by dividing the summed feature value for a second frame by the corresponding summed feature value for a first frame in the selected set of frames. The advertisement detection system 104 rewards a first score for the first ad of the one or more ads in the live streams of the media content. The first ad is streamed in real time in the live stream of the media content. The first ad is rewarded by assigning a first ad score for the positive validation of the evaluated ratio. The first score is assigned to the first ad score when the evaluated ratio is less than a second threshold value. In an embodiment of the present disclosure, the first ad may be assigned a first ad score of 1.5 when the ratio of the feature value for the second frame to that of the first frame is less than the second threshold value of 0.7. In another embodiment of the present disclosure, the first ad may be assigned a different first ad score when the ratio of the feature value for the second frame to that of the first frame is less than a different second threshold value.
[0043] For example, a first frame (A) may have a feature value of 0.11 and a second frame (B) may have a feature value of 0.07. The ratio of the feature value for the second frame (B) to that of the first frame (A) is 0.07/0.11 or 0.636. The ratio test evaluates to a positive evaluation as the evaluated ratio is less than 0.7. The first ad is assigned a first ad score of 1.5 for the first frame.
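The comparison test and ratio test can be sketched as follows, using the thresholds of 0.02 and 0.7 and the score of 1.5 from the embodiment, and the values 0.11 and 0.07 that appear in the ratio above. The disclosure does not state the direction of the comparison test, so treating "greater than the first threshold" as a pass is an assumption, as are the function names.

```python
FIRST_THRESHOLD = 0.02    # comparison test threshold from the embodiment
SECOND_THRESHOLD = 0.7    # ratio test threshold from the embodiment
FIRST_SCORE = 1.5         # reward for a positive ratio test

def comparison_test(feature_value: float) -> bool:
    # Assumption: a frame passes when its value exceeds the first threshold
    return feature_value > FIRST_THRESHOLD

def ratio_test(first_fv: float, second_fv: float) -> bool:
    """Positive when the second frame's value is well below the first's."""
    return second_fv / first_fv < SECOND_THRESHOLD

def first_score(first_fv: float, second_fv: float) -> float:
    """Score awarded to the ad for this pair of consecutively selected frames."""
    if comparison_test(first_fv) and ratio_test(first_fv, second_fv):
        return FIRST_SCORE
    return 0.0

print(round(0.07 / 0.11, 3))    # 0.636
print(first_score(0.11, 0.07))  # 1.5
```

A second frame value of 0.09 would give a ratio of about 0.818, which fails the ratio test and earns no score.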
[0044] The advertisement detection system 104 penalizes a second ad score for a second ad of the one or more ads of the media content by deducting a second score from the assigned second ad score for the second ad. The second ad is a past ad streamed before the first ad. As the first ad is progressively detected in the live stream with progressive additions of score, the second ad is progressively devalued with progressive deductions of the second score from the second ad score of the second ad. In an embodiment of the present disclosure, the second score as penalty to the second ad score of the second ad is 0.5. In another embodiment of the present disclosure, the second score as penalty to the second ad score of the second ad is more or less than 0.5.
[0045] The advertisement detection system 104 performs an equality test for consecutive frames in the selected set of frames of the media content. The equality test is based on an equality criterion between the feature value of the first frame and the feature value of the second frame in the selected set of frames. The advertisement detection system 104 rewards the first ad score for the first ad of the one or more ads in the live stream of the media content with addition of a third score to the first ad score. The first ad score is rewarded with the third score when the feature value of the first frame is equal to the feature value of the second frame in the selected set of frames. In an embodiment of the present disclosure, the third score as reward to the first ad score of the first ad is 0.5. In another embodiment of the present disclosure, the third score as reward to the first ad score of the first ad is more or less than 0.5.
[0046] The advertisement detection system 104 performs a vicinity test for ascertaining neighboring frames as the ad frames of the first ad. The vicinity test is needed to robustly detect video ads of different durations. The vicinity test makes the detection length agnostic. The advertisement detection system 104 rewards the first ad score for the first ad of the one or more ads in the live stream of the media content with addition of a fourth score to the first ad score. The fourth score is rewarded on a vicinity criterion, and the vicinity criterion is based on successive positive validation of the first score for the first frame and the second frame of the set of frames. In an embodiment of the present disclosure, the fourth score as reward to the first ad score of the first ad is 0.5. In another embodiment of the present disclosure, the fourth score as reward to the first ad score of the first ad is more or less than 0.5. The advertisement detection system 104 calculates a progressive ad score for the first ad and the second ad. The progressive score is calculated on the basis of at least one of progressive addition and subtraction of the second score, the third score and the fourth score to the first ad score. The first ad is progressively scored for each positively validated frame to obtain the progressive ad score. The progressive score is calculated for the ad length independent detection of the first ad. For example, the first ad may be broadcasted in a first live stream of a first broadcasted channel for 10 seconds and in a second live stream of a second broadcasted channel for 15 seconds. The 10 second and 15 second versions of the first ad differ by a few frames. The advertisement detection system 104 uses the vicinity test to evaluate whether the selectively processed frames in the vicinity of the positively tested ad frames are related to the same first ad. The advertisement detection system 104 facilitates the ad length agnostic detection of the first ad across the one or more broadcast channels.
[0047] For example, let us suppose a first frame (F1) of the first ad is positively evaluated. The first ad may have an ad score of 5.5. The subsequent frames (F2 and F3) are evaluated on the basis of the vicinity criterion. The vicinity criterion is positively validated when consecutive frames in the neighborhood of the first frame (F1) are positively validated. The ad score is increased by 0.5 for each positive validation. The advertisement detection system 104 calculates the progressive score for the first ad as the sum of 5.5 and 0.5, that is, 6. The progressive score may be analyzed for a specific duration of the live stream of the media content. If the evaluated score is greater than or equal to a specified threshold score of, say, 6, then the ad is said to be positively detected.
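The progressive scoring described in the preceding paragraphs can be sketched with the embodiment values: a first score of 1.5, a penalty of 0.5, equality and vicinity rewards of 0.5 each, and a threshold of 6. The function name, the dictionary of running scores, the clamping of penalized scores at zero and the use of "greater than or equal to" for detection are assumptions made for illustration.

```python
RATIO_REWARD = 1.5          # first score
PENALTY = 0.5               # second score, deducted from past ads
EQUALITY_REWARD = 0.5       # third score
VICINITY_REWARD = 0.5       # fourth score
DETECTION_THRESHOLD = 6.0   # threshold score from the embodiment

def update_scores(scores, ad_id, ratio_ok=False, equal=False, vicinity=False):
    """Reward the currently matched ad and penalize every other tracked ad.

    Returns True when the matched ad reaches the detection threshold.
    """
    for other in scores:
        if other != ad_id:
            # Assumption: a penalized score never drops below zero
            scores[other] = max(0.0, scores[other] - PENALTY)
    gain = (RATIO_REWARD * ratio_ok
            + EQUALITY_REWARD * equal
            + VICINITY_REWARD * vicinity)
    scores[ad_id] = scores.get(ad_id, 0.0) + gain
    return scores[ad_id] >= DETECTION_THRESHOLD

# An ad score of 5.5 plus one positive vicinity validation reaches 6
scores = {"first_ad": 5.5, "second_ad": 2.0}
detected = update_scores(scores, "first_ad", vicinity=True)
print(scores, detected)  # {'first_ad': 6.0, 'second_ad': 1.5} True
```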
[0048] The advertisement detection system 104 detects the first ad in the live streams of the media content when the progressive score for the first ad is above a threshold score. In an embodiment of the present disclosure, the threshold score for the positive detection of the first ad is 6. In another embodiment of the present disclosure, the threshold score for the positive detection of the first ad in the live streams of the media content is more than 6. In yet another embodiment of the present disclosure, the threshold score for the positive detection of the first ad is greater than 1.5. In yet another embodiment of the present disclosure, the threshold score for the positive detection of the first ad in the live streams of the media content is based on any suitable range.
[0049] The advertisement detection system 104 detects the first ad in the live streams of the media content in a detection period (DP) (as shown in FIG. 1D). The detection period (DP) is evaluated as equivalent of positive frame detection for at least two consecutively selected frames per second. For example, let us suppose that for selection of 3 frames from 25 frames per second, the advertisement detection system 104 positively evaluates the first two or the last two frames as the ad frames. Each of the 3 selected frames per second is received 0.33 seconds after the previous one, so the time for reception of two selected frames is 0.33 x 2, i.e. 0.66 seconds. Therefore, the detection period (DP) for detection of each ad in the live streams of the media content is equivalent to reception of at least two frames of each ad. The detection period lies in a range of 0.6 seconds to 1 second.
[0050] Referring to FIG. 1D, the pre-defined set of frames are illustratively labeled with an index from 1 to 25 for a second of the media content from the broadcasted channel. The advertisement detection system 104 selects 3 frames, one at every 8th position, in each second of the media content. The three selected frames are labeled 8, 16 and 24 for the 8th frame, 16th frame and 24th frame. The detection period (DP) for positive detection of the first ad in the live streams of the media content is equivalent to the selection time for at least two frames of the media content. The delay between consecutively selected frames is illustrated as 0.33 seconds. The detection period (DP) is at least twice the delay between consecutively selected frames. The detection period (DP) is illustrated as 0.66 seconds.
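The frame selection and detection period in this illustration can be reproduced with a short calculation. The constant names are illustrative, and the delay is taken as one third of a second rounded to two decimals, matching the 0.33 seconds used in the text.

```python
FRAMES_PER_SECOND = 25
FRAME_GAP = 8

# Frames are indexed 1 to 25; every 8th position is selected
selected = [i for i in range(1, FRAMES_PER_SECOND + 1) if i % FRAME_GAP == 0]
delay = round(1 / len(selected), 2)      # seconds between selected frames
detection_period = round(2 * delay, 2)   # two consecutively selected frames

print(selected)           # [8, 16, 24]
print(delay)              # 0.33
print(detection_period)   # 0.66
```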
[0051] The advertisement detection system 104 may ascertain a length of the first ad when the first ad score for the first ad gets penalized for a first time. The advertisement detection system 104 stores the detected ad information and associated metadata in the ad detection database 104d. The ad detection database 104d is updated with positive detection of the first ad in the live streams of the media content in real time.
[0052] It may be noted that in FIG. 1A, FIG. 1B, FIG. 1C and FIG. 1D, the advertisement detection system 104 detects the first ad in the media content of the one or more broadcasted channels; however, those skilled in the art would appreciate that the advertisement detection system 104 detects different ads in the media content of the one or more broadcasted channels in real time.
[0053] FIG. 2 illustrates a block diagram 200 of the advertisement detection system 104, in accordance with various embodiments of the present disclosure. The block diagram 200 describes the advertisement detection system 104. The advertisement detection system 104 performs hardware, channel, language and ad length agnostic real time detection of multi-lingual televised advertisements across the live streams of media content of one or more broadcasted channels. The live stream of the media content for each broadcasted channel is received and processed through implementation of computer vision, feature extraction and statistical learning techniques. The supervised detection of televised video advertisements is facilitated by real time collection of features and development of statistical models. The media content in the live stream of each broadcasted channel is selectively processed frame by frame and robust statistical learning models are developed over the selectively processed frames of the media content. The robust statistical learning models are parametrically analyzed and prominent and prospective features are classified and compared for an improvement in time and error rate of detection of the televised video ads.
[0054] The block diagram 200 of the advertisement detection system 104 includes a reception module 202, a selection module 204, an extraction module 206, a deriving module 208 and an assignment module 210. Moreover, the block diagram 200 includes a creation module 212, a normalization module 214, an obtaining module 216, an access module 218, a comparison module 220, a scoring module 222 and a detection module 224.
[0055] The reception module 202 receives the live stream of the media content from the broadcasted channel. The live stream of the media content includes a live ad stream and a non-ad stream. The live ad stream includes the one or more ads (as discussed in detailed description of FIG. 1A). Further, the selection module 204 selects the set of frames per second from the pre-defined set of frames in each second of the live stream of the media content. Each selected frame in the selected set of frames differs by a frame gap of at least 8 frames. In an example, the set of frames is 3 per second from the 25 frames per second of the live stream of the media content (as discussed in the detailed description of FIG. 1A).
[0056] The extraction module 206 extracts the pre-defined number of keypoints from each selected frame. Each extracted keypoint is characterized by a spatial position in the selected frame. The pre-defined number of keypoints is extracted based on validation of at least one of a scale invariance criterion, a rotation invariance criterion and a Harris score criterion. In an embodiment of the present disclosure, the pre-defined number of keypoints is 700 (as discussed in detailed description of FIG. 1A). The deriving module 208 derives the pre-defined number of binary descriptors from the pre-defined number of extracted keypoints. Each binary descriptor is characterized by a binary string with a length of 256 bits (as discussed in detailed description of FIG. 1A). The assignment module 210 assigns the weight value to each binary descriptor of the pre-defined number of binary descriptors. The weight value for each binary descriptor is L1 normalized. The normalized weight values for the binary descriptors are characterized by an arithmetic sum of 1 (as discussed in detailed description of FIG. 1A).
[0057] The creation module 212 creates the special pyramid of the pre-defined number of derived descriptors to obtain the pre-defined number of spatially identifiable binary descriptors. The special pyramid of the pre-defined number of binary descriptors includes one or more copies of derived binary descriptors. Each binary descriptor in the special pyramid is tagged with an identifier for spatially segregating and validating the position of each binary descriptor in the selected frame (as discussed in the detailed description of FIG. 1A). The normalization module 214 normalizes each weight value for each binary descriptor in the special pyramid of the pre-defined number of spatially identifiable binary descriptors. Each weight value in the special pyramid is L1 normalized such that the arithmetic sum of the weight values is 1 (as discussed in the detailed description of FIG. 1A).
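One way to picture the tagging performed by the creation module 212 and the normalization module 214 is sketched below. The disclosure does not specify the pyramid layout or the identifier format, so the grid sizes (a 1x1 and a 2x2 grid), the `(level, row, col)` tag and the uniform weights are all assumptions made for illustration.

```python
def special_pyramid(keypoints, width, height, levels=(1, 2)):
    """Copy each descriptor once per pyramid level and tag it with its grid cell.

    keypoints: list of (x, y, descriptor) tuples for one frame.
    levels: grid size per level; (1, 2) means a 1x1 and a 2x2 grid (assumed).
    Returns a list of ((level, row, col), descriptor, weight) entries whose
    weights are L1 normalized, i.e. they sum to 1.
    """
    tagged = []
    for level, grid in enumerate(levels):
        for x, y, descriptor in keypoints:
            col = min(int(x * grid / width), grid - 1)
            row = min(int(y * grid / height), grid - 1)
            tagged.append(((level, row, col), descriptor))
    # Assumption: uniform weights; the disclosure derives weights from tf-idf
    weight = 1 / len(tagged) if tagged else 0.0
    return [(tag, descriptor, weight) for tag, descriptor in tagged]

# Two hypothetical keypoints in a 320x240 frame
entries = special_pyramid([(10, 10, 0b1010), (300, 200, 0b0101)],
                          width=320, height=240)
```

Two descriptors with identical bits but different tags remain distinguishable, which is what allows spatial position to be validated during matching.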
[0058] The obtaining module 216 obtains the first vocabulary of binary descriptors of each selected frame. The first vocabulary of binary descriptors corresponds to the pre-defined number of spatially identifiable binary descriptors (as discussed in the detailed description of FIG. 1A). The access module 218 accesses the second vocabulary of binary descriptors from the comprehensive set of televised advertisements. The second vocabulary of binary descriptors corresponds to the curated comprehensive repository of the ad frames. The second vocabulary of the binary descriptors is characterized by an n-ary tree data structure with leaf nodes. The second vocabulary includes the set of tree structured clusters of binary descriptors as the leaf nodes, associated with weight values. The second vocabulary of the binary descriptors is created from extraction of the pre-defined number of keypoints and the binary descriptors from each frame of the repository of ad frames. The special pyramid of the descriptors is created for each ad frame to obtain the pre-defined number of spatially identifiable binary descriptors. The binary descriptors are clustered into the first set of clusters based on the evaluation of minimum hamming distance between each binary descriptor. The binary descriptors in each cluster of the first set of clusters are iteratively clustered. Each subsequent cluster is clustered for the pre-determined iteration level to obtain the set of tree structured clusters. In an embodiment of the present disclosure, the pre-determined iteration level is 6. The weight value, based on the term frequency and the inverse document frequency, is assigned to each clustered binary descriptor and each weight value is normalized using L1 normalization such that the evaluated arithmetic sum of weight values is 1 (as discussed in the detailed description of FIG. 1A).
[0059] The comparison module 220 compares each spatially identifiable binary
descriptor from the first vocabulary of each selected frame with the plurality of spatially identifiable binary descriptors in the at least one or more clusters. The one or more clusters are searched from the set of tree structured clusters. The set of tree structured clusters corresponds to the second vocabulary of the binary descriptors of the repository of the ad frames. The binary descriptors in the first vocabulary are compared with the second vocabulary (as described in the detailed description of FIG. 1A).
[0060] The scoring module 222 progressively scores each selected frame from
the live stream of media content for the validation of the selected frame as the ad frame of the first ad. The first ad is progressively scored for each positively validated frame to obtain the progressive ad score (as described in the detailed description of FIG. 1A). The detection module 224 detects the first ad in the live stream of the media content when the progressive score for the first ad is above a threshold score. In an example, the threshold score for positive detection of the first ad in the live stream of the media content is 6 (as described in the detailed description of FIG. 1A).

[0061] FIG. 3 illustrates a block diagram of a computing device 300, in
accordance with various embodiments of the present disclosure. The computing device 300 includes a bus 302 that directly or indirectly couples the following devices: memory 304, one or more processors 306, one or more presentation components 308, one or more input/output (I/O) ports 310, one or more input/output components 312, and an illustrative power supply 314. The bus 302 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 3 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Also, processors have memory. The inventors recognize that such is the nature of the art, and reiterate that the diagram of FIG. 3 is merely illustrative of an exemplary computing device 300 that can be used in connection with one or more embodiments of the present invention. Distinction is not made between such categories as "workstation," "server," "laptop," "hand-held device," etc., as all are contemplated within the scope of FIG. 3 and reference to "computing device."
[0062] The computing device 300 typically includes a variety of computer-
readable media. The computer-readable media can be any available media that can be accessed by the computing device 300 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, the computer-readable media may comprise computer storage media and communication media. The computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. The computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash
memory or other memory technology, CD-ROM, digital versatile disks (DVD) or
other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk
storage or other magnetic storage devices, or any other medium which can be
used to store the desired information and which can be accessed by the computing
device 300. The communication media typically embodies computer-readable
instructions, data structures, program modules or other data in a modulated data
signal such as a carrier wave or other transport mechanism and includes any
information delivery media. The term "modulated data signal" means a signal
that has one or more of its characteristics set or changed in such a manner as to
encode information in the signal. By way of example, and not limitation,
communication media includes wired media such as a wired network or direct-
wired connection, and wireless media such as acoustic, RF, infrared and other
wireless media. Combinations of any of the above should also be included within
the scope of computer-readable media.
[0063] Memory 304 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory 304 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. The computing device 300 includes one or more processors that read data from various entities such as memory 304 or I/O components 312. The one or more presentation components 308 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc. The one or more I/O ports 310 allow the computing device 300 to be logically coupled to other devices including the one or more I/O components 312, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.

[0064] The present disclosure has numerous advantages over the prior art. The present disclosure provides ad length independent, hardware or platform independent, language independent and channel independent detection of advertisements. The present disclosure solves the problem of a long detection time of at least 7 seconds for a 15 second to 30 second ad. The present disclosure provides a statistical and parametric method for robust detection of televised ads across many broadcast channels in a very short duration of 0.6 seconds to 1 second against the earlier detection time of at least 7 seconds. The detection is progressive, with a very low error rate in false detection, and is computationally lightweight. The video ads are detected within reception of the first two selected frames of the live stream. The detection is agnostic to platform, broadcast channel, receiving hardware, ad length and noise. The detection is supported by highly efficient database management, supervised machine learning and advanced custom computer vision techniques. The binary descriptors are selected after efficient consideration of feature redundancy, rotational invariance, translational invariance and Harris score. The use of validation criteria on the basis of progressively awarded and penalized ad scores with each positive and negative validation provides a robust means to ascertain whether the detected media is an ad video.

What is claimed is:
1. A computer system comprising: one or more processors; and
a memory coupled to the one or more processors, the memory for storing instructions which, when executed by the one or more processors, cause the one or more processors to perform a method for hardware, channel, language and ad length agnostic real time detection of multi-lingual televised advertisements across live streams of media content of one or more broadcasted channels, the method comprising:
selecting, at an advertisement detection system, a set of frames per second from a pre-defined set of frames in each second of each live stream of the media content in each broadcasted channel of the one or more broadcasted channels;
extracting, at the advertisement detection system, a pre-defined number of keypoints from each selected frame of the media content, the pre-defined number of keypoints being extracted based on validation of at least one of a scale invariance criterion, a rotation invariance criterion and a Harris score criterion;
deriving, at the advertisement detection system, a pre-defined number of binary descriptors from corresponding pre-defined number of extracted keypoints, each binary descriptor being characterized by a binary string with a length of 256 bits;
creating, at the advertisement detection system, a special pyramid of the pre-defined number of binary descriptors to obtain a pre-defined number of spatially identifiable binary descriptors, the special pyramid being created for obtaining a first vocabulary of binary descriptors corresponding to the pre-defined number of spatially identifiable binary descriptors of each selected frame;

accessing, at the advertisement detection system, a second vocabulary of binary descriptors corresponding to a curated comprehensive repository of ad frames from a comprehensive set of televised advertisements, wherein the second vocabulary comprises a set of tree structured clusters of binary descriptors;
comparing, at the advertisement detection system, each spatially identifiable binary descriptor from the pre-defined number of binary descriptors corresponding to the first vocabulary of each selected frame with a plurality of spatially identifiable binary descriptors in at least one or more clusters of the set of tree structured clusters corresponding to the second vocabulary of the binary descriptors of the repository of the ad frames;
progressively scoring, at the advertisement detection system, each selected frame from each live stream of the media content for validation of the selected frame as the ad frame of a first ad, wherein the first ad being progressively scored for each positively validated frame to obtain a progressive ad score and wherein the progressive score being calculated for the ad length independent detection of the first ad; and
detecting, at the advertisement detection system, the first ad in each live stream of the media content of the one or more broadcasted channels when the progressive score for the first ad being above a threshold score, wherein the first ad being broadcasted by the one or more broadcasted channels in one or more languages and wherein the first ad being detected independent of the one or more languages of the first ad broadcasting in the one or more broadcasted channels.
2. The computer system as recited in claim 1, further comprising assigning, at the advertisement detection system, a weight value to each binary descriptor of the pre-defined number of binary descriptors, wherein the weight value corresponding to each binary descriptor being L1 normalized and wherein each normalized weight value corresponding to each binary descriptor being characterized by an arithmetic sum of 1.
3. The computer system as recited in claim 1, wherein the first ad in each live stream of the media content being detected in a detection period, wherein the detection period being in a range of 0.6 second to 1 second and wherein the detection period being equivalent to receiving of two frames of the first ad.
4. The computer system as recited in claim 1, wherein the binary descriptors in the first vocabulary being compared with the binary descriptors in the second vocabulary by:
searching for a cluster in the set of tree structured clusters comprising of the binary descriptors with minimum hamming distances with the corresponding binary descriptors of the selected frame in corresponding live stream of the media content;
matching each binary descriptor from the first vocabulary of the binary descriptors for the selected frame of the media content with the binary descriptor in the searched cluster for the minimum hamming distance;
calculating Bhattacharyya distance between weight values of each matched pair of the binary descriptors; and
summing calculated Bhattacharyya distance for each descriptor in the first vocabulary of the binary descriptors corresponding to the selected frame of the media content to obtain a summed feature value, wherein the summed feature values being in a normalized range of 0 to 1.
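The comparison steps of claim 4 can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: descriptors are modelled as Python integers, the tree search is replaced by a flat search over one list of vocabulary descriptors, and the per-pair Bhattacharyya term is assumed to be sqrt(w_query * w_vocab), since the claims do not give the formula. The function names `hamming`, `l1_normalize`, `nearest` and `summed_feature_value` are hypothetical.

```python
import math

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def l1_normalize(weights):
    """Scale weights so that their arithmetic sum is 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]

def nearest(descriptor: int, candidates) -> int:
    """Index of the candidate with minimum Hamming distance."""
    return min(range(len(candidates)),
               key=lambda i: hamming(descriptor, candidates[i]))

def summed_feature_value(query_desc, query_w, vocab_desc, vocab_w) -> float:
    """Sum an assumed Bhattacharyya-style term over all matched pairs."""
    total = 0.0
    for d, w in zip(query_desc, query_w):
        j = nearest(d, vocab_desc)           # match by minimum Hamming distance
        total += math.sqrt(w * vocab_w[j])   # assumed per-pair term
    return total

# Toy 4-bit descriptors; the claims use 256-bit strings.
query = [0b1010, 0b0101]
vocab = [0b1011, 0b0100, 0b1111]
value = summed_feature_value(query, l1_normalize([1, 1]),
                             vocab, l1_normalize([2, 2, 4]))
```

Under this assumed term, the sum stays within 0 to 1 when each query descriptor matches a different vocabulary descriptor; many-to-one matches would need an extra normalization step.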
5. The computer system as recited in claim 1, further comprising receiving, at the advertisement detection system, the live stream of the media content of the broadcasted channel, the live stream of the media content comprises of a live ad stream and a non-ad stream and the live ad stream comprises one or more ads.
6. The computer system as recited in claim 1, wherein each selected frame in the selected set of frames differ by a frame gap of at least 8 frames, wherein the set of frames selected from the pre-defined set of frames in the live stream of media content per second being 3 and the pre-defined set of frames per second in the live stream being 25 and wherein the pre-defined number of keypoints being 700, the pre-defined number of descriptors being 700 when the pre-defined number of keypoints being 700 and the pre-defined number of spatially identifiable binary descriptors being 2100.
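The frame selection numbers in claim 6 (3 frames out of 25 per second, at least 8 frames apart) can be checked with a short sketch. The offsets 0, 8 and 16 within each second are an assumption; the claim only fixes the count and the minimum gap. The function names are hypothetical.

```python
def select_frame_offsets(fps: int = 25, per_second: int = 3, gap: int = 8):
    """Offsets within one second, spaced `gap` frames apart (assumed layout)."""
    offsets = [i * gap for i in range(per_second)]
    if offsets[-1] >= fps:
        raise ValueError("gap too large for the requested frames per second")
    return offsets

def selected_frames(total_frames: int, fps: int = 25,
                    per_second: int = 3, gap: int = 8):
    """Absolute indices of the selected frames in a stream of total_frames."""
    offsets = select_frame_offsets(fps, per_second, gap)
    return [start + off
            for start in range(0, total_frames, fps)
            for off in offsets
            if start + off < total_frames]

offsets = select_frame_offsets()   # [0, 8, 16]
two_seconds = selected_frames(50)  # [0, 8, 16, 25, 33, 41]
```

With this layout the gap across a second boundary (frame 16 to frame 25) is 9 frames, so the minimum gap of 8 still holds. The count of 2100 spatially identifiable descriptors equals 3 x 700, which suggests three pyramid cells or levels per descriptor, although the claims do not state this.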
7. The computer system as recited in claim 1, wherein the second vocabulary of the binary descriptors being characterized by an n-ary tree data structure comprising of leaf nodes, the second vocabulary of the binary descriptors being created by:
extracting the pre-defined number of keypoints and corresponding binary descriptors from each frame of the repository of the ad frames;
creating the special pyramid of the descriptors for each ad frame to obtain the pre-defined number of spatially identifiable binary descriptors;
clustering the binary descriptors into a first set of clusters, the binary descriptors being clustered into the first set of clusters based on an evaluation of a minimum hamming distance between each binary descriptor;
iteratively clustering the binary descriptors in each cluster of the first set of clusters and each subsequent cluster for a pre-determined iteration level to obtain a set of tree structured clusters; and
assigning a weight value to each clustered binary descriptor based on a term frequency and an inverse document frequency and normalizing the weight values using L1 normalization, the weight value being normalized for an evaluated arithmetic sum of weight values as 1.
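The vocabulary construction of claim 7 can be sketched as below. This is a hedged illustration only: it swaps in a simple k-medoids loop under Hamming distance because the claims do not name a clustering algorithm, it uses a dictionary-based tree, and it treats each leaf as a "word" and each ad frame as a "document" for the tf-idf step. All function names and the seeding rule are assumptions.

```python
from collections import Counter
from math import log

def hamming(a, b):
    return bin(a ^ b).count("1")

def medoid(group):
    """Member with the smallest total Hamming distance to the group."""
    return min(group, key=lambda c: sum(hamming(c, d) for d in group))

def assign(descriptors, centroids):
    """Put each descriptor into the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for d in descriptors:
        i = min(range(len(centroids)), key=lambda c: hamming(d, centroids[c]))
        groups[i].append(d)
    return groups

def cluster(descriptors, k, rounds=10):
    """k-medoids under Hamming distance, seeded deterministically (assumed)."""
    centroids = sorted(set(descriptors))[:k]
    for _ in range(rounds):
        updated = [medoid(g) for g in assign(descriptors, centroids) if g]
        if updated == centroids:
            break
        centroids = updated
    return [g for g in assign(descriptors, centroids) if g]

def build_tree(descriptors, k, depth):
    """Recursively cluster each cluster for `depth` iteration levels."""
    node = {"centroid": medoid(descriptors),
            "members": list(descriptors), "children": []}
    if depth > 0 and len(set(descriptors)) > 1:
        node["children"] = [build_tree(g, k, depth - 1)
                            for g in cluster(descriptors, k)]
    return node

def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

def tfidf_l1(frames_words):
    """Per-frame tf-idf weights for word ids, L1 normalized to sum to 1."""
    n = len(frames_words)
    df = Counter(w for words in frames_words for w in set(words))
    result = []
    for words in frames_words:
        tf = Counter(words)
        raw = {w: (c / len(words)) * log(n / df[w]) for w, c in tf.items()}
        total = sum(raw.values())
        result.append({w: (v / total if total else 0.0) for w, v in raw.items()})
    return result

# Toy 4-bit descriptors; claim 8 uses 10 clusters per level and 6 levels.
data = [0b0000, 0b0001, 0b0011, 0b1111, 0b1110, 0b1100]
tree = build_tree(data, k=2, depth=2)
weights = tfidf_l1([[0, 0, 1], [1, 2]])
```

A branching factor of k with a given depth yields at most k ** depth leaves, which is why 10 clusters per level and 6 levels give up to 10^6 leaf clusters.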
8. The computer system as recited in claim 7, wherein each cluster in the first set of clusters and the set of tree structured clusters being associated with a centroid, wherein the first set of clusters comprises 10 clusters and corresponding 10 centroids and the second set of tree structured clusters comprises 10^6 clusters and 10^6 centroids and wherein the pre-determined iteration level for clustering binary descriptors corresponding to each frame of the repository of ad frames being 6.
9. The computer system as recited in claim 1, wherein each selected frame for the first ad in the corresponding live stream of the media content being progressively scored in at least one or more steps, the one or more steps comprises:
comparing the summed feature value for each selected frame with a first threshold value for validating the selected frame as the ad frame of the first ad;
evaluating a ratio test for determining degree of difference between each selected frame in the selected set of frames in the live stream of the media content, wherein the ratio test being evaluated by dividing the summed feature value for a second frame by the corresponding summed feature value for a first frame in the selected set of frames;
rewarding the first ad of one or more ads in the live stream of the media content by assigning a first ad score for a positive validation of the evaluated ratio, wherein the first ad score being assigned to the first ad when the ratio being less than a second threshold value;
penalizing a second ad of the one or more ads in the corresponding live stream of the media content by deducting a second score from the assigned first ad score for the second ad, wherein the second ad being a past ad streamed before the first ad and wherein the first ad being streamed in real time in the corresponding live stream of the media content;
rewarding the first ad of the one or more ads in the corresponding live stream of the media content by adding a third score to the first ad score of the first ad, wherein the third score being rewarded based on an equality criterion and wherein the equality criterion being based on equality of the feature value of the first frame and the feature value of the second frame in the selected set of frames;

rewarding the first ad of the one or more ads in the corresponding live stream of the media content by adding a fourth score to the first ad score of the first ad, wherein the fourth score being rewarded based on a vicinity criterion and wherein the vicinity criterion being based on successive positive validation of the selected set of frames; and
calculating the progressive ad score for the first ad and the second ad based on at least one of progressive addition and subtraction of the second score, the third score and the fourth score from the first ad score.
10. The computer system as recited in claim 9, wherein the first threshold value for validating comparison of the summed feature value being 0.02 and the second threshold value for validation of the evaluated ratio being 0.7 and wherein the first ad score for positive validation by the ratio test being 1.5, the second score for penalizing the second ad being 0.5, the third score for the equality criterion being 0.5 and the fourth score for the vicinity criterion being 0.5.
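The progressive scoring of claims 9 and 10 can be sketched as follows, using the numeric values recited in claim 10. Several points are assumptions because the claims leave them open: a frame is treated as validated when its summed feature value is at least the first threshold, the vicinity criterion is read as "the previous frame pair was validated for the same ad", and the final detection threshold (`detect_threshold`) is a free parameter. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional

FIRST_THRESHOLD = 0.02   # summed feature value threshold (claim 10)
RATIO_THRESHOLD = 0.7    # ratio test threshold (claim 10)
FIRST_AD_SCORE = 1.5     # reward for a positive ratio test
SECOND_SCORE = 0.5       # penalty applied to past ads
THIRD_SCORE = 0.5        # reward for the equality criterion
FOURTH_SCORE = 0.5       # reward for the vicinity criterion

@dataclass
class ProgressiveScorer:
    detect_threshold: float                      # assumed, not in the claims
    scores: Dict[str, float] = field(default_factory=dict)
    last_validated: Optional[str] = None

    def update(self, ad_id: str, first_value: float, second_value: float) -> bool:
        """Score one pair of selected frames; return True once the ad is detected."""
        # Assumed direction: values at or above the threshold validate the frame.
        if first_value < FIRST_THRESHOLD or second_value < FIRST_THRESHOLD:
            self.last_validated = None
            return False
        score = self.scores.get(ad_id, 0.0)
        if second_value / first_value < RATIO_THRESHOLD:
            score += FIRST_AD_SCORE
            for other in self.scores:            # penalize previously seen ads
                if other != ad_id:
                    self.scores[other] -= SECOND_SCORE
        if math.isclose(first_value, second_value):
            score += THIRD_SCORE                 # equality criterion
        if self.last_validated == ad_id:
            score += FOURTH_SCORE                # vicinity criterion
        self.scores[ad_id] = score
        self.last_validated = ad_id
        return score > self.detect_threshold

scorer = ProgressiveScorer(detect_threshold=2.0)
first = scorer.update("ad_A", 0.5, 0.3)    # ratio 0.6 earns 1.5, not yet detected
second = scorer.update("ad_A", 0.3, 0.3)   # equality and vicinity add 0.5 each
third = scorer.update("ad_B", 0.5, 0.2)    # rewards ad_B and penalizes ad_A
```

Because the score accumulates per validated frame instead of per second of matched video, the decision does not depend on how long the ad runs, which is the ad length independence the claims describe.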

Documents

Application Documents

#	Name	Date
1	Power of Attorney [07-06-2017(online)].pdf	2017-06-07
2	Form 5 [07-06-2017(online)].pdf	2017-06-07
3	Form 3 [07-06-2017(online)].pdf	2017-06-07
4	Form 20 [07-06-2017(online)].jpg	2017-06-07
5	Form 1 [07-06-2017(online)].pdf	2017-06-07
6	Drawing [07-06-2017(online)].pdf	2017-06-07
7	Description(Complete) [07-06-2017(online)].pdf_30.pdf	2017-06-07
8	Description(Complete) [07-06-2017(online)].pdf	2017-06-07
9	abstract.jpg	2017-07-13
10	201711019935-Proof of Right (MANDATORY) [13-07-2017(online)].pdf	2017-07-13
11	201711019935-FORM-26 [13-07-2017(online)].pdf	2017-07-13
12	201711019935-Power of Attorney-140717.pdf	2017-07-24
13	201711019935-OTHERS-140717.pdf	2017-07-24
14	201711019935-Correspondence-140717.pdf	2017-07-24
15	201711019935-FORM 18 [26-05-2021(online)].pdf	2021-05-26
16	201711019935-FER.pdf	2022-07-12
17	201711019935-OTHERS [11-01-2023(online)].pdf	2023-01-11
18	201711019935-FORM 3 [11-01-2023(online)].pdf	2023-01-11
19	201711019935-FER_SER_REPLY [11-01-2023(online)].pdf	2023-01-11
20	201711019935-ENDORSEMENT BY INVENTORS [11-01-2023(online)].pdf	2023-01-11
21	201711019935-DRAWING [11-01-2023(online)].pdf	2023-01-11
22	201711019935-CLAIMS [11-01-2023(online)].pdf	2023-01-11
23	201711019935-ABSTRACT [11-01-2023(online)].pdf	2023-01-11
24	201711019935-US(14)-HearingNotice-(HearingDate-03-09-2025).pdf	2025-08-08
25	201711019935-FORM-26 [08-08-2025(online)].pdf	2025-08-08
26	201711019935-Correspondence to notify the Controller [29-08-2025(online)].pdf	2025-08-29
27	201711019935-Written submissions and relevant documents [18-09-2025(online)].pdf	2025-09-18
28	201711019935-PETITION UNDER RULE 137 [18-09-2025(online)].pdf	2025-09-18

Search Strategy

1	SEARCHSTRATEGY-E_12-07-2022.pdf