
A Bandwidth Efficient Method Of Dynamically Updating Region Of Interest Coordinates Of An Image During Video Streaming

Abstract: An apparatus and method of the present invention facilitates uninterrupted video chat and video surveillance applications wherein the region of interest feature of Motion JPEG 2000 is dynamically updated by allocating more bits to the facial characteristics of an image as compared to its background using an enhanced face detection and lightweight face tracking approach. Improved compression efficiency and subsequent effective bandwidth reduction for video transmission are achieved by embedding face movement and motion intelligence on Motion JPEG 2000 even under different lighting conditions and in noisy environments.


Patent Information

Application #
Filing Date
31 August 2010
Publication Number
23/2013
Publication Type
INA
Invention Field
COMMUNICATION
Status
Parent Application
Patent Number
Legal Status
Grant Date
2018-05-25
Renewal Date

Applicants

TATA CONSULTANCY SERVICES LIMITED
NIRMAL BUILDING,9TH FLOOR, NARIMAN POINT, MUMBAI-400021, MAHARASHTRA, INDIA.

Inventors

1. MANOJ CHATHANKULANGARA RAJAN
TATA CONSULTANCY SERVICES, DHARA BUILDING, SALARPURIA GR TECH PARK, NO.69/3 & 69/4, MAHADEVAPURA, BANGALORE-560066, KARNATAKA, INDIA.
2. SREEJAYA VISWANATHAN
TATA CONSULTANCY SERVICES, DHARA BUILDING, SALARPURIA GR TECH PARK, NO.69/3 & 69/4, MAHADEVAPURA, BANGALORE-560066, KARNATAKA, INDIA.
3. VAISHNAVI GOLLAPINNI
TATA CONSULTANCY SERVICES, DHARA BUILDING, SALARPURIA GR TECH PARK, NO.69/3 & 69/4, MAHADEVAPURA, BANGALORE-560066, KARNATAKA, INDIA.

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention: A METHOD AND APPARATUS FOR BANDWIDTH EFFICIENT VIDEO STREAMING
Applicant
TATA Consultancy Services, a company incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to
be performed.

Field of the Invention
The present invention is directed to a bandwidth efficient apparatus and method for uninterrupted video streaming and video surveillance that employs an enhanced face detection and lightweight face tracking approach to dynamically update the region of interest coordinates of an image for subsequent encoding and transmission.
Background of the Invention:
Face-to-face video communication is a potentially important component of real time communication systems. Inexpensive cameras connected to devices ranging from desktop computers to cell phones enable video conferencing in a variety of modes such as one-to-one and multi-party conferences. Previous face video compression techniques are not able to operate efficiently under low bandwidth conditions because the entirety of the captured video frame is compressed and transmitted across the client server network. Thus, reducing the bandwidth without compromising on the quality of the image poses a major challenge.
Different approaches have been proposed to reduce the bandwidth requirements for streaming video, such as MPEG-4, H.26x video coding, Motion JPEG etc. However, none of the available techniques has been able to utilize the bandwidth effectively enough for uninterrupted video chat applications. Further, these techniques do not provide natural looking faces, i.e. the required quality of the captured region of interest is not achieved. Similarly, H.26x based coding techniques are fully automatic and robust, but are not efficient for low bit-rate face video since their generality does not take advantage of any face models. Therefore, what is needed is a system and method that can provide face-to-face video conferencing at very low bit rates with natural looking results. Additionally, this system and method should be able to provide face-to-face video conferencing in real time. The same is applicable for video surveillance applications in which the face of the person being monitored is very important, such as home security, retail and bank security etc.
It has further been found that the MJPEG 2000 video codec, though an intra frame coding format, offers a wide range of advantages over other intra coding formats like MJPEG in terms of compression efficiency, and over inter frame coding methods like H.264, MPEG-4 etc. in terms of video quality, error resilience and scalability; however, it lacks compression efficiency precisely because it is intra frame based, and hence is not a preferred standard for applications where bandwidth utilization is more important, like video chat and video surveillance over low bandwidth networks.
US Patent 7634109 provides a method of digital image processing by identifying a group of pixels corresponding to an image of a face within the digital image. The image is then subjected to further enhancements in its orientation, color correction, fill flash simulation etc., and simultaneously the location of the face in an image is tracked. However, no reference to the dynamic updating of the region of interest coordinates of the portion of the target image is made, and the invention merely posits a digital image processing technique using face detection information.
US Patent 7512283 projects a method of transmitting selected regions of interest of digital video data at selected resolutions. The method captures digital video, compresses it using JPEG 2000 frames and simultaneously extracts the compressed high resolution JPEG 2000 frames at a lower resolution. The invention however does not refer to the use of face detection and face tracking analytical approach for enhancing the compression efficiency of video codec like MJPEG 2000 while retaining and exploring the quintessential features of error resilience and quality image of the said video codec.
Thus, what is needed is a system and method to add effective compression efficiency feature to the sophisticated video standard and coding system of MJPEG 2000 in order to explore the other benefits offered by the system. Further improved image compression and networking capability without necessitating large bandwidth utilization is desirable to yield the best possible image quality.
Object of the Invention:
The primary objective of the present invention is to dynamically update the region of interest coordinates of a portion of an image for every video frame for uninterrupted video chat and video surveillance applications.
The other object of the present invention is to detect, track and dynamically update the facial features of the portion of an image, and not the entirety of the image, as the region of interest.

Another object of the present invention is to use a robust face detection approach for detecting frontal faces.
It is an object of the present invention to adopt a lightweight face tracking approach which can track faces at different tilts or orientations.
It is yet another object of the present invention to use a Haar Cascade Classifier based algorithm, popularly known as the Viola Jones method, for face detection.
Yet another object of the present invention is to employ face tracking analytical approach based on Continuously Adaptive Mean Shift Algorithm.
Another aspect of the present invention provides dynamic adjustment of size and angle to best fit the area with highest concentration of bright pixels in the face probability image.
Still another object of the present invention is to provide enhanced video quality, error resilience and scalability in addition to increased compression efficiency for effective bandwidth utilization.
It is another object of the present invention to achieve effective bandwidth utilization even under improper lighting conditions and in noisy environments.
Yet another object of the present invention is to effectuate increased bandwidth utilization even in low power embedded processors with low cost webcams.
It is a further object of the present invention to enhance the bandwidth efficiency of the system by involving motion intelligence to check for any substantial movement in the non facial regions of the image.
One of the objects of the present invention is to elevate the compression efficiency of the system by allocating more bits to the target region of interest portion as compared to the background against it using the region of interest feature of MJPEG2000.
Another object of the present invention is to exploit standard features of MJPEG 2000 like low latency, error resilience for encoding the image.

According to another aspect of the present invention, flexibility to dynamically adjust the color and frequency components of the background based on the lighting of the background is provided.
One more object of the present invention provides a simplified and robust face detection and face tracking analytical approach to make it ideal for real time performances even on embedded device.
Yet another object of the invention is to enable video chat applications in hand held mobile devices and even in low bandwidth networks like dial-up/broadband internet, GSM, GPRS, ISDN etc., or video surveillance applications in low bandwidth networks.
Summary of the invention
The present invention relates to a processor implemented apparatus and a bandwidth efficient method of enhancing the "region of interest" feature of Motion JPEG 2000 by making it dynamic with a face detection and face tracking approach, and of addressing the compression efficiency issue of the intra frame coding methodology of MJPEG 2000 through face tracking and motion detection intelligence, thus making it ideal for video chat and surveillance applications in low bandwidth conditions while ensuring quality and error resilience. The simplicity of the face detection and tracking analytical approach allows the method to be suitably implemented on embedded processors delivering real time performance.
The method and the apparatus are particularly appropriate for video streaming applications like video chat and video surveillance, where the person's face is of utmost importance, over low bandwidth error prone networks. The major requirements of these networks are robust transmission, quality of video and efficient bandwidth utilization. The face detection analytical approach is based on the Haar Cascade Classifier for detecting frontal faces, while the face tracking analytical approach extracts its foundational feature from the Continuously Adaptive Mean Shift Algorithm for tracking tilted faces. The combination of the above two approaches dynamically shifts the region of interest every frame, and if the shift in region of interest coordinates is above a certain threshold value, the particular frame is encoded using the sophisticated MJPEG 2000 video codec and transmitted uninterruptedly to the client side.
Brief description of the accompanying drawings

The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings example construction of the invention; however, the invention is not limited to the specific methods and system disclosed in the drawings:
Fig. 1 illustrates the block diagram of the embodiment of the bandwidth efficient system of the present invention.
Fig. 2 (a), (b) and (c) highlight working examples to support the effectiveness of the face detection and face tracking algorithm with low cost webcams under different noise and lighting conditions - normal, good and bad respectively.
Detailed Description of the Invention:
Some embodiments of this invention, illustrating all its features, will now be discussed in detail.
The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and methods are now described.
The present invention is directed to improving the quality of video chat and video surveillance applications even under low bandwidth conditions for use in a plethora of application environments. The system and method involved to achieve uninterrupted video chat and video surveillance applications during low bandwidth conditions make use of a less complex algorithm to give a better user experience. More specifically, the invention provides an apparatus and method for efficient video streaming by making the "region of interest" selection feature in the Motion JPEG 2000 video codec dynamic by adopting a robust face detection and lightweight face tracking approach. The invention

enables capturing, processing and transmission of encoded information to the client side. The region of interest aspect of the captured video frame is made dynamic as its coordinates get updated every frame. This ensures that the bandwidth is efficiently managed while the best possible quality and error resilience are simultaneously provided.
An exemplary block diagram of a network environment for the implementation of the video chat and surveillance method on the apparatus of the present invention is illustrated in Fig. 1, and generally designated as network 100. As would be appreciated by one skilled in the art, the topology of network 100 is determined by the layout of an installation environment. In other words, the cameras installed for capturing video frames, the interconnection of devices, the topology of network devices, the connection types etc. are determined by the requirements of the installation environment.
As shown, the network 100 includes one or more video cameras 101 to capture video frames that may be in operative communication with a server 102. The communication server 102 may serve as a central repository for acquired images obtained from video cameras 101 or in any one of a number of roles typically provided in a client server network environment. The information of captured video frames is processed in a sequentially arranged plurality of processing modules like face detection module 103, face tracking module 104, analysis module 105 and a video compression module 106 before the final image gets transmitted over the network.
Having described an environment for the implementation of the method on the apparatus of the present invention, the specific details of the system and the methodology adopted shall be discussed next. However, the features, use and novelty of the present invention may best be understood by considering an exemplary situation and instance in which the present system would be advantageous. Turning now to the details of the apparatus, a general purpose computing device having processor readable code embodied thereon for executing programmable instructions associated with one or more processors to perform a method of image processing, by updating the region of interest coordinates of a portion of an image for each video frame in an input video frame sequence using a face detection and face tracking approach, is provided. The system further comprises a video capture device for acquiring a sequence of videos, a face detection and face tracking module, an analysis module and a Motion JPEG 2000 compression module in operative communication with the processor

component to provide compression of video image within a client server communicating network.
The video capturing devices, such as video cameras, camcorders, CCTV cameras for surveillance operations, webcams or digital cameras, produce a digital video stream to be subsequently stored in the server for further processing. The face is then detected in a face detection module which employs face detection analytics to detect the frontal face as the region of interest. After detection of the frontal face, the image is transferred to the face tracking module, which is best suited for tracking tilted faces. The processed image then enters an analysis module which determines the encoding of the processed image by the chosen video codec.
The specific details of the operation of the Motion JPEG 2000 106 video codec are outside the scope of the invention and will not be discussed in any great detail. Generally, JPEG 2000 is a wavelet-based image compression standard created by the Joint Photographic Experts Group committee to supersede their original discrete cosine transform-based (DCT) JPEG standard. JPEG 2000 can operate at higher compression ratios without generating the characteristic "blocky and blurry" artifacts of the original DCT-based JPEG standard. It also allows more sophisticated progressive downloads or transfer of data. Motion JPEG 2000 is the extension of JPEG 2000 supporting video compression. MJPEG2000 offers an intra frame coding format and offers a wide range of advantages over other intra coding formats like MJPEG in terms of compression efficiency, and over inter frame coding methods like H.264, MPEG-4 etc. in terms of video quality, error resilience, scalability etc. However, it lacks in terms of compression efficiency as it is intra frame based, and hence is not a preferred standard for applications where bandwidth utilization is more important, like video chat and video surveillance over low bandwidth networks.
The compression module MJPEG2000 106 has a feature called Region of Interest (ROI) coding which helps to provide better quality for a particular region of the video as compared to other regions. This is done by allocating more bits for encoding the region of interest while providing fewer for the background. Since in many applications the face of the person is more important than the background, the invention here makes use of the face tracking algorithm to allocate the face region as the ROI in each video frame, thus dynamically moving the ROI co-ordinates every video frame. The face tracking algorithm acts as the feed for the scaling based ROI method prescribed in the

JPEG2000 standard. The scaling based ROI method shifts the coefficients so that the bits associated with the ROI are placed in higher bit-planes than the bits associated with the background. Thus the face detection and tracker modules are incorporated into the Motion JPEG2000 encoder to dynamically allocate the face region of the person as the ROI.
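As an illustration of the scaling based ROI idea described above, the following is a simplified sketch, not the actual MJPEG 2000 encoder logic: quantized wavelet coefficients inside a hypothetical ROI mask are shifted into higher bit-planes so that, during bit-plane coding, the ROI bits are emitted before the background bits.

```python
def scale_roi_coefficients(coeffs, roi_mask, shift):
    """Shift quantized wavelet coefficients inside the ROI up by
    `shift` bit-planes; background coefficients are left untouched,
    so the ROI bits are encoded ahead of the background bits."""
    return [[(c << shift) if in_roi else c
             for c, in_roi in zip(row, mask_row)]
            for row, mask_row in zip(coeffs, roi_mask)]

# toy 2x3 coefficient block whose middle column is the face ROI
coeffs = [[3, 5, 2],
          [1, 7, 4]]
mask = [[False, True, False],
        [False, True, False]]
scaled = scale_roi_coefficients(coeffs, mask, shift=4)
# ROI coefficients 5 and 7 become 80 and 112; the rest are unchanged
```

A decoder aware of the shift value can recover the original coefficients by shifting the ROI values back down, which is why the method needs no explicit ROI shape to be transmitted when the shift is large enough.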
Referring to Fig. 1 the preferred embodiment of the present invention receives images in digital form where the images can be translated into a grid representation including multiple pixels wherein a "pixel" is a picture element or a basic unit of the composition of a digital image or any of the small discrete elements that together constitute an image. The image is now temporarily stored in a server repository for further enhancement. Next, the application employed is of detecting and isolating the facial features of a portion of an image from the rest of the image and determining the size and location of facial components relative to other portions of the image.
The captured image held in communicating server 102 enters the face detection module 103 wherein the "Face Detection" involves the art of isolating and detecting faces in a digital image and includes a process of determining whether a human face is present in an input image, and may include or is preferably used in combination with determining a position and/or other features, properties, parameters or values of parameters of the face within the input image. This is suitably employed to detect frontal faces and not tilted faces.
In one aspect of the invention face detection is done using a Haar Cascade classifier with Adaboost learning as per the Viola Jones Algorithm. This makes the detection process much more efficient when it is based on the detection of Haar like features that encode some information about the class to be detected. These Haar like features encode the existence of oriented contrasts between regions in the image. A set of these features can be used to encode the contrasts exhibited by a human face and their spatial relationships. Haar-like features are so called because they are computed similarly to the coefficients in Haar wavelet transforms. The Haar features are square waves, which in two dimensions form a pair of rectangles - one light and one dark.
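The light/dark rectangle pairs above are typically evaluated over an integral image (summed-area table), which makes any rectangle sum a constant-time operation. The following is an illustrative pure-Python sketch; the helper names are assumptions for this example and not part of the specification.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum over the rectangle with top-left (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Vertical two-rectangle Haar-like feature: light (top half)
    minus dark (bottom half) region sum."""
    half = h // 2
    light = rect_sum(ii, x, y, w, half)
    dark = rect_sum(ii, x, y + half, w, half)
    return light - dark
```

On a 4x4 patch whose top half is bright (value 10) and bottom half dark (value 2), `haar_two_rect` over the full patch returns a large positive response, which is the kind of oriented contrast the classifier looks for.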
The presence of a Haar feature is determined by subtracting the average dark-region pixel value from the average light-region pixel value. If the difference is above a previously determined threshold value, the feature is said to be present. To select the

specific Haar features to use and to set threshold levels, a machine learning method called Adaboost is used. First, a classifier (namely a cascade of boosted classifiers working with Haar-like features) is trained with a few hundred sample views of a particular object (i.e., a face), called positive examples, that are scaled to the same size (say, 20x20), and negative examples - arbitrary images of the same size. After a classifier is trained, it can be applied to a region of interest (of the same size as used during the training) in an input image. The classifier outputs a "1" if the region is likely to show the object (i.e., face), and "0" otherwise.
Multiple weak classifiers are combined to create a strong classifier. Adaboost selects a set of weak classifiers to combine and assigns a weight to each. The weighted combination is the strong classifier. Combinations of weak classifiers can achieve a good generalization performance with polynomial space- and time-complexity, as has already been shown in the prior art. The algorithm has been able to obtain combinations of weak classifiers with very good generalization and fast training time on both the test problems and the real applications. The combined classifiers show a good scaling property which indicates efficiency in space-complexity. To search for the target facial feature in the whole image one can move the search window across the image and check every location using the classifier. The classifier is designed so that it can be easily "resized" in order to be able to find the objects of interest at different sizes, which is more efficient than resizing the image itself. So, to find an object of unknown size in the image the scan procedure should be done several times at different scales.
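The weighted combination described above can be sketched as follows. This is a minimal illustration of the AdaBoost decision rule; the weak classifiers and their weights are hypothetical, standing in for the trained Haar-feature thresholds.

```python
def strong_classify(window, weak_classifiers):
    """AdaBoost-style strong classifier: `weak_classifiers` is a list
    of (weight, classify) pairs where classify(window) returns 1
    (face-like) or 0.  The weighted vote passes when it reaches half
    of the total weight, mirroring the usual AdaBoost decision rule."""
    total_weight = sum(w for w, _ in weak_classifiers)
    score = sum(w * classify(window) for w, classify in weak_classifiers)
    return 1 if score >= 0.5 * total_weight else 0

# hypothetical weak classifiers thresholding one Haar feature value
weaks = [
    (0.6, lambda v: 1 if v > 5 else 0),
    (0.3, lambda v: 1 if v > 12 else 0),
    (0.1, lambda v: 1 if v > 40 else 0),
]
```

With these illustrative weights, a window whose feature value clears only the first (heavily weighted) threshold already passes the vote, while a window that clears none is rejected.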
For better understanding, the word "cascade" in the classifier name means that the resultant classifier consists of several simpler classifiers (stages) that are applied subsequently to a region of interest until at some stage the image is rejected or all the stages are passed. The word "boosted" means that the classifiers at every stage of the cascade are complex themselves and they are built out of basic classifiers using one of four different boosting techniques (weighted voting). The basic classifiers are decision-tree classifiers with at least 2 leaves. Haar-like features are the input to the basic classifiers. The feature used in a particular classifier is specified by its shape, position within the region of interest and the scale.
This is followed by the combination of all weak classifiers into a filter chain that is especially efficient for the classification of image regions. The training is done with a sample set

of about 1000 positive and negative samples as discussed above. During use, if any of these filters fails to pass an image region, it is immediately classified as "Not face". When a filter passes an image region, it goes to the next filter in the chain. Image regions that pass through all filters in the chain are classified as "Face". The face detector is created with thousands of samples of frontal faces which are used for detection. Thus the face detection module determines the location and sizes of facial features in arbitrary digital images.
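The filter chain behaviour just described can be sketched as the following hypothetical routine: each stage is a callable, and a failure at any stage rejects the region immediately, which is what makes the cascade cheap on the many non-face regions of a frame.

```python
def cascade_classify(region, stages):
    """Filter-chain evaluation: the region must pass every stage to
    be labelled "Face"; any failing stage rejects it early."""
    for stage in stages:
        if not stage(region):
            return "Not face"   # rejected cheaply, remaining stages skipped
    return "Face"

# hypothetical stages of increasing strictness on illustrative scores
stages = [
    lambda r: r["contrast"] > 10,   # cheap early stage
    lambda r: r["contrast"] > 25,   # stricter stage
    lambda r: r["symmetry"] > 0.8,  # most expensive stage, run last
]
```

Only regions surviving the cheap early stages ever reach the expensive later ones, which is the efficiency argument made above.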
The limitation however with the above face detection module 103 is that it is suited only to frontal faces and is not favorable for capturing tilted faces. Also, the process is computationally intensive and cannot on its own be adopted for real time performance. To overcome this limitation, soon after face detection, the image is transmitted to the face tracking module 104 which performs the task of tracking the tilted face. The face tracking method is based on the continuously adaptive Mean shift algorithm. It is lightweight and works well even when the faces are tilted. In one of the preferred embodiments of the present invention, the following steps are implemented to track the facial features of the captured image.
The first step involves creating a color histogram to represent the target facial features as the region of interest of the image. It shall however be noted that this step is performed only once. This is followed by the calculation of the face probability for each pixel in the subsequent video frames holding the image, wherein a pixel is a picture element or a basic unit of the composition of a digital image, or any of the small discrete elements that together constitute an image. The next significant step is shifting the location of the face rectangle in each frame, keeping it centered over the area with the highest concentration of bright pixels in the face probability image. It finds the new location by starting at the previous location and computing the center of gravity of the face probability values within a rectangle. This is based on the algorithm called Mean shift. The Mean shift algorithm thus uses the concept of color probability density and tracks the target facial feature in the captured video frame by matching color density; it is applied to estimate the color density and target location.
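The center-of-gravity relocation step described above can be sketched as follows. This is an illustrative single mean-shift iteration over a face probability map; the data layout and function name are assumptions for this example.

```python
def mean_shift_step(prob, rect):
    """One mean-shift iteration: re-centre `rect` = (x, y, w, h) on
    the centre of gravity of the face-probability values it covers."""
    x, y, w, h = rect
    total = mx = my = 0.0
    for j in range(y, y + h):
        for i in range(x, x + w):
            p = prob[j][i]
            total += p
            mx += p * i       # probability-weighted column sum
            my += p * j       # probability-weighted row sum
    if total == 0:
        return rect           # no face probability inside the window
    cx, cy = mx / total, my / total
    return (int(round(cx - w / 2)), int(round(cy - h / 2)), w, h)
```

In the full tracker this step is iterated until the window stops moving; CAMShift then additionally re-estimates the window's size and angle each frame, as described next.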
Next, the angle and size of the face in each frame are evaluated. This phase computes the scale and orientation of the face which best fit the face probability pixels inside the new rectangle computed in the previous step. As the size and angle are adjusted in each frame, the method is called "Continuously adaptive Mean Shift". The face tracking

module 104 and method is simple and accurate in detecting tilted faces; it works on the color distribution to represent a face, then calculates the face probability and tracks those pixels.
The above two face detection and face tracking approaches are adopted to effectuate effective bandwidth utilization. The individual frames of videos captured from one or more cameras are encoded using the Motion JPEG 2000 video codec 106. The encoding of the video frames is however subject to further conditions and limitations as shown in Fig. 1. Once the image tracking is conducted, the tracked image is analyzed to determine whether it qualifies for encoding. This analysis is performed by the analysis module 105.
The angle of tilt of the face is estimated as returned by the face tracking module 104. The analysis module holds the previously determined first and second threshold values which act as parameters to determine the encoding of the image. Once the angle of tilt value is obtained, it is compared against the first threshold value contained within the analysis module 105. If the angle of tilt value is above the first threshold value as compared to the previous frame, the frame proceeds for subsequent encoding.
Another aspect of the invention addresses the situation where the angle of tilt value is less than the previously determined first threshold value. In this case, a motion detection algorithm is used to check for any substantial movement in the non facial features of the portion of an image, say the hands of a person. The motion detection is based on a background subtraction algorithm which relies on color histograms, texture information, and successive division of rectangular image regions to model the background and detect the motion. This is particularly important and useful from a video surveillance point of view. The subtraction algorithm is executed following the below mentioned steps.
Firstly, the pixel colors of the two consecutively captured frames are compared in the analysis module 105. This is followed by replacing all the pixels having the same color with a single color, preferably white. The color of all the available pixels is analyzed; if the color of a pixel differs between frames, that pixel is retained. The number of retained pixels is counted for comparison with the second threshold value contained within the analysis module 105. This gives rise to two alternative situations: first, when the number of pixels is less than the previously determined second threshold value, and second, when the

number of pixels is more than the previously determined second threshold value. The analysis module 105 accordingly determines that when the number of pixels with different colors is above the second threshold value, motion is said to be detected.
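The frame comparison steps above can be sketched as the following simplified check. Grayscale pixel values and the function name are illustrative assumptions; the specification's approach additionally uses color histograms and texture information.

```python
def motion_detected(prev_frame, curr_frame, pixel_tol, count_threshold):
    """Compare two consecutive frames pixel by pixel, retain the
    differing pixels, and flag motion when the count of retained
    pixels exceeds the second threshold."""
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            if abs(p - c) > pixel_tol:   # a retained, differing pixel
                changed += 1
    return changed > count_threshold
```

The `pixel_tol` tolerance absorbs sensor noise from low cost webcams, so only genuinely differing pixels count toward the motion decision.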
The analysis module 105 hence decides whether the particular frame should be encoded or not. If the angle of tilt is above the first threshold value, or if the number of different color pixels is above the second threshold value, then the frame is allowed to be encoded by the compression module 106. However, if neither of the above conditions is met, then the frame is not encoded and an indication is sent to the receiver side to display the previously decoded frame. Consequently, the process is repeated with the capturing of the next frame, which is subjected to the previously disclosed processing before finally getting encoded.
The encoding of the target image portions is enabled using Motion JPEG 2000, an image compression standard and coding system. The bandwidth efficiency of Motion JPEG 2000 is enhanced by bringing in a motion intelligence technique, the lack of which is one of the drawbacks of intra frame coding standards like MJPEG or MJPEG 2000. The compression gains are attributed to the use of the Discrete Wavelet Transform and a more sophisticated entropy encoding scheme. Further, both lossless and lossy compressions are provided in a single compression architecture. The compression efficiency of the video frames selected for encoding is further increased by allocating more bits to the face region selected by the dynamic ROI, using the face tracking algorithm, against the set background.
The above mentioned methods effectuate ROI utilization without compromising on the quality of the region of interest, at the same time making use of the great features of MJPEG2000 like low latency, error resilience etc. In a nutshell, a bandwidth efficient method of dynamically updating the region of interest coordinates of a portion of an image for each video frame in an input video frame sequence, using a robust face detection and lightweight face tracking approach, is provided, the said method comprising the processor implemented steps of:
a) capturing one or more images in a sequence of video frames for subsequent storage in server;

b) detecting the presence of specific Haar feature for distinguishing facial and non facial characteristics within the region of an image of a current video frame;
c) tracking the facial region of an image for allocating the facial region of an image as the region of interest for one of the captured video frames;
d) determining the angle of tilt of the facial region and comparing the obtained value with the previously determined first threshold value;
e) checking for any substantial movement within the non facial regions of an image against the previously determined second threshold value when the angle of tilt of the facial region is below the first threshold value;
f) encoding the image when either the angle of tilt is above the first threshold value or the substantial movement of the non facial region is above the second threshold value, using intra frame image compression and coding standard;
g) displaying and transmitting the encoded image or optionally directing to capture the video frame next to the currently captured frame and repeating the steps (a) to (g) when the current frame is not encoded.
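By way of illustration, the encode decision of steps (d) to (g) may be sketched as follows. The threshold values and the function name `should_encode` are assumptions for illustration only and are not taken from the specification.

```python
# Illustrative sketch of the per-frame encode decision (steps d-f).
# The threshold values below are assumed, not taken from the specification.
TILT_THRESHOLD_DEG = 15.0      # first threshold: angle of tilt of the face
MOTION_PIXEL_THRESHOLD = 500   # second threshold: changed non-facial pixels

def should_encode(face_tilt_deg: float, changed_background_pixels: int) -> bool:
    """Return True when the current frame must be encoded and transmitted."""
    # steps (d)/(f): encode when the face has tilted sufficiently
    if abs(face_tilt_deg) > TILT_THRESHOLD_DEG:
        return True
    # step (e): checked only when the tilt is below the first threshold
    return changed_background_pixels > MOTION_PIXEL_THRESHOLD
```

Frames for which the decision is negative are skipped (step g), which is where the bandwidth saving arises.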
Robust face detection and tracking algorithm is used to dynamically update the Region of Interest (ROI) for Motion JPEG2000 as disclosed in the present invention. The method is well suited to implementation in embedded systems with a processor embodied on them. The devices can be any of webcams, camcorders, transcoders, image cameras or hand held mobile devices. For example, it can be implemented on embedded platforms like Intel Atom, which can be used for handheld devices and applications. The invention also finds applicability in video chat applications even over low bandwidth networks like dial up/broadband internet, GSM, GPRS, ISDN etc. The methodology ensures that the ROI co-ordinates are evaluated every frame, and only if the shift in co-ordinates is above a certain threshold, i.e. if there is sufficient movement in the face, is that particular frame encoded and transmitted to the client side. This ensures that bandwidth is efficiently managed while the quality and error resilience remain at their best, which is the major advantage of Motion JPEG2000. In one of the preferred embodiments of the present invention, based on the lighting of the background, the color and the high frequency components of the background are dynamically adjusted.
This method is ideal for video streaming applications where the person's face is of utmost importance, like video chat, video surveillance etc., over low bandwidth, error prone networks. The major requirements of these networks are robust transmission, quality of video and efficient bandwidth utilization. The usage of Motion JPEG2000 ensures the first two, and the disadvantage of lesser compression efficiency is addressed by making use of dynamic ROI based compression.
The experimental records further demonstrate that, with the dynamic "region of interest" feature, the bandwidth utilization is less than or equal to that of H.264, which is considered to be the best compression standard in terms of bandwidth utilization. At the same time this method eliminates the disadvantages of H.264 in terms of quality and the error due to packet losses in streaming media networks, since Motion JPEG2000 is an intra frame algorithm and has a very robust error resilience feature. For video of 720*480 resolution @ 30 fps (frames per second), the bandwidth usage achieved is of the following order:
Motion JPEG: 15-20 Mbps;
MPEG2: 8-10 Mbps;
MPEG4/H.264: 4-6 Mbps;
Motion JPEG2000 without ROI: 10-12 Mbps; while
The present invention achieves an exemplary bandwidth usage of about 3-4 Mbps. The bandwidth utilization is improved by allocating more bits to the ROI region (face region) and by using the face orientation and motion detection information to decide which video frames need to be encoded. Thus, uninterrupted video streaming for chat and surveillance, along with efficient bandwidth utilization by exploiting the ROI feature of the Motion JPEG 2000 video codec without compromising on image quality, forms the fundamental unique element of the invention.
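The bitrates quoted above can be related to an average bits-per-pixel figure at the stated resolution and frame rate; a small worked example (function name is illustrative):

```python
# Bits per pixel implied by the quoted bitrates for 720*480 video at 30 fps.
PIXELS_PER_SECOND = 720 * 480 * 30  # 10,368,000 pixels per second

def bits_per_pixel(mbps: float) -> float:
    """Convert a bitrate in Mbps to average bits spent per pixel."""
    return mbps * 1_000_000 / PIXELS_PER_SECOND

# The claimed 3-4 Mbps corresponds to roughly 0.29-0.39 bits per pixel,
# against roughly 1.0-1.2 bits per pixel for Motion JPEG2000 without ROI.
```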
In another aspect of the invention, the selection of simple and accurate analytical algorithms makes it suitable to be implemented on low power embedded processors with low cost webcams under different lighting conditions and noisy environments, as exemplified in Fig. 2(a), (b) and (c). The snapshots presented in Fig. 2 were taken by running the algorithm using a Logitech C600 USB webcam.
The invention enhances the ROI feature of MJPEG2000 by making it dynamic with face tracking, and addresses the compression efficiency issue of the intra frame coding methodology of MJPEG2000 through face tracking and motion detection intelligence, thus making it ideal for video chat and surveillance applications over low bandwidth while ensuring quality and error resilience. The simplicity of the algorithms selected for face tracking and motion detection ensures that this method can be implemented on embedded processors delivering real time performance.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects hereinabove set forth together with other advantages which are obvious and which are inherent to the method and apparatus. It will be understood that certain features and sub combinations are of utility and may be employed without reference to other features and sub combinations. This aspect is contemplated by and is within the scope of the claims. Since many possible embodiments of the invention may be made without departing from the scope thereof, it is also to be understood that all matters herein set forth or shown in the accompanying drawings are to be interpreted as illustrative and not limiting.

What is claimed is:
1) A bandwidth efficient method of dynamically updating region of interest coordinates of a portion of image for each video frame in an input video frame sequence using robust face detection and lightweight face tracking approach, the said method comprising the processor implemented steps of:
a) capturing one or more images in a sequence of video frames for subsequent storage in server;
b) detecting the presence of specific Haar feature for distinguishing facial and non facial characteristics within the region of an image of a current video frame;
c) tracking the facial region of an image for allocating the facial region of an image as the region of interest for one of the captured video frames;
d) determining the angle of tilt of the facial region and comparing the obtained value with the previously determined first threshold value;
e) checking for any substantial movement within the non facial regions of an image against the previously determined second threshold value when the angle of tilt of the facial region is below the first threshold value;
f) encoding the image when either the angle of tilt is above the first threshold value or the substantial movement of the non facial region is above the second threshold value, using intra frame image compression and coding standard;
g) displaying and transmitting the encoded image or optionally directing to capture the video frame next to the currently captured frame and repeating the steps (a) to (g) when the current frame is not encoded.
2) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the processor is embedded on one or more device as selected from the group of webcams, camcorders, transcoders, image cameras or hand held mobile devices.

3) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the presence of selected Haar features is detected by evaluating the difference between average illuminated region pixel value and dark region pixel value of the target image.
4) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 3, wherein the target Haar feature selected from the portion of an image is a facial feature.
5) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 3, wherein the selection of Haar feature is achieved by using AdaBoost learning of the Viola-Jones algorithm.
6) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 5, wherein the detection of a facial feature is based on the ability of the image regions to pass through a chain of filters comprising weighted weak classifiers.
7) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 3, wherein the Haar feature is represented as any of the shapes selected from a group of shapes comprising a rectangle and a square.
8) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the Haar classifiers are employed for frontal face detection.
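Claims 3 to 8 describe Haar-like rectangle features evaluated as the difference between the pixel sums of light and dark regions. A minimal sketch of such a feature, assuming grayscale input and the standard integral image technique; function names are illustrative, and a real Viola-Jones detector evaluates thousands of these features per window:

```python
# Two-rectangle Haar-like feature via an integral image (summed-area table).
def integral_image(img):
    """img: 2D list of grayscale values -> summed-area table of same size."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y) and size w x h."""
    a = ii[y + h - 1][x + w - 1]
    b = ii[y - 1][x + w - 1] if y > 0 else 0
    c = ii[y + h - 1][x - 1] if x > 0 else 0
    d = ii[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a - b - c + d

def haar_two_rect(ii, x, y, w, h):
    """Left-minus-right rectangle difference over a w x h window (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Each `rect_sum` is O(1) regardless of rectangle size, which is what makes cascade evaluation cheap enough for embedded processors.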

9) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein tracking of the facial region is operated in the steps of:
inputting the target facial feature of the image and representing it by a colored histogram;
computing the face probability values for each pixel in the subsequent video frames;
computing the centre of gravity of the face probability values for relocating the target facial feature of the image over the area with the highest concentration of bright pixels in the face probability image; and
determining the scale and orientation of target facial feature of the image which best fits the face probability pixels inside the newly located area of the face probability image.
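The relocation step of claim 9 amounts to moving the search window onto the centre of gravity of the face probability image (the core of a mean-shift / CAMShift tracker). A minimal pure-Python sketch with illustrative function names; a full CAMShift tracker would additionally adapt the window scale and orientation as in the final step:

```python
# Mean-shift relocation over a face probability image (2D list of floats).
def centre_of_gravity(prob, x, y, w, h):
    """Centroid of probability mass inside the window (x, y, w, h)."""
    m00 = m10 = m01 = 0.0
    for j in range(y, y + h):
        for i in range(x, x + w):
            p = prob[j][i]
            m00 += p
            m10 += p * i
            m01 += p * j
    if m00 == 0:
        return x + w // 2, y + h // 2  # no mass: keep the window centre
    return m10 / m00, m01 / m00

def mean_shift(prob, x, y, w, h, iters=10):
    """Re-centre the window on the probability mass until it stops moving."""
    for _ in range(iters):
        cx, cy = centre_of_gravity(prob, x, y, w, h)
        nx = min(max(int(cx) - w // 2, 0), len(prob[0]) - w)
        ny = min(max(int(cy) - h // 2, 0), len(prob) - h)
        if (nx, ny) == (x, y):
            break
        x, y = nx, ny
    return x, y
```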
10) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the angle of tilt is computed from the face tracker to compare the obtained value with the first threshold value.
11) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the face tracking approach is preferentially adopted for tilted faces.
12) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 9, wherein the frame of the captured image proceeds for encoding if the angle of tilt obtained is above the first threshold value.
13) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the movement of non facial features of the target image is tracked and compared with the second threshold value by employing motion detection technique.
14) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 11, wherein the movement of the non facial features of an image is estimated by the steps of:
inputting the sequence of captured video frames to compare their pixel colors;
substituting all the pixel colors with a uniquely identified single color and retaining the differently colored pixels;
comparing the number of retained pixels with a previously determined second threshold value; and
indicating the motion as detected if the number of retained pixels is above the second threshold value.
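The motion estimation steps above can be sketched as a frame-difference pixel count: unchanged pixels are collapsed to a single marker value, differing pixels are retained and counted against the second threshold. The per-pixel tolerance and function name below are assumptions for illustration only:

```python
# Frame-difference motion test over two consecutive grayscale frames.
DIFF_TOLERANCE = 8  # per-pixel intensity tolerance (assumed, not from spec)

def motion_detected(prev, curr, second_threshold):
    """prev, curr: 2D grayscale frames of equal size; count changed pixels."""
    changed = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            if abs(p - c) > DIFF_TOLERANCE:
                changed += 1  # a retained (differently colored) pixel
    return changed > second_threshold
```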
15) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the individual frames of the video captured are compressed and encoded using Motion JPEG 2000.

16) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 15, wherein the compression efficiency of Motion JPEG 2000 can be enhanced by:
allocating more bits to the target facial region of an image against the background of the image; or
detecting the motion and movement of facial features by the motion tracking approach; or
a combination thereof.
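The first alternative of claim 16 may be illustrated as a weighted split of a fixed per-frame bit budget between ROI and background pixels. The 4:1 weight and the function name are assumptions for illustration, not values from the specification:

```python
# Weighted ROI/background split of a fixed frame bit budget.
def roi_bit_allocation(total_bits, roi_pixels, bg_pixels, roi_weight=4.0):
    """Split total_bits so each ROI pixel gets roi_weight x a background pixel."""
    unit = total_bits / (roi_weight * roi_pixels + bg_pixels)
    return roi_weight * unit * roi_pixels, unit * bg_pixels
```

With a 4:1 weight, a face region covering a small fraction of the frame still receives a large share of the budget, which is the intent of the dynamic ROI.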
17) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the method achieves bandwidth compression to the extent of 3-4 Mbps for video resolution of 720*480 size at the rate of 30fps.
18) A bandwidth efficient method of dynamically updating region of interest coordinates as claimed in claim 1, wherein the method supports effective face detection and tracking mechanism even under low lighting conditions using low cost webcams.
19) A bandwidth efficient video streaming system comprising of a general purpose computing device with a processor readable code embodied thereon for executing programmable instructions within a client server communicating network, the said system further comprising of: a video capture device for acquiring a sequence of videos; a face detection module; a face tracking module; an analysis module and a Motion JPEG 2000 compression component in operative communication with the processor component to provide compression of video images, wherein the system performs the steps:
a) on an apparatus including a processor configured for image processing, capturing one or more images in a sequence of video frames for subsequent storage in a server;
b) detecting the presence of specific Haar feature for distinguishing facial and non facial characteristics within the region of an image of a current video frame;
c) tracking the facial region of an image for allocating the facial region of an image as the region of interest for one of the captured video frames;

d) determining the angle of tilt of the facial region and comparing the obtained value with the previously determined first threshold value;
e) checking for any substantial movement within the non facial regions of an image against the previously determined second threshold value when the angle of tilt of the facial region is below the first threshold value;
f) encoding the image when either the angle of tilt is above the first threshold value or the substantial movement of the non facial region is above the second threshold value, using intra frame image compression and coding standard;
g) displaying and transmitting the encoded image or optionally directing to capture the video frame next to the currently captured frame and repeating the steps (a) to (g) when the current frame is not encoded.

20) A bandwidth efficient video streaming system as claimed in claim 19, wherein the system is adapted to support video streaming applications like video chat or video surveillance in low bandwidth networks.
21) A bandwidth efficient video streaming system as claimed in claim 19, wherein the system supports the method which can be suitably implemented on low power embedded processors in different lighting and noisy environments.
22) A bandwidth efficient video streaming system as claimed in claim 19, wherein the system achieves dynamic control of color and frequency of wavelets in the region of interest portion of the image.
23) A system and method substantially as herein described with reference to and as illustrated by the accompanying drawing.


Application Documents

# Name Date
1 2424-MUM-2010-FORM 26(06-10-2010).pdf 2010-10-06
2 2424-MUM-2010-CORRESPONDENCE(06-10-2010).pdf 2010-10-06
3 OTHERS [10-05-2016(online)].pdf 2016-05-10
4 Examination Report Reply Recieved [10-05-2016(online)].pdf 2016-05-10
5 Description(Complete) [10-05-2016(online)].pdf 2016-05-10
6 Claims [10-05-2016(online)].pdf 2016-05-10
7 Abstract [10-05-2016(online)].pdf 2016-05-10
8 2424-MUM-2010-Correspondence to notify the Controller (Mandatory) [04-05-2018(online)].pdf 2018-05-04
9 2424-MUM-2010-Written submissions and relevant documents (MANDATORY) [24-05-2018(online)].pdf 2018-05-24
10 2424-MUM-2010-PatentCertificate25-05-2018.pdf 2018-05-25
11 2424-MUM-2010-IntimationOfGrant25-05-2018.pdf 2018-05-25
12 2424-mum-2010-abstract.pdf 2018-08-10
13 2424-mum-2010-claims.pdf 2018-08-10
14 2424-mum-2010-correspondence.pdf 2018-08-10
15 2424-mum-2010-description(complete).pdf 2018-08-10
16 2424-mum-2010-drawing.pdf 2018-08-10
17 2424-mum-2010-form 1.pdf 2018-08-10
18 2424-mum-2010-form 2.pdf 2018-08-10
19 2424-mum-2010-form 2(title page).pdf 2018-08-10
20 2424-mum-2010-form 3.pdf 2018-08-10
21 2424-MUM-2010-FORM 18.pdf 2018-08-10
22 2424-MUM-2010_EXAMREPORT.pdf 2018-08-10
23 2424-MUM-2010-HearingNoticeLetter.pdf 2018-08-10
24 abstract1.jpg 2018-08-10
25 2424-MUM-2010-RELEVANT DOCUMENTS [26-03-2019(online)].pdf 2019-03-26
26 2424-MUM-2010-RELEVANT DOCUMENTS [31-03-2020(online)].pdf 2020-03-31
27 2424-MUM-2010-RELEVANT DOCUMENTS [23-09-2021(online)].pdf 2021-09-23
28 2424-MUM-2010-RELEVANT DOCUMENTS [30-09-2022(online)].pdf 2022-09-30
29 2424-MUM-2010-RELEVANT DOCUMENTS [25-09-2023(online)].pdf 2023-09-25

ERegister / Renewals

3rd: 30 Jul 2018 (from 31/08/2012 to 31/08/2013)
4th: 30 Jul 2018 (from 31/08/2013 to 31/08/2014)
5th: 30 Jul 2018 (from 31/08/2014 to 31/08/2015)
6th: 30 Jul 2018 (from 31/08/2015 to 31/08/2016)
7th: 30 Jul 2018 (from 31/08/2016 to 31/08/2017)
8th: 30 Jul 2018 (from 31/08/2017 to 31/08/2018)
9th: 30 Jul 2018 (from 31/08/2018 to 31/08/2019)
10th: 22 Aug 2019 (from 31/08/2019 to 31/08/2020)
11th: 25 Aug 2020 (from 31/08/2020 to 31/08/2021)
12th: 20 Aug 2021 (from 31/08/2021 to 31/08/2022)
13th: 29 Aug 2022 (from 31/08/2022 to 31/08/2023)
14th: 29 Aug 2023 (from 31/08/2023 to 31/08/2024)
15th: 31 Aug 2024 (from 31/08/2024 to 31/08/2025)
16th: 19 Aug 2025 (from 31/08/2025 to 31/08/2026)