Abstract: The invention aims at developing a new technique for low bandwidth video conferencing. It is based on the notion "transmit only what has changed": information needs to be transmitted only when there is considerable change in the scene. This is achieved by motion detection at the sender's end and sending only the required frames. The system defines a local motion frequency function (LMFF) for tracking. This function is based on offline learning using machine learning techniques such as SVMs, neural networks, etc. The function is trained on when to give a scene-change indication. When there is a delta change that is negligible from the perspective of human perception, the LMFF sends a message/flag to the receiver stating that the earlier frame can be displayed again. Thus, in a preferred embodiment, the present invention provides a method for video conferencing at ultra low bandwidth comprising the steps of: capturing the first I-frame at the sender's end, storing the I-frame in a reference buffer and transmitting the frame to a receiver; detecting the motion of a next frame at the sender's end with the help of a local motion frequency function (LMFF) with respect to the reference frame stored in said reference buffer to determine if there is a substantial change in the scene; and when the motion detected is beyond a threshold value, giving an indication for transmitting the changed scene to the receiver.
FIELD OF THE INVENTION
The present invention relates to a method for video conferencing at ultra low bandwidth. In particular, the method of the invention requires minimum resources in terms of bandwidth and computation capability, and the camera installed at the capturing end can be a basic camera. These requirements are typical of military as well as surveillance applications. The method should be used for medium- to low-resolution video conferencing, since it makes use of local motion and eliminates extra frames by sending regeneration-request flags instead of predictive frames.
BACKGROUND OF THE INVENTION
Bandwidth utilization is a key factor in video conferencing applications. Higher video resolution of the frames to be transmitted automatically implies higher bandwidth requirements. Therefore, a system with limited bandwidth at its disposal that tries to transmit high-resolution frames will experience delays, frame loss and jitter.
The system is targeted at a low to medium range of resolutions, since it skips some P (predictive) frames, which may look very jittery on a high-resolution system. On a low-resolution system, skipping from frame N to frame N+1 under the assumption of minimal motion is not detected by the human eye.
Various techniques have been tried to reduce the bandwidth requirement, such as better compression via newer codecs like H.263 and H.264.
Document WO/1999/057900 describes a video conferencing solution where the looks of the caller can be changed/improved. However, there is no caller polishing system on the market that considers a major factor such as geographical/racial criteria when applying the makeup.
The techniques described in the prior art are full systems that cater to everything from data representation to compression and decompression.
There is, therefore, a need to tap only the motion estimation module, using motion models that tell how much motion has occurred across a series of frames rather than only between consecutive frames.
SUMMARY OF THE INVENTION
The invention aims at developing a new technique for low bandwidth video conferencing. It is based on the notion "transmit only what has changed": information needs to be transmitted only when there is considerable change in the scene. This is achieved by motion detection at the sender's end and sending only the required frames. The system defines a local motion frequency function (LMFF) for tracking. This function is based on offline learning using machine learning techniques such as SVMs, neural networks, etc. The function is trained on when to give a scene-change indication. When there is a delta change that is negligible from the perspective of human perception, the LMFF sends a message/flag to the receiver stating that the earlier frame can be displayed again.
Thus, in a preferred embodiment, the present invention provides a method for video conferencing at ultra low bandwidth comprising the steps of: capturing the first I-frame at the sender's end, storing the I-frame in a reference buffer and transmitting the frame to a receiver; detecting the motion of a next frame at the sender's end with the help of a local motion frequency function (LMFF) with respect to the reference frame stored in said reference buffer to determine if there is a substantial change in the scene; and when the motion detected is beyond a threshold value, giving an indication for transmitting the changed scene to the receiver.
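To make these steps concrete, the following is a minimal sketch of the claimed sender-side decision, assuming frames arrive as NumPy arrays and approximating "motion" by the mean absolute pixel difference against the buffered reference I-frame. The threshold value and the function names are illustrative assumptions, not part of the specification.

```python
# A minimal sketch of the claimed sender-side decision, assuming NumPy frames.
# MOTION_THRESHOLD, detect_motion() and sender_step() are illustrative names;
# the specification does not fix the motion metric or the threshold value.
import numpy as np

MOTION_THRESHOLD = 12.0  # assumed perceptual threshold, to be tuned empirically

def detect_motion(frame: np.ndarray, reference: np.ndarray) -> float:
    """Scalar motion score of `frame` relative to the buffered reference I-frame."""
    return float(np.mean(np.abs(frame.astype(np.int16) - reference.astype(np.int16))))

def sender_step(frame: np.ndarray, state: dict) -> str:
    """Decide whether the changed scene must be transmitted to the receiver."""
    if state.get("reference") is None:
        state["reference"] = frame       # capture and buffer the first I-frame
        return "SEND_I_FRAME"            # ...and transmit it to the receiver
    if detect_motion(frame, state["reference"]) > MOTION_THRESHOLD:
        state["reference"] = frame       # substantial scene change: new reference
        return "SEND_I_FRAME"
    return "NO_TRANSMIT"                 # defer to the LMFF stage described below
```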
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The invention will now be described in detail with the help of the figures of the accompanying drawings, in which:
Figure 1 is a block diagram of the user settings for the invention.
Figure 2 is a flow diagram illustrating the method of the present invention.
DETAILED DESCRIPTION
Operation of the invention, as illustrated in Figure 1, can now be described as follows.
Figure 1a depicts the transport stream when the system determines high motion and decides to send a new frame, which is marked as an input or I-frame.
Figure 1b depicts the next stage, when the amount of motion as determined by the system is very small and the LMFF decides that there is no need to send a new frame; instead, a flag RI (Repeat I-frame) is placed in the transmission buffer. This means that the receiver will display the I-frame again.
Figure 1c depicts the case when there is some motion with respect to the motion being captured by the LMFF classifier and this motion is similar to the one that was detected in the earlier frame. This frame can therefore be marked as NF (New Frame), which means that it will be regenerated from the past and future frames. This also takes the next I-frame into account.
Figure 1d depicts the scenario when there has been a considerable motion change and the system marks it as the arrival of a new I-frame.
As shown in the system of Figure 1, motion estimation is performed whenever a new frame is grabbed, to check for an amount of motion above a particular threshold. The learning function LMFF captures the sequence. If the frame is not a new I-frame, the LMFF records the small amount of motion and sends an RI flag when the amount of motion is below a threshold; otherwise, it sends an NF (New Frame) when the amount of motion is higher than that threshold.
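Since the summary states that the LMFF is trained offline with techniques such as SVMs, one plausible realisation is a classifier over local motion features. The per-block feature extraction below and the use of scikit-learn's SVC are assumptions for illustration only; the patent names SVMs and neural networks but fixes neither the features nor the library.

```python
# A hedged sketch of the LMFF as an offline-trained classifier. The per-block
# motion features and the scikit-learn SVC are assumptions, not the patent's
# prescribed implementation.
import numpy as np
from sklearn.svm import SVC

def local_motion_features(frames, blocks=8):
    """Per-block mean absolute differences across a short history of frames."""
    feats = []
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
        h, w = diff.shape[:2]
        for by in range(blocks):
            for bx in range(blocks):
                block = diff[by * h // blocks:(by + 1) * h // blocks,
                             bx * w // blocks:(bx + 1) * w // blocks]
                feats.append(block.mean())
    return np.array(feats)

# Offline training on labelled sequences: 0 = RI, 1 = NF, 2 = new I-frame.
# lmff = SVC(kernel="rbf").fit(X_train, y_train)
# At run time: decision = lmff.predict([local_motion_features(recent_frames)])[0]
```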
Operation of the Invention
The transmission buffer sends I-frames and a few flags. RI means Repeat I-frame; this flag tells the receiver to display the previous I-frame. The system defines a function LMFF (Local Motion Frequency Function) which tracks small local motion, for example lip movement.
This function tries to find a sequence of small motions, for example lip opening and closing, and accordingly adds an RI or NF flag. NF stands for New Frame, which means that the frame will be generated at the receiver's end using the previous I-frame and a future frame.
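One possible on-the-wire encoding of these flags is sketched below. The byte values and the framing are illustrative assumptions, since the specification only names the I-frame, RI and NF entries of the transmission buffer.

```python
# One possible encoding of the transmission-buffer entries: full I-frames
# interleaved with one-byte flags. The byte values are assumptions.
from enum import IntEnum

class Flag(IntEnum):
    I_FRAME = 0x01  # payload follows: a complete (compressed) I-frame
    RI = 0x02       # Repeat I-frame: receiver redisplays the buffered I-frame
    NF = 0x03       # New Frame: regenerated from the previous and future I-frames

def pack(flag: Flag, payload: bytes = b"") -> bytes:
    """Prepend the flag byte; only I_FRAME entries carry a payload."""
    return bytes([flag]) + payload
```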
The system is divided into two parts, first at the sender's end and the second at the receiver's end. These two parts work in tandem.
Sender's end
1) At the sender's end the camera captures the first frame, stores it in a reference buffer and transmits the frame.
2) A motion detection module at the sender's end tries to detect motion with respect to the reference frame stored in step (1).
3) If motion is detected, the I-frame in the reference buffer is overwritten with the current frame, which is also transmitted (as in Figure 1a).
4) If no motion is detected in step (2), the residual value from step (2) is processed by a local motion frequency function (LMFF).
5) The LMFF captures the presence of motion in a very small part of the picture, and the flag RI (Repeat I) is appended to the transmission buffer; RI refers to the I-frame in the reference buffer, as depicted in Figure 1b. NF implies regeneration of the frame from the earlier and next frames, as depicted in Figure 1c. This means that where there is very little motion, for example when a speaker is talking and the only movement is of the lips, there is very little motion from frame N to frame N+1 in terms of human visual perception.
6) Now, if a similar trend in terms of local motion is observed by the LMFF for two consecutive frames, the transmission buffer is appended with NF (New Frame). The payload in the transmission buffer is sent to the receiver's end after compression.
7) The system can also adapt to the lower bandwidth available by sending down-scaled frames. (A sketch of this sender-side loop is given below.)
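The sketch below ties sender steps (1) to (6) together, reusing detect_motion(), Flag and pack() from the earlier sketches; down-scaling for step (7) would be applied inside the compressor. The capture and transmit callables, the default compressor and both thresholds are assumed stand-ins for the camera, channel, codec and tuning of an actual system.

```python
# Sender-side loop covering steps (1)-(6); all names below are illustrative.
def sender_loop(capture, transmit, compress=lambda f: f.tobytes(),
                motion_thresh=12.0, lmff_thresh=2.0):
    reference = capture()                            # step 1: first frame
    transmit(pack(Flag.I_FRAME, compress(reference)))
    trend = 0                                        # consecutive small-motion frames
    while True:
        frame = capture()
        score = detect_motion(frame, reference)      # step 2: motion vs. reference
        if score > motion_thresh:                    # step 3: real scene change
            reference = frame                        # overwrite the reference buffer
            transmit(pack(Flag.I_FRAME, compress(frame)))
            trend = 0
        elif score > lmff_thresh:                    # step 4: residual goes to LMFF
            trend += 1
            # step 6: a similar local-motion trend on two consecutive frames -> NF
            transmit(pack(Flag.NF if trend >= 2 else Flag.RI))
        else:
            transmit(pack(Flag.RI))                  # step 5: negligible motion -> RI
            trend = 0
```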
Receiver's end
1) At the receiver's end the system scans the received buffer.
2) It decodes an I-frame as it is and displays it.
3) If it encounters an RI flag, it displays the previous I-frame.
4) If NF is encountered, the previous I-frame and the next I-frame are used to generate a new frame.
5) If any frame is lost, the system tries to generate the frame from the earlier and next frames; if the next frame is not available, the earlier I-frame is displayed.
6) If the frames were down-scaled at the sender's end, they are scaled up before they are displayed. (A matching receiver-side sketch is given below.)
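A matching receiver-side sketch for steps (1) to (6) follows. The receive, display, decode, regenerate and upscale callables are hypothetical; note that true NF regeneration needs the future I-frame, so a real implementation would buffer its output until that frame arrives.

```python
# Receiver-side loop covering steps (1)-(6); all callables are illustrative.
def receiver_loop(receive, display, decode, regenerate, upscale=None):
    last_i = None
    for flag, payload in receive():               # step 1: scan the received buffer
        if flag == Flag.I_FRAME:
            frame = last_i = decode(payload)      # step 2: decode the I-frame as-is
        elif flag == Flag.NF and last_i is not None:
            # step 4: regenerate from the previous and next I-frames; here the
            # previous frame stands in for both until the future frame arrives
            frame = regenerate(last_i, last_i)
        else:
            # steps 3 and 5: RI, or a lost frame with no future frame available,
            # falls back to redisplaying the earlier I-frame
            frame = last_i
        if frame is not None:
            display(upscale(frame) if upscale else frame)  # step 6: rescale if needed
```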
The receiver system keeps displaying the same frame, without the sender transmitting it again and again, whenever there is no motion or the motion is negligible to the human eye. The audio part is transmitted and played in its usual manner. Now consider a lip opening and closing sequence. When the lips are fully open (reference I-frame), the LMFF tracks the small motion relative to the next frame and marks RI in its place, because that motion is not visible to human eyes. If a similar trend continues when the next frame is compared, the LMFF marks it as NF. This means that the frame will be generated from the lip-open and lip-closed frames (the future I-frame).
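The lip example suggests a simple regeneration rule for NF frames: blend the surrounding I-frames. Linear interpolation is only one plausible choice, assumed here because the specification leaves the regeneration method open; such a function could serve as the regenerate callable in the receiver sketch above.

```python
# Illustrative NF regeneration: a linear blend of the previous and future I-frames.
import numpy as np

def regenerate_nf(prev_i: np.ndarray, next_i: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Interpolate between the lip-open and lip-closed I-frames at position t."""
    blended = (1.0 - t) * prev_i.astype(np.float32) + t * next_i.astype(np.float32)
    return blended.astype(prev_i.dtype)
```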
The effects and advantages of the present invention are:
1) Very low bandwidth requirement.
2) Can be used in case of low to moderate video resolution.
3) Use of local motion learning models.
4) Computationally inexpensive.
5) Runs in real time in the current DTV/PC/mobile video conferencing setup.
WE CLAIM
1. A method for video conferencing at ultra low bandwidth comprising the steps of:
capturing the first I-frame at the sender's end, storing the I-frame in a reference buffer and transmitting the frame to a receiver;
detecting the motion of a next frame at the sender's end with the help of a local motion frequency function (LMFF) with respect to the reference frame stored in said reference buffer to determine if there is a substantial change in the scene; and
when the motion detected is beyond a threshold value, giving an indication for transmitting the changed scene to the receiver.
2. The method as claimed in claim 1, wherein said method is carried out for medium- to low-resolution video conferencing.
3. The method as claimed in claim 1, wherein said LMFF is based on offline learning using machine learning techniques such as SVMs, neural networks, etc.
4. The method as claimed in claim 1, wherein a Repeat I-frame (RI) flag is sent in the transmission buffer when a very small amount of motion is detected by the LMFF and the receiver displays the I-frame again.
5. The method as claimed in claim 1, wherein a frame can be regenerated from an earlier and a next frame when there is little motion, such as movement of the lips of a speaker.
6. The method as claimed in claim 5, wherein the transmission buffer is appended with a NEW FRAME (NF) flag when a similar trend in terms of local motion is observed by the LMFF for two consecutive frames.
7. The method as claimed in claim 1, wherein, for adapting to the lower bandwidth available in the system, down-scaled frames can be sent from the sender's end, which are scaled up before being displayed at the receiver's end.
8. The method as claimed in claim 1, wherein the receiver can keep displaying the same frame, without the sender transmitting the frames again and again, when there is no motion or the motion is negligible to the human eye.
9. The method as claimed in claim 8, wherein the audio part can be transmitted and played in its usual manner while the same frame is being displayed.
10. A method for video conferencing at ultra low bandwidth, substantially as herein described and illustrated in the figures of the accompanying drawings.