Abstract: METHOD AND APPARATUS FOR CUSTOMIZING AUDIO CONTENT OF VIDEO FILE IN A MULTIMEDIA BASED SYSTEM. The present invention describes a method of customizing audio content of a video file in a multimedia system based on at least one user input provided to the multimedia system. The method comprises segregating one or more objects from one or more scenes in the video file into foreground objects and background objects by processing the video file, identifying one or more audio objects in one or more predefined virtually divided portions of a display panel, selected via the at least one user input provided to the multimedia system, and enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input provided to the multimedia system. The present invention also describes a multimedia system for customizing audio content of a video file based on at least one user input provided to the multimedia system. Figure 3
CLAIMS:
We Claim:
1. A method of customizing audio content of a video file in a multimedia system based on at least one user input provided to the multimedia system comprising:
segregating one or more objects from one or more scenes in the video file into foreground objects and background objects by processing the video file;
identifying one or more audio objects in one or more predefined virtually divided portions of a display panel, selected via the at least one user input provided to the multimedia system; and
enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input provided to the multimedia system.
2. The method as claimed in claim 1, further comprising:
identifying the corresponding audio object perceptually from the one or more scenes in the video file.
3. The method as claimed in claim 1, wherein the one or more predefined virtually divided portions are created based on number of speakers present in the multimedia system.
4. The method as claimed in claim 1, wherein identifying one or more audio objects in one or more predefined virtually divided portions of a display panel comprising:
determining direction of the at least one user input, where the direction of the user input is based on visual content in each scene of the video file;
categorizing the corresponding virtually divided portion into foreground image and background image; and
identifying the one or more background objects from corresponding background images and the one or more foreground objects from corresponding foreground images selected via the at least one user inputs provided to the multimedia system.
5. The method as claimed in claim 1, further comprising:
determining one or more customization of the one or more audio objects selected by the user based on the at least one user input, where the user input comprises one of contact and contactless interaction with the display panel of the multimedia system.
6. The method as claimed in claim 5, wherein the one or more customization of the one or more audio objects based on the at least one user input comprises at least one of:
effect addition, muting, gain increase and morphing.
7. The method as claimed in claim 1, wherein enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input comprises:
detecting one or more bit streams from which the one or more background audio objects (BGO) and one or more foreground audio objects (FGO) are retrieved;
identifying one or more mixture constituents of the audio objects for each audio channel and weight of each of the mixture constituents based on the scene of video file rendering information in one or more bit streams; and
generating mixing equations for the one or more audio channels by applying a change in weights of the one or more mixture constituents.
8. The method as claimed in claim 1, further comprising:
providing customization of the one or more audio objects until the scene in the video file is changed.
9. A multimedia system for customizing audio content of a video file based on at least one user input provided to a multimedia system comprises:
a display panel;
one or more speakers; and
an audio customisation module, wherein the audio customisation module is configured for customizing audio content of the video file in the multimedia system based on the at least one user input provided to the multimedia system.
10. The multimedia system as claimed in claim 9, wherein the audio customisation module comprises:
a user input identification module adapted for identifying the at least one user input to the multimedia system;
an audio object identification module adapted for identifying one or more audio objects in one or more predefined virtually divided portions of the display panel, selected via the at least one user input provided to the multimedia system; and
a customization enabling module adapted for enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on at least one user input provided to the multimedia system.
11. The multimedia system as claimed in claim 9, further comprising display panel dividing module for virtually dividing the display panel.
12. The multimedia system as claimed in claim 10, wherein the audio object identification module adapted for identifying one or more audio objects in one or more predefined virtually divided portions of the display panel, and configured to perform the steps comprising:
identifying one or more audio objects in one or more predefined virtually divided portions of the display panel comprising:
determining direction of the at least one user input, where the direction of the at least one user input is based on visual content in each scene of the video file;
categorizing the corresponding virtually divided portion into foreground image and background image; and
identifying one or more background objects from the corresponding background images, and one or more foreground objects from the corresponding foreground images selected via the at least one user input provided to the multimedia system.
13. The multimedia system as claimed in claim 10, wherein the customization enabling module adapted for enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input, and configured to perform the steps comprising:
detecting one or more bit streams from which the one or more background audio objects (BGO) and the one or more foreground audio objects (FGO) are retrieved;
identifying one or more mixture constituents of the audio objects for each audio channel and weight of each of the mixture constituents based on the scene of video file rendering information in the one or more bit streams; and
generating mixing equations for the one or more audio channels by applying a change in weights of the one or more mixture constituents.
Dated this the 09th day of January 2015
Signature
KEERTHI J S
Patent agent
Agent for the applicant
FORM 2
THE PATENTS ACT, 1970
[39 of 1970]
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(Section 10; Rule 13)
METHOD AND APPARATUS FOR CUSTOMIZING AUDIO CONTENT OF VIDEO FILE IN A MULTIMEDIA BASED SYSTEM
SAMSUNG R&D INSTITUTE INDIA – BANGALORE PRIVATE LIMITED
# 2870, ORION Building,Bagmane Constellation Business Park,
Outer Ring Road, Doddanakundi Circle,
Marathahalli Post, Bangalore-560 037,
an Indian company
The following Specification particularly describes the invention and the manner in which it is to be performed
FIELD OF THE INVENTION
The present invention relates to the field of Multimedia systems and more particularly relates to an apparatus and method for customizing audio content of a multimedia file.
BACKGROUND OF THE INVENTION
Multimedia is the field concerned with the computer-controlled integration of text, graphics, drawings, still and moving images (video), animation, audio, and any other media in which every type of information can be represented, stored, transmitted and processed digitally. Conventional multimedia content with audio and video components is made to be played back on any device that supports the multimedia files. Typically, multimedia content is intended for a generic audience, and no customization is allowed from the user's perspective.
When a video is played on a device, the user is presented with different sounds in tandem with the visual scene, and the final soundscape is ideally a linear mixture of different sound sources. From the user's perspective, the need to change or customize the soundscape arises from the combined audio and video impression that the user gets. However, current systems allow only a pre-defined change in the audio irrespective of the video scene, and the change affects the entire sound mixture with little selectivity.
The development of new standards in the multimedia field offers an improved user experience and advanced options to customize the audio experience. Spatial Audio Object Coding (SAOC) is a standard that gives more freedom and flexibility to the user. However, building on the SAOC standard, there exists a need for controlling audio effects in real time based on user needs and preferences to improve the user experience.
SUMMARY
An objective of the invention is to provide an apparatus and method for customizing audio content of a multimedia file.
One aspect of the present invention describes a method of customizing audio content of a video file in a multimedia system based on at least one user input provided to the multimedia system. The method comprises segregating one or more objects from one or more scenes in the video file into foreground objects and background objects by processing the video file, identifying one or more audio objects in one or more predefined virtually divided portions of a display panel, selected via the at least one user input provided to the multimedia system, and enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input provided to the multimedia system. The method further comprises identifying the corresponding audio object perceptually from the one or more scenes in the video file.
According to one embodiment of the present invention, the one or more predefined virtually divided portions are created based on the number of speakers present in the multimedia system. In one embodiment of the present invention, the identification of one or more audio objects in one or more predefined virtually divided portions of a display panel comprises determining the direction of the at least one user input, where the direction of the user input is based on visual content in each scene of the video file, categorizing the corresponding virtually divided portion into foreground image and background image, and identifying the one or more background objects from corresponding background images and the one or more foreground objects from corresponding foreground images selected via the at least one user input provided to the multimedia system.
The method of customizing audio content of a video file in a multimedia system further comprises determining one or more customization of the one or more audio objects selected by the user based on the at least one user input, where the user input comprises one of contact and contactless interaction with the display panel of the multimedia system.
In one aspect of the present invention, the method of enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input comprises detecting one or more bit streams from which the one or more background audio objects (BGO) and one or more foreground audio objects (FGO) are retrieved, identifying one or more mixture constituents of the audio objects for each audio channel and the weight of each of the mixture constituents based on the scene of video file rendering information in one or more bit streams, and generating mixing equations for the one or more audio channels by applying a change in weights of the one or more mixture constituents. Customization of the one or more audio objects is provided until the scene in the video file is changed.
Another aspect of the present invention describes a multimedia system for customizing audio content of a video file based on at least one user input provided to the multimedia system. The multimedia system according to one embodiment of the present invention comprises a display panel, one or more speakers and an audio customisation module. The audio customisation module is configured for customizing audio content of the video file in the multimedia system based on the at least one user input provided to the multimedia system.
The audio customisation module according to one embodiment of the present invention includes a user input identification module adapted for identifying the at least one user input to the multimedia system, an audio object identification module adapted for identifying one or more audio objects in one or more predefined virtually divided portions of the display panel, selected via the at least one user input provided to the multimedia system, and a customization enabling module adapted for enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on at least one user input provided to the multimedia system. In another aspect of the present invention, the multimedia system comprises a display panel dividing module for virtually dividing the display panel.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The aforementioned aspects and other features of the present invention will be explained in the following description, taken in conjunction with the accompanying drawings, wherein:
Figure 1 illustrates a schematic diagram of a multimedia system, according to an embodiment of the present invention.
Figure 2 illustrates a three dimensional graphical representation of soundscape with audio objects of a multimedia system, according to an embodiment of the present invention.
Figure 3 is a flow diagram illustrating a method of customizing audio content of a video file in a multimedia system based on at least one user input, according to one embodiment of present invention.
Figure 4 is a flow diagram illustrating a method of identifying one or more objects in one or more virtually divided portions of the display panel selected via the at least one user input, according to one embodiment of present invention.
Figure 5 is a flow diagram illustrating a method of enabling customization on the identified one or more audio objects corresponding to one or more audio channel based on the at least one user input, according to one embodiment of present invention.
Figure 6 is a block diagram illustrating a system for customizing audio content of a video file in a multimedia system based on at least one user input, according to one embodiment of present invention.
DETAILED DESCRIPTION OF THE INVENTION
The embodiments of the present invention will now be described in detail with reference to the accompanying drawings. However, the present invention is not limited to the embodiments. The present invention can be modified in various forms. Thus, the embodiments of the present invention are only provided to explain more clearly the present invention to the ordinarily skilled in the art of the present invention. In the accompanying drawings, like reference numerals are used to indicate like components.
The specification may refer to “an”, “one” or “some” embodiment(s) in several locations. This does not necessarily imply that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms “includes”, “comprises”, “including” and/or “comprising” when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Furthermore, “connected” or “coupled” as used herein may include operatively connected or coupled. As used herein, the term “and/or” includes any and all combinations and arrangements of one or more of the associated listed items.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Figure 1 illustrates a schematic diagram of a multimedia system 100, according to an embodiment of the present invention. In the context of the present invention, the multimedia system refers to a theatre system having multiple speakers and a woofer. Developments in multimedia systems bring new meaning to the audio and visual experience of the user in a theatre, especially in systems such as home theatre systems. The current scenario defines an advanced multimedia system providing an enhanced user experience, including customization of audio effects based on user preferences. According to one exemplary embodiment, the multimedia system 100 is a home theatre system 100. However, it is to be noted that the invention is not limited to the home theatre system. The present invention can be enabled in any multimedia system having a display panel and one or more speakers.
The figure represents an exemplary home theatre system 100 having a display panel 101 and two speakers (not shown in the figure). The two speakers are represented as two audio channels, namely a right audio channel 105 and a left audio channel 106. Every scene of the video file can be categorised into foreground images and background images. From each of the foreground images, the foreground objects (102 and 103) are identified. Likewise, from each of the background images, the background objects (104) are identified.
The display panel 101 of the multimedia system 100 may be any screen. For instance, in the case of a home theatre system, the display panel may be a TV screen. According to one embodiment of the present invention, the display panel is virtually divided into one or more pre-defined portions. This division is based on the number of audio channels or the number of speakers present in the multimedia system. In a multimedia system 100 such as a theatre system, the audio content corresponding to each object in a scene is sent to various speakers based on the location of the object. For instance, the sound or audio content of a background object is sent to the speakers located at the back of the theatre, whereas the sound of the foreground objects is sent to speakers located at different sides of the theatre based on the location of the foreground objects. This provides a better user experience. The virtual division of the display panel 101 enables the identification of the audio channel corresponding to each background and foreground object.
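As a minimal sketch of the virtual division described above, the following Python snippet splits the panel into equal vertical strips, one per speaker, and maps a selected x-coordinate to a strip. The equal-width vertical layout, the left-to-right ordering and the function names `divide_panel` and `portion_for_x` are assumptions made for illustration; the specification does not fix a particular geometry.

```python
def divide_panel(panel_width, num_speakers):
    """Return (start, end) pixel ranges for each virtual portion (hypothetical layout)."""
    if num_speakers < 1:
        raise ValueError("at least one speaker is required")
    step = panel_width / num_speakers
    return [(i * step, (i + 1) * step) for i in range(num_speakers)]


def portion_for_x(x, panel_width, num_speakers):
    """Map a selected x-coordinate to the index of the portion that contains it."""
    if not 0 <= x < panel_width:
        raise ValueError("selection lies outside the display panel")
    return int(x * num_speakers // panel_width)
```

With two speakers, a selection in the left half of the panel would map to portion 0 and a selection in the right half to portion 1, which could then be associated with the left and right audio channels.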
In one embodiment of the present invention, the customization of audio content is performed based on the selection of an object present in the scene of the video file. The object is selected using any of the user interaction techniques available in the multimedia system. The corresponding audio channel and speaker are determined based on the virtually divided portion of the display panel 101.
Some of the customizations include removal of the background score, amplification of the foreground actor or actors, addition of more bass or effects to one of the actors, addition of a different background score, and a change in the panning of the soundscape. Once the user decides on the change to be made, the user may simply select a region of the screen connected to the source. For example, as per the figure, if the background score has to be manipulated, the user can select any part of the screen that falls within the background object of the image. If an actor's voice has to be modified, the user can select the chosen actor, which is a foreground object. Depending on the chosen object and its direction, the consequent audio objects are manipulated in different channels, giving the final enhanced experience.
Figure 2 illustrates a three dimensional graphical representation of the soundscape with audio objects of a multimedia system 100, according to an embodiment of the present invention. The soundscape refers to the way in which audio objects are arranged and mixed in the audio stream. The multimedia system 100 according to the present invention may or may not contain different sources. If there are different audio sources, then each channel is a weighted mixture of the different sources. In other words, the different audio objects (each object belongs to a unique source) are panned and placed such that the different channels are formed as mixtures of these objects according to the scene rendering. The audio objects refer to any of the foreground and background objects that have audio content. The foreground objects and background objects are placed in a three dimensional graph based on their location in the display panel.
Figure 3 is a flow diagram illustrating a method of customizing audio content of a video file in a multimedia system 100 based on at least one user input, according to one embodiment of the present invention. According to the present embodiment, customization of the audio content of each audio object present in a scene of the video is enabled. At step 301, the one or more objects in each scene of the video file are segregated into foreground objects and background objects. A background subtraction method is used to perform this segregation. At step 302, the user input to the multimedia system is detected. The user input may be provided by either a contact or a contactless interaction with the display panel of the multimedia system. At step 303, it is determined whether the user input is valid. Once a valid user input is detected, at step 304 the audio customization module identifies one or more audio objects in one or more predefined virtually divided portions of the display panel, selected via the user inputs provided to the multimedia system. The method of identifying one or more objects in the one or more virtually divided portions of the display panel selected via the at least one user input is explained in detail with reference to Figure 4. This results in identifying the background images and foreground images, which can also be referred to as the background region and the foreground region. Further, at step 305, customization on the identified one or more audio objects corresponding to one or more audio channels is enabled based on the at least one user input provided to the multimedia system. The method of enabling customization on the identified one or more audio objects is explained in detail with reference to Figure 5.
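The background subtraction mentioned at step 301 can be illustrated with a simple per-pixel difference against a reference background frame. This is only a sketch under stated assumptions: frames are small grayscale grids given as lists of lists, the reference background is already known, and the threshold value and the name `foreground_mask` are hypothetical. The specification does not say which background subtraction variant is used.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark a pixel as foreground (True) when it differs enough from the background."""
    return [
        [abs(pixel - bg_pixel) > threshold for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Illustrative 2x3 grayscale frames (values are made up for the example).
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 210, 10]]
mask = foreground_mask(frame, background)
```

Pixels in the middle column differ strongly from the reference and would be treated as part of a foreground object, while the remaining pixels would be treated as background.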
The required customization includes, but is not limited to, effect addition, muting, gain increase and audio morphing, and is chosen based on the user input.
Further, at step 306, the customization of the one or more audio objects is provided until the scene in the video file is changed. The settings are carried forward while a check for a scene change is performed, and the original audio settings are restored when the scene changes.
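One way to picture the behaviour at step 306 is a small state holder that keeps the user's adjustments while the scene is unchanged and discards them on a scene change. The sketch below assumes that some scene-change detector supplies a scene identifier per frame; that detector, the class name `CustomizationState` and its methods are hypothetical and are not described in the specification.

```python
class CustomizationState:
    """Holds per-object weight adjustments that last until the scene changes."""

    def __init__(self):
        self.scene_id = None
        self.overrides = {}

    def apply(self, object_id, weight_scale):
        """Record a customization for one audio object in the current scene."""
        self.overrides[object_id] = weight_scale

    def on_frame(self, scene_id):
        """Drop all adjustments on a scene change and return the active ones."""
        if scene_id != self.scene_id:
            self.overrides = {}
            self.scene_id = scene_id
        return dict(self.overrides)


state = CustomizationState()
state.on_frame(1)
state.apply("FGO1", 0.0)           # e.g. mute one foreground audio object
same_scene = state.on_frame(1)     # adjustment is carried forward
next_scene = state.on_frame(2)     # original settings are restored
```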
Figure 4 is a flow diagram illustrating a method of identifying one or more objects in one or more virtually divided portions of the display panel selected via the at least one user input, according to one embodiment of the present invention. The method includes determining the direction of at least one user input at step 401. The direction of the user input is determined based on visual content in each scene of the video file. The user selection is evaluated and tagged as foreground or background along with the direction of the identified object. Then, at step 402, the corresponding virtually divided portion is categorized into foreground images and background images. Further, one or more background objects from corresponding background images and one or more foreground objects from corresponding foreground images are identified based on the selection via the user inputs provided to the multimedia system 100, as shown in step 403. The corresponding speaker is any of the speakers in the multimedia system that carries the effect of the audio object.
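The tagging of a user selection with a region and a direction can be sketched as follows. The example assumes a two-speaker system, a precomputed foreground mask (True for foreground pixels) and a direction derived purely from the horizontal position of the selection; these choices, along with the names `Selection` and `tag_selection`, are illustrative assumptions rather than details taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    region: str     # "foreground" or "background"
    direction: str  # "left" or "right" (two-speaker assumption)


def tag_selection(mask, row, col):
    """Tag a selected pixel with its region and its direction on the panel."""
    width = len(mask[0])
    direction = "left" if col < width / 2 else "right"
    region = "foreground" if mask[row][col] else "background"
    return Selection(region=region, direction=direction)


# Illustrative mask: one foreground object in the second column.
mask = [
    [False, True, False, False],
    [False, True, False, False],
]
```

Here `tag_selection(mask, 0, 1)` would describe a foreground selection on the left side, and `tag_selection(mask, 1, 3)` a background selection on the right side.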
Figure 5 is a flow diagram illustrating a method of enabling customization on the identified one or more audio objects corresponding to one or more audio channels based on the at least one user input, according to one embodiment of the present invention. At step 501, one or more bit streams are detected from which the one or more background audio objects (BGO) and one or more foreground audio objects (FGO) are retrieved. Then, the one or more mixture constituents of the audio objects for each audio channel are identified based on the scene rendering information of the video file in the one or more bit streams at step 501. A bit stream is a sequential stream of data collected by the receiver/playback device from a transmission channel. For instance, if a movie is being played on a TV or monitor, the bit stream is provided by either a storage medium or a cable. When this data starts to arrive, a few bits at the beginning, and intermittently thereafter, indicate whether the data that follows is valid. If the data is found to be valid, it is treated as a bit stream. In one embodiment, a bit stream parser in the playback module checks the validity of this data.
The bit stream parser also finds the mixture constituents from the scene rendering information. Here, the mixture generally constitutes a linear combination of weighted audio objects. The mixture constituents include each individual weighted audio object. Depending on the video selection and its direction, the correct weighted audio objects are identified as shown in step 502. At step 503, mixing equations are generated for the one or more audio channels by applying a change in weights of the one or more mixture constituents.
For instance, consider that the home theatre in the exemplary embodiment comprises two speakers. Then the mixing equations for right channel and the left channel are given below:
Left Channel = (FGO1 * Wt1) + (FGO2 * Wt2) + (BGO1 * Wt1) …(1)
Right Channel = (FGO1 * Wt3) + (FGO2 * Wt4) + (BGO1 * Wt5) …(2)
From the above equations, it is clear that each channel ideally contains both the foreground and the background audio objects with different weights. The weights of the different objects vary from channel to channel according to the panning of the objects. Once the user selects an object from the visuals of the video file, the manipulation of audio objects is carried out by altering the weights of each relevant FGO or BGO.
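The linear mixing in equations (1) and (2) can be written as a short Python sketch. The sample values and weights below are made-up numbers chosen only to show the computation, and the function name `mix_channel` is hypothetical; real weights would come from the scene rendering information in the bit stream.

```python
def mix_channel(samples, weights):
    """Return the weighted linear mixture of audio object samples for one channel."""
    return sum(samples[name] * weights[name] for name in weights)


# One sample per audio object at a given instant (illustrative values).
samples = {"FGO1": 0.5, "FGO2": -0.25, "BGO1": 0.1}

# Per-channel weights reflecting the panning of each object (illustrative values).
left_weights = {"FGO1": 0.8, "FGO2": 0.2, "BGO1": 0.5}
right_weights = {"FGO1": 0.2, "FGO2": 0.8, "BGO1": 0.5}

left = mix_channel(samples, left_weights)
right = mix_channel(samples, right_weights)
```

Keeping the weights in a separate mapping per channel means a customization only has to change an entry in these mappings before the mixture is recomputed.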
For example, in Figure 1, if the foreground object 1 (102) is to be selectively muted, the method comprises categorizing the selection as foreground or background (foreground in this case), choosing the dominant FGO from the objects gathered from the input bit stream based on the direction of selection, and setting the weighting factor for that FGO to zero for both channels.
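The muting example above amounts to zeroing one object's weight in every channel. A minimal sketch, assuming the per-channel weights are held in nested dictionaries (the structure, the weight values and the name `mute_object` are illustrative assumptions):

```python
def mute_object(channel_weights, object_name):
    """Return new per-channel weights with the selected object's weight set to zero."""
    return {
        channel: {
            name: (0.0 if name == object_name else weight)
            for name, weight in weights.items()
        }
        for channel, weights in channel_weights.items()
    }


weights = {
    "left": {"FGO1": 0.8, "FGO2": 0.2, "BGO1": 0.5},
    "right": {"FGO1": 0.2, "FGO2": 0.8, "BGO1": 0.5},
}
muted = mute_object(weights, "FGO1")
```

The function returns a new mapping and leaves the original weights untouched, which would make it straightforward to restore the original settings later, for instance when the scene changes.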
Figure 6 illustrates a block diagram of a multimedia system 600 for customizing audio content of a video file based on at least one user input, according to one embodiment of the present invention. According to one embodiment of the present invention, the multimedia system includes an audio customizing module 601, a display panel 101, one or more speakers 606 and a subwoofer 607. The subwoofers 607 are speakers used for reproduction of low-pitched audio frequencies known as bass frequencies. The bass frequency typically ranges from 20 Hz to 100 Hz and contains very little energy. So ideally the woofers are made of materials which can withstand a high amount of power, and of drivers to match. The display panel 101 may be a TV or any other device having a display.
The audio customizing module 601 comprises a user input identification module 602, an audio object identification module 603, a display panel dividing module 604 and a customization enabling module 605. The user input identification module 602 identifies the user input to the multimedia system 100. The input includes the selection of one of a foreground object and a background object for customization and the type of customization to be performed on the selected object. The user input comprises one of contact and contactless interaction with the display panel of the multimedia system 100. Effect addition, muting, gain increase and audio morphing are some of the typical customization inputs.
The audio object identification module 603 identifies the audio object selected by the user via the user input. The display panel dividing module 604 present in the multimedia system 100 divides the display panel 101 into one or more virtual portions in order to enable the identification of the selected object. The customization enabling module 605 enables the mixing of the mixture constituents of the audio object with different weights and generates a mixing equation to enable customization of the audio content of the selected audio object.
The present invention makes the whole audio experience associated with multimedia playback more interactive and customizable. The user is given the choice to make personal adjustments to the soundscape in real time based on the visual perception of scenes. The input video scenes are analyzed for background/foreground separation, and the consequent audio objects are manipulated based on it. The user selection is mapped to one of these regions. Further, based on the linear mixing equation for each channel, the weights corresponding to the selected audio object are manipulated.
Although the invention of the method and system has been described in connection with the embodiments of the present invention illustrated in the accompanying drawings, it is not limited thereto. It will be apparent to those skilled in the art that various substitutions, modifications and changes may be made thereto without departing from the scope and spirit of the invention.
| # | Name | Date |
|---|---|---|
| 1 | POA_Samsung R&D Institute India-new.pdf | 2015-03-12 |
| 2 | 2013_DMC_1229_Form 5_filed with IPO on 7 Jan 2015.pdf | 2015-03-12 |
| 3 | 2013_DMC_1229_Drawings_filed with IPO on 7 Jan 2015.pdf | 2015-03-12 |
| 4 | 2013_DMC_1229_complete Specification_filed with IPO on 7 Jan 2015.pdf | 2015-03-12 |
| 5 | abstract 159-CHE-2015.jpg | 2015-08-19 |
| 6 | 159-CHE-2015-FORM 13 [25-10-2019(online)].pdf | 2019-10-25 |
| 7 | 159-CHE-2015-FER.pdf | 2019-11-13 |
| 8 | 159-CHE-2015-FORM-26 [13-05-2020(online)].pdf | 2020-05-13 |
| 9 | 159-CHE-2015-FER_SER_REPLY [13-05-2020(online)].pdf | 2020-05-13 |
| 10 | 159-CHE-2015-US(14)-HearingNotice-(HearingDate-14-11-2023).pdf | 2023-10-26 |
| 11 | 159-CHE-2015-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [13-11-2023(online)].pdf | 2023-11-13 |
| 12 | 159-CHE-2015-US(14)-ExtendedHearingNotice-(HearingDate-13-12-2023).pdf | 2023-11-14 |
| 13 | 159-CHE-2015-FORM-26 [11-12-2023(online)].pdf | 2023-12-11 |
| 14 | 159-CHE-2015-Correspondence to notify the Controller [11-12-2023(online)].pdf | 2023-12-11 |
| 15 | 159-CHE-2015-PETITION UNDER RULE 138 [28-12-2023(online)].pdf | 2023-12-28 |
| 1 | SearchStrategyMatrix_07-11-2019.pdf | |