Abstract: [035] The present invention discloses a system for video figure retrieval based on deep learning and method thereof. The system includes, but is not limited to, one or more processing units coupled with a memory unit that reads and executes instructions from the memory, including instructions for: receiving one or more video files, wherein the video files comprise at least one video figure; extracting a plurality of visual descriptors from the video files; forming descriptor vectors from the plurality of visual descriptors using a deep learning data modelling technique; and performing, in response to the video figure being a key figure of a video sequence, a multi-stage object search of the video frame based on predetermined feature templates and a predetermined number of stages to determine a first object region and a second object region in the video figures, and testing the first and second object regions based on coloring information to determine that the first object region is a valid object retrieval region and the second object region is an invalid object region. Accompanying Drawing [FIG. 1]
Claims: 1. A system for video figure retrieval based on deep learning methods, comprising:
one or more processing units coupled with a memory unit that reads and executes instructions from the memory, including instructions for:
receiving one or more video files, wherein the video files comprise at least one video figure;
extracting a plurality of visual descriptors from the video files;
forming descriptor vectors from the plurality of visual descriptors using a deep learning data modelling technique; and
performing, in response to the video figure being a key figure of a video sequence, a multi-stage object search of the video frame based on predetermined feature templates and a predetermined number of stages to determine a first object region and a second object region in the video figures, and testing the first and second object regions based on coloring information to determine that the first object region is a valid object retrieval region and the second object region is an invalid object region.
2. The system as claimed in claim 1, wherein the processing unit is further configured to reduce a dimension of the descriptor vectors to generate projected descriptors and to create a plurality of cluster keys from the projected descriptors to determine the first object region and the second object region.
3. The system as claimed in claim 1, wherein the processing unit is further configured to update a matching database in the memory unit using the cluster keys generated from the video files, and wherein the coloring information further comprises a coloring probability map.
4. The system as claimed in claim 1, wherein the object retrieval region comprises, but is not limited to, a rectangular region, and the processing unit is further configured to determine a free-form shape object region corresponding to the object region, wherein the free-form shape object region has an accuracy of one or more pixels or of a small block of a plurality of pixels.
5. The system as claimed in claim 1, wherein the processing unit is further configured to generate a dynamic coloring probability histogram based on the coloring format of the video figure and an object region in the video figure, and to determine whether the dynamic color probability histogram is valid or invalid.
6. The system as claimed in claim 1, wherein the processing unit is further configured to determine a first motion wave and a second motion wave corresponding to the coloring region based on the video figure and one or more previous video figures, and to combine the first motion wave, the second motion wave, and a coloring probability map corresponding to the video figure to generate the dynamic color probability histogram.
7. The system as claimed in claim 1, wherein the processing unit further comprises a refining module configured to encode the video figure and the object region by reducing temporal inconsistency of the object region based on the object region and one or more previous regions in previous video figures.
8. The system as claimed in claim 7, wherein the refining module is further configured for replacing the object region based on decoded metadata, cropping and displaying image data corresponding only to the object region based on the decoded metadata, or indexing the decoded video figure based on the decoded metadata.
9. The system as claimed in claim 1, wherein the processing unit with the memory unit can further reside on a computation server communicatively connected to end-terminals through a computer network, and is provided with compatible interfaces and a set of video file processing operation synchronization devices connected to the internet.
Description: [001] The present invention relates to the field of image and video recognition, retrieval and processing using deep learning data modelling. The invention more particularly relates to a system for video figure retrieval based on deep learning and method thereof.
BACKGROUND OF THE INVENTION
[002] In recent years, the availability of camera-equipped phones has increased to millions of users, and with it the number of videos and images being snapped, shared and stored by users on such devices. Further, more and more electronic devices used by users, such as, but not limited to, smartphones, are now able to play back video and display images. Other gadgets such as tablet computers, virtual reality eyeglasses, computers, laptops, and the like have a means by which media, e.g., video or images, can be received and retrieved by the electronic device, and a display unit on which the received media is displayed to a user.
[003] Further, video interfaces for streaming media and IPTV have been developed in the last decade, and activities such as watching online series have become an important form of entertainment. Cisco's Visual Networking Index (VNI) predicts that by 2022, IP video traffic will account for 82% of Internet IP traffic. In this context, users have a strong demand for more diversified, more convenient and better video services. The question then arises: how to search for a character in a video, find a segment containing an object of interest such as a particular face in a movie, determine whether a certain character appears in a video, or search a movie library for videos containing a specific character has become a problem that needs to be solved.
[004] Furthermore, deep learning data modelling in the context of video files has played a vital role as a popular artificial intelligence and machine learning approach, and has achieved notable milestones in the image processing domain. Therefore, the present invention is mainly based upon multi-level deep learning for segmentation of a plurality of video files.
[005] In addition, event segmentation of a video file is itself the primary task in video labeling and object retrieval, and the categorising method for multiclass channels is improved for video segmentation. The limitation of handcrafted feature extraction can be eliminated based on deep learning, thereby improving the video retrieval accuracy rate.
[006] Considering the above drawbacks, there accordingly remains a need in the prior art for a technical convergence to make video figure retrieval systems, interfaces and methods better and more compact. It is in this context that the present invention provides a system for video figure retrieval based on deep learning and method thereof. Therefore, it would be useful and desirable to have a system and interface that meets the above-mentioned needs.
SUMMARY OF THE PRESENT INVENTION
[007] In view of the foregoing disadvantages inherent in the known types of conventional video file data retrieval systems, methods and devices present in the prior art, the present invention provides a system for video figure retrieval based on deep learning and method thereof. The system is designed with, but not limited to, at least two sets of equipment: the first set is the physical placement of the portable or wired hardware unit, optionally within the internet architecture, which is communicatively coupled with the second set, implemented with the help of a processing unit provided for video file data processing and based on machine learning and Artificial Intelligence trained embedded software and algorithms, which has all the advantages of the prior art and none of the disadvantages.
[008] The main aspect of the present invention is to provide a video figure retrieval system comprising a processing unit provided for the preparation, expansion and pre-treatment of training video files. Herein, object detection in at least one video figure is trained using two different deep network models and deep learning data modelling, which requires substantial amounts of facial and/or coloring data for training. Accordingly, it is desirable to address different data types, carry out data acquisition and expansion in different modes, strengthen the robustness of the trained model, and improve detection results.
[009] Another aspect of the present invention is to provide a system in which, because training the deep network and deep learning data modelling of the video files requires substantial amounts of training data, the video figures involved in the video are used for training and for learning new actions for retrieval. Further, data augmentation of the training data in different modes, together with big data analytics, is provided, whereby training and retrieval results are improved.
[010] The proposed system and method may be implemented on, but is not limited to, Field Programmable Gate Arrays (FPGAs) and the like, PCs, microcontrollers and other known processors, with computer algorithms and instruction upgradation for supporting the many application domains where a solution to the aforesaid problems is required.
[011] In this respect, before explaining at least one object of the invention in detail, it is to be understood that the invention is not limited in its application to the details of set of rules and to the arrangements of the various models set forth in the following description or illustrated in the drawings. The invention is capable of other objects and of being practiced and carried out in various ways, according to the need of that industry. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
[012] These together with other objects of the invention, along with the various features of novelty which characterize the invention, are pointed out with particularity in the disclosure. For a better understanding of the invention, its operating advantages and the specific objects attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated preferred embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[013] The invention will be better understood and objects other than those set forth above will become apparent when consideration is given to the following detailed description thereof. Such description makes reference to the annexed drawings wherein:
[014] FIG. 1 illustrates a schematic diagram of a system for video figure retrieval based on deep learning and method thereof, in accordance with an embodiment of the present invention; and
[015] FIG. 2 illustrates a block diagram of the system for video figure retrieval based on deep learning and method thereof, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[016] While the present invention is described herein by way of example using embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described, which are not intended to represent the scale of the various components. Further, some components that may form a part of the invention may not be illustrated in certain figures, for ease of illustration, and such omissions do not limit the embodiments outlined in any way. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed; on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this description, the word "may" is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Further, the words "a" or "an" mean "at least one" and the word "plurality" means "one or more" unless otherwise mentioned. Furthermore, the terminology and phraseology used herein is solely used for descriptive purposes and should not be construed as limiting in scope. Language such as "including," "comprising," "having," "containing," or "involving," and variations thereof, is intended to be broad and encompass the subject matter listed thereafter, equivalents, and additional subject matter not recited, and is not intended to exclude other additives, components, integers or steps. Likewise, the term "comprising" is considered synonymous with the terms "including" or "containing" for applicable legal purposes. Any discussion of documents, acts, materials, devices, articles and the like is included in the specification solely for the purpose of providing a context for the present invention.
It is not suggested or represented that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention.
[017] In this disclosure, whenever a composition or an element or a group of elements is preceded with the transitional phrase "comprising", it is understood that we also contemplate the same composition, element or group of elements with the transitional phrases "consisting of", "consisting", "selected from the group consisting of", "including", or "is" preceding the recitation of the composition, element or group of elements, and vice versa.
[018] The present invention is described hereinafter by various embodiments with reference to the accompanying drawings, wherein reference numerals used in the accompanying drawing correspond to the like elements throughout the description. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiment set forth herein. Rather, the embodiment is provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. In the following detailed description, numeric values and ranges are provided for various aspects of the implementations described. These values and ranges are to be treated as examples only and are not intended to limit the scope of the claims. In addition, a number of materials are identified as suitable for various facets of the implementations. These materials are to be treated as exemplary and are not intended to limit the scope of the invention.
[019] Referring now to the drawings, as illustrated in FIGS. 1-2, the present invention discloses a system for video figure retrieval based on deep learning and method thereof. The system comprises, but is not limited to, one or more processing units coupled with a memory unit that reads and executes instructions from the memory, including instructions for: receiving one or more video files, wherein the video files comprise at least one video figure; extracting a plurality of visual descriptors from the video files; forming descriptor vectors from the plurality of visual descriptors using a deep learning data modelling technique; and performing, in response to the video figure being a key figure of a video sequence, a multi-stage object search of the video frame based on predetermined feature templates and a predetermined number of stages to determine a first object region and a second object region in the video figures, and testing the first and second object regions based on coloring information to determine that the first object region is a valid object retrieval region and the second object region is an invalid object region.
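By way of illustration only, the processing flow of this embodiment may be sketched in the following toy form. The mean-color "visual descriptor", the halving coarse-to-fine search, and the fixed intensity threshold are hypothetical stand-ins for the trained feature templates and the coloring test of the specification, not the claimed implementation.

```python
# Illustrative sketch: a frame is a grid of (R, G, B) tuples in [0, 1].

def extract_descriptor(frame):
    """A toy 'visual descriptor': the mean color of the frame."""
    pixels = [p for row in frame for p in row]
    n = len(pixels)
    return tuple(sum(c[i] for c in pixels) / n for i in range(3))

def multi_stage_search(frame, templates, stages=2):
    """Propose one candidate region per stage (coarse-to-fine sketch).
    A real system would match the predetermined feature templates here;
    this stand-in simply halves the region size at each stage."""
    h, w = len(frame), len(frame[0])
    regions = []
    for s in range(stages):
        size = max(1, min(h, w) >> (s + 1))
        regions.append((0, 0, size, size))  # (top, left, height, width)
    return regions

def is_valid_region(frame, region, min_mean=0.5):
    """Coloring test: a region is valid when its mean intensity
    exceeds a (hypothetical) threshold."""
    top, left, rh, rw = region
    vals = [sum(frame[r][c]) / 3
            for r in range(top, top + rh) for c in range(left, left + rw)]
    return sum(vals) / len(vals) >= min_mean
```

On a bright frame the proposed region passes the color test (a valid object retrieval region), while on a dark frame it fails (an invalid object region).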
[020] In accordance with another embodiment of the present invention, the processing unit is further configured to reduce a dimension of the descriptor vectors to generate projected descriptors and to create a plurality of cluster keys from the projected descriptors to determine the first object region and the second object region.
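The dimension reduction and cluster-key creation may, purely as a sketch, be realized with a random projection followed by nearest-centroid assignment. The Gaussian projection matrix and the fixed centroids are illustrative assumptions; in practice a learned projection (e.g. PCA or an encoder network) and clustering learned from the training data would be used.

```python
import random

def project(vectors, out_dim, seed=0):
    """Reduce descriptor-vector dimension with a random projection
    (a stand-in for a learned projection)."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    matrix = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [[sum(m * v for m, v in zip(row, vec)) for row in matrix]
            for vec in vectors]

def cluster_keys(projected, centroids):
    """Assign each projected descriptor the index of its nearest centroid;
    that index serves as the descriptor's 'cluster key'."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in projected]
```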
[021] In accordance with another embodiment of the present invention, the processing unit is further configured to update a matching database in the memory unit using the cluster keys generated from the video files, and the coloring information further comprises a coloring probability map.
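The matching database of this embodiment can be pictured as an inverted index from cluster key to the video files that produced it; the following minimal in-memory sketch assumes a plain dictionary, whereas the specification leaves the storage format open.

```python
def update_matching_database(db, video_id, keys):
    """Record that `video_id` produced each cluster key in `keys`.
    `db` maps cluster key -> set of video identifiers."""
    for k in keys:
        db.setdefault(k, set()).add(video_id)
    return db

def lookup(db, key):
    """Return the (sorted) video files matching a cluster key."""
    return sorted(db.get(key, ()))
```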
[022] In accordance with another embodiment of the present invention, the object retrieval region comprises, but is not limited to, a rectangular region, and the processing unit is further configured to determine a free-form shape object region corresponding to the object region, wherein the free-form shape object region has an accuracy of one or more pixels or of a small block of a plurality of pixels.
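Refining a rectangular region into a free-form region at block accuracy can be sketched as follows: the rectangle is tiled into small pixel blocks and each block is kept or discarded by a per-block test. The `keep` callback is a hypothetical stand-in; a real system would use the trained model (e.g. a segmentation network) at this step.

```python
def freeform_mask(frame, region, keep, block=2):
    """Refine a rectangular region (top, left, height, width) into a
    free-form boolean mask at `block`-pixel granularity."""
    top, left, rh, rw = region
    mask = [[False] * len(frame[0]) for _ in frame]
    for by in range(top, top + rh, block):
        for bx in range(left, left + rw, block):
            pixels = [frame[y][x]
                      for y in range(by, min(by + block, top + rh))
                      for x in range(bx, min(bx + block, left + rw))]
            if keep(pixels):
                for y in range(by, min(by + block, top + rh)):
                    for x in range(bx, min(bx + block, left + rw)):
                        mask[y][x] = True
    return mask
```

With `block=1` the same routine yields pixel accuracy, matching the one-pixel-or-small-block accuracy described above.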
[023] In accordance with another embodiment of the present invention, the processing unit is further configured to generate a dynamic coloring probability histogram based on the coloring format of the video figure and an object region in the video figure, and to determine whether the dynamic color probability histogram is valid or invalid.
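A coloring probability histogram and a validity test for it may be sketched as below; the bin count and the "sufficiently peaked" validity criterion are illustrative assumptions, since the specification does not fix them.

```python
def color_histogram(pixels, bins=4):
    """Quantize intensities in [0, 1] into `bins` buckets and
    normalize the counts into probabilities."""
    hist = [0] * bins
    for p in pixels:
        hist[min(int(p * bins), bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_is_valid(hist, min_peak=0.3):
    """Hypothetical validity test: the histogram is 'valid' when some bin
    dominates, i.e. the region has a dominant color."""
    return max(hist) >= min_peak
```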
[024] In accordance with another embodiment of the present invention, the processing unit is further configured to determine a first motion wave and a second motion wave corresponding to the coloring region based on the video figure and one or more previous video figures, and to combine the first motion wave, the second motion wave, and a coloring probability map corresponding to the video figure to generate the dynamic color probability histogram.
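The combination of motion waves with a coloring probability map can be sketched with per-pixel frame differencing and a weighted sum; both the differencing stand-in for a "motion wave" and the weights are assumptions made for illustration only.

```python
def motion_wave(curr, prev):
    """Per-pixel absolute intensity difference between two frames
    (a simple stand-in for a 'motion wave')."""
    return [[abs(c - p) for c, p in zip(cr, pr)] for cr, pr in zip(curr, prev)]

def dynamic_probability(wave1, wave2, color_map, w=(0.25, 0.25, 0.5)):
    """Combine two motion waves and a color probability map into one
    per-pixel probability map; the weights are illustrative."""
    return [[w[0] * a + w[1] * b + w[2] * c
             for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(wave1, wave2, color_map)]
```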
[025] In accordance with another embodiment of the present invention, the processing unit further comprises a refining module configured to encode the video figure and the object region by reducing temporal inconsistency of the object region based on the object region and one or more previous regions in previous video figures.
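One simple way to reduce temporal inconsistency of the object region across frames is exponential smoothing of its coordinates against the previous frame's region; this is a minimal sketch, and the blend factor is an illustrative choice rather than anything prescribed by the specification.

```python
def smooth_region(current, previous, alpha=0.5):
    """Blend the current region (top, left, height, width) with the
    previous frame's region to damp frame-to-frame jitter."""
    return tuple(round(alpha * c + (1 - alpha) * p)
                 for c, p in zip(current, previous))
```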
[026] In accordance with another embodiment of the present invention, the refining module is further configured for replacing the object region based on decoded metadata, cropping and displaying image data corresponding only to the object region based on the decoded metadata, or indexing the decoded video figure based on the decoded metadata.
[027] Further, various exemplary computer systems may be used for implementing embodiments consistent with the present disclosure. Variations of the computer system may be used for implementing the system for video figure retrieval based on deep learning and method thereof. The computer system may comprise a central processing unit ("CPU" or "processor"). The processor may comprise at least one data processor for executing program components for executing user- or system-generated requests. A user may include a person, a person using a device such as those included in this disclosure, or such a device itself. The processor may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. The processor may include a microprocessor, such as AMD Athlon, Duron or Opteron, ARM's application, embedded or secure processors, IBM PowerPC, Intel's Core, Itanium, Xeon, Celeron or other lines of processors, etc. The processor may be implemented using mainframe, distributed processor, multi-core, parallel, grid, or other architectures. Some embodiments may utilize embedded technologies like application-specific integrated circuits (ASICs), digital signal processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc.
[028] Processor may be disposed in communication with one or more input/output (I/O) devices via I/O interfaces. The I/O interfaces may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.n /b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[029] In some embodiments, the processor may be disposed in communication with one or more memory devices (e.g., RAM, ROM, etc.) via a storage interface. The storage interface may connect to memory devices including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc. The memory devices may store a collection of program or database components, including, without limitation, an operating system, user interface application, web browser, mail server, mail client, user/application data (e.g., any data variables or data records discussed in this disclosure), etc. The operating system may facilitate resource management and operation of the computer system. Examples of operating systems include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like.
[030] The words "module," "model," "algorithm" and the like, as used herein, refer to logic embodied in hardware or firmware, or to a collection of software instructions written in a programming language such as, for example, Java, C, Python or assembly. One or more software instructions in the modules may be embedded in firmware, such as an EPROM. It will be appreciated that modules may comprise connected logic units, such as gates and flip-flops, and may comprise programmable units, such as programmable gate arrays or processors. The modules described herein may be implemented as software and/or hardware modules and may be stored in any type of computer-readable medium or other computer storage device. Further, in various embodiments, the processor is one of, but not limited to, a general-purpose processor, an application-specific integrated circuit (ASIC) and a field-programmable gate array (FPGA) processor. Furthermore, the data repository may be a cloud-based storage or a hard disk drive (HDD), solid-state drive (SSD), flash drive, ROM or any other data storage means.
[031] The above-mentioned system has various novel aspects such as, but not limited to, the processing unit with the machine learning interface providing a better video figure retrieval system, which will be understood by reading and studying the aforesaid embodiments. Further, the described system performs deep learning data modelling without requiring access to, or modification of, the source code of a commercial or non-commercial operating system.
[032] It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-discussed embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description.
[033] The benefits and advantages which may be provided by the present invention have been described above with regard to specific embodiments. These benefits and advantages, and any elements or limitations that may cause them to occur or to become more pronounced are not to be construed as critical, required, or essential features of any or all of the embodiments.
[034] While the present invention has been described with reference to particular embodiments, it should be understood that the embodiments are illustrative and that the scope of the invention is not limited to these embodiments. Many variations, modifications, additions and improvements to the embodiments described above are possible. It is contemplated that these variations, modifications, additions and improvements fall within the scope of the invention.
| # | Name | Date |
|---|---|---|
| 1 | 202141055780-STATEMENT OF UNDERTAKING (FORM 3) [01-12-2021(online)].pdf | 2021-12-01 |
| 2 | 202141055780-REQUEST FOR EARLY PUBLICATION(FORM-9) [01-12-2021(online)].pdf | 2021-12-01 |
| 3 | 202141055780-FORM-9 [01-12-2021(online)].pdf | 2021-12-01 |
| 4 | 202141055780-FORM 1 [01-12-2021(online)].pdf | 2021-12-01 |
| 5 | 202141055780-DRAWINGS [01-12-2021(online)].pdf | 2021-12-01 |
| 6 | 202141055780-DECLARATION OF INVENTORSHIP (FORM 5) [01-12-2021(online)].pdf | 2021-12-01 |
| 7 | 202141055780-COMPLETE SPECIFICATION [01-12-2021(online)].pdf | 2021-12-01 |