
Dynamic Web Browser Based Video Processing Pipeline

Abstract: Disclosed is a system and method for dynamic web browser based video processing on an edge device (110). On receiving the video from a video source (105), regions of interest (ROI) are identified in the video frames of the video based on a sense running in the web browser (240), wherein the sense defines the ROI to be identified and the logic to be performed for processing the video. Each of the one or more video frames is then cropped to produce cropped video frames comprising the ROI, and the cropped video frames and associated metadata are stored in a message queue (230). Each of the cropped video frames is sequentially processed, based on the sense running in the web browser (240), using image processing deep neural networks, and the processed video frames and the associated metadata are communicated to one of a cloud server (120) or the web browser for displaying, or both. [To be published with FIGURE 2]


Patent Information

Application #
201931029205
Filing Date
19 July 2019
Publication Number
04/2021
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
shivani@lexorbis.com
Parent Application

Applicants

ARYABHATTA ROBOTICS PRIVATE LIMITED
PLOT NO M-67, MADUSUDHAN NAGAR, UNIT-4, BHUBANESWAR, ODISHA-

Inventors

1. KUMAR, Anoop
House No 2, Village -Daniyari, Post- Bedupar, Dist- Kushinagar, Uttar Pradesh 274409
2. MISHRA, Alok
Plot No 162, Mangalabag, Cuttack, Odisha-753001

Specification

FIELD OF THE INVENTION

[001] The present disclosure generally relates to video processing, and more particularly to a dynamic web browser based video processing pipeline on edge devices.

BACKGROUND OF THE INVENTION

[002] Generally, IoT devices such as cameras, CCTVs, etc. are widely used in homes, businesses, industrial applications, vehicles, security, optimization, etc., and are networked to serve various purposes such as surveillance, monitoring, etc. Typically, such devices are networked and connected to a gateway, such as a computer, which is communicatively connected with one or more remote servers. The computer, which operates as an edge device in such networks, enables communication with the servers for transferring video, and provides user interfaces for various purposes such as surveillance, monitoring, etc.
[003] Typically such edge devices have limited computational power, and hence videos are directly communicated to the servers for further processing, which requires more bandwidth. Even though computer vision and artificial intelligence algorithms are mature enough for real business use, one of the major deterrents to mass adoption is that the cost of processing videos from a webcam/CCTV or any other video source is too high on the edge, so processing often has to be executed on the cloud/servers.
[004] In view of the problems associated with conventional systems and devices, there exists a need for a method for processing the video on an edge device having limited computational power.

BRIEF SUMMARY OF THE INVENTION

[005] This summary is provided to introduce a selection of concepts in a simple manner that is further described in the detailed description of the disclosure. This summary is not intended to identify key or essential inventive concepts of the subject matter nor is it intended for determining the scope of the disclosure.

[006] The present disclosure discloses a dynamic web browser based video processing system and method for processing the video on an edge device. The method comprises: receiving the video from a video source; identifying a region of interest in one or more video frames of the video based on an application called a ‘sense’ running in the web browser, wherein the sense defines the region of interest to be identified and the logic/operations to be performed for processing the video; cropping each of the one or more video frames for producing one or more cropped video frames comprising the region of interest; storing the one or more cropped video frames and associated metadata in a memory and generating a message queue for processing the one or more cropped video frames; sequentially processing each of the one or more cropped video frames, based on the sense running in the web browser, using image processing deep neural networks; and communicating the processed video frames and the associated metadata to one of a cloud server or the web browser for displaying, or both.

[007] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will be rendered by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting of its scope. The disclosure will be described and explained with additional specificity and detail with the accompanying figures.

BRIEF DESCRIPTION OF THE FIGURES

[008] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:

[009] Figure 1 illustrates a network environment for video processing in accordance with an embodiment of the present disclosure;

[0010] Figure 2 is a block diagram of the system 200 of the edge device 110 configured for processing the video received from the video source 105 in accordance with an embodiment of the present disclosure;

[0011] Figure 3 illustrates a sense application store in accordance with an embodiment of the present disclosure; and

[0012] Figure 4 is a flowchart illustrating a dynamic web browser based video processing method for processing the video on an edge device 110 in accordance with an embodiment of the present disclosure.

[0013] Further, persons skilled in the art to which this disclosure belongs will appreciate that elements in the figures are illustrated for simplicity and may not have been necessarily drawn to scale. Furthermore, one or more components of the system may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.

DETAILED DESCRIPTION

[0014] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications to the disclosure, and such further applications of the principles of the disclosure as described herein being contemplated as would normally occur to one skilled in the art to which the disclosure relates are deemed to be a part of this disclosure.

[0015] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.

[0016] In the present disclosure, relational terms such as first and second, and the like, may be used to distinguish one entity from the other, without necessarily implying any actual relationship or order between such entities.

[0017] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or a method. Similarly, one or more elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other elements, other structures, other components, additional devices, additional elements, additional structures, or additional components. Appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The components, methods, and examples provided herein are illustrative only and not intended to be limiting.

[0019] Embodiments of the present disclosure will be described below in detail with reference to the accompanying figures.

[0020] Embodiments of the present disclosure disclose a dynamic web browser based video processing pipeline for low cost video processing. In other words, a method for low cost video processing on an edge device is disclosed, wherein the edge device may be one of a computer, a laptop, or a tablet computer, typically having low computational power.

[0021] Figure 1 illustrates a network environment for video processing in accordance with an embodiment of the present disclosure. As shown, the network environment 100 comprises a plurality of video sources 105-1, 105-2 and 105-3 (only three video sources are shown; a single video source 105 is considered hereafter for illustration) and an edge device 110, wherein the edge device 110 and the video sources 105 are communicatively connected through a communication network 115. The environment 100 further comprises a server 120 which is communicatively connected with the edge device 110 through the communication network 115.

[0022] The video sources 105 may include, but are not limited to, CCTV cameras, cameras, web cameras, video databases and the like. The video sources 105 are configured for capturing video in real-time, or the video source 105 may be a memory storing the videos. In the present disclosure, a CCTV camera is considered as an exemplary video source from which the video is collected and processed on the edge device 110. It is to be noted that the one or more video sources 105 may be connected to the edge device 110 through the network 115 as shown, may be directly connected through cables, or may be connected wirelessly using known wireless communication technologies.

[0023] The communication network 115 may be a wireless network or a wired network or a combination thereof. The wireless network may include long range wireless radio, wireless personal area network (WPAN), wireless local area network (WLAN), mobile data communications such as 3G, 4G or any other similar technologies. The communication network 115 may be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The communication network 115 may either be a dedicated network or a shared network. Further, the communication network 115 may include a variety of network devices, including routers, bridges, servers, modems, computing devices, storage devices, and the like. In one implementation, the video sources 105 and the edge device 110 are connected through the LAN.

[0024] The server 120 may be, for example, a mainframe computer, a computer server, a network of computers, or a cloud server. In one implementation, the server 120 is a cloud server comprising one or more processors, associated processing modules, interfaces, and storage devices communicatively interconnected to one another through one or more communication means for communicating information. The storage associated with the server 120 may include volatile and non-volatile memory devices for storing information and instructions to be executed by the one or more processors, and for storing temporary variables or other intermediate information during processing. In one embodiment of the present disclosure, the server 120 is communicatively connected to the edge device 110 as shown and is configured for receiving processed video for further processing, storage, display, etc. In one embodiment of the present disclosure, a sense marketplace can be hosted on the server 120 or may be hosted on a dedicated cloud server.

[0025] The edge device 110 may be any computing device having communication capabilities. Example edge devices 110 include, but are not limited to, computers, notebook computers, PDAs, laptops, smartphones, etc. That is, the edge device 110 as described herein may be a computing device having limited (low) processing capabilities. In one example, the edge device 110 may be a general purpose computer (point of sale terminal) installed in a store.

[0026] Referring to Figure 1, the video captured by the video source 105 (CCTV camera) is processed on the edge device 110. In one embodiment of the present disclosure, a web browser based video processing pipeline is used for processing the video received from the video source 105. That is, the web browser uses WebAssembly, WebGL and C++ add-ons and creates a virtual machine at runtime for processing the video. In one embodiment of the present disclosure, the web browser running on the edge device 110 identifies a region of interest in one or more video frames of the video based on a sense running in the web browser, wherein the sense defines the region of interest to be identified and the logic/operations to be performed for processing the video; crops each of the one or more video frames for producing one or more cropped video frames comprising the region of interest; stores the one or more cropped video frames and associated metadata in a memory and generates a message queue for processing the one or more cropped video frames; sequentially processes each of the one or more cropped video frames, based on the sense running in the web browser, using a convolutional neural network; and communicates the processed video frames and the associated metadata to one of a cloud server or the web browser for displaying, or both. The manner in which the video is processed on the edge device 110 is described in detail further below.
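By way of illustration only, the pipeline loop of this paragraph may be sketched in TypeScript roughly as follows. The Sense and FrameMeta shapes and all identifiers are assumptions made for this sketch, not an API taken from the disclosure; a real implementation would back the frame-grabbing and inference stages with WebAssembly/WebGL rather than the main thread.

```typescript
// Minimal sketch, assuming illustrative types; not the disclosure's API.

interface FrameMeta {
  sourceId: string;    // video source identifier, e.g. a camera ID
  frameNumber: number;
  capturedAt: number;  // epoch milliseconds
}

interface Sense {
  roiLabels: string[]; // labels of the regions of interest to identify
  // Sense-specific logic/operations applied to each frame.
  process(frame: ImageBitmap, meta: FrameMeta): Promise<Record<string, number>>;
}

async function runPipeline(video: HTMLVideoElement, sense: Sense): Promise<void> {
  let frameNumber = 0;
  while (!video.ended) {
    // createImageBitmap is a standard browser API; it stands in here
    // for the pipeline's frame-grabbing stage.
    const frame = await createImageBitmap(video);
    const meta: FrameMeta = {
      sourceId: "cam-01", // hypothetical camera ID
      frameNumber: frameNumber++,
      capturedAt: Date.now(),
    };
    const result = await sense.process(frame, meta);
    frame.close();
    console.log(result); // e.g. forwarded to the cloud or rendered in-page
    // Pace the loop to the display refresh rate.
    await new Promise<number>((r) => requestAnimationFrame(r));
  }
}
```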

[0027] Figure 2 is a block diagram of the system 200 of the edge device 110 configured for processing the video received from the video source 105 in accordance with an embodiment of the present disclosure. As shown, the system 200 comprises a network/communication interface module 205, a processor 210, a memory module 215, and a video processing unit 220. The video processing unit 220 further comprises a video pre-processor 225, a message queue 230, a neural network processing unit 235 and a web browser 240.

[0028] The processor 210 may be a general purpose processor of the edge device 110 which performs basic operations of the edge device 110 and video processing. The memory 215 may include volatile and non-volatile memory devices for storing information and instructions to be executed by the one or more processors, and for storing temporary variables or other intermediate information during processing. Further, it is to be noted that the one or more modules of the system may be implemented on the processor 210, and instructions to be executed by the processor 210 are stored in the memory 215.

[0029] The network interface module 205 enables communication between the edge device 110 and the video sources 105, and between the edge device 110 and the cloud server 120. The edge device 110 receives the video stream via the network interface module 205. In one embodiment of the present disclosure, the video pre-processor 225 processes the video received from the video source 105 based on the sense running in the web browser 240, wherein the pre-processing step includes identification of the region of interest in one or more video frames of the video based on the sense. In one embodiment of the present disclosure, the sense defines the region of interest to be identified and the logic/operations to be performed for processing the video. The sense may be selected and downloaded from a sense application store. The sense application store comprises a plurality of senses, each sense defining a specific ROI to be identified and specific logic/operations to be performed for processing the video. Figure 3 illustrates a sense application store in accordance with an embodiment of the present disclosure. As shown, a plurality of senses are defined for a plurality of applications. For example, an end user may subscribe to the traffic sense 305 for counting the number of people entering a given area (for example, a store). Similarly, the attendance sense 310 may be used for managing attendance in an office. Based on the requirement of an end user, the end user may subscribe to a sense and the sense is implemented on the edge device 110 associated with the user. For example, an end user who wishes to count the total number of people entering his shop and the number of people wearing a contact glass may install a sense that defines the ROI as human. Similarly, based on the requirement, senses may be downloaded and installed on the edge device 110.
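Purely for illustration, a downloaded sense of this kind might be represented in the browser as a small definition object pairing an ROI label with its processing logic. The SenseDefinition and Detection shapes below are assumptions of this sketch; the disclosure specifies only that a sense defines the ROI and the logic/operations.

```typescript
// Illustrative sketch, assuming hypothetical shapes for a sense.

interface Detection {
  label: string;                           // e.g. "person"
  score: number;                           // detector confidence
  box: [number, number, number, number];   // x, y, width, height
}

interface SenseDefinition {
  name: string;
  roiLabel: string;                        // ROI the sense identifies
  // Logic/operations the sense performs on each frame's detections.
  logic: (detections: Detection[]) => Record<string, number>;
}

// A traffic-style sense that counts the people detected in a frame.
const trafficSense: SenseDefinition = {
  name: "traffic",
  roiLabel: "person",
  logic: (detections) => ({
    peopleInFrame: detections.filter((d) => d.label === "person").length,
  }),
};
```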

[0030] As described, the video pre-processor 225 identifies the ROI in each of the one or more incoming video frames and crops each of the one or more video frames for producing one or more cropped video frames comprising the region of interest. Considering the above example, the video pre-processor 225 identifies humans (as objects) entering the shop in the one or more video frames and crops each of the one or more video frames having humans. Then the video pre-processor 225 stores the one or more cropped video frames and associated metadata in a memory and generates a message queue 230 for processing the one or more cropped video frames. In the present example, the video pre-processor 225 stores the cropped video frames (having humans) in the message queue 230 along with the metadata. The metadata as described herein may include, but is not limited to, a video source identifier (camera ID, for example), frame number, location, time, and temporal and spatial information, etc. In one implementation, the temporal and spatial information is cached in the local data store (memory) to make the output on the web browser 240 instantaneous. In one embodiment of the present disclosure, the one or more cropped video frames are stored in the RAM of the edge device 110 in compressed base64 encoded format. In another embodiment of the present disclosure, the video pre-processor 225 uses a neural network based object/ROI detection processing pipeline in a cloud based configuration, and then stores the one or more frames in the RAM as described. It is to be noted that the video input class of the video pre-processor 225 is configured for handling more than 20,000 varieties of video sources including video files of all supported formats. In some implementations, the various processes involved in ROI detection may include, but are not limited to, subject or object detection, image detection, face recognition, object tracking, etc.
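As a minimal sketch of this cropping and queueing step, assuming the frame arrives as an ImageBitmap and a detection box comes from an upstream detector, the ROI could be cropped with an OffscreenCanvas and stored as a compressed base64 data URL alongside its metadata. All names here are illustrative, not taken from the disclosure.

```typescript
// Minimal sketch, assuming illustrative names; crops the ROI, encodes
// it as compressed base64 (JPEG data URL), and enqueues it with metadata.

interface FrameMetadata {
  sourceId: string;    // video source identifier (e.g. camera ID)
  frameNumber: number;
  location?: string;
  capturedAt: number;  // time of capture
}

interface QueueEntry {
  dataUrl: string;     // compressed, base64-encoded cropped frame
  meta: FrameMetadata;
}

const messageQueue: QueueEntry[] = []; // stands in for the RAM-backed queue

async function cropAndEnqueue(
  frame: ImageBitmap,
  box: { x: number; y: number; w: number; h: number },
  meta: FrameMetadata,
): Promise<void> {
  const canvas = new OffscreenCanvas(box.w, box.h);
  const ctx = canvas.getContext("2d")!;
  // Draw only the region of interest onto the canvas.
  ctx.drawImage(frame, box.x, box.y, box.w, box.h, 0, 0, box.w, box.h);
  // JPEG encoding compresses the crop; FileReader yields a base64 data URL.
  const blob = await canvas.convertToBlob({ type: "image/jpeg", quality: 0.8 });
  const dataUrl = await new Promise<string>((resolve) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.readAsDataURL(blob);
  });
  messageQueue.push({ dataUrl, meta });
}
```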
[0031] The one or more cropped video frames (the temporal output) are stored in the message queue 230 in a compressed encoded format until the temporal output is finally processed through the neural network processing unit 235 or through additional cloud resources to pass on the output to the web browser 240.
[0032] In one embodiment of the present disclosure, the neural network processing unit 235 is configured for sequentially processing each of the one or more cropped video frames based on the sense running in the web browser 240, using image processing deep neural networks, such as a convolutional neural network (CNN), a recurrent neural network (RNN), an artificial neural network (ANN) and the like. As described, the temporal output, the one or more cropped video frames, is stored in the message queue 230. In one embodiment of the present disclosure, the neural network processing unit 235 receives a cropped video frame from the message queue, processes the cropped video frame, requests a next cropped video frame from the message queue 230, and repeats the steps for processing each of the cropped video frames stored in the message queue 230. Referring back to the example of counting the total number of people entering the shop and the number of people wearing the contact glass, the neural network processing unit 235 fetches each cropped video frame (having a human) from the message queue 230 and processes the frames to count the number of people wearing a contact glass. After processing, the video output and the result of processing, such as the total number and the number of people wearing the contact glass, are displayed on the web browser 240 running the sense. In one embodiment of the present disclosure, the output of the neural network processing unit 235 is fed to the cloud server 120 for further processing and storage.
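A minimal sketch of this strictly sequential consumer, assuming the queue entries of the earlier sketch, is shown below. The classify() function is a placeholder of this sketch for the deep-neural-network step, not an API named in the disclosure.

```typescript
// Sequential consumer sketch: fetch one crop, finish inference, then
// request the next. classify() is a hypothetical stand-in for the CNN.

interface QueueEntry {
  dataUrl: string;                                 // base64-encoded crop
  meta: { sourceId: string; frameNumber: number };
}

// Placeholder inference; a real sense would run a deep neural network
// here, e.g. through a WebAssembly/WebGL-backed runtime.
async function classify(entry: QueueEntry): Promise<Record<string, number>> {
  return { person: 1 }; // stub result for illustration
}

async function drainQueue(
  queue: QueueEntry[],
  onResult: (meta: QueueEntry["meta"], result: Record<string, number>) => void,
): Promise<void> {
  while (queue.length > 0) {
    const entry = queue.shift()!;          // receive a cropped frame
    const result = await classify(entry);  // process it to completion
    onResult(entry.meta, result);          // publish, then repeat
  }
}
```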
[0033] Figure 4 is a flowchart illustrating a dynamic web browser based video processing method for processing the video on an edge device 110 in accordance with an embodiment of the present disclosure. At step 405, the edge device 110 receives video from a video source 105 through a communication network, wherein the video source may include, but is not limited to, CCTV cameras, cameras, web cameras, video databases and the like.
[0034] At step 410, the video pre-processing unit 225 (implemented on the web browser) identifies the region of interest in one or more video frames of the video based on a sense running in the web browser, wherein the sense defines the region of interest to be identified and the logic/operations to be performed for processing the video. As described in the present disclosure, a plurality of senses are defined for different applications and the end-users may subscribe to the desired sense based on their requirements.
[0035] At step 415, the video pre-processing unit 225 crops each of the one or more video frames for producing one or more cropped video frames comprising the region of interest. At step 420, the one or more cropped video frames and associated metadata are stored in a memory as a message queue for processing the one or more cropped video frames. As described, the metadata associated with each video frame may include, but is not limited to, a video source identifier, frame number, location, time, and temporal and spatial information.
[0036] At step 425, each of the one or more cropped video frames is sequentially processed based on the sense running in the web browser, using image processing deep neural networks, for example, CNN, RNN, etc. At step 430, the processed video frames and the associated metadata are communicated to one of the cloud server 120 for further processing and storage or the web browser 240 for displaying, or both. The output may be presented to the end users on the web browser 240 by means of an output video, graphical representation, chart, or any known format.
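As an illustrative sketch of step 430, assuming a JSON payload and a hypothetical cloud endpoint (neither is specified in the disclosure), the processed frame and metadata might be published as follows.

```typescript
// Sketch of step 430, assuming a hypothetical endpoint and payload shape.

interface ProcessedFrame {
  dataUrl: string;                     // processed frame, base64-encoded
  meta: { sourceId: string; frameNumber: number };
  result: Record<string, number>;      // e.g. { totalPeople: 42 }
}

async function publish(frame: ProcessedFrame): Promise<void> {
  // Option 1: forward to the cloud server for further processing/storage.
  await fetch("https://cloud.example.com/frames", { // hypothetical URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(frame),
  });
  // Option 2: display the result in the web browser itself.
  const el = document.getElementById("result");
  if (el) el.textContent = JSON.stringify(frame.result);
}
```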
[0037] As described, the browser based video processing on the edge device may handle input from more than 20,000 varieties of video sources including video files of all supported formats. This module runs on web technologies.

[0038] The video pre-processor and the message queue are local executables which run in the background and communicate with the web browser, which performs Region of Interest (ROI) cropping and the neural network based object detection processing pipeline.
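For illustration, the link between the background executables and the web browser could be a local WebSocket, as in the following sketch; the port and message shape are assumptions, since the disclosure does not specify the transport.

```typescript
// Sketch of the local-executable-to-browser link, assuming a WebSocket
// transport on a hypothetical local port.

const socket = new WebSocket("ws://localhost:8765"); // hypothetical port

socket.addEventListener("message", (event: MessageEvent) => {
  // Each message is assumed to carry one cropped frame plus metadata
  // produced by the background pre-processor/message queue.
  const entry = JSON.parse(event.data as string);
  console.log("crop received from local executable", entry.meta);
});
```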

[0039] As described, the video frames are processed locally on the edge device, which typically has low processing capabilities. However, after pre-processing by the video pre-processor, the cropped video frames and the associated metadata may be communicated to the cloud server 120 for processing and analysis.

[0040] The method and system disclosed in the present disclosure eliminate the need for frequent updates to the edge vision pipeline, which often involve heavy downloads. Further, the method and system enable in situ video processing through the edge device, typically having lower computational capabilities. That is, the method and system for video processing disclosed in the present disclosure enable video processing on already available hardware (an edge device having limited computational power) at the location of the video.

[0041] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.

[0042] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
CLAIMS:
1. A dynamic web browser based video processing method for processing the video on an edge device, the method comprising:
receiving the video from a video source;
identifying region of interest in one or more video frames of the video based on a sense running in the web browser, wherein the sense defines the region of interest to be identified and logic/operations to be performed for processing the video;
cropping each of the one or more video frames for producing one or more cropped video frames comprising the region of interest;
storing the one or more cropped video frames and associated metadata in a memory and generating a message queue for processing the one or more cropped video frames;
sequentially processing each of the one or more cropped video frames, based on the sense running in the web browser, using image processing deep neural networks; and
communicating the processed video frames and the associated metadata to one of a cloud server or to the web browser for displaying or both.

2. The method as claimed in claim 1, wherein the video source is one of a database storing the videos, a CCTV camera, a camera, and a web camera.

3. The method as claimed in claim 1, wherein the sense defining the region of interest to be identified and the logic/operations to be performed for processing the video is selected and implemented in the web browser from a plurality of senses, each sense defining specific ROI and specific logic/operations to be performed for processing the video.

4. The method as claimed in claim 1, wherein sequential processing of each of the one or more cropped video frames comprises:
receiving a cropped video frame from the message queue;
processing the cropped video frame and requesting a next cropped video frame from the message queue; and
repeating the steps for processing each of the cropped video frames.
5. The method as claimed in claim 1, wherein the metadata associated with each video frame comprises, video source identifier, frame number, location, time and temporal and spatial information.

6. A system (200) for processing a video on an edge device (110), the system (200) comprising:
a communication interface (205) for receiving the video from a video source (105); and
a processor (210) in communication with a memory (215), wherein the processor (210) is configured for:
identifying region of interest in one or more video frames of the video based on a sense running in the web browser (240), wherein the sense defines the region of interest to be identified and logic/operations to be performed for processing the video;
cropping each of the one or more video frames for producing one or more cropped video frames comprising the region of interest;
storing the one or more cropped video frames and associated metadata in a memory and generating a message queue (230) for processing the one or more cropped video frames;
sequentially processing each of the one or more cropped video frames, based on the sense running in the web browser, using image processing deep neural networks; and
communicating the processed video frames and the associated metadata to one of a cloud server or to the web browser for displaying or both.

Documents

Application Documents

# Name Date
1 201931029205-STATEMENT OF UNDERTAKING (FORM 3) [19-07-2019(online)].pdf 2019-07-19
2 201931029205-PROVISIONAL SPECIFICATION [19-07-2019(online)].pdf 2019-07-19
3 201931029205-FORM FOR STARTUP [19-07-2019(online)].pdf 2019-07-19
4 201931029205-FORM FOR SMALL ENTITY(FORM-28) [19-07-2019(online)].pdf 2019-07-19
5 201931029205-FORM 1 [19-07-2019(online)].pdf 2019-07-19
6 201931029205-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [19-07-2019(online)].pdf 2019-07-19
7 201931029205-EVIDENCE FOR REGISTRATION UNDER SSI [19-07-2019(online)].pdf 2019-07-19
8 201931029205-DRAWINGS [19-07-2019(online)].pdf 2019-07-19
9 201931029205-DECLARATION OF INVENTORSHIP (FORM 5) [19-07-2019(online)].pdf 2019-07-19
10 201931029205-FORM-26 [12-09-2019(online)].pdf 2019-09-12
11 201931029205-Proof of Right (MANDATORY) [12-09-2019(online)].pdf 2019-09-12
12 201931029205-COMPLETE SPECIFICATION [20-07-2020(online)].pdf 2020-07-20
13 201931029205-CORRESPONDENCE-OTHERS [20-07-2020(online)].pdf 2020-07-20
14 201931029205-DRAWING [20-07-2020(online)].pdf 2020-07-20
15 201931029205-Request Letter-Correspondence [03-08-2020(online)].pdf 2020-08-03
16 201931029205-FORM 3 [26-11-2020(online)].pdf 2020-11-26
17 201931029205-FORM 3 [20-05-2021(online)].pdf 2021-05-20
18 201931029205-FORM 18 [16-06-2023(online)].pdf 2023-06-16
19 201931029205-FER.pdf 2024-05-17
20 201931029205-FORM 3 [16-08-2024(online)].pdf 2024-08-16

Search Strategy

1 201931029205E_06-02-2024.pdf