Abstract: Title: Method and System of Health and Fatigue supervised Watching of Value Infused Videos. A method and system to interactively supervise general health and fatigue of viewers watching any video fused with un-interrupting value added contents of national, social, non-commercial and commercial genre, the system comprising a companion device (200) having a microphone (207), an imaging sensor, a data processor, an image signal processor (209), a streaming engine (208), and a plurality of cognitive algorithms for a loudness engine, a gesture engine, an overlay engine, media streaming, keyword detection and streaming, and a saliency and pose estimation unit, driving a deep learning based computer program residing in a processor in the companion device (200); the system continuously monitoring the fatigue parameters of the viewer while playing value infused videos; the system comprising a sub-system for producing the value infused videos based on contextual relevance with respect to primary contents and explicit preference of the viewer, and including a companion input device (210). The system facilitates emotions and interest based watching. Figure 1.
Description:
Form 2
The Patents Act, 1970
(39 of 1970)
&
The Patents Rules, 2003
Complete Specification
(See section 10 and rule 13)
Title of the Invention:
Method and System of Health and Fatigue supervised Watching of Value Infused Videos
Applicant : ADMOTT LLP
Nationality: Indian
Address: SAHU House
3rd Floor, Flat No. 301
B-128, Sector 20, Belapur,
Navi Mumbai, Thane -400614
Maharashtra,
INDIA
The following specification particularly describes the invention and the manner in which it is to be performed:
FIELD OF INVENTION
The present invention relates to videos fused with valuable content, a method of producing such videos, and the healthy watching of such videos.
BACKGROUND OF INVENTION
Currently, watching videos with advertisements is a matter of subjective irritation and objective benefits. Here is how -
The prevalent method of showing advertisements on OTT or any video streaming platform is before the content, in the middle of the content, or at the end of the content, as pre-roll, mid-roll or post-roll ads. Advertisements are either 5 sec skippable or 10 sec non-skippable. There are longer advertisement chains too. These advertisement formats are not desired by users for several reasons. One of them is the unwelcome break in the continuity of what they are watching. They are a kind of unhealthy mental speed breakers without zebra markings! Viewers invariably wish to skip the ads and return to the content they have selected. Also, ad insertion by pausing the content is intrusive to the user experience. Users have limited or no brand recall amidst the plethora of advertisements shown to them. This results in low viewability of advertisers' ads and low return on their ad spend and, conclusively, low yield for the OTT service provider's inventory.
Importantly, pertinent social and public interest non-commercial inserts also get the same unwelcome treatment and get missed, thereby depriving viewers of their purpose and benefit. Example: newly launched government schemes for public and particularly children's benefit fail to reach the masses with the expected reach and effect.
On the other hand, longer breaks let viewers blink more, move their necks or sip a drink, which retards fatigue and is beneficial to individual health.
There is a clear need to integrate the fusion of advertisements and non-commercial information, and the personal health and fatigue of the viewer, with the viewing habits of persons so as to evolve a newer way of healthy, beneficial and profitable watching.
OBJECTIVE OF THE INVENTION
It is the objective to invent a method and system for healthy watching of videos.
It is another objective to invent a system that proactively and dynamically monitors a first level well-being of a viewer.
It is another objective to invent a system that integrates social and public interest contents with seamless and leisure watching.
It is another objective to invent a system that integrates product and services need fulfillment with leisure watching.
It is yet another objective to invent a system that integrates product and services of contextual relevance with leisure watching.
It is yet another objective to invent a system that integrates viewers' opinion and convergence with leisure watching.
It is yet another objective to invent a system that integrates product and services of geographical contextual relevance with leisure watching.
It is yet another objective to invent a system that dynamically switches and integrates different product and services in an identified slot with leisure watching.
SUMMARY OF INVENTION
The present invention is a method and system to interactively supervise general health and fatigue of viewers watching any video, which is intelligently fused with value added contents of social and non-commercial interest, besides commercial advertisements. The system becomes a kind of companion of the viewer(s) without compromising privacy and peace of the viewer(s).
The inventive method can be easily understood as a set of rules which a responsible companion would adhere to in order to provide a healthy entertainment service, like watching TV.
The method as per present invention grossly comprises:
1. Ascertaining or receiving metadata of viewer(s) including age, gender and interests of a viewer. The method also applies to a plurality of viewers with similar physical and mental parameters.
2. Ascertaining general health and fatigue parameters of the viewer(s) including body temperature, facial freshness and happiness, eye blinking frequency and energy level.
3. Generating a greeting or cautionary audio or video message based on an initial health and fatigue assessment.
4. Playing of an age and interest appropriate, value infused video by or for the viewer(s).
5. Continuously monitoring the fatigue parameters and delivering cautionary message, after permission.
6. Understanding sign language and audio feedback communication of the viewer(s) and acting accordingly.
7. Raising advisory/suitable alarm in case of a situation when viewer (s) need help including falling asleep unsafely.
The value infused video is inventively developed by fusing-in social, non-commercial and commercial advertisement frames in apposition to video contents, as an important disclosure and claim of the present invention, subsequently described here below.
The system as per the present invention cognitively executes the above method by deploying a companion device driven by a deep learning based computer program. The companion device is so termed because it functions as a substitute for an attendant with normal human acumen. A human companion is generally expected to keep a cautious and casual watch and intervene only when sought or for safety. The companion device replicates certain cognitive functions limited to such boundary conditions.
The companion device has a defined sector of view, and viewers present in the defined sector are beneficiaries of the present system.
An imaging sensor like a camera of the companion device detects one or more humans in the defined sector of view and correspondingly predicts their respective age and gender, after obtaining due consent.
The companion device predicts and estimates viewer’s:
• generic health and fatigue
• pose and gaze
• attentiveness localized to a scene presented on a connected viewing screen including a television
• attention duration at a localized scene of interest
• absence versus unconsciousness/fall or seizure
• emotions
• high noise exposure
• blinking
The companion device estimates, through its image sensor and a health algorithm, an age, gender, body temperature and a plurality of life parameters of the viewer. The age is grossly categorized into one of the prescribed segments (minor, adult, aged, old), with any error on the safer side with respect to video viewing. Illustratively, if the age estimate is between 15 years and 25 years, which is the borderline between minor and adult, then the assumption would safely be "minor". There is a provision to obtain a confirmation from the viewer.
The companion device predicts, from the attentiveness location and duration, that the viewer is interested in the brand of a car in the localized scene of interest. The companion device, with its microphone data streamed up to a threshold to its processor, runs a cough classifier and anomaly detection algorithm, applies heuristics and displays a health condition. The health condition ranges from tiring of eyes, to cough and cold, to a person falling.
An image sensor, including an infra-red image sensor, disposed on the companion device traces movement of a viewer, differentiates between a walk out and a fall, and correspondingly raises an alarm. Microphone data processed in conjunction with image sensor data yields an intensity of fall, as an empirical equation
Intensity of fall = f (age of person, sound of fall, view of fall), deployed as a narrow AI algorithm.
If a viewer watches the value fused video while exercising on a gym bicycle or treadmill embedded with sensors picking rider’s body parameters like pulse and heart rate, then the companion device of present system is connectible to such sensors by radio or wireless connection or by wired connections or both.
The companion device of the system, driven by a deep learning based computer program, estimates an absolute value as well as an adverse change of health and fatigue parameters during a viewing time, not limited to the ones listed above.
The health and fatigue detection is currently limited only by sensors/data and not by the inventive health algorithm. The term "heuristics" implies computing by applying cognitive algorithms of narrow AI, to indicate a degree of simplification applied in the health algorithm, commensurate with the health data remotely obtainable currently, and to be able to quickly reach a reasonable health conclusion. The companion device does NOT replace expert medical advice or medical assistance which a person may need in the course of watching the value infused videos.
The viewer(s) communicate via established sign languages, by making gestures of their own, or by audio commands. The communication is also implicit in the form of facial expressions including grins. The communication is voluntary as well as involuntary when the viewer(s) is in distress, consequent to a fall or falling unconscious. The communication is also in the form of signs of pain or confusion. The companion device receives such feedback from the viewer(s) through image sensors, and a processor in the companion device facilitates heuristics or feedback algorithm processing for decision making, including cognitive peace-time decision making as well as cognitive panic-time decision making.
The decision making accuracy of the deep learning heuristics improves with constant learning and training data updating of a frequent viewer. Interactive analytics widens multiple applications, such as improving feedback on a social, non-commercial or commercial information or advertisement served, and heuristics for determining content engagement of videos.
A sub-system to produce a value infused video is now described. The essence of "value" stems from the backdrop that any video comprises a main content in the form of a theme or a story presented in a particular manner and packaged with thrill, actions, suspense and other creative arts, based on which a viewer selects such a video to watch; however, the user necessarily gets unwanted and unscheduled interruptions in the form of social, non-commercial and commercial messages and advertisements, termed here as secondary contents. The "value" of these secondary contents cannot be underestimated/ignored, though secondary contents are NOT the basis for a viewer to invest his/her time and money and select a particular video! The present invention, therefore, re-orchestrates such secondary contents in a manner that they do NOT take away additional time! Importantly, the secondary contents are NOT consciously skippable when amalgamated with the main content. On the other hand, Figure 10, the companion device (200), by streaming camera data of a threshold number of frames on pose, walk and blink of the viewer, runs a heuristics to generate a saliency map of the viewer giving the region where the viewer focuses more! The saliency map is based on the pixels of importance to a human visual system. In simple words, the present invention predicts the interest of a viewer, besides health and fatigue.
As per the present invention, the sub-system finds opportunities to insert secondary contents, here a soft drink bottle, within primary contents, here a table activity amongst three humans in a kitchen ambience, in a manner that the primary contents are not tampered with nor compromised and the secondary contents get their due attention in a smooth and more natural manner than current ways. Particularly, the present invention effectively uses secondary contents in the form of inexpensive GIF and JPEG image formats and converts them into 2D/3D images to make the secondary contents, particularly objects and products, look real and rich, and seamlessly integrate with the look and feel of the actual streaming content.
The system essentially comprises the following functions and actions, processed by a deep learning based algorithm driven by a computer implemented program around a video under consideration:
• Input data/video streaming
• Object detection by YOLO with OpenCV
• Blank zone/Flat surface
• Fused acknowledgement
Contextual relevancy is inseparably associated with corresponding contents. Thus, while the primary objects may largely be humans (male/female), associated objects/contents, including but not limited to material contents like food and dresses, and non-material contents like celebrative, romantic or sad moods, set a context.
Frames with space are identified, followed by identifying objects therein. A negative list of unacceptable objects and activities is filtered out, while human activity recognition is carried out for acceptable objects and contents, like romance, sports, applying cosmetics, cooking, laughing and celebrating. Parsing is carried out between the human activity recognition output and secondary contents, including advertisement category class and sub-category including brands. The secondary contents are fused in the raw video to produce the value infused video.
Identified frames and modified frames are stored and shared with the original main content owner and the OTT service provider. Through an object detection technique, here YOLO5, a training dataset is generated using a deep learning model, involving steps of identification and labelling. Objects detected are assigned a rank based on training data.
Space detection has a large number of prescribed weighted parameters as per the advertiser and/or OTT owner, illustratively:
• Size of zone - 20%
• Distance of placement zone from other objects in the frame (at least 25% to 50%) - 10%
• Clarity of surface area - 10%
• Viewability minimum 50% - 20%
• Clarity of zone - 10%
• White spaces - 10%
• Clutter free - 10%
• Ease of fusing creative - 10%
In the current situation, wherein the social, non-commercial and commercial advertisements are played with random breaks, there is no relation nor contextual similarity whatsoever! So, one may be watching a sad scene in a video and an advertisement of a celebration may appear from nowhere; though, of late, OTT platforms use watch history to derive interest and recommend videos to watch! In the present invention, contextual relevance is an important infusing parameter achieved through artificial intelligence. Along with objects and space, a contextual algorithm predicts activity in the video. Hence, an expensive watch is an apt fusion of secondary content in a frame of a main content having rich ladies in western attire.
About 4,00,000 situational frames of data, resulting in 400 actions, comprise the training data of context in the present invention, which grows, consequent to the deep learning algorithm, when encountering newer situations. Apropos, the activity identified being YOGA, a social message is aptly fused in the interest of the health of viewers. The activity identified being the sport of archery, a non-commercial message of promoting archery by way of accomplished national players is a pertinent amalgamation.
Context relevance is generated by way of a cognitive algorithm trained to analyze images in the frames/views, expressions and actions therein, and speech associated with a plurality of frames. Illustratively, AffectNet® is a known large facial expression dataset with close to a million images manually labeled for the presence of eight facial expressions (neutral, happy, angry, sad, fear, surprise, disgust, contempt) along with the intensity of valence and arousal, which the cognitive algorithm as per the present invention compares with images of viewers captured by image sensors disposed in the companion device.
Contextual relevance includes the social status and preference of the humans identified, along with their dress sense, dress and accessories worn, and surrounding objects. Figure 11, a soft drink product is an apt fusion on a breakfast table with usable space.
Contextual relevance includes a negative category of crime, harassment, drugging, torture and self-harm et cetera and the cognitive algorithm is trained to handle such spaces as particularly prescribed. Hence the present invention is sensitive NOT to show a gun or dagger in order NOT to promote violence thoughtlessly.
The secondary contents are alloyed with the primary contents for a controlled time commensurate with the context. Thus a romantic scene shall have a longer fusion than an action scene.
Modification of secondary contents to make them suitable for amalgamation/fusion is a work of Art + Engineering. Secondary contents are made fusible by resizing, red-eye reduction, shadowing, rotating, opacity adjustment, adding blurring shadows and reflections, and altering illumination and grain size to perfectly match the look and feel of the main content frames. This can be achieved by matching pixel by pixel of each frame. A new frame with the modified secondary content replaces the original frame in the identified placement zone such that the fusion does NOT leave any stitching marks.
Generative AI technology will help reduce the time and costs to generate virtually real secondary content infused in the video content. Certain laborious processes of image generation are automated by providing data sets that help the model learn and generate the relevant output. This substantially reduces the time and costs of manually processing the secondary content for fusion into the video content.
User gestures related to watching the value fused videos are captured by the companion device for continuing or changing secondary contents as well as primary contents. Illustratively, a waving gesture is interpreted as a sign of closing a content. Thumbs up can be interpreted as approval of the secondary content including commercial secondary contents like a particular brand. Victory sign is likewise interpreted as approval to interact with the brands in the virtual world and gamification. The companion device is configurable to subsequently continue interaction with a secondary content provider as an Augmented reality (AR) virtual interaction with the commercial brands.
The companion device is a custom-built hardware comprising a contemporary processor supported by a microphone, a streaming device and an image signal processor as a base service. The platform service is provided by a plurality of algorithms comprising a loudness engine, a gesture engine, an overlay engine, media streaming, keyword detection and streaming, and a saliency and pose estimation unit. An application layer comprises computer programs for at least a context aware value delivery, user feedback, seizure and fall detection, sign language, interaction unit and eye blink.
A companion input device having at least a plurality of companion buttons and a distress button communicates with the companion device, as an option.
As a preferred embodiment, an AWS Cloud architecture is provided on which the present invention is integrated, to produce a value infused video, involving intense video content, data and management between:
• Video owner / OTT service provider
• Secondary content owner
• ADMOTT
keeping abreast of viewers' preferences data/information, with due copyright management precautions and data/content security. ADMOTT is a brand of the infused value video development agency, responsible for a high quality and ethical artificial intelligence based deep machine learning tool of such development, with inventive contextual relevancy and interactive viewing based on gaze, expressions and emotions, backed by the companion device providing a generic health and fatigue monitoring. ADMOTT duly protects its output of the value infused video, which is susceptible to manipulation, by contemporary data and content protection tools, including but not limited to a cryptographic signature with its cryptographic private key.
The present invention therefore facilitates a safe and companioned/supported watching of value enhanced videos, coupled with an enhancement of the viewer's awareness of social and national initiatives, much more effectively and in a more receptive and non-intrusive manner, without forcing any secondary contents as a necessary evil, as in current practices which unavoidably lead to putting in extra time.
The present invention gives new monetizable inventory to the OTT service provider at scale. Secondary contents, particularly commercial advertisements, are placed intelligently using deep learning, in non-intrusive formats within the content. Advertisements fused within the main content are modified to give the advertisers' products a look and feel that appears real.
The present invention is expected to increase viewability by orders of magnitude. This will allow OTT owners to charge a premium to their secondary content providers and thereby earn a higher revenue-per-view yield for their inventory.
Secondary contents can be geo-targeted in multiple cities, regions and geographies, thereby increasing monetizable opportunities for the OTT service provider. Automated buying/selling (programmatic) of custom in-content secondary contents will further increase the scale across geographies globally.
Scalability example: an Indian Premier League (IPL) cricket match is streamed in multiple countries like India, the UK, the US, Australia, etc. Secondary contents can be localized at scale. Viewers in India/rural India, in the identified slots predicted by the algorithm, may see a pension scheme of a nationalized bank or an immunization drive of a state government, while viewers in the UK may see an HSBC Bank scheme, and/or viewers in Australia may see a flight safety related initiative of Qantas Airline.
The present invention further leads to a time and cost saving solution, as the space once identified in a raw video is usable for different secondary contents, sequentially or at different points of time. Example: if a center table is detected as a Rank 1 space for secondary content fusion, on the same table the inventive algorithm dynamically switches to different secondary contents like a free medicine drive of the health ministry, a Rolex watch, a cricket update or weather information. Such dynamic switching leads to continued interest in watching.
The present invention deploys contemporary AI tools with inventive algorithms to achieve the objectives. Computer Vision methods, YOLO, Convolutional Neural Networks and Natural Language Processing are deployed for deriving meaningful inferences like object detection, event detection context, emotions, sentiments, expressions and feelings from digital images associated with audio and video. With TF-IDF, words are given weighted relevance. Edge, grey scale and gradient detection algorithms are deployed for the video, and a speech detection algorithm for the audio of the video, to identify the object in each frame, create categories, list negatives and positives, analyze expressions like sad, happy, etc., and classify objects like cat, dog, person, etc. The scope of the present invention is unlimited and grows with higher and higher algorithm processing capabilities and tools.
Server-side ad insertion (SSAI), that is, dynamic [secondary] content insertion, enables the seamless delivery of fused secondary contents into the streamed main content, giving a seamless viewer experience. This type of real-time insertion plays a crucial role in stitching secondary contents into live streaming content such as live cricket matches, live soccer matches, and live award shows like the Grammy Awards, Filmfare Awards, Oscars, etc.
Secondary contents can be dynamically changed in the same spot and can be optimized. Example: the traditional method of placing perimeter secondary contents and advertisements in a cricket stadium or a football ground has fixed spots on the boundary or ground, and it is expensive to buy these spots because of their popularity and the finite space available. With the present invention of infusion, which is virtual, these spots can be remonetized for secondary contents dynamically. They can be bought at a fraction of the cost and on a real time basis, bringing cost efficiency and time saving, and are scalable to target localized ads across geographies, cities and regions.
The present invention does object detection with respect to viewers as well as with respect to primary contents, in the same manner or in a different manner. While the object detection in the main video is algorithm based, the object detection with respect to viewers is based on sensors or algorithms or a combination thereof.
DESCRIPTION OF DRAWINGS
Figure 1 is a flow diagram of the method and system as per the present invention.
Figure 2 is a representative view of a companion device.
Figure 3 is a representative sector of view coverable by the companion device.
Figure 4 is a representative view of a viewer with estimated personal parameters.
Figure 5 is a representation of gaze estimation of a viewer.
Figure 6 is a logic diagram of health estimation of the viewer.
Figure 7 is a side representative view of the viewer, distinguishing coughing and sneezing.
Figure 8A is a representative view of a viewer exiting versus falling and a corresponding heat map. Figure 8B is a gradient diagram of movement of a heatmap corresponding to the viewer.
Figure 9 is a front view of signs and gestures of a viewer.
Figure 10 is a logic diagram of fatigue estimation of the viewer.
Figure 11 shows screen shots of a primary content and a secondary content correspondingly infused therein.
Figure 12A-12D are clustered diagrams of actions of a sub-system.
Figure 13 is a screen shot with object detection parameters.
Figure 14 is another screen shot with object detection parameters.
Figure 15A-15E is a flow diagram of steps of the sub-system of producing a value infused video.
Figure 16A-16F is a flow diagram of steps of space detection parameters.
Figure 17 shows screen shots of a primary content and a contextually relevant secondary content correspondingly infused therein.
Figure 18 is a screen shot of a frame with action identified while Figure 19 is a corresponding screen shot with a contextual social message infused.
Figure 20 is a screen shot of a frame with action identified while Figure 21 is a corresponding screen shot with a contextual non-commercial picture/message infused.
Figure 22 is a hardware and software architecture of the companion device.
Figure 23A-23C is an AWS Cloud architecture on which the sub-system as per the present invention is integrated.
Figure 24 is a front representative view of a companion input device.
Figure 25 is a flow diagram of continuous improvement of deep learning algorithms.
DETAILED DESCRIPTION OF INVENTION
The present invention shall now be described with the help of drawings. It is to be expressly noted that the present invention is of wide scope and grows with availability of AI tools and platforms, structured and unstructured databases, imaging sensors, biomedical sensors and transducers. The description, therefore, should not be construed to limit the invention in any way whatsoever.
The present invention is a method and system to interactively supervise general health and fatigue of viewers watching any video, which is intelligently fused with value added contents of social and non-commercial interest, besides commercial advertisements. The system becomes a kind of companion of the viewer(s) without compromising privacy and peace of the viewer(s).
The inventive method can be easily understood as a set of rules which a responsible companion would adhere to in order to provide a healthy entertainment service, like watching TV.
Figure 1, the method as per present invention grossly comprises:
1. Ascertaining or receiving metadata (150b) of viewer(s) including age, gender and interests of a viewer (101). The method also applies to a plurality of viewers with similar physical and mental parameters.
2. Ascertaining general health and fatigue parameters of the viewer(s) including body temperature, facial freshness and happiness, eye blinking frequency and energy level (102).
3. Generating a greeting or cautionary audio or video message based on an initial health and fatigue assessment (103).
4. Playing of an age and interest appropriate, value infused video (110) by or for the viewer(s) (104).
5. Continuously monitoring the fatigue parameters and delivering cautionary message, after permission (105).
6. Understanding sign language (150a) and audio feedback communication of the viewer(s) and acting accordingly (106).
7. Raising advisory/suitable alarm in case of a situation when viewer (s) need help including falling asleep unsafely (107).
A value infused video (110) is inventively developed by fusing-in social, non-commercial and commercial advertisement frames in apposition to video contents, as an important disclosure and claim of the present invention, subsequently described here below.
Figure 2, the system as per the present invention cognitively executes the above method by deploying a companion device (200) driven by a deep learning based computer program. The companion device (200) is so termed because it functions as a substitute for an attendant with normal human acumen. A human companion is generally expected to keep a cautious and casual watch and intervene only when sought or for safety. The companion device (200) replicates certain cognitive functions limited to such boundary conditions.
Figure 3, the companion device (200) has a defined sector of view (201), and viewers (108) present in the defined sector are beneficiaries of the present system.
An imaging sensor like a camera of the companion device (200) detects one or more humans in the defined sector of view (201) and correspondingly predicts their respective age and gender, after obtaining due consent.
The companion device (200) observes, predicts and estimates viewer’s:
• generic health and fatigue
• pose and gaze
• attentiveness localized to a scene presented on a connected viewing screen including a television
• attention duration at a localized scene of interest
• absence versus unconsciousness/fall or seizure
• emotions
• high noise exposure
• blinking
Pose includes change in pose, resulting in estimation of cumulative tiredness. Likewise, blinking includes change in blinking rate and closing of eyes.
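As an illustrative, non-limiting sketch in Python, a blink-rate heuristic of the above kind may be realized as follows; the per-frame "eyes closed" flag is assumed to come from the companion device's image pipeline, and the window length and rate thresholds are illustrative assumptions only:
from collections import deque
import time

class BlinkFatigueMonitor:
    """Rolling blink-rate heuristic; thresholds are illustrative assumptions."""
    def __init__(self, window_seconds=60, low_rate=8, high_rate=30):
        self.window = window_seconds          # rolling window in seconds
        self.low_rate = low_rate              # blinks/min hinting at drowsiness
        self.high_rate = high_rate            # blinks/min hinting at eye strain
        self.blink_times = deque()
        self.prev_closed = False

    def update(self, eyes_closed, now=None):
        now = now if now is not None else time.time()
        # A blink is counted on the closed -> open transition
        if self.prev_closed and not eyes_closed:
            self.blink_times.append(now)
        self.prev_closed = eyes_closed
        # Drop blinks that have fallen out of the rolling window
        while self.blink_times and now - self.blink_times[0] > self.window:
            self.blink_times.popleft()
        rate = len(self.blink_times) * 60.0 / self.window
        if rate < self.low_rate:
            return "possible drowsiness"
        if rate > self.high_rate:
            return "possible eye strain"
        return "normal"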
It is to be appreciated that the above estimation is entirely by a plurality of non-invasive and non-touch means, namely still imaging, sound and movement.
Figure 4, the companion device (200) estimates, through its image sensor and health algorithm, an age, gender, body temperature and a plurality of life parameters of the viewer (150). The age is grossly categorized into one of the prescribed segments (minor, adult, aged, old), with any error on the safer side with respect to video viewing. Illustratively, if the age estimate is between 15 years and 25 years, which is the borderline between minor and adult, then the assumption would safely be "minor". There is a provision to obtain a confirmation from the viewer (150).
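As an illustrative, non-limiting sketch, the safe-side age banding described above may be realized as follows; the upper band boundaries (60 and 75 years) are illustrative assumptions and only the 15-25 borderline rule is taken from the description:
def categorize_age(estimated_age):
    # The 15-25 band straddling the minor/adult boundary is resolved to
    # "minor", erring on the safe side for video viewing; a confirmation
    # may subsequently be sought from the viewer.
    if estimated_age <= 25:
        return "minor"
    if estimated_age < 60:      # assumed boundary, for illustration only
        return "adult"
    if estimated_age < 75:      # assumed boundary, for illustration only
        return "aged"
    return "old"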
Figure 5, the companion device (200) predicts, from the attentiveness location and duration, that the viewer (150) is interested in the brand of a car in the localized scene of interest. Figure 6, the companion device (200), with its microphone data streamed (220) up to a threshold to its processor, runs a cough classifier (221) and anomaly detection algorithm, applies heuristics (222) and displays a health condition. The health condition ranges from tiring of eyes, to cough and cold, to a person falling. Figure 7 is illustrative of the viewer (150) sneezing and/or coughing, the microphone (207) thereby picking up corresponding audio data.
Health parameters are also estimated from expressions. AffectNet® is a known large facial expression dataset with close to a million images manually labeled for the presence of facial expressions along with the intensity of valence and arousal, which the cognitive algorithm as per the present invention compares with images of viewers captured by image sensors disposed in the companion device (200).
With a combination of expression and microphone data streamed (220), the health cognitive algorithm provides a reasonable health estimate expected of the companion device (200).
Figures 8A-8B, an image sensor, including an infra-red image sensor, disposed on the companion device (200) traces movement of a viewer (150) through a heatmap (151) obtained through the infra-red image sensor, differentiates between a walk away (152) and a fall (153a, 153b), and correspondingly raises an alarm. Such differentiation is done by a gradient or angle of movement of the heatmap (151), as a preferred embodiment. Thus a walk away (152) is estimated by a rising gradient, implying standing first and then walking. This is a healthy and normal situation. A first kind of fall (153a) is estimated by a gradient that initially rises and then droops sharply, implying trying to stand first but falling. A second kind of fall (153b) is estimated by a falling gradient, implying falling while sitting. A likely fall of a viewer (150) dozing off while sitting, which may be unsafe, is preventable by this inventive estimation. Microphone data processed in conjunction with image sensor data yields an intensity of fall, and the companion device (200) acts accordingly:
Intensity of fall = f (age of person, sound of fall, view of fall, gradient of fall), deployed as a narrow AI algorithm.
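As an illustrative, non-limiting Python sketch of the gradient differentiation and the above empirical relation, assuming the heatmap centroid heights per frame are provided by the image pipeline and that the sound level and weights are illustrative, uncalibrated placeholders:
import numpy as np

def classify_movement(centroid_heights):
    # centroid_heights: vertical position of the viewer's heatmap per frame
    gradient = np.gradient(np.asarray(centroid_heights, dtype=float))
    if gradient[:len(gradient) // 2].mean() > 0 and gradient.min() < -0.5:
        return "fall_after_rising"     # tried to stand, then dropped sharply (153a)
    if gradient.mean() < -0.2:
        return "fall_while_sitting"    # steadily falling gradient (153b)
    if gradient.mean() > 0.1:
        return "walk_away"             # rising gradient: stood and walked (152)
    return "stationary"

def fall_intensity(age, fall_sound_level, fall_gradient):
    # Intensity of fall = f(age, sound of fall, view of fall, gradient of fall),
    # realized here as a simple weighted sum for illustration only.
    return 0.4 * (age / 100.0) + 0.3 * fall_sound_level + 0.3 * abs(fall_gradient)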
If a viewer (150) watches the value fused video while exercising on a gym bicycle or treadmill embedded with sensors picking rider’s body parameters like pulse and heart rate, then the companion device (200) of present system is connectible to such sensors by radio or wireless connection or by wired connections or both.
The companion device (200) of the system, driven by a deep learning based computer program, estimates an absolute value as well as an adverse change of health and fatigue parameters during a viewing time, not limited to the ones listed above.
The health and fatigue detection is currently limited only by sensors/data and not by the inventive health algorithm. The term "heuristics" implies computing by applying cognitive algorithms of narrow AI, to indicate a degree of simplification applied in the health algorithm, commensurate with the health data remotely obtainable currently, and to be able to quickly reach a reasonable health conclusion. The companion device (200) does NOT replace expert medical advice or medical assistance which a person may need in the course of watching the value infused videos.
Figure 9, the viewer(s) (150) communicate via established sign languages (150a), by making gestures of their own, or by audio commands. The communication is also implicit in the form of facial expressions including grins. The communication is voluntary as well as involuntary when the viewer(s) (150) is in distress, Figures 8A and 8B, consequent to a fall or falling unconscious. The communication is also in the form of signs of pain or confusion. The companion device (200) receives such feedback from the viewer(s) (150) through image sensors, and a processor in the companion device (200) facilitates heuristics or feedback algorithm processing for decision making, including cognitive peace-time decision making as well as cognitive panic-time decision making.
The decision making accuracy of the deep learning heuristics improves with constant learning and training data updating of a frequent viewer (150). Interactive analytics widens multiple applications, such as improving feedback on a social, non-commercial or commercial information or advertisement served, and heuristics for determining content engagement of videos.
A sub-system to produce a value infused video (110) is now described. The essence of "value" stems from the backdrop that any video comprises a main content in the form of a theme or a story presented in a particular manner and packaged with thrill, actions, suspense and other creative arts, based on which a viewer (150) selects such a video to watch; however, the user necessarily gets unwanted and unscheduled interruptions in the form of social, non-commercial and commercial messages and advertisements, termed here as secondary contents. The "value" of these secondary contents cannot be underestimated/ignored, though secondary contents are NOT the basis for a viewer (150) to invest his/her time and money and select a particular video! The present invention, therefore, re-orchestrates such secondary contents in a manner that they do NOT take away additional time! Importantly, the secondary contents are NOT consciously skippable when amalgamated with the main content. On the other hand, Figure 10, the companion device (200), by streaming camera data (230) of a threshold number of frames on pose, walk and blink of the viewer (150), runs a heuristics (232) to generate a saliency map of the viewer giving the region where the viewer focuses more! The saliency map is based on the pixels of importance to a human visual system. In simple words, the present invention predicts the interest of a viewer, besides health and fatigue.
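As an illustrative, non-limiting sketch, a saliency map of the "pixels of importance to a human visual system" may be approximated on a frame as follows; OpenCV's spectral-residual saliency (from the opencv-contrib-python package) is used here only as a stand-in for the device's own heuristics, and the threshold is an illustrative assumption:
import cv2
import numpy as np

def salient_region(frame_bgr, threshold=0.6):
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(frame_bgr)
    if not ok:
        return None
    # Normalize to [0, 1] and keep only the most salient pixels
    saliency_map = (saliency_map - saliency_map.min()) / (np.ptp(saliency_map) + 1e-9)
    ys, xs = np.nonzero(saliency_map > threshold)
    if len(xs) == 0:
        return None
    # Bounding box of the region where attention is predicted to focus more
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())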
Figure 11, as per the present invention, the sub-system finds opportunities to insert secondary contents (90), here a soft drink bottle, within primary contents (80), here a table activity amongst three humans in a kitchen ambience, in a manner that the primary contents (80) are not tampered with nor compromised and the secondary contents (90) get their due attention in a smooth and more natural manner than current ways. Particularly, the present invention effectively uses secondary contents (90) in the form of inexpensive GIF and JPEG image formats and converts them into 2D/3D images to make the secondary contents, particularly objects and products, look real and rich, and seamlessly integrate with the look and feel of the actual streaming content.
The system essentially comprises the following functions and actions, aptly condensed in Figures 12A-12D, processed by a deep learning based algorithm driven by a computer implemented program around a video under consideration:
• Input data/video streaming (211)
• Object detection by YOLO with OpenCV (212)
• Blank zone/Flat surface (213)
• Fused acknowledgement (214)
Figure 12A, input data (211), including videos of different types and their attributes including media type (211a), title (211b), keywords (211c), duration (211d), date (211e), video-ID (211f), video type (211g) and description (211h), are represented as vectors (216) commensurate with a machine learning approach and assigned a class to which they belong, for further processing by different algorithms.
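As an illustrative, non-limiting sketch of representing such attributes as vectors for downstream classification, assuming scikit-learn is available and the attribute values shown are purely illustrative:
from sklearn.feature_extraction import DictVectorizer

records = [
    {"media_type": "movie", "video_type": "drama", "duration": 5400},
    {"media_type": "sports", "video_type": "live", "duration": 14400},
]
vectorizer = DictVectorizer(sparse=False)
vectors = vectorizer.fit_transform(records)     # one numeric vector per video
print(vectorizer.get_feature_names_out())       # e.g. duration, media_type=movie, ...
print(vectors)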
Figure 12B gives an overview of a large number of intense data management activities through artificial intelligence using state of the art AI tools including YOLO5. YOLO ("you only look once") is a contemporary deep learning based tool, particularly suited to analysis of huge image data backed by extremely large training. The video or clip is analyzed frame by frame to identify objects (212) in the frames, including the steps listed below (an illustrative detection sketch follows the list):
• Dividing image into grids (212a)
• Predicting a class label for each grid (212b)
• Applying "Intersection over Union" for predicting the bounding box (212c)
• Considering the CNN and pool layer (212d)
• Producing output for each image (212e) in the form of an object (212f)
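As an illustrative, non-limiting Python sketch of the frame-by-frame detection step, a YOLOv5 model is loaded through PyTorch Hub; the model variant and the confidence threshold are illustrative assumptions and not the trained model of the present system:
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4                                # illustrative confidence threshold

def detect_objects(frame_bgr):
    results = model(frame_bgr[..., ::-1])       # convert BGR to RGB before inference
    detections = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        detections.append({
            "label": model.names[int(cls)],     # e.g. "person", "car"
            "confidence": round(conf, 2),
            "bbox": (int(x1), int(y1), int(x2), int(y2)),   # diagonal corners
        })
    return detections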
Next, Figure 12C, the video or the clip is analyzed frame by frame to identify blank zones (213) of prescribed area in the form of a bounding box (213a), persisting in a prescribed number of contiguous frames as clustering (213b). Blank zone detection (213) includes assessment of flat surface, position (213c) and duration (213d), and conforming blank zones are labelled (213e).
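As an illustrative, non-limiting sketch, a candidate zone may be treated as blank/flat when its pixel variance is low and it persists over contiguous frames; the variance threshold and frame count below are illustrative assumptions:
import numpy as np

def is_blank_zone(frame_gray, box, variance_threshold=120.0):
    x1, y1, x2, y2 = box
    patch = frame_gray[y1:y2, x1:x2]
    return patch.size > 0 and float(np.var(patch)) < variance_threshold

def persistent_blank_zone(frames_gray, box, min_contiguous=24):
    streak = 0
    for frame in frames_gray:
        streak = streak + 1 if is_blank_zone(frame, box) else 0
        if streak >= min_contiguous:
            return True      # the zone conforms and may be labelled as blank
    return False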
Figure 12D, re-processing (214a) of the object data and the blank zone data leads to a first level selection of secondary contents which could be seamlessly amalgamated or alloyed or fused. An essence of the re-processing is matching of attributes of the object data and the blank zone data to ensure fusibility without causing a noticeable "stitching", and a machine learning algorithm (214b) with continuous improvement potential is deployed for decision labelling (214c) of each "stitch".
Illustrative Figure 13, objects (240) are identified as persons (240a), glasses (240c) and a hat (240b) in a bounding box (213a). Clearly, there are many objects (240) cluttering the frame, even if there could be contiguous blank zones! Importantly, the identification is provided with an accuracy level (241). Apparently, there is ambiguity between "Hat" and "Head", reflected by a low numeric value of the accuracy level. Figure 14, objects (240), which are an SUV (240d) and a skateboard (240e) respectively, with a bounding box dimension (242) termed as a "BBox" dimension, are clear and uncluttered. The BBox dimensions (242) imply the X-Y coordinates of a diagonal of a rectangle defining the corresponding bounding box (213a) around a specific object (240) identified. It is important to note a rank (243) allotted to each object, which is one of the several labels/attributes of each identified object.
Contextual Relevancy is inseparably associated with corresponding contents. Thus, while a primary objects may be humans – male/female to a large extent, associated objects/contents including but not limited to material contents like food, dresses; and non-material contents like celebrative, romantic, sad, set a context.
Figure 15A, frames with space (251) are identified, followed by identifying objects (240) therein. A negative list of unacceptable objects and activities (252) is filtered out; while, Figures 15B/15C, human activity recognition (253) is carried out for acceptable objects and contents, like romance, sports (253a), applying cosmetics (253b), cooking (253c), laughing and celebrating (253d). Figure 15D, parsing is carried out between the human activity recognition output and secondary contents (90), including advertisement category class and sub-category including brands (254). Figure 15E, the secondary contents (90) are fused (255) in the raw video to produce the value infused video (110).
Figure 16A-16D describe space detection with reference to modified frames with infused value content.
Figures 16A/16B, identified frames and modified frames are stored (261) and shared with the original main content owner and the OTT service provider (262). Through an object detection technique, here YOLO5, a training dataset is generated using a deep learning model (267), involving steps of identification and labelling (268). Figure 16C, objects detected are assigned a rank (243) based on training data (263), illustratively:
Building - 1, Table surface - 2, Billboard - 3, Car bonnet - 4, Buses side - 5, Other flat surfaces - 6, Desk - 7, Pillars - 8, Rectangle flat surfaces - 9, Others - 10.
Figure 16D, space detection has a large number of prescribed weighted parameters (264) as per the advertiser and/or OTT owner, illustratively (an illustrative scoring sketch follows the list):
• Size of zone - 20%
• Distance of placement zone from other objects in the frame (at least 25% to 50%) - 10%
• Clarity of surface area - 10%
• Viewability minimum 50% - 20%
• Clarity of zone - 10%
• White spaces - 10%
• Clutter free - 10%
• Ease of fusing creative - 10%
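As an illustrative, non-limiting sketch of combining these weighted parameters into a placement score, assuming each input has been pre-normalized by upstream measurement to the 0-1 range:
WEIGHTS = {
    "size_of_zone": 0.20,
    "distance_from_objects": 0.10,
    "clarity_of_surface": 0.10,
    "viewability": 0.20,
    "clarity_of_zone": 0.10,
    "white_spaces": 0.10,
    "clutter_free": 0.10,
    "ease_of_fusing_creative": 0.10,
}

def placement_score(zone_metrics):
    # zone_metrics: dict of the eight parameters above, each in [0, 1]
    return sum(WEIGHTS[name] * zone_metrics.get(name, 0.0) for name in WEIGHTS)

# Example: a large, viewable, uncluttered zone scores close to 1.0
score = placement_score({"size_of_zone": 0.9, "viewability": 0.8,
                         "clarity_of_surface": 0.7, "clutter_free": 0.9,
                         "distance_from_objects": 0.6, "clarity_of_zone": 0.8,
                         "white_spaces": 0.5, "ease_of_fusing_creative": 0.7})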
Figure 16E, a place holder of more than a threshold time, say 15 seconds (265), and/or more than 25% (266), is marked/sent for secondary contents fusion, Figure 16F.
In the current situation, wherein the social, non-commercial and commercial advertisements are played with random breaks, there is no relation nor contextual similarity whatsoever! So, one may be watching a sad scene in a video and an advertisement of a celebration may appear from nowhere; though, of late, OTT platforms use watch history to derive interest and recommend videos to watch! In the present invention, contextual relevance is an important infusing parameter achieved through artificial intelligence. Along with objects and space, a contextual algorithm predicts activity in the video. Hence, Figure 17, an expensive watch is an apt fusion of secondary content (90) in a frame of a main content (80) having rich ladies in western attire. This illustration is an outcome of the above inventive steps of object, space and context orchestration, wherein the above listed criteria of suitability can be objectively verified.
About 4,00,000 situational frames of data, resulting in 400 actions, comprise the training data of context in the present invention, which grows, consequent to the deep learning algorithm, when encountering newer situations. Apropos, Figure 18, the activity (81) identified being YOGA, a social message (82) is aptly fused in the interest of the health of viewers, Figure 19. Figure 20, the activity (81) identified being the sport of archery, a non-commercial message (83) of promoting archery by way of accomplished national players is a pertinent amalgamation, Figure 21.
Context relevance is generated by way of a cognitive algorithm trained to analyze images in the frames/views, expressions, actions therein and speech associated with a plurality of frames. Illustratively, AffectNet® is a known large facial expression dataset with close to a million images manually labeled for the presence of eight (neutral, happy, angry, sad, fear, surprise, disgust, contempt) facial expressions along with the intensity of valence and arousal, which the cognitive algorithm as per the present invention compares with images of viewers captured by image sensors disposed in the companion device (200).
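As an illustrative, non-limiting sketch of mapping a captured face crop to the eight expression categories named above, assuming any expression classifier exported to ONNX; the model file name and the preprocessing sizes are hypothetical placeholders, not part of the present disclosure:
import cv2
import numpy as np

EXPRESSIONS = ["neutral", "happy", "angry", "sad",
               "fear", "surprise", "disgust", "contempt"]
net = cv2.dnn.readNetFromONNX("expression_classifier.onnx")   # hypothetical model file

def classify_expression(face_bgr):
    # Preprocessing (input size, scaling) is assumed for illustration only
    blob = cv2.dnn.blobFromImage(face_bgr, 1.0 / 255, (224, 224), (0, 0, 0), swapRB=True)
    net.setInput(blob)
    scores = net.forward().flatten()
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return EXPRESSIONS[int(np.argmax(probs))], float(probs.max())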
Contextual relevance includes the social status and preference of the humans identified along with their dress sense, dress and accessories worn, along with surrounding objects. Figure 11, a soft drink product is an apt fusion on a breakfast table with usable space.
Contextual relevance includes a negative category of crime, harassment, drugging, torture and self-harm et cetera and the cognitive algorithm is trained to handle such spaces as particularly prescribed. Hence the present invention is sensitive NOT to show a gun or dagger in order NOT to promote violence thoughtlessly.
An illustrative part of the contextual algorithm is as follows:
# -----------------------------
# USAGE
# -----------------------------
# python human_activity_recognition_deque.py --model resnet-34_kinetics.onnx --classes action_recognition_kinetics.txt --input videos/example_activities.mp4
# python human_activity_recognition_deque.py --model resnet-34_kinetics.onnx --classes action_recognition_kinetics.txt
from collections import deque
import argparse
import numpy as np
import cv2

# Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True, help="path to trained human activity recognition model")
ap.add_argument("-c", "--classes", required=True, help="path to class labels file")
ap.add_argument("-i", "--input", type=str, default="", help="optional path to video file")
args = vars(ap.parse_args())

# Load the class labels and the activity recognition model
CLASSES = open(args["classes"]).read().strip().split("\n")
net = cv2.dnn.readNet(args["model"])
SAMPLE_DURATION = 16
SAMPLE_SIZE = 112

# Initialize the frames queue used to store a rolling sample duration of frames -- this queue will automatically pop out
# old frames and accept new ones
frames = deque(maxlen=SAMPLE_DURATION)

# Grab the pointer to the input video stream
print("[INFO] Accessing the video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)

# Loop over the frames from the video stream
while True:
    # Read the frame from the video stream
    (grabbed, frame) = vs.read()
    if not grabbed:
        print("[INFO] No frame read from the video stream - Exiting...")
        break
    frames.append(frame)
    # Wait until the rolling queue holds a full sample before classifying
    if len(frames) < SAMPLE_DURATION:
        continue
    # Now the frames queue is filled, we can construct the blob
    blob = cv2.dnn.blobFromImages(list(frames), 1.0, (SAMPLE_SIZE, SAMPLE_SIZE),
        (114.7748, 107.7354, 99.4750), swapRB=True, crop=True)
    blob = np.transpose(blob, (1, 0, 2, 3))
    blob = np.expand_dims(blob, axis=0)
    net.setInput(blob)
    outputs = net.forward()
    label = CLASSES[np.argmax(outputs)]
    # Annotate the frame with the predicted activity and display it
    cv2.putText(frame, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    cv2.imshow("Activity Recognition", frame)
    key = cv2.waitKey(1) & 0xFF
    # If the 'q' key was pressed, break from the loop
    if key == ord("q"):
        break
vs.release()
cv2.destroyAllWindows()
The secondary contents (90) are alloyed with the primary contents (80) for a controlled time commensurate with the context. Thus a romantic scene shall have a longer fusion than an action scene.
Modification of secondary contents (90) to make them suitable for amalgamation/fusion is a work of Art + Engineering. Secondary contents (90) are made fusible by resizing, red-eye reduction, shadowing, rotating, opacity adjustment, adding blurring shadows and reflections, and altering illumination and grain size to perfectly match the look and feel of the main content frames. This can be achieved by matching pixel by pixel of each frame. A new frame with the modified secondary content replaces the original frame in the identified placement zone such that the fusion does NOT leave any stitching marks.
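As an illustrative, non-limiting sketch, a resized secondary-content creative may be blended into the identified placement zone without visible stitching using OpenCV's Poisson (seamless) cloning as one possible technique; the placement coordinates are illustrative assumptions:
import cv2
import numpy as np

def fuse_secondary_content(frame, creative, zone):
    x1, y1, x2, y2 = zone
    resized = cv2.resize(creative, (x2 - x1, y2 - y1))
    mask = 255 * np.ones(resized.shape[:2], dtype=np.uint8)
    center = ((x1 + x2) // 2, (y1 + y2) // 2)
    # NORMAL_CLONE blends illumination and grain with the surrounding frame,
    # so the fused content does not leave stitching marks
    return cv2.seamlessClone(resized, frame, mask, center, cv2.NORMAL_CLONE)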
Generative AI technology will help reduce the time and costs to generate virtually real secondary content infused in the video content. Certain laborious processes of image generation are automated by providing data sets that help the model learn and generate the relevant output. This substantially reduces the time and costs of manually processing the secondary content for fusion into the video content.
User gestures related to watching the value fused videos are captured by the companion device (200) for continuing or changing secondary contents as well as primary contents. Illustratively, a waving gesture is interpreted as a sign of closing a content. Thumbs up can be interpreted as approval of the secondary content including commercial secondary contents like a particular brand. Victory sign is likewise interpreted as approval to interact with the brands in the virtual world and gamification. The companion device (200) is configurable to subsequently continue interaction with a secondary content provider as an Augmented reality (AR) virtual interaction with the commercial brands.
Figure 22, the companion device (200) is a custom-built hardware comprising a contemporary processor supported by a microphone (207), a streaming device (208) including a plurality of image sensors, and an image signal processor (209) as a base feature service (200a). The platform service (200b) is provided by a plurality of algorithms comprising a loudness engine, a gesture engine, an overlay engine, media streaming, keyword detection and streaming, and a saliency and pose estimation unit. An application layer (200c) comprises computer programs for at least a context aware value delivery, user feedback, seizure and fall detection, sign language (150a), interaction unit and eye blink. Figure 25, at times, an estimation or detection may result in a false detection (204). A feedback (206) of such false detections (204) and/or consequent inappropriate/nuisance actions (205) to the deep learning algorithms successively improves the outcome and reduces false detections (204).
Figure 24, a companion input device (210) having at least a plurality of companion buttons (202) and a distress button (203) communicates with the companion device (200), as an option.
Figures 23A-23C, as a preferred embodiment, show an AWS Cloud architecture on which the present invention is integrated, to produce a value infused video (110), involving intense video content, data and management between:
• Video owner / OTT service provider (301)
• Secondary content owner (302)
• ADMOTT (300)
keeping abreast of viewers' preferences data/information (303), with due copyright management precautions and data/content security. ADMOTT (300) is a brand of the infused value video development agency, responsible for a high quality and ethical artificial intelligence based deep machine learning tool of such development, with inventive contextual relevancy (215) and interactive viewing based on gaze, expressions and emotions, backed by the companion device (200) providing a generic health and fatigue monitoring. ADMOTT duly protects its output of the value infused video (110), which is susceptible to manipulation, by contemporary data and content protection tools, including but not limited to a cryptographic signature with its cryptographic private key.
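As an illustrative, non-limiting sketch of signing the value infused video output with a private key, assuming the "cryptography" Python package and an RSA key stored in PEM format; key management and file paths are illustrative assumptions:
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def sign_video(video_path, private_key_path, signature_path):
    with open(private_key_path, "rb") as f:
        private_key = serialization.load_pem_private_key(f.read(), password=None)
    with open(video_path, "rb") as f:
        video_bytes = f.read()
    signature = private_key.sign(
        video_bytes,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    with open(signature_path, "wb") as f:
        f.write(signature)      # distributed alongside the video for later verification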
The present invention therefore facilitates a safe and companioned/supported watching of value enhanced videos, coupled with an enhancement of the viewer's awareness of social and national initiatives, much more effectively and in a more receptive and non-intrusive manner, without forcing any secondary contents as a necessary evil, as in current practices which unavoidably lead to putting in extra time.
The present invention gives new monetizable inventory to the OTT service provider at scale. Secondary contents, particularly commercial advertisements, are placed intelligently using deep learning, in non-intrusive formats within the content. Advertisements fused within the main content are modified to give the advertisers' products a look and feel that appears real.
The present invention is expected to increase viewability by orders of magnitude. This will allow OTT owners to charge a premium to their secondary content providers and thereby earn a higher revenue-per-view yield for their inventory.
Secondary contents can be geo-targeted in multiple cities, regions and geographies, thereby increasing monetizable opportunities for the OTT service provider. Automated buying/selling (programmatic) of custom in-content secondary contents will further increase the scale across geographies globally.
Scalability example: an Indian Premier League (IPL) cricket match is streamed in multiple countries like India, the UK, the US, Australia, etc. Secondary contents can be localized at scale. Viewers in India/rural India, in the identified slots predicted by the algorithm, may see a pension scheme of a nationalized bank or an immunization drive of a state government, while viewers in the UK may see an HSBC Bank scheme, and/or viewers in Australia may see a flight safety related initiative of Qantas Airline.
The present invention further leads to a time and cost saving solution, as the space once identified in a raw video is usable for different secondary contents, sequentially or at different points of time. Example: if a center table is detected as a Rank 1 space for secondary content fusion, on the same table the inventive algorithm dynamically switches to different secondary contents like a free medicine drive of the health ministry, a Rolex watch, a cricket update or weather information. Such dynamic switching leads to continued interest in watching.
The present invention deploys contemporary AI tools with inventive algorithms to achieve the objectives. Computer Vision methods, YOLO, Convolutional Neural Networks and Natural Language Processing are deployed for deriving meaningful inferences like object detection, event detection context, emotions, sentiments, expressions and feelings from digital images associated with audio and video. With TF-IDF, words are given weighted relevance. Edge, grey scale and gradient detection algorithms are deployed for the video, and a speech detection algorithm for the audio of the video, to identify the object in each frame, create categories, list negatives and positives, analyze expressions like sad, happy, etc., and classify objects like cat, dog, person, etc. The scope of the present invention is unlimited and grows with higher and higher algorithm processing capabilities and tools.
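As an illustrative, non-limiting sketch of the TF-IDF weighting mentioned above, assuming scikit-learn and that the example documents (e.g. frame descriptions or transcripts) are purely illustrative:
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "family breakfast table kitchen soft drink",
    "cricket stadium boundary live match",
    "yoga exercise health morning routine",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(documents)
# Highest-weighted (most relevant) words for the first document
weights = dict(zip(vectorizer.get_feature_names_out(), tfidf.toarray()[0]))
top_words = sorted(weights, key=weights.get, reverse=True)[:3]
print(top_words)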
Server-side ad insertion (SSAI), that is, dynamic [secondary] content insertion, enables the seamless delivery of fused secondary contents into the streamed main content, giving a seamless viewer experience. This type of real-time insertion plays a crucial role in stitching secondary contents into live streaming content such as live cricket matches, live soccer matches, and live award shows like the Grammy Awards, Filmfare Awards, Oscars, etc.
Secondary contents can be dynamically changed in the same spot and can be optimized. Example - the traditional method of placing perimeter secondary contents and advertisements around a cricket stadium or a football ground relies on fixed spots along the boundary, and buying these spots is expensive because of their popularity and the finite space available. With the present invention of infusion, which is virtual, these spots can be remonetized for secondary contents dynamically; they can be bought at a fraction of the cost and on a real-time basis, bringing cost efficiency and time savings, and scaling to target localized ads across geographies, cities and regions.
The present invention performs object detection with respect to viewers as well as with respect to primary contents, in the same manner or in a different manner. While the object detection in the main video is algorithm based, the object detection with respect to viewers is based on sensors, an algorithm, or a combination thereof.
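As an illustration of viewer-side detection by an imaging sensor plus an algorithm, the following minimal Python sketch, assuming OpenCV with its bundled Haar cascades, flags frames in which a face is present but no open eyes are detected, and raises a cautionary message after a sustained run of such frames. The threshold value is a hypothetical assumption; this is a sketch of one possible health algorithm, not the claimed implementation.

```python
# Minimal sketch (illustrative only, assuming OpenCV): detect a face with no
# visible eyes per frame and treat a sustained run as a candidate drowsy event.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def frame_looks_drowsy(frame) -> bool:
    """Return True if a face is present but no open eyes are detected."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
        if len(eyes) == 0:
            return True
    return False

DROWSY_FRAMES_THRESHOLD = 45  # hypothetical, roughly 1.5 s at 30 fps

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)  # companion device imaging sensor
    consecutive = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        consecutive = consecutive + 1 if frame_looks_drowsy(frame) else 0
        if consecutive >= DROWSY_FRAMES_THRESHOLD:
            print("Cautionary message: viewer may be falling asleep")
            consecutive = 0
```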
Claims:
WE CLAIM:
1. A method to interactively supervise general health and fatigue of viewers watching a video fused with un-interrupting value added contents of national, social, non-commercial and commercial genre; the method comprising the steps of:
a) Ascertaining or receiving a metadata (150b) of viewer(s) including age, gender (101) and interests of one or more viewers (150),
b) Ascertaining general health and fatigue parameters of the viewer(s) including body temperature, facial freshness and happiness, eye blinking frequency and energy level (102),
c) Generating a greeting or cautionary audio or video message based on an initial health and fatigue assessment (103),
d) Playing of an age, gender and interest appropriate, value infused video by or for the viewer(s) (104),
e) Continuously monitoring the fatigue parameters and delivering a cautionary message,
f) Understanding sign language (150a) and audio feedback communication of the viewer(s) (150) and acting accordingly (106), and
g) Raising an advisory/suitable alarm in case of a situation when the viewer(s) (150) need help, including falling asleep unsafely (107).
2. A system to interactively supervise general health and fatigue of viewers watching any video fused with un-interrupting value added secondary contents (90) of national, social, non-commercial and commercial genre, the system comprising a streaming engine (208), characterized by:
i. a companion device (200) having, a microphone (207), an imaging sensor, a data processor, an image signal processor (209),
ii. the streaming engine (208) including a smart television, and
iii. a plurality of deep learning based computer programs residing in a processor in the companion device (200),
wherein the deep learning based computer program executes the steps of:
a) Ascertaining by the imaging sensor and a health algorithm, or receiving by the companion input device (210), a metadata (150b) of viewer(s) (150) including age, gender (101) after obtaining consent,
b) Ascertaining by the imaging sensor and the health algorithm, general health and fatigue parameters of the viewer(s) (150) including body temperature, facial freshness and happiness, eye blinking frequency and energy level (102),
c) Generating by the data processor and the streaming engine a greeting or cautionary audio or video message based on an initial health and fatigue assessment (103),
d) Playing by the streaming engine an age, gender and interest appropriate value infused video for the viewer(s) (104),
e) Continuously monitoring by the imaging sensor and the health algorithm the fatigue parameters and delivering a cautionary message, after permission (105),
f) Understanding, by the imaging sensor and the health algorithm, sign language (150a) and audio feedback communication of the viewer(s) (150), and acting accordingly (106), and
g) Raising an advisory/suitable alarm by the streaming engine in case of a situation when the viewer(s) need help, including falling asleep unsafely (107).
3. The system to interactively supervise general health of viewers as claimed in claim 2 wherein the system further comprises a platform service (200b) provided by a plurality of cognitive algorithms for a loudness engine, a gesture engine, an overlay engine, media streaming, keyword detection and streaming, saliency and pose estimation unit.
4. The system to interactively supervise general health of viewers as claimed in claim 2 wherein an application layer (200c) comprises computer programs for at least a context aware value delivery, user feedback, seizure and fall detection, sign language (150a), interaction unit and eye blink.
5. The system to interactively supervise general health of viewers as claimed in claim 2 wherein the companion device (200) has a defined sector of view (201), and wherein the viewer (s) (150) present in the defined sector of view (201) are beneficiaries of the present system.
6. The system to interactively supervise general health of viewers as claimed in claim 2, wherein the imaging sensor of the companion device (200) detects one or more humans in the defined sector of view (201) and correspondingly predicts by heuristics their respective age, categorized in one of the prescribed segments – minor, adult, aged, old, with a safe margin of error with respect to a video viewing restriction.
7. The system to interactively supervise general health of viewers as claimed in claim 2, wherein the ascertaining by the imaging sensor, the microphone (207) and the health algorithm includes:
a. A pose including a change in the pose, and a gaze,
b. attentiveness localized to a scene presented on a connected viewing screen including a television,
c. emotion,
d. attention duration at a localized scene of interest,
e. absence versus unconsciousness,
wherein a threshold value of an image data and a microphone data is analyzed by the health algorithm.
8. The system to interactively supervise general health of viewers as claimed in claim 2, wherein the companion device (200) predicts by deep learning heuristics a decision making including a cognitive peace time decision making as well as a cognitive panic time decision making.
9. The system to interactively supervise general health of viewers as claimed in claim 8, wherein the decision making accuracy of the deep learning heuristics improves with constant learning and training data updating of a frequent viewer.
10. The system to interactively supervise general health of viewers as claimed in claim 2, wherein the system further comprises a sub-system to produce a value infused video, the sub-system comprising a computer implemented program to process a plurality of raw videos with a plurality of cognitive algorithms, the computer implemented program characterized by:
- An object detection algorithm,
- An object absence detection algorithm wherein absence of objects for a prescribed minimum area and time is identified as a target zone,
- A natural language processing algorithm wherein an audio is converted to a text, and
- A library of secondary contents including national, social, non-commercial and commercial,
wherein
a context of different segments of the raw video is predicted by analyzing visuals of the raw video and the corresponding audio converted to the text, using a neural network trained by a context training data; a corresponding secondary content (90) from the library is contextually fused in the corresponding target zone with a pixel match of the fused secondary content, and
wherein
analysing the raw video frame by frame to identify objects (212) in the frames includes the steps of:
• Dividing image into grids (212a),
• Predicting a class label for each grid (212b),
• Applying “Intersection over union” for predicting box boundary (212c),
• Considering the CNN and pooling layer (212d), and
• Producing output for each image (212e) in the form of an object (212f).
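By way of illustration of the "Intersection over Union" measure referred to in step (212c) above, the following minimal Python sketch computes the overlap ratio between a predicted box and a reference box. The (x1, y1, x2, y2) box format is an assumption for illustration; the sketch does not limit the claimed method.

```python
# Minimal sketch (illustrative only): Intersection over Union for two boxes
# given as (x1, y1, x2, y2) corner coordinates.
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143 overlap ratio
```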
11. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the secondary content is a visual contextually fused in a plurality of frames of the raw video.
12. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the secondary content is an audio contextually fused in a plurality of frames of the raw video.
13. The system to interactively supervise general health of viewers as claimed in claim 10, wherein analyzing visuals of the raw video comprises analyzing expressions of the objects.
14. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the context includes a social status and preference of the humans identified, along with their dress sense, dress and accessories worn, and surrounding objects.
15. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the context excludes a negative category of crime, harassment, drugging, torture and self-harm, the cognitive algorithm trained to handle such spaces with authorization.
16. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the secondary contents (90) are alloyed with the primary contents (80) for a controlled time commensurate with the context.
17. The system to interactively supervise general health of viewers as claimed in claim 10, wherein a plurality of secondary contents are dynamically switched within a time slot.
18. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the secondary contents at different geographies are different.
19. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the secondary contents are fused in a live streaming of primary contents.
20. The system to interactively supervise general health of viewers as claimed in claim 10, wherein the sub-system's backend is integrated on an AWS Cloud architecture.
21. The system to interactively supervise general health of viewers as claimed in claim 10, wherein a feedback (206) of false detections (204) and/or consequent inappropriate/nuisance actions (205) to the deep learning algorithms successively improves the outcome and reduces false detections.
| # | Name | Date |
|---|---|---|
| 1 | 202321041153-REQUEST FOR EARLY PUBLICATION(FORM-9) [16-06-2023(online)].pdf | 2023-06-16 |
| 2 | 202321041153-POWER OF AUTHORITY [16-06-2023(online)].pdf | 2023-06-16 |
| 3 | 202321041153-FORM-9 [16-06-2023(online)].pdf | 2023-06-16 |
| 4 | 202321041153-FORM FOR STARTUP [16-06-2023(online)].pdf | 2023-06-16 |
| 5 | 202321041153-FORM FOR SMALL ENTITY(FORM-28) [16-06-2023(online)].pdf | 2023-06-16 |
| 6 | 202321041153-FORM 1 [16-06-2023(online)].pdf | 2023-06-16 |
| 7 | 202321041153-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [16-06-2023(online)].pdf | 2023-06-16 |
| 8 | 202321041153-EVIDENCE FOR REGISTRATION UNDER SSI [16-06-2023(online)].pdf | 2023-06-16 |
| 9 | 202321041153-DRAWINGS [16-06-2023(online)].pdf | 2023-06-16 |
| 10 | 202321041153-DECLARATION OF INVENTORSHIP (FORM 5) [16-06-2023(online)].pdf | 2023-06-16 |
| 11 | 202321041153-COMPLETE SPECIFICATION [16-06-2023(online)].pdf | 2023-06-16 |
| 12 | 202321041153-Request Letter-Correspondence [23-06-2023(online)].pdf | 2023-06-23 |
| 13 | 202321041153-Power of Attorney [23-06-2023(online)].pdf | 2023-06-23 |
| 14 | 202321041153-FORM28 [23-06-2023(online)].pdf | 2023-06-23 |
| 15 | 202321041153-FORM 3 [23-06-2023(online)].pdf | 2023-06-23 |
| 16 | 202321041153-Form 1 (Submitted on date of filing) [23-06-2023(online)].pdf | 2023-06-23 |
| 17 | 202321041153-Covering Letter [23-06-2023(online)].pdf | 2023-06-23 |
| 18 | 202321041153-CERTIFIED COPIES TRANSMISSION TO IB [23-06-2023(online)].pdf | 2023-06-23 |
| 19 | 202321041153-CORRESPONDENCE(IPO)-(WIPO DAS)-(18-07-2023)..pdf | 2023-07-18 |
| 20 | 202321041153-STARTUP [14-08-2023(online)].pdf | 2023-08-14 |
| 21 | 202321041153-FORM28 [14-08-2023(online)].pdf | 2023-08-14 |
| 22 | 202321041153-FORM 18A [14-08-2023(online)].pdf | 2023-08-14 |
| 23 | Abstact.jpg | 2023-08-30 |
| 24 | 202321041153-FER.pdf | 2024-04-01 |
| 25 | 202321041153-FER_SER_REPLY [17-06-2024(online)].pdf | 2024-06-17 |
| 26 | 202321041153-Power of Attorney [13-09-2024(online)].pdf | 2024-09-13 |
| 27 | 202321041153-Form 1 (Submitted on date of filing) [13-09-2024(online)].pdf | 2024-09-13 |
| 28 | 202321041153-Covering Letter [13-09-2024(online)].pdf | 2024-09-13 |
| 29 | 202321041153-PatentCertificate17-03-2025.pdf | 2025-03-17 |
| 30 | 202321041153-IntimationOfGrant17-03-2025.pdf | 2025-03-17 |
| 31 | 202321041153-PROOF OF ALTERATION [15-04-2025(online)].pdf | 2025-04-15 |
| 32 | 202321041153-FORM FOR STARTUP [15-04-2025(online)].pdf | 2025-04-15 |
| 1 | 202321041153E_28-03-2024.pdf | |