
Method And System For Predicting Distance Of Gazed Objects Using Infrared (IR) Camera

Abstract: This disclosure relates generally to a method and system for predicting the distance of gazed objects using an IR camera. Eye tracking technology is widely used to study human behavior and patterns in eye movements. Existing gaze trackers focus on predicting the gaze point and hardly analyze the distance of the gazed object from the gazer or directly classify the region of focus. The method of the present disclosure predicts the distance of gazed objects using a pair of IR cameras placed on either side of a smart glass. The gaze predictor ML model predicts the distance of at least one gazed object positioned from the eye of each subject during a systematic execution of a set of tasks. From the pupillary information of each pupil, a set of features is extracted and utilized to classify the gazed object of the subject, based on the distance, into at least one of a near class, an intermediate class, and a far class.


Patent Information

Application #
Filing Date
17 May 2023
Publication Number
47/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

Inventors

1. GAVAS, Rahul Dasharath
Tata Consultancy Services Limited, Gopalan Enterprises Pvt Ltd (Global Axis) SEZ, "H" Block No. 152 (Sy No. 147,157 & 158), Hoody Village, EPIP Zone, (II Stage), Whitefield, K.R. Puram Hobli, Bangalore – 560066, Karnataka, India
2. RAMAKRISHNAN, Ramesh Kumar
Tata Consultancy Services Limited, Gopalan Enterprises Pvt Ltd (Global Axis) SEZ, "H" Block No. 152 (Sy No. 147,157 & 158), Hoody Village, EPIP Zone, (II Stage), Whitefield, K.R. Puram Hobli, Bangalore – 560066, Karnataka, India
3. SINGH, Priya
Tata Consultancy Services Limited, Gopalan Enterprises Pvt Ltd (Global Axis) SEZ, "H" Block No. 152 (Sy No. 147,157 & 158), Hoody Village, EPIP Zone, (II Stage), Whitefield, K.R. Puram Hobli, Bangalore – 560066, Karnataka, India
4. KARMAKAR, Somnath
Tata Consultancy Services Limited, Building 1B, Ecospace, Plot - IIF/12, New Town, Rajarhat, Kolkata – 700156, West Bengal, India
5. SHESHACHALA, Mithun Basaralu
Tata Consultancy Services Limited, Gopalan Enterprises Pvt Ltd (Global Axis) SEZ, "H" Block No. 152 (Sy No. 147,157 & 158), Hoody Village, EPIP Zone, (II Stage), Whitefield, K.R. Puram Hobli, Bangalore – 560066, Karnataka, India
6. PAL, Arpan
Tata Consultancy Services Limited, Building 1B, Ecospace, Plot - IIF/12, New Town, Rajarhat, Kolkata – 700156, West Bengal, India

Specification

DESC:FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
METHOD AND SYSTEM FOR PREDICTING DISTANCE OF GAZED OBJECTS USING INFRARED (IR) CAMERA

Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority from Indian provisional patent application no. 202321034600, filed on May 17, 2023. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to gazed object distance, and, more particularly, to a method and system for predicting distance of gazed objects using an infrared (IR) camera.
BACKGROUND
The human visual system is the most advanced and superior at estimating depth, both in terms of quality and generalization. Eye tracking technology is widely used to study human behavior and patterns in eye movements given specific target stimulus such as videos, web pages, computer games, books, and the like. Human vision is a complex system that makes it possible to receive and process information from an external environment. Applications like augmented reality (AR), virtual reality (VR), and smart wearable technology are gaining popularity as a result of eye tracking.
Eye tracking applications enable human-machine interaction for dynamic object tracking in a video in order to define an area of interest. Various methods have been devised for tracking user eye focus or gaze. Gaze depth estimation has numerous applications in the areas of augmented reality, human machine interaction, scene understanding, optics, scientific research, and analysis. However, a robust and accurate estimation of gaze depth is a very challenging problem. Existing gaze tracking methods identify certain features of the eye positions to compute a gaze direction or gaze point. Such existing gaze trackers are focused on predicting the point of gaze and hardly address analyzing the distance of the gazed object from the gazer or directly classifying the region of focus. Existing systems fail to provide gaze depth estimation and lack robustness and accuracy. Also, existing techniques propose the use of heavy and bulky head mounted hardware which is cumbersome to use and requires additional processing.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system for predicting distance of gazed objects using infrared (IR) camera is provided. The system includes pretraining a gaze predictor ML model to predict the distance of at least one gazed object positioned from the eye of each subject during a systematic execution of a set of tasks. One or more IR images of each eye of each subject are received as input for a fixed duration from a pair of IR cameras configured on either side of a spectacle. Further, one or more pupillary image information from each pupil of each eye is acquired from the one or more IR images. Further, a set of features is extracted from the pupillary information of each pupil, and eye blinks are denoised from the set of features. Further, a distance of a gazed object from the current location of the subject is predicted using the gaze predictor ML model and the set of features. Finally, the gazed object of the subject is classified based on the distance into at least one of a near class, an intermediate class, and a far class.
In another aspect, a method for predicting distance of gazed objects using infrared (IR) camera is provided. The method includes pretraining a gaze predictor ML model to predict the distance of at least one gazed object positioned from the eye of each subject during a systematic execution of a set of tasks. One or more IR images of each eye of each subject are received as input for a fixed duration from a pair of IR cameras configured on either side of a spectacle. Further, one or more pupillary image information from each pupil of each eye is acquired from the one or more IR images. Further, a set of features is extracted from the pupillary information of each pupil, and eye blinks are denoised from the set of features. Further, a distance of a gazed object from the current location of the subject is predicted using the gaze predictor ML model and the set of features. Finally, the gazed object of the subject is classified based on the distance into at least one of a near class, an intermediate class, and a far class.
In yet another aspect, a non-transitory computer readable medium for predicting distance of gazed objects using infrared (IR) camera is provided. The non-transitory computer readable medium pretrains a gaze predictor ML model to predict the distance of at least one gazed object positioned from the eye of each subject during a systematic execution of a set of tasks. One or more IR images of each eye of each subject are received as input for a fixed duration from a pair of IR cameras configured on either side of a spectacle. Further, one or more pupillary image information from each pupil of each eye is acquired from the one or more IR images. Further, a set of features is extracted from the pupillary information of each pupil, and eye blinks are denoised from the set of features. Further, a distance of a gazed object from the current location of the subject is predicted using the gaze predictor ML model and the set of features. Finally, the gazed object of the subject is classified based on the distance into at least one of a near class, an intermediate class, and a far class.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG.1 illustrates an exemplary system (alternatively referred as gazed objects distance prediction system) according to some embodiments of the present disclosure.
FIG.2 is a functional block diagram for predicting distance of gazed objects using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.3 illustrates a flowchart of a process for predicting distance of gazed objects using IR camera images using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.4A illustrates a first stimulus paradigm where a subject gazes at fixed object using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.4B illustrates a second stimulus paradigm where the subject gazes at forward-backward moving object using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.5 illustrates a pupillary signal information before removing eyeblinks and after removing eye blinks from the set of IR eye image features using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.6A illustrates a first feature which is a pupil orientation angle feature extracted from the IR images of each eye using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.6B illustrates signature wavelet representation of the first feature pupil dilation signal feature into intrinsic mode functions using a variational mode decomposition technique when the subject gazes at near, intermediate, and far objects from a left and a right eye using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.6C illustrates decomposition of a second feature associated with the pupil dilation feature, wherein the baseline pupil dilation feature is decomposed, using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.7A illustrates the first stimulus classification performance of near and far distance gazed objects with accuracy and F1 score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.7B illustrates the first stimulus regression performance of near and far distance gazed objects with MAE score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.8A illustrates the first stimulus classification performance of near, intermediate, and far distance gazed objects with accuracy and F1 score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.8B illustrates the first stimulus regression performance of near, intermediate, and far distance gazed objects with MAE score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.9A illustrates the second stimulus classification performance of near and far distance gazed objects with accuracy and F1 score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.9B illustrates the second stimulus regression performance of near and far distance gazed objects with MAE score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.10A illustrates the second stimulus classification performance of near, intermediate, and far distance gazed objects with accuracy and F1 score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.10B illustrates the second stimulus regression performance of near, intermediate, and far distance gazed objects with MAE score using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.11A illustrates a real life application of refractive defects of vision (RFD) correction using focus tunable lenses using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.11B illustrates a real-life application of RFD correction where the subject enters their vision power details before using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.11C illustrates a real-life application of RFD correction where the lens is dynamically changed based on the input diopter values and the gazed object distance predicted using the system of FIG.1, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
The human visual system is a complex system that makes it possible to receive and process information from the external environment. It helps humans to understand and navigate the world as one of the primary modes of interaction. The human visual system carries out a series of processes, right from light rays falling into the eyes till we perceive what we are seeing. Accommodation of the eye is one such important process that aids in focusing on objects present at varying distances. It remains a cornerstone of human visual experience as it enriches our understanding of attention and intention, which is paramount for understanding and predicting human behavior. Accurately understanding and modeling this capacity has numerous implications, particularly in fields like human-computer interaction, virtual and augmented reality, and other accessibility technologies.
But with time, humans lose the capability of maintaining focus on near objects, a condition called presbyopia. Almost everyone experiences some degree of presbyopia, and it usually arises after the age of 40. Presbyopia and the near-sightedness defect are known as refractive defects of vision (RFD). It has a significant impact on an individual's quality of life and emotional state. It becomes challenging for an individual suffering from this condition to function properly in any environment. Hence, there is a need for RFD correction with newer, advanced technological tools and algorithms. In order to correct vision, people largely use bifocals, progressive glasses, or reading glasses. The distance of the gazed entity, if determined, can aid a person by actuating feedback via smart eyewear with dynamic lenses for assisting the person in their vision.
Furthermore, progressives can perform poorly when conducting tasks requiring side-to-side head movement and suffer from astigmatism in the periphery. Reading or computer glasses avoid these issues, but people often avoid them because of the inconvenience of carrying multiple pairs of glasses, or worse, forgetting the other pair. Lastly, monovision and simultaneous-vision contacts fall short when compared to bifocals and single-vision glasses on metrics such as visual acuity, stereo acuity, and near-distance task performance. This has left a critical gap in understanding and technological replication of human depth perception.
One of the promising solutions to the stated problem is the use of dynamic or focus-tunable lenses along with eye tracking technologies. The distance of the gazed entity, if determined using eye tracking, can aid a person by actuating feedback via smart eyewear with dynamic lenses for assisting the person in their vision.
Existing camera-based eye tracking technology available in the market is only able to provide two-dimensional (2D) or XY coordinates of gaze estimation, mapping a vector indicative of the viewing angle onto a 2D plane at a fixed distance. It does not provide an indication of the actual distance of viewing (depth perception).
Embodiments herein provide a method and system for predicting the distance of gazed objects using an IR camera. The system provides a gazed object distance predictor using a pair of IR cameras. The system enables providing a robust, scalable, low-cost gaze predictor ML model to determine the distance of the object of interest and the corresponding actual distance gazed by a subject. The system utilizes an IR camera-based setup without any additional hardware, thereby making it easy to use and applicable for deployment scenarios. It further predicts a likelihood of the subject gazing at at least one of a near distance, an intermediate distance, and a far-off distance by performing a model classification comprising near, far, and intermediate distance levels. The method of the present disclosure employs, for example, a smart glass having a pair of IR cameras positioned on the lenses of the smart glass. The method includes training a gaze predictor machine learning (ML) model utilizing domain-knowledge-based eye features, comprising a pupil orientation feature and a pupil dilation feature, captured through video frames by performing systematic execution of tasks. Further, the trained gaze predictor ML model is utilized to classify the distance of the gazed objects from the current location of the subject based on the set of features.
The system 100 requires only a pair of lightweight, inexpensive IR cameras, resulting in a robust, real-time gaze depth estimation solution that leverages the physiological characteristics of the eyes while fixating at different distances. The method also addresses the problem of refractive defects of vision and proposes the subsequent use of the gaze depth estimation algorithm in correcting this condition. The results in this work establish the robustness of the proposed system in retrieving depth information from eye images. The disclosed system is further explained with the method as described in conjunction with FIG.1 to FIG.11C below.
Referring now to the drawings, and more particularly to FIG.1 through FIG.11C, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.
FIG.1 illustrates an exemplary system 100 (alternatively referred as gazed objects distance prediction system) according to some embodiments of the present disclosure. In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like.
The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a user interface, a tracking dashboard to display performance of the enterprise application, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular and the like. In an embodiment, the I/O interface (s) 106 can include one or more ports for connecting to a number of external devices or to another server or devices.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
In an embodiment, the memory 102 includes a plurality of modules 110 such as a signal decomposition unit 202, an IMF candidate selector unit 204, a signature wavelet extractor 206, and so on as depicted in FIG.2. The plurality of modules 110 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in the process of predicting distance of gazed objects using the IR camera, being performed by the system 100. The plurality of modules 110, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 110 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 110 can be implemented in hardware, by computer-readable instructions executed by the one or more hardware processors 104, or by a combination thereof. The plurality of modules 110 can include various sub-modules (not shown).
Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. Further, the memory 102 includes a database 108. The database 108 stores a first stimulus data and a second stimulus data. In normal real-life scenarios, humans generally have lower vergence angles (the angle made by the two eyes on the gazed object) for near, and it increases with distance. The first stimulus data is collected when the subject gazes at objects at different heights. The second stimulus data is captured from a digitally controlled, motorized, forward-backward moving apparatus which carries a white circular plate with a black fixation dot, where the objects are at the same height. The datastore helps to record all stimulus data for training the gaze predictor ML model 208.
The database (or repository) 108 may include a plurality of IR camera images that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 110. Although the database 108 is shown internal to the system 100, it will be noted that, in alternate embodiments, the database 108 can also be implemented external to the system 100, and communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG.1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to FIG.2 and steps in flow diagrams in FIG.3.
FIG.2 illustrates an architectural overview of the system of FIG.1, in accordance with some embodiments of the present disclosure. The FIG.2 depicts a signal decomposition unit 202, IMF candidate selector unit 204, a signature wavelet extractor 206 and a gaze predictor ML model 208.
The system 100 receives one or more IR camera images as input which are further processed to determine distance of the gazed object from current location of the subject. The one or more IR camera images are fetched by further components of the system 100.
The signal decomposition unit 202 of the system 100 decomposes the given pupil dilation signal into one or more intrinsic mode functions (IMFs), where each IMF is centered around a compact center frequency.
The IMF candidate selector unit 204 of the system 100 selects the IMF that has maximum information pertaining to the gazed depth. This IMF is termed as the signature wavelet.
The signature wavelet extractor 206 of the system 100 extracts or performs the convolution of the signature wavelet with that of each pupil dilation signal or feature collected in runtime to maximize the gazed distance information by suppressing the rest of the unwanted information.
The gaze predictor ML model 208 of the system 100 is pretrained to predict the distance of at least one gazed object positioned from the eye of each subject during a systematic execution of a set of tasks. The gaze predictor ML model 208 is pretrained using training data where a pair of IR cameras, located at either side of the lenses of the spectacle, for example a smart glass, continuously captures a set of input images from a set of video frames. Each IR camera captures continuous images of one of the eyes. Here, a Python-based in-house audio stimulus is created to guide the subjects to gaze at one of three target objects placed at three different distances: a Near (50 cm) class, an Intermediate (150 cm) class, and a Far (300 cm) class. The subjects are required to gaze at each object for a duration of about 3 seconds, followed by an eye-closed state for another 3 seconds. Ten such trials constitute a session. The generated training data is fed into the system to process the inputs received during the testing phase.
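By way of illustration only, a minimal Python sketch of how such a labelled stimulus schedule could be generated is given below; the timing values mirror the paradigm described above, while the function and variable names are hypothetical and not part of the original disclosure.

```python
# Sketch (assumption): labelled trial schedule for the gaze / eyes-closed paradigm.
import random

DISTANCE_CLASSES = {"near": 50, "intermediate": 150, "far": 300}  # target distances in cm
GAZE_DURATION_S = 3.0    # subject gazes at the cued object
REST_DURATION_S = 3.0    # eyes-closed state after each gaze
TRIALS_PER_SESSION = 10  # ten such trials constitute a session

def build_session_schedule(seed=0):
    """Return a list of (start_time_s, event, distance_cm) tuples for one session."""
    rng = random.Random(seed)
    schedule, t = [], 0.0
    for _ in range(TRIALS_PER_SESSION):
        label = rng.choice(list(DISTANCE_CLASSES))               # audio cue names the target
        schedule.append((t, f"gaze_{label}", DISTANCE_CLASSES[label]))
        t += GAZE_DURATION_S
        schedule.append((t, "eyes_closed", None))                # rest segment, unlabelled
        t += REST_DURATION_S
    return schedule

if __name__ == "__main__":
    for start, event, dist in build_session_schedule():
        print(f"{start:5.1f} s  {event:<18}  target distance: {dist}")
```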
FIG.3 illustrates a flowchart of a process for predicting distance of gazed objects using IR camera images using the system of FIG.1, in accordance with some embodiments of the present disclosure.
In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 300 by the processor(s) or one or more hardware processors 104. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG.1 and FIG.2, the steps of the flow diagram as depicted in FIG.3, and a use case example of subjects suffering from presbyopia, where the distance of gazed objects is checked to analyze sight, as in FIG.4. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
Referring to FIG.3 and the steps of the method 300, at step 302 of the method 300, the one or more hardware processors 104 are configured to pretrain a gaze predictor ML model to predict distance of at least one gazed object positioned from eye of each subject during a systematic execution of a set of tasks.
The gaze predictor ML model 208 of the system 100 is pretrained using different training datasets referred to as a first stimulus and a second stimulus. The first stimulus data and the second stimulus data are generated from each subject by performing the systematic execution of the set of tasks where the subject gazes at the object. The data collection process is performed by executing a series of tasks which are detailed in further embodiments.
In normal real-life scenarios, humans generally have lower vergence angles (the angle made by the two eyes on the gazed object) for near, and the angle increases with distance. The first stimulus is designed using this hypothesis, where the object is placed at three different heights and distances to test the effect of angular placement of objects on the distance prediction. To test this further, in the second stimulus the objects are placed at three different distances but at the same height and line of sight from the subject.
Referring now to FIG.4A, the first stimulus data is collected where the subject is seated at a fixed position A and three objects are placed at locations B, C and D at distances of 60 cm (Near), 150 cm (Intermediate) and 300 cm (Far) from the subject, respectively. The objects at these three locations are placed at different heights, emulating a real-world scenario. A timed audio stimulus is played through an in-house-designed kiosk which calls out the object name every five seconds. The subject is supposed to gaze at that object until they hear the next instruction. Five such trials are conducted, with each trial comprising gazing at the three objects independently. A closed-eye session of five seconds follows every trial.
Referring now to FIG.4B, the second stimulus data is collected using a digitally controlled, motorized, forward-backward moving apparatus developed in-house, which carries a white circular plate with a black fixation dot. It is programmed in such a manner that it halts at each of the three distances B, C and D (B, C, D are in the same line of sight) for a fixed duration of five seconds. The participant is supposed to gaze at the fixation dot.
Once the training datasets are collected, they are further utilized to extract a set of features from the pupillary information of each eye image. The set of features comprises a pupil orientation feature and a pupil dilation feature. Firstly, the pupil orientation feature is normalized with respect to a baseline pupil orientation signal obtained during an initial calibration phase when the subject gazes at the object from the current location to the far distance.
Secondly, the training dataset is utilized to extract the pupil dilation signal feature. Initially, the pupil dilation signal is normalized using the baseline pupil dilation signal, wherein the baseline pupil dilation signal is obtained during the initial calibration phase when the subject gazes at the object from the current location to the far distance. Then, the pupil dilation signal is enhanced by a convolution with a signature wavelet, wherein the signature wavelet is estimated during the initial calibration phase when the subject gazes at the object from the current location to the far distance.
Further, the baseline pupil dilation signal is inputted into a variational mode decomposition (VMD) technique which decomposes the baseline pupil dilation signal into discrete modes, where each mode is compact around a center frequency, to obtain an intrinsic mode function (IMF) for each mode as output.
Then, optimization is performed over the intrinsic mode functions (IMFs) to identify an ideal signature wavelet capturing the gaze distance of the object from the current location to the far distance. Finally, the pupil dilation feature is enhanced at runtime, where each pupil dilation feature is subjected to convolution with the signature wavelet to maximize the gazed-depth-related component in the signal and to suppress the rest.
At step 304 of the method 300, the one or more hardware processors are configured to receive one or more IR images of each eye of each subject for a fixed duration as input from a pair of IR cameras configured on either side of a spectacle.
Referring to an example, consider a subject who has presbyopia and requires a low-cost solution for correction using an auto-tunable smart lens. Human eye orientation changes while gazing at different depths. The system 100 receives one or more IR images of each eye of the subject as input, which are further processed in the next embodiment.
At step 306 of the method 300 the one or more hardware processors are configured to acquire one or more pupillary image information from each pupil of each eye from the one or more IR images. For the above example the one or more pupillary image information from each pupil of each eye is acquired from the one or more IR images.
Now, at step 308 of the method 300, the one or more hardware processors are configured to extract a set of features from the pupillary information of each pupil and to denoise eye blinks from the set of features.
Here, for the above example, the pupillary information of each pupil includes a major axis (M) and a minor axis (m), which are obtained from both IR eye video frames separately. A window of duration 300 milliseconds (a fixation duration window) is considered for making a prediction, and this window should not comprise noise.
Referring now to FIG.5, the eye blinks are removed from the one or more IR images. The pupillary signal derived from each IR image by the gaze tracker, considered as x = \{x_i\},\ i = 1, 2, 3, \dots, N, dumps zeroes during the blink portions. This leads to loss of information in the data.
For example, let \hat{x} \leftarrow x, where \hat{x} is a copy of the signal x. Let t be the time instant at which a blink is detected. A buffer window of 150 ms, given by d, defines the region of influence (ROI) of the blink at t in the signal. This ROI is defined by the interval [t-d, t+d], and the samples within it are linearly interpolated as represented in Equation 1,
\hat{x}[t-d,\, t+d] = y_L + (x_j - x_L)\,\frac{y_R - y_L}{x_R - x_L} ----- Equation 1
\forall\ j = 1, 2, 3, \dots, P samples in any ROI in the duration [t-d, t+d]; (x_L, y_L) is the leftmost data point in the ROI and (x_R, y_R) is the rightmost point in the ROI. Further, a standard Savitzky-Golay (Savgol) filter is applied on this new signal \hat{x} as in Equation 2,
\hat{x}_j = \sum_{i=(1-m)/2}^{(m-1)/2} C_i\, \hat{x}_{j+i}, \qquad \frac{m+1}{2} \le j \le N - \frac{m-1}{2} ----- Equation 2
The Savgol filter smoothens the data in the window in which it is applied, in order to avoid sharp points in the ROI. The ROIs in the signal x are now replaced with the corresponding ROI regions from \hat{x} as in Equation 3,
x[t-d,\, t+d] = \hat{x}[t-d,\, t+d] ---- Equation 3
Thus, the signal x is a blink-free signal and can be used for further analysis.
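For illustration only, a minimal Python sketch of this blink-removal step is given below. It assumes a uniformly sampled pupil signal (a 60 Hz rate is assumed here) and that blink instants have already been detected; the function name and parameter values are hypothetical and not taken from the disclosure.

```python
# Sketch (assumption): blink removal by linear interpolation over a +/-150 ms
# region of influence (ROI) around each detected blink, followed by
# Savitzky-Golay smoothing of the ROI (Equations 1-3).
import numpy as np
from scipy.signal import savgol_filter

def remove_blinks(x, blink_indices, fs=60, buffer_ms=150, win=11, poly=2):
    """Return a blink-free copy of the pupil signal x."""
    x_clean = np.array(x, dtype=float)            # working copy of the signal
    d = int(round(buffer_ms * 1e-3 * fs))         # ROI half-width d in samples
    for t in blink_indices:
        lo, hi = max(t - d, 0), min(t + d, len(x_clean) - 1)
        # Linear interpolation between leftmost and rightmost ROI samples (Equation 1)
        x_clean[lo:hi + 1] = np.interp(np.arange(lo, hi + 1),
                                       [lo, hi], [x_clean[lo], x_clean[hi]])
        # Savitzky-Golay smoothing around the ROI to avoid sharp points (Equation 2)
        a, b = max(lo - win, 0), min(hi + win, len(x_clean))
        if b - a > win:
            x_clean[a:b] = savgol_filter(x_clean[a:b], win, poly)
    return x_clean                                # ROIs replaced as in Equation 3
```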
At step 310 of the method 300 the one or more hardware processors are configured to predict a distance of a gazed object from current location of the subject using the gaze predictor ML model and the set of features.
In another embodiment, referring now to FIG.6A, to extract the pupil orientation feature from the pupillary information of each pupil, it is hypothesized that each pupil follows a certain angular orientation, termed angle α, while gazing at different distances. Initially, each pupillary image information is projected onto a 2D plane. When the pupil image is projected onto the image in the XY plane, as shown in FIG.6A, the angle α, which is the angle between the minor axis and the horizontal of the image, varies with the gaze depth or distance. Further, the angle α is computed for each pupillary image information, wherein the angle is the angle between the minor axis and the horizontal component of at least one eye of the gazing subject. Furthermore, the angle α is denoised from the eye blinks, and the sine of the angle α is taken to extract the pupil orientation feature from the angle α.
In another embodiment, referring now to FIG.6B, in real-world scenarios the computed angle α is shown as a boxplot for the near, intermediate, and far classes. The pupil orientation feature sin(α) serves two purposes: (i) firstly, as a feature to classify near and far gazed distances, and (ii) secondly, as a parameter to estimate the vertical component of the pupil size time series. Further, the pupil orientation feature is normalized with respect to a baseline signal for gaze depth estimation. The baseline pupil orientation feature is taken as the far data, which is collected when a participant gazes at the Far distance in the initial calibration phase. The mean µ and standard deviation σ of the baseline pupil orientation signal are stored for normalization of subsequent pupil orientation signals.
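As a rough illustration, the sketch below computes the sin(α) orientation feature per frame and z-normalizes it against the far-gaze baseline using the stored µ and σ; the helper names and the use of degrees for the input angle are assumptions.

```python
# Sketch (assumption): pupil orientation feature sin(alpha), z-normalized
# against the baseline collected while the subject gazes at the Far distance.
import numpy as np

def orientation_feature(alpha_deg):
    """alpha is the angle between the pupil's minor axis and the image horizontal."""
    return np.sin(np.deg2rad(np.asarray(alpha_deg, dtype=float)))

def normalize_with_baseline(feature, baseline_feature):
    """Normalize a runtime feature using the mean/std of the far-gaze baseline."""
    mu, sigma = np.mean(baseline_feature), np.std(baseline_feature)
    return (feature - mu) / (sigma + 1e-9)        # epsilon guards against zero std

# Usage: baseline from the calibration phase (far gaze), then runtime frames
baseline = orientation_feature([5.0, 6.1, 4.8, 5.5])
runtime = orientation_feature([12.0, 11.4, 13.2])
print(normalize_with_baseline(runtime, baseline))
```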
In another embodiment, extracting the pupil dilation feature for each pupil of each eye involves estimation of the vertical component of the pupillary signal, followed by normalization of the signal using a custom-defined signature wavelet as discussed in the next embodiment. Pupillary vertical axis estimation: The pupil image within the IR eye image is usually elliptical, owing to the orientation of the face. It becomes circular when the eye is aligned with the center of the camera. When the pupil shape appears elliptical, the major and minor axes are obtained as the two measures of pupil size. The vertical component of pupil size in the image space is an important feature for gaze depth estimation, which was identified through experimentation. With eye movements, the major and minor axes can dynamically interchange between the vertical and horizontal components of the image space. It is critical to continuously identify which axis is vertical in order to extract the feature. The said problem is solved from an image perspective using the horizontal (H) and vertical (V) components of pupil size, which are an interplay of the minor and major axes, respectively. Here, β is defined as a function of the sine of the angle as in Equation 4,
\beta_i = \sin(\alpha_i) ------- Equation 4
\forall\ i = 1, 2, 3, \dots, N eye image frames. The mapping of the minor (m) and major (M) axes into the H or V time series is carried out as defined in Equation 5,
f(m_i, M_i) = \begin{cases} H_i = m_i,\ V_i = M_i, & \text{if } \beta_i \ge 1 - \beta_i \\ H_i = M_i,\ V_i = m_i, & \text{otherwise} \end{cases} ---- Equation 5
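A small sketch of this axis-to-component mapping is shown below; it assumes per-frame ellipse measurements (minor axis, major axis, angle α) are already available, and it follows the reconstruction of Equations 4 and 5 given above.

```python
# Sketch (assumption): mapping major/minor pupil axes to horizontal (H) and
# vertical (V) components per frame, following Equations 4 and 5.
import numpy as np

def vertical_pupil_component(minor, major, alpha_rad):
    """Return (H, V) time series; V serves as the pupil dilation signal."""
    minor = np.asarray(minor, dtype=float)
    major = np.asarray(major, dtype=float)
    beta = np.sin(np.asarray(alpha_rad, dtype=float))   # Equation 4
    swap = beta >= (1.0 - beta)                          # condition of Equation 5
    H = np.where(swap, minor, major)
    V = np.where(swap, major, minor)
    return H, V
```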
The new time series V is considered as the pupil dilation signal for computing the gaze depth. The baseline signature extraction is described next. Human vision is mainly designed for the far distance; hence, the pupil dilation signal collected when the subject gazes at the Far distance in the initial calibration phase is taken as the baseline pupil dilation signal. The mean µ and standard deviation σ of this baseline pupil dilation signal are stored for normalization of further pupil dilation signals. The baseline signal comprises many hidden frequency components, and the extraction and identification of components of interest is a non-trivial task. The variational mode decomposition (VMD) decomposes each signal x(t) into K discrete modes u_k. Here, each mode is compact around its center frequency ω_k. The technique solves a constrained variational function to search for ω_k and u_k, which is given in Equation 6,
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_k u_k = x ---- Equation 6
The working of the VMD algorithm is as follows. Given the required number of modes K, set the initial parameters u_k^1, ω_k^1, λ^1 and n to 0. Update n with n + 1 and repeat the following steps. For every k = 1 through K, update u_k and ω_k as described in Equation 7 and Equation 8,
u_k^{n+1}(\omega) = \frac{x(\omega) - \sum_{i \neq k} u_i^{n}(\omega) + \frac{\lambda^{n}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k^{n})^2} ---- Equation 7
\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |u_k^{n+1}(\omega)|^2\, d\omega}{\int_0^{\infty} |u_k^{n+1}(\omega)|^2\, d\omega} ---- Equation 8
The above updates are repeated until the following convergence is achieved as in Equation 9,
\sum_{k=1}^{K} \frac{\left\| u_k^{n+1} - u_k^{n} \right\|_2^2}{\left\| u_k^{n} \right\|_2^2} < \epsilon ---- Equation 9
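To make the signature-wavelet step concrete, a hedged Python sketch is given below. It relies on the third-party vmdpy package for the VMD decomposition; the IMF-selection rule (maximum energy) and all parameter values are illustrative assumptions, not the optimization described in the disclosure.

```python
# Sketch (assumption): candidate signature wavelet from the far-gaze baseline
# via VMD, then enhancement of runtime pupil dilation signals by convolution.
import numpy as np
from vmdpy import VMD  # pip install vmdpy (assumed third-party dependency)

def extract_signature_wavelet(baseline_far, K=5, alpha=2000.0, tau=0.0,
                              DC=0, init=1, tol=1e-7):
    """Decompose the far-gaze baseline into K IMFs and return one candidate.
    Maximum-energy selection is used here purely for illustration; the disclosed
    optimization over the IMFs is not reproduced."""
    imfs, _, _ = VMD(np.asarray(baseline_far, dtype=float), alpha, tau, K, DC, init, tol)
    energies = np.sum(imfs ** 2, axis=1)          # energy of each mode u_k
    return imfs[int(np.argmax(energies))]

def enhance_dilation_signal(pupil_dilation, signature_wavelet):
    """Convolve a runtime pupil dilation signal with the signature wavelet to
    emphasise the gaze-depth-related component and suppress the rest."""
    return np.convolve(np.asarray(pupil_dilation, dtype=float),
                       signature_wavelet, mode="same")
```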

Documents

Application Documents

# Name Date
1 202321034600-STATEMENT OF UNDERTAKING (FORM 3) [17-05-2023(online)].pdf 2023-05-17
2 202321034600-PROVISIONAL SPECIFICATION [17-05-2023(online)].pdf 2023-05-17
3 202321034600-FORM 1 [17-05-2023(online)].pdf 2023-05-17
4 202321034600-DRAWINGS [17-05-2023(online)].pdf 2023-05-17
5 202321034600-DECLARATION OF INVENTORSHIP (FORM 5) [17-05-2023(online)].pdf 2023-05-17
6 202321034600-FORM-26 [19-06-2023(online)].pdf 2023-06-19
7 202321034600-Proof of Right [13-07-2023(online)].pdf 2023-07-13
8 202321034600-FORM 3 [21-03-2024(online)].pdf 2024-03-21
9 202321034600-FORM 18 [21-03-2024(online)].pdf 2024-03-21
10 202321034600-ENDORSEMENT BY INVENTORS [21-03-2024(online)].pdf 2024-03-21
11 202321034600-DRAWING [21-03-2024(online)].pdf 2024-03-21
12 202321034600-COMPLETE SPECIFICATION [21-03-2024(online)].pdf 2024-03-21
13 202321034600-Response to office action [22-03-2024(online)].pdf 2024-03-22
14 202321034600-REQUEST FOR CERTIFIED COPY [29-05-2024(online)].pdf 2024-05-29
15 202321034600-REQUEST FOR CERTIFIED COPY [29-05-2024(online)]-1.pdf 2024-05-29
16 202321034600-CORRESPONDENCE(IPO)-(CERTIFIED LETTER)-05-06-2024.pdf 2024-06-05
17 202321034600-FORM 3 [02-07-2024(online)].pdf 2024-07-02