Specification
FIELD OF THE INVENTION
Embodiments of the present invention relate to a system and method for measuring user skilling activity in a particular trade, and more particularly to a system and method for measuring and enhancing user skilling activity using an event-based vision sensor combined with ultrasonic sensors and inertial measurement units.
BACKGROUND OF THE INVENTION
Conventional progress-measuring devices capture videos of individuals practicing a particular trade to identify deficiencies in the manner in which the user performs the desired action. This is typically achieved using frame-based cameras or other high-speed cameras, which not only adds to computational cost but is also a less efficient method of capturing the pose and orientation from all possible angles and corners of the target subject.
While traditional vision-based cameras or frame-based optical sensors are well suited for tasks such as object recognition or other operations associated with frame-based coding, their capability is limited in assessing dynamic tasks such as monitoring, tracking, or position and motion estimation. The main drawback of conventional cameras is the production of a large amount of redundant and unnecessary data that must be captured, transmitted and processed. This high data load slows the reaction time by reducing the time resolution, resulting in increased power consumption and increased size and cost of the overall image capturing system, besides delaying the response to instant red-flag situations in certain life-threatening scenarios. In addition, most image sensors suffer from other artefacts such as limited dynamic range, poor low-light performance and motion blur, and impose high bandwidth and power usage requirements.
In this vein, the present disclosure sets forth a system and method for providing an instant-response capability using event-based vision sensors combined with an inertial measurement unit and ultrasonic sensors that are, in combination, capable of detecting critical factors associated with the user and the object of interest, along with contextual background information, with dedicated computer vision (CV) computing hardware and a dedicated low-power microprocessor for the purposes of detection, tracking, recognition, and/or analysis of users, objects, and scenes in the camera view. This analysis is performed with a quick turnaround time to minimize latency in highly critical trainings where the user needs to respond instantly to deviation alerts that are presented in near-real time by the system of the present disclosure. The system and method embody advantageous alternatives and improvements to existing detection and tracking approaches, may address one or more of the challenges or needs mentioned herein, and provide other benefits and advantages.
OBJECT OF THE INVENTION
An object of the present invention is to provide a system and method for measuring the correctness of a skilling operation using event-based vision sensors that are strategically combined with ultrasonic sensors and inertial measurement units (IMUs) to provide instant feedback and a corrective course of action.
Another object of the present invention is to provide an efficient, fast and accurate system and method for measuring a skilling activity operation using a combination of IMUs, ultrasonic and event-based sensors that tracks the relevant aspects of the skilling operation, such as user motion, while ignoring the uninteresting static aspects of the skilling operation scene.
Yet another object of the present invention is to provide a computationally efficient system and method for measuring the accuracy of a user in performing a skilling operation by making use of an event-based vision sensor to capture motion-related aspects of the scene, ultrasonic sensors to capture relevant distances, and IMUs for orientations and angles, thereby avoiding the transmission of uninteresting scene feature data while retaining all important parameters, to improve overall system performance efficiency.
Yet another object of the present invention is to provide a less data-hungry yet more precise user skilling operation effectiveness measurement system and method, with focus on dynamic regions of interest and/or regions of non-interest to isolate changing events and reduce data from non-interesting events.
Yet another object of the present invention is to provide more robust and seamless simultaneous tracking of the motion of the hand, of the object held in the hand, and of the interaction between them, to identify an event for further analysis only when needed or as otherwise defined and configured.
In yet another object of the present invention, tracking of hand movements along with objects held in the hand may be obtained with less delay (low latency), as only the activated set of pixel data is processed, making it suitable for scenarios where deviations from the optimal need to be instantly reported.
In yet another object of the present invention, the system, using the combination of the aforementioned sensors, provides valuable feedback and real-time error detection, besides recommending corrective postures and movements, thereby enabling fine-tuning of skills and a personalized training and adaptation experience.
SUMMARY OF THE INVENTION
This section is for the purpose of summarizing some aspects of the present invention and to briefly introduce some preferred embodiments. Simplifications or omissions may be made to avoid obscuring the purpose of the section. Such simplifications or omissions are not intended to limit the scope of the present invention.
In a first aspect of the disclosure, a system for measuring a skilling activity in real time using a head mounted device is disclosed. The system comprises: an event-based sensor positioned over the head mounted device and configured to capture asynchronous events associated with a skilling activity, wherein the asynchronous events capture changes in user and object movements associated with the skilling activity. The system further comprises an ultrasonic sensor positioned over the head mounted device and configured to capture slow/static information along with position and distance information associated with the skilling activity. Furthermore, the system comprises an inertial measurement unit positioned over the head mounted device and configured to capture orientation information associated with the skilling activity. Finally, the system comprises a computing module that is configured to: receive and fuse the asynchronous events, slow/static information, distance information and orientation information to obtain a real-time trajectory of the user and object movements; and provide alerts in the event that the change in the user and object movements exceeds a predetermined threshold. In one significant aspect of the disclosure, the computing module is configured to optimize the run-time of the event-based sensor, the ultrasonic sensor and the inertial measurement unit based on a set of predefined parameters.
In a second aspect of the disclosure, a method for measuring a skilling activity in real time using a head mounted device is disclosed, wherein the method comprises: capturing asynchronous events associated with a skilling activity using an event-based sensor positioned over the head mounted device, wherein the asynchronous events capture changes in user and object movements associated with the skilling activity. The next step includes capturing slow/static information along with position and distance information associated with the skilling activity using an ultrasonic sensor positioned over the head mounted device. This is followed by capturing orientation information associated with the skilling activity using an inertial measurement unit positioned over the head mounted device. The received information is fused via a computing module to obtain a real-time trajectory of the user and object movements, and alerts are provided in the event that the change in the user and object movements exceeds a predetermined threshold. Importantly, the run-time of the event-based sensor, the ultrasonic sensor and the inertial measurement unit is optimized by the computing module based on a set of predefined parameters.
In one significant aspect of the disclosure, the predefined parameters for optimizing the run-time of the event-based sensor, ultrasonic sensor and inertial measurement unit comprise the nature of the information, lighting conditions and/or object occlusion.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
These and other features, benefits and advantages of the present invention will become apparent by reference to the following text and figures, with like reference numbers referring to like structures across the views, wherein:
Fig. 1 illustrates an exemplary setting in which a user may be interacting with a head mounted device that incorporates the event-based vision sensor along with ultrasonic sensor and IMUs, in accordance with one embodiment of present disclosure.
DETAILED DESCRIPTION
While the present invention is described herein by way of example using embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments of drawing or drawings described and are not intended to represent the scale of the various components. Further, some components that may form a part of the invention may not be illustrated in certain figures, for ease of illustration, and such omissions do not limit the embodiments outlined in any way. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims.
As used throughout this description, the word "may" is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Further, the words "a" or "an" mean "at least one" and the word "plurality" means "one or more" unless otherwise mentioned. Furthermore, the terminology and phraseology used herein is solely used for descriptive purposes and should not be construed as limiting in scope. Language such as "including," "comprising," "having," "containing," or "involving," and variations thereof, is intended to be broad and encompass the subject matter listed thereafter, equivalents, and additional subject matter not recited, and is not intended to exclude other additives, components, integers or steps.
Likewise, the term "comprising" is considered synonymous with the terms "including" or "containing" for applicable legal purposes. Any discussion of documents, acts, materials, devices, articles, and the like are included in the specification solely for the purpose of providing a context for the present invention. It is not suggested or represented that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention.
In this disclosure, whenever a composition or an element or a group of elements is preceded with the transitional phrase "comprising", it is understood that we also contemplate the same composition, element or group of elements with the transitional phrases "consisting of", "consisting", "selected from the group consisting of", "including", or "is" preceding the recitation of the composition, element or group of elements, and vice versa.
The present invention is described hereinafter by various embodiments with reference to the accompanying drawings, wherein reference numerals used in the accompanying drawing correspond to the like elements throughout the description. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiment set forth herein. Rather, the embodiment is provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
In the following detailed description, numeric values and ranges are provided for various aspects of the implementations described. These values and ranges are to be treated as examples only and are not intended to limit the scope of the claims. In addition, a number of materials are identified as suitable for various facets of the implementations. These materials are to be treated as exemplary and are not intended to limit the scope of the invention.
In accordance with one general embodiment of the present disclosure, the present system and method attempt to overcome the aforementioned deficiencies using event-based vision sensors along with ultrasonic sensors and inertial measurement units (IMUs) that can check the accuracy of a skilling activity in real time without the burden of location constraints and cost. Capturing fleeting moments not customarily captured by commonly available vision or image sensors, without having to deal with large amounts of unnecessary redundant data, is an arduous task. With the event-based vision sensor, only the skilling activity that a user is performing is captured as a highly resolved sequence of asynchronous events, while the static information contained in the background of the user is optimally ignored for the majority of the time, thereby reducing data from non-interesting events.
With advanced event-sensitive sensors, such as dynamic vision sensors (DVS) or event-based sensors, all the dynamic moments performed by the user are instantly and easily captured for processing in a cost-effective manner. The entire user movement is recorded as a continuous stream of spatio-temporal patterns representing the track, trajectory, speed and velocity of the object(s) held in the user's hand or other (moving) objects of interest in a given field of view. The changes in hand movements are observed as incremental changes to a continuous signal that can be analyzed at low computational cost. Further, issues of occlusion by the user's own hands, object recognition and simultaneous localization and mapping can be resolved with the asynchronous behavior of event-recording sensors, wherein a high-speed asynchronous stream of events, i.e., brightness changes in the scene, is captured.
Thus, once the movement of the object in the scene has been detected (i.e., there is a light change), a time-sequenced, pixel-based sparse event data flow for dynamic pixels (i.e., pixels whose brightness has changed) may be outputted. In other words, the pixel information about the event is recorded in chronological order, and the event-based vision sensor responds more naturally to edges, capturing low-level features such as keypoints, corners, etc. Each of the plurality of pixels may detect an event in which incident light becomes brighter or darker than a predetermined threshold value, thereby mimicking the spatio-temporal information extraction of biological vision systems (neuromorphic computing). The output thus obtained is sparse and asynchronous, instead of the dense and synchronous output obtained from conventional frame-based vision sensors.
The event signal is obtained from changes detected per pixel and includes information identifying the active pixel (e.g., address or index of the pixel, or pixel coordinate) and time information (e.g., a timestamp) indicating when the event is detected by the active pixel. The event signal also includes an event type (e.g., polarity information) indicative of the polarity of the brightness change, i.e., whether the brightness increases (positive polarity) or decreases (negative polarity).
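By way of illustration only, and not forming part of the claimed subject matter, the event signal described above may be represented as a simple record; the following minimal Python sketch is provided purely for clarity, and the field names are assumptions rather than a prescribed format:

from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column (address/index of the active pixel)
    y: int          # pixel row
    t: float        # timestamp in seconds at which the brightness change was detected
    polarity: int   # +1 if brightness increased (positive polarity), -1 if it decreased

# the sensor output is then an ordered stream of such sparse, asynchronous records
event_stream = [Event(x=120, y=64, t=0.001002, polarity=+1),
                Event(x=121, y=64, t=0.001010, polarity=-1)]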
The moving hand and the object held in the hand are the active pixels of interest for the event-based vision sensor, which typically generates a surface in space and time that is much less computationally intensive to process. The event data frame may include fixed-size batches, voxel grids representing space-time, or compressed representations, as is generally known. The time information may include information associated with the time at which an event occurs in each of the pixels. For example, the time information may include a timestamp corresponding to the time at which an event occurs in each of the pixels, which also helps in determining whether the object is moving slowly, at an average pace or at a very high speed based on the time range.
For example, when the difference between a timestamp stored in an arbitrary pixel in the event data and a timestamp stored in a pixel adjacent to the arbitrary pixel is small, the object may be moving quickly. In one alternate embodiment, the system is capable of identifying the most probable and likely movements occurring in the human body when performing the skilling activity, such as hand movements. In such scenarios, the system may identify an event signal corresponding to hand movement from among the event signals forming the event data by using a machine learning scheme, for example, a Gaussian mixture model and the like. Likewise, it is possible to identify subtle noise in the overall event data, as an event signal generated a long time ago is highly likely to be a noise signal.
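Purely as an illustrative sketch of the timestamp-based speed inference described above (the pixel pitch value and the per-pixel "time surface" map are assumptions introduced for the example), a smaller timestamp difference between adjacent active pixels translates into a higher apparent speed:

import numpy as np

def apparent_speed(timestamp_map, x, y, pixel_pitch_m=15e-6):
    # timestamp_map[y, x] holds the latest event timestamp per pixel (a "time surface");
    # a smaller timestamp difference between adjacent active pixels implies faster motion.
    dt = abs(timestamp_map[y, x + 1] - timestamp_map[y, x])
    return np.inf if dt == 0 else pixel_pitch_m / dt   # image-plane speed estimate in m/s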
While event-based dynamic vision sensors provide fast, low-latency detection of changes in hand movement by only recording significant shifts, they struggle, when used alone, with accurate distance measurement and with the orientation in which the object of interest is manipulated. This spatial precision and angular information is crucial in training scenarios where hand position is important, such as vocational training scenarios, medical procedures, robotics or fine motor tasks in manufacturing. Further, since event-based sensors respond to changes in brightness alone, their threshold values are dependent on the specific application context and environment conditions (as explained in greater detail below). Furthermore, these event-based vision sensors respond sluggishly to slow-moving objects or static information that may be contained within a training scenario and that may be important in some aspects for context setting.
In the present disclosure, the system 1000 overcomes the aforementioned challenge of using event-based sensors 100 alone by integrating them with ultrasonic sensors 200, which bring in distance measurement and 3D spatial information, along with inertial measurement units (IMUs) 300 to capture orientation data, making the combination robust and powerful for a wide range of applications requiring detection of specific gestures, speeds and movement patterns. As is generally known, the ultrasonic sensors 200 continuously emit sound waves and measure the time it takes for the waves to return, allowing precise distance measurement from the sensor to an object. This provides exact spatial positioning and can work well even in low-visibility environments where event-based sensors are often adversely impacted. The ultrasonic sensors 200, being unaffected by lighting conditions, are useful in dim or irregularly lit training environments.
Further, the ultrasonic sensors 200 can help "fill in" blind spots that event-based sensors might miss, particularly when movements occur out of the line of sight of the event-based sensors. This can ensure continuity in object tracking, even when a hand moves behind an object or partially obscures itself, for example, while playing a shot or performing a yoga posture that requires the user to take his hands/arms out of his line of sight. Accordingly, the system 1000 includes a computing module 400 that fuses data from the event-based sensors 100 and ultrasonic sensors 200 to provide both temporal and spatial movement profiles, allowing users/trainers to study intricate hand dynamics like the acceleration of movement, trajectory curvature and precise hand positioning.
As discussed, event-based sensors 100 and ultrasonic sensors 200 complement each other: the former capture the detailed information contained in fast, precise motion, while the latter help in detecting object position in space, even if the object is slow or static. Now, referring to training scenarios where fine, slow hand movements need to be precisely captured, the combination of event-based and ultrasonic sensors alone may provide limited accuracy. Though the combination can be significantly optimized for providing speedy, accurate, real-time and reliable tracking in a broad range of training environments, there are scenarios of resolution and precision constraints where fine-grained changes in position within small areas are desired for capturing subtle motions or small incremental shifts. This deprives the system of obtaining the context of the movement or of surrounding static features or background elements. Further, subtle gradual movements may be interpreted as noise by these sensors if they fall below the sensitivity or accuracy thresholds and do not trigger significant variations in the measured values.
However, for improved tracking of fine, slow movements and more enriched background awareness, the system 1000 incorporates lightweight and computationally less intensive inertial measurement units (IMUs) 300 that are optimally positioned over the head mounted device 500 along with the ultrasonic 200 and event-based vision sensors 100. These directly complement the above-mentioned combination of event-based and ultrasonic sensors, as they continually track orientation, speed and direction, providing a steady stream of data even during slow or subtle movements.
The fusion of the inertial measurement units 300 with the event-based 100 and ultrasonic sensors 200 helps detect slow movements and minor rotations (from the IMUs), along with distance readings to obtain depth and orientation information (from the ultrasonic sensors). The event-based sensors 100 can then capture fast, transient movements, completing the movement profile. This is explained further in a scenario discussed below.
Next, creating an effective fusion algorithm for combining the event-based sensor data 100 with the ultrasonic sensor 200 and inertial measurement unit 300 involves an additional challenge as the sparse asynchronous output from the event sensor 100 cannot be directly integrated with synchronous distance and positional measurements from the ultrasonic sensors 200 and orientation and direction measurements from the IMUs 300 that operate on fixed time intervals.
To begin, the event stream is first captured from the event-based sensors 100, each event typically represented as a tuple containing pixel coordinates, polarity and a timestamp. Simultaneously, continuous distance data is captured from the ultrasonic sensors 200, providing the distance to the closest object (such as a hand) within a set interval, and angular orientation/direction data is captured from the IMUs 300.
Since the event-based data streams are asynchronous, they need to be matched with the ultrasonic data and IMU data using precise timestamps, as differences in data acquisition speed can lead to small timing mismatches affecting the accuracy of fusion and movement tracking. Applying a buffering mechanism, where the most recent ultrasonic measurement and IMU signal data are paired with events that occurred within a small time window (e.g., 10 ms), is a promising solution. To match all the streams more accurately, linear interpolation is performed on the ultrasonic and IMU data to create "virtual points" for intervals between measurements. This ensures that every event is matched with the closest distance and orientation reading.
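For illustration only, the buffering and interpolation strategy described above may be sketched as follows; this is a minimal example under assumed array inputs and a 10 ms window, not a prescribed implementation:

import numpy as np

def align_streams(event_ts, us_ts, us_dist, imu_ts, imu_angle, window=0.010):
    """Pair each event timestamp with a 'virtual' ultrasonic and IMU reading.

    event_ts, us_ts, imu_ts : 1-D numpy arrays of timestamps (seconds)
    us_dist, imu_angle      : readings sampled at us_ts / imu_ts respectively
    window                  : buffering window (e.g. 10 ms) outside which events are dropped
    """
    # linear interpolation creates "virtual points" between the fixed-rate measurements
    dist_at_event = np.interp(event_ts, us_ts, us_dist)
    angle_at_event = np.interp(event_ts, imu_ts, imu_angle)

    # keep only events that fall within `window` of at least one real ultrasonic sample
    nearest_gap = np.min(np.abs(event_ts[:, None] - us_ts[None, :]), axis=1)
    valid = nearest_gap <= window
    return event_ts[valid], dist_at_event[valid], angle_at_event[valid]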
The event data obtained above is highly granular, while the ultrasonic sensors and IMUs offer relatively lower-resolution spatial and orientation data. Further, the data may be noisy depending on the training environment in which it is captured, due to environmental interference that can create artefacts leading to inaccurate movement analysis. Moreover, event-based sensors can generate a large volume of data during rapid movements, increasing processing demands, especially when combined with the ultrasonic and IMU data.
Thus, the data is now subjected to pre-processing, which may involve grouping events within a fixed time window to reduce data noise and focus on significant movement patterns. Thereafter, spatial filtering may be applied to remove isolated events (which could be noise). Finally, derived features such as velocity or direction may be computed if the event pattern represents continuous motion.
Likewise, the distance and position data obtained from the ultrasonic sensors, along with the IMU data, are smoothed using a moving average or low-pass filter to reduce noise. Then, features such as the rate of change in distance (velocity) or acceleration, along with speed/orientation, may be computed, which could indicate approach or retreat of the hand. The data pre-processing step is followed by a feature extraction process wherein the spatial density (of the most active pixels) and the speed and direction of movement are captured based on event clusters over time. Similarly, the average distance to the object of interest (such as the hand) within the time window, along with speed/velocity, acceleration and orientation, is captured.
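The pre-processing and smoothing steps described above may, for example, be sketched as follows; the bin width and kernel size are assumptions chosen for illustration:

import numpy as np

def group_events(event_ts, window=0.010):
    # group asynchronous event timestamps into fixed 10 ms bins and count events per bin;
    # high counts indicate significant movement within that window
    bins = np.floor(np.asarray(event_ts) / window).astype(int)
    return np.bincount(bins - bins.min())

def moving_average(signal, k=5):
    # simple low-pass smoothing of ultrasonic distance or IMU angle samples
    kernel = np.ones(k) / k
    return np.convolve(signal, kernel, mode="same")

def velocity(distance, ts):
    # rate of change of distance, indicating approach or retreat of the hand
    return np.gradient(distance, ts)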
It is important to note that, instead of directly fusing raw sensor data, fusion is performed on features such as position, velocity, distance, acceleration, orientation and speed to simplify the fusion process and improve overall interpretability. Now, a weighted average of the above extracted features, such as position, velocity, acceleration, orientation, angle, etc., may be derived. In one exemplary embodiment, this may be achieved using a Kalman filter that treats the event-based velocity estimates as observations and the ultrasonic distance along with the IMU data as measurements. This provides a smooth, continuous estimate of hand position, orientation and velocity. In tandem, confidence scores may be generated for each of the event-based, ultrasonic and IMU sensors based on noise levels and anomalies, giving higher weight to readings with higher confidence. Based on the fused features, a real-time trajectory of the hand's movements may be obtained and continually updated.
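A minimal one-dimensional constant-velocity Kalman filter along the lines described above is sketched below for illustration only; the noise values (which play the role of the confidence weighting) and the time step are assumptions, not claimed parameters:

import numpy as np

def kalman_fuse(z_dist, z_vel, dt=0.01, q=1e-3, r_dist=1e-2, r_vel=5e-2):
    """z_dist: ultrasonic distance measurements; z_vel: event-derived velocity observations."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition over [position, velocity]
    H = np.eye(2)                            # both position and velocity are observed
    Q = q * np.eye(2)                        # process noise
    R = np.diag([r_dist, r_vel])             # measurement noise (lower value = higher confidence)
    x = np.array([z_dist[0], 0.0])
    P = np.eye(2)
    trajectory = []
    for d, v in zip(z_dist, z_vel):
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the fused measurement vector [distance, velocity]
        z = np.array([d, v])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        trajectory.append(x.copy())
    return np.array(trajectory)              # smoothed [position, velocity] per time step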
The computing module 400 may provide threshold-based alerts, wherein feedback is triggered if certain movement thresholds are crossed (e.g., a minimum distance or an optimal velocity range). In one systemic approach, the computing module 400 makes use of a neural network, such as a recurrent neural network (RNN), to recognize specific movement patterns based on the fused data and classify the movement as correct/incorrect. In this context, the RNN can learn temporal patterns in hand movements based on labelled training data. Accordingly, an input layer of the RNN may accept a sequence of feature vectors. Then, RNN layer(s) such as Long Short-Term Memory (LSTM) can capture temporal dependencies in the skill movement. Finally, the output layer may use a softmax or sigmoid layer to classify each sequence as correct/incorrect.
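One possible realisation of the input/LSTM/sigmoid arrangement described above is sketched below using Keras, for illustration only; the sequence length, number of features and layer width are assumptions:

import tensorflow as tf

SEQ_LEN, N_FEATURES = 50, 6   # e.g. 50 time steps of [position, velocity, distance, acceleration, angle, speed]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, N_FEATURES)),      # input layer: a sequence of fused feature vectors
    tf.keras.layers.LSTM(64),                          # captures temporal dependencies in the skill movement
    tf.keras.layers.Dense(1, activation="sigmoid"),    # classifies each sequence as correct (1) / incorrect (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_sequences, train_labels, epochs=10)  # labelled training data as described above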
In the next noteworthy embodiment, as the system 1000 integrates event-based sensors 100, ultrasonic sensors 200 and IMUs 300, it is extremely important to balance the computational cost that comes with accommodating the additional sensors (ultrasonic and IMUs) alongside the event-based sensors. Besides, it is important to optimize the usage of these sensors against some critical parametric considerations, namely the nature of the information to be extracted, lighting conditions, and object occlusions/line of sight, as discussed now.
This necessitates selective activation/deactivation of the event-based and ultrasonic sensors based on the nature of the information (static/dynamic), lighting conditions and occlusions, while the IMUs are active at all times. Thus, the event-based sensors (EBS) 100 are activated when dynamic events occur, such as when a significant movement or gesture is detected. However, if the user is stationary, or if the position is stable with no adjustments or movements required, the EBS 100 can be turned off to save computational resources.
For the ultrasonic sensors 200, if the object of interest is in close proximity to the user, these sensors are activated to actively monitor the distance at critical moments, while they can be turned off until a meaningful interaction between the user and the object is found. Simply put, both the EBS 100 and the ultrasonic sensor 200 can be turned on before the intended training action is performed to monitor distance and activate tracking of movements, kept on during the process to provide real-time feedback, and deactivated when the user is in an idle state.
Next, lighting conditions are importantly considered for effective performance of the sensors, as the system 1000 must be adjusted to handle variations in light levels while maintaining optimal sensor performance. Lighting conditions are known to affect the performance of the EBS 100, because highly bright environments might cause the sensor to be overwhelmed by too many "events", while very low light levels might reduce the sensor's sensitivity. Thus, under stable light conditions, the EBS 100 can be set to an adaptive mode that adjusts its sensitivity to the lighting conditions. It can be deactivated under excessively bright light, while readings from the ultrasonic sensors 200 can be used for distance estimation and movement approximation.
Thirdly, sensor usage must be optimized in the event of occlusions to mitigate their effects. For example, the EBS 100 are heavily impacted by object occlusion, as they lose track of movements, leading to inaccurate data. In low-light conditions, the EBS 100 are more sensitive to small changes, so the system should continue to monitor dynamic movements even if parts of the object or user are partially occluded. The sensitivity of the EBS 100 can be increased to detect small movements, or the system can switch to an alternate feedback mechanism such as a proximity alert from the ultrasonic sensor.
For bright light conditions, the EBS 100 can be activated only when significant changes are detected in the unoccluded parts of the scene. If an occlusion happens, the sensor may reduce its sensitivity to avoid unnecessary data capture from ambient light changes, but can still track large gestures that occur outside the occluded area. Thus, when the occlusion occurs, the EBS 100 can be temporarily deactivated to avoid gathering misleading data.
During this time, the system 1000 can rely on the ultrasonic 200 and IMU 300 readings, which provide feedback on the object's position with respect to the user and help approximate motion changes. However, post-occlusion, the EBS 100 can be re-activated to resume full tracking of the user and object. By implementing the above choices, the system 1000 can operate effectively under various conditions, including dynamic movements, lighting fluctuations and object occlusions.
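By way of illustration of the selective activation/deactivation policy described above, a simple rule-based sketch is given below; the brightness threshold and the function interface are assumptions introduced only for the example:

def sensor_policy(motion_detected, object_near, lux, occluded):
    """Illustrative activation policy; the thresholds are assumptions, not claimed values."""
    state = {"imu": True}                       # IMUs remain active at all times
    too_bright = lux > 10000                    # assumed saturation level for the EBS
    state["ebs"] = motion_detected and not occluded and not too_bright
    state["ultrasonic"] = object_near or occluded or too_bright
    return state

# example: user idle, normal light, no occlusion -> only the IMU stays on
print(sensor_policy(motion_detected=False, object_near=False, lux=500, occluded=False))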
In the present disclosure, one exemplary embodiment for determining correct skill and user action performance is demonstrated, whereby the occurrence of the user motion in the preset skilling environment is detected and measured, and corrective action is recommended, based on the combination of the event signal outputs from the IMUs 300, ultrasonic sensors 200 and event-based vision sensor 100. Here, skilling may refer to vocational skills such as welding, spray painting, fire and safety training, building and construction, craftsman training, fine arts and design, etc.
This is followed by a step of determining whether the motion is a valid motion, as the preset skilling environment corresponds to the desired user/trainee motion; identifying and recognizing a static pattern of the user/trainee motion in the skilling environment based on spotting an object of interest in the static pattern; and selecting, in advance, a model pattern containing a corresponding object of interest. Finally, the similarity with the corresponding object of interest in the selected model pattern is compared, and the suitability of the user's motion is determined based on the comparison result.
In one example embodiment, the determination of valid user motion is based on a motion pattern classifier which is previously learned by dividing a dynamic video of the user performing the skilling activity into a plurality of steps, with a step-by-step check according to the partial motion(s) constituting the entire skilling activity determining whether the operation is a valid operation.
In one exemplary embodiment, the partial motions in a skilling activity such as a welding operation may comprise: preparing the objects to be welded together, which may involve ridding the metals of unwanted grease, dust and rust and filing a slanting edge out of the metal side; selecting and wearing proper safety gear, including safety gloves, safety shoes and safety glasses, to protect one's eyes and other body parts from being damaged by the fierce glare emanating from the metal whilst welding; setting up the work area to keep the workpiece and welding paraphernalia on the welding table and clearing the floor of any flammable substance; securing the workpiece; striking the arc by touching the workpiece with the electrode for a few seconds and lifting it back; maintaining correct rod position and movement such that it traverses a straight line; forming the bead with refined arc movement; and finally cleaning and painting the weld.
These partial motions are recorded as reference motions with which the user's skilling operation needs to be compared, in accordance with, though not limited to, the welding example. The user's posture, viewing angle, the angle at which the welding rod is held, the rod motion and the hand movements are some vital parameters that are important for achieving a smooth and optimal welding experience, as discussed later.
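For illustration only, the reference motions of the welding example could be stored as simple ordered records; the parameter names and values below are assumptions introduced purely to show one possible structure of such a reference database:

reference_motions = [
    {"step": 1, "name": "prepare workpieces",        "params": {"surface_clean": True}},
    {"step": 2, "name": "wear safety gear",          "params": {"gloves": True, "glasses": True, "shoes": True}},
    {"step": 3, "name": "set up work area",          "params": {"floor_clear_of_flammables": True}},
    {"step": 4, "name": "secure workpiece",          "params": {}},
    {"step": 5, "name": "strike the arc",            "params": {"contact_then_lift": True}},
    {"step": 6, "name": "maintain rod position",     "params": {"path": "straight line"}},
    {"step": 7, "name": "form the bead",             "params": {"arc_movement": "refined"}},
    {"step": 8, "name": "clean and paint the weld",  "params": {}},
]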
The identification and recognition of the static pattern in the skilling environment may include extracting a static pattern of the user's motion based on an event signal generated within a predetermined threshold time range and matching the user's motion with the model pattern. In one example embodiment, a reference model may be selected for each of the (partial) user motions against which the user action is compared.
The model pattern may be created using the welding postures and welding activity of professional welders and by extracting a static image from these videos, which may be an image output from an event-based vision sensor 100, or an image captured by a conventional camera, or a converted image. Such a reference model database may also include information such as the physical condition of the welding object, the welding motion, the rod position and other characteristically defining features of the welding operation. In addition, the reference model may be extracted from a database comprising one or more best images of seasoned welders proficient in performing the welding operation.
Once the user skilling environment and the reference model are selected and the user starts the welding operation, the user's position, posture, angle of holding the welding gun, and movement of the gun and hands, as designated from the event signal output, are displayed. That is, when the combination of the IMUs 300, ultrasonic sensors 200 and event-based vision sensor 100 detects the movement of the user's hand along with the welding gun, it collects the event signal and transmits it to the system, where the computing module 400 analyses the received signal and determines whether the user motion occurs as per the reference model extracted from the reference model database.
When the occurrence of the user's motion in the skilling environment matches that of the reference model, it may be determined that the detected motion is a valid motion. The validity judgment is made using a pre-trained motion pattern classifier obtained by dividing various videos of professional users, in which the constituent motions of a specific welding operation (for example, striking the arc) are recorded, into a plurality of steps and step-by-step check areas according to partial motions. In this case, the motion pattern classifier may include information about an object of interest for each inspection area. Since image learning and pattern classification are already known methods, detailed descriptions are omitted.
That is, the motion pattern classifier is used to determine whether the motion detected in the skilling environment, such as the user posture and position along with the welding gun position and orientation, corresponds to the previously-checked motion. If it is determined that the user's operation is still in the initial ready state or is incorrectly performed, the operation is determined to be invalid and the next step does not proceed.
If the determination result is a valid operation, the static pattern of the user's motion in the user's position and posture may be extracted. The static pattern extraction step extracts a static pattern of the user's motion based on an event signal generated during a predetermined threshold time out of the movements within the time range in which the valid motion is detected, and selects the static pattern of the model pattern, extracted from the reference model database, that best matches the user's motion. Here, the static pattern may correspond to a static image extracted from a video and may include information about an object (for example, the shape of an object or its position in the image) in the static image.
When the static pattern is extracted from the user's motion, the object of interest (such as a welding table or welding floor) in the extracted static pattern may be identified. In this case, the object of interest to be extracted from the static pattern may be determined using the motion pattern classifier or the model pattern. Alternatively, the user may directly determine the object of interest. The object of interest may also include a part of the user's body, for example, a hand, an arm, a shoulder, a leg, or the like, and may include a welding table or welding floor or the like.
The object of interest may be specified in advance in the motion pattern classifier, or may be arbitrarily designated by the user, and is not limited to this embodiment. When the object of interest is determined, the object of interest may be identified in the static pattern extracted from the user's motion based on the determined object of interest. Since identifying the object of interest in the image is a known method, description thereof will be omitted.
If the object of interest is identified in the static pattern extracted from the user's motion, the similarity with the object of interest in the model pattern may be compared to determine the suitability of the user's motion (for example, in an arc generation step). Next, the shape, movement and orientation of both hands, which are objects of interest in the user's motion skilling environment, may be compared with the shape of both hands in the model pattern. By comparing the user's object of interest with the model pattern's object of interest, it may be determined whether or not the operation is proper according to whether the shape similarity is within a certain threshold range. The object of interest in the model pattern may be extracted in advance when building the model database.
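The similarity threshold test described above may be realised in many ways; the sketch below uses cosine similarity between feature descriptors purely as one illustrative choice, with the descriptor format and threshold being assumptions:

import numpy as np

def is_proper_operation(user_descriptor, model_descriptor, threshold=0.9):
    # cosine similarity between shape/pose feature vectors of the object of interest;
    # the operation is deemed proper if the similarity falls within the threshold range
    u = np.asarray(user_descriptor, dtype=float)
    m = np.asarray(model_descriptor, dtype=float)
    similarity = np.dot(u, m) / (np.linalg.norm(u) * np.linalg.norm(m))
    return similarity >= threshold, similarity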
The similarity determination result between the objects of interest may be fed back to the user through the user interface of the head mounted device 500. For example, a beep sounds through the user interface of the head mounted device in real time when it is determined that the operation is not proper, whereas, when it is determined that the correct operation is performed, a notification (e.g., a correct mark) is provided through the user interface in real time and a replay of the correct operation is presented on the user's head mounted device 500.
In one preferred embodiment, recurrent models (like Long Short-Term Memory) can serve as the classifier, as the focus here is on sequential patterns in the welding operation to create a robust training scenario. The LSTM is configured to capture sequential dependencies, such as movement stability and consistency in torch speed, angle and distance over time. Accordingly, all sensor data is synchronized with accurate timestamps to create a uniform input sequence for the LSTM. Then, features are extracted from the data and the time series is divided into overlapping or non-overlapping windows (e.g., 2-3 seconds per window) to create input sequences for the LSTM. Each real-time sequence is fed through the trained LSTM model to classify the motion pattern in the current window. If the LSTM detects a suboptimal motion pattern, corrective feedback is triggered.
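For illustration only, the windowing of the synchronized feature time series described above may be sketched as follows; the window and step lengths are assumptions taken from the 2-3 second example:

import numpy as np

def make_windows(features, ts, window_s=2.0, step_s=1.0):
    """Slice synchronized features (rows aligned with timestamps ts) into overlapping windows."""
    windows, t0 = [], ts[0]
    while t0 + window_s <= ts[-1]:
        mask = (ts >= t0) & (ts < t0 + window_s)
        windows.append(features[mask])
        t0 += step_s
    return windows

# each window would then be padded/resampled to a fixed length and passed through the
# trained LSTM; a "suboptimal" prediction for the current window triggers corrective feedback.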
Thus, for an immersive AR/VR experience, the visual content of the AR/VR cannot have a high latency. The time between a photon capture due to a change in the scene and the corresponding change in AR/VR (i.e., lag or latency) should not be more than, e.g., 5 ms for AR or 20 ms for VR. Higher lag times reduce a user's AR/VR experience and can even lead to motion sickness. For end-use cases like training, where the user may have to wear the head mounted device 500 for a prolonged period of time, motion artefacts have to be actively compensated for. With the use of the combined sensor input, low-latency pose estimations may be obtained, as discussed in the working principle of the combined sensor performance below, with the role of each sensor distinctly identified and defined.
For example, in a typical virtual welding training, precision, control and adherence to correct technique are critical, and fusing the IMU 300, ultrasonic 200 and event-based sensors 100 can enhance real-time tracking and feedback for a realistic, effective training experience. At the beginning, the user/trainee may be required to hold and position a VR welding torch, which may require a correct orientation, distance and stability to start. The IMUs 300 configured on the head mounted device 500 detect the angle and orientation of the torch, ensuring it is positioned correctly. The ultrasonic sensors 200 then confirm the torch's proximity to the virtual workpiece, which may be quickly monitored for any sudden, unsteady movements by the event-based sensors 100 as the user/trainee brings the torch into position.
As commonly understood, a key skill in welding is maintaining a consistent angle and distance from the workpiece for an even weld bead. The IMUs 300 monitor the torch's angle to ensure it remains consistent, avoiding common errors like wavering or tilting the torch too far. The ultrasonic sensors 200 may then track the distance between the torch and the virtual weld path, providing immediate feedback if the trainee deviates from the optimal range. The event-based sensors 100 may then capture rapid or jerky movements, helping to reinforce a smooth and stable technique.
Now, moving the torch at a consistent speed is essential for a uniform weld. The IMUs 300 and ultrasonic sensors 200 work together to measure both the forward speed and positional stability along the weld path. If the ultrasonic sensor 200 data indicates that the torch is moving too close to or too far from the workpiece, the system 1000 can prompt the user to adjust. The event-based sensors 100 may then detect any acceleration spikes, alerting the user if they are moving too quickly or erratically.
Further, the position of torch determines factors like heat distribution and penetration, impacting weld quality. The IMU data 300 captures subtle changes in wrist orientation, ensuring the torch angle remains stable. Ultrasonic data 200 verifies distance to avoid common issues like excessive heating or undercutting, while event-based sensors 100 provide high speed feedback on any sudden changes in position, which could affect weld quality.
Furthermore, certain welding positions require precise technique to control gravity's effect on molten metal. In such scenarios, the IMUs 300 detect hand orientation to verify that the user holds the torch correctly for vertical or overhead welding, reducing the risk of molten metal dripping. The ultrasonic sensors 200 track the distance from the workpiece, while the event-based sensors 100 alert on sudden hand movements, providing immediate feedback to prevent positioning errors.
Thus, by integrating the IMUs 300, ultrasonic 200 and event-based sensors 100, VR welding training can deliver a comprehensive hands-on learning experience where real-life risks can be simulated, and the above combination of sensors can speedily detect fast, risky movements that might lead to tool slips or other unsafe behaviours in actual welding. This setup provides detailed tracking and instant feedback on key aspects of welding, allowing the user to master techniques safely and effectively in a controlled virtual environment.
Next, the disclosure presents the fusion of the IMU 300, ultrasonic 200 and event-based sensors 100 for the aforementioned scenario. At first, the sensors are initialized and the optimal parameter range(s) are defined. For example, the distance threshold (D) of an object, say the torch distance range in this case, is defined (e.g., 0.5 to 1.5 cm). Thereafter, an angle threshold (A) is determined (e.g., 45 degrees in this case). Likewise, a speed threshold (S) is determined to identify the optimal movement speed (e.g., the torch movement speed being set as 5-10 cm/s). These variables are initialized to log metrics for distance, angle and speed deviation.
Now, the data from all the sensors is captured. For example, the current distance 'd' between the object and the hand (e.g., the torch and the metal in this case) is measured. Similarly, the current angle 'θ' and speed 'v' of the object movement are noted. Critical events or unusual (welding) behaviours (e.g., overheating, sparks, or unsteady motion) are monitored using the event-based vision sensors 100.
Next, with the distance measured using the ultrasonic sensor 200, the following observations and recommendations are made:
If d < Dmin (too close):
Then trigger feedback for "torch too close" and set distance deviation flag.
Else if d > Dmax (too far):
Then trigger feedback for "torch too far" and set distance deviation flag.
Else (distance is optimal):
Set distance status to "optimal"
Likewise, evaluate the angle using the IMU, following the below instructions:
If θ ∉ A ± ΔA (angle deviates beyond threshold)
Then trigger feedback for "incorrect torch angle" and set angle deviation flag.
Else (angle is optimal)
Set angle status to "optimal"
Finally, evaluate the speed using the IMU and event-based data, and
If v ∉ S (speed deviates from the optimal range)
Then trigger feedback for "incorrect torch speed" and set speed deviation flag.
Else (speed is optimal)
Set speed status to "optimal"
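For illustration only, the above rule-based checks may be consolidated as follows; the threshold values are taken from the examples given above, while the angle tolerance ΔA and the function interface are assumptions:

def check_step(d, theta, v, D=(0.5, 1.5), A=45.0, dA=5.0, S=(5.0, 10.0)):
    """Consolidated rule-based check for distance d (cm), angle theta (deg) and speed v (cm/s)."""
    feedback = []
    if d < D[0]:
        feedback.append("torch too close")          # distance deviation flag
    elif d > D[1]:
        feedback.append("torch too far")            # distance deviation flag
    if abs(theta - A) > dA:
        feedback.append("incorrect torch angle")    # angle deviation flag
    if not (S[0] <= v <= S[1]):
        feedback.append("incorrect torch speed")    # speed deviation flag
    return feedback or ["all parameters optimal"]

print(check_step(d=1.0, theta=47.0, v=12.0))        # -> ['incorrect torch speed']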