Method And System For Classifying Data Associated With An Object

Method And System For Classifying Data Associated With An Object Manoeuvre

Abstract: A method and system of classifying data associated with an object manoeuvre is disclosed. The method comprises obtaining 110, as input, data for an object manoeuvre, wherein the data is obtained at a plurality of locations of the object when the object is in motion; selecting 120 data associated with at least one area of interest from the plurality of locations of the object, wherein the data is an imbalanced set; pre-processing 130 the selected data by generating a class balanced dataset from the selected data and classifying the data into a pre-defined class/category; and generating 140 a class balanced dataset from the pre-processed data. By performing the above steps, a class balanced dataset is obtained from a class imbalanced dataset and can be utilized to classify incoming data. Figure to be published: Figure 1A

Patent Information

Application #

Filing Date

12 April 2024

Publication Number

16/2024

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

INDIAN INSTITUTE OF SCIENCE

C V Raman Avenue, Bangalore 560012, Karnataka, India

Inventors

1. RATHNA G N

INDIAN INSTITUTE OF SCIENCE, C V Raman Avenue, Bangalore 560012, Karnataka, India

2. PRASHANT C MURAL

INDIAN INSTITUTE OF SCIENCE, C V Raman Avenue, Bangalore 560012, Karnataka, India

3. VIRAT BHOLA

INDIAN INSTITUTE OF SCIENCE, C V Raman Avenue, Bangalore 560012, Karnataka, India

Claims

1. A method of classifying data associated with manoeuvre of an object, the method comprising: - obtaining 110, as input, data associated with manoeuvre of an object, wherein the data is obtained from a plurality of locations of the object when the object is in motion; - selecting 120 data associated with at least one area of interest from the plurality of locations of the object, wherein the data is an imbalanced set; - pre-processing 130 the data by generating a class balanced dataset from the data, and classifying the data into a pre-defined class/category; and - generating 140 a class balanced dataset from the pre-processed data.

2. The method as claimed in claim 1, wherein the data is obtained from a plurality of sensors placed on the object.

3. The method as claimed in claim 2, wherein the plurality of sensors is configured to: - continuously record data associated with the plurality of locations during motion of the object; - store the data in a repository for further processing, wherein further processing includes pre-processing the data.

4. The method as claimed in claim 3, wherein the data obtained is classified to more than one pre-defined class/category.

5. The method as claimed in claim 4, comprising: - selecting 112 a data sub-set from the pre-defined class/category, wherein the data sub-set comprises anomalies within the pre-defined class; - performing 114 an operation on the data sub-set comprising anomalies; - obtaining 116 a class balanced dataset; and - storing 118 the class balanced dataset in a repository.

6. The method as claimed in claim 1, wherein a module (AL) is provided with the class balanced dataset and is trained with the dataset.

7. The method as claimed in claim 6, wherein on receiving new input data from a manoeuvre, the module is configured to automatically classify the data into a pre-defined class by removing duplicate data.

8. A system for classifying data associated with a manoeuvre of an object, the system comprising: - a data extraction module 310 configured to obtain input data associated with a manoeuvre of an object, wherein the data is obtained from a plurality of sensors placed at various locations of the object when the object is in motion; - a computing module 320 configured to select data associated with at least one area of interest from the plurality of locations of the object, wherein the data is an imbalanced set; - a first module 330 configured to pre-process the selected data by generating a class balanced dataset from the selected data, and classifying the data into a pre-defined class/category; - a second module 340 configured to generate a class balanced dataset from the pre-processed data; and - a repository 350 configured to store the class balanced dataset.

9. The system as claimed in claim 8, wherein the data is obtained from a plurality of sensors placed on the object.

10. The system as claimed in claim 9, wherein the plurality of sensors are configured to: - continuously record data associated with the plurality of locations during motion of the object; - store the data in a repository for further processing, wherein further processing includes pre-processing the data.

11. The system as claimed in claim 10, wherein the data obtained is classified to more than one pre-defined class/category.

12. The system as claimed in claim 8, wherein the computing module is configured to: - select a data sub-set from the pre-defined class/category, wherein the data sub-set comprises anomalies within the pre-defined class; - perform an operation on the data sub-set comprising anomalies; - obtain a class balanced dataset; and - store the class balanced dataset in a repository.

13. The system as claimed in claim 8, wherein the trained classifier module is configured to receive a class balanced dataset and the active learning module is trained with the dataset.

14. The system as claimed in claim 8, wherein, on receiving new input data from a manoeuvre, the trained classifier module is configured to automatically classify the data into a pre-defined class by removing duplicate data.

15. A device for classifying data associated with a manoeuvre of an object, the device comprising one or more processors and a memory, the device configured to perform the method steps of any of the claims 1-7. Dated this 12th day of April 2024 Indian Institute of Science By their Agent & Attorney Dr. Eric W B Dias/Reg No 1058 of Khaitan & Co

Specification

Description:TECHNICAL FIELD
[0001] Embodiments of the present disclosure relate to classification of data associated with a manoeuvre of an object and, more specifically, to obtaining data from a manoeuvre of an object, selecting data related to an area of interest, pre-processing and classifying the selected data, and creating a dataset for further classification of other manoeuvres to detect anomalies in object manoeuvres.

BACKGROUND
[0002] Vehicles, including airborne vehicles, robots, water vehicles, automobiles, etc., involve various actions that are performed to manoeuvre them in their respective environments. Typically, manoeuvring these vehicles involves performing actions in a short span of time and requires considerable training before a person can perform these actions within a given time. Such actions performed on the vehicles may be referred to as manoeuvres. Due to the complexity of the dynamics involved in manoeuvres and the complex structure of vehicles, operational data associated with vehicles during their operations and manoeuvres are collected and analysed to identify potential operational hazards and anomalies.
[0003] Normally, multiple sensors are placed at various locations of the vehicles, and data, including data related to the manoeuvring of these vehicles, may be collected from these sensors for identifying errors or anomalies during manoeuvring of the vehicle. However, manually tracking and tracing the disparate data collected by the various sensors placed in these vehicles to identify potential errors or anomalies related to the manoeuvres is computationally complex, resource-intensive and time-consuming, particularly because such analysis must be performed on large amounts of class imbalanced data.
[0004]
[0005] Accordingly, there is a need to ensure use of a balanced dataset to ameliorate the disadvantages that may be found in the state of the art.

SUMMARY
[0006] Embodiments of the present disclosure relate to classifying data associated with a manoeuvre of an object. In an embodiment, the method includes obtaining, as input (i.e., receiving), data associated with an object manoeuvre, wherein the data is obtained from a plurality of locations of the object when the object is in motion. In an exemplary embodiment, data may be obtained from sensors placed on the object. A further embodiment includes selecting data associated with at least one region/area of interest of the object from a plurality of locations of the object. In an embodiment, the raw data obtained from the object is an imbalanced dataset. In an embodiment, the selected data is pre-processed by generating a class balanced dataset from the selected data and classifying the data into a pre-defined class/category. In an embodiment, a class balanced dataset is generated from the pre-processed data. In a further embodiment, the data obtained is classified into more than one pre-defined class/category. In a further embodiment, a module (AL) is provided with the class balanced dataset and trained with the dataset. In a further embodiment, a module as disclosed herein, on receiving new input data from a manoeuvre, is configured to automatically classify the data into a pre-defined class by removing duplicate data. Other embodiments are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The detailed description is described with reference to the accompanying figures. Features, aspects, and advantages of the subject matter of the present disclosure will be better understood with regard to the following description and the accompanying drawings. The figures are intended to be illustrative, not limiting, and are generally described in context of the embodiments, and it should be understood that it is not intended to limit the scope of the disclosure to these particular embodiments. In the figures, the same numbers may be used throughout the drawings to reference features and components. In order that the present disclosure may be readily understood and put into practical effect, reference will now be made to exemplary embodiments as illustrated with reference to the accompanying figures. The figures together with detailed description below, are incorporated in and form part of the specification, and serve to further illustrate the embodiments and explain various principles and advantages.
[0008] Figure 1A is an exemplary method of classifying data associated with a manoeuvre of an object in accordance with embodiments of the present disclosure.
[0009] Figure 1B is an exemplary method for classifying data into pre-defined class/category in accordance with embodiments of the present disclosure.
[0010] Figure 1C is an exemplary method for performing an operation on the data sub-set comprising anomalies in accordance with embodiments of the present disclosure.
[0011] Figure 2A is an exemplary graph illustrating distribution of class labels in accordance with embodiments of the present disclosure.
[0012] Figure 2B is an exemplary graph illustrating distribution of class labels in accordance with embodiments of the present disclosure.
[0013] Figure 2C is an exemplary graph illustrating distribution of class labels in accordance with embodiments of the present disclosure.
[0014] Figure 2D is an exemplary graph illustrating distribution of class labels in accordance with embodiments of the present disclosure.
[0015] Figure 3 illustrates an exemplary system for classifying data associated with an object manoeuvre in accordance with embodiments of the present disclosure.
[0016] Throughout the drawings, identical reference numbers designate similar, but not necessarily identical elements. The figures as disclosed herein are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings are meant to only be provided as examples and/or implementations consistent with the description, and the description may not be limited to the examples and/or implementations provided in the drawings.

DETAILED DESCRIPTION
[0017] The following describes technical solutions in exemplary embodiments of the subject matter of the present disclosure with reference to the accompanying drawings. In this application as disclosed herein, "at least one" means one or more, and "a plurality of" means two or more. The term "and/or" describes an association relationship for describing associated objects and represents that three relationships may exist. For example, A and/or B may represent the following cases: Only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" usually indicates an "or" relationship between the associated objects. "At least one item (piece) of the following" or a similar expression thereof means any combination of the items, including any combination of singular items (piece) or plural items (pieces). For example, at least one item (piece) of a, b, or c may represent a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c each may be singular or plural.
[0018] It should be noted that in this application the articles “a”, “an” and “the” are used to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. The terms “comprise” and “comprising” are used in the inclusive, open sense, meaning that additional elements may be included. They are not intended to be construed as “consists of only”. Throughout this specification, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated element or step or group of elements or steps but not the exclusion of any other element or step or group of elements or steps. The term “including” is used to mean “including but not limited to”. “Including” and “including but not limited to” are used interchangeably. In the structural formulae given herein and throughout the present disclosure, the following terms have the indicated meanings, unless specifically stated otherwise.
[0019] Unless otherwise defined, all terms used in the disclosure, including technical and scientific terms, have meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By means of further guidance, term definitions are included for better understanding of the present disclosure. The term ‘about’ as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, is meant to encompass variations of ±10% or less, preferably ±5% or less, more preferably ±1% or less and still more preferably ±0.1% or less of and from the specified value, insofar such variations are appropriate to perform the present disclosure. It is to be understood that the value to which the modifier ‘about’ refers is itself also specifically, and preferably disclosed.
[0020] It should be noted that in this application, terms such as "example", "for example" or “exemplary” are used to represent giving an example, an illustration, or a description. Any embodiment or design scheme described as an "example" or "for example" in this application should not be construed as being more preferable or having more advantages than another embodiment or design scheme. Rather, words such as "example" or "for example" are intended to present a related concept in a specific manner.
[0021] It should be understood that in the embodiments of the present subject matter that "B corresponding to A" indicates that B is associated with A, and B can be determined based on A. However, it should be further understood that determining B based on A does not mean that B is determined based on only A. B may alternatively be determined based on A and/or other information.
[0022] In the embodiments of this application, "a plurality of" means two or more than two. Descriptions such as "first", "second" in the embodiments of this application are merely used for indicating and distinguishing between described objects, do not show a sequence, do not indicate a specific limitation on a quantity of devices in the embodiments of this application, and do not constitute any limitation on the embodiments of this application.
[0023] It is generally known that the use of artificial intelligence and generic machine learning techniques has simplified the use of large, disparate sets of data collected by sensors, wherein the data is used for analysis and for detecting anomalous actions with respect to the object embedded with the sensors. In an exemplary case, sensors on a vehicle may capture anomalous actions during manoeuvres associated with the object. Further, traditional supervised machine learning approaches often require large amounts of labelled data, which can be time-consuming, costly, and prone to errors. In addition, these supervised machine learning approaches are generally not configured to identify newer anomalies, and this would require additional labelling effort before the data can be used for classifying newer anomalies.
[0024] Since most objects may be deployed for use only after extensive training, data related to anomalous actions during manoeuvres associated with the object are far from common, and the data obtained from the sensors on the object may be imbalanced or class imbalanced. Consequently, randomly selecting data may not adequately represent the minority classes in the dataset. Hence, using data that does not contain a large number of samples related to anomalous actions to train a system to identify anomalous actions may give sub-optimal results, and anomalous actions during the manoeuvre associated with the object may be difficult to identify.
[0025] Normally, active learning methodology involves the provision of a small, labelled dataset, which is randomly initialised. The choice of this small, labelled dataset may be extremely important, as its samples participate in the initialisation of the model and therefore shape the model's subsequent query suggestions from a pool of unlabelled data. These models are typically initialised by randomly selecting a portion of the unlabelled data for labelling. If the dataset is imbalanced, which is generally true of most anomaly detection datasets, the anomalies are typically under-represented in such a randomly selected portion; embodiments of the present disclosure overcome this disadvantage.
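The under-representation described above can be quantified with a short calculation. If anomalies make up a fraction p of a large unlabelled pool, a seed set of k samples drawn uniformly at random contains no anomaly at all with probability of approximately (1 - p)^k. The Python sketch below is an illustration of the problem only, not part of the disclosed method; the function names and the pool of 990 nominal and 10 anomalous samples are hypothetical.

```python
import random


def prob_no_minority(minority_fraction: float, seed_size: int) -> float:
    """Probability that `seed_size` uniform draws (with replacement) contain no
    sample from a class forming `minority_fraction` of the pool."""
    return (1.0 - minority_fraction) ** seed_size


def random_seed_set(labels: list, seed_size: int, seed: int = 0) -> list:
    """Randomly initialised seed set: `seed_size` distinct pool indices."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), seed_size)


if __name__ == "__main__":
    # Hypothetical pool: 990 nominal (class 0) and 10 anomalous (class 1) samples.
    labels = [0] * 990 + [1] * 10
    seed_idx = random_seed_set(labels, 50)
    print("anomalies in random seed set:", sum(labels[i] for i in seed_idx))
    print("P(no anomaly), with-replacement approximation:",
          round(prob_no_minority(0.01, 50), 3))
```

With a 1% anomaly rate and a seed set of 50 samples, (0.99)^50 is about 0.61, so a majority of purely random initialisations would start without a single anomalous example.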
[0026]
[0027] To this end, exemplary embodiments of the present disclosure relate to a method and system for classifying data associated with a manoeuvre of an object. Typically, objects, for example vehicles, are manoeuvred by controllers, for example humans, or automatically, to perform certain actions. In an exemplary case, several manoeuvres are performed by a pilot during take-off and landing of an aircraft. Similarly, in another exemplary case, while flying a drone, actions may be performed by a human not directly associated with the drone. In an exemplary case, to effectively analyse these actions and identify any anomalous actions while performing manoeuvres, sensors are placed at various locations on the objects and a continuous or periodic feed of data may be obtained from the sensors, which may be recorded and analysed. In an exemplary case, the data related to a specific set of sensors for a particular manoeuvre may be first selected. In an exemplary case, the sensors continuously record data and store the data in a repository. In an exemplary case, the sensors may transmit the data to a repository, which may be coupled to the sensors over a network. In an exemplary case, using the data selected for the specific manoeuvre, a class-balanced dataset is generated from the selected data and, using the class-balanced dataset, the data related to the specific manoeuvre is classified into a pre-defined class or pre-defined category. Based on the classified data, a class-balanced dataset is generated for the pre-defined data.
[0028] In an exemplary case, the method includes classifying the data obtained into more than one pre-defined class/category. In an exemplary case, a data sub-set may be selected from the pre-defined class. In an exemplary case, the data sub-set may contain anomalies within the pre-defined class/category. In an exemplary case, an operation may be performed on the data sub-set to generate a class-balanced dataset from the data sub-set, and the generated class-balanced dataset may be stored in a repository. In an exemplary case, the generated class-balanced dataset may be used to train a module, such as an active learning module, to classify incoming new input data into one of the pre-defined classes.
[0029] In an exemplary case, an object may be a vehicle, which includes aircraft, drones, helicopters, unmanned aerial vehicles, space vehicles, ships, submarines, underwater vehicles, land-based robots or unmanned land-based vehicles, automobiles, etc. In an exemplary case, a manoeuvre may be defined as a set of actions that are performed with respect to the object to function/operate/navigate the object. It should be obvious to a person of ordinary skill in the art that the above list of vehicles termed as objects is only exemplary, that a number of other objects may be considered to be part of this list, and that all such objects fall within the scope of the present disclosure.
[0030] In an exemplary case, the object may be considered to be an aircraft and the manoeuvres may be the actions performed by the aircraft at various instants of time. In an exemplary case, an aircraft may have the manoeuvres “take-off”, “land”, “fly”, etc. In an exemplary case, each of these manoeuvres typically has a plurality of actions associated with it. In an exemplary case, each aircraft, during its operation, typically records data related to the operation of the aircraft in an aircraft data recorder. In an exemplary case, the aircraft data recorder receives data from various sensors placed at various locations of the aircraft, and collates and stores the data, wherein the stored data may be processed instantly or at a later instant of time. In an exemplary case, for the purpose of illustration, the specific manoeuvre of aircraft landing may be considered.
[0031] In an exemplary case, a dataset consisting of approximately 99837 aircraft landings, each with 160 timestamps and 20 features, is utilized as an initial dataset. In the exemplary case, the corresponding aircraft manoeuvre labels may be categorized into four classes: label 0 represents an ideal aircraft manoeuvre, while labels 1, 2, and 3 correspond to different anomalies observed during landing of the aircraft. In the exemplary case, based on the above dataset, the operations of classifying data with respect to an object manoeuvre are explained using the flowchart of Figure 1A.
[0032] Reference is now made to Figure 1A, which is an exemplary method for classifying data associated with manoeuvre of an object in accordance with embodiments of the present disclosure. At step 110, data for an object is obtained/collected/recorded at various locations of the object when the object is performing a manoeuvre. At step 120, data related to one specific area of interest is selected. The specific area of interest is typically related to the manoeuvres performed by the aircraft. At step 130, the selected data for a specific area of interest is pre-processed. By pre-processing, a class balanced dataset is generated from the selected data. At step 140, a class balanced dataset is generated from the pre-processed data by iteratively selecting (querying) samples from the dataset.
[0033] In an exemplary case, while performing a manoeuvre, the object is normally in motion and not in a stationary state. As discussed previously, an example of an object is an aircraft, and manoeuvres are related actions that are performed by the aircraft when the aircraft is in motion. In an exemplary case, an aircraft may have manoeuvres such as “take-off”, “land” and “fly”. In an exemplary case, each of these manoeuvres may have a plurality of actions associated with it. In an exemplary case, each aircraft typically records data in an aircraft data recorder during its operation. In an exemplary case, the aircraft data recorder may receive data from various sensors placed at various locations of the aircraft and store the data. It should be obvious to a person skilled in the art that the data may be recorded and stored in other devices, and all such devices fall within the scope of the present disclosure. In an exemplary embodiment, sensors placed at the ailerons, rudders, spoilers, flaps, elevators, slats, horizontal stabilizers, etc., of an aircraft may collect data related to the aircraft movement, and the data is recorded and stored. In an exemplary case, some of the data collected by the sensors placed on the aircraft are used to calculate secondary values such as aircraft speed, angle of attack, air pressure, glideslope deviation, roll angle, pitch angle, etc., and these values are also recorded.
[0034] In an exemplary case, using the recorded data, the data is classified into pre-defined classes or categories. In an exemplary case, the pre-defined classes or categories may be defined on the basis of manoeuvres performed by the objects or on the basis of specific actions related to manoeuvres that may be performed by the objects. In an exemplary case, landing of an aircraft may be considered as a manoeuvre. In an exemplary case, data recorded by the sensors placed in the ailerons, rudders, spoilers, flaps, elevators, slats, horizontal stabilizers, etc., of an aircraft may be considered and classified into specific classes. In an exemplary case, for an aircraft landing manoeuvre, a nominal class (class 0) may be defined to include samples having an ideal aircraft landing, a “speed high” class (class 1) may be defined to include samples having high airspeed during the aircraft landing, a “path high” class (class 2) may be defined to include samples having either a high glidepath or a lower glidepath, and a “flap error” class (class 3) may be defined to include samples having an error due to flap deployment while landing. It should be obvious to a person of ordinary skill in the art that a sample might be classified into more than one class, and that four classes have been considered in the present disclosure for the purpose of illustration only. In an exemplary case, an aircraft having high airspeed may be classified into two classes, the “speed high” class as well as the “path high” class, because the approach of the aircraft may be altered to a higher glide path than required due to its dependency on the aircraft airspeed.
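Because a sample may belong to more than one class, one common way to store such labels is a multi-hot vector with one position per class. The sketch below assumes a hypothetical `LandingClass` enumeration mirroring the four example classes; the encoding is an illustrative choice, not something the disclosure specifies.

```python
from enum import IntEnum


class LandingClass(IntEnum):
    """Hypothetical enumeration of the four example landing classes."""
    NOMINAL = 0
    SPEED_HIGH = 1
    PATH_HIGH = 2
    FLAP_ERROR = 3


def multi_hot(classes) -> list:
    """Encode a set of LandingClass members as a multi-hot vector of length 4."""
    vector = [0] * len(LandingClass)
    for c in classes:
        vector[int(c)] = 1
    return vector


# A landing with high airspeed that also pushed the approach onto a higher
# glide path belongs to both class 1 and class 2.
print(multi_hot({LandingClass.SPEED_HIGH, LandingClass.PATH_HIGH}))  # [0, 1, 1, 0]
```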
[0035] At step 120, data related to one specific area of interest is selected. In an exemplary case, the specific area of interest is typically related to the manoeuvres performed by the aircraft. For an aircraft, the manoeuvres may be “take-off”, “land” and “fly”. In an exemplary case, sensor data is collected for all the manoeuvres from the sensors placed on the aircraft. In an exemplary case, a sample set of data consisting of about 99837 aircraft landings, each with 160 timestamps and 20 primary and secondary values, is considered. In an exemplary case, each data instance has 160 × 20 dimensions, where the first dimension corresponds to the window of time in seconds and the second dimension corresponds to the measured variables. In an exemplary embodiment, since anomalous landings are far less common than ideal landings, the data considered for the specific manoeuvre of aircraft landing may be “imbalanced”, i.e., the samples might not be evenly distributed between the class representing the ideal aircraft landing manoeuvre and the classes representing aircraft manoeuvres having anomalies. In an exemplary case, for an aircraft landing manoeuvre, a nominal class (class 0) may be defined to include samples having an ideal aircraft landing, a “speed high” class (class 1) may be defined to include samples having high airspeed during the aircraft landing, a “path high” class (class 2) may be defined to include samples having either a high glidepath or a lower glidepath, and a “flap error” class (class 3) may be defined to include samples having an error due to flap deployment.
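The data layout and the class imbalance described above can be checked with a few lines of Python. The sketch assumes each landing is held as a nested list of 160 time steps by 20 variables and that labels are the integers 0 to 3; the function names are hypothetical, and the label counts in the example are invented for illustration and do not reflect the actual distribution of the dataset.

```python
from collections import Counter

TIMESTEPS = 160  # window of time in seconds, per the example above
FEATURES = 20    # primary and secondary measured variables


def check_instance(instance) -> bool:
    """Verify that one landing is a 160 x 20 grid (time steps x variables)."""
    if len(instance) != TIMESTEPS or any(len(row) != FEATURES for row in instance):
        raise ValueError("expected a 160 x 20 instance")
    return True


def class_distribution(labels):
    """Return per-class counts and the majority-to-minority imbalance ratio."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return dict(sorted(counts.items())), ratio


# Invented label counts, for illustration only.
example_labels = [0] * 900 + [1] * 50 + [2] * 40 + [3] * 10
counts, ratio = class_distribution(example_labels)
print(counts, ratio)  # {0: 900, 1: 50, 2: 40, 3: 10} 90.0
```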
[0036] At step 130, the selected data for a specific area of interest is pre-processed. In an exemplary case, by pre-processing, a class balanced dataset may be generated from the selected data. In an exemplary case, before pre-processing the data, it may be required to classify the obtained data into the pre-defined classes. In an exemplary case, a nominal class may be defined to represent an ideal manoeuvre of the object, and a specific set of classes may be defined to represent anomalous manoeuvres, which may result from a specific action/cause. In an exemplary case, for an aircraft landing manoeuvre, a nominal class (class 0) may be defined to include samples having an ideal aircraft landing, a “speed high” class (class 1) may be defined to include samples having high airspeed during the aircraft landing, a “path high” class (class 2) may be defined to include samples having either a high glidepath or a lower glidepath, and a “flap error” class (class 3) may be defined to include samples having an error due to flap deployment. In an exemplary case, the selected samples are classified into any of the above classes/categories. In an exemplary case, by iteratively selecting samples for classification and processing, the selected data chosen for an area of interest is classified into classes and categories.
[0037] At step 140, a class balanced dataset is generated from the pre-processed data by iteratively selecting (querying) samples from the dataset. In an exemplary case, a methodology such as an active learning model may be utilized for generating the class balanced dataset. In an exemplary case, the approach adopts a semi-supervised learning strategy in which some of the available data may be used as labelled data and the remaining data may be treated as unlabelled. In an exemplary case, this allows a subset of the unlabelled data to be labelled iteratively using the trained model, and the model’s accuracy may be evaluated against the rest of the unlabelled data. In an exemplary case, the active learning process assists in enhancing the accuracy of the flight manoeuvre classification model while making efficient use of labelled and unlabelled data.
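The iterative query loop can be sketched as pool-based active learning with uncertainty (margin) sampling, a common query strategy. The disclosure does not state which query strategy or model is used, so both are assumptions here: to keep the example dependency-free, a nearest-centroid classifier stands in for the flight manoeuvre classification model, and the known labels play the role of the annotator (`oracle_y`). All function names are hypothetical.

```python
import math


def centroids(points, labels):
    """Mean feature vector per class (a deliberately simple stand-in model)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(x, cents):
    """Gap between the two nearest centroids; a small gap means an uncertain sample."""
    d = sorted(math.dist(x, c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")


def active_learning(pool_x, oracle_y, seed_idx, budget):
    """Pool-based active learning with margin sampling.

    pool_x   : feature vectors (e.g. summarised landings)
    oracle_y : labels revealed only when a sample is queried (simulated annotator)
    seed_idx : indices of the small initial labelled set
    budget   : number of queries to make
    """
    labelled = list(seed_idx)
    seen = set(labelled)
    unlabelled = [i for i in range(len(pool_x)) if i not in seen]
    for _ in range(budget):
        if not unlabelled:
            break
        cents = centroids([pool_x[i] for i in labelled], [oracle_y[i] for i in labelled])
        # Query the unlabelled sample the current model is least sure about.
        query = min(unlabelled, key=lambda i: margin(pool_x[i], cents))
        unlabelled.remove(query)
        labelled.append(query)
    model = centroids([pool_x[i] for i in labelled], [oracle_y[i] for i in labelled])
    return labelled, model


if __name__ == "__main__":
    # Toy 1-D pool: two well separated groups plus two samples near the boundary.
    pool = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0], [4.5], [5.5]]
    truth = [0, 0, 0, 1, 1, 1, 0, 1]
    labelled, model = active_learning(pool, truth, seed_idx=[0, 5], budget=2)
    print(labelled)  # [0, 5, 6, 7]
```

In this toy pool the two samples lying between the class centroids (4.5 and 5.5) are queried first, so labelling effort goes to the region where the current model is least certain.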
[0038] In an exemplary case, the loss over a training set (the empirical risk) may be calculated as follows:

$$R_{emp}(h) = \frac{1}{n_l}\sum_{i=1}^{n_l} L\big(h(x_i), y_i\big)$$

where h is a hypothesis (the trained model), L is the loss function, (x_i, y_i) are the labelled training samples, n_l represents the number of labelled points, and R_emp(h) is the average loss over all training samples. In an exemplary case, this measure may be referred to as in-sample error or empirical risk because it is derived from a sample of empirical data rather than the entire dataset. In an exemplary case, after training the model, the goal may be to predict outputs for new or unseen data. In an exemplary case, among the generated hypotheses, the best one minimizes the expected loss over the entire input space.
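As a worked illustration of the empirical risk R_emp(h) = (1/n_l) Σ L(h(x_i), y_i), the sketch below averages a 0-1 loss (1 for a misclassification, 0 otherwise) over n_l labelled points. The choice of 0-1 loss, the threshold classifier `h` and the four data points are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Callable, Sequence


def zero_one_loss(prediction: int, target: int) -> float:
    """0-1 loss: 1.0 for a misclassification, 0.0 for a correct prediction."""
    return 0.0 if prediction == target else 1.0


def empirical_risk(h: Callable, xs: Sequence, ys: Sequence,
                   loss: Callable = zero_one_loss) -> float:
    """R_emp(h): average loss of hypothesis h over the n_l labelled points."""
    n_l = len(xs)
    return sum(loss(h(x), y) for x, y in zip(xs, ys)) / n_l


def h(x: float) -> int:
    """Hypothetical threshold classifier on a single scalar feature."""
    return 1 if x > 150 else 0


# One of the four points (x = 160, label 0) is misclassified, so R_emp = 1/4.
print(empirical_risk(h, [140, 155, 160, 145], [0, 1, 0, 0]))  # 0.25
```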
[0039] In an exemplary case, this expected loss may be referred to as the risk or out-of-sample error (R), and it is defined as follows:

$$R(h) = \mathbb{E}_{(X,Y)\sim P}\big[L\big(h(X), Y\big)\big] = \int L\big(h(x), y\big)\, dP(x, y)$$

In an exemplary case, since the joint distribution P(X, Y) is unknown (i.e., the test dataset is unknown/unlimited), accurately calculating the risk may be difficult. In an exemplary case, the objective is therefore not to minimize the risk itself but to minimize the gap between R_emp and R, known as the generalization gap. In an exemplary case, for a finite hypothesis space and a loss bounded in [0, 1], this gap can be bounded as follows:

$$P\Big(\sup_{h \in H} \big|R(h) - R_{emp}(h)\big| > \epsilon\Big) \le 2\,|H|\,e^{-2 n_l \epsilon^{2}}$$

where |H| represents the size of the hypothesis space and ε is a small number. In the exemplary case, the right-hand side of the equation indicates that increasing the size of the hypothesis space (|H| → ∞) results in a larger generalization gap, even if the training error is low. In the exemplary case, conversely, increasing the number of training points n_l improves the results by reducing the generalization gap.
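A standard form of this generalization bound, for a finite hypothesis space H and a loss bounded in [0, 1], is P(sup_h |R(h) - R_emp(h)| > ε) ≤ 2|H| exp(-2 n_l ε²). The sketch below evaluates that right-hand side and solves it for the number of labelled points n_l needed to push it below a chosen confidence level δ. The function names and the numbers in the example are hypothetical.

```python
import math


def gap_probability_bound(hypothesis_count: int, n_l: int, epsilon: float) -> float:
    """Right-hand side 2|H| exp(-2 n_l eps^2): an upper bound on the probability
    that the generalization gap exceeds eps for some h in a finite space H,
    assuming a loss bounded in [0, 1]."""
    return 2 * hypothesis_count * math.exp(-2 * n_l * epsilon ** 2)


def samples_needed(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest n_l for which the bound above is at most delta, obtained by
    solving 2|H| exp(-2 n_l eps^2) <= delta for n_l."""
    return math.ceil(math.log(2 * hypothesis_count / delta) / (2 * epsilon ** 2))


if __name__ == "__main__":
    # Larger |H| gives a larger bound; more labelled points give a smaller one.
    print(gap_probability_bound(100, 1000, 0.05))
    print(gap_probability_bound(10_000, 1000, 0.05))
    print(samples_needed(1000, epsilon=0.05, delta=0.05))  # 2120
```

Because |H| enters only through a logarithm when the bound is solved for n_l, growing the hypothesis space raises the required number of labelled points gradually, while each additional labelled point shrinks the bound exponentially, consistent with the observation above.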
[0040] Reference is now made to Figure 1B, which is an exemplary method for classifying data into pre-defined class/category in accordance with embodiments of the present disclosure. At step 112, a data sub-set is selected for classification from the obtained data which have been classified into pre-defined classes or categories. At step 114, an operation on the data sub-set comprising anomalies is performed. The operation performed is explained with respect to the flowchart of Figure 1C. At step 116, the class balanced dataset is obtained and then at step 118, the generated class balanced dataset is stored in a repository.
[0041] In an exemplary case, it may be ensured that the data sub-set contains anomalies within the pre-defined class. In an exemplary case, the data sub-set may be chosen such that it contains the highest distribution (number of instances) of anomalies. In the exemplary case, the points of the data sub-set may be strategically chosen to cover various regions of the feature space, with particular emphasis on uncertain regions. In the exemplary case, by actively exploring and annotating such informative instances, the risk associated with extrapolation may be mitigated and the overall performance of the models employed, such as learning models, may be significantly improved. In an exemplary case, a data sub-set from any of class 1 (speed high), class 2 (path high) or class 3 (flap error) may be considered. In the exemplary case, choosing a data sub-set with a larger number of anomaly instances, on which the operation for classifying the samples is performed, may be considered to be of considerable importance.
[0042] Reference is now made to Figure 1C, which is an exemplary method for performing an operation on the data sub-set that has anomalies in accordance with embodiments of the present disclosure.

- At step 114A, a “moving average” is calculated for each anomalous data-point over a specific time period.
- At step 114B, the “deviation” of each value from the calculated “moving average” is calculated.
- At step 114C, “dimensionality matching” is performed with respect to the sub-set and all the anomalies.
- At step 114D, “min-max” normalization is performed to scale the deviation values to the range [0, 1].
- At step 114E, “outlier separation” is performed to identify the outliers, i.e., the points whose deviations lie more than two standard deviations from the mean. The separated data is expected to contain most of the anomalies together with some nominal data.
- At step 114F, the duplicates are removed.

In the exemplary case, on removal of the duplicates from the data sub-set, a class balanced data set is obtained.
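Steps 114A to 114F can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the disclosed implementation. It assumes the following:

- The data sub-set is a time-ordered array of shape (time steps, features).
- The moving average is a trailing window of hypothetical length `window`.
- The deviation is taken as an absolute difference.
- “Dimensionality matching” means trimming the raw rows so that they align with the shorter moving-average output. Because the subtraction in step 114B needs matching shapes, this trimming is applied before the deviation is computed.
- A row is treated as an outlier if any of its features lies beyond `n_std` standard deviations.

The function names are hypothetical.

```python
import numpy as np


def trailing_moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Per-feature mean of each row and the (window - 1) rows before it.

    'valid' convolution is used, so the result has len(x) - window + 1 rows.
    """
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(x[:, j], kernel, mode="valid") for j in range(x.shape[1])], axis=1
    )


def separate_outliers(x, window: int = 5, n_std: float = 2.0) -> np.ndarray:
    """Sketch of steps 114A-114F for a time-ordered array (time_steps, n_features)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                                  # treat a 1-D series as one feature
    ma = trailing_moving_average(x, window)             # 114A: moving average
    aligned = x[window - 1:]                            # 114C (assumed): align rows with ma
    dev = np.abs(aligned - ma)                          # 114B: deviation from the average
    lo, hi = dev.min(axis=0), dev.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)              # avoid dividing by zero
    norm = (dev - lo) / span                            # 114D: min-max scaling to [0, 1]
    mu, sigma = norm.mean(axis=0), norm.std(axis=0)
    is_outlier = (np.abs(norm - mu) > n_std * sigma).any(axis=1)  # 114E: beyond n_std sigma
    return np.unique(aligned[is_outlier], axis=0)       # 114F: remove duplicate rows


if __name__ == "__main__":
    series = np.zeros(20)
    series[[6, 16]] = 10.0             # two identical spikes
    print(separate_outliers(series))   # [[10.]]: both spikes separated, duplicate removed
```

Whether the disclosed step 114C reshapes the data in this way is not stated in the text. A different matching rule would change only the `aligned` line.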
[0043] Reference is now made to Figure 2A, which is a graph illustrating the distribution of class labels in accordance with embodiments of the present disclosure. In the exemplary case, the graph denotes the class distribution of a selected sub-set with respect to the “speed high” class (class 1) of the anomalies.

Graph 210 shows the following classes on the x-axis:

- a nominal class (class 0), which may be defined to include samples having an ideal aircraft landing;
- a “speed high” class (class 1), which may be defined to include samples having high airspeed during the aircraft landing;
- a “path high” class (class 2), which may be defined to include samples having either a high glidepath or a low glidepath; and
- a “flap error” class (class 3), which may be defined to include samples having an error due to flap deployment.

In the exemplary case, the number of labels in each of these classes may be represented on the y-axis. It may be observed that although the “speed high” class has the highest number of classified samples, the number of samples classified under “flap error” (class 3) is also higher than that of the other classes. It may be understood that anomalies due to “flap error” impact the “speed high” class.
[0044] Reference is now made to Figure 2B, which is a graph illustrating the distribution of class labels in accordance with embodiments of the present disclosure. In the exemplary case, the graph denotes the class distribution of a selected sub-set with respect to the “flap error” class (class 3) of the anomalies.

In the exemplary case, graph 220 shows the following classes on the x-axis:

- a nominal class (class 0), defined to include samples having an ideal aircraft landing;
- a “speed high” class (class 1), defined to include samples having high airspeed during the aircraft landing;
- a “path high” class (class 2), defined to include samples having either a high glidepath or a low glidepath; and
- a “flap error” class (class 3), defined to include samples having an error due to flap deployment.

In the exemplary case, the number of labels in each of these classes is represented on the y-axis. In the exemplary case, it may be observed that although the “flap error” class has the highest number of classified samples, the numbers of samples classified under “speed high” (class 1) and glide slope error (“path high”, class 2) are almost equal.
[0045] Reference is now made to Figure 2C, which is a graph illustrating the distribution of class labels in accordance with embodiments of the present disclosure. In an exemplary case, the graph denotes the class distribution of a selected sub-set with respect to the “glide slope error” class (class 2) of the anomalies.

In the exemplary case, graph 230 shows the following classes on the x-axis:

- a nominal class (class 0), defined to include samples having an ideal aircraft landing;
- a “speed high” class (class 1), defined to include samples having high airspeed during the aircraft landing;
- a “path high” class (class 2), defined to include samples having either a high glidepath or a low glidepath; and
- a “flap error” class (class 3), defined to include samples having an error due to flap deployment.

In the exemplary case, it may be observed that the numbers of samples in the “speed high” class and the “flap error” class are almost equal.
[0046] Reference is now made to Figure 2D, which is a consolidated distribution of all four classes in accordance with embodiments of the present disclosure. In the exemplary case, graph 240 denotes the class-balanced dataset that may be generated from the data sub-set on removal of the outliers, as explained with reference to step 114 of the flowchart of Figure 1B.
[0047] Reference is now made to Figure 3, which illustrates an exemplary system 300 for classifying data associated with an object manoeuvre in accordance with the present disclosure. The system 300 as illustrated includes a data extraction unit 320, a computing unit 330, an active learning unit 340, a trained classifier 350 and a repository 360. Each of these sub-systems may be implemented using hardware or software or a combination of hardware and software. It should be obvious to a person of ordinary skill in the art that other components and/or modules may be added to the system and all such variations fall within the scope of the present disclosure. In the exemplary system 300, data extraction unit 320 may include a pre-processing unit, filtering unit etc., and the illustration only broadly illustrates some of the main components that may be required for performing embodiments of the present disclosure.
[0048] Data 310 is obtained from various locations of the object and is received at data extraction unit 320. Data extraction unit 320 processes the received data and stores it as a dataset for a specific object. Based on the requirement, data extraction unit 320 performs basic processing of the data received from the object and classifies the data.

In an exemplary case, the object considered in the present disclosure is an aircraft, and the actions are related to manoeuvres that an aircraft typically performs during its operational phase. In the exemplary case, an aircraft may perform manoeuvres such as “take-off”, “land”, “fly” etc. In the exemplary case, each of these manoeuvres is associated with a plurality of actions.

In the exemplary case, each aircraft records, in an aircraft data recorder, data related to the various actions of the aircraft during its operation. In the exemplary case, the aircraft data recorder receives data from various sensors placed at various locations of the aircraft and stores the data. In an alternate embodiment, the recorded data may be transmitted to a server and/or a computing system to be stored and analysed. In an exemplary case, sensors placed at the ailerons, rudders, spoilers, flaps, elevators, slats, horizontal stabilizers etc., of an aircraft may be configured to collect data related to the aircraft movement. The data is transmitted from the sensors to the data recorder or a repository, where the data is recorded or stored. In an exemplary case, some data collected by the sensors placed in the aircraft may be used to compute secondary values such as aircraft speed, angle of attack, air pressure, glideslope deviation, roll angle, pitch angle etc.
In the exemplary case, the data may also be recorded in an aircraft data recorder or any other recording device, which may be external to the aircraft. Alternatively, the data may be transmitted from the aircraft data recorder to an external storage system that may be configured to perform computations. In the exemplary case, a classification of the recorded (collected) data may be performed, and the data may preferably be classified into certain pre-defined classes or categories. In the exemplary case, one set of data may be classified into multiple categories.
[0049] In the exemplary case, the pre-defined classes or categories may be defined on the basis of the actions or manoeuvres associated with the object, or on the basis of specific actions related to manoeuvres performed by the object. In the exemplary case, landing of an aircraft is the manoeuvre considered in the present disclosure. In the exemplary case, the data recorded by the sensors placed in the ailerons, rudders, spoilers, flaps, elevators, slats, horizontal stabilizers, etc., of an aircraft may be collected and classified into specific classes.

In the exemplary case, for an aircraft landing manoeuvre, the following classes may be defined:

- a nominal class (class 0), to include samples having an ideal aircraft landing;
- a “speed high” class (class 1), to include samples having high airspeed during the aircraft landing;
- a “path high” class (class 2), to include samples having either a high glidepath or a low glidepath; and
- a “flap error” class (class 3), to include samples having an error due to flap deployment.

In the exemplary case, it may be understood that samples may be classified into more than one class. In the exemplary case, the dataset is class-imbalanced in nature and needs to be converted into a class balanced dataset.
[0050] Computing unit 330 performs the next set of actions, which process the class-imbalanced dataset to generate a class balanced dataset. Computing unit 330 chooses a data sub-set from the class-imbalanced dataset and initiates processing of the class-imbalanced dataset to obtain a class balanced dataset. In the exemplary case, it may be ensured that the data sub-set contains anomalies within any of the pre-defined classes. In the exemplary case, the data sub-set may be chosen such that it contains the highest distribution (number of instances) of anomalies. In the exemplary case, the points of the data sub-set may be strategically chosen to cover various regions of the feature space, with particular emphasis on uncertain regions.

Computing unit 330 may be configured to perform an operation on the data sub-set that includes anomalies, as follows:

- In the exemplary case, computing unit 330 computes a “moving average” for each anomalous data-point over a specific time period and determines the “deviation” of each value from the computed “moving average”.
- Computing unit 330 then performs “dimensionality matching” with respect to the sub-set and all the anomalies.
- Computing unit 330 is then configured to perform “min-max” normalization to scale the deviation values to the range [0, 1].
- Computing unit 330 is then configured to perform “outlier separation” to identify the outliers, i.e., the points whose deviations lie more than two standard deviations from the mean, where most anomalies and some nominal data would be present.
- Computing unit 330 is configured to remove the duplicates, whereby a class balanced data set may be obtained from the data sub-set.

In the exemplary case, the class balanced data set may be used to classify the rest of the dataset, which may be performed using the Active Learning Unit (ALU) 340.
It should be obvious that other models may be used in place of, or in addition to, the ALU, and all such variations fall within the scope of the present disclosure. The ALU as indicated in the present disclosure is only exemplary in nature and should not be construed as limiting in any sense.
[0051] Active Learning Unit 340 uses either uncertainty-based active learning methods or disagreement-based active learning methods to classify the rest of the obtained dataset. In an exemplary case, uncertainty sampling methods such as the Least Confidence method, Margin Sampling, and Entropy Sampling may be used for training the active learning model. However, the choice of method may depend on the samples. It should be obvious to a person of ordinary skill in the art that these methods may be replaced by other methods, and all such variations fall within the scope of the present disclosure.
[0052] In an exemplary case, using the Least Confidence method, the sample selected for annotation is the one about which the current model is least confident. In an exemplary case, by focusing on the samples that the model is uncertain about, more informative labels may be obtained and the model’s performance may be improved in the corresponding regions of the feature space. In the least confidence method, each unlabelled sample may be assigned a confidence score based on the model’s prediction probabilities. The confidence score represents the model’s certainty in its prediction for that sample. The samples with the lowest confidence scores may be selected for annotation. For probabilistic learning methods in binary classification, the active learner queries points whose posterior probability of being positive is close to 0.5. In multi-class problems, the least confident instance may be given by:
$$x^{*}_{LC} = \arg\max_{x}\big(1 - P_h(\hat{y} \mid x)\big), \qquad \hat{y} = \arg\max_{y} P_h(y \mid x)$$
where x*_LC is the least confident instance, ŷ is the class label of x with the highest posterior probability under the model h, and P_h(y | x) is the conditional probability of class y given the unlabelled point x.
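A minimal numpy sketch of the least-confidence query follows. It assumes the model's class probabilities P_h(y | x) for the unlabelled pool are available as an array of shape (n_samples, n_classes). The columns could correspond, for example, to the four landing classes (nominal, speed high, path high, flap error). The function name is hypothetical.

```python
import numpy as np


def least_confidence_query(proba: np.ndarray, n: int = 1) -> np.ndarray:
    """Return indices of the n samples with the largest 1 - max_y P_h(y | x)."""
    uncertainty = 1.0 - proba.max(axis=1)
    # Stable sort on negated scores: descending uncertainty, ties kept in index order.
    return np.argsort(-uncertainty, kind="stable")[:n]


proba = np.array([
    [0.90, 0.05, 0.03, 0.02],   # confident
    [0.40, 0.35, 0.15, 0.10],   # least confident (top probability 0.40)
    [0.60, 0.30, 0.05, 0.05],
])
print(least_confidence_query(proba, n=2))  # [1 2]
```

Requesting n > 1 indices allows a small batch of queries per round instead of one sample at a time.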
[0053] The Least Confidence method only considers information about the most likely label and disregards information about the remaining distribution. To address this limitation, the margin sampling method was introduced. Margin sampling is based on the concept of uncertainty and aims to select samples that lie near the decision boundary of the model. In multi-class classification, the margin sampling method computes, for each sample, the difference between the highest and the second-highest predicted class probabilities. This difference represents the “margin”, i.e., the level of uncertainty in the prediction for that sample across multiple classes. Samples with smaller margins, where the top two predicted probabilities are closer in value, may be considered more uncertain and may be selected for annotation. Margin Sampling calculates the margin between the first and second most probable class labels as follows:
$$x^{*}_{M} = \arg\min_{x}\big(P_h(\hat{y}_1 \mid x) - P_h(\hat{y}_2 \mid x)\big)$$
where ŷ_1 and ŷ_2 represent the first and second most probable class labels under the model h. Querying the labels of such instances may improve the model’s ability to discriminate between these closely ranked classes. A small margin indicates that the trained model h may have difficulty distinguishing between the two most likely classes, as happens, for example, with overlapping classes.
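The margin criterion can be sketched in the same style, again assuming an (n_samples, n_classes) probability array from the current model. The function name is hypothetical.

```python
import numpy as np


def margin_query(proba: np.ndarray) -> tuple[int, np.ndarray]:
    """Return argmin_x (P_h(y1|x) - P_h(y2|x)) and the margin of every sample."""
    top2 = np.sort(proba, axis=1)[:, -2:]   # two largest probabilities, ascending
    margins = top2[:, 1] - top2[:, 0]
    return int(np.argmin(margins)), margins


proba = np.array([
    [0.50, 0.50, 0.00, 0.00],   # tie between the two leading classes
    [0.40, 0.20, 0.20, 0.20],   # lower top probability, but a clear leader
])
idx, margins = margin_query(proba)
print(idx)  # 0
```

For these two rows, least confidence would query row 1 (1 − 0.40 > 1 − 0.50), whereas margin sampling queries row 0. This illustrates the limitation of the Least Confidence method noted above.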
[0054] Entropy sampling is commonly used for both binary and multi-class classification problems. The entropy sampling method selects samples based on the entropy of the predicted class probabilities. Entropy is a measure of uncertainty or information content. In the context of active learning, the entropy of a sample’s predicted class probabilities quantifies the amount of uncertainty associated with that sample. Higher entropy indicates higher uncertainty, which implies that the model may be less confident in its predictions for that sample.
Entropy sampling is particularly effective when there are regions of the feature space in which the model is uncertain, or when the classes are highly overlapping. By focusing on samples with high entropy, the active learning process can effectively explore and query informative instances that are likely to improve the model’s overall performance. When this method is employed, the instances with the highest entropy values may be selected for querying.
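The entropy criterion is commonly written as H(x) = −Σ_y P_h(y | x) log P_h(y | x), with the queried instance x* = argmax_x H(x). The sketch below implements this in numpy, assuming an (n_samples, n_classes) probability array and treating 0 log 0 as 0. The function names are hypothetical.

```python
import numpy as np


def entropy_scores(proba: np.ndarray) -> np.ndarray:
    """H(x) = -sum_y P_h(y|x) log P_h(y|x), with 0 log 0 taken as 0."""
    safe = np.where(proba > 0, proba, 1.0)   # log(1) = 0 for zero-probability entries
    return -np.sum(proba * np.log(safe), axis=1)


def entropy_query(proba: np.ndarray) -> int:
    """Index of the sample with the highest predictive entropy."""
    return int(np.argmax(entropy_scores(proba)))


proba = np.array([
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain over four classes
    [0.70, 0.10, 0.10, 0.10],
    [1.00, 0.00, 0.00, 0.00],   # fully confident: entropy 0
])
print(entropy_query(proba))  # 0
```

Unlike least confidence and margin sampling, this score uses the whole predicted distribution, not just the top one or two classes.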
[0055] Active Learning Unit 340 may also use disagreement-based active learning methods to classify the rest of the obtained dataset. For active learning, vote entropy sampling may be employed. It is based on the concept of entropy, which measures the uncertainty or information content of a probability distribution. In the context of active learning, vote entropy sampling aims to select the samples expected to be the most informative (most uncertain) for the learning model. Typically, an ensemble (committee) of models is initialized, the models are trained on different subsets of the labelled data, and the entropy of the vote distribution is calculated for each unlabelled sample. The samples with higher vote entropy may then be selected for annotation using a query strategy such as the Least Confidence method.
[0056] The degree of disagreement amongst the committee members (data sets) in the vote entropy method can be expressed as follows:
$$x^{*}_{VE} = \arg\max_{x}\left(-\sum_{i} \frac{V(y_i)}{m}\log\frac{V(y_i)}{m}\right)$$
where y_i ranges over the set of all possible labels, and m corresponds to the number of classifiers, which equals the number of committee members in the query-by-committee (QBC) approach. V(y_i) represents the number of votes received by the label y_i from the predictions of all the classifiers in the committee.
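A minimal numpy sketch of vote entropy follows. It assumes each of the m committee members casts a hard vote (a predicted label) for every unlabelled sample, with labels encoded as integers 0 to n_classes − 1. The function name is hypothetical.

```python
import numpy as np


def vote_entropy(votes: np.ndarray, n_classes: int) -> np.ndarray:
    """votes[k, i] is the label committee member k predicts for sample i.

    Returns -sum_c (V(c)/m) log(V(c)/m) for every sample, where V(c) counts votes for c.
    """
    m, n_samples = votes.shape
    scores = np.zeros(n_samples)
    for c in range(n_classes):
        frac = (votes == c).sum(axis=0) / m   # V(y_c) / m for every sample
        nz = frac > 0                         # 0 log 0 is taken as 0
        scores[nz] -= frac[nz] * np.log(frac[nz])
    return scores


votes = np.array([   # 4 committee members x 3 unlabelled samples
    [0, 1, 3],
    [0, 1, 2],
    [0, 3, 1],
    [0, 2, 0],
])
scores = vote_entropy(votes, n_classes=4)
print(int(np.argmax(scores)))  # 2: all four members disagree on sample 2
```

Sample 0 receives a score of 0 because the committee is unanimous, so it would not be queried.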
[0057] In the max disagreement sampling method, the committee members (data sets) may be classifiers or models trained on different subsets of the labelled data. The disagreement among them is measured based on the maximum disagreement. The samples with the highest disagreement are selected for annotation, as they are considered the most uncertain or informative samples. The mathematical expressions are explained below.
[0058] For each unlabelled sample x, calculate the predicted class probabilities for x using each data set model:

For each class Cj, calculate the disagreement score D (x, Cj) as:

and select the sample x with the maximum disagreement score:

[0059] In KL max disagreement sampling, the committee members are again trained classifiers or models, but the disagreement is measured using the Kullback-Leibler (KL) divergence. KL divergence is a measure of the difference between two probability distributions. The samples with the highest KL divergence, indicating the greatest discrepancy in the predictions of the committee members, are selected for annotation.
[0060] For each unlabelled sample x, calculate the predicted class probabilities for x using each data set model: P(x, Ci) for i = 1 to N, where Ci represents the i-th class. For each pair of data sets (i, j), calculate the KL divergence score KL(x, Ci, Cj) as:

For each class Ck, calculate the disagreement score D (x, Ck) as:

Then, select the sample x with the maximum disagreement score:

Accordingly, based on the requirement, a combination of the above methods can be used by active learning unit 340. Once the class balanced dataset has been created from the class-imbalanced dataset and the system has been trained, the model is ready to classify any newly received data.
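The KL max disagreement criterion of [0059] and [0060] can be illustrated with one common formulation from the query-by-committee literature. This formulation may differ in detail from the expressions above. It proceeds as follows:

1. Each member's predicted distribution P_k(· | x) is compared with the committee consensus (the mean of the members' distributions) via KL divergence.
2. The disagreement score of x is the largest such divergence over the members.
3. The sample with the highest score is queried.

The function name and the input layout (members × samples × classes) are assumptions.

```python
import numpy as np


def kl_max_disagreement(committee_proba: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """committee_proba[k, i, c] = P_k(c | x_i) for committee member k.

    Returns, for each sample, max_k KL(P_k || consensus), where the consensus is
    the mean distribution over the committee.
    """
    consensus = committee_proba.mean(axis=0)                 # (n_samples, n_classes)
    log_p = np.log(np.clip(committee_proba, eps, 1.0))       # clip avoids log(0)
    log_q = np.log(np.clip(consensus, eps, 1.0))
    kl = np.sum(committee_proba * (log_p - log_q[None, :, :]), axis=2)  # (members, samples)
    return kl.max(axis=0)


committee_proba = np.array([
    [[0.9, 0.1], [0.9, 0.1]],   # member 1: predictions for samples 0 and 1
    [[0.9, 0.1], [0.1, 0.9]],   # member 2 agrees on sample 0, disagrees on sample 1
])
scores = kl_max_disagreement(committee_proba)
print(int(np.argmax(scores)))  # 1
```

Zero-probability entries contribute nothing to the sum because they are multiplied by P_k = 0. The clipping only keeps the logarithms finite.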
[0061] Trained classifier 350 and repository 360 typically utilize the trained models to classify incoming data (also referred to as received data or collected data), thereby enabling classification of datasets. The repository stores the relevant class balanced data and the models.
[0062] Although the present disclosure has been described with reference to several preferred embodiments, it should be understood that the present disclosure is not limited to the preferred embodiments disclosed here. Embodiments of the present disclosure are intended to cover various modifications and equivalent arrangements within the spirit and scope of the appended claims. Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practised within the scope of the appended claims. Examples of the present disclosure have been described in language specific to structural features and/or methods. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, embodiments of the present disclosure are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein but may be modified within the scope and equivalents of the appended claims. It should be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed and explained as examples of the present disclosure.

We Claim:

1. A method of classifying data associated with manoeuvre of an object, the method comprising:
- obtaining 110 as input data associated with manoeuvre of an object 110, wherein the data is obtained from a plurality of locations of the object, when the object is in motion;
- selecting 120 data associated with at least one area of interest from the plurality of locations of the object, wherein the data is an imbalanced set;
- pre-processing 130 the data by generating a class balanced dataset from the data, and classifying the data into a pre-defined class/category; and
- generating 140 a class balanced dataset from the pre-processed data.

2. The method as claimed in claim 1, wherein the data is obtained from a plurality of sensors placed on the object.

3. The method as claimed in claim 2, wherein the plurality of sensors is configured to:
- continuously record data associated with the plurality of locations during motion of the object;
- store the data in a repository for further processing, wherein further processing includes pre-processing the data.

4. The method as claimed in claim 3, wherein the data obtained is classified into more than one pre-defined class/category.

5. The method as claimed in claim 4, comprising:
- selecting 112 a data sub-set from the pre-defined class/category, wherein the data sub-set comprises anomalies within the pre-defined class;
- performing 114 an operation on the data sub-set comprising anomalies;
- obtaining 116 a class balanced dataset, and
- storing 118 the class balanced dataset in a repository.

6. The method as claimed in claim 1, wherein a module (AL) is provided with the class balanced dataset and is then trained with the dataset.

8. A system for classifying data associated with a manoeuvre of an object, the system comprising:
- a data extraction module 310 configured to obtain input data associated with a manoeuvre of an object, wherein the data is obtained from a plurality of sensors placed at various locations of the object, when the object is in motion;
- a computing module 320 configured to select data associated with at least one area of interest from the plurality of locations of the object, wherein the data is an imbalanced set;
- a first module 330 configured to pre-process the selected data by generating a class balanced dataset from the selected data, and classifying the data into a pre-defined class/category;
- a second module 340 configured to generate a class balanced dataset from the pre-processed data; and
- a repository 350 configured to store class-balanced dataset.

9. The system as claimed in claim 8, wherein the data is obtained from a plurality of sensors placed on the object.

10. The system as claimed in claim 9, wherein the plurality of sensors are configured to:
- continuously record data associated with the plurality of locations during motion of the object;
- store the data in a repository for further processing, wherein further processing includes pre-processing the data.

11. The system as claimed in claim 10, wherein the data obtained is classified into more than one pre-defined class/category.

12. The system as claimed in claim 8, wherein the computing module is configured to:
- select a data sub-set from the pre-defined class/category, wherein the data sub-set comprises anomalies within the pre-defined class;
- perform an operation on the data sub-set comprising anomalies;
- obtain a class balanced dataset, and store the class balanced dataset in a repository.

13. The system as claimed in claim 8, wherein the trained classifier module is configured to receive a class balanced dataset and the active learning module is trained with the dataset.

Dated this 12th day of April 2024 Indian Institute of Science
By their Agent & Attorney

Dr. Eric W B Dias/Reg No 1058
of Khaitan & Co

Documents

Application Documents

#	Name	Date
1	202441029683-STATEMENT OF UNDERTAKING (FORM 3) [12-04-2024(online)].pdf	2024-04-12
2	202441029683-REQUEST FOR EARLY PUBLICATION(FORM-9) [12-04-2024(online)].pdf	2024-04-12
3	202441029683-PROOF OF RIGHT [12-04-2024(online)].pdf	2024-04-12
4	202441029683-POWER OF AUTHORITY [12-04-2024(online)].pdf	2024-04-12
5	202441029683-FORM-9 [12-04-2024(online)].pdf	2024-04-12
6	202441029683-FORM-8 [12-04-2024(online)].pdf	2024-04-12
7	202441029683-FORM FOR SMALL ENTITY(FORM-28) [12-04-2024(online)].pdf	2024-04-12
8	202441029683-FORM 18A [12-04-2024(online)].pdf	2024-04-12
9	202441029683-FORM 1 [12-04-2024(online)].pdf	2024-04-12
10	202441029683-FIGURE OF ABSTRACT [12-04-2024(online)].pdf	2024-04-12
11	202441029683-EVIDENCE OF ELIGIBILTY RULE 24C1f [12-04-2024(online)].pdf	2024-04-12
12	202441029683-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [12-04-2024(online)].pdf	2024-04-12
13	202441029683-EVIDENCE FOR REGISTRATION UNDER SSI [12-04-2024(online)].pdf	2024-04-12
14	202441029683-EDUCATIONAL INSTITUTION(S) [12-04-2024(online)].pdf	2024-04-12
15	202441029683-DRAWINGS [12-04-2024(online)].pdf	2024-04-12
16	202441029683-DECLARATION OF INVENTORSHIP (FORM 5) [12-04-2024(online)].pdf	2024-04-12
17	202441029683-COMPLETE SPECIFICATION [12-04-2024(online)].pdf	2024-04-12
18	202441029683-FER.pdf	2024-05-06
19	202441029683-RELEVANT DOCUMENTS [30-05-2024(online)].pdf	2024-05-30
20	202441029683-POA [30-05-2024(online)].pdf	2024-05-30
21	202441029683-FORM 13 [30-05-2024(online)].pdf	2024-05-30
22	202441029683-OTHERS [02-08-2024(online)].pdf	2024-08-02
23	202441029683-FER_SER_REPLY [02-08-2024(online)].pdf	2024-08-02
24	202441029683-COMPLETE SPECIFICATION [02-08-2024(online)].pdf	2024-08-02
25	202441029683-CLAIMS [02-08-2024(online)].pdf	2024-08-02
26	202441029683-Proof of Right [14-08-2024(online)].pdf	2024-08-14

Search Strategy

1	SearchHistoryE_03-05-2024.pdf