
Method And System For Generating Training Data For Training A Machine Learning Data Model

Abstract: The quality and accuracy of predictions generated by a Machine Learning (ML) model depend on the quality/accuracy of the training data used for training and generating the ML model, in addition to other parameters. State-of-the-art systems for generating the training data may fail to identify all the objects present in a video/image being processed, which in turn affects the quality of the training data. The disclosure herein generally relates to machine learning, and, more particularly, to a method and a system for generating training data for training a machine learning data model. The system determines a geometric constraint of objects present in each of a plurality of frames being processed, and further determines at least one fitting technique that matches the determined geometric constraint. The determined at least one fitting technique is used to generate an enhanced training dataset, which can be used to train an ML model. [To be published with FIGS. 2A and 2B]


Patent Information

Application #: 202021003758
Filing Date: 28 January 2020
Publication Number: 31/2021
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status: —
Email: kcopatents@khaitanco.com
Parent Application: —
Patent Number: —
Legal Status: —
Grant Date: 2024-04-23
Renewal Date: —

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai 400021 Maharashtra, India

Inventors

1. SHARMA, Hrishikesh
Tata Consultancy Services Limited TCS Innovation Labs, 7th floor, ODC-4, Gopalan Global axis H block, KIADB Export Promotion Area, Whitefield Bengaluru 560066 Karnataka, India
2. GHOSH, Hiranmay
Tata Consultancy Services Limited TCS Innovation Labs, 7th floor, ODC-4, Gopalan Global axis H block, KIADB Export Promotion Area, Whitefield Bengaluru 560066 Karnataka, India
3. PURUSHOTHAMAN, Balamuralidhar
Tata Consultancy Services Limited TCS Innovation Labs, 7th floor, ODC-4, Gopalan Global axis H block, KIADB Export Promotion Area, Whitefield Bengaluru 560066 Karnataka, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR GENERATING TRAINING DATA FOR TRAINING A MACHINE LEARNING DATA MODEL
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD
[001] The disclosure herein generally relates to machine learning, and, more particularly, to a method and a system for generating training data for training a machine learning data model.
BACKGROUND
[002] Development in the field of machine learning has significantly aided industrial automation programs. Machines (robots) are trained to perform specific tasks, and machine learning and Artificial Intelligence (AI) give the machines the decision-making capabilities required for completing the assigned task(s).
[003] Such machines are equipped with one or more machine learning models (or data models), which are generated through a training mechanism using appropriate training data. In an operating environment, the data models are responsible for processing data collected in real time from the surrounding environment, and for generating dynamic predictions towards performing the assigned task(s). The accuracy with which a data model can generate the recommendations largely depends on the quality as well as the quantity of the training data being used.
[004] For example, when a machine learning model is required for performing change detection in a scene over a period of time, the model is to be trained to identify objects in the scene. In this use-case, if the machine learning model fails to identify any object in the scene, the final result (i.e. the change detection) is adversely affected. Existing mechanisms in the field of generating training data may miss the annotation of certain objects in a video or image being processed, especially objects that are smaller in size in comparison with other objects in the scene. Even if all the objects are annotated, the locations of the identified objects may not be marked properly in the frames.
[005] While annotating objects, close-fitting annotation of small objects, typically around 30x30 pixels, is practically difficult. A Region of Interest (RoI) cannot be zoomed beyond a point to ensure snug-fitting of the annotation: at low resolution, blurring occurs, which in turn leads to ill-formed contours and hence poor drawing of the bounding boxes. Area-wise, such labeling noise was found to lead to up to 35% error in labeling. The presence of labeling noise in training data leads to poor training and poor subsequent prediction by a Machine Learning (ML) model.
SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method for generating training data for training a machine learning model is provided. A primary training dataset is generated by extracting a plurality of frames from a video of a scene, via one or more hardware processors. Further, an annotated bounding box is drawn around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors. Further, a secondary training dataset is generated by determining coordinates of top-left and bottom-right corners of the bounding box around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors. Then a geometric specification of each of the plurality of objects is determined, using a domain knowledge data matching the scene in the video, via the one or more hardware processors. Further, at least one geometric constraint matching the determined geometric specification is determined, via the one or more hardware processors. Further, at least one fitting technique matching the determined at least one geometric constraint is determined, via the one or more hardware processors. Then an enhanced training dataset is generated by processing the secondary training dataset using the determined at least one fitting technique, via the one or more hardware processors.
[007] In another aspect, a system for training a machine learning model is provided. The system includes one or more hardware processors, one or more communication interfaces, and a memory storing a plurality of instructions. The plurality of instructions when executed cause the one or more hardware processors to generate a primary training dataset by extracting a plurality of frames from a video of a scene. Further, an annotated bounding box is drawn around each of a plurality of objects in each of the plurality of frames. Further, a secondary training dataset is generated by determining coordinates of top-left and bottom-right corners of the bounding box around each of a plurality of objects in each of the plurality of frames. Then a geometric specification of each of the plurality of objects is determined, using a domain knowledge data matching the scene in the video. Further, at least one geometric constraint matching the determined geometric specification is determined. Further, at least one fitting technique matching the determined at least one geometric constraint is determined. Then an enhanced training dataset is generated by processing the secondary training dataset using the determined at least one fitting technique.
[008] In yet another aspect, a non-transitory computer readable medium for generating training data for training a machine learning model is provided. A primary training dataset is generated by extracting a plurality of frames from a video of a scene, via one or more hardware processors. Further, an annotated bounding box is drawn around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors. Further, a secondary training dataset is generated by determining coordinates of top-left and bottom-right corners of the bounding box around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors. Then a geometric specification of each of the plurality of objects is determined, using a domain knowledge data matching the scene in the video, via the one or more hardware processors. Further, at least one geometric constraint matching the determined geometric specification is determined, via the one or more hardware processors. Further, at least one fitting technique matching the determined at least one geometric constraint is determined, via the one or more hardware processors. Then an enhanced training dataset is generated by processing the secondary training dataset using the determined at least one fitting technique, via the one or more hardware processors.

[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates an exemplary system for generating enhanced training data, according to some embodiments of the present disclosure.
[012] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps involved in the process of generating the enhanced training data using the system of FIG. 1, according to some embodiments of the present disclosure.
[013] FIG. 3 is a flow diagram depicting steps involved in the process of determining and selecting a domain knowledge data for determining a geometric specification of objects, using the system of FIG. 1, according to some embodiments of the present disclosure.
[014] FIG. 4 is an example diagram depicting line fitting done by the system of FIG. 1 for generating the enhanced training dataset, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[016] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[017] FIG. 1 illustrates an exemplary system 100 for generating enhanced training data, according to some embodiments of the present disclosure. The system 100 includes one or more hardware processors 102, communication interface(s) or input/output (I/O) interface(s) 103, and one or more data storage devices or memory 101 operatively coupled to the one or more hardware processors 102. The one or more hardware processors 102 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[018] The communication interface(s) 103 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the communication interface(s) 103 can include one or more ports for connecting a number of devices to one another or to another server.
[019] The memory 101 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more components (not shown) of the system 100 can be stored in the memory 101. The memory 101 is configured to store operational instructions which when executed cause one or more of the hardware processor(s) 102 to perform various actions associated with the generation of the enhanced training data being handled by the system 100. The various steps involved in the process of generating the enhanced training data are explained with the description of FIG. 2 and FIG. 3. All the steps in FIG. 2 and FIG. 3 are explained with reference to the system of FIG. 1.
[020] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps involved in the process of generating the enhanced training data using the system of FIG. 1, according to some embodiments of the present disclosure. The system 100 collects a video of a scene as input, from an image/video sensor which may be internally or externally associated with the system 100 via the communication interface(s) 103. In an alternate embodiment, one or more images are collected as input.
[021] The system 100 processes the collected video to extract a plurality of frames, and generates (202) a preliminary training dataset; the extracted frames form the preliminary training dataset. The system 100 further draws (204) a bounding box around each of a plurality of objects in each of the plurality of frames, using any suitable technique/application. In an embodiment, the bounding boxes are rectangular in shape. However, bounding boxes of any appropriate shape may be used.
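The disclosure leaves the extraction technique open; purely as an illustration, a minimal sketch of step (202) using OpenCV might look as follows. The sampling stride is an assumption, not part of the specification.

```python
# Illustrative sketch of step (202): frame extraction with OpenCV.
import cv2

def extract_frames(video_path: str, stride: int = 10):
    """Return every `stride`-th frame of the input video; the returned
    frames form the preliminary training dataset."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames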
[022] The system 100 further determines the coordinates of the top-left and bottom-right corners of each of the bounding boxes, and this information is used to generate (206) a secondary training dataset; the information on the coordinates forms the secondary training dataset. In each of the frames, one or more objects may be present, and the size and shape of each of the objects may differ. In the next step, the system 100 determines a geometric specification of each of the objects. The term ‘geometric specification’ refers to the size, shape, relative positioning with respect to other instances, and other physical characteristics of the object being considered. The geometric specification is determined by the system 100 based on a domain knowledge data matching the scene captured in the frame being processed. The domain knowledge data is scene specific, and contains one or more images of the scene captured at an earlier point of time, information pertaining to all objects in the scene, information pertaining to the geometric specification of each of the objects, and so on. The domain knowledge data acts as a point of reference for the system 100, and the system 100 can determine the geometric specification of the objects based on it. The domain knowledge data includes geometric domain knowledge which imposes constraints on the layout of object instances or object parts in the frames, via various geometric constructs such as parametric curves and parametric surfaces. Such geometric constraints are provided via an ontological specification or a rule-based specification (both of which can encode constraints).
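For concreteness, one possible shape of a secondary-training-dataset record is sketched below; the field names are illustrative assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    """One secondary-training-dataset record (step 206): the top-left
    (x1, y1) and bottom-right (x2, y2) corners of an annotated box."""
    frame_id: int
    label: str    # e.g. "joint_plate" or "clip"
    x1: float
    y1: float     # top-left corner
    x2: float
    y2: float     # bottom-right corner
```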
[023] In an embodiment, a plurality of domain knowledge databases are stored in the memory. Each of the plurality of domain knowledge databases contain the domain knowledge data matching a particular scene When processing one or more frames, the system 100 identifies/interprets the scene in the one or more frames, and then automatically determines one or more matching domain knowledge databases. For example, if the scene is of a railway track (as in FIG. 4), the system 100 searches in the memory 101 and determines the domain knowledge database matching the scene (i.e. railway track) from the plurality of domain knowledge databases stored in the memory 101. In another embodiment, the domain knowledge database matching a scene is determined based on a user input configured with the system 100. In various embodiments, the user input for determining the domain knowledge data is pre-configured or dynamically configured with the system 100. Further the system 100 uses the domain knowledge data (also referred to as ‘determined domain knowledge data’) from the determined one or more domain knowledge databases.
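A hypothetical sketch of this lookup is shown below; the scene labels, constraint names, and the scene-classifier hook are all assumptions made for illustration, not part of the disclosed system.

```python
# Illustrative domain-knowledge lookup: scene label -> knowledge record.
DOMAIN_KNOWLEDGE = {
    "railway_track": {"objects": ["joint_plate", "clip"],
                      "constraints": ["line"]},
    "power_grid":    {"objects": ["insulator"],
                      "constraints": ["parabola"]},
}

def select_domain_knowledge(frame, scene_classifier):
    """Identify the scene in a frame and return the matching database
    (None if no stored database matches)."""
    scene = scene_classifier(frame)    # e.g. returns "railway_track"
    return DOMAIN_KNOWLEDGE.get(scene)
```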

[024] Further, the system 100 determines (210) at least one geometric constraint that matches the determined geometric specification, based on the determined domain knowledge data. Examples of the geometric constraints are, but are not limited to, a line constraint and a parabola constraint. For example, from the scene in FIG. 4, which is of a railway track, the system 100 may detect at least the line constraint. If the image is of a power grid, at least the parabola constraint is determined as present in the scene. Further, for the determined at least one geometric constraint, the system 100 determines (212) at least one matching fitting technique. The at least one fitting technique is used as a compensation technique by the system 100 to process the secondary training dataset. Examples of the fitting techniques that may be used by the system 100 are, but are not limited to, line fitting and Hough-based methods. By applying the at least one matching fitting technique on the secondary training dataset, the system 100 adjusts/corrects the coordinates of each of the bounding boxes in such a way that a closer-to-exact location of each of the objects is captured by the corresponding bounding box, i.e. the object is inscribed within the bounding box. In an embodiment, all the coordinates of each of the bounding boxes are adjusted/corrected. In another embodiment, only selected coordinates are adjusted/corrected. For example, in a rectangular bounding box, if the coordinates of two diametrically opposite corners are known, the other two corner coordinates can be calculated; so in this case the coordinates of only the two diametrically opposite corners are adjusted/corrected.
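The constraint-to-technique mapping and the corner completion can be sketched as below; the mapping keys and values are assumptions consistent with the examples given in the text, not a definitive list.

```python
# Illustrative constraint -> compensation (fitting) technique mapping.
CONSTRAINT_TO_FITTER = {
    "line": "least-squares line fitting",
    "parabola": "parabolic (e.g. Hough-based) curve fitting",
}

def complete_rectangle(x1, y1, x2, y2):
    """Given the top-left (x1, y1) and bottom-right (x2, y2) corners of
    an axis-aligned box, derive the remaining top-right and bottom-left
    corners, so only two corners need to be adjusted/corrected."""
    return (x2, y1), (x1, y2)
```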
[025] The system 100 generates (214) an enhanced training dataset which includes information pertaining to the adjusted/corrected coordinates of the bounding box of each of the objects. The correction/adjustment of the coordinates of the bounding boxes around each of the objects reduces/eliminates the labeling noise, which in turn allows the system 100 to identify the position/location of each object accurately. The enhanced training dataset can be further used to train (216) a machine learning model for any suitable application. In various embodiments, the steps in method 200 may be performed in the same order or in any alternate order that is technically feasible. In another embodiment, one or more steps in the method 200 may be omitted.
[026] When the objects in the railway track image as in FIG. 4 are joint plates/clips along a rail track, the system 100 fits a line along each of the corners. Since each rectangle can be characterized by the positions of two corner locations (the top-left and bottom-right corners), such lines are fitted. Two lines each are fitted to the two corners of each bounding box, within the two groups of bounding boxes: one group abutting the left side track, and the other abutting the right side track. The line fitting may be done using least-squares fitting or any such suitable technique which reduces the labeling noise. After fitting the line, a perpendicular projection of each original corner position is taken onto the fitted line, to derive the estimated/predicted position of that corner, without noise. Based on the predicted corner positions, the coordinates of the bounding boxes are adjusted accordingly, to generate the enhanced training dataset.
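A minimal sketch of this least-squares-and-projection step, assuming NumPy and corners gathered per group, might look as follows (the function name and array layout are illustrative):

```python
import numpy as np

def denoise_corners(corners):
    """Fit a line through corner positions and perpendicularly project
    each corner onto it, returning the estimated noise-free positions.
    `corners` is an (N, 2) array, e.g. the top-left corners of one
    group of bounding boxes along one side of the track."""
    corners = np.asarray(corners, dtype=float)
    # 1. Least-squares fit of a line y = m*x + c through the corners.
    m, c = np.polyfit(corners[:, 0], corners[:, 1], 1)
    # 2. Perpendicular projection of each corner onto the fitted line.
    anchor = np.array([0.0, c])             # a point on the line
    direction = np.array([1.0, m])
    direction /= np.linalg.norm(direction)  # unit direction vector
    t = (corners - anchor) @ direction      # scalar projections
    return anchor + np.outer(t, direction)  # denoised corner estimates
```

For corner loci that are near-vertical in image coordinates, fitting x as a function of y (or using total least squares) would be the more stable choice; the disclosure leaves the exact fitting technique open.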
[027] FIG. 3 is a flow diagram depicting steps involved in the process of determining and selecting a domain knowledge data for determining a geometric specification of objects, using the system of FIG. 1, according to some embodiments of the present disclosure. The one or more frames are processed by the system 100 using appropriate technique(s) to determine (302) the scene in the frames. Further, a domain knowledge data matching the determined scene is determined (304) from a plurality of domain knowledge data stored in the memory 101. Based on the contents of the determined domain knowledge data, the geometric specification of each of the objects in the scene is determined (306). In various embodiments, the steps in method 300 may be performed in the same order or in any alternate order that is technically feasible. In another embodiment, one or more steps in the method 300 may be omitted.

Experimental Results:

1. Dataset used:
[028] During the experiments, two datasets of infrastructural systems were used. The first dataset consists of 327 frames of a railway network, having 2209 annotated small objects (joint plates). This dataset was collected from a video available on a public platform. Non-periodic frames were carefully shortlisted (e.g. 1st, 10th, 24th, …), such that the point of view of the camera (mounted on a drone) keeps changing non-trivially, while ensuring up to 70% overlap between two successive shortlisted frames. The second dataset contains 355 frames and features a different rail network technology, leading to 2147 annotated clips being the small components of interest. The clips, though equally small, have a very different appearance from the joint plates. Each dataset is then split into training and validation subsets using an 80:20 ratio.
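The 80:20 split could be realized, for example, as follows; the shuffling policy and seed are assumptions, since the disclosure does not specify one.

```python
import random

def split_train_val(frames, ratio=0.8, seed=42):
    """Split a shortlisted frame set into training and validation
    subsets using the 80:20 ratio mentioned above."""
    frames = list(frames)
    random.Random(seed).shuffle(frames)
    cut = int(len(frames) * ratio)
    return frames[:cut], frames[cut:]
```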
2. Experiments:
[029] A series of four experiments was conducted to show that the generation of the enhanced training dataset by the system 100 improves detection performance (i.e. detection of objects in the frames) while staying within the timing budget. Further, to establish its generality, more experiments were performed along two dimensions. In one dimension, the core Deep Neural Network (DNN) module/detection algorithm was varied while keeping the dataset fixed, and the results were checked to determine whether a positive mAP gain was obtained. In the second dimension, the core DNN module remains fixed, the dataset is changed, and the result is checked again. Both the first dataset and the second dataset were used during the experiments.
[030] Throughout this section, PASCAL VOC mAP metrics are highlighted (at an IoU threshold of 0.5), though, as can be seen, the results are consistent at other threshold values as well. Note that since area scales as the square of edge length, a 50% area-wise overlap corresponds to an overlap of up to √0.5 ≈ 71% in bounding box edge length. This implies that the bounding boxes detected at the IoU = 0.5 threshold do significantly and meaningfully overlap with the ground truth boxes.
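For reference, the IoU measure underlying these thresholds is the standard intersection-over-union of axis-aligned boxes, sketched below to make the discussion concrete (the function is a textbook formulation, not code from the disclosure):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```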

IoU Threshold | Exp. 1 mAP (GT unaligned, Pred unaligned) | Exp. 2 mAP (GT aligned, Pred unaligned) | Exp. 3 mAP (Pred aligned after Exp. 1) | Exp. 4 mAP (Pred aligned after Exp. 2)
0.1 | 0.8177 | 0.9038 | 0.8177 | 0.9038
0.3 | 0.8177 | 0.9038 | 0.8177 | 0.9038
0.5 | 0.811 | 0.8839 | 0.816 | 0.8839
0.7 | 0.6501 | 0.6521 | 0.6101 | 0.6522
0.9 | 0.0712 | 0.0381 | 0.0301 | 0.0312
Table 1 – Performance across the 4 experiments
[031] The results of this set of 4 experiments are shown in Table 1. From this table, it is clear that Experiment 2 gives a clear performance advantage over the baseline across all IoU thresholds. Only at the extreme threshold of 0.9 is this improvement inverted. The inversion can be attributed to the following. When FIG. 4 is closely observed, an inconspicuous sub-pattern can be spotted: every alternate joint plate appearing alongside either the left or the right track does not have (protruding) hooks attached to both its ends. Hence there are two sets of bounding boxes with aspect ratios that are close but not the same, and fitting a single line along, e.g., the top-left corners of all joint plates introduces noise for some corners, in the form of an extra offset. At a very high IoU threshold, this minor difference gets highlighted. When the two groups are modelled as separate patterns, and two different lines are fitted through, e.g., their top-left corners instead of one, an increase in mAP between Experiment 1 and Experiment 2 was achieved.
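In terms of the earlier `denoise_corners` sketch, the two-group fix amounts to fitting a separate line per alternating sub-pattern; the corner values below are invented solely for illustration.

```python
import numpy as np

# Toy top-left corners of alternating joint plates (hypothetical values).
top_left_corners = np.array([[10, 100], [60, 104], [110, 101],
                             [160, 105], [210, 102], [260, 106]], float)
# One fitted line per sub-pattern, instead of one line for all boxes.
denoised_with_hooks = denoise_corners(top_left_corners[0::2])
denoised_without_hooks = denoise_corners(top_left_corners[1::2])
```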
[032] The perceived lack of performance improvement in Experiments 3 and 4 can be explained as follows. In contemporary single-stage DNN detection pipelines, the localization task under the multi-task setup is generally a regression task head. The regression task, most often modeled as linear regression, makes the inherent assumption that the localization residues/offsets (akin to prediction noise) are Gaussian. The model fitting process then proceeds to reduce this noise via maximum likelihood estimation. Since the noise is already suppressed within the regression task head inside the DNN, one will most likely gain no further advantage by suppressing it once more outside of the DNN, which is what we observe. So hereafter only the first and second experiments are considered.
[033] Next, the dataset and the small object class within it are changed while the core DNN detection mechanism is kept fixed. The first and second experiments were repeated over the second dataset, having small clips as the objects of interest. For lack of space, results are presented only for the PASCAL VOC metric (IoU threshold = 0.5), in Table 2.

Exp. 1 mAP (GT unaligned, Pred unaligned) | Exp. 2 mAP (GT aligned, Pred unaligned)
0.811 | 0.863
Table 2 – Performance gain over a different dataset
[034] An improvement in the mAP metric was again observed, though just a shade less (~5%). Manual fine-tuning of the annotation bounding boxes, when done, is expected to further improve the results.
[035] In order to assess the timing advantage of using the system 100, a detector called RetinaNet was used. Timing performance is measured as the (inference) time taken during a forward pass. The standalone mAP performance of RetinaNet is in the top 15% among publicly available options over the COCO dataset. The results are shown in Table 3. The usage of RetinaNet also establishes the generality of the approach taken by the system 100.

Model | Exp. 1 mAP (GT unaligned, Pred unaligned) | Exp. 2 mAP (GT aligned, Pred unaligned) | Prediction time (ms)
SSD+MobileNetv2 | 0.81 | 0.88 | 27.1
RetinaNet | 0.84 | 0.88 | 47.3
Table 3 – Corroboration of the performance gain using a different DNN
[036] As can be seen, while a marginal mAP improvement was obtained over the baseline Experiment 1 by using an advanced DNN such as RetinaNet, the inference time almost doubles. This reduces the frame processing speed from around 37 fps (1000/27.1 ms) to around 21 fps (1000/47.3 ms). Compensation for the labeling noise gives a better mAP (88%), slightly better than the RetinaNet baseline, without adding to the prediction time. As a side observation, it can be seen that even when another DNN (RetinaNet) is used, the mAP performance increases non-trivially once the labeling noise is compensated. This also establishes the generality of the approach of the system 100.
[037] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[038] The embodiments of the present disclosure herein address the unresolved problem of improving training data used for training a machine learning model. The embodiments thus provide a mechanism to generate an enhanced training dataset corresponding to a preliminary training dataset. Moreover, the embodiments herein further provide a mechanism for generating the enhanced training dataset based on a domain knowledge data.
[039] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[040] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[041] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[042] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[043] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method (200) for generating training data for training a machine learning model for object detection, comprising:
   generating (202) a primary training dataset by extracting a plurality of frames from a video of a scene, via one or more hardware processors;
   drawing (204) an annotated bounding box around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors;
   generating (206) a secondary training dataset by determining coordinates of top-left and bottom-right corners of the bounding box around each of a plurality of objects in each of the plurality of frames, via the one or more hardware processors;
   determining (208) a geometric specification of each of the plurality of objects, using a domain knowledge data matching the scene in the video, via the one or more hardware processors;
   determining (210) at least one geometric constraint matching the determined geometric specification, via the one or more hardware processors;
   determining (212) at least one fitting technique matching the determined at least one geometric constraint, via the one or more hardware processors; and
   generating (214) an enhanced training dataset by processing the secondary training dataset using the determined at least one fitting technique, via the one or more hardware processors.

2. The method as claimed in claim 1, wherein generating the enhanced training dataset comprises adjusting the coordinates of the bounding box around each of a plurality of objects in each of the plurality of frames.

3. The method as claimed in claim 1, wherein determining the geometric specification comprises:
   determining (302) at least one scene in each of the plurality of frames;
   determining (304) a domain knowledge database matching the at least one scene, from a plurality of domain knowledge databases; and
   determining (306) the geometric specification based on the domain knowledge data in the domain knowledge database determined as matching the at least one scene.
4. A system (100) for training a machine learning model for object detection, comprising:
   one or more hardware processors (102);
   one or more communication interfaces (103); and
   a memory (101) storing a plurality of instructions, wherein the plurality of instructions when executed cause the one or more hardware processors (102) to:
      generate (202) a primary training dataset by extracting a plurality of frames from a video of a scene;
      draw (204) an annotated bounding box around each of a plurality of objects in each of the plurality of frames;
      generate (206) a secondary training dataset by determining coordinates of top-left and bottom-right corners of the bounding box around each of a plurality of objects in each of the plurality of frames;
      determine (208) a geometric specification of each of the plurality of objects, using a domain knowledge data matching the scene in the video;
      determine (210) at least one geometric constraint matching the determined geometric specification;
      determine (212) at least one fitting technique matching the determined at least one geometric constraint; and
      generate (214) an enhanced training dataset by processing the secondary training dataset using the determined at least one fitting technique.

5. The system (100) as claimed in claim 4, wherein the system (100) generates the enhanced training dataset by adjusting the coordinates of the bounding box around each of a plurality of objects in each of the plurality of frames.

6. The system (100) as claimed in claim 4, wherein the system (100) determines the geometric specification by:
   determining (302) at least one scene in each of the plurality of frames;
   determining (304) a domain knowledge database matching the at least one scene, from a plurality of domain knowledge databases; and
   determining (306) the geometric specification based on the domain knowledge data in the domain knowledge database determined as matching the at least one scene.

Documents

Application Documents

# Name Date
1 202021003758-STATEMENT OF UNDERTAKING (FORM 3) [28-01-2020(online)].pdf 2020-01-28
2 202021003758-REQUEST FOR EXAMINATION (FORM-18) [28-01-2020(online)].pdf 2020-01-28
3 202021003758-FORM 18 [28-01-2020(online)].pdf 2020-01-28
4 202021003758-FORM 1 [28-01-2020(online)].pdf 2020-01-28
5 202021003758-FIGURE OF ABSTRACT [28-01-2020(online)].jpg 2020-01-28
6 202021003758-DRAWINGS [28-01-2020(online)].pdf 2020-01-28
7 202021003758-DECLARATION OF INVENTORSHIP (FORM 5) [28-01-2020(online)].pdf 2020-01-28
8 202021003758-COMPLETE SPECIFICATION [28-01-2020(online)].pdf 2020-01-28
9 Abstract1.jpg 2020-02-03
10 202021003758-FORM-26 [24-03-2020(online)].pdf 2020-03-24
11 202021003758-Proof of Right [17-06-2020(online)].pdf 2020-06-17
12 202021003758-FER.pdf 2021-10-19
13 202021003758-OTHERS [13-12-2021(online)].pdf 2021-12-13
14 202021003758-FER_SER_REPLY [13-12-2021(online)].pdf 2021-12-13
15 202021003758-PatentCertificate23-04-2024.pdf 2024-04-23
16 202021003758-IntimationOfGrant23-04-2024.pdf 2024-04-23

Search Strategy

1 SearchHistoryE_24-08-2021.pdf

ERegister / Renewals

3rd: 02 May 2024 (from 28/01/2022 to 28/01/2023)
4th: 02 May 2024 (from 28/01/2023 to 28/01/2024)
5th: 02 May 2024 (from 28/01/2024 to 28/01/2025)
6th: 19 Dec 2024 (from 28/01/2025 to 28/01/2026)