Abstract: TITLE: A system (100) and method (200) for evaluating metadata of humans (12) in an image. Abstract: The present disclosure proposes a system (100) and method (200) for evaluating metadata of humans (12) in an image. The system (100) comprises an imaging unit (101) and at least one processor (102) in communication with each other. The processor (102) is configured to determine a set of keypoints of human labels in an image using a keypoint estimation algorithm. Various keypoints are connected by the processor (102) to identify one or more human body parts. A distance between the imaging unit (101) and the labelled human is calculated using camera metrics. At least one item of metadata associated with the human is identified by the processor (102) based on relative analysis of the human body parts and the calculated distance. Figure 1.
Description:Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention
[0001] The present disclosure relates to the field of data annotation and labelling. In particular, it discloses a method for evaluating metadata of humans in an image and a system thereof.
Background of the invention
[0002] Data annotation is an indispensable part of training AI/ML models for specific real-world applications. The data annotation process extracts the key information from raw data and converts it into a format usable by deep learning (DL) based techniques. One such real-world application is in the field of driver assistance or automated driving, to identify and respond to pedestrian movement. In order to develop a DL-based model that can locate pedestrians in a given image, the model has to be trained with images containing pedestrians along with details of their locations (as bounding boxes). The location of each pedestrian has to be manually marked as a rectangular bounding box, and the labelling tool provides the pixel coordinates of the bounding boxes.
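For illustration, such a manually labelled pedestrian might be stored as a record like the following minimal sketch; the field names are hypothetical and not prescribed by this disclosure, which only requires pixel-coordinate bounding boxes plus associated metadata flags.

```python
# Hypothetical annotation record for one labelled pedestrian: a rectangular
# bounding box in pixel coordinates plus the metadata discussed herein.
annotation = {
    "label": "pedestrian",
    "bbox": {"x_min": 412, "y_min": 178, "x_max": 468, "y_max": 334},  # pixels
    "metadata": {
        "class": "adult",             # child or adult
        "body_orientation_deg": 87.0,
        "head_orientation_deg": 90.0,
    },
}

# The bounding-box height in pixels, later useful for person-height checks:
bbox = annotation["bbox"]
height_px = bbox["y_max"] - bbox["y_min"]
print(height_px)  # 156
```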
[0003] Apart from these bounding box annotations, several other attributes or metadata associated with the labels are provided as part of manual annotation. Verifying the quality of annotation is an important step before the labelled data can be used for deep learning model development, as errors in labelled data can affect the accuracy of the model. The verification step involves manually examining both labels and metadata for correctness. The metadata is important to capture the variability of the data. The metadata related to many of the object detection tasks includes details and flags related to the position of the object. Verification of such a huge number of data labels is a very time-consuming process. Hence, there is a need to automate the verification of metadata in labelling.
[0004] Patent Application CN111652258 A titled “Image classification data annotation quality evaluation method” discloses an image classification data annotation quality evaluation method which comprises the following steps: providing an image data set which comprises images and classification data obtained by manually annotating each image; carrying out image feature extraction, extracting a plurality of feature vectors for describing image colors in each image based on an image HSV channel, and extracting a plurality of feature vectors for describing image appearances in each image based on local features of the images; measuring the dispersion degree of the features, and modeling the dispersion degree of the color and/or appearance feature vectors by utilizing statistical analysis; and carrying out automatic scoring, scoring and sorting the images based on a discrete degree model obtained by modeling, and evaluating the classification data according to a sorting result. According to that invention, automatic data annotation quality evaluation can be realized, and a quantitative basis is provided for assisting manual evaluation, so that the time cost is reduced.
Brief description of the accompanying drawings
[0005] An embodiment of the invention is described with reference to the following accompanying drawings:
[0006] Figure 1 depicts a system (100) for evaluating metadata of humans in an image; and
[0007] Figure 2 illustrates method steps (200) for evaluating metadata of humans (12) in an image.
Detailed description of the drawings
[0008] Figure 1 depicts a system (100) for evaluating metadata of humans (12) in an image. The system (100) comprises an imaging unit (101) and at least one processor (102) in communication with each other. The imaging unit (101) resides inside the vehicle (11). Modern vehicles are equipped with an imaging unit (101) such as a front video camera. The camera plays a key part in driver assistance systems because it enables vehicles to reliably detect objects and people using image-processing algorithms combined with artificial intelligence methods. This also makes the vehicles fit for future applications involving video-based driver assistance systems, such as automated driving.
[0009] The processor (102) is part of a larger computing system that can reside remotely in a server or cloud. The processor (102) can be either a logic circuitry or a software program that responds to and processes logical instructions to produce a meaningful result. A hardware processor may be implemented in the system as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA), and/or any component that operates on signals based on operational instructions.
[0010] The processor (102) is configured to: label humans (12) in the image based on computer vision techniques; determine a set of keypoints in the human labelling using a keypoint estimation algorithm; connect various keypoints to identify one or more human body parts; calculate a distance between the imaging unit (101) and the labelled human; and identify at least one item of metadata associated with the human (12) based on relative analysis of the human body parts and the calculated distance.
[0011] One item of metadata is the class of the human (12) as child or adult. The processor (102) classifies a human as an adult by measuring the distance between a neck keypoint and the midpoint of the line between the left ankle keypoint and the right ankle keypoint. The processor (102) identifies the body orientation of the human by calculating the angle of the line between the neck keypoint and the midpoint of the line between the left ankle keypoint and the right ankle keypoint. The processor (102) identifies the head orientation of the human by calculating an angle between the neck keypoint and a head keypoint.
[0012] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0013] Figure 2 illustrates the method steps (200) for evaluating metadata of humans in an image. The system (100) for evaluating metadata of humans (12) in the image has been elucidated in accordance with Figure 1. For the purpose of clarity, it is reiterated that the system (100) comprises an imaging unit (101) and at least one processor (102) in communication with each other. The imaging unit (101) captures images of the vehicle (11) surroundings and objects such as poles, trees, pedestrians, etc.
[0014] Method step 201 comprises labelling humans (12) in the image based on computer vision techniques. The latest generation of the front video camera utilizes an innovative, high-performance system (100)-on-chip (SoC) with a microprocessor (102) for image-processing algorithms. It combines classic image-processing algorithms with artificial-intelligence methods for comprehensive scene interpretation and reliable object detection. Alternatively, object detection (the object here being the human/pedestrian) can be performed separately by the processor (102).
[0015] Method step 202 comprises determining a set of keypoints in the human labelling using a keypoint estimation algorithm. In the case of pedestrians/humans, the keypoint estimator can provide 17 keypoints. Keypoint detection is a popular computer vision technique for locating key object parts in an image. It defines spatial locations or points that stand out in an image, such as key parts of the human body: joints, ears, hips, elbows, neck, ankles, head, etc.
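The disclosure does not name a specific estimator. As a hedged illustration, the widely used COCO convention also defines 17 human keypoints; they are listed below. Note that COCO has no explicit neck keypoint (conventions such as OpenPose's BODY_25 do); with COCO points a neck is commonly approximated as the midpoint of the two shoulders.

```python
# The 17 human keypoints of the COCO convention, shown for illustration only;
# the disclosure merely states that the estimator provides 17 keypoints.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# A keypoint estimator typically returns, per detected person, one
# (x, y, confidence) triple per keypoint, e.g.:
keypoints = {name: (0.0, 0.0, 0.0) for name in COCO_KEYPOINTS}
print(len(keypoints))  # 17
```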
[0016] Method step 203 comprises connecting various keypoints to identify one or more human body parts by means of the processor (102).
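Connecting keypoints pairwise into limbs can be sketched as below. The edge list is an assumption modelled on common skeleton definitions, not taken from this disclosure.

```python
# Assumed skeleton: pairs of keypoint names whose connecting line segments
# form limbs and torso edges (illustrative, not from the disclosure).
SKELETON_EDGES = [
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
    ("left_shoulder", "right_shoulder"), ("left_hip", "right_hip"),
]

def limbs(points):
    """Return a line segment for every edge whose two endpoints were detected.

    points: dict mapping keypoint name -> (x, y) pixel coordinates.
    """
    return [
        (points[a], points[b])
        for a, b in SKELETON_EDGES
        if a in points and b in points
    ]

# Example with only a left leg detected -> two segments (hip-knee, knee-ankle):
print(limbs({"left_hip": (1, 2), "left_knee": (1, 5), "left_ankle": (1, 9)}))
```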
[0017] Method step 204 comprises calculating a distance between the imaging unit (101) and the labelled human (12) by means of the processor (102). Camera metrics are used to find the distance between the camera and the human captured by the camera, using a monocular camera distance estimation technique. This is achieved by a homography mapping, assuming the road to be a planar structure and making use of the camera calibration matrix. The method uses two images: one taken at a camera point, and the other taken at the point to which the camera has moved along the optical axis. The distance is calculated from the ratio between the sizes of an object projected on the two images.
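A minimal sketch of the size-ratio calculation, under the standard pinhole-camera assumption: projected size h = f·H/Z, so if the camera moves a known displacement d toward the object, h1/h2 = (Z1 − d)/Z1, giving Z1 = d·h2/(h2 − h1). The function name and the numeric example are illustrative.

```python
def distance_from_size_ratio(h1_px, h2_px, displacement_m):
    """Estimate distance Z1 from the first camera position to the object.

    Pinhole model: projected size h = f*H/Z, hence h1/h2 = Z2/Z1 with
    Z2 = Z1 - displacement (camera moved toward the object along the
    optical axis). Solving gives Z1 = displacement * h2 / (h2 - h1).
    """
    if h2_px <= h1_px:
        raise ValueError("object must appear larger after moving closer")
    return displacement_m * h2_px / (h2_px - h1_px)

# A pedestrian spanning 100 px, then 125 px after moving 2 m closer:
print(distance_from_size_ratio(100, 125, 2.0))  # 10.0 (metres)
```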
[0018] Method step 205 comprises identifying at least one item of metadata associated with the human (12) by means of the processor (102) based on relative analysis of the human body parts and the calculated distance. One item of metadata is the class of the human (12) as child or adult. Classification of the human further comprises measuring the distance between a neck keypoint and the midpoint of the line between the left ankle keypoint and the right ankle keypoint by means of the processor (102). First, the monocamera parameters, which are both the camera intrinsic and extrinsic parameters, are used for estimating the human's distance (depth) from the camera. A look-up calibration table is prepared between depth and child height. The LuT (look-up table) is used to map distance and bounding box height to a calibrated person height. If the calibrated person height is less than a preset threshold, then the processor (102) identifies the human as a child.
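The LuT-based classification above can be sketched as follows. The table values and the height threshold are hypothetical; a real table would be derived from the camera's intrinsic and extrinsic calibration.

```python
# Assumed calibration LuT: depth (m) -> real-world metres per bounding-box
# pixel at that depth. Values are invented for illustration.
CALIBRATION_LUT = {5.0: 0.010, 10.0: 0.020, 20.0: 0.040}

CHILD_HEIGHT_THRESHOLD_M = 1.40  # preset threshold; assumed value

def classify(depth_m, bbox_height_px):
    """Map depth and bounding-box height to a calibrated person height,
    then compare against the preset threshold (child if below it)."""
    # Use the nearest calibrated depth; a real system might interpolate.
    nearest = min(CALIBRATION_LUT, key=lambda d: abs(d - depth_m))
    person_height_m = bbox_height_px * CALIBRATION_LUT[nearest]
    return "child" if person_height_m < CHILD_HEIGHT_THRESHOLD_M else "adult"

print(classify(10.0, 85))  # 85 px * 0.020 m/px = 1.70 m -> adult
print(classify(10.0, 55))  # 55 px * 0.020 m/px = 1.10 m -> child
```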
[0019] Another item of metadata is the body orientation of the human (12). This is identified by the processor (102) by calculating the angle of the line between the neck keypoint and the midpoint of the line between the left ankle keypoint and the right ankle keypoint. Another item of metadata is the head orientation of the human (12). This is identified by the processor (102) by calculating an angle between the neck keypoint and a head keypoint.
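A minimal sketch of these angle computations, assuming image coordinates (y grows downward) and measuring angles from the image x-axis with `atan2`; the disclosure does not fix an angle convention, so this one is an assumption.

```python
import math

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def line_angle_deg(a, b):
    """Angle of the line from a to b, in degrees, measured from the image
    x-axis (image coordinates, y increasing downward)."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

# Hypothetical keypoints in pixel coordinates (upright pedestrian):
neck = (100.0, 50.0)
head = (100.0, 20.0)
left_ankle = (90.0, 200.0)
right_ankle = (110.0, 200.0)

ankle_mid = midpoint(left_ankle, right_ankle)       # (100.0, 200.0)
body_orientation = line_angle_deg(neck, ankle_mid)  # 90.0: vertical body line
head_orientation = line_angle_deg(neck, head)       # -90.0: head above neck
print(body_orientation, head_orientation)
```

The neck-to-ankle-midpoint distance used for the child/adult measurement is then simply `math.dist(neck, ankle_mid)`.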
[0020] The metadata, i.e. child or adult, head orientation, and body orientation of the human as described herein, is used to monitor vulnerabilities arising from pedestrian movements on the road for a vehicle. A person skilled in the art will appreciate that while these method steps describe only a series of steps to accomplish the objectives, these methodologies may be implemented with modifications and customized alterations for a specific application.
[0021] This idea of developing a system (100) and method for evaluating metadata of humans in an image automates the process of verifying a huge number of data labels, thereby making the process of ground truthing faster, more efficient, and less prone to human error.
[0022] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modification to the system (100) and method for evaluating metadata of humans in an image are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.
Claims: We Claim:
1. A system (100) for evaluating metadata of humans (12) in an image, the image captured by means of at least one imaging unit (101), said system (100) comprising at least one processor (102) in communication with the imaging unit (101), the processor (102) configured to: label humans in the image based on computer vision techniques; determine a set of keypoints in the human labelling using a keypoint estimation algorithm; characterized in that, in said system (100), the processor (102) is further configured to:
connect various keypoints to identify one or more human body parts;
calculate a distance between the imaging unit (101) and the labelled human (12); and
identify at least one metadata associated with the human (12) based on relative analysis of the human body parts and the calculated distance.
2. The system (100) for evaluating metadata of humans in an image as claimed in claim 1, wherein one metadata is class of the human (12) as child or adult.
3. The system (100) for evaluating metadata of humans in an image as claimed in claim 1, wherein the processor (102) classifies a human (12) as an adult by measuring the distance between a neck keypoint and a midpoint of the line between the left ankle keypoint and the right ankle keypoint.
4. The system (100) for evaluating metadata of humans in an image as claimed in claim 1, wherein the processor (102) identifies body orientation of the human (12) by calculating the angle of line between the neck keypoint and the midpoint of line between left ankle keypoint and right ankle keypoint.
5. The system (100) for evaluating metadata of humans in an image as claimed in claim 1, wherein the processor (102) identifies head orientation of the human (12) by calculating an angle between the neck keypoint and a head keypoint.
6. A method (200) of evaluating metadata of humans (12) in an image, the image captured by means of at least one imaging unit (101), said imaging unit (101) in communication with a processor (102), the method comprising: labelling (201) humans (12) in the image based on computer vision techniques; determining (202) a set of keypoints in the human labelling using a keypoint estimation algorithm; characterized in that the method further comprises:
connecting (203) various keypoints to identify one or more human body parts by means of the processor (102);
calculating (204) a distance between the imaging unit (101) and the labelled human by means of the processor (102); and
identifying (205) at least one metadata associated with the human by means of the processor (102) based on relative analysis of the human body parts and the calculated distance.
7. The method (200) of evaluating metadata of humans in an image as claimed in claim 6, wherein one metadata is class of the human as child or adult.
8. The method (200) of evaluating metadata of humans in an image as claimed in claim 6, wherein classification of the human further comprises measuring distance between a neck key point and a midpoint of line between left ankle keypoint and right ankle keypoint.
9. The method (200) of evaluating metadata of humans in an image as claimed in claim 6, wherein one metadata is body orientation of the human that is identified by calculating the angle of line between the neck keypoint and the midpoint of line between left ankle keypoint and right ankle keypoint.
10. The method (200) of evaluating metadata of humans in an image as claimed in claim 6, wherein one metadata is head orientation of the human that is identified by calculating an angle between the neck keypoint and a head keypoint.
| # | Name | Date |
|---|---|---|
| 1 | 202341058321-POWER OF AUTHORITY [31-08-2023(online)].pdf | 2023-08-31 |
| 2 | 202341058321-FORM 1 [31-08-2023(online)].pdf | 2023-08-31 |
| 3 | 202341058321-DRAWINGS [31-08-2023(online)].pdf | 2023-08-31 |
| 4 | 202341058321-DECLARATION OF INVENTORSHIP (FORM 5) [31-08-2023(online)].pdf | 2023-08-31 |
| 5 | 202341058321-COMPLETE SPECIFICATION [31-08-2023(online)].pdf | 2023-08-31 |