Abstract: A method for improving an accuracy level of a target perception system (102) is provided. The method includes capturing a raw video and a reference video of the surroundings of a vehicle (110) simultaneously, and identifying object-related information related to a plurality of objects in the reference video in a first coordinate system using a reference perception system (104). Further, the method includes identifying object-related information and ground truth information related to the plurality of objects in the raw video in a second coordinate system using the target perception system (102) and a perception electronic control unit (122), respectively. Moreover, the method includes determining an accuracy level of each of the reference and target perception systems (104 and 102), and training the target perception system (102) with the identified ground truth information for automatically improving the accuracy level of the target perception system (102).
Description:
RELATED ART
[0001] Embodiments of the present specification relate generally to a perception system, and more particularly to a system and an associated method for evaluating and improving accuracy of a newly developed perception system.
[0002] A navigation system of an autonomous vehicle such as an advanced driver assistance system (ADAS) system deployed in a vehicle often includes a perception system. The perception system usually includes one or more on-board sensors such as one or more cameras, light detection and ranging (LIDAR) sensors, and radio detection and ranging (RADAR) sensors. The perception system perceives the surroundings of the vehicle and recognizes objects in the surroundings to plan a safe navigation path for the vehicle using one or more of these on-board sensors.
[0003] Generally, the perception system performs a variety of functionalities in the vehicle. Examples of such functionalities include detection of objects in the surroundings of the vehicle, detection of lanes of an ego vehicle and other nearby vehicles, and recognition of traffic information such as traffic signs and traffic lights.
[0004] Further, the information determined using the perception system enables the vehicle and/or associated electronic control units (ECUs) to take several safety-critical decisions. For example, the distance-to-obstacle information determined using the perception system enables a brake ECU to take the safety-critical decision of whether to gradually slow down the vehicle or to immediately activate an emergency brake. Thus, any newly developed perception system, whose output is used to implement several safety-critical decisions, has to be thoroughly tested and validated for associated accuracy before deploying the perception system in the vehicle.
[0005] To test accuracy of the newly developed perception system, the output results of the newly developed perception system may need to be benchmarked against the output results of one or more reference perception systems that already exist in the market. However, presently there are no systems that allow benchmarking and evaluating accuracy of the newly developed perception system with respect to accuracy of the one or more existing reference perception systems.
[0006] Certain existing solutions evaluate accuracy of a newly developed perception system by introducing data noise into sensor data that is captured by a perception sensor. For example, the US patent application US20220004818A1 describes a system that determines accuracy of a perception system. To that end, the system provides sensor data captured by a perception sensor as an input to a perception algorithm to generate a first detection list. Further, the system introduces data noise such as Gaussian blur, photorealistic haze, and photorealistic defocus into the sensor data using an associated simulator. Subsequently, the system provides the sensor data including the data noise as an input to the same perception algorithm to generate a second detection list. The system then compares the first and second detection lists to determine detection accuracy and weakness of the perception system. Though the system described in the US patent application US20220004818A1 alludes to determining accuracy of the perception system, the system does not have the capability to evaluate and compare accuracy levels of two different makes and/or models of perception systems against each other. Such a comparison is required to identify and ascertain if an accuracy level of a newly developed perception system is better than an accuracy level of another perception system that already exists in the market before deploying the newly developed perception system in real-world vehicles.
[0007] Accordingly, there remains a need for an accuracy determining system that determines and evaluates an accuracy level of a newly developed perception system with respect to an accuracy level of a reference perception system that already exists in the market.
BRIEF DESCRIPTION
[0008] It is an objective of the present disclosure to provide a method for improving an accuracy level of a target perception system. The method includes capturing a raw video and a reference video of the surroundings of a selected space simultaneously using a raw image sensor and a reference image sensor, respectively, that are deployed in the selected space. The raw image sensor is vertically displaced from the reference image sensor in the selected space such that both the reference image sensor and the raw image sensor include the same horizontal field of view. Further, the method includes identifying object-related information related to a plurality of objects in a plurality of frames of the reference video in a first coordinate system by processing the reference video using a reference perception system. Furthermore, the method includes identifying object-related information related to the plurality of objects in the plurality of frames of the raw video in a second coordinate system by processing the raw video using the target perception system. The first coordinate system is different from the second coordinate system.
[0009] In addition, the method includes identifying ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system by processing the raw video using a perception electronic control unit in the selected space. Moreover, the method includes converting the ground truth information and the object-related information that is identified by the target perception system from the second coordinate system to the first coordinate system by the perception electronic control unit. The method further includes determining an accuracy level of results output by the reference perception system by comparing the object-related information that is identified in the first coordinate system by the reference perception system with the ground truth information that is identified by the perception electronic control unit and is subsequently converted to the first coordinate system. Further, the method includes determining an accuracy level of results output by the target perception system by comparing the object-related information that is identified by the target perception system and subsequently converted to the first coordinate system with the ground truth information that is identified by the perception electronic control unit and is subsequently converted to the first coordinate system.
[0010] Furthermore, the method includes training the target perception system with the ground truth information identified in the second coordinate system by the perception electronic control unit when the determined accuracy level of the target perception system is less than a designated threshold and/or the determined accuracy level of the reference perception system for automatically improving the accuracy level of the target perception system. The first coordinate system corresponds to a world coordinate system and the second coordinate system corresponds to a pixel coordinate system. Identifying ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system includes generating bounding boxes around the plurality of objects by processing user inputs including the bounding boxes that are drawn around the plurality of objects by a user. Further, identifying ground truth information includes identifying classes of the plurality of objects in the plurality of frames of the raw video. Further, identifying ground truth information includes identifying a position of each of the plurality of objects based on a horizontal pixel distance along an X-axis between an origin point of the raw image sensor and a reference point in a corresponding object, and a vertical pixel distance along a Y-axis between the origin point and the reference point. Furthermore, identifying ground truth information includes identifying a width of each of the plurality of objects based on pixel coordinates of a top left corner and a top right corner of a bounding box enclosing a corresponding object selected from the plurality of objects in the plurality of frames of the raw video. In addition, identifying ground truth information includes identifying a length of each of the plurality of objects based on pixel coordinates of the top left corner and a bottom left corner of the bounding box enclosing a corresponding object selected from the plurality of objects in the plurality of frames of the raw video.
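By way of a non-limiting illustration, the following Python sketch shows one way the pixel-coordinate ground truth values described above may be derived from an annotated bounding box. The helper names, the choice of the box centre as the reference point of the object, and the origin point coordinates are illustrative assumptions and do not form part of the claimed method.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Pixel coordinates of the box corners, origin at the top-left of the frame.
    x_left: int
    y_top: int
    x_right: int
    y_bottom: int

def ground_truth_from_box(box: BoundingBox, obj_class: str,
                          origin_x: int, origin_y: int) -> dict:
    """Derive pixel-coordinate ground truth for one annotated object.

    origin_x / origin_y is the origin point of the raw image sensor in the
    frame (an assumption; the specification identifies it only as a
    particular point of the raw camera).
    """
    # Reference point of the object, taken here as the box centre (assumption).
    ref_x = (box.x_left + box.x_right) // 2
    ref_y = (box.y_top + box.y_bottom) // 2

    position = (ref_x - origin_x, ref_y - origin_y)  # horizontal / vertical pixel distances
    width = box.x_right - box.x_left                 # top-left to top-right corner
    length = box.y_bottom - box.y_top                # top-left to bottom-left corner

    return {"class": obj_class, "position": position,
            "width_px": width, "length_px": length,
            "box_size_px": (width, length)}

# Example: a car annotated in one frame of the raw video.
gt = ground_truth_from_box(BoundingBox(400, 220, 520, 300), "Car",
                           origin_x=640, origin_y=360)
```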
[0011] Moreover, identifying ground truth information includes generating the ground truth information related to the plurality of objects in the plurality of frames of the raw video in the pixel coordinate system. The generated ground truth information includes sizes of the generated bounding boxes, the identified classes, the position of each of the plurality of objects, the width of each of the plurality of objects, and the length of each of the plurality of objects. Converting the ground truth information from the second coordinate system to the first coordinate system includes converting the generated ground truth information including the position, the width, and the length of each of the plurality of objects in the plurality of frames of the raw video from the pixel coordinate system to the world coordinate system. The perception electronic control unit converts the ground truth information from the second coordinate system to the first coordinate system based on one or more intrinsic calibration parameters and one or more extrinsic calibration parameters of the raw image sensor. The one or more intrinsic calibration parameters include one or more of a focal length, an optical center, and radial distortion coefficients of a lens of the raw image sensor. The one or more extrinsic calibration parameters include one or more of a translation matrix and a rotation matrix that are used to convert the ground truth information from the pixel coordinate system to the world coordinate system. Further, converting the ground truth information from the second coordinate system to the first coordinate system includes storing the ground truth information related to the plurality of objects in the plurality of frames of the raw video in the world coordinate system in a ground truth database.
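As a non-limiting illustration of the conversion from the pixel coordinate system to the world coordinate system described above, the Python sketch below back-projects a pixel onto the ground plane using intrinsic and extrinsic calibration parameters. The ground-plane (Z = 0) assumption, the homography-based formulation, and the numeric calibration values are illustrative only; lens distortion is assumed to have been removed beforehand.

```python
import numpy as np

def pixel_to_world(u: float, v: float,
                   K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Back-project a pixel (u, v) to world coordinates on the Z = 0 plane.

    K : 3x3 intrinsic matrix (focal lengths, optical centre).
    R : 3x3 rotation matrix, t : translation vector (extrinsics of the
    raw image sensor). Lens distortion is assumed to be removed beforehand.
    """
    # For points on the world plane Z = 0, the projection reduces to a
    # homography H = K [r1 r2 t].
    H = K @ np.column_stack((R[:, 0], R[:, 1], t.reshape(3)))
    world_h = np.linalg.inv(H) @ np.array([u, v, 1.0])
    world_h /= world_h[2]                  # normalise the homogeneous coordinate
    return np.array([world_h[0], world_h[1], 0.0])  # (X, Y, Z) in world units

# Example with illustrative calibration values: camera 1.5 m above the ground
# (world Z is up), looking along the world Y axis.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 1.5, 0.0])
print(pixel_to_world(700.0, 500.0, K, R, t))   # approximately [0.64, 10.71, 0.0]
```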
[0012] The reference perception system corresponds to an off-the-shelf perception system that is available in the market and is already tested and deployed in the selected space including real-world vehicles. The target perception system corresponds to a perception system that is developed subsequent to the reference perception system and is yet to be deployed in the selected space including real-world vehicles. The object-related information identified by each of the reference perception system and the target perception system includes one or more of bounding boxes generated around the plurality of objects, classes, positions, widths, and lengths of the plurality of objects. Determining the accuracy level of the reference perception system includes comparing the object-related information identified by the reference perception system from each frame selected from the plurality of frames of the reference video with the ground truth information identified by the perception electronic control unit from each corresponding frame selected from the plurality of frames of the raw video. Further, the method includes determining a precision score of the reference perception system using each of the plurality of frames of the reference video based on a number of object-related information that are accurately identified and a number of object-related information that are inaccurately identified by the reference perception system from a corresponding frame selected from the plurality of frames of the reference video. Furthermore, the method includes determining an average precision score of the reference perception system based on the corresponding precision score of the reference perception system determined using each of the plurality of frames of the reference video. In addition, the method includes determining a recall score of the reference perception system using each of the plurality of frames of the reference video based on a number of object-related information that are accurately identified and a number of object-related information that are unidentified by the reference perception system from a corresponding frame selected from the plurality of frames of the reference video.
[0013] Moreover, the method includes determining an average recall score of the reference perception system based on the corresponding recall score of the reference perception system determined using each of the plurality of frames of the reference video. The method further includes determining the accuracy level of the reference perception system based on the determined average precision score and the determined average recall score of the reference perception system. Determining the accuracy level of results output by the target perception system includes comparing the object-related information and the ground truth information that are identified by the target perception system and the perception electronic control unit, respectively from each frame selected from the plurality of frames of the raw video with each other. Further, the method includes determining a precision score of the target perception system using each of the plurality of frames of the raw video based on a number of object-related information that are accurately identified and a number of object-related information that are inaccurately identified by the target perception system from a corresponding frame selected from the plurality of frames of the raw video. Furthermore, the method includes determining an average precision score of the target perception system based on the corresponding precision score of the target perception system determined using each of the plurality of frames of the raw video. Moreover, the method includes determining a recall score of the target perception system using each of the plurality of frames of the raw video based on a number of object-related information that are accurately identified and a number of object-related information that are unidentified by the target perception system from a corresponding frame selected from the plurality of frames of the raw video.
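A compact Python sketch of the per-frame precision and recall bookkeeping described in this and the following paragraph is given below. It assumes that detections have already been matched to ground truth objects for each frame; because the exact rule for combining the average precision score and the average recall score into a single accuracy level is not prescribed here, the harmonic-mean combination at the end is only one illustrative choice.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FrameCounts:
    accurately_identified: int    # detections that match the ground truth
    inaccurately_identified: int  # detections with no matching ground truth
    unidentified: int             # ground truth objects missed entirely

def frame_precision(c: FrameCounts) -> float:
    detected = c.accurately_identified + c.inaccurately_identified
    return c.accurately_identified / detected if detected else 0.0

def frame_recall(c: FrameCounts) -> float:
    present = c.accurately_identified + c.unidentified
    return c.accurately_identified / present if present else 0.0

def accuracy_level(frames: List[FrameCounts]) -> float:
    avg_p = sum(frame_precision(f) for f in frames) / len(frames)
    avg_r = sum(frame_recall(f) for f in frames) / len(frames)
    # The accuracy level is derived from the average precision and the average
    # recall; a harmonic mean is used here as one plausible combination.
    return 2 * avg_p * avg_r / (avg_p + avg_r) if (avg_p + avg_r) else 0.0

# Example: three frames of the reference (or raw) video.
print(accuracy_level([FrameCounts(8, 1, 1), FrameCounts(5, 0, 2), FrameCounts(7, 2, 0)]))
```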
[0014] The method further includes determining an average recall score of the target perception system based on the corresponding recall score of the target perception system determined using each of the plurality of frames of the raw video. Further, the method includes determining the accuracy level of the target perception system based on the determined average precision score and the determined average recall score of the target perception system. Determining the accuracy level of results output by the reference perception system includes determining an area of intersection between a first bounding box and a second bounding box. The first bounding box is generated by the reference perception system around an object in a frame of the reference video. The second bounding box is generated by the perception electronic control unit around the same object in the corresponding frame of the raw video. The area of intersection is determined based on X-Y coordinates of top left corners of the first bounding box and the second bounding box. Further, the method includes determining an area of union between the first bounding box and the second bounding box based on an area of the first bounding box, an area of the second bounding box, and the determined area of intersection between the first bounding box and the second bounding box. Furthermore, the method includes determining a bounding box generation accuracy of the reference perception system based on the determined area of intersection and the determined area of union between the first bounding box and the second bounding box. Determining the accuracy level of results output by the target perception system includes determining an area of intersection between a first bounding box and a second bounding box. The first bounding box is generated by the target perception system around an object in a frame of the raw video. The second bounding box is generated by the perception electronic control unit around the same object in the same frame of the raw video. The area of intersection is determined based on X-Y coordinates of top left corners of the first bounding box and the second bounding box.
[0015] Further, the method includes determining an area of union between the first bounding box and the second bounding box based on an area of the first bounding box, an area of the second bounding box, and the determined area of intersection between the first bounding box and the second bounding box. Furthermore, the method includes determining a bounding box generation accuracy of the target perception system based on the determined area of intersection and the determined area of union between the first bounding box and the second bounding box. Identifying object-related information related to the plurality of objects in the reference video in the first coordinate system includes converting the object-related information that is identified by the reference perception system from the first coordinate system to the second coordinate system. The perception electronic control unit converts the object-related information to the second coordinate system based on one or more intrinsic calibration parameters and one or more extrinsic calibration parameters of the reference image sensor. The one or more intrinsic calibration parameters include one or more of focal lengths and an optical center of the reference image sensor. The one or more extrinsic calibration parameters include one or more of a translation matrix and a rotation matrix that are used to convert the object-related information from the first coordinate system to the second coordinate system. The method further includes displaying values of the object-related information identified by each of the target perception system, the reference perception system, and the perception electronic control unit in the second coordinate system on a display unit for enabling a user to manually compare the accuracy level of the target perception system with the accuracy level of the reference perception system.
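The intersection-over-union computation underlying the bounding box generation accuracy described above may be sketched as follows; the (x_left, y_top, width, height) box representation is an assumption made purely for illustration.

```python
def bbox_generation_accuracy(box_a, box_b):
    """Intersection-over-union of two axis-aligned bounding boxes.

    Each box is (x_left, y_top, width, height) in pixel coordinates
    (a representation assumed here for illustration).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Area of intersection from the overlapping extents of the two boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    # Area of union = sum of both box areas minus the shared area.
    union = aw * ah + bw * bh - intersection
    return intersection / union if union else 0.0

# First box: generated by a perception system; second box: ground truth box.
print(bbox_generation_accuracy((100, 80, 60, 40), (110, 85, 60, 40)))
```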
[0016] It is another objective of the present disclosure to provide a perception unit benchmarking system. The perception unit benchmarking system includes a perception electronic control unit that is operatively coupled to a target perception system including a raw image sensor and a reference perception system including a reference image sensor. The raw image sensor is vertically displaced from the reference image sensor in a selected space such that both the reference image sensor and the raw image sensor include the same horizontal field of view. The perception electronic control unit receives object-related information in a first coordinate system that is identified by the reference perception system from a reference video of the surroundings of the selected space. The received object-related information includes information related to a plurality of objects in a plurality of frames of the reference video. Further, the perception electronic control unit receives object-related information in a second coordinate system that is identified by the target perception system from a raw video of the surroundings of the selected space. The received object-related information includes information related to the plurality of objects in the plurality of frames of the raw video. The raw video and the reference video are captured simultaneously using the raw image sensor and the reference image sensor, respectively deployed in the selected space.
[0017] Furthermore, the perception electronic control unit identifies ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system by processing the raw video, and converts the ground truth information and the object-related information that is identified by the target perception system from the second coordinate system to the first coordinate system. Moreover, the perception electronic control unit determines an accuracy level of results output by the reference perception system by comparing the object-related information that is identified in the first coordinate system by the reference perception system with the ground truth information that is identified by the perception electronic control unit and is subsequently converted to the first coordinate system. In addition, the perception electronic control unit determines an accuracy level of results output by the target perception system by comparing the object-related information that is identified by the target perception system and is subsequently converted to the first coordinate system with the ground truth information that is identified by the perception electronic control unit and is subsequently converted to the first coordinate system. The perception electronic control unit further trains the target perception system with the ground truth information that is identified in the second coordinate system by the perception electronic control unit when the determined accuracy level of the target perception system is less than a designated threshold and/or the determined accuracy level of the reference perception system for automatically improving the accuracy level of the target perception system.
[0018] Each of the raw image sensor and the reference image sensor includes one or more of a camera, a light detection and ranging sensor, a radio detection and ranging sensor, an ultrasonic imaging system, and a thermographic camera. The perception electronic control unit is implemented as a learning system that resides in one or more of a local storage system, a remote storage system, and a cloud storage system communicatively coupled to the selected space. The perception electronic control unit automatically trains the target perception system with the ground truth information including one or more of sizes of bounding boxes generated around the plurality of objects, and classes, positions, widths, and lengths of the plurality of objects in the plurality of frames of the raw video when the determined accuracy level of the target perception system is less than one of the designated threshold and the determined accuracy level of the reference perception system. The selected space includes one or more of a vehicle, a retail space, an industrial area, a factory, a military area, a manufacturing facility, a storage facility, a warehouse, a home premises, a public place, a hospital area, and a supermarket. The perception unit benchmarking system includes one or more of an obstacle detection system, a lane-keep assist system, a traffic-sign detection system, a traffic-light detection system, a surveillance system, an industrial operation monitoring system, a medical system, and an electronic device.
BRIEF DESCRIPTION OF DRAWINGS
[0019] These and other features, aspects, and advantages of the claimed subject matter will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
[0020] FIG. 1 illustrates a block diagram depicting an exemplary perception unit benchmarking (PUB) system that determines accuracy levels of both a target perception system and a reference perception system, in accordance with embodiments of the present disclosure;
[0021] FIGS. 2A-B illustrate a flow diagram depicting an exemplary method for identifying ground truth information of objects in a plurality of frames of a raw video using the PUB system of FIG. 1, in accordance with embodiments of the present disclosure;
[0022] FIG. 3A illustrates an exemplary frame of a raw video including a bounding box generated around an object using the PUB system of FIG. 1, in accordance with embodiments of the present disclosure;
[0023] FIG. 3B illustrates the exemplary frame of the raw video from which a position of the object in a pixel coordinate system is identified using the PUB system of FIG. 1, in accordance with embodiments of the present disclosure;
[0024] FIG. 4 illustrates a flow diagram depicting an exemplary method for identifying object-related information from the plurality of frames of the raw video using the target perception system of FIG. 1, in accordance with embodiments of the present disclosure;
[0025] FIGS. 5A-B illustrate a flow diagram depicting an exemplary method for determining accuracy levels of results produced by each of the target perception system and the reference perception system using the PUB system of FIG. 1, in accordance with embodiments of the present disclosure;
[0026] FIG. 6 illustrates an exemplary frame of a reference video captured using the reference perception system of FIG. 1, in accordance with aspects of the present disclosure;
[0027] FIG. 7A illustrates an exemplary frame of the raw video including a first bounding box and a second bounding box that are generated around an object using the perception electronic control unit and the target perception system of FIG. 1, respectively, in accordance with embodiments of the present disclosure; and
[0028] FIG. 7B illustrates an exemplary frame of the reference video including the first bounding box and a third bounding box that are generated around the object using a perception electronic control unit and the reference perception system of FIG. 1, respectively, in accordance with embodiments of the present disclosure.
DETAILED DESCRIPTION
[0029] The following description presents an exemplary perception unit benchmarking (PUB) system that determines accuracy levels of both a newly developed perception system and a reference perception system that already exists in the market. Particularly, embodiments described herein disclose the PUB system that compares the determined accuracy level of the newly developed perception system and the determined accuracy level of the reference perception system against each other. Further, the PUB system automatically improves accuracy of the newly developed perception system when the associated accuracy level is less than the accuracy level of the reference perception system.
[0030] Generally, sensor manufacturing companies such as camera manufacturing companies develop and periodically launch new perception systems in the market. The sensor manufacturing companies develop such new perception systems using advanced and sophisticated algorithms such as artificial intelligence and machine-learning algorithms, and hence, the newly developed perception systems are expected to provide more accurate results than the perception systems that already exist in the market. However, in real-world scenarios, the newly developed perception systems may not provide accurate results when compared to results output by the existing perception systems when hardware and/or software issues exist in the newly developed perception systems.
[0031] Further, such a newly developed perception system may not accurately identify object-related information when the newly developed perception system is deployed in a selected space without properly testing and benchmarking an associated accuracy level against the accuracy level of a perception system that already exists in the market. Examples of the selected space include one or more of, but are not limited to, a vehicle, a retail space, an industrial area, a factory, a military area, a manufacturing facility, a storage facility, a warehouse, a home premises, a public place, a hospital area, and a supermarket. For instance, the newly developed perception system may not accurately identify information that is critical for a safe navigation of a real-world vehicle when the newly developed perception system is deployed in the vehicle without properly testing and benchmarking an associated accuracy level against the accuracy level of a perception system that already exists in the market. In order to address the aforementioned issues, the present disclosure provides the PUB system that automatically identifies and benchmarks an accuracy level of a perception system that is newly developed using advanced and sophisticated techniques against an accuracy level of another perception system that already exists in the market. Further, the PUB system automatically improves the accuracy level of the newly developed perception system by suitably training the newly developed perception system with real and accurate object-related information identified by the PUB system. When the newly developed perception system whose accuracy is improved is deployed in a real-world vehicle, the newly developed perception system accurately identifies information related to various stationary and dynamic objects in the surroundings of the vehicle, which ensures a safe navigation of the vehicle without any collision with objects in the surroundings of the vehicle.
[0032] Further, it may be noted that some original equipment manufacturers (OEMs), for example, some automotive OEMs do not develop perception systems on their own. Such automotive OEMs acquire perception systems for their vehicles from sensor manufacturing companies, and deploy the acquired perception systems in their vehicles. The PUB system described in the present disclosure assists automotive OEMs in selecting the right perception system for their vehicles by comparing accuracy levels of different perception systems developed by different sensor manufacturing companies against each other and by selecting a specific perception system from the different perception systems that outputs accurate results.
[0033] For example, the PUB system compares accuracy levels of different LIDAR-based perception systems developed by different sensor manufacturing companies against each other. Further, the PUB system identifies a specific LIDAR-based perception system from the LIDAR-based perception systems that accurately determines speeds, directions, and distances of objects in the surroundings of a vehicle. Automotive OEMs may then use the specific LIDAR-based perception system identified by the PUB system in their vehicles for accurately identifying information related to various stationary and dynamic objects in real-world scenarios.
[0034] In another example, the PUB system compares accuracy levels of different ultrasonic transducers that are to be employed in a medical application and are developed by different sensor manufacturing companies against each other. Further, the PUB system identifies a specific ultrasonic transducer from the ultrasonic transducers that accurately captures images of internal organs of a human body. Medical equipment manufacturers may then use the specific ultrasonic transducer identified by the PUB system in their ultrasonic imaging machines for accurately capturing images of internal organs of humans in real-world scenarios.
[0035] In yet another example, the PUB system compares accuracy levels of different imaging sensors that are to be employed in a surveillance application and are developed by different sensor manufacturing companies against each other. Further, the PUB system identifies a specific imaging sensor from the imaging sensors whose output is accurate and satisfies predefined safety and security compliance norms. Surveillance equipment manufacturers may then use the specific imaging sensor identified by the PUB system in their surveillance equipment for accurately performing surveillance operations in real-world scenarios.
[0036] Hereinafter, for the sake of simplicity, the newly developed perception system is referred to as a target perception system. As noted previously, conventional systems determine an accuracy level of the target perception system by introducing simulated noise into sensor data captured by a perception sensor. However, such conventional systems do not compare and evaluate the accuracy level of the target perception system with respect to the accuracy level of a reference perception system.
[0037] However, embodiments of the present disclosure provide a PUB system that compares accuracy levels of different makes, models, and/or versions of perception systems with each other, and identifies a particular perception system that provides the most accurate output results. It may be noted that different embodiments of the present PUB system may be used in different application areas.
[0038] For example, the PUB system may be used in an obstacle detection application area to compare the distance-to-obstacle information detected by each of the target perception system and the reference perception system with ground truth (GT) information including the actual distance to the obstacle. Further, the PUB system identifies the target perception system or the reference perception system, whose output matches the actual distance to the obstacle, as the perception system producing accurate results.
[0039] In another example, the PUB system may be used in a lane-keep assist application area to compare lane width and color information detected by each of the target perception system and the reference perception system with GT information including actual lane width and color information. Further, the PUB system identifies the target perception system or the reference perception system whose output matches the actual lane width and color information as the best-performing perception system.
[0040] Similarly, the PUB system may be used in a traffic-sign detection application area to compare traffic sign information such as a traffic sign type, a position of the traffic sign, and height and width of the traffic sign detected by each of the target perception system and the reference perception system with GT information including actual traffic sign information. Further, the PUB system identifies the target perception system or the reference perception system whose output matches the actual traffic sign information as the best-performing perception system.
[0041] Likewise, the PUB system may be used in a traffic-light detection application area to compare traffic light color information detected by each of the target perception system and the reference perception system with GT information including the actual traffic light color information. Further, the PUB system identifies the target perception system or the reference perception system whose output matches the actual traffic light color information as the best-performing perception system.
[0042] In addition, the PUB system may be used in a surveillance application to compare user information, including a number of users and the age and/or gender of the users who may have entered a secured premises, identified by each of the target perception system and the reference perception system with GT information including the actual user information. Examples of the secured premises include banks, other financial institutions, and research facilities. Further, the PUB system identifies the target perception system or the reference perception system whose output matches the actual user information as the best-performing perception system.
[0043] Further, the PUB system may be used in an industrial operation monitoring application to compare a distance, by which a robotic arm is to be moved to pick an object, identified by each of the target perception system and the reference perception system with GT information including the actual distance by which the robotic arm is to be moved. Furthermore, the PUB system identifies the target perception system or the reference perception system whose output matches the actual distance information as the best-performing perception system.
[0044] Further, the PUB system may be used in evaluating accuracy levels of perception systems used in electronic devices such as smartphones, laptops, tablets, or any camera-enabled devices. For example, the target perception system and the reference perception system developed to perform the functionalities of face recognition applications may be installed in a smartphone device. In this example, the PUB system may be used to compare the gender and age of users predicted by each of the target perception system and the reference perception system with GT information including the actual gender and age of the users. The PUB system then identifies the target perception system or the reference perception system whose output matches the actual gender and age of the users as the best-performing perception system.
[0045] The present PUB system determines corresponding accuracy of target and reference perception systems used in different application areas. However, for clarity, an embodiment of the PUB system that determines a corresponding accuracy of target and reference perception systems is described subsequently with reference to FIGS. 1-7B in the context of an object detection application.
[0046] FIG. 1 illustrates a block diagram depicting an exemplary PUB system (100) that determines accuracy levels of both a target perception system (102) and a reference perception system (104) that are to be employed in an object detection application. The term “target perception system (102)” described herein throughout various embodiments of the present disclosure refers to a newly developed perception system whose accuracy is to be determined and benchmarked against another perception system before deploying such target perception system (102) in real-world vehicles. The term “reference perception system (104)” described herein throughout various embodiments refers to an off-the-shelf perception system that is available in the market and is already tested and deployed in real-world vehicles.
[0047] Conventionally, the reference perception system (104) deployed in a real-world vehicle captures a reference video of the surroundings of the vehicle using an associated perception sensor. Subsequently, the reference perception system (104) processes the captured video and identifies object-related information from the captured video. Examples of such object-related information identified from the captured video include classes of objects identified in a navigation path of the vehicle, positions of the identified objects, and lengths and widths of the identified objects. Though the reference perception system (104) identifies the object-related information from the captured video, the reference perception system (104) does not separately store the captured video in an associated database. Hence, users cannot generally access the video captured by the reference perception system (104) and can only access the object-related information identified from the captured video.
[0048] As the reference perception system (104) does not have any provision to store the captured video, conventional PUB systems cannot access the captured video from the reference perception system (104) and provide the captured video as an input to the target perception system (102), which could then use the received video to identify corresponding object-related information for comparing accuracy levels of the target and reference perception systems (102 and 104).
[0049] Further, the reference perception system (104) conventionally identifies and outputs values of the object-related information in a world coordinate system. In contrast, the target perception system (102) identifies and outputs values of the object-related information in an image or a pixel coordinate system. Therefore, the conventional PUB systems cannot directly compare the output values of the reference perception system (104) with the output values of the target perception system (102) to identify if a new perception system is outputting accurate values.
[0050] In order to address the aforementioned issues associated with conventional PUB systems, the present disclosure provides the PUB system (100). Specifically, the PUB system (100) compares an accuracy level of the reference perception system (104), that does not include a provision to store the captured video, with an accuracy level of the target perception system (102). Further, the PUB system (100) compares the accuracy level of the reference perception system (104) that identifies the object-related information in the world coordinate system with the accuracy level of the target perception system (102) that identifies the object-related information in the pixel coordinate system.
[0051] To compare accuracy levels of the target and reference perception systems (102 and 104), the present disclosure provides a raw image sensor (106) and a reference image sensor (108) that are deployed in a selected space (110). For the sake of simplicity, the selected space (110) is depicted and is subsequently described to be a vehicle (110). However, it is to be understood that the selected space (110) may also be a retail space, an industrial area, a factory, a military area, a manufacturing facility, a storage facility, a warehouse, a home premises, a public place, a hospital area, and a supermarket, which may be monitored using the target perception system (102) post testing and benchmarking an accuracy level of the target perception system (102) against an accuracy level of the reference perception system (104). In one embodiment, the raw image sensor (106) and the reference image sensor (108) are deployed within a cabin compartment below a rear-view mirror disposed within the vehicle (110). However, it is to be understood that the raw image sensor (106) and the reference image sensor (108) may also be deployed in other suitable locations within the vehicle (110).
[0052] In certain embodiments, the raw image sensor (106) and the reference image sensor (108) are perception sensors used for perceiving objects in the surroundings of the vehicle (110). Examples of the raw image sensor (106) or the reference image sensor (108) deployed within the vehicle (110) include a camera sensor, a LIDAR sensor, a RADAR sensor, an ultrasonic imaging system, or a thermographic camera. For the sake of simplicity, the raw image sensor (106) and the reference image sensor (108) are subsequently described to be camera devices such as a raw camera (106) and a reference camera (108), respectively. However, it is to be understood that the raw image sensor (106) or the reference image sensor (108) may include a LIDAR sensor or a RADAR sensor.
[0053] In one embodiment, the raw camera (106) is deployed at a particular distance from the reference camera (108) such that both the raw camera (106) and the reference camera (108) include the same horizontal field of view (112) but the raw camera (106) is vertically displaced from the reference camera (108), as depicted in FIG. 1. Therefore, when the vehicle (110) navigates via a navigation path, both the raw camera (106) and the reference camera (108) simultaneously capture same or similar videos of the surroundings of the vehicle (110).
[0054] In certain embodiments, the reference perception system (104) processes the reference video, thus captured by the reference camera (108) during navigation of the vehicle (110), and identifies object-related information from the captured reference video. Examples of such object-related information identified from the captured reference video include classes of objects identified during navigation of the vehicle (110) via the navigation path, bounding boxes generated around the identified objects, positions of the identified objects in the world coordinate system, and lengths and widths of the objects in the world coordinate system along with timestamp information. Subsequently, the reference perception system (104) transmits the identified object-related information to the PUB system (100) via a communications link (114). The PUB system (100) then stores the received object-related information in an associated reference perception system (RPS) database (116).
[0055] In certain embodiments, the PUB system (100) may be implemented by suitable code on a processor-based system, such as a general-purpose or a special-purpose computer. Furthermore, the PUB system (100), for example, may include one or more general-purpose processors, specialized processors, graphical processing units, microprocessors, programmable logic arrays, field-programmable gate arrays, integrated circuits, systems on chips, and/or other suitable computing devices.
[0056] In one embodiment, the PUB system (100) may reside locally in the vehicle (110). In this implementation, the PUB system (100) is communicatively coupled to the target perception system (102), the reference perception system (104), and a video storing system (120) via the communications link (114), such as an Ethernet cable, a universal serial bus (USB) cable, a controller area network (CAN), a serial peripheral interface, or a clocked serial interface. Alternatively, the PUB system (100) may be a remote system such as a cloud-based system that resides at a remote location outside the vehicle (110). In this implementation, the PUB system (100) is remotely coupled to the target perception system (102), the reference perception system (104), and the video storing system (120) via a telematics control unit (118) in the vehicle (110). In one embodiment, the telematics control unit (118) transmits a raw video captured by the raw camera (106) and the object-related information identified by the target and reference perception systems (102 and 104) to the remotely located PUB system (100) via the communications link (114), such as a Wi-Fi network, an Ethernet, a cellular data network, or the internet.
[0057] Upon receiving the object-related information identified by the reference perception system (104) from the telematics control unit (118), the PUB system (100) stores the received object-related information in the RPS database (116). In certain embodiments, similar to processing of the reference video by the reference perception system (104), the target perception system (102) deployed in the vehicle (110) processes the raw video that is captured by the raw camera (106) and is subsequently stored in the video storing system (120). Further, the target perception system (102) identifies the object-related information in the pixel coordinate system from the raw video using one or more of known or proprietary algorithms and techniques. Examples of such object-related information identified from the raw video include classes of objects identified during navigation of the vehicle (110) via the navigation path, bounding boxes generated around the identified objects, positions of the identified objects in the pixel coordinate system, and lengths and widths of the identified objects in the pixel coordinate system along with timestamp information.
[0058] Subsequently, the target perception system (102) transmits the identified object-related information to the PUB system (100) via the communications link (114). The PUB system (100) then converts the object-related information that is identified in the pixel coordinate system by the target perception system (102) to the world coordinate system using a perception electronic control unit (ECU) (122), for example, using a methodology described subsequently with reference to FIGS. 2A-B. Further, the PUB system (100) then stores the object-related information that is converted to the world coordinate system in an associated target perception system (TPS) database (124).
[0059] In certain embodiments, the PUB system (100) also receives the raw video of the surroundings of the vehicle (110), captured by the raw camera (106) and stored in the video storing system (120), from the telematics control unit (118) via the communications link (114). In one embodiment, the PUB system (100) plays the raw video received from the telematics control unit (118) using a video playback system (126) such as a media player for enabling the perception ECU (122) to identify ground truth (GT) information from the raw video.
[0060] In certain embodiments, the perception ECU (122) identifies GT information that corresponds to object-related information from the raw video using one or more techniques, which are described in detail with reference to FIGS. 2A-B. Examples of such techniques include 2-dimensional boxing technique, semantic segmentation technique, polygon segmentation technique, key points technique, and line annotation technique. Examples of the GT information identified using one or more of these techniques include classes of objects identified during navigation of the vehicle (110) via the navigation path, bounding boxes generated around the identified objects, positions of the identified objects in the pixel coordinate system, and lengths and widths of the identified objects in the pixel coordinate system.
[0061] In certain embodiments, the values of the GT information identified by the perception ECU (122) are true and accurate values. The perception ECU (122) uses the identified values of the GT information as golden references for comparison with the object-related information identified by each of the target and reference perception systems (102 and 104) and for evaluating corresponding accuracy of the target and reference perception systems (102 and 104).
[0062] In certain embodiments, the perception ECU (122) identifies the values of the GT information in the pixel coordinate system. However, the reference perception system (104) generally identifies the values of the object-related information in the world coordinate system. Hence, the values of the GT information identified in the pixel coordinate system by the perception ECU (122) cannot be directly compared with the values of the object-related information identified in the world coordinate system by the reference perception system (104) to identify an accuracy level of the reference perception system (104).
[0063] In order to address the aforementioned issues, the perception ECU (122) converts the GT information identified in the pixel coordinate system to the world coordinate system. In one embodiment, the perception ECU (122) converts the GT information from the pixel coordinate system to the world coordinate system using one or more calibration parameters stored in an associated calibration database (128), as described in further detail subsequently with reference to FIGS. 2A-B. Subsequently, the perception ECU (122) stores the GT information such as classes of the objects identified during navigation of the vehicle (110) via the navigation path, bounding boxes generated around the identified objects, positions of the identified objects in the world coordinate system, and lengths and widths of the identified objects in the world coordinate system along with timestamp information in an associated GT database (130).
[0064] In certain embodiments, the perception ECU (122) compares the GT information stored in the world coordinate system in the GT database (130) with the object-related information stored in the world coordinate system in the TPS database (124). Further, the perception ECU (122) determines an accuracy level of the target perception system (102) based on the comparison, as described subsequently with reference to FIGS. 5A-B. Similarly, the perception ECU (122) compares the GT information stored in the world coordinate system in the GT database (130) with the object-related information stored in the world coordinate system in the RPS database (116). Further, the perception ECU (122) determines an accuracy level of the reference perception system (104) based on the comparison, as described subsequently with reference to FIGS. 5A-B.
[0065] In one embodiment, the perception ECU (122) may be implemented as a machine-learning based system. When the accuracy level of the target perception system (102) is less than the accuracy level of the reference perception system (104) or less than a designated threshold, the perception ECU (122) automatically trains the target perception system (102) with the GT information that is identified by the perception ECU (122) for improving or increasing the accuracy level of the target perception system (102).
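In one non-limiting sketch, the retraining trigger described above reduces to a simple comparison; the train_fn callable stands in for whatever (unspecified) training interface the target perception system exposes, and the argument names are illustrative.

```python
def maybe_retrain_target(target_accuracy: float,
                         reference_accuracy: float,
                         designated_threshold: float,
                         ground_truth_records: list,
                         train_fn) -> bool:
    """Trigger retraining of the target perception system when its accuracy
    falls below the designated threshold and/or the reference system's
    accuracy (illustrative sketch only)."""
    if (target_accuracy < designated_threshold
            or target_accuracy < reference_accuracy):
        # Train on the ground truth identified by the perception ECU.
        train_fn(ground_truth_records)
        return True
    return False
```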
[0066] Further, in certain embodiments, the PUB system (100) displays the values of the object-related information identified by each of the target and reference perception systems (102 and 104) and the values of the GT information identified by the perception ECU (122) on a display unit (132). Examples of the display unit (132) include a human-machine interface in the vehicle (110), a heads-up display unit in the vehicle (110), and a remote display device that resides at a remote location outside the vehicle (110). Displaying these values on the display unit (132) enables a user to manually compare and identify accuracy levels of results produced by the target and reference perception systems (102 and 104).
[0067] It may be noted that the target perception system (102), reference perception system (104), and perception ECU (122) may include one or more electronic control units or may be implemented by suitable code on a processor-based system, such as a general-purpose or a special-purpose computer. Accordingly, the target perception system (102), reference perception system (104), and perception ECU (122), for example, may include one or more electronic control units, general-purpose processors, specialized processors, graphical processing units, microprocessors, programmable logic arrays, field-programmable gate arrays, integrated circuits, systems on chips, and/or other suitable computing devices. An embodiment describing an exemplary method by which the perception ECU (122) identifies the GT information from the raw video is described in greater detail with reference to FIGS. 2A-B.
[0068] FIGS. 2A-B illustrate a flow diagram depicting an exemplary method (200) for identifying GT information from a raw video that is captured by the raw camera (106) deployed in the vehicle (110). The order in which the exemplary method is described is not intended to be construed as a limitation, and any number of the described blocks may be combined in any order to implement the exemplary method disclosed herein, or an equivalent alternative method. Additionally, certain blocks may be deleted from the exemplary method or augmented by additional blocks with added functionality without departing from the claimed scope of the subject matter described herein.
[0069] At step (202), the raw camera (106) captures the raw video of the surroundings of the vehicle (110) when the vehicle (110) navigates via a navigation path and stores the captured raw video in the video storing system (120). In one embodiment, the captured raw video includes a plurality of frames each of which includes timestamp information. At step (204), the video playback system (126) receives the raw video stored in the video storing system (120) via the communications link (114), and plays the received raw video.
[0070] At step (206), the perception ECU (122) generates bounding boxes around objects in the plurality of frames of the raw video. In one embodiment, the perception ECU (122) enables a user to manually draw bounding boxes around objects in the plurality of frames of the raw video. In this embodiment, the perception ECU (122) uses one or more annotation techniques such as 2-dimensional boxing technique, semantic segmentation technique, polygon segmentation technique, key points technique, and line annotation technique for enabling the user to manually draw bounding boxes around objects in different frames of the raw video. For example, FIG. 3A illustrates a schematic diagram depicting one such exemplary bounding box (304) that is drawn by the user around an object (302) in a specific frame (300) of the raw video using one or more of the annotation techniques noted previously.
[0071] Alternatively, in certain embodiments, the perception ECU (122) automatically generates bounding boxes around objects in the plurality of frames of the raw video without receiving any user inputs. In such embodiments, the perception ECU (122) automatically generates bounding boxes around the objects using one or more techniques such as a histogram of oriented gradients (HOG) feature extraction technique, a Haar feature extraction technique, a support vector machine (SVM) technique, and a convolutional neural network (CNN).
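For illustration only, the following minimal Python sketch shows one way such automatic bounding box generation could look, using OpenCV's HOG descriptor with its bundled pedestrian SVM as a stand-in for the HOG+SVM pipeline mentioned above; the detector choice, function name, and parameters are assumptions for illustration and not the perception ECU's (122) actual implementation.

```python
import cv2

# Illustration only: OpenCV's HOG descriptor with its bundled pedestrian SVM.
# A production perception ECU would use detector weights trained for the
# object classes of interest (vehicles, lane markings, and so on).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_bounding_boxes(frame):
    """Return a list of (x, y, width, height) bounding boxes detected in a frame."""
    boxes, _weights = hog.detectMultiScale(
        frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
    return [tuple(box) for box in boxes]
```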
[0072] At step (208), the perception ECU (122) identifies classes of the objects in the plurality of frames of the raw video. To that end, the perception ECU (122) may employ the 2D boxing tool or technique, which provides options to a user to manually annotate the objects in the plurality of frames of the raw video with their corresponding classes. For example, with reference to FIG. 3A, the perception ECU (122) employs the 2D boxing tool, which provides an option to the user to manually annotate the object (302) in the frame (300) of the raw video as “Car”. Alternatively, in lieu of manually annotating classes of the objects, the perception ECU (122) automatically identifies classes of the objects in the plurality of frames of the raw video using one or more of the annotation techniques noted previously.
[0073] At step (210), the perception ECU (122) identifies positions of the objects in the plurality of frames of the raw video in a pixel coordinate system. For example, with reference to FIGS. 3A-B, the perception ECU (122) identifies a position of the object (302) in the exemplary frame (300) in the pixel coordinate system. To that end, the perception ECU (122) considers a particular point in the raw camera (106) as an origin point (306) of the pixel coordinate system. Subsequently, the perception ECU (122) identifies a horizontal pixel distance (308) along an X-axis (310) between the origin point (306) and a reference point (312) in the object (302). In addition, the perception ECU (122) identifies a vertical pixel distance (314) along a Y-axis (316) between the origin point (306) and the reference point (312) in the object (302). The perception ECU (122) then identifies the position of the object (302) in the pixel coordinate system using the identified horizontal pixel distance (308) and the identified vertical pixel distance (314) as represented herein, for example, using equation (1).
OP=(Xpos,Ypos) (1)
[0074] In equation (1), ‘OP’ corresponds to the position of the object (302) represented in the pixel coordinate system, ‘Xpos’ corresponds to the horizontal pixel distance (308) identified between the origin point (306) and the reference point (312), and ‘Ypos’ corresponds to the vertical pixel distance (314) identified between the origin point (306) and the reference point (312). It is to be understood that the perception ECU (122) similarly identifies positions of the objects in the other selected subset of frames of the raw video in the pixel coordinate system based on their corresponding horizontal and vertical pixel distances from the origin point (306) using the previously noted equation (1).
[0075] At step (212), the perception ECU (122) determines widths of the objects in the plurality of frames of the raw video in the pixel coordinate system. For example, with reference to FIG. 3A, the perception ECU (122) identifies a width of the object (302) in the frame (300) of the raw video in the pixel coordinate system. To that end, as depicted in FIG. 3A, the perception ECU (122) identifies a particular corner of the frame (300), for example, a top left corner of the frame (300) as an origin (318) of the pixel coordinate system. Further, the perception ECU (122) determines horizontal and vertical pixel coordinates with respect to the origin (318) of a top left corner (320) of the bounding box (304) of the object (302), which is represented as (X1, Y1) in FIG. 3A. In addition, the perception ECU (122) determines horizontal and vertical pixel coordinates of a top right corner (322) of the bounding box (304), which is represented as (X2, Y2) in FIG. 3A. Subsequently, the perception ECU (122) identifies the width of the object (302) using the determined horizontal pixel coordinates of both the top left corner (320) and the top right corner (322) as represented herein, for example, using equation (2).
Width=X2-X1 (2)
[0076] In certain embodiments, the perception ECU (122) similarly determines widths of the objects in the other selected subset of frames of the raw video based on differences between horizontal coordinates of top left corners of corresponding bounding boxes and horizontal coordinates of top right corners of corresponding bounding boxes using equation (2).
[0077] At step (214), the perception ECU (122) determines lengths of the objects in the plurality of frames of the raw video in the pixel coordinate system. For example, with reference to FIG. 3A, the perception ECU (122) identifies a length of the object (302) in the frame (300) of the raw video in the pixel coordinate system. To that end, with respect to the origin (318), the perception ECU (122) determines horizontal and vertical pixel coordinates of a bottom left corner (324) of the bounding box (304) of the object (302), which is represented as (X3, Y3) in FIG. 3A. Subsequently, the perception ECU (122) identifies the length of the object (302) using the vertical pixel coordinate of the bottom left corner (324) and the vertical pixel coordinate of the top left corner (320) as represented herein, for example, using equation (3).
Length=Y3-Y1 (3)
[0078] In certain embodiments, the perception ECU (122) similarly determines lengths of the objects in the other selected subset of frames of the raw video based on differences between vertical coordinates of bottom left corners of corresponding bounding boxes and vertical coordinates of top left corners of corresponding bounding boxes using equation (3).
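As a minimal sketch of equations (1)-(3), assuming axis-aligned bounding boxes with pixel corner coordinates laid out as in FIG. 3A, the helper functions below compute the position, width, and length of an object in the pixel coordinate system; the function names and the numeric coordinates in the example are illustrative and are not values taken from the figures.

```python
def object_position(x_pos, y_pos):
    """Equation (1): position OP of an object as horizontal and vertical pixel
    distances from the origin point (306) of the pixel coordinate system."""
    return (x_pos, y_pos)

def object_width(x1, x2):
    """Equation (2): width from the horizontal coordinates of the top left
    corner (X1, Y1) and the top right corner (X2, Y2) of the bounding box."""
    return x2 - x1

def object_length(y1, y3):
    """Equation (3): length from the vertical coordinates of the top left
    corner (X1, Y1) and the bottom left corner (X3, Y3) of the bounding box."""
    return y3 - y1

# Example with illustrative pixel coordinates:
position = object_position(x_pos=150, y_pos=90)
width = object_width(x1=120, x2=310)    # 190 pixels
length = object_length(y1=80, y3=220)   # 140 pixels
```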
[0079] Subsequently, at step (216), the perception ECU (122) generates GT information of the objects in the plurality of frames of the raw video, for example, based on their corresponding bounding boxes, classes, positions, lengths, and widths information identified in the pixel coordinate system by the perception ECU (122).
[0080] At step (218), the perception ECU (122) converts the GT information identified in the pixel coordinate system to the world coordinate system. For example, the perception ECU (122) converts the GT information of the object (302) that is identified in the pixel coordinate system to the world coordinate system, for example, using equation (4).
[X Y Z]=(S[U V 1] A^(-1)-t) R^(-1) (4)
[0081] In equation (4), a matrix representing [X Y Z] corresponds to the GT information of the object (302) represented in the world coordinate system, ‘S’ corresponds to a scaling factor, and ‘U’ and ‘V’ correspond to values of the GT information of the object (302) identified in the pixel coordinate system by the perception ECU (122). Further, in equation (4), ‘A’ corresponds to a matrix representing intrinsic calibration parameters of the raw camera (106). Examples of such intrinsic calibration parameters include a focal length of the raw camera (106), an optical center of the raw camera (106), and radial distortion coefficients of a lens of the raw camera (106), whose associated values are previously determined at the time of installation of the raw camera (106) in the vehicle (110) and are stored in the calibration database (128). In addition, in equation (4), ‘t’ and ‘R’ correspond to translation and rotation matrices, respectively, and represent extrinsic calibration parameters of the raw camera (106). Similar to the values of intrinsic calibration parameters determined and stored in the calibration database (128), the values of the extrinsic calibration parameters are also determined at the time of installation of the raw camera (106) in the vehicle (110) and are stored in the calibration database (128).
[0082] In one embodiment, the GT information of the object (302) that is converted to the world coordinate system includes one or more of a class of the object (302), a bounding box generated around the object (302), and values of a position, a length, and a width of the object (302) represented in the world coordinate system. Though it is not described in detail, it is to be understood that the perception ECU (122) similarly converts the GT information related to the objects in the other selected subset of frames of the raw video from the pixel coordinate system to the world coordinate system, for example, using the previously noted equation (4).
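The following minimal sketch, assuming the conventional column-vector form of the pinhole back-projection that equation (4) expresses, converts a pixel point to world coordinates using an intrinsic matrix A, a rotation matrix R, a translation vector t, and a scaling factor s; the numeric calibration values shown are illustrative stand-ins for the values stored in the calibration database (128).

```python
import numpy as np

def pixel_to_world(u, v, s, A, R, t):
    """Back-project a pixel point (u, v) to world coordinates [X, Y, Z] per
    equation (4): the scaled pixel vector is mapped through the inverse of the
    intrinsic matrix A, shifted by the translation t, and rotated back through
    the inverse of R."""
    uv1 = np.array([u, v, 1.0])
    cam = s * (np.linalg.inv(A) @ uv1)     # camera-frame coordinates
    world = np.linalg.inv(R) @ (cam - t)   # world-frame coordinates
    return world

# Illustrative calibration values (an actual system reads these from the
# calibration database (128)):
A = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 1.5, 0.0])
X, Y, Z = pixel_to_world(u=700, v=400, s=6.0, A=A, R=R, t=t)
```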
[0083] At step (220), the perception ECU (122) stores the GT information of the objects in the plurality of frames of the raw video in the world coordinate system in the GT database (130). In certain embodiments, the GT information, thus stored in the GT database (130), is true and accurate information and acts as a golden reference for comparison with object-related information identified by the target perception system (102).
[0084] FIG. 4 illustrates a flow diagram depicting an exemplary method (400) for identifying object-related information from the raw video by the target perception system (102) of FIG. 1. At step (402), the target perception system (102) retrieves the raw video captured by the raw camera (106) from the video storing system (120). At step (404), the target perception system (102) identifies object-related information related to the objects in the plurality of frames of the raw video in a pixel coordinate system. In one embodiment, the target perception system (102) identifies object-related information related to the objects in the pixel coordinate system, for example, using known or proprietary perception algorithms and/or techniques that are specific to an organization or an entity that has developed the target perception system (102).
[0085] Examples of such object-related information identified by the target perception system (102) using known or proprietary perception algorithms and/or techniques include bounding boxes generated around the objects in different selected subset of frames of the raw video, classes of the objects, positions of the objects in the pixel coordinate system, and lengths and widths of the objects in the pixel coordinate system. At step (406), the perception ECU (122) converts the object-related information identified in the pixel coordinate system to the world coordinate system, for example, using equation (4), noted previously.
[0086] In one embodiment, the object-related information, thus converted to the world coordinate system, includes bounding boxes generated around the objects in different selected subset of frames of the raw video, classes of the objects, positions of the objects, and lengths and widths of the objects in the world coordinate system. At step (408), the target perception system (102) stores the object-related information in the world coordinate system in the TPS database (124).
[0087] In certain embodiments, subsequent to storing the object-related information in the TPS database (124), the perception ECU (122) compares the GT information that is previously identified and stored in the GT database (130) with the object-related information that is stored in the TPS database (124) to identify accuracy levels of results produced by the target perception system (102). In addition, the perception ECU (122) compares the GT information stored in the GT database (130) with the object-related information that is previously identified and stored in the RPS database (116) to identify accuracy levels of results produced by the reference perception system (104), as described in detail in the following paragraphs with reference to FIGS. 5A-B.
[0088] FIGS. 5A-B illustrate a flow diagram depicting an exemplary method (500) for determining accuracy levels of results produced by each of the target perception system (102) and the reference perception system (104) using the PUB system (100) of FIG. 1. At step (502), the perception ECU (122) retrieves the GT information stored in the world coordinate system in the GT database (130). At step (504), the perception ECU (122) retrieves the object-related information stored in the world coordinate system in the TPS database (124). At step (506), the perception ECU (122) retrieves the object-related information stored in the world coordinate system in the RPS database (116).
[0089] At step (508), the perception ECU (122) determines an average precision score of each of the target perception system (102) and the reference perception system (104). In one embodiment, the perception ECU (122) determines the average precision score in order to subsequently determine an accuracy level of each of the target and reference perception systems (102 and 104). In certain embodiments, the average precision score, thus determined by the perception ECU (122), indicates the abilities of the target and/or reference perception systems (102 and 104) to accurately identify information related to objects in a selected subset of frames of a video.
[0090] To determine the average precision score of the target perception system (102), the perception ECU (122) selects a particular frame, for example, the frame (300) occurring at a particular instant of time in the raw video. Subsequently, the perception ECU (122) retrieves the GT information, which is previously identified from the particular frame (300) and is stored in the GT database (130). Further, the perception ECU (122) retrieves the object-related information, which is previously identified from the particular frame (300) by the target perception system (102) and is stored in the TPS database (124). The perception ECU (122) then compares the retrieved GT information with the retrieved object-related information to determine a precision score of the target perception system (102) using the frame (300).
[0091] The following Table 1 provides exemplary GT information and exemplary object-related information, which are previously identified from the particular frame (300) and are stored in the GT database (130) and the TPS database (124), respectively, along with exemplary object-related information that is identified by the reference perception system (104) from the corresponding frame (600) and is stored in the RPS database (116).
[0092] Table 1 – GT and object-related information

Object-related information (ORI) | GTI identified from frame (300) by PECU (122) | ORI identified from frame (300) by TPS (102) | ORI identified from corresponding frame (600) by RPS (104)
Timestamp | 10:00:00 AM | 10:00:00 AM | 10:00:00 AM
Object Class | Vehicle | Vehicle | Vehicle
Object Position | 6 meters from the vehicle (110) | 6 meters from the vehicle (110) | 7 meters from the vehicle (110)
Object Length | 4.5 meters | 4.5 meters | 4.5 meters
Object Width | 1.8 meters | 1.8 meters | 1.5 meters
Number of Objects | 2, including the object (302) and lane markings (326) | 2, including the object (302) and lane markings (326) | 1, including only the object (302); lane markings (326) are not identified
[0093] In Table 1, ‘GTI identified from frame (300) by PECU (122)’ corresponds to GT information that is previously identified by the perception ECU (122) from the frame (300) and stored in the GT database (130). ‘ORI identified from frame (300) by TPS (102)’ corresponds to object-related information that is previously identified by the target perception system (102) from the frame (300) and stored in the TPS database (124).
[0094] Further, in Table 1, ‘ORI identified from corresponding frame (600) by RPS (104)’ corresponds to object-related information that is previously identified by the reference perception system (104) from a frame (600) and is stored in the RPS database (116). In one embodiment, the frame (600) is part of a reference video that is captured by the reference camera (108). The frame (600) occurs in the reference video at a particular instant of time, which is the same as the particular instant of time at which the frame (300) occurs in the raw video.
[0095] To determine the precision score of the target perception system (102) using the frame (300), the perception ECU (122) compares the GT information that is identified from the frame (300) with the object-related information that is identified by the target perception system (102) from the frame (300). Subsequently, the perception ECU (122) determines the precision score of the target perception system (102) using the frame (300) based on the comparison, for example using equation (5).
Precision score= (True Positive)/(True Positive+False Positive) (5)
[0096] In equation (5), the term “true positive” corresponds to a number of object-related information that are accurately identified by the target perception system (102). For instance, with reference to exemplary data tabulated in Table 1, the perception ECU (122) compares the GT information identified from the frame (300) with the object-related information identified from the frame (300). Based on the comparison, the perception ECU (122) determines that all 5 object-related information that are identified from the frame (300) by the target perception system (102) such as a number of objects identified in the frame (300), a class of the object (302), a position of the object (302), a length of the object (302), and a width of the object (302) are accurate and match with the GT information that is identified from the frame (300). Accordingly, in this example, the perception ECU (122) determines a value associated with “true positive” as 5.
[0097] Further, in equation (5), the term “false positive” corresponds to a number of object-related information that are incorrectly identified by the target perception system (102). As all 5 object-related information identified from the frame (300) by the target perception system (102) are accurate, the perception ECU (122) determines a value associated with “false positive” as 0. Accordingly, the perception ECU (122) determines the precision score of the target perception system (102) using the frame (300) as 1 using equation (5) when the determined value of “true positive” is 5 and the determined value of “false positive” is 0.
[0098] In certain embodiments, the perception ECU (122) similarly determines precision scores of the target perception system (102) using various selected subset of frames of the raw video based on corresponding true positive and false positive values using equation (5). Further, the perception ECU (122) determines an average of the precision scores of the target perception system (102) determined using the selected subset of frames of the raw video as the average precision score of the target perception system (102).
[0099] In certain embodiments, the perception ECU (122) similarly determines the average precision score of the reference perception system (104). To that end, for instance, with reference to exemplary data tabulated in Table 1, the perception ECU (122) compares the exemplary GT information that is identified from the frame (300) with the object-related information that is identified by the reference perception system (104) from the frame (600). In one embodiment, the frame (300) and the corresponding frame (600) are frames that are captured simultaneously at the same instant of time by the raw camera (106) and the reference camera (108), respectively.
[00100] Further, based on the comparison, the perception ECU (122) determines that only 2 object-related information that are identified by the reference perception system (104) from the frame (600) are accurate. For example, the perception ECU (122) determines only the class and the length of the object (302) identified by the reference perception system (104) from the frame (600) to be accurate and matching with the GT information that is identified from the frame (300). Further, the perception ECU (122) determines that the remaining 3 object-related information that are identified by the reference perception system (104) from the frame (600), such as the position of the object (302), the width of the object (302), and the number of objects in the frame (600), are inaccurate and do not match the GT information that is identified from the frame (300). Accordingly, in this example, the perception ECU (122) determines values associated with “true positive” and “false positive” as 2 and 3, respectively. Further, the perception ECU (122) determines the precision score of the reference perception system (104) using the frame (600) as 0.4 using equation (5) when the determined value of “true positive” is 2 and the determined value of “false positive” is 3.
[00101] In certain embodiments, the perception ECU (122) similarly determines precision scores of the reference perception system (104) using a selected subset of frames of the reference video that is captured by the reference camera (108) using corresponding true positive and false positive values using equation (5). Further, the perception ECU (122) determines an average of the precision scores of the reference perception system (104) determined using the selected subset of frames as the average precision score of the reference perception system (104).
[00102] At step (510), the perception ECU (122) determines an average recall score of each of the target perception system (102) and the reference perception system (104). In one embodiment, the perception ECU (122) determines the average recall score in order to subsequently determine an accuracy level of each of the target and reference perception systems (102 and 104). In certain embodiments, the average recall score, thus determined by the perception ECU (122), indicates the abilities of each of the target and reference perception systems (102 and 104) to identify information related to objects in a selected subset of frames of a video without missing information related to any of those objects in the selected subset of frames. In certain embodiments, the perception ECU (122) compares the GT information that is identified from the frame (300) with the object-related information that is identified by the target perception system (102) from the frame (300). Further, the perception ECU (122) determines the recall score of the target perception system (102) using the frame (300) based on the comparison, for example, using equation (6).
Recall score= (True Positive)/(True Positive+False Negative) (6)
[00103] In equation (6), the term “true positive” corresponds to a number of object-related information that are accurately identified by the target perception system (102) and “false negative” corresponds to a number of GT information that are not at all identified by the target perception system (102). For instance, with reference to exemplary data tabulated in Table 1, the perception ECU (122) compares the GT information that is identified from the frame (300) with the object-related information that is identified by the target perception system (102) from the frame (300). Based on the comparison, the perception ECU (122) determines that all 5 object-related information that are identified by the target perception system (102) from the frame (300) are accurate, as noted previously. Further, the perception ECU (122) determines that all the GT information that is identified from the frame (300) are also identified by the target perception system (102) from the frame (300). Accordingly, in this example, the perception ECU (122) determines values associated with “true positive” and “false negative” as 5 and 0, respectively. Further, the perception ECU (122) determines the recall score of the target perception system (102) using the frame (300) as 1 using equation (6) when the determined value of “true positive” is 5 and the determined value of “false negative” is 0.
[00104] In certain embodiments, the perception ECU (122) similarly determines recall scores of the target perception system (102) using various selected subset of frames of the raw video using corresponding true positive and false negative values using equation (6). Further, the perception ECU (122) determines an average of the recall scores of the target perception system (102) determined using the selected subset of frames as the average recall score of the target perception system (102).
[00105] In certain embodiments, the perception ECU (122) also determines the average recall score of the reference perception system (104). For instance, with reference to exemplary data tabulated in Table 1, the perception ECU (122) compares the exemplary GT information that is identified from the frame (300) with the object-related information that is identified by the reference perception system (104) from the frame (600). Based on the comparison, the perception ECU (122) determines that only 2 object-related information that are identified by the reference perception system (104) from the frame (600) are accurate, as noted previously. Further, the perception ECU (122) determines that the reference perception system (104) fails to identify an object corresponding to lane markings (326) in the frame (600), which is identified by the perception ECU (122) in the frame (300). Accordingly, in this example, the perception ECU (122) determines values associated with “true positive” and “false negative” as 2 and 1, respectively. Further, the perception ECU (122) determines the recall score of the reference perception system (104) using the frame (600) as 0.67 using equation (6) when the determined value of “true positive” is 2 and the determined value of “false negative” is 1.
[00106] In certain embodiments, the perception ECU (122) similarly determines recall scores of the reference perception system (104) using various selected subset of frames of the reference video that is captured by the reference camera (108) using corresponding true positive and false negative values using equation (6). Further, the perception ECU (122) determines an average of the recall scores of the reference perception system (104) determined using the selected subset of frames of the reference video as the average recall score of the reference perception system (104).
[00107] At step (512), the perception ECU (122) determines an accuracy level of each of the target perception system (102) and the reference perception system (104). In one embodiment, the perception ECU (122) determines an accuracy level of the target perception system (102) based on the determined average precision score and the determined average recall score of the target perception system (102) using equation (7). Similarly, the perception ECU (122) determines an accuracy level of the reference perception system (104) based on the determined average precision score and the determined average recall score of the reference perception system (104) using equation (8).
Accuracy level of the TPS (102) = (APS of the TPS * ARS of the TPS) / ((APS of the TPS + ARS of the TPS) / 2) (7)
Accuracy level of the RPS (104) = (APS of the RPS * ARS of the RPS) / ((APS of the RPS + ARS of the RPS) / 2) (8)
[00108] In equations (7) and (8), the term ‘APS’ corresponds to the average precision score, ‘ARS’ corresponds to the average recall score, ‘TPS’ corresponds to the target perception system (102), and ‘RPS’ corresponds to the reference perception system (104). In one embodiment, the determined accuracy levels correspond to F1 scores of the target and reference perception systems (102 and 104). For the sake of simplicity, these F1 scores are referred to throughout various embodiments of the present disclosure as the accuracy levels of the target and reference perception systems (102 and 104), respectively.
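The following minimal sketch ties equations (5)-(8) together, using the single-frame true positive, false positive, and false negative counts discussed around Table 1 as stand-ins for averages over a selected subset of frames; the function names are illustrative assumptions.

```python
def precision(true_positive, false_positive):
    """Equation (5): fraction of identified object-related information that is accurate."""
    return true_positive / (true_positive + false_positive)

def recall(true_positive, false_negative):
    """Equation (6): fraction of GT information that the perception system did identify."""
    return true_positive / (true_positive + false_negative)

def accuracy_level(avg_precision, avg_recall):
    """Equations (7)/(8): the F1 score, i.e. the product of the average precision
    and average recall divided by their arithmetic mean."""
    return (avg_precision * avg_recall) / ((avg_precision + avg_recall) / 2)

# Single-frame illustration using the counts discussed around Table 1:
tps_precision = precision(true_positive=5, false_positive=0)   # 1.0
rps_precision = precision(true_positive=2, false_positive=3)   # 0.4
tps_recall = recall(true_positive=5, false_negative=0)         # 1.0
rps_recall = recall(true_positive=2, false_negative=1)         # ~0.67

# With averages over a selected subset of frames, the accuracy levels follow:
tps_accuracy = accuracy_level(tps_precision, tps_recall)       # 1.0
rps_accuracy = accuracy_level(rps_precision, rps_recall)       # 0.5
```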
[00109] At step (514), the perception ECU (122) determines a bounding box generation accuracy of each of the target perception system (102) and the reference perception system (104). In one embodiment, the perception ECU (122) determines the bounding box generation accuracy of each of the target and reference perception systems (102 and 104) using equations (9)-(13).
Area of intersection=(XGTBB-XPSBB+1)*(YGTBB-YPSBB+1) (9)
Area of PSBB=(PSBB Width+1)*(PSBB length+1) (10)
Area of GTBB=(GTBB Width+1)*(GTBB length+1) (11)
Area of union=Area of GTBB+Area of PSBB-Area of intersection (12)
Bounding box generation accuracy= (Area of intersection)/(Area of union) (13)
[00110] In equations (9)-(13), ‘XGTBB’ and ‘YGTBB’ correspond to a top left X-coordinate and a top left Y-coordinate, respectively, of a bounding box generated around an object by the perception ECU (122). Further, ‘XPSBB’ and ‘YPSBB’ correspond to a top left X-coordinate and a top left Y-coordinate, respectively, of a bounding box generated around an object by the target perception system (102) or the reference perception system (104). In addition, ‘GTBB’ corresponds to the bounding box generated around the object by the perception ECU (122), and ‘PSBB’ corresponds to the bounding box generated around the object by the target perception system (102) or the reference perception system (104).
[00111] For instance, FIG. 7A illustrates an exemplary view depicting a first bounding box (702) and a second bounding box (704) that are generated around an object (706) in a frame (700) of the raw video by the perception ECU (122) and the target perception system (102), respectively. In this example, the perception ECU (122) determines an area of intersection between the first bounding box (702) and the second bounding box (704) based on X-Y coordinates of a top left corner (708) of the first bounding box (702) and X-Y coordinates of a top left corner (710) of the second bounding box (704), using equation (9). Further, the perception ECU (122) determines an area of the second bounding box (704) generated by the target perception system (102) based on a width and a length of the second bounding box (704) using equation (10).
[00112] In addition, the perception ECU (122) determines an area of the first bounding box (702) based on a width and a length of the first bounding box (702) using equation (11). Furthermore, the perception ECU (122) determines an area of union between the first bounding box (702) and the second bounding box (704) based on the determined area of the first bounding box (702), area of the second bounding box (704), and area of intersection using equation (12). The perception ECU (122) then determines the bounding box generation accuracy of the target perception system (102) for the object (706) in the frame (700) based on the determined area of intersection and the determined area of union using equation (13).
[00113] In certain embodiments, the perception ECU (122) similarly determines bounding box generation accuracies of the target perception system (102) for objects in various selected subset of frames of the raw video, for example, using equations (9)-(13), noted previously. Subsequently, the perception ECU (122) determines an average of the bounding box generation accuracies determined for objects in the selected subset of frames of the raw video as the bounding box generation accuracy of the target perception system (102).
[00114] In certain embodiments, the perception ECU (122) similarly determines the bounding box generation accuracy of the reference perception system (104). For instance, FIG. 7B illustrates an exemplary view depicting the first bounding box (702) and a third bounding box (714) that are generated around the object (706) in a frame (712) of the reference video by the perception ECU (122) and the reference perception system (104), respectively. In one embodiment, the frame (700) and the frame (712) are captured at the same instant of time by the raw camera (106) and the reference camera (108), respectively while the vehicle (110) is navigating via a particular navigation path. Therefore, the frame (700) and the frame (712) include same object information but are captured by two different cameras such as the raw camera (106) and the reference camera (108), respectively.
[00115] With reference to FIG. 7B, the perception ECU (122) determines the area of intersection between the first bounding box (702) and the third bounding box (714) based on X-Y coordinates of the top left corner (708) of the first bounding box (702) and X-Y coordinates of a top left corner (716) of the third bounding box (714), using equation (9). Further, the perception ECU (122) determines an area of the third bounding box (714) generated by the reference perception system (104) based on a width and a length of the third bounding box (714) using equation (10). In addition, the perception ECU (122) determines an area of union between the first bounding box (702) and the third bounding box (714) based on the determined area of the first bounding box (702), area of the third bounding box (714), and area of intersection using equation (12). The perception ECU (122) then determines the bounding box generation accuracy of the reference perception system (104) for the object (706) in the frame (712) based on the determined area of intersection and the determined area of union using equation (13).
[00116] In certain embodiments, the perception ECU (122) similarly determines bounding box generation accuracies of the reference perception system (104) for objects in various selected subset of frames of the reference video, for example, using equations (9)-(13), noted previously. Subsequently, the perception ECU (122) determines an average of the bounding box generation accuracies determined for objects in the selected subset of frames of the reference video as the bounding box generation accuracy of the reference perception system (104).
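As a minimal sketch of the bounding box generation accuracy of step (514), the function below computes a conventional corner-based intersection-over-union ratio for two axis-aligned boxes given as (x, y, width, length) with a top-left anchor; the box representation, function name, and example coordinates are illustrative assumptions rather than a literal transcription of equations (9)-(13).

```python
def bounding_box_iou(gt_box, ps_box):
    """Intersection over union of a ground-truth box and a perception-system box,
    each given as (x_top_left, y_top_left, width, length) in pixels, following the
    conventional corner-based intersection/union formulation."""
    gx, gy, gw, gl = gt_box
    px, py, pw, pl = ps_box

    # Corners of the overlapping region.
    inter_left = max(gx, px)
    inter_top = max(gy, py)
    inter_right = min(gx + gw, px + pw)
    inter_bottom = min(gy + gl, py + pl)

    inter_width = max(0.0, inter_right - inter_left)
    inter_length = max(0.0, inter_bottom - inter_top)
    area_intersection = inter_width * inter_length

    area_union = gw * gl + pw * pl - area_intersection
    return area_intersection / area_union if area_union > 0 else 0.0

# Illustrative boxes: identical boxes give 1.0, disjoint boxes give 0.0, and
# partially overlapping boxes fall in between.
iou = bounding_box_iou((100, 50, 200, 150), (110, 60, 200, 150))
```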
[00117] At step (516), the perception ECU (122) identifies whether the accuracy level and the bounding box generation accuracy of the target perception system (102) are greater than the accuracy level and the bounding box generation accuracy of the reference perception system (104), respectively. At step (518), the perception ECU (122) automatically performs a corrective action to improve the accuracy level of results produced by the target perception system (102) when at least one of the accuracy level and the bounding box generation accuracy of the target perception system (102) is less than the accuracy level and the bounding box generation accuracy of the reference perception system (104), respectively.
[00118] An example of the corrective action performed to improve accuracy levels of results produced by the target perception system (102) includes training the target perception system (102) with GT information that is identified from the raw video by the perception ECU (122). In certain embodiments, each of the perception ECU (122) and the target perception system (102) may be implemented as a local or remote learning-based system employing one or more machine learning algorithms. Examples of such machine learning algorithms include support vector machines, cubist and lasso regression, decision tree learning, association rule learning, artificial neural networks, deep learning, inductive logic programming, clustering, Bayesian networks, reinforcement learning, representation learning, similarity and metric learning, sparse dictionary learning, and genetic algorithms.
[00119] For instance, when the bounding box generation accuracy of the target perception system (102) is less than a designated threshold or less than the bounding box generation accuracy of the reference perception system (104), the perception ECU (122) identifies that the target perception system (102), implemented as a machine-learning based system, fails to accurately generate bounding boxes around objects in different frames of the raw video. Accordingly, in this example, the perception ECU (122) automatically trains the target perception system (102) with the GT information including sizes of bounding boxes that are generated by the perception ECU (122) around the objects in different frames of the raw video for enhancing the bounding box generation accuracy of the target perception system (102).
[00120] Similarly, when the accuracy level of the target perception system (102) is less than a designated threshold or is less than the accuracy level of the reference perception system (104), the perception ECU (122) identifies that the target perception system (102), when implemented as the machine-learning based system, fails to accurately identify one or more of classes, positions, lengths, and widths of objects that exist in different frames of the raw video. Accordingly, in this example, the perception ECU (122) automatically trains the target perception system (102) with the GT information including classes, positions, lengths, and/or widths of objects identified by the perception ECU (122) from different frames of the raw video for enhancing the accuracy level of the target perception system (102).
[00121] In addition to automatically determining the accuracy levels and the bounding box generation accuracies of the target and reference perception systems (102 and 104), the PUB system (100) also displays values of the object-related information that are identified by the target and reference perception systems (102 and 104) and perception ECU (122) on the display unit (132). Displaying ORI values on the display unit (132) enables a user to compare results output by each of the target and reference perception systems (102 and 104) with results output by the perception ECU (122), and manually identify a particular perception system that outputs more accurate results.
[00122] In one embodiment, prior to displaying the values of the object-related information on the display unit (132), the PUB system (100) converts the object-related information previously identified in the world coordinate system by the reference perception system (104) to the pixel coordinate system. In certain embodiments, the PUB system (100) performs this conversion based on intrinsic and extrinsic calibration parameters of the reference camera (108) using equation (14).
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} (14)
[00123] In equation (14), ‘u’ and ‘v’ correspond to the pixel coordinates of the object-related information that is identified in the world coordinate system by the reference perception system (104) and is subsequently converted to the pixel coordinate system, ‘fx’ and ‘fy’ correspond to focal lengths of the reference camera (108), and ‘cx’ and ‘cy’ represent an optical center of the reference camera (108). Further, in equation (14), ‘r’ and ‘t’ correspond to the rotation and translation matrices, respectively, representing extrinsic calibration parameters of the reference camera (108). In addition, in equation (14), ‘X’, ‘Y’, and ‘Z’ correspond to values of the object-related information that are identified in the world coordinate system by the reference perception system (104), and ‘s’ corresponds to a scaling factor.
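The following minimal sketch, assuming the conventional pinhole projection that equation (14) expresses, maps a world point to pixel coordinates using an intrinsic matrix K and an extrinsic matrix [R | t]; the numeric values are illustrative stand-ins for the calibration parameters of the reference camera (108). OpenCV's cv2.projectPoints offers the same projection with lens distortion handled as well.

```python
import numpy as np

def world_to_pixel(X, Y, Z, K, Rt):
    """Project a world point [X, Y, Z] to pixel coordinates (u, v) per equation (14):
    K is the 3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] and Rt is the
    3x4 extrinsic matrix [R | t] of the camera."""
    xyz1 = np.array([X, Y, Z, 1.0])
    s_uv1 = K @ Rt @ xyz1          # scaled homogeneous pixel vector
    u, v = s_uv1[:2] / s_uv1[2]    # divide out the scaling factor s
    return u, v

# Illustrative parameters standing in for the reference camera (108) calibration:
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [1.5], [0.0]])])
u, v = world_to_pixel(X=0.45, Y=-1.2, Z=6.0, K=K, Rt=Rt)
```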
[00124] After converting the object-related information identified by the reference perception system (104) to the pixel coordinate system, the PUB system (100) displays the values of the object-related information identified by all of the target perception system (102), reference perception system (104), and perception ECU (122) in the pixel coordinate system on the display unit (132). As noted previously, displaying these values on the display unit (132) enables the user to manually compare the displayed values amongst each other and identify a particular perception system that outputs accurate results.
[00125] Conventional systems cannot directly compare an accuracy level of the reference perception system (104) that already exists in the market with an accuracy level of the target perception system (102) that is newly developed and is yet to be deployed in real-world vehicles. This is because the reference perception system (104) available in the market does not generally store a source video from which the reference perception system (104) has identified the object-related information. Hence, the conventional systems cannot retrieve the source video from the reference perception system (104) and provide the source video as an input to the target perception system (102) for comparing accuracy levels of the target and reference perception systems (102 and 104).
[00126] However, the PUB system (100) described in the present disclosure compares the accuracy level of even such an off-the-shelf reference perception system (104), which does not include any provision to store the source video, with the accuracy level of a newly developed target perception system (102). Further, when the accuracy level of the target perception system (102) is less than the accuracy level of the reference perception system (104), the PUB system (100) automatically improves the accuracy level of the target perception system (102) by training the target perception system (102) with the GT information identified by the perception ECU (122). When such a target perception system (102) whose accuracy level is improved is deployed in real-world vehicles, the target perception system (102) identifies information related to objects in the surroundings of a host vehicle more accurately, which, in turn, enables the host vehicle to navigate safely without colliding with the objects in the surroundings of the host vehicle. Further, the target perception system (102) with the improved accuracy level would accurately identify information related to static and dynamic objects, lanes, traffic signs, and traffic lights in the surroundings of the host vehicle, which ensures safe navigation of the host vehicle, and safety and health of the driver, passengers, and surrounding infrastructure.
[00127] Although specific features of various embodiments of the present systems and methods may be shown in and/or described with respect to some drawings and not in others, this is for convenience only. It is to be understood that the described features, structures, and/or characteristics may be combined and/or used interchangeably in any suitable manner in the various embodiments shown in the different figures.
[00128] While only certain features of the present systems and methods have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes.
LIST OF NUMERAL REFERENCES:
100 Perception unit benchmarking system
102 Target perception system (TPS)
104 Reference perception system (RPS)
106 Raw image sensor
108 Reference image sensor
110 Vehicle
112 Horizontal field of view
114 Communications link
116 RPS database
118 Telematics control unit
120 Video storing system
122 Perception electronic control unit
124 TPS database
126 Video playback system
128 Calibration database
130 Ground truth (GT) database
132 Display unit
200-220 Steps of a method for identifying GT information
300, 700 Frames of a raw video
302 Object in a frame
304, 702, 704, 714 Bounding boxes
306 Origin point of the pixel coordinate system in the raw camera
308 Horizontal pixel distance
310 X-axis of pixel coordinate system
312 Reference point in an object
314 Vertical pixel distance
316 Y-axis of pixel coordinate system
318 Origin of pixel coordinate system
320, 322 Top left and top right corners of a bounding box
324 Bottom left corner of bounding box
326 Lane markings
400-408 Steps of a method for identifying object-related information from the raw video by TPS (102)
500-518 Steps of a method for determining accuracy levels of results produced by TPS (102) and RPS (104)
600 Frame of a reference video
706 Object in the frame of a raw video
708, 710, 716 Top left corners of bounding boxes
712 Frame of a reference video
Claims:
We claim:
1. A method for improving an accuracy level of a target perception system (102), comprising:
capturing a raw video and a reference video of the surroundings of a selected space (110) simultaneously using a raw image sensor (106) and a reference image sensor (108), respectively, that are deployed in the selected space (110), wherein the raw image sensor (106) is vertically displaced from the reference image sensor (108) in the selected space (110) such that both the reference image sensor (108) and the raw image sensor (106) comprise the same horizontal field of view;
identifying object-related information related to a plurality of objects in a plurality of frames of the reference video in a first coordinate system by processing the reference video using a reference perception system (104);
identifying object-related information related to the plurality of objects in the plurality of frames of the raw video in a second coordinate system by processing the raw video using the target perception system (102), wherein the first coordinate system is different from the second coordinate system;
identifying ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system by processing the raw video using a perception electronic control unit (122) in the selected space (110);
converting the ground truth information and the object-related information that is identified by the target perception system (102) from the second coordinate system to the first coordinate system by the perception electronic control unit (122);
determining an accuracy level of results output by the reference perception system (104) by comparing the object-related information that is identified in the first coordinate system by the reference perception system (104) with the ground truth information that is identified by the perception electronic control unit (122) and is subsequently converted to the first coordinate system;
determining an accuracy level of results output by the target perception system (102) by comparing the object-related information that is identified by the target perception system (102) and is subsequently converted to the first coordinate system with the ground truth information that is identified by the perception electronic control unit (122) and is subsequently converted to the first coordinate system; and
training the target perception system (102) with the ground truth information identified in the second coordinate system by the perception electronic control unit (122) when the determined accuracy level of the target perception system (102) is less than a designated threshold, when the determined accuracy level of the target perception system (102) is less than the determined accuracy level of the reference perception system (104), or a combination thereof, for automatically improving the accuracy level of the target perception system (102).
2. The method as claimed in claim 1, wherein the first coordinate system corresponds to a world coordinate system and the second coordinate system corresponds to a pixel coordinate system.
3. The method as claimed in claim 2, wherein identifying ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system comprises:
generating bounding boxes around the plurality of objects in the plurality of frames of the raw video by processing user inputs comprising the bounding boxes that are drawn around the plurality of objects by a user;
identifying classes of the plurality of objects in the plurality of frames of the raw video;
identifying a position of each of the plurality of objects in the plurality of frames of the raw video based on a horizontal pixel distance along an X-axis (310) between an origin point (306) of the raw image sensor (106) and a reference point (312) in a corresponding object, and a vertical pixel distance along a Y-axis (316) between the origin point (306) and the reference point (312);
identifying a width of each of the plurality of objects in the plurality of frames of the raw video based on pixel coordinates of a top left corner (320) and a top right corner (322) of a bounding box (304) enclosing a corresponding object selected from the plurality of objects in the plurality of frames of the raw video;
identifying a length of each of the plurality of objects in the plurality of frames of the raw video based on pixel coordinates of the top left corner (320) and a bottom left corner (324) of the bounding box (304) enclosing a corresponding object selected from the plurality of objects in the plurality of frames of the raw video; and
generating the ground truth information related to the plurality of objects in the plurality of frames of the raw video in the pixel coordinate system, wherein the generated ground truth information comprises sizes of the generated bounding boxes, the identified classes, the position of each of the plurality of objects, the width of each of the plurality of objects, and the length of each of the plurality of objects.
4. The method as claimed in claim 3, wherein converting the ground truth information from the second coordinate system to the first coordinate system comprises:
converting the generated ground truth information comprising the position, the width, and the length of each of the plurality of objects in the plurality of frames of the raw video from the pixel coordinate system to the world coordinate system based on one or more intrinsic calibration parameters and one or more extrinsic calibration parameters of the raw image sensor (106), wherein the one or more intrinsic calibration parameters comprise one or more of a focal length, an optical center, and radial distortion coefficients of a lens of the raw image sensor (106), and wherein the one or more extrinsic calibration parameters comprise one or more of a translation matrix and a rotation matrix that are used to convert the ground truth information from the pixel coordinate system to the world coordinate system, and
storing the ground truth information related to the plurality of objects in the plurality of frames of the raw video in the world coordinate system in a ground truth database (130).
5. The method as claimed in claim 1, wherein the reference perception system (104) corresponds to an off-the-shelf perception system that is available in the market and is already tested and deployed in the selected space (110) comprising real-world vehicles, wherein the target perception system (102) corresponds to a perception system that is developed subsequent to the reference perception system (104) and is yet to be deployed in the selected space (110) comprising real-world vehicles, and
wherein the object-related information identified by each of the reference perception system (104) and the target perception system (102) comprises one or more of bounding boxes generated around the plurality of objects, classes, positions, widths, and lengths of the plurality of objects.
6. The method as claimed in claim 1, wherein determining the accuracy level of results output by the reference perception system (104) comprises:
comparing the object-related information that is identified by the reference perception system (104) from each frame selected from the plurality of frames of the reference video with the ground truth information that is identified by the perception electronic control unit (122) from each corresponding frame selected from the plurality of frames of the raw video;
determining a precision score of the reference perception system (104) using each of the plurality of frames of the reference video based on a number of object-related information that are accurately identified and a number of object-related information that are inaccurately identified by the reference perception system (104) from a corresponding frame selected from the plurality of frames of the reference video;
determining an average precision score of the reference perception system (104) based on the corresponding precision score of the reference perception system (104) determined using each of the plurality of frames of the reference video;
determining a recall score of the reference perception system (104) using each of the plurality of frames of the reference video based on a number of object-related information that are accurately identified and a number of object-related information that are unidentified by the reference perception system (104) from a corresponding frame selected from the plurality of frames of the reference video;
determining an average recall score of the reference perception system (104) based on the corresponding recall score of the reference perception system (104) determined using each of the plurality of frames of the reference video; and
determining the accuracy level of the reference perception system (104) based on the determined average precision score and the determined average recall score of the reference perception system (104).
7. The method as claimed in claim 1, wherein determining the accuracy level of results output by the target perception system (102) comprises:
comparing the object-related information and the ground truth information that are identified by the target perception system (102) and the perception electronic control unit (122), respectively from each frame selected from the plurality of frames of the raw video with each other;
determining a precision score of the target perception system (102) using each of the plurality of frames of the raw video based on a number of object-related information that are accurately identified and a number of object-related information that are inaccurately identified by the target perception system (102) from a corresponding frame selected from the plurality of frames of the raw video;
determining an average precision score of the target perception system (102) based on the corresponding precision score of the target perception system (102) determined using each of the plurality of frames of the raw video;
determining a recall score of the target perception system (102) using each of the plurality of frames of the raw video based on a number of object-related information that are accurately identified and a number of object-related information that are unidentified by the target perception system (102) from a corresponding frame selected from the plurality of frames of the raw video;
determining an average recall score of the target perception system (102) based on the corresponding recall score of the target perception system (102) determined using each of the plurality of frames of the raw video; and
determining the accuracy level of the target perception system (102) based on the determined average precision score and the determined average recall score of the target perception system (102).
8. The method as claimed in claim 1, wherein determining the accuracy level of results output by the reference perception system (104) comprises:
determining an area of intersection between a first bounding box and a second bounding box, wherein the first bounding box is generated by the reference perception system (104) around an object in a frame of the reference video, wherein the second bounding box is generated by the perception electronic control unit (122) around the same object in the corresponding frame of the raw video, wherein the area of intersection is determined based on X-Y coordinates of top left corners of the first bounding box and the second bounding box;
determining an area of union between the first bounding box and the second bounding box based on an area of the first bounding box, an area of the second bounding box, and the determined area of intersection between the first bounding box and the second bounding box; and
determining a bounding box generation accuracy of the reference perception system (104) based on the determined area of intersection and the determined area of union between the first bounding box and the second bounding box.
9. The method as claimed in claim 1, wherein determining the accuracy level of results output by the target perception system (102) comprises:
determining an area of intersection between a first bounding box and a second bounding box, wherein the first bounding box is generated by the target perception system (102) around an object in a frame of the raw video, wherein the second bounding box is generated by the perception electronic control unit (122) around the same object in the same frame of the raw video, wherein the area of intersection is determined based on X-Y coordinates of top left corners of the first bounding box and the second bounding box;
determining an area of union between the first bounding box and the second bounding box based on an area of the first bounding box, an area of the second bounding box, and the determined area of intersection between the first bounding box and the second bounding box; and
determining a bounding box generation accuracy of the target perception system (102) based on the determined area of intersection and the determined area of union between the first bounding box and the second bounding box.
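Claims 8 and 9 recite the same intersection-over-union measure, applied to the reference and target perception systems respectively. The sketch below is an illustrative Python implementation under the assumption that, in addition to the top-left corner coordinates named in the claims, the widths and heights of both bounding boxes are available, since the overlap cannot be evaluated from the corner coordinates alone.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box: top-left corner (x, y) plus assumed width and height."""
    x: float
    y: float
    w: float
    h: float


def intersection_area(a: Box, b: Box) -> float:
    # Overlap of the two rectangles along each image axis; zero if they do not overlap.
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0.0 and dy > 0.0 else 0.0


def bounding_box_accuracy(detected: Box, ground_truth: Box) -> float:
    """Intersection over union between a perception-system box and the ground-truth box."""
    inter = intersection_area(detected, ground_truth)
    union = detected.w * detected.h + ground_truth.w * ground_truth.h - inter
    return inter / union if union else 0.0
```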
10. The method as claimed in claim 1, wherein identifying object-related information related to the plurality of objects in the reference video in the first coordinate system comprises:
converting the object-related information that is identified by the reference perception system (104) from the first coordinate system to the second coordinate system based on one or more intrinsic calibration parameters and one or more extrinsic calibration parameters of the reference image sensor (108), wherein the one or more intrinsic calibration parameters comprise one or more of focal lengths and an optical center of the reference image sensor (108), and wherein the one or more extrinsic calibration parameters comprise one or more of a translation matrix and a rotation matrix that are used to convert the object-related information from the first coordinate system to the second coordinate system, and
displaying values of the object-related information identified by each of the target perception system (102), the reference perception system (104), and the perception electronic control unit (122) in the second coordinate system on a display unit (132) for enabling a user to manually compare the accuracy level of the target perception system (102) with the accuracy level of the reference perception system (104).
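As an illustrative reading of claim 10, the sketch below maps a point from the first coordinate system into the second by applying the recited extrinsic parameters (a rotation and a translation) followed by a pinhole projection using the recited intrinsic parameters (focal lengths and optical center). The variable names and the exact ordering of steps are assumptions, since the claim does not fix the precise form of the transformation.

```python
import numpy as np


def convert_point(p_first: np.ndarray,
                  rotation: np.ndarray,
                  translation: np.ndarray,
                  fx: float, fy: float,
                  cx: float, cy: float) -> np.ndarray:
    """Map a 3-D point from the first coordinate system into the second.

    rotation (3x3) and translation (3,) play the role of the extrinsic
    calibration parameters; fx, fy (focal lengths) and (cx, cy) (optical
    center) play the role of the intrinsic calibration parameters.
    All names are illustrative only.
    """
    # Extrinsic step: rotate and translate into the second coordinate system.
    p_second = rotation @ p_first + translation
    # Intrinsic step: pinhole projection onto the image plane (assumes positive depth).
    u = fx * p_second[0] / p_second[2] + cx
    v = fy * p_second[1] / p_second[2] + cy
    return np.array([u, v])
```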
11. A perception unit benchmarking system (100), comprising:
a perception electronic control unit (122) that is operatively coupled to a target perception system (102) comprising a raw image sensor (106) and a reference perception system (104) comprising a reference image sensor (108), wherein the raw image sensor (106) is vertically displaced from the reference image sensor (108) in a selected space (110) such that both the reference image sensor (108) and the raw image sensor (106) have the same horizontal field of view, wherein the perception electronic control unit (122):
receives object-related information in a first coordinate system that is identified by the reference perception system (104) from a reference video of the surroundings of the selected space (110), wherein the received object-related information comprises information related to a plurality of objects in a plurality of frames of the reference video;
receives object-related information in a second coordinate system that is identified by the target perception system (102) from a raw video of the surroundings of the selected space (110), wherein the received object-related information comprises information related to the plurality of objects in the plurality of frames of the raw video, wherein the raw video and the reference video are captured simultaneously using the raw image sensor (106) and the reference image sensor (108), respectively deployed in the selected space (110);
identifies ground truth information related to the plurality of objects in the plurality of frames of the raw video in the second coordinate system by processing the raw video;
converts the ground truth information and the object-related information that is identified by the target perception system (102) from the second coordinate system to the first coordinate system;
determines an accuracy level of results output by the reference perception system (104) by comparing the object-related information that is identified in the first coordinate system by the reference perception system (104) with the ground truth information that is identified by the perception electronic control unit (122) and is subsequently converted to the first coordinate system;
determines an accuracy level of results output by the target perception system (102) by comparing the object-related information that is identified by the target perception system (102) and is subsequently converted to the first coordinate system with the ground truth information that is identified by the perception electronic control unit (122) and is subsequently converted to the first coordinate system; and
trains the target perception system (102) with the ground truth information that is identified in the second coordinate system by the perception electronic control unit (122) when the determined accuracy level of the target perception system (102) is less than a designated threshold, when the determined accuracy level of the target perception system (102) is less than the determined accuracy level of the reference perception system (104), or a combination thereof, for automatically improving the accuracy level of the target perception system (102).
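The retraining condition recited above reduces to a simple disjunction; the following sketch expresses it with hypothetical arguments for the two accuracy levels and the designated threshold.

```python
def should_retrain(target_accuracy: float,
                   reference_accuracy: float,
                   designated_threshold: float) -> bool:
    """Retrain the target perception system when it underperforms either
    the designated threshold or the reference perception system (or both)."""
    return (target_accuracy < designated_threshold
            or target_accuracy < reference_accuracy)
```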
12. The perception unit benchmarking system (100) as claimed in claim 11, wherein each of the raw image sensor (106) and the reference image sensor (108) comprises one or more of a camera, a light detection and ranging sensor, a radio detection and ranging sensor, an ultrasonic imaging system, and a thermographic camera.
13. The perception unit benchmarking system (100) as claimed in claim 11, wherein the perception electronic control unit (122) is implemented as a learning system that resides in one or more of a local storage system, a remote storage system, and a cloud storage system communicatively coupled to the selected space (110), wherein the perception electronic control unit (122):
automatically trains the target perception system (102) with the ground truth information comprising one or more of sizes of bounding boxes generated around the plurality of objects, and classes, positions, widths, and lengths of the plurality of objects in the plurality of frames of the raw video when the determined accuracy level of the target perception system (102) is less than one of the designated threshold and the determined accuracy level of the reference perception system (104).
14. The perception unit benchmarking system (100) as claimed in claim 11, wherein the selected space (110) comprises one or more of a vehicle, a retail space, an industrial area, a factory, a military area, a manufacturing facility, a storage facility, a warehouse, a home premises, a public place, a hospital area, and a supermarket.
15. The perception unit benchmarking system (100) as claimed in claim 11, wherein the perception unit benchmarking system (100) comprises one or more of an obstacle detection system, a lane-keep assist system, a traffic-sign detection system, a traffic-light detection system, a surveillance system, an industrial operation monitoring system, a medical system, and an electronic device.