Abstract: The present invention provides a system and method offering an end-to-end solution to generate scenarios from reference monocular RGB videos, eliminating the need for expensive sensor setups in vehicles for data acquisition. The present invention automates the entire scenario generation process and requires human intervention only for optimization and fine-tuning.
Description:
FORM-2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
Title: A SYSTEM AND METHOD FOR SCENARIO GENERATION FOR AUTONOMOUS VEHICLES
APPLICANT DETAILS:
(a) NAME: SIMDAAS AUTONOMY PRIVATE LIMITED
(b) NATIONALITY: Indian
(c) ADDRESS: H No 528 Nankari Bagia Pradhan Gate IIT Kanpur, IIT,
Kalyanpur, Kanpur Nagar, Uttar Pradesh, India
PREAMBLE TO THE DESCRIPTION:
The following specification particularly describes the nature of this invention and the manner in which it is to be performed.
A SYSTEM AND METHOD FOR SCENARIO GENERATION FOR AUTONOMOUS VEHICLES
FIELD OF INVENTION:
The present invention relates to a system and method that provides an end-to-end solution to generate scenarios from reference monocular RGB videos, including wild videos without camera pose information, thereby eliminating the need for expensive sensor setups in vehicles for data acquisition.
BACKGROUND OF THE INVENTION:
The following background discussion includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication expressly or implicitly referenced is prior art.
Currently, two processes are adopted to obtain diverse scenarios for AV and ADAS testing. The first process is to manually collect real-world data using a variety of sensors mounted on a vehicle that is driven for long periods of time. The collected data may also require curation and post-processing to ensure that the acquired data is of the desired quality. Although this is a direct approach, it is expensive, laborious and time-consuming.
The second process requires significant time and effort from humans. For example, to generate a scenario in a simulation, a script writer must create a scenario script from reference data and run it in the simulation. The script's variables, such as vehicle velocities, positions, accelerations and orientations, must be manually adjusted through a trial-and-error process until the desired scenario plays out as intended. Additionally, the road and lane layouts must be hardcoded into the script. A 3D modelling artist must ensure that the entities within the scenario, including the environment and vehicles, are designed accurately to match the reference data.
The present invention provides a system and a method offering an end-to-end pipeline to generate scenarios from reference monocular RGB videos and from wild videos without camera pose information, thus eliminating the need for expensive sensor setups in vehicles for data acquisition.
OBJECTIVES OF THE INVENTION:
The primary object of the present invention is to overcome the problem stated in the prior art.
Another object of the present invention is to provide a system and a method offering an end-to-end solution to generate scenarios from reference monocular RGB videos and from wild videos without camera pose information, thus eliminating the need for expensive sensor setups in vehicles for data acquisition.
SUMMARY OF THE INVENTION:
The present invention provides a system for scenario generation for autonomous vehicles comprising:
a) a user interface where a user uploads a monocular RGB video;
b) a perception stack unit comprising a plurality of components, where each component is configured to extract a specific feature, such as depth or camera pose, from the input monocular RGB video, whereafter the components run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) a trajectory reconstruction unit that receives data from the perception stack unit and is configured to reconstruct trajectories from a beginning to an end of the scenario;
d) a code generator unit that uses the trajectory data from the trajectory reconstruction unit to construct OD and OS scripts, ensuring that the required standards and syntaxes are followed; and
e) a simulation refinement unit that receives the OD and OS scripts from the code generator unit and runs a simulation, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.
In an embodiment, supplementary data, such as IMU sensor data and depth camera data, if available to the user, and its metadata, such as camera intrinsics and focal length, are provided through the user interface.
In an embodiment, the components share an internal representation but each predicts only a single feature type from the monocular RGB video.
In an embodiment, the components are configured to use the supplementary data to predict a more accurate output.
In an embodiment, during the simulation, one or more sensors, such as cameras, depth cameras and IMU sensors, are attached to the ego vehicle, capturing the scenario from their perspectives and generating synthetic data. Further, the sensors that record data in the simulation can be of any type, physical or virtual, and can be programmed in the simulation as per the need.
In an embodiment, the corrected OS and OD files are then replayed in the simulation, and this process is repeated until the difference falls below a specified threshold.
In an embodiment, the output from each of the components is stored as an intermediate output in a staging database and may be stored in different formats.
In an embodiment, the intermediate outputs from the staging database are stored in a logs database, where the logs database is configured to provide options for debugging, evaluation and fine-tuning of the scenario.
The present invention provides a method for scenario generation for autonomous vehicles comprising the steps of:
a) providing a monocular RGB video through a user interface by a user;
b) extracting a specific feature, such as depth or camera pose, from the input monocular RGB video by a perception stack unit, whereafter components of the perception stack unit run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) receiving scenario data from the perception stack unit by a trajectory reconstruction unit, where the trajectory reconstruction unit reconstructs trajectories from a beginning to an end of the scenario from the scenario data;
d) constructing OD and OS scripts by a code generator unit based on the trajectory data it receives from the trajectory reconstruction unit; and
e) running a simulation by a simulation refinement unit based on the OD and OS scripts it receives from the code generator unit, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.
DETAILED DESCRIPTION OF DRAWINGS:
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of their scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings in which:
Fig. 1: illustrates the system and method of the present invention.
DETAILED DESCRIPTION:
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
The terms “comprises”, “comprising”, “includes”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “comprises... a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
The present invention discloses a system and a method that provides an end-to-end solution to generate scenarios from reference monocular RGB videos.
In an embodiment, the present invention provides a system for scenario generation for autonomous vehicles comprising:
a) a user interface where a user uploads a monocular RGB video;
b) a perception stack unit comprising a plurality of components, where each component is configured to extract a specific feature, such as depth or camera pose, from the input monocular RGB video, whereafter the components run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) a trajectory reconstruction unit that receives data from the perception stack unit and is configured to reconstruct trajectories from a beginning to an end of the scenario;
d) a code generator unit that uses the trajectory data from the trajectory reconstruction unit to construct OD and OS scripts, ensuring that the required standards and syntaxes are followed; and
e) a simulation refinement unit that receives the OD and OS scripts from the code generator unit and runs a simulation, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.
In an embodiment, supplementary data, such as IMU sensor data and depth camera data, if available to the user, and its metadata, such as camera intrinsics and focal length, are provided through the user interface.
In an embodiment, the components share an internal representation but each predicts only a single feature type from the monocular RGB video.
In an embodiment, the components are configured to use the supplementary data to predict a more accurate output.
In an embodiment, during the simulation, one or more sensors, such as cameras, depth cameras and IMU sensors, are attached to the ego vehicle, capturing the scenario from their perspectives and generating synthetic data. Further, the sensors that record data in the simulation can be of any type, physical or virtual, and can be programmed in the simulation as per the need.
In an embodiment, the corrected OS and OD files are then replayed in the simulation, and this process is repeated until the difference falls below a specified threshold.
In an embodiment, the output from each of the components is stored as an intermediate output in a staging database and may be stored in different formats.
In an embodiment, the intermediate outputs from the staging database are stored in a logs database, where the logs database is configured to provide options for debugging, evaluation and fine-tuning of the scenario.
In an embodiment, the present invention provides a method for scenario generation for autonomous vehicles comprising the steps of:
a) providing a monocular RGB video through a user interface by a user;
b) extracting a specific feature, such as depth or camera pose, from the input monocular RGB video by a perception stack unit, whereafter components of the perception stack unit run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) receiving scenario data from the perception stack unit by a trajectory reconstruction unit, where the trajectory reconstruction unit reconstructs trajectories from a beginning to an end of the scenario from the scenario data;
d) constructing OD and OS scripts by a code generator unit based on the trajectory data it receives from the trajectory reconstruction unit; and
e) running a simulation by a simulation refinement unit based on the OD and OS scripts it receives from the code generator unit, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.
As shown in Figure 1, the present invention provides a system and method which utilizes a deep learning-based perception stack to extract scenario-specific information from a monocular RGB video and generate scenario files that can be played in a simulation. The process flow and its sequence are shown in Figure 1, with the numbering indicating each step.
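By way of a non-limiting illustration only, the following Python sketch outlines how the process flow of Figure 1 could be orchestrated end to end. Every callable passed to the function (perceive, reconstruct, generate_scripts, simulate, error_fn and correct_scripts) is a hypothetical placeholder standing in for the corresponding unit of the system; the sketch is not a definitive implementation of the pipeline.

# Minimal orchestration sketch of the pipeline; all callables are hypothetical
# placeholders supplied by the caller for the units described in this document.
def generate_scenario(video_path, perceive, reconstruct, generate_scripts,
                      simulate, error_fn, correct_scripts,
                      supplementary=None, threshold=0.05, max_iters=10):
    # Perception stack: per-frame features (depth, pose, tracks, lanes, ...).
    reference = perceive(video_path, supplementary)

    # Trajectory reconstruction: complete, temporally consistent trajectories.
    trajectories = reconstruct(reference)

    # Code generator: OD and OS scripts (presumably OpenDRIVE / OpenSCENARIO).
    od_script, os_script = generate_scripts(trajectories, reference)

    # Simulation and refinement loop: replay, re-perceive, compare, correct.
    for _ in range(max_iters):
        synthetic = simulate(od_script, os_script)      # sensor data from the ego vehicle
        error = error_fn(reference, perceive(synthetic, None))
        if error < threshold:
            break
        od_script, os_script = correct_scripts(od_script, os_script, error)

    return od_script, os_script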
In an embodiment, the user primarily uploads an RGB video, supplementary data (such as IMU sensor data and depth camera data, if available) and its metadata (for example, camera intrinsics and focal length) through an interface or an API call. Providing supplementary data is optional; in some cases the metadata may also be optional, since the video may have been taken with a camera of any configuration or type. Further, a unique aspect of the pipeline is its ability to function without such data, which has not been done before. It may be noted that the largest repositories of vehicle-bound RGB videos come without supplementary data.
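A minimal client-side sketch of such an upload is given below, assuming a hypothetical REST endpoint behind the user interface; the base URL, field names and the "scenario_id" response field are placeholders rather than a documented API.

# Illustrative upload of a reference video with optional supplementary data and
# metadata; endpoint and field names are assumptions for this sketch only.
import json
import requests

def upload_reference_video(video_path, metadata=None, imu_path=None,
                           base_url="https://example.com/api/v1"):
    files = {"video": open(video_path, "rb")}
    if imu_path:                                    # supplementary data is optional
        files["imu"] = open(imu_path, "rb")
    data = {}
    if metadata:                                    # camera intrinsics / focal length, if known,
        data["metadata"] = json.dumps(metadata)     # e.g. {"fx": 1380.0, "fy": 1380.0, "cx": 960, "cy": 540}
    response = requests.post(f"{base_url}/scenarios", files=files, data=data, timeout=120)
    response.raise_for_status()
    return response.json()["scenario_id"]           # hypothetical response field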
In an embodiment, the perception stack unit comprises a set of components, each of which is designed to extract a specific feature (such as depth or camera pose) from an input RGB video. The components then run perception tasks such as semantic segmentation, localization, object tracking and lane detection. Each of the components may share internal representations but predicts only a single feature type from the video. Additionally, the components are designed to use supplementary information, when available, to predict more accurate outputs.
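One possible way to structure such components is sketched below. The class names, the stand-in shared representation and the placeholder depth logic are assumptions for illustration; the actual components are deep learning models.

# Illustrative structure for perception components that share an internal
# representation but each predict a single feature type.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

import numpy as np

class PerceptionComponent(ABC):
    """One component predicts exactly one feature type (depth, pose, tracks, lanes, ...)."""
    feature_name: str = "feature"

    @abstractmethod
    def predict(self, shared_repr: np.ndarray,
                supplementary: Optional[Dict[str, Any]] = None) -> Any:
        """Predict this component's feature; supplementary data (IMU, depth camera,
        intrinsics), when available, may be used to refine the prediction."""

class DepthEstimator(PerceptionComponent):
    feature_name = "depth"

    def predict(self, shared_repr, supplementary=None):
        # Placeholder: a real model would regress metric depth per pixel.
        if supplementary and "depth_camera" in supplementary:
            return supplementary["depth_camera"]         # prefer measured depth if provided
        return np.zeros(shared_repr.shape[:2], dtype=np.float32)

def run_perception(frames: List[np.ndarray],
                   components: List[PerceptionComponent],
                   supplementary: Optional[Dict[str, Any]] = None) -> Dict[str, list]:
    outputs: Dict[str, list] = {c.feature_name: [] for c in components}
    for frame in frames:
        shared_repr = frame.astype(np.float32) / 255.0   # stand-in for a learned embedding
        for c in components:
            outputs[c.feature_name].append(c.predict(shared_repr, supplementary))
    return outputs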
In an embodiment, the final outputs from each model/component are stored as intermediate outputs in the staging database and may be stored in different formats. For instance, the metric depth estimation model generates per-frame depth maps with pixel values ranging from 0 to 200 meters, representing real-world distances from the camera to the objects corresponding to the pixels. These maps are stored as 16-bit image files in the PNG format. In contrast, the object tracking model produces a text file containing object IDs and image position coordinates for each frame.
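As a sketch of these storage formats, the snippet below writes a depth map to a 16-bit PNG and a set of tracks to a plain text file. The mapping of the 0-200 m range onto the 16-bit integer range and the whitespace-separated track columns are assumptions, since the exact encoding is not fixed above.

# Illustrative storage of intermediate outputs in the two formats mentioned.
import cv2
import numpy as np

def save_depth_png(depth_m: np.ndarray, path: str, max_range_m: float = 200.0) -> None:
    # Map the stated 0-200 m range onto the full 16-bit integer range and write
    # a single-channel PNG (the path should end with ".png").
    scaled = np.clip(depth_m, 0.0, max_range_m) / max_range_m * 65535.0
    cv2.imwrite(path, scaled.astype(np.uint16))

def save_tracks_txt(tracks, path: str) -> None:
    # tracks: iterable of (frame_idx, object_id, x, y) tuples in image coordinates.
    with open(path, "w") as f:
        for frame_idx, object_id, x, y in tracks:
            f.write(f"{frame_idx} {object_id} {x:.1f} {y:.1f}\n")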
Further, the intermediate outputs from the staging database, the complete trajectories and the simulator log files are stored in a logs database. This database may be used later for debugging, evaluation and fine-tuning of parameters throughout the entire architecture.
In an embodiment, vehicle trajectories form the core of a scenario and are handled by the trajectory reconstruction module/unit. However, the trajectory data generated by the perception stack, which is part of the intermediate outputs, is often noisy and incomplete, as vehicle positions and orientations are computed frame by frame. To ensure completeness, temporal consistency and adherence to vehicle kinematics, the trajectory reconstruction module reconstructs trajectories from the beginning to the end of the scenario for all vehicles. The resulting outputs are fully reconstructed trajectories containing essential information such as ego velocities, lane offset values and ego-relative positions.
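The specification does not prescribe a particular reconstruction algorithm; the sketch below merely illustrates the idea of completing and smoothing noisy per-frame positions, using plain interpolation and a moving average in place of whatever kinematic model the actual module employs.

# Illustrative trajectory reconstruction: fill gaps from missed detections by
# interpolation, smooth with a moving average, and derive velocities.
import numpy as np

def reconstruct_trajectory(frames, xs, ys, fps=30.0, window=7):
    """frames: frame indices with detections; xs, ys: noisy per-frame positions (metres)."""
    frames = np.asarray(frames)
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    all_frames = np.arange(frames[0], frames[-1] + 1)
    x = np.interp(all_frames, frames, xs)            # complete the trajectory over every frame
    y = np.interp(all_frames, frames, ys)
    kernel = np.ones(window) / window                # simple temporal smoothing
    x = np.convolve(x, kernel, mode="same")
    y = np.convolve(y, kernel, mode="same")
    vx = np.gradient(x) * fps                        # finite-difference velocities (m/s)
    vy = np.gradient(y) * fps
    return {"frame": all_frames, "x": x, "y": y, "vx": vx, "vy": vy}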
In an embodiment, the code generator unit uses the complete trajectories and the intermediate outputs from the staging database to construct OD and OS scripts, ensuring the required standards and syntaxes are followed. In addition to these scripts, miscellaneous data such as vehicle models, environment maps and other simulation parameters are generated.
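As a sketch of such script emission, assuming OS refers to OpenSCENARIO, the snippet below writes a heavily simplified OpenSCENARIO-style XML skeleton; element names follow the public schema only loosely, and a real generator would also emit the storyboard, init actions and the corresponding OD road network file.

# Illustrative emission of a simplified OS skeleton; not a complete,
# standards-conformant scenario file.
import xml.etree.ElementTree as ET

def write_os_skeleton(trajectories, path):
    root = ET.Element("OpenSCENARIO")
    ET.SubElement(root, "FileHeader", revMajor="1", revMinor="0",
                  description="Scenario reconstructed from a reference video")
    entities = ET.SubElement(root, "Entities")
    for name in trajectories:                        # e.g. "ego", "vehicle_1", ...
        obj = ET.SubElement(entities, "ScenarioObject", name=name)
        ET.SubElement(obj, "Vehicle", name="car", vehicleCategory="car")
    # The storyboard built from the reconstructed trajectories would be generated here.
    ET.SubElement(root, "Storyboard")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)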
In an embodiment, the simulator and the simulation refinement module/unit perform the simulation and refinement process. Due to the probabilistic nature of deep learning models, the generated trajectories may not match the ones observed from the video and may need validation and refinement. The compiled OD and OS scripts, along with the miscellaneous data and user settings, are passed as input to the simulation engine, which runs the simulation. During the simulation, one or more sensors (cameras, depth cameras, IMU sensors etc.) are attached to the ego vehicle, capturing the scenario from their perspectives.
Further, the synthetic data generated is then processed by the perception stack. The intermediate outputs corresponding to this synthetically generated data are compared with those from the original simulation in the simulation refinement module. The total difference is treated as an error and corrected directly in the OS and OD files. The corrected OS and OD files are then replayed in the simulation, and this process is repeated until the difference falls below a specified threshold.
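A minimal sketch of this refinement loop follows. The functions run_simulation, perception_stack and correct_scripts are hypothetical placeholders, and the size-normalised L2 aggregation of per-feature differences into a single scalar error is an assumption; it also assumes the reference and synthetic outputs for each feature have matching shapes.

# Illustrative refinement loop: replay, re-perceive, compare and correct until
# the aggregated error falls below the threshold.
import numpy as np

def scenario_error(reference_outputs, synthetic_outputs):
    per_feature = []
    for name, ref in reference_outputs.items():          # e.g. "depth", "tracks", "lanes"
        ref = np.asarray(ref, dtype=np.float64)
        syn = np.asarray(synthetic_outputs[name], dtype=np.float64)
        per_feature.append(np.linalg.norm(ref - syn) / ref.size)
    return float(np.mean(per_feature))

def refine(os_file, od_file, reference_outputs, run_simulation, perception_stack,
           correct_scripts, threshold=0.05, max_iters=10):
    for _ in range(max_iters):
        synthetic_outputs = perception_stack(run_simulation(os_file, od_file))
        if scenario_error(reference_outputs, synthetic_outputs) < threshold:
            break
        os_file, od_file = correct_scripts(os_file, od_file,
                                           reference_outputs, synthetic_outputs)
    return os_file, od_file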
Further, once the OS and OD files are refined, the data is made available to the user via an API or a user interface. The user may then choose to run the scenario simulation with the refined OS and OD files. The OS and OD file formats are the current industry standard for sharing scenarios, but the method of the present invention works with any structured format.
The foregoing description of the invention has been set out merely to illustrate the invention and is not intended to be limiting. Since modifications of the disclosed embodiments incorporating the substance of the invention may occur to persons skilled in the art, the invention should be construed to include everything within the scope of the invention.
Claims:
We Claim:
1. A system for scenario generation for autonomous vehicles comprising:
a) a user interface where a user uploads a monocular RGB video;
b) a perception stack unit comprising a plurality of components, where each component is configured to extract a specific feature, such as depth or camera pose, from the input monocular RGB video, whereafter the components run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) a trajectory reconstruction unit that receives data from the perception stack unit and is configured to reconstruct trajectories from a beginning to an end of the scenario;
d) a code generator unit that uses the trajectory data from the trajectory reconstruction unit to construct OD and OS scripts, ensuring that the required standards and syntaxes are followed; and
e) a simulation refinement unit that receives the OD and OS scripts from the code generator unit and runs a simulation, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.
2. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein supplementary data, such as IMU sensor data and depth camera data, if available to the user, and its metadata, such as camera intrinsics and focal length, are provided through the user interface.
3. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein each of the components shares an internal representation but predicts only a single feature type from the monocular RGB video.
4. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein each of the components is configured to use the supplementary data to predict a more accurate output, where the supplementary data includes alternate-view videos, such as stereo/depth (two-view) videos or multiple-view (360-degree) videos.
5. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein during the simulation, one or more sensors, such as cameras, depth cameras and IMU sensors, are attached to the ego vehicle, capturing the scenario from their perspectives and generating synthetic data.
6. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein the corrected OS and OD files are then replayed in the simulation, and this process is repeated until the difference falls below a specified threshold.
7. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein an output from each of the components is stored as an intermediate output in a staging database and may be stored in different formats.
8. The system for scenario generation for autonomous vehicles as claimed in claim 1, wherein the intermediate outputs from the staging database are stored in a logs database, where the logs database is configured to provide options for debugging, evaluation and fine-tuning of the scenario.
9. A method for scenario generation for autonomous vehicles comprising the steps of:
a) providing a monocular RGB video through a user interface by a user;
b) extracting a specific feature, such as depth or camera pose, from the input monocular RGB video by a perception stack unit, whereafter components of the perception stack unit run perception tasks such as semantic segmentation, localization, object tracking and lane detection to form a scenario;
c) receiving scenario data from the perception stack unit by a trajectory reconstruction unit, where the trajectory reconstruction unit reconstructs trajectories from a beginning to an end of the scenario from the scenario data;
d) constructing OD and OS scripts by a code generator unit based on the trajectory data it receives from the trajectory reconstruction unit; and
e) running a simulation by a simulation refinement unit based on the OD and OS scripts it receives from the code generator unit, where during the simulation one or more sensors are attached to an ego vehicle, capturing the scenario from their perspectives and generating synthetic data;
wherein the generated synthetic data is given to the perception stack unit to extract a specific feature, such as depth or camera pose, from the synthetic data, and the extracted data is compared with the data from the original simulation by the simulation refinement unit, where the total difference is treated as an error and corrected directly in the OS and OD files by the simulation refinement unit.