Abstract: A method and system for intelligent character recognition is disclosed. The method, performed by a server system, includes receiving a plurality of images of at least an alpha-numeric code. The method further includes processing the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code. The deep-learning model utilizes bounding boxes to perform said identification. Furthermore, the method includes determining whether the identity of at least one or more characters in the plurality of images is identical or different. Moreover, the method includes performing quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images. Figure 3
FORM 2
THE PATENTS ACT, 1970 (39 OF 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
“METHOD AND SYSTEM FOR QUALITY CONTROL OF VEHICLE BASED ON INTELLIGENT CHARACTER
RECOGNITION”
We, Mahindra and Mahindra Ltd., an Indian National, of Mahindra Towers, Dr. G. M. Bhosale Marg, Worli, Mumbai, Maharashtra, 400018, India.
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF INVENTION
The present disclosure relates generally to automated quality control (QC) operations, and particularly relates to a method and system for performing intelligent character recognition for performing quality control (QC) check of vehicles.
BACKGROUND OF THE INVENTION
The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is intended only to enhance the understanding of the reader with respect to the overall field of the invention, and not as an admission of prior art.
In the highly competitive and regulated vehicle manufacturing industry, maintaining stringent quality control (QC) standards is imperative to ensure the safety, reliability, and compliance of vehicles. Quality control checks encompass a wide range of procedures aimed at verifying that all aspects of the vehicle meet the specified standards and regulatory requirements. One critical aspect of QC in vehicle manufacturing is the verification of Vehicle Identification Numbers (VINs) across different components of the vehicle. The VIN is a unique code used to identify individual motor vehicles. It is essential that the VINs on the chassis, engine, and VIN plate match exactly to ensure traceability and integrity. In other words, the VIN serves as a fingerprint for the vehicle, providing information about the vehicle's make, model, year of manufacture, and place of production. Certain discrepancies in VINs can lead to severe issues, including regulatory non-compliance, potential fraud, and challenges in warranty claims.
The conventional process of VIN matching involves several critical steps.
Generally, the chassis or frame of the vehicle has a stamped VIN in a location that is difficult to tamper with or alter. During the manufacturing process, QC inspectors must manually verify that this VIN matches the VIN on the engine and VIN plate. The engine block also carries a VIN, which must correspond with the chassis VIN. It is crucial to ensure that these numbers match for maintaining the traceability of parts and compliance with regulations. Additionally, the VIN plate, typically found on the dashboard or door frame, is easily accessible and used by authorities to quickly identify the vehicle. The conventional QC processes include manual checks to ensure that the VIN plate matches both the chassis and engine VINs. However, the manual process is prone to human error. This can result in mismatched VINs going unnoticed until later stages, causing delays and additional costs. In some cases, fraudulent activities may involve tampering with VINs to disguise stolen vehicles or misrepresent a vehicle’s history.
Therefore, it is apparent from the aforementioned problems and limitations, that there exists a need to provide for a method and system to perform automated quality control (QC) check of the vehicles.
SUMMARY OF THE INVENTION
This section is provided to introduce certain objects and aspects of the present invention in a simplified form that are further described below in the description. This summary is not intended to identify the essential features of the invention nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In an aspect, a method for intelligent character recognition is disclosed. The method includes receiving by a server system a plurality of images of at least an alpha-numeric code. The alpha-numeric code is printed or engraved on at least one of vehicle identification number (VIN) plate, chassis, or engine block of a vehicle. The method further includes processing by the server system the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code. The deep-learning model utilizes bounding boxes to perform said identification. Furthermore, the method includes determining by the server system whether the identity of at least one or more characters in the plurality of images is identical or different. Moreover, the method includes performing by the server system quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images.
In an aspect, the processing includes creating by the server system bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model. The processing further includes computing by the server system via the text detection model, horizontal and vertical co-ordinates of the bounding boxes. Furthermore, the processing includes calculating by the server system the angle of each bounding box based at least on orientation of each bounding box. Moreover, the processing includes aligning by the server system each bounding box along a same angle.
In an aspect, the method includes arranging at least one or more characters based at least on spatial co-ordinates of the bounding boxes.
In an aspect, the method includes sorting by the server system the bounding boxes based at least on their spatial arrangement. In addition, the method includes implementing by the server system the text detection model to sequentially read at least one or more characters within each bounding box.
In an aspect, the method includes implementing by the server system fuzzy approximation logic to determine similar characters in two or more bounding boxes, wherein the fuzzy approximation logic is implemented based on a set of parameters.
In an aspect, the set of parameters includes at least one of neighbouring context, adjacent characters, position of characters, and semantic rules.
In an aspect, the deep-learning model is a convolution neural network (CNN).
In an aspect, the deep-learning model is trained based on a dataset including training images. The training images are annotated on a character level.
In an aspect, the deep-learning model implements a U-shaped encoder-decoder architecture.
In another aspect, a server system is disclosed. The server system includes at least a processor, at least a memory, and a communication interface coupled to the processor and the memory. The memory stores instructions, which when executed by the processor, cause the server system to receive a plurality of images of at least an alpha-numeric code. The alpha-numeric code is printed or engraved on at least one of vehicle identification number (VIN) plate, chassis, or engine block of a vehicle. The server system is then caused to process the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code. The deep-learning model utilizes bounding boxes to perform said identification. Further, the server system is caused to determine whether the identity of at least one or more characters in the plurality of images is identical or different. Furthermore, the server system is caused to perform quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images.
In an aspect, to perform the process step, the server system is caused to create bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model. Further, the server system is caused to compute via the text detection model, horizontal and vertical co-ordinates of the bounding boxes. Furthermore, the server system is caused to calculate the angle of each bounding box based at least on orientation of each bounding box. Moreover, the server system is caused to align each bounding box
along a same angle.
In an aspect, the server system is caused to arrange at least one or more characters based at least on spatial co-ordinates of the bounding boxes.
In an aspect, the server system is caused to sort the bounding boxes based at least on their spatial arrangement. Also, the server system is caused to implement the text detection model to sequentially read at least one or more characters within each bounding box.
In an aspect, the server system is caused to implement fuzzy approximation logic to determine similar characters in two or more bounding boxes. The fuzzy approximation logic is implemented based on a set of parameters.
In an aspect, the set of parameters includes at least one of neighbouring context, adjacent characters, position of characters, and semantic rules.
In an aspect, the deep-learning model is a convolution neural network (CNN).
In an aspect, the deep-learning model is trained based on a dataset including training images. The training images are annotated on a character level.
In an aspect, the deep-learning model implements a U-shaped encoder-decoder architecture.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated herein, and constitute a part of this specification, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that the illustration of such drawings includes the use of electrical components, electronic components, or circuitry commonly used to implement such components.
Figure 1 illustrates an environment related to an embodiment of the present invention;
Figure 2 illustrates a simplified block diagram of a server system, in accordance with an embodiment of the present invention; and
Figure 3 illustrates a flowchart of a method for intelligent character recognition, in accordance with an embodiment of the present invention.
The foregoing shall be more apparent from the following more detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, that embodiments of the present invention may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination
of other features. An individual feature may not address any of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein. Example embodiments of the present invention are described below, as illustrated in various drawings.
The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its
termination can correspond to a return of the function to the calling function or the main function.
The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one
or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
As used herein, a “processor” or “processing unit” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions. A processor may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, a low-end microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc. The processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor.
As used herein, “connect”, “configure”, “couple” and its cognate terms, such as “connects”, “connected”, “configured” and “coupled” may include a physical connection (such as a wired/wireless connection), a logical connection (such as through logical gates of semiconducting device), other suitable connections, or a combination of such connections, as may be obvious to a skilled person.
As used herein, “send”, “transfer”, “transmit”, and their cognate terms like “sending”, “sent”, “transferring”, “transmitting”, “transferred”, “transmitted”, etc. include sending or transporting data or information from one unit or component to another unit or component, wherein the content may or may not be modified before or after sending, transferring, transmitting.
As used herein, “database”, “memory unit”, “storage unit” and/or “memory” refers to a machine or computer-readable medium including any mechanism for storing information in a form readable by a computer or similar machine. For example, a
computer-readable medium includes read-only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices or other types of machine-accessible storage media.
As used herein, ‘computer readable media’ refers to both volatile and non-volatile media, removable and non-removable media, and any available medium that may be accessed by the computing device. By way of example and not limitation, computer readable media comprise computer storage media and communication media.
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present disclosure.
Figure 1 illustrates an environment [100] related to an embodiment of the present invention. It should be understood that the environment [100], illustrated and hereinafter described, is merely illustrative of an arrangement for describing some exemplary embodiments, and therefore, should not be taken to limit the scope of the embodiments. As such, it should be noted that at least some of the components described below in connection with the environment [100] may be optional and thus in some embodiments may include more, less, or different components than those described in connection with subsequent Figures 2 to 3.
The environment [100] generally depicts a device [104] associated with a user [102], a server system [106], a database [108] associated with the server system [106], and an application [112], connected by a communication network, such as a network [110]. In one embodiment, the application [112] is installed in the device [104]. In addition, the environment [100] includes a vehicle [114].
Various entities in the environment [100] may connect to the network [110] in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram
Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, future communication protocols or any combination thereof.
For example, the network [110] may include multiple different networks, such as a private network made accessible by the server system [106] and a public network (e.g., the Internet, etc.) through which the server system [106] may communicate.
The user [102] may correspond to any individual, organization, representative of a corporate entity, a non-profit organization, or any other person who accesses the application [112] on the device [104]. In one embodiment, the user [102] runs the application [112] on the device [104] with facilitation of a network, such as the network [110]. The user [102] may correspond to an operator that works on the assembly line of the manufacturing of the vehicle [114].
The device [104] is associated with the user [102]. In one embodiment, the device [104] may correspond to any suitable electronic or computing device with at least a camera. In some examples, the device [104] may correspond to a smartphone, a personal computer, a laptop, a personal digital assistant (PDA), an electronic tablet, a desktop computer, a smart device such as smart TV or smart appliance, a smartwatch, etc., among other suitable electronic devices.
The application [112] may refer to an application software that is configured to perform specific tasks or provide specific functionality to the user [102] on digital devices, such as the device [104]. In some examples, the application [112] may correspond to a social media application, a messaging application, a productivity application, an entertainment application, a gaming application, a navigation application, and the like.
In one preferred embodiment, the application [112] enables the user [102] to capture
and upload images and videos of metal surfaces. For instance, the application [112] is configured to access the camera of the device [104]. The user [102] can then capture images and videos of the vehicle identification number (VIN) plate of the vehicle [114] through the device [104]. In one embodiment, the application [112] allows the user [102] to upload the images and videos of the metal surfaces of the vehicle [114].
In one example, the user [102] may download and/or install the application [112] from the server system [106]. In another example, the user [102] may download and/or install the application [112] from a remote server (not shown in figures).
In an embodiment, the server system [106] is deployed as a standalone server or can be implemented in the cloud as software as a service (SaaS). In another embodiment, the server system [106] is deployed as a distributed server. In an embodiment, the server system [106] provides or hosts the application [112]. In an embodiment, the application [112] is executed in the device [104].
In an embodiment, an instance of the application [112] is accessible in the device [104]. In one implementation, the application [112] connects with the server system [106] with facilitation of the network [110]. The application [112] allows the user [102] to perform quality control (QC) checks of the vehicle [114].
The server system [106] is configured to perform various QC checks of the vehicle [114]. In particular, the server system [106] is configured to perform intelligent character recognition to identify whether the VIN number on the VIN plate of the vehicle [114] matches with the VIN number on the engine block and the chassis of the vehicle [114].
Further, the server system [106] is configured to determine whether the identity of at least one or more characters in the plurality of images is identical or different. To perform the determination, the server system [106] is configured to apply
various logics, algorithms, and/or functions to determine whether the VIN number matches in the plurality of images.
Furthermore, the server system [106] is configured to perform quality control (QC) check of the vehicle [114] based on matching of at least one or more characters in the plurality of images.
The database [108] may be adapted to store information, such as, but not limited to, the plurality of images, and the like. In addition, the database [108] may also include metadata and/or information associated with the application [112], and the like.
In an implementation, the database [108] may be associated with the server system [106]. In an implementation, the database [108] can be accessed, viewed, managed, and/or updated with facilitation of a database management system (DBMS), relational database management system (RDBMS), or the like.
It is noted that the application [112] may include one or more interfaces (e.g., home screen, settings menu, camera menu, etc.). In addition, the placement of menus or buttons on each interface may be different.
The number and arrangement of systems, devices, and/or networks shown in Figure 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks, and/or differently arranged systems, devices, and/or networks than those shown in Figure 1. Furthermore, two or more systems or devices shown in Figure 1 may be implemented within a single system or device, or a single system or device shown in Figure 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment [100] may perform one or more functions described as being performed by another set of systems or another set of devices of the environment [100].
Figure 2 illustrates a simplified block diagram of a server system [200], in accordance with an embodiment of the present invention. For example, the server system [200] is identical to the server system [106] as described in Figure 1. In some embodiments, the server system [200] is embodied as a standalone physical server and/or as having a cloud-based and/or SaaS-based (software as a service) architecture. The server system [200] is configured to perform the method for intelligent character recognition to perform the QC check of the vehicle [114].
The server system [200] includes a computer system [202] and a database [204]. The computer system [202] includes at least one processor, such as a processor [206] for executing instructions, a memory [208], a communication interface [210], a bus [212], and a storage interface [214]. The bus [212] enables entities of the computer system [202] to communicate with each other. The database [204] is an example of the database [108] of Figure 1.
In an implementation, the database [204] may be integrated into the computer system [202], potentially utilizing hard disk drives. In another implementation, the database [204] may be external/remote to the computer system [202]. The storage interface [214] provides the processor [206] with access to the database [204] through various adapter options.
It is noted that while the computer system [202] is illustrated with a single processor (i.e., the processor [206]), the computer system [202] can potentially include multiple processors. The processor [206] executes computer-readable instructions to perform operations related to intelligent character recognition. Various processor options, such as application-specific integrated circuit (ASIC) processor, reduced instruction set computing (RISC) processor, complex instruction set computing (CISC) processor, field-programmable gate array (FPGA), and the like can be employed.
The storage interface [214] is any component capable of providing the processor [206] with access to the database [204]. The storage interface [214] may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor [206] with access to the database [204].
The memory [208] stores the computer-readable instructions necessary for the processor [206]. Examples of the memory [208] include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It should be understood that the memory [208] can also be realized in the form of a database server or cloud storage in conjunction with the server system [200].
The processor [206] is connected to the communication interface [210], allowing the computer system [202] to communicate with remote devices (not shown in figures) such as the device [104], or the database [204] on the network [110]. In one scenario, the processor [206] is configured to perform QC check, enabling multiple entities to utilize various functionalities described in the disclosure.
It is to be noted that the server system [200] as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system [200] may include fewer or more components than those depicted in Figure 2.
The processor [206] is depicted to include a reception engine [216], a processing engine [218], and a quality check (QC) engine [220]. It should be noted that components described herein can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies.
The reception engine [216] includes suitable logic and/or interfaces to receive the plurality of images of at least the alpha-numeric code. The alpha-numeric code is printed or engraved on at least one of the vehicle identification number (VIN) plate, the chassis, or the engine block of the vehicle [114]. In one embodiment, the alpha-numeric code corresponds to the VIN number of the vehicle [114]. Generally, a VIN number is a unique code assigned to a vehicle (e.g., the vehicle [114]) when it is manufactured. In addition, the VIN number serves as a fingerprint of the vehicle [114] since no two vehicles can have the same VIN number.
In one implementation, the plurality of images include the image of the VIN number on the VIN plate, the image of the VIN number on the engine block, and the image of the VIN number on the chassis of the vehicle [114].
In an embodiment, the reception engine [216] receives the plurality of images from the device [104]. For instance, the reception engine [216] receives the plurality of images from the camera of the device [104]. In one embodiment, the reception engine [216] receives a plurality of videos from the device [104]. The reception engine [216] then converts the plurality of videos to the plurality of images.
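By way of a non-limiting illustration, the following Python sketch shows one way such a video-to-image conversion could be performed. The function name, sampling rate, and use of OpenCV are assumptions of this illustration rather than features of the disclosed implementation.

```python
import cv2

def video_to_frames(video_path: str, every_n: int = 10) -> list:
    """Return every n-th frame of a video as a still image (BGR array)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of video stream
            break
        if index % every_n == 0:        # sample a subset of frames
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```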
The reception engine [216] then transmits the plurality of images to the processing engine [218]. The processing engine [218] includes suitable logic and/or interfaces to process the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code. In one implementation, the deep-learning model utilizes bounding boxes to perform said identification.
In one implementation, the deep-learning model is a convolution neural network (CNN).
To process the plurality of images, the processing engine [218] is configured to
create bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model. For instance, the processing engine [218] is configured to identify and isolate text regions within the plurality of images.
In one implementation, the text detection model is based on a 50-layer ResNet, enhanced with deformable convolutional networks (DCN) in stages 2, 3, and 4. In addition, the text detection model utilizes a Differentiable Binarization Feature Pyramid Network (DBFPN) neck for improved feature map scaling.
In one implementation, the deep-learning model is trained based on a dataset including training images. The training images are annotated on a character level. In one implementation, the text detection model is trained on 5000 images and tested on 500 images over 750 epochs. In one implementation, a Momentum optimizer is used for training the text detection model with an initial learning rate of 0.0003 and a DecayLearningRate strategy.
Also, during training of the text detection model, data augmentation techniques (such as flipping, affine transformations, resizing, etc.) are used. The text detection model uses Differential Binarization Loss (DBLoss) with Binary Cross-Entropy Loss (BCELoss), achieving peak performance at epoch 635 with 99.64% precision, 99.01% recall, and 99.32% hmean.
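By way of a non-limiting illustration, the detection setup described above may be expressed as a PaddleOCR-style configuration. In the Python sketch below, the dictionary structure and key names are assumptions of this illustration; only the numeric values are those quoted in the description.

```python
# Illustrative configuration for the text detection model described above.
det_config = {
    "Architecture": {
        "Backbone": {"name": "ResNet", "layers": 50,
                     "dcn_stages": [2, 3, 4]},   # DCN in stages 2-4
        "Neck": {"name": "DBFPN"},               # feature-pyramid neck
        "Head": {"name": "DBHead"},              # assumed detection head
    },
    "Loss": {"name": "DBLoss", "with": "BCELoss"},
    "Optimizer": {
        "name": "Momentum",
        "lr": {"strategy": "DecayLearningRate", "initial": 0.0003},
    },
    "Augmentations": ["flip", "affine", "resize"],
    "Train": {"images": 5000, "epochs": 750},
    "Eval": {"images": 500},
}
```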
For text recognition, the server system [200] annotates the training images word by word, resulting in 17000 images, and trains over 500 epochs using an Adam optimizer with a Cosine learning rate starting at 0.00005, incorporating L2 regularization and early phase warmup. In addition, a MultiLoss approach combining Connectionist Temporal Classification Loss (CTCLoss) and Neural Representation and Transformation Loss (NRTRLoss) is used, achieving best performance at epoch 466 with a 98.88% accuracy rate and a Normalized Edit Distance (norm_edit_dis) of 0.996.
The Cosine learning rate helps in fine-tuning during the training phase, allowing for larger updates at the beginning and smaller, more precise updates as the text detection model approaches optimal accuracy. The Cosine learning rate ensures that the text detection model does not overshoot the minimal loss point.
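By way of a non-limiting illustration, the following Python sketch computes a cosine learning rate with an early warmup phase of the kind described above. The warmup length and per-step granularity are assumptions of this illustration; the base rate of 0.00005 is the starting rate quoted above.

```python
import math

def cosine_lr(step: int, total_steps: int,
              base_lr: float = 0.00005, warmup_steps: int = 500) -> float:
    """Return the learning rate for a given training step."""
    if step < warmup_steps:                        # early-phase warmup
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay: larger updates early, progressively smaller updates
    # as training approaches the minimal-loss region.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```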
Next, the processing engine [218] is configured to compute, via the text detection model, horizontal and vertical co-ordinates of the bounding boxes. For instance, the processing engine [218] is configured to determine the exact position of each bounding box.
Next, the processing engine [218] is configured to calculate the angle of each bounding box based at least on orientation of each bounding box. In one example, the angle of each bounding box can vary depending on how the characters are positioned within the plurality of images. For instance, characters might get tilted or rotated due to the way the plurality of images are captured. By determining the angle of each bounding box, the processing engine [218] understands how the text is oriented in the plurality of images.
Further, the processing engine [218] is configured to align each bounding box along a same angle. In particular, the processing engine [218] adjusts the orientation of the bounding boxes so that they are uniformly aligned. This alignment ensures that all characters are positioned consistently, facilitating easier reading and further processing.
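By way of a non-limiting illustration, the following Python sketch shows one way the angle of a detected box could be computed and the box rotated into a uniform horizontal orientation. The helper names and the use of OpenCV's minimum-area rectangle are assumptions of this illustration.

```python
import cv2
import numpy as np

def box_angle(box: np.ndarray) -> float:
    """Angle (in degrees) of a quadrilateral character box, taken from
    its minimum-area rotated rectangle."""
    _, _, angle = cv2.minAreaRect(box.astype(np.float32))
    return angle

def align_box(image: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Rotate the image so the box lies horizontally, then crop it."""
    center, (w, h), angle = cv2.minAreaRect(box.astype(np.float32))
    rotation = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, rotation,
                             (image.shape[1], image.shape[0]))
    x = max(0, int(center[0] - w / 2))   # crop the now axis-aligned box
    y = max(0, int(center[1] - h / 2))
    return rotated[y:y + int(h), x:x + int(w)]
```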
In one implementation, the processing engine [218] is configured to arrange at least one or more characters based at least on spatial co-ordinates of the bounding boxes. The processing engine [218] is configured to sort the bounding boxes based at least on their spatial arrangement. Next, the processing engine [218] is configured to implement the text detection model to sequentially read at least one or more characters within each bounding box.
In one embodiment, the processing engine [218] is configured to implement fuzzy approximation logic to determine similar characters (e.g., “1” and “I”, “o” and “0”, etc.) in two or more bounding boxes. The fuzzy approximation logic is implemented based on a set of parameters. The set of parameters includes at least one of neighbouring context, adjacent characters, position of characters, and semantic rules. In one implementation, the plurality of images is cropped around the aligned text to prepare the same for recognition.
The QC engine [220] includes suitable logic and/or interfaces to determine whether the identity of at least one or more characters in the plurality of images is identical or different. In particular, the QC engine [220] performs the QC check of the vehicle [114] based on matching of at least one or more characters in the plurality of images.
To this end, the QC engine [220] matches the processed image of the VIN plate with the processed image of the engine block and the processed image of the chassis. In case the VIN numbers in all three locations (i.e., the VIN plate, the engine block, and the chassis) match, the QC engine [220] is configured to validate the match. Thereafter, the QC engine [220] is configured to display on a display (for example, of the device [104]) that the vehicle [114] has passed the quality check.
In case the VIN plate number matches only one of the engine block number or the chassis number, the QC engine [220] is configured to treat this as a partial match. The QC engine [220] is then configured to flag the issue and display a notification on the device [104] to notify the user [102].
In case the VIN plate number matches neither the engine block number nor the chassis number, the QC engine [220] is configured to treat this as a complete mismatch. The QC engine [220] is then configured to flag the issue and display a notification on the device [104]. In this manner, the server system [200] is configured to perform the QC of the vehicle [114].
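By way of a non-limiting illustration, the three QC outcomes described above can be summarized by the following Python sketch; the function and verdict names are assumptions of this illustration.

```python
def qc_check(vin_plate: str, vin_engine: str, vin_chassis: str) -> str:
    """Compare the three recognized VIN strings and return a QC verdict."""
    matches = (vin_plate == vin_engine, vin_plate == vin_chassis)
    if all(matches):
        return "PASS"            # full match: vehicle passes the QC check
    if any(matches):
        return "PARTIAL_MATCH"   # flag the issue and notify the user
    return "MISMATCH"            # complete mismatch: flag and notify
```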
In one example, the server system [200] is configured to implement a pre-trained model (e.g., the text detection model) to detect and localize individual characters or groups of characters on the plurality of images using object detection techniques like Fast R-CNN.
For instance, the user [102] may access the application [112]. In an embodiment, the user [102] may upload the plurality of images stored on the device [104]. In another embodiment, the user [102] may upload the plurality of images from a cloud storage. In yet another embodiment, the user [102] may capture the plurality of images from the camera of the device [104] in real-time. The plurality of images is then uploaded to the server system [106].
It is noted that to perform a successful QC check, the same alpha-numeric code must be printed or engraved on the VIN plate, the chassis, and the engine block of the vehicle [114].
Next, the server system [200] is configured to process the plurality of images based at least on implementation of the deep-learning model to identify at least one or more characters in the alpha-numeric code. The deep-learning model utilizes bounding boxes to perform said identification.
Initially, the server system [200] is configured to detect characters or character groups, resulting in bounding boxes for each. Then, the server system [200] is configured to extract the coordinate information for each bounding box, defining the rectangular region containing the character.
To correctly reconstruct the VIN sequence, the server system [200] is configured to sort the bounding boxes by their spatial arrangement on the VIN plate. The primary sorting criterion is the y-coordinate, which groups characters residing on the same horizontal line. Within each horizontal group, a secondary sorting based
on the x-coordinate arranges the characters from left to right. This step ensures that the characters are read in the correct order to form the complete VIN sequence.
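By way of a non-limiting illustration, the following Python sketch implements the two-level sort described above (primary on the y-coordinate to group lines, secondary on the x-coordinate within each line). The box representation and line tolerance are assumptions of this illustration.

```python
def sort_boxes(boxes: list, line_tol: float = 10.0) -> list:
    """Order character boxes top-to-bottom, then left-to-right.

    Each box is an (x, y, w, h) tuple. Boxes whose y-coordinates differ
    by less than `line_tol` pixels are grouped on the same line.
    """
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])            # primary: y
    lines, current = [], [boxes[0]]
    for box in boxes[1:]:
        if abs(box[1] - current[-1][1]) <= line_tol:
            current.append(box)                          # same line
        else:
            lines.append(current)                        # start a new line
            current = [box]
    lines.append(current)
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda b: b[0]))  # secondary: x
    return ordered
```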
After sorting the bounding boxes, the server system [200] is configured to sequentially read the characters within each bounding box following the sorted order. This step assembles the complete VIN sequence, accounting for potential line breaks or irregular spacing between characters. The accurate assembly of the VIN sequence is critical for reliable identification of the vehicle [114].
In one example, the server system [200] is configured to incorporate additional logic to handle challenges such as confusion between similar-looking characters, glare, low resolution, or complex backgrounds. In one implementation, the server system [200] is configured to implement fuzzy approximation logic and fine-tuning on custom dataset (i.e., the training images) to mitigate such issues. This logic ensures that the text detection model can effectively interpret characters despite potential ambiguities, leading to accurate VIN recognition even in challenging scenarios.
More specifically, the server system [200] is configured to implement fuzzy approximation logic to handle character confusion cues effectively. In one implementation, the deep-learning model is trained to identify patterns representing potential confusion cues within the VIN character sequence. For each identified cue, the deep-learning model is configured to analyze the surrounding context, including adjacent characters and semantic rules of VIN structure. In one implementation, the server system [200] is configured to extract relevant features such as stroke thickness and spatial relationships to quantify the degree of similarity between confusing characters.
Instead of a binary classification, the server system [200] is configured to assign fuzzy membership degrees to each possible interpretation of the confusing character. Using fuzzy logic rules, the server system [200] is configured to combine
evidence from different features and contextual cues to generate a fuzzy output representing confidence in each interpretation.
Based on this output, the server system [200] is configured to select the most probable interpretation or to consider multiple interpretations weighted by their confidence levels, ensuring reliable VIN recognition.
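By way of a non-limiting illustration, the following Python sketch shows how fuzzy membership degrees over a confusion set could be fused with visual confidence and a simple semantic rule to resolve an ambiguous character. The membership values, and the rule that standard (ISO 3779) VINs exclude the letters I, O, and Q, are assumptions of this illustration.

```python
# Confusion sets with fuzzy membership degrees: each ambiguous glyph
# maps to its candidate readings. The degrees below are illustrative.
CONFUSION_SETS = {
    "1": {"1": 0.6, "I": 0.4},
    "I": {"I": 0.5, "1": 0.5},
    "0": {"0": 0.6, "O": 0.4},
    "O": {"O": 0.5, "0": 0.5},
}
# Example semantic rule: standard VINs never contain I, O, or Q.
FORBIDDEN = {"I", "O", "Q"}

def resolve(char: str, visual_confidence: float) -> str:
    """Select the most plausible reading of a possibly confused glyph."""
    candidates = CONFUSION_SETS.get(char, {char: 1.0})
    # Fuse visual evidence with the prior membership degrees: the
    # recognizer's own reading is weighted by its confidence, the
    # alternatives by the remaining belief mass.
    scored = {c: m * (visual_confidence if c == char
                      else 1.0 - visual_confidence)
              for c, m in candidates.items()}
    # Semantic rule: drop readings a standard VIN cannot contain.
    scored = {c: s for c, s in scored.items() if c not in FORBIDDEN}
    return max(scored, key=scored.get) if scored else char
```

For example, resolve("I", 0.3) discards the forbidden letter "I" and returns "1", the interpretation with the highest combined confidence.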
In one implementation, an L2 regularizer is used to mitigate the risk of overfitting, especially given the depth of the network and the complexity of the training dataset. This helps in maintaining the ability of the text detection model to generalize well when exposed to new data outside the training dataset.
In one implementation, the server system [200] is configured to implement a U-shaped encoder-decoder architecture to perform intelligent character recognition. In one embodiment, the encoder includes a conventional Convolutional Neural Network (CNN) backbone, such as ResNet, to extract multi-level feature maps at different scales. This extraction is important to capture the rich and hierarchical representations of the input image data.
The decoder has an integrated Adaptive Scale Fusion (ASF) module. The integrated ASF module is configured to process the extracted multi-scale feature maps. The ASF module takes these feature maps and applies parallel dilated convolutions with varying dilation rates (e.g., 1, 2, 5, 7). Thus, the server system [200] is configured to capture a range of visual patterns, from fine-grained textual details to broader contextual information.
The outputs of these dilated convolutions are adaptively fused using global guided filters. In particular, the ASF module computes pixel-wise weighted averages of the dilated features. The weights for these averages are dynamically predicted from a gating branch, which processes the concatenated dilated features and generates spatially-varying weight maps. These weight maps selectively emphasize multi-
scale information for each pixel location, integrating multi-scale cues into unified, scale-insensitive feature representations.
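By way of a non-limiting illustration, the following PyTorch sketch captures the ASF idea described above: parallel dilated convolutions whose outputs are fused by pixel-wise weight maps predicted from a gating branch. The layer sizes and the softmax normalization of the weight maps are assumptions of this illustration.

```python
import torch
import torch.nn as nn

class AdaptiveScaleFusion(nn.Module):
    """Sketch of an ASF block: parallel dilated convolutions fused by
    spatially-varying weights from a gating branch."""

    def __init__(self, channels: int, rates=(1, 2, 5, 7)):
        super().__init__()
        # One 3x3 convolution per dilation rate; padding equals the
        # dilation so every branch preserves the spatial resolution.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        # Gating branch: predicts one spatial weight map per dilation
        # branch from the concatenated multi-scale features.
        self.gate = nn.Conv2d(channels * len(rates), len(rates), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        # Softmax across branches yields a pixel-wise weighted average.
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))

# Example: AdaptiveScaleFusion(256)(torch.randn(1, 256, 64, 64)) yields
# a fused (1, 256, 64, 64) scale-insensitive feature map.
```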
The fused features from the ASF module are then fed into subsequent decoder layers. The decoder is configured to progressively recover higher-resolution feature maps through upsampling and skip connections from the encoder.
Finally, an output segmentation head processes the refined feature maps to predict text/non-text probabilities and quadrilateral bounding boxes that enclose text instances, such as alphanumeric characters on VIN plates. The output segmentation head is responsible for the final detection and localization of text within the input images.
Figure 3 illustrates a flowchart [300] of a method for intelligent character recognition, in accordance with an embodiment of the present invention.
The flowchart [300] initiates at step [302].
Following step [302], at step [304], the method includes receiving by the server system [200] the plurality of images of at least the alpha-numeric code. The alpha-numeric code is printed or engraved on at least one of vehicle identification number (VIN) plate, chassis, or engine block of the vehicle.
At step [306], the method includes processing by the server system [200] the plurality of images based at least on implementation of the deep-learning model to identify at least one or more characters in the alpha-numeric code. The deep-learning model utilizes bounding boxes to perform said identification.
To perform the processing step, the method includes creating by the server system [200] bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model. In addition, the
method includes computing by the server system [200] via the text detection model, horizontal and vertical co-ordinates of the bounding boxes. Further, the method includes calculating by the server system [200] the angle of each bounding box based at least on orientation of each bounding box. Furthermore, the method includes aligning by the server system [200] each bounding box along a same angle.
At step [308], the method includes determining by the server system [200] whether the identity of at least one or more characters in the plurality of images is identical or different.
At step [310], the method includes performing by the server system [200] quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images.
The flowchart [300] terminates at step [312].
Although the invention is described herein to be implemented by the server system [200], the present invention encompasses that some or all of the inventive features of the invention may be implemented in the device [104].
While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
I/We Claim
1. A method for intelligent character recognition, comprising:
receiving by a server system a plurality of images of at least an alpha-numeric code, wherein the alpha-numeric code is printed or engraved on at least one of vehicle identification number (VIN) plate, chassis, or engine block of a vehicle;
processing by the server system the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code, wherein said deep-learning model utilizes bounding boxes to perform said identification;
determining by the server system whether the identity of at least one or more characters in the plurality of images is identical or different; and
performing by the server system quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images.
2. The method as claimed in claim 1, wherein the processing comprises:
creating by the server system bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model;
computing by the server system via the text detection model, horizontal and vertical co-ordinates of the bounding boxes;
calculating by the server system the angle of each bounding box based at least on orientation of each bounding box; and
aligning by the server system each bounding box along a same angle.
3. The method as claimed in claim 2, comprising arranging at least one or more
characters based at least on spatial co-ordinates of the bounding boxes.
4. The method as claimed in claim 2, comprising:
sorting by the server system the bounding boxes based at least on their spatial arrangement; and
implementing by the server system the text detection model to sequentially read at least one or more characters within each bounding box.
5. The method as claimed in claim 1, comprising:
implementing by the server system fuzzy approximation logic to determine similar characters in two or more bounding boxes, wherein the fuzzy approximation logic is implemented based on a set of parameters.
6. The method as claimed in claim 5, wherein the set of parameters comprises at least one of neighbouring context, adjacent characters, position of characters, and semantic rules.
7. The method as claimed in claim 1, wherein the deep-learning model is a convolution neural network (CNN).
8. The method as claimed in claim 1, wherein the deep-learning model is trained based on a dataset comprising training images, wherein the training images are annotated on a character level.
9. The method as claimed in claim 1, wherein the deep-learning model implements a U-shaped encoder-decoder architecture.
10. A server system, comprising:
at least a processor;
at least a memory; and
a communication interface coupled to the processor and the memory, wherein the memory stores instructions, which when executed by the processor, cause the server system to:
receive a plurality of images of at least an alpha-numeric code, wherein the alpha-numeric code is printed or engraved on at least one of vehicle identification number (VIN) plate, chassis, or engine block of a vehicle;
process the plurality of images based at least on implementation of a deep-learning model to identify at least one or more characters in the alpha-numeric code, wherein said deep-learning model utilizes bounding boxes to perform said identification;
determine whether the identity of at least one or more characters in the plurality of images is identical or different; and
perform quality control (QC) check of the vehicle based on matching of at least one or more characters in the plurality of images.
11. The server system as claimed in claim 10, wherein to perform the process
step, the server system is caused to:
create bounding boxes around at least one or more characters of the alpha-numeric code based at least on implementation of a text detection model;
compute via the text detection model, horizontal and vertical co-ordinates of the bounding boxes;
calculate the angle of each bounding box based at least on orientation of each bounding box; and
align each bounding box along a same angle.
12. The server system as claimed in claim 11, wherein the server system is
caused to arrange at least one or more characters based at least on spatial co-ordinates of the bounding boxes.
13. The server system as claimed in claim 11, wherein the server system is
caused to:
sort the bounding boxes based at least on their spatial arrangement; and
implement the text detection model to sequentially read at least one or more characters within each bounding box.
14. The server system as claimed in claim 10, wherein the server system is
caused to:
implement fuzzy approximation logic to determine similar characters in two or more bounding boxes, wherein the fuzzy approximation logic is implemented based on a set of parameters.
15. The server system as claimed in claim 14, wherein the set of parameters comprises at least one of neighbouring context, adjacent characters, position of characters, and semantic rules.
16. The server system as claimed in claim 10, wherein the deep-learning model is a convolution neural network (CNN).
17. The server system as claimed in claim 10, wherein the deep-learning model is trained based on a dataset comprising training images, wherein the training images are annotated on a character level.
18. The server system as claimed in claim 10, wherein the deep-learning model implements a U-shaped encoder-decoder architecture.