
A System/Method For Object Identification By Using Deep Reinforcement Machine Learning

Abstract: Object recognition with reinforcement learning is a novel and crucial method for producing high-quality results, applicable to a wide variety of systems. Training sophisticated machine learning models on massive datasets becomes practical using the proposed method. In our invention, deep reinforcement learning is used to identify objects in images by first extracting visual attributes and then using a trained agent to identify the objects. The proposed model follows the tried-and-true procedure of "integrate, fixate, and evaluate," and it exhaustively searches for potential target sites only after reaching that conclusion. The proposed model's state is represented as a tuple made up of three components: the history of the observed region (Ht), the history of the evidence region (Et), and the history of the fixation (Ft). St = (Ht, Et, Ft) summarises the observations and actions taken thus far in the search sequence. The proposed model utilises a novel feature extraction method that extracts global features and obtains local features from the region of interest using a hybrid of traditional classifiers. 4 Claims & 2 Figures


Patent Information

Application #: 202341065918
Filing Date: 30 September 2023
Publication Number: 42/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal-500043

Inventors

1. Dr. Venkata Nagaraju Thatha
Department of Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
2. Mr. B. VeeraSekharReddy
Department of Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
3. Mr. G. Satyanarayana
Department of Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043
4. Mr. D. Sandeep
Department of Information Technology, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

Specification

Description: A SYSTEM/METHOD FOR OBJECT IDENTIFICATION BY USING DEEP REINFORCEMENT MACHINE LEARNING
Field of Invention
Humans can draw quick conclusions about items thanks to their ability to examine, recognise, and classify them; computers, in turn, make it possible to define such tasks computationally based on what we see in the real world. When there are numerous things in a single photograph, determining what each one is can be a difficult undertaking. This technique recognises an instance of an object, composes a window over the object for localisation, and then uses the instance score to characterise the class of the object through classification. A great deal of research attempts to solve the problem of object detection, but it is difficult to implement on resource-constrained devices because of the need to minimise the computing cost and training time of the model. Different algorithms exist in the literature to solve the control problem presented by the approach, which involves defining a geometrical box to enclose an item for localisation over a sequence of stages. Locating an object's precise position in a picture requires a focused search that moves from an initial estimate to the actual coordinates of the object.
Background of the Invention
Automatic object detection in photos and videos that is both quick and lightweight has many potential applications. The manufacturing sector has been consistently churning out computational and hardware solutions over the past five years, such as gadgets with exceptional processing and storage capacities. However, most object detection algorithms require either a powerful processor or a considerable amount of available memory, making it difficult for devices with limited resources to perform detection in real time without access to a centralised server. When executed on a laptop's CPU, FLOD-Net averaged 113 milliseconds per image, making it well-suited for portable devices. While reducing space, processing power, and memory requirements, such a CNN model can still quickly detect and recognise objects. Under these constraints, mobile devices can achieve high-framerate object identification without the need for a graphics processing unit (GPU). A simpler and shallower CNN model is also more energy efficient and produces less waste heat when executed on more capable hardware. As an example of practical application, FLOD-Net has been used to identify the foods on a plate; the goal is to identify what is on the plate so that it can be labelled. For the sake of quicker detection, some accuracy in the positioning of the projected bounding boxes is sacrificed, which does not affect the object recognition itself (CN110414432B). FLOD-Net trades detection precision for speed by using a shorter CNN model and making only a single pass through the picture. This allows it to compete with state-of-the-art networks in terms of evaluation time. FLOD-Net has only 10 layers, while most object detection networks have over 30. The design assumes a fixed-size convolution mask over the feature maps to carry out a faster detection, achieved through the use of sliding windows of varying aspect ratios. While this lack of positional specificity may be problematic for some tasks, FLOD-Net's lightweight design is ideal for those that do not require accurate placement in order to make their detections.
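To make the sliding-window idea concrete, here is a minimal Python sketch that enumerates candidate boxes at a few fixed scales and aspect ratios with a large step size. It is a generic illustration of the technique, not FLOD-Net's actual scheme; the scales, ratios, and step values are arbitrary choices.

```python
def sliding_windows(img_w, img_h, scales=(64, 128), ratios=(1.0, 2.0), step=32):
    """Enumerate candidate boxes (x1, y1, x2, y2) by sliding fixed-aspect-ratio
    windows over the image with a coarse step to limit the number of candidates."""
    for s in scales:
        for r in ratios:
            # A window of area s*s reshaped to aspect ratio r (width/height).
            w, h = int(s * r ** 0.5), int(s / r ** 0.5)
            for x in range(0, img_w - w + 1, step):
                for y in range(0, img_h - h + 1, step):
                    yield (x, y, x + w, y + h)

print(sum(1 for _ in sliding_windows(640, 480)), "candidate windows")
```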
The YOLO method is a novel approach to object recognition. Previously, classifiers were repurposed to perform object detection. The YOLO framework recasts object identification as a regression problem over spatially separated bounding boxes and associated class probabilities. Bounding boxes and class probabilities are predicted directly from entire images using a single neural network. Because the entire detection pipeline is a single network, it can be optimised end to end directly on detection performance. The integrated YOLO architecture is extremely fast. The standard YOLO model processes 45 frames per second in real time. Fast YOLO, a lightweight variant of the network, processes 155 frames per second while doubling the mAP of competing real-time detectors. While YOLO is not as accurate at localisation as state-of-the-art detection systems, it is much less likely to make erroneous detections where none exist. The YOLO architecture is made up of several convolutional layers. You Only Look Once (YOLO) models require only a single network for back-propagation-based, end-to-end training. It is a single model that incorporates the two phases of previous algorithms, object detection and localisation. The network is taught to learn common object representations. It accepts the entire image as input and predicts the bounding boxes of all classes simultaneously. After segmenting the image into an SxS grid, B bounding boxes are created per grid cell and assigned predicted confidence scores. If an object's centre falls inside a certain grid cell, that cell generates predictions for the object's B boxes and a confidence score for each prediction. Intersection over Union (IoU) measures the overlap between the predicted box and the ground-truth box, and it is used to derive the confidence score: confidence = Pr(Object) × IoU. YOLO's quickness is its greatest advantage over other methods. Its superior speed over competing approaches makes it a good fit for use in real-time scenarios. However, there is a cost in terms of precision: since YOLO prioritises speed over accuracy, it makes more localisation mistakes (JP7246382B2).
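As an illustration of the confidence computation above, the following Python sketch computes the IoU of two boxes and the product Pr(Object) × IoU. The box coordinates and objectness value are made-up examples, not values from the invention.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Confidence of a predicted box: Pr(Object) x IoU(pred, truth).
pred_box, true_box = (50, 50, 150, 150), (60, 60, 160, 160)
pr_object = 0.9  # objectness predicted by the network (illustrative value)
print(f"confidence = {pr_object * iou(pred_box, true_box):.3f}")
```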
In many techniques for recognising objects, the support vector machine serves as the classifier. In machine learning, data classification is a common job. The goal of a classification problem is to have the learner approximate a function that maps input vector data to a set of class labels. Input-output samples of the function are examined in a supervised setting. Training data refers to the input-output examples needed to learn the classification function. When it comes to solving the object detection problem, Support Vector Machines (SVMs) are a popular supervised learning technique. They are theoretically sound and have had remarkable empirical success in a number of contexts. Support Vector Machines are trained so that the decision function correctly categorises unseen example data. The ability to correctly categorise data for which no training examples exist is known as generalisation. The capacity of SVMs to generalise well is a key factor in their widespread use. Unless otherwise specified, an SVM treats the problem as a binary one (positive or negative).
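A minimal sketch of training a binary SVM classifier with scikit-learn; the synthetic data stands in for feature vectors extracted from image regions, and the kernel choice and data shapes are illustrative assumptions, not taken from the invention.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for region feature vectors.
X, y = make_classification(n_samples=500, n_features=64, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = svm.SVC(kernel="rbf")  # binary SVM: positive vs. negative regions
clf.fit(X_train, y_train)    # learn the decision function from training data
print("held-out accuracy:", clf.score(X_test, y_test))  # generalisation check
```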

Summary of the Invention
Object recognition with reinforcement learning is a novel and crucial method for producing high-quality results, applicable to a wide variety of systems. Training sophisticated machine learning models on massive datasets becomes practical using the proposed method. Computer vision applications such as surveillance and conventional navigation rely heavily on the concept of object identification. Current research challenges in computer vision include methods for identifying objects in dynamic scenes, where both humans and automobiles may be present. These strategies are in place to combat terrorism, crime, and public safety threats, and to provide efficient traffic management. The proposed effort entails the development of an effective object identification system for use in dynamic settings. Object recognition and behaviour analysis both benefit greatly from video motion detection. To tackle the problems of identification accuracy and time efficiency, it is necessary to extract picture features and train the model using a practical training technique. So that they may be easily deployed on resource-constrained devices, it is also necessary to store the data and models in minimal space. By reducing the average time it takes to evaluate an image, we provide a more reliable method of verification.
Brief Description of Drawings
Figure 1: Faster Region-based Convolutional Neural Networks (Faster-RCNN) Model.
Figure 2: Proposed Object Detection Deep Reinforcement Learning - network (ODDRLnet)
Detailed Description of the Invention
Fast R-CNN's typical architecture is depicted in the figure. The inputs to the method are an image and the corresponding regions of interest. As in R-CNN, an external method is used to create the RoIs. A convolutional neural network (CNN) with several convolutional and max-pooling layers is used to process the image. The output of these layers, a convolutional feature map, is fed into a region-of-interest pooling layer. By doing so, a feature vector of constant length is derived from the feature map for each RoI. Next, the feature vectors are fed into fully connected layers, which are then connected to two output layers: a Softmax layer, which generates probability estimates for the object classes, and a real-valued layer, which generates bounding-box coordinates computed using regression (i.e., refinements to the initial candidate boxes).
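The following PyTorch sketch shows the two-head layout just described: RoI pooling into fixed-length feature vectors, then a classification head (Softmax over class logits) and a bounding-box regression head. The layer widths, the 7x7 pool size, and the 21-class count (PASCAL VOC's 20 classes plus background) are illustrative assumptions, not parameters taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class FastRCNNHead(nn.Module):
    """RoI pooling -> fully connected layers -> (class scores, box offsets)."""
    def __init__(self, channels=256, pool=7, num_classes=21):
        super().__init__()
        self.pool = pool
        self.fc = nn.Sequential(
            nn.Linear(channels * pool * pool, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        self.cls_score = nn.Linear(1024, num_classes)      # Softmax head (logits)
        self.bbox_pred = nn.Linear(1024, num_classes * 4)  # per-class box refinements

    def forward(self, feature_map, rois):
        # rois: (K, 5) rows of (batch_index, x1, y1, x2, y2) in feature-map scale.
        x = roi_pool(feature_map, rois, output_size=(self.pool, self.pool))
        x = self.fc(x.flatten(1))  # one fixed-length vector per RoI
        return self.cls_score(x), self.bbox_pred(x)

feats = torch.randn(1, 256, 50, 50)                 # dummy convolutional feature map
rois = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0]])  # one candidate region
scores, deltas = FastRCNNHead()(feats, rois)
```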
The original publication reports that training time for Fast R-CNN can be reduced by a factor of nine compared to R-CNN. The RoI pooling layer and the fully-connected layers of the network can be trained using the back-propagation technique with stochastic gradient descent. A pre-trained network is typically utilised as a starting point before further adjustments are made. Training is performed on minibatches of N images, with regions of interest (RoIs) sampled from each image in the minibatch. If a sampled RoI's overlap (intersection over union) with a ground-truth box is high enough, it is labelled with that box's class; all other RoIs are treated as background. Sampling RoIs from the same images lets them share computation during classification, saving time and resources. Each training image is flipped horizontally with probability 0.5 to augment the data. The true class of the sampled RoI and the offset of the sampled bounding box from the true bounding box are jointly optimised using a multi-task loss function, which fine-tunes the Softmax classifier and the bounding-box regressors together. Classification with Fast R-CNN takes less than a second on a state-of-the-art GPU, a significant improvement over standard R-CNN, mostly as a result of applying the same feature map to all RoIs. As the detection time gets shorter, the region proposal generation technique matters more to the overall computing time, so the process of creating RoIs may become a computational bottleneck. The time required to compute the fully-connected layers can also dominate that of the convolutional layers when there are many RoIs. By employing truncated singular value decomposition to compress the fully-connected layers, classification times can be improved by around 30%, at the cost of a slight drop in accuracy.
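The truncated-SVD compression mentioned above can be sketched as follows: a single fully-connected layer with weight matrix W is replaced by two thinner layers whose product approximates W. A minimal NumPy illustration, where the matrix size and rank t are arbitrary choices:

```python
import numpy as np

def compress_fc(W, t):
    """Approximate an FC weight matrix W (out x in) with a rank-t truncated SVD.
    One layer y = W @ x becomes two smaller layers: y ~ W2 @ (W1 @ x)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(S[:t]) @ Vt[:t]  # first thin layer: t x in
    W2 = U[:, :t]                 # second thin layer: out x t
    return W1, W2

W = np.random.randn(4096, 4096)
W1, W2 = compress_fc(W, t=256)
x = np.random.randn(4096)
y_full, y_approx = W @ x, W2 @ (W1 @ x)
# Parameter count drops from out*in to t*(out + in).
print(W.size, "->", W1.size + W2.size)
```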
In object detection, recall is increased by generating enough regions that all genuine items appear in the final phase, i.e., by generating more region proposals. Since it is the responsibility of the object detector to recognise accurate regions based on the output of the region proposal generator, the generator is less concerned with precision. But the number of proposals made has an effect on the outcome. Dense set generation and sparse set generation are the two primary approaches to producing regions. The goal of the dense set technique is to construct, by brute force, a set of bounding boxes that covers every conceivable location of the object. To do this, a detection window is simply moved across the picture. However, a fast object detector is necessary because examining each point in the image is computationally expensive. In addition, windows of many scales and aspect ratios must be considered. To limit the number of candidates, most sliding-window approaches use a large step size and a small set of fixed aspect ratios. Most proposed regions in such a large collection contain nothing of interest, and these proposals must be discarded once the object detection phase is done. If a detection result falls below a specified confidence threshold, or its confidence value falls below a local maximum, the result is suppressed (non-maximum suppression). Sparse set methods, by contrast, rank the candidate regions in a class-agnostic manner and delete low-ranking regions before detection, resulting in a relatively small number of object detections. As with dense set approaches, post-detection thresholding and non-maximum suppression can be used to further improve detection quality. Algorithms for sparse sets can be grouped into two categories: unsupervised and supervised. Selective Search, which employs an iterative merging of superpixels, is the most often used unsupervised approach, and there are several variations on the same technique. Sliding-window objectness assessment is another option; one popular method, Edge Boxes, counts the number of edges within a bounding box and subtracts the number of edges that overlap the box boundary to arrive at an objectness score. A third class of strategies, based on seed segmentation, also exists. Supervised algorithms handle region proposal generation as a classification or regression problem; the support vector machine is one such machine learning algorithm. A convolutional network can also be used to generate the regions of interest: MultiBox is an instance of bounding box computation using a convolutional neural network. Faster R-CNN is one of the best object identification methods since it employs the same neural network for both region proposal generation and detection. These procedures are what we refer to as integrated approaches.
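Non-maximum suppression, as described above, can be sketched as a greedy loop that keeps the highest-scoring box and drops rivals that overlap it too strongly. This reuses the iou helper from the earlier sketch; the 0.5 threshold is a common but arbitrary choice.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over boxes given as (x1, y1, x2, y2)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)   # highest-scoring remaining box
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(50, 50, 150, 150), (55, 55, 155, 155), (300, 300, 400, 400)]
print(nms(boxes, scores=[0.9, 0.8, 0.7]))  # -> [0, 2]
```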
With the goal of reducing the time needed to solve the problem, we present a dynamic model called object detection deep reinforcement learning network (ODDRL-Net). The proposed mask may rotate in any direction, making feature-based object detection quick and easy. Agent proposals are used to position the mask at any starting point, and the location of the item is then determined based on the rewards the agent receives. To simplify the formulation of localisation and classification, we apply a sequence of transformations to the mask for identifying objects that have image features. Masks are free to travel in any direction and scale themselves up or down depending on the detection requirements, allowing for more efficient object location. Since the order in which transformations occur is crucial, the agent evaluates the positive region to determine what to do next. We emphasise that the agent must execute the transformations so as to keep most of the item in the visible zone and as little of the background as possible. Many researchers use the sliding-window approach, which scans for objects with a fixed-size window moving in one direction. Our proposed object detection, on the other hand, has no fixed path, and the mask can choose any action to move in any direction. Candidate regions are proposed for an item based on a limited amount of information; nevertheless, unlike other region proposal algorithms, we outline a systematic reasoning process for our recommendations. We also present a dynamic action algorithm that, in contrast to regression, takes into account the context of the current region and adjusts the bounding box accordingly, so that future actions can be more precisely directed at the intended target.
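A minimal sketch of the mask-transformation loop described above: the agent applies one transformation per step and earns a positive reward when the transformation improves overlap with the ground truth. The concrete action set, the 10% step size, and the +/-1 reward are illustrative assumptions; the patent does not fix these values here. The iou helper from the earlier sketch is reused.

```python
# Hypothetical action set for the mask: translations, scalings, and a stop trigger.
ACTIONS = ("left", "right", "up", "down", "bigger", "smaller", "stop")

def step(mask, action, delta=0.1):
    """Apply one transformation to a mask given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = mask
    dx, dy = (x2 - x1) * delta, (y2 - y1) * delta
    moves = {
        "left":    (x1 - dx, y1, x2 - dx, y2),
        "right":   (x1 + dx, y1, x2 + dx, y2),
        "up":      (x1, y1 - dy, x2, y2 - dy),
        "down":    (x1, y1 + dy, x2, y2 + dy),
        "bigger":  (x1 - dx, y1 - dy, x2 + dx, y2 + dy),
        "smaller": (x1 + dx, y1 + dy, x2 - dx, y2 - dy),
    }
    return moves.get(action, mask)  # "stop" leaves the mask unchanged

def reward(prev_mask, mask, truth):
    """Reward the agent when a transformation increases IoU with the ground truth."""
    return 1.0 if iou(mask, truth) > iou(prev_mask, truth) else -1.0
```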
An enhanced and effective model, the object detection deep reinforcement learning network (ODDRL-Net), performs rapid object identification (see figure). The model employs the same three convolution layers (convol1, convol2, and convol3) as feature-based object recognition models such as VGG-M net. Sangdoo Yun et al. initialised their architecture from a pretrained VGG-M net when first developing their model. To compute the output, we proceed to the next fully connected layer, FC4, which combines dropout and ReLU with 512 nodes. This layer's output is combined with the action dynamics (the history of recent actions). The method presented in the preceding section is implemented in the final FC5 layer, which produces a Softmax score over action probabilities and decides what action must be taken if the agent repeats an action and returns to a prior state in which oscillation occurred. While moving from one state to another, the agent updates its state and chooses its next action in a predetermined order. The steps taken by the suggested architecture are laid out in table 1, and the proposed model requires less processing power than existing architectures. To achieve dynamic model implementation, the proposed architecture adapts various parameters and the nature of the system while using the specialised idea and region proposal algorithm described in the literature. The suggested model was trained and tested using a specialised version of the PASCAL VOC dataset. The PASCAL Visual Object Classes (VOC) challenge dataset is commonly used as a standard benchmark for object detection, and it served as the primary training set. From 2005 to 2012, there was an annual PASCAL VOC competition. Twenty object classes (including humans, six types of animals, six categories of household items, and seven types of transportation, including cars), annotated on 9,963 photos, were included in the dataset when it was first released in 2007. The last time the test data's ground truth was made available was in the 2007 challenge. By the end of the competition's run, there were 54,900 photos in the collection, with around half of those serving as test and validation data. Both the number and variety of tasks varied from year to year, but for the sake of this work we focused solely on data from the annual object detection competition.
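A PyTorch sketch of the layout just described: three convolution layers, an FC4 layer with 512 units plus ReLU and dropout, concatenation with an action-history vector, and an FC5 layer producing Softmax action probabilities. The channel widths, kernel sizes, 107x107 input resolution, action count, and history length are assumptions modelled loosely on VGG-M-style networks, not the patent's exact parameters.

```python
import torch
import torch.nn as nn

class ODDRLNet(nn.Module):
    """Sketch: convol1-3 feature extractor, FC4 (512 + ReLU + dropout),
    FC5 producing a Softmax distribution over the agent's actions."""
    def __init__(self, num_actions=7, history_len=10):
        super().__init__()
        self.convol1 = nn.Sequential(nn.Conv2d(3, 96, 7, stride=2), nn.ReLU(),
                                     nn.MaxPool2d(3, stride=2))
        self.convol2 = nn.Sequential(nn.Conv2d(96, 256, 5, stride=2), nn.ReLU(),
                                     nn.MaxPool2d(3, stride=2))
        self.convol3 = nn.Sequential(nn.Conv2d(256, 512, 3), nn.ReLU())
        self.fc4 = nn.Sequential(nn.Linear(512 * 3 * 3, 512), nn.ReLU(),
                                 nn.Dropout(0.5))
        # FC5 consumes the features concatenated with a one-hot action history.
        self.fc5 = nn.Linear(512 + num_actions * history_len, num_actions)

    def forward(self, image, action_history):
        x = self.convol3(self.convol2(self.convol1(image)))
        x = self.fc4(x.flatten(1))
        x = torch.cat([x, action_history], dim=1)  # inject action dynamics
        return torch.softmax(self.fc5(x), dim=1)   # action probabilities

probs = ODDRLNet()(torch.randn(1, 3, 107, 107), torch.zeros(1, 70))
```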
4 Claims & 2 Figures. Claims: The scope of the invention is defined by the following claims:

1. A System/Method for Object identification by using deep Reinforcement Machine learning comprising the steps of:
a) designing a method to extract features of the image for identifying the object;
b) designing a method to find the object location and training the reinforcement learning detector;
c) designing a method for object identification and calculation of the time required for detection.
2. The System/Method for Object identification by using deep Reinforcement Machine learning as claimed in claim 1, wherein a Faster R-CNN is designed to extract the features of the image.
3. The System/Method for Object identification by using deep Reinforcement Machine learning as claimed in claim 1, wherein a novel feature extraction method is used to extract global features and obtain local features from the region of interest using a hybrid of traditional classifiers.
4. The System/Method for Object identification by using deep Reinforcement Machine learning as claimed in claim 1, wherein a dynamic model named object detection deep reinforcement learning network (ODDRL-Net) is designed.

Documents

Application Documents

# Name Date
1 202341065918-REQUEST FOR EARLY PUBLICATION(FORM-9) [30-09-2023(online)].pdf 2023-09-30
2 202341065918-FORM-9 [30-09-2023(online)].pdf 2023-09-30
3 202341065918-FORM FOR STARTUP [30-09-2023(online)].pdf 2023-09-30
4 202341065918-FORM FOR SMALL ENTITY(FORM-28) [30-09-2023(online)].pdf 2023-09-30
5 202341065918-FORM 1 [30-09-2023(online)].pdf 2023-09-30
6 202341065918-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [30-09-2023(online)].pdf 2023-09-30
7 202341065918-EVIDENCE FOR REGISTRATION UNDER SSI [30-09-2023(online)].pdf 2023-09-30
8 202341065918-EDUCATIONAL INSTITUTION(S) [30-09-2023(online)].pdf 2023-09-30
9 202341065918-DRAWINGS [30-09-2023(online)].pdf 2023-09-30
10 202341065918-COMPLETE SPECIFICATION [30-09-2023(online)].pdf 2023-09-30