
Automated Warehouse Inventory System Based on Unmanned Aircraft and Artificial Intelligence Technology

Abstract: The present invention provides an automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology, comprising: an input module (1) for receiving a set of input images related to inventory from a camera placed in an unmanned aircraft (3); a digital processing module (2) connected with the input module (1) for performing a set of operations on the input images to detect one or more inventory products via one or more artificial intelligence techniques; and an output module (4) linked with the digital processing module (2) for displaying the detected one or more inventory products to an operator of the system (100).


Patent Information

Application #
202311008565
Filing Date
09 February 2023
Publication Number
07/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
bd@ipquad.com
Parent Application

Applicants

PRAGYA
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
ARADHNA SAINI
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
KIRTI
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
RIFA NIZAM KHAN
JAMIA MILLIA ISLAMIA UNIVERSITY, DELHI- 110025
HIMANI SHARMA
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
GARIMA DHAWAN
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
ANUSHKA SHIVHARE
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
NIDHI GUPTA
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001

Inventors

1. PRAGYA
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
2. ARADHNA SAINI
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
3. KIRTI
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001
4. RIFA NIZAM KHAN
JAMIA MILLIA ISLAMIA UNIVERSITY, DELHI- 110025
5. HIMANI SHARMA
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
6. GARIMA DHAWAN
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
7. ANUSHKA SHIVHARE
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY, GREATER NOIDA, 332001
8. NIDHI GUPTA
GL BAJAJ INSTITUTE OF TECHNOLOGY AND MANAGEMENT, GREATER NOIDA, 332001

Specification

FIELD OF THE INVENTION
[0001] The present invention relates to an inventory management system. More particularly, the present invention is directed towards an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology which performs pre-processing steps and image augmentation methods to increase the performance of the system.
BACKGROUND OF THE INVENTION
[0002] Logistics plays a crucial role in today’s global economy - every company depends on reliable and intact logistics processes to create value. At the same time, logistics is subject to very high margin pressure. On the one hand, improved service is expected from logistics service providers, but on the other hand, customers do not want to pay extra for it. If the margin is to be maintained or even increased, this can only be achieved through savings in internal processes.
[0003] For many companies, digital transformation and the use of so-called smart technologies can be the solution for optimizing processes and procedures, since many logistics processes still involve a lot of manual effort, including inventory, which is essential for every company. In practice and science, this is often referred to as Industry 4.0. Four main design principles apply to use cases in this area: networking, information transparency, technical assistance, and decentralized decisions. The approach pursued in this work can also be classified under these principles. Through the combined use of camera drones and methods for processing the data, such as AI, human abilities such as the processing of visual signals can be imitated. The result is that previously manual activities, such as the inventory process in a warehouse, can be automated.
[0004] The goal of the inventory is to record the type and number of internal goods and to quantify this inventory with an exact value. The actual process of inventory can differ not only in company-specific factors but also in the fact that two types of inventories are common. Thus, a distinction can be made between physical inventory, i.e., counting physical goods, and the non-physical inventory, where, for example, financial goods or bank balances are recorded.
[0005] With the help of an inventory, deficits, faulty processes, non-optimal flows, or even theft can be detected. During an inventory, there is always a large amount of personnel effort involved. Depending on the inventory’s size and scope, several people are often exclusively occupied with the manual counting of the goods. Inventory not only causes personnel costs, but also disrupts operational processes. Therefore, companies find themselves in a dichotomy between transparency and costs, which is why they often work with samples whose findings can then be extrapolated to the entire inventory. This business conflict can be resolved by automating the manual localization and identification of products and goods during inventory, resulting in greater transparency at lower costs.
[0006] Existing approaches are based on the use of sensor technology and often only consider a very specific sub-area (e.g., individual industries). In the present invention, a generic approach is followed, which relies exclusively on the visual representation of the products and is based on deep learning methods. To also automate the acquisition of the images, and thus to evaluate a vehicle for the operationalization of the approach, a drone is used, which has already proven itself in comparable applications.
[0007] Often, RFID readers are attached to a drone, and the drone flies over storage locations and tracks the individual goods. Drones can detect storage locations that would otherwise be difficult to reach (e.g., very high storage locations in a high rack) and minimize the risk for the employees. In summary, the identification of loads using RFID, especially in combination with a drone, has proven to save costs and automate inventory processes. However, this always presupposes that the loads are equipped with RFID tags, representing a further cost factor in logistics.
[0008] There is a broad consensus that image processing in logistics can be a great advantage for the traceability and monitoring of goods. If camera drones are used in inventory management, they are often used as a medium for reading optical product annotations, such as one-dimensional barcodes or QR codes.
[0009] However, these approaches assume that those annotations exist. Considering packed products, barcodes are often available and do not pose a problem. But with unpackaged goods or empties, optical annotations are rarely present.
[0010] There is a need for solutions that focus on the optical representation of goods. Although both AI-based image processing and the use of drones are considered to have great potential in logistics, there are hardly any publications available that combine these two approaches. Frei Stetter and Hummel (2019) outlined an approach to drone-based inventory in libraries. They flew drones along bookshelves and identified book spines using computer vision techniques. As soon as a book is in the centre of the image, the title of the book is read. This is a particular indoor use case. Despite the fundamental similarity, a use case from a library cannot necessarily be transferred to larger industrial warehouses.
[0011] Especially concerning disruptive factors (e.g., environmental influences, such as changing weather, which can lead to different lighting), recordings in libraries are less affected. Dörr et al. show an approach that deals with a similar use case in the warehouse area. The goal of the approach is product structure recognition based on image data. Different convolutional neural networks are used, stacked on top of each other. For the training of the models, a separate dataset was built. These very recent publications show that the combination of drone-based inventory and AI image processing is currently a subject of research.
[0012] Due to the above-mentioned drawbacks, there is a need for an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology which uses advanced techniques to increase the performance of the system.
OBJECTS OF THE INVENTION
[0013] The primary object of the present invention is to overcome the shortcomings of the prior art.
[0014] The present invention aims to provide an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology.
[0015] The present invention aims to provide an automated warehouse inventory system which performs pre-processing steps and image augmentation methods to increase the performance of the system.
[0016] The present invention aims to provide an automated warehouse inventory system to identify the location of pallets on recorded images.
[0017] The present invention aims to provide an automated warehouse inventory system that uses horizontal flipping and image scaling as image augmentation techniques to enhance the performance of the system.
SUMMARY OF THE INVENTION
[0018] The present invention relates to an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology which performs pre-processing steps and image augmentation methods to increase the performance of the system in identifying the location of pallets on recorded images.
[0019] According to an embodiment, the present invention relates to an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology, comprising: an input module for receiving a set of input images related to inventory from a camera placed in an unmanned aircraft; a digital processing module connected with the input module for performing a set of operations on the input images to detect one or more inventory products via one or more artificial intelligence techniques; and an output module linked with the digital processing module for displaying the detected one or more inventory products to an operator of the system.
[0020] According to another embodiment, the present invention relates to an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology, in which the digital processing module is configured to: apply an annotation technique on the set of input images, whereby the annotated images are exported and converted into a dataset format; divide the annotated images into a test data set and a training data set; train the system using the training data set to produce initial results, which are adjusted to improve the performance of the system; merge a set of classes of the results to increase the quantity and quality of the detection of one or more inventory products; reduce an area from the initial results upon merging the set of classes; and perform an image augmentation on the results upon reducing the area to detect the one or more inventory products from the set of input images precisely.
[0021] According to another embodiment, the present invention includes an additional feature in which the output produced by the system is interfaced into a warehouse management system (WMS) via RF prompts, staging tables, XML, sockets (TCP/IP), file transfer protocol (FTP), or flat files.
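By way of illustration, the flat-file and FTP variants of such an interface might look as sketched below. This is a minimal sketch: the record layout, host, and credentials are assumptions for illustration, not a fixed WMS schema.

```python
import csv
from ftplib import FTP

def export_detections_to_wms(detections, path="inventory_counts.csv",
                             host=None, user=None, password=None):
    """Write detected pallets to a flat file and optionally push it via FTP.

    `detections` is assumed to be a list of dicts such as
    {"location": "A-03-2", "label": "pallet", "count": 4} -- a
    hypothetical record layout, not a fixed WMS schema.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["location", "label", "count"])
        writer.writeheader()
        writer.writerows(detections)

    if host:  # transfer the flat file to the WMS staging area over FTP
        with FTP(host) as ftp:
            ftp.login(user=user, passwd=password)
            with open(path, "rb") as f:
                ftp.storbinary(f"STOR {path}", f)
```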
[0022] According to another embodiment, the present invention includes another feature in which the camera placed in the unmanned aircraft records a video stream which is stored or broadcast on a server and is further used for auditing the products as well as training the system.
[0023] While the invention has been discussed and shown with particular emphasis on the preferred form, it should be obvious that other variants are feasible and would come within the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWING
[0024] These and other features, aspects, and advantages of the present invention will become better understood through the following description, appended claims, and accompanying drawings where:
[0025] Figure 1 is a block diagram of the automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology in accordance with an embodiment of the present invention.
[0026] Figure 2 is a pictorial view of an exemplary sample from the final dataset in accordance with an embodiment of the present invention.
[0027] Figure 3 is a pictorial view of reduction of the image contents in accordance with an embodiment of the present invention.
[0028] Figure 4 is a pictorial view of the effect of augmentations in accordance with an embodiment of the present invention.
[0029] Figure 5 is a pictorial view of the quality of the predictions in accordance with an embodiment of the present invention.
[0030] Figure 6 is a pictorial view of the visualization of predictions with errors in accordance with an embodiment of the present invention.
DESCRIPTION OF THE INVENTION
[0031] The following description includes the preferred best mode of one embodiment of the present invention. It will be clear from this description that the invention is not limited to the illustrated embodiments but also includes a variety of modifications and embodiments thereto. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood that there is no intention to limit the invention to the specific form disclosed; on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention as defined in the claims.
[0032] In any embodiment described herein, the open-ended terms "comprising," "comprises," and the like (which are synonymous with "including," "having," and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of," "consists essentially of," and the like, or the respective closed phrases "consisting of," "consists of," and the like.
[0033] As used herein, the singular forms “a,” “an,” and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.
[0034] The present invention is directed towards an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology which performs pre-processing steps and image augmentation methods to increase the performance of the system.
[0035] According to an embodiment, the present invention relates to an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology, comprising: an input module for receiving a set of input images related to inventory from a camera placed in an unmanned aircraft; a digital processing module connected with the input module for performing a set of operations on the input images to detect one or more inventory products via one or more artificial intelligence techniques; and an output module linked with the digital processing module for displaying the detected one or more inventory products to an operator of the system.
[0036] Referring to Figure 1, a block diagram of the automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology is depicted. The system (100) comprises: an input module (1) for receiving a set of input images related to inventory from a camera placed in an unmanned aircraft (3); a digital processing module (2) connected with the input module (1) for performing a set of operations on the input images to detect one or more inventory products via one or more artificial intelligence techniques; and an output module (4) linked with the digital processing module (2) for displaying the detected one or more inventory products to an operator of the system (100).
[0037] The digital processing module (2) is configured to: apply an annotation technique on the set of input images, whereby the annotated images are exported and converted into a dataset format; divide the annotated images into a test data set and a training data set; train the system (100) using the training data set to produce initial results, which are adjusted to improve the performance of the system (100); merge a set of classes of the results to increase the quantity and quality of the detection of one or more inventory products; reduce an area from the initial results upon merging the set of classes; and perform an image augmentation on the results upon reducing the area to detect the one or more inventory products from the set of input images precisely.
[0038] The use case considered here focuses on the automated recognition of beverage pallets in images. Since there is no public dataset for this specific problem, a custom dataset is used. An exemplary sample from the final dataset is depicted in Figure 2.
[0039] A drone of the model DJI Phantom 4 Pro v2 was used to make the video recordings in the warehouses of the practice partner. The drone was selected because it was already available for the research project, so there was no need to purchase a new drone. In addition, the characteristics of this drone are very common, which is why the findings are equally transferable to other drone models. The video recordings were made in Full-HD resolution, at a frame rate of 30 frames per second, and using the built-in image stabilization.
[0040] To create the greatest possible variance in the data, the recordings were made over four days and at two locations of the beverage dealer. The aim was to capture the weather's influence on the images (e.g., brightness) by recording the images under different conditions. During the project, two days with sunny weather and one day each with cloudy and very cloudy weather were used for the recording. Special attention was also paid to the recorded scenes. An attempt was made to capture all pallet locations of the outdoor warehouse. It could not be avoided that not all types of pallets (e.g., different manufacturers of beverages) are represented equally often in the data, because the stock of the different pallets varies greatly in reality.
[0041] After finishing the video recordings, the video data had to be processed. For this purpose, the filmed sequences were viewed manually, and irrelevant scenes were removed (e.g., the drone's starting and landing sequences). Since there are only marginal differences between subsequent frames at 30 frames per second, only every 30th frame of the video clips was transferred to the final dataset when the training data set was created, to avoid potential overfitting when training the neural network.
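A minimal sketch of this frame-sampling step using OpenCV is given below; the file names and directory layout are illustrative assumptions.

```python
import cv2  # OpenCV

def extract_frames(video_path, out_dir, step=30):
    """Keep every `step`-th frame (every 30th at 30 fps = one frame per second)."""
    cap = cv2.VideoCapture(video_path)
    index, kept = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video
            break
        if index % step == 0:
            # out_dir is assumed to exist already
            cv2.imwrite(f"{out_dir}/frame_{kept:05d}.jpg", frame)
            kept += 1
        index += 1
    cap.release()
    return kept
```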
[0042] After the automated extraction of the frames, they were manually reviewed, and faulty or low-quality images were removed. The final dataset consists of 336 images, separated into a train and a test set. The test set only contains images of pallet stacks that are not included in the train set, to avoid memorization; this includes pallets with beverages of previously unseen brands and breweries.
[0043] After image acquisition, annotation was applied using the tool Label Studio from Heartex. This tool was used because it offers many possibilities for annotating images and was already used in another context within the research project. In this process, each pallet that was largely visible was annotated using polygons instead of only bounding boxes, to make use of the annotation masks later on. Additionally, each polygon was given a class to differentiate between two types of pallets: pallets containing cases of beer and pallets containing other beverages. The resulting annotations were exported and converted to the COCO dataset format, as it is one of the standard formats for object detection and segmentation in images and is widely supported by most frameworks. This results in a training set containing 284 images with 5261 annotated polygons and a test set containing 52 images with 1471 polygons.
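For illustration, a conversion into COCO format might look like the sketch below. The input structure (a list of per-image polygon records) is a hypothetical intermediate representation, not Label Studio's export schema, and the annotation area is approximated by the bounding-box area.

```python
import json

def polygons_to_coco(images, out_path):
    """Convert per-image polygon annotations into a minimal COCO file.

    `images` is assumed to be a list of dicts like
    {"file_name": ..., "width": ..., "height": ...,
     "polygons": [{"points": [(x, y), ...], "category_id": 1}, ...]}.
    """
    coco = {"images": [], "annotations": [],
            "categories": [{"id": 1, "name": "beer"},
                           {"id": 2, "name": "other_beverages"}]}
    ann_id = 1
    for img_id, img in enumerate(images, start=1):
        coco["images"].append({"id": img_id, "file_name": img["file_name"],
                               "width": img["width"], "height": img["height"]})
        for poly in img["polygons"]:
            xs = [p[0] for p in poly["points"]]
            ys = [p[1] for p in poly["points"]]
            x, y = min(xs), min(ys)
            w, h = max(xs) - x, max(ys) - y
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id,
                "category_id": poly["category_id"],
                # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list
                "segmentation": [[c for p in poly["points"] for c in p]],
                "bbox": [x, y, w, h],
                "area": w * h,  # approximated by the bbox area here
                "iscrowd": 0})
            ann_id += 1
    with open(out_path, "w") as f:
        json.dump(coco, f)
```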
[0044] The experiment was conducted as follows. First, a baseline model was used to test the impact of various modifications of the input data on the prediction accuracy. Then, several models using different architectures, selected based on defined criteria, were trained to evaluate their performance when applying the modifications identified with the baseline model. Finally, the best-performing model was used for a qualitative evaluation and to identify possible errors.
[0045] The model used during the following experiments is a Mask R-CNN with a ResNet50 backbone, implemented using Detectron2. The model was pre-trained on the MSCOCO17 dataset to compensate for the low number of images in the dataset described above. It was then trained for up to 4000 iterations, where each iteration used a batch of twelve images. Evaluations of the model were performed every 250 iterations during training, and once the training was completed. During training and testing, the images were resized to 1000x750 and not further modified. To validate the results, each experiment was repeated multiple times; the following metrics are averages over all runs. The same train-test split was used for all experiments.
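A minimal Detectron2 training sketch consistent with these settings is shown below; the dataset names and file paths are placeholder assumptions, and the evaluator subclass is one common way to enable periodic COCO-style evaluation.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

# Register the COCO-format pallet dataset (file paths are placeholders).
register_coco_instances("pallets_train", {}, "annotations/train.json", "images/train")
register_coco_instances("pallets_test", {}, "annotations/test.json", "images/test")

class PalletTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        # COCO-style AP evaluation, run every cfg.TEST.EVAL_PERIOD iterations
        return COCOEvaluator(dataset_name, output_dir=cfg.OUTPUT_DIR)

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))  # ResNet50 backbone
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")   # MS-COCO pre-training
cfg.DATASETS.TRAIN = ("pallets_train",)
cfg.DATASETS.TEST = ("pallets_test",)
cfg.SOLVER.IMS_PER_BATCH = 12        # batch of twelve images per iteration
cfg.SOLVER.MAX_ITER = 4000           # up to 4000 iterations
cfg.TEST.EVAL_PERIOD = 250           # evaluate every 250 iterations
cfg.INPUT.MIN_SIZE_TRAIN = (750,)    # resize inputs towards 1000x750
cfg.INPUT.MIN_SIZE_TEST = 750
cfg.INPUT.MAX_SIZE_TRAIN = 1000
cfg.INPUT.MAX_SIZE_TEST = 1000
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2  # beer vs. other beverages (before merging)

trainer = PalletTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```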
[0046] The initial baseline model achieved an average precision of 27.06 and 27.59 when evaluating bounding boxes and segmentations, respectively, on the test portion of the dataset. As these results are far from sufficient to accurately predict the pallets' positions on images, significant adjustments were necessary to improve the performance of the model.
[0047] During the first tests, it became clear that the model had difficulties in classifying the pallets given the annotated classes (beer and other beverages). Not only was the per-class precision much higher for the pallets marked as containing cases of beer (33.647 compared to 20.463, evaluating bounding boxes), but some pallets were also frequently misclassified. While the origin of this problem likely lies within the training data, which contained more annotations and variants of pallets of beer cases, it is challenging to balance the classes to reduce this spread, since nearly all images contain pallets of both categories.
[0048] To circumvent this problem, the classes were merged to create a model that predicts only the bounding box and segmentation mask of a pallet without further classifying its contents. With this approach, the classification task must be handled by a different system, possibly using a classifier or brand detector; the creation of such a solution has not been pursued further in this work. A model trained on the classless dataset achieves an average precision of 45.95 and 46.70 on bounding boxes and masks, as noted in Table 1. A subsequent manual inspection of the predictions also confirmed the increase in the quantity and quality of the predictions.
Table 1: mAP (bounding boxes/segmentation) for each image variant and augmentation method

Augmentation                Original       Class Merging   Left-Right Cut   Merging + Cut
No augmentation             27.06/27.59    45.95/46.70     47.68/48.93      70.39/72.19
(a) FlipLR                  30.03/31.13    46.58/47.48     53.58/55.44      73.94/76.01
(b) ScaleXY125              45.01/46.27    67.57/69.28     54.62/56.45      78.97/81.06
    ScaleXY150              49.83/51.00    73.35/75.54     56.01/57.49      78.86/80.93
(c) Rotate10                25.48/26.23    41.89/42.81     52.87/53.46      77.07/77.63
    Rotate20                25.31/26.06    42.96/43.95     50.11/51.35      76.08/76.71
    Rotate30                24.95/25.83    43.56/44.99     47.49/48.43      75.57/76.04
(d) Contrast                26.35/26.89    40.22/40.71     48.97/50.05      68.69/70.15
(e) JpegCompression         26.12/27.06    42.21/43.29     49.35/50.65      67.83/69.35
(f) MotionBlur              25.27/26.30    43.61/44.76     44.89/46.30      71.19/73.01
(g) CropPad25               47.22/48.51    70.81/72.89     55.63/57.39      78.68/80.58
    CropPad50               46.80/47.91    68.97/70.92     52.47/54.07      75.50/77.28
(h) PerspectiveTransform    31.54/32.10    53.13/53.99     54.10/55.42      78.11/79.30
(i) FlipLR-ScaleXY150       51.33/52.71    73.84/75.78     57.09/58.85      81.53/83.41
(j) CropPad25-ScaleXY150    47.08/48.30    71.25/73.24     57.99/59.28      78.13/79.67
(k) Rotate20-ScaleXY150     46.71/47.76    70.92/72.57     56.12/56.50      79.51/79.93
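In data terms, the class merging applied in paragraph [0048] reduces to rewriting the COCO category ids; a minimal sketch, assuming the COCO annotation files produced earlier (the model's NUM_CLASSES would then be reduced to 1):

```python
import json

def merge_classes(coco_path, out_path):
    """Collapse 'beer' and 'other beverages' into a single 'pallet' class."""
    with open(coco_path) as f:
        coco = json.load(f)
    coco["categories"] = [{"id": 1, "name": "pallet"}]
    for ann in coco["annotations"]:
        ann["category_id"] = 1  # every annotation now belongs to one class
    with open(out_path, "w") as f:
        json.dump(coco, f)

merge_classes("annotations/train.json", "annotations/train_merged.json")
```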
[0049] Another problem of the model was the detection of small pallets at the edges of the images. It is likely caused by the low number of small objects in the training data and the distortion of the camera lens at the edges of the image. This problem can be addressed by cutting off parts of the left and right edges of the recordings. This should not harm the quality of the inventory process, as the drone flight is planned so that each row is at least once near the centre of the recorded images and is therefore not lost in this process. An example of such a reduction of the image contents is visible in Figure 3, where 50% of the image has been removed, in equal parts on each side. During the training and evaluation process, the input images were resized to 750x750 instead of 1000x750, to better match the aspect ratio of the modified images. The resulting model achieves values far better than before the removal; this is expected, as the problem is simplified significantly. The model achieves a mean Average Precision (mAP) of 47.68 and 48.93 evaluating predicted bounding boxes and segmentations, respectively.
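A minimal sketch of the edge cut, assuming OpenCV images and an illustrative file name; note that existing annotations would have to be shifted left by the same margin:

```python
import cv2

def cut_edges(image, fraction=0.5):
    """Drop `fraction` of the image width, in equal parts from left and right.

    With fraction=0.5, 25% is removed from each side, as in Figure 3.
    """
    h, w = image.shape[:2]
    margin = int(w * fraction / 2)
    return image[:, margin:w - margin]

# Hypothetical file names, for illustration only:
image = cv2.imread("frame_00042.jpg")
cv2.imwrite("frame_00042_cut.jpg", cut_edges(image))
```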
[0050] Further, different image augmentation methods were tested and evaluated based on the performance of models trained on augmented images. The augmentation of the original images was applied using imgaug during the loading of each iteration's image batch. Each image of the batch was augmented individually, with random parameters within given boundaries. As imgaug offers a wide variety of methods to augment images, a subset was selected and evaluated on the given problem. Augmentations were chosen to simulate variations of real recordings, such as MotionBlur, PerspectiveTransform, Contrast, and JpegCompression, to account for the movement of the drone and varying lighting and weather conditions, while also considering methods traditionally used to adapt the image: Rotate, ScaleXY, FlipLR, and CropAndPad. The following methods were chosen for comparison; a corresponding imgaug sketch follows the list.
(a) FlipLR - Performs a horizontal flip with a probability of 0.5.
(b) ScaleXY{150,125} - Scales the width and the height of the image using random values from [0.5, 1.5] for ScaleXY150 and [0.75, 1.25] for ScaleXY125.
(c) Rotate{10,20,30} - Rotates the image around its center by random angles of up to 10 degrees for Rotate10, 20 for Rotate20, and 30 for Rotate30. Rotations can be applied in both directions.
(d) Contrast - Increases or decreases the contrast of the image using random values from [0.5, 1.4].
(e) JpegCompression - Reduces the quality of the image by applying JPEG compression with a random degree from [0.7, 0.95].
(f) MotionBlur - Creates a motion blur effect with a kernel of size 7x7.
(g) CropPad{25,50} - Removes a random percentage of the image from all edges and pads the image back to its original size, using random values from [-0.25, 0.25] for CropPad25 and [-0.5, 0.5] for CropPad50.
(h) PerspectiveTransform - Transforms the image as if the camera had a different perspective, using random scales from [0.01, 0.1].
(i) FlipLR-ScaleXY150 - Combines FlipLR and ScaleXY150.
(j) CropPad25-ScaleXY150 - Combines CropAndPad25 and ScaleXY150.
(k) Rotate20-ScaleXY150 - Combines Rotate20 and ScaleXY150.
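The imgaug sketch referenced above is given here. It is a minimal illustration of the listed augmenters; imgaug expects JPEG compression strength on a 0-100 scale, so the [0.7, 0.95] range in (e) is taken to mean 70-95, and only one representative per parameterized family is shown.

```python
import imgaug.augmenters as iaa

# One augmenter per method from the list above; ranges follow the text.
augmenters = {
    "FlipLR": iaa.Fliplr(0.5),
    "ScaleXY125": iaa.Affine(scale={"x": (0.75, 1.25), "y": (0.75, 1.25)}),
    "ScaleXY150": iaa.Affine(scale={"x": (0.5, 1.5), "y": (0.5, 1.5)}),
    "Rotate20": iaa.Rotate((-20, 20)),
    "Contrast": iaa.LinearContrast((0.5, 1.4)),
    "JpegCompression": iaa.JpegCompression(compression=(70, 95)),
    "MotionBlur": iaa.MotionBlur(k=7),
    "CropPad25": iaa.CropAndPad(percent=(-0.25, 0.25)),
    "PerspectiveTransform": iaa.PerspectiveTransform(scale=(0.01, 0.1)),
    # (i): the combination that performed best overall in Table 1
    "FlipLR-ScaleXY150": iaa.Sequential([
        iaa.Fliplr(0.5),
        iaa.Affine(scale={"x": (0.5, 1.5), "y": (0.5, 1.5)}),
    ]),
}

# Applied per batch during data loading, e.g.:
# images_aug = augmenters["FlipLR-ScaleXY150"](images=batch_of_numpy_images)
```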
[0051] An overview of the effect of these augmentations is displayed in Figure 4. Some of the tested augmentations were omitted from the figure, as their effects are barely visible at the size of the individual images or are variations or combinations of effects already displayed. It must be noted that, after the application of one or multiple augmentations, the bounding boxes were recomputed according to the transformed mask, to make them a minimal fit to the object again. Otherwise, some augmentations, such as Rotate, would create a bounding box from the transformed original bounding box, which could be too large to accurately locate the object. In some situations, this results in a small but measurable performance boost for models using the affected augmentations.
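A minimal sketch of this bounding-box recomputation from a transformed binary instance mask:

```python
import numpy as np

def bbox_from_mask(mask):
    """Recompute a minimal bounding box [x, y, w, h] from a binary mask.

    Applied after augmentation so that, e.g., a rotated object gets a tight
    box around the transformed mask instead of an enlarged, rotated old box.
    """
    ys, xs = np.where(mask)
    if xs.size == 0:  # object was transformed out of the image entirely
        return None
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    return [int(x_min), int(y_min),
            int(x_max - x_min + 1), int(y_max - y_min + 1)]
```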
[0052] To evaluate the different possible modifications, several models were trained using the same settings and evaluated on the shared test set. For the comparison of the different methods, both the mAP of the predicted bounding boxes and the mAP of the segmentations were considered, even though they are very similar; the results are displayed in Table 1.
[0053] The best-performing model without the use of image augmentation was the model trained using the simplest variation of the images, utilizing the cutting of edges and the merging of the classes. It achieved a precision of more than 70 and therefore performs better than most other models, including many models trained using image augmentation. Once image augmentation is considered (Table 1), this unaugmented model is outperformed by several models trained on augmented images. Figure 6 visualizes predictions with errors.
[0054] While nearly all models using merged classes and cut images perform better, only a few models using a different image base produce comparable or better results. Especially interesting are the effects of specific augmentation methods.
[0055] While the Rotate augmentation decreased the accuracy of models using images with uncut edges, it increased the precision on images scaled to a 1:1 aspect ratio. Some methods almost always seem to reduce the prediction quality, such as Contrast (d), JpegCompression (e), and MotionBlur (f). While the idea behind these methods was to slightly corrupt the images so that the model learns from realistic variations, they mostly hurt performance. Other augmentation methods, such as ScaleXY (b), FlipLR (a), and CropPad (g), seem to consistently improve the results of the trained model, whether used alone or in combination with other methods, contrary to the observation by Dörr et al. This is supported by the fact that the configuration that performed best almost everywhere used the combination of ScaleXY (b) and FlipLR (a).
[0056] While the results provided by the different Mask R-CNN models certainly provide valuable information, the model itself is no longer state-of-the-art in terms of precision. Therefore, three additional models were selected and their performance was tested using the findings gained with the previous model. The first additional model tested is DetectoRS, whose innovative characteristic is the use of Recursive Feature Pyramids; it achieves near state-of-the-art performance on the MSCOCO17 dataset and was implemented and trained using MMDetection. Yolact is an architecture that is able to generate predictions of the recordings in near real time and was trained using the MMDetection framework as well. While real-time predictions are not necessary during the stocktaking process, Yolact was selected since such models could easily serve different purposes within the same domain. The third and last model evaluated is DETR, due to its innovative approach: DETR utilizes the transformer architecture, introduced in the domain of NLP, to generate instance segmentations. It was chosen to evaluate whether new and innovative approaches can be applied to the domain of pallets, and it was trained using the publicly available code.
[0057] Each of the additional models was trained and evaluated on the test dataset, using the merged and cut variant and the augmentation method (i) that was shown to increase performance the most. The results are displayed in Table 2. DetectoRS clearly outperforms all other models by a significant margin in terms of precision: it achieves a mAP of 88.3 and 85.6 while taking 0.13 s per frame on an NVIDIA GeForce RTX 2080 Ti, making it also the slowest of the models. In contrast, Yolact achieves the lowest mAP and, contrary to expectations, is not the fastest model on the GPU, being 0.01 s slower than Mask R-CNN; however, Yolact is by far the fastest model when using a CPU. DETR, with its new approach, is both slower and less precise than Mask R-CNN and does not stand out in any metric. In addition to the evaluation based on metrics, timings, and model size, a manual qualitative evaluation was performed. It showed that the mAP values were consistent with the visual impression. In terms of both bounding boxes and segmentation, DetectoRS provides the best results; Mask R-CNN also delivers satisfactory results, while the quality of DETR and, in particular, Yolact falls off sharply. To give an impression of the quality of the predictions of DetectoRS, some of its predictions are visualized in Figure 5.
Table 2: Comparison of the additional models

Model        mAP (bbox/segm)   Inference time (GPU)   Inference time (CPU)   Parameters
Mask R-CNN   81.5/83.4         0.04 s                 1.72 s                 43,937,313
DetectoRS    88.3/85.6         0.13 s                 Not supported          131,648,615
Yolact       50.3/60.6         0.05 s                 0.59 s                 34,727,123
DETR         74.8/81.8         0.12 s                 2.10 s                 42,835,552
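The per-frame timings in Table 2 are hardware-dependent. A minimal sketch of how such a timing might be measured for the Detectron2-based model is shown below; the warm-up count and use of DefaultPredictor are illustrative assumptions.

```python
import time
import cv2
from detectron2.engine import DefaultPredictor

def mean_inference_time(cfg, image_paths, warmup=5):
    """Average per-frame inference time of a trained Detectron2 model.

    `cfg` is assumed to point at trained weights via cfg.MODEL.WEIGHTS;
    the warm-up runs keep one-time initialization out of the measurement.
    """
    predictor = DefaultPredictor(cfg)
    images = [cv2.imread(p) for p in image_paths]
    for img in images[:warmup]:
        predictor(img)
    start = time.perf_counter()
    for img in images:
        predictor(img)
    return (time.perf_counter() - start) / len(images)
```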
[0058] The input module (1) used herein refers to an input means, like a camera, scanner, phone, or tablet, which receives the input images from the camera installed in the unmanned aircraft (3). The input module (1) also includes, but is not limited to, keyboards, mice, scanners, cameras, joysticks, readers, watches, and microphones, and is further not limited to any particular type of input means.
[0059] The unmanned aircraft (3) used herein refers to a drone unit and includes all types of drones, like multi-rotor drones, fixed-wing drones, single-rotor drones, and fixed-wing hybrid drones, which may be equipped with communications electronics; further, the mobility and versatility of drones may be used for business purposes.
[0060] The input images include real-time video, and the present invention, as an improvement, includes a server which receives the real-time video from the camera and stores it over a LAN/WAN.
[0061] The digital processing module (2) used herein refers to a hardware processing unit and includes a Programmable Logic Control unit (PLC), a microcontroller, a microprocessor, a computing device, a development board, and so forth, and further includes or otherwise covers any type of processor, including known, related-art, and/or later-developed technologies capable of processing the received data. In a preferred embodiment of the present invention, the digital processing module (2) is an Arduino development board.
[0062] The present invention includes a database, which is preferably a memory unit configured for the storage and retrieval of input images and pre-stored images and includes, but is not limited to, a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a flash memory, and so forth, and further includes or otherwise covers any type of memory/database, including known, related-art, and/or later-developed technologies.
[0063] The database of the present invention is not limited to the above-mentioned memory unit but also includes a computer readable storage medium, which may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
[0064] A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
[0065] The components of the present invention are connected via a network which may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
[0066] A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
[0067] The output module (4) includes, but is not limited to, an LED or LCD display screen, or any type of output module available in the prior art.
[0068] Therefore, the present invention provides an automated warehouse inventory system based on unmanned aircraft and artificial intelligence technology which performs pre-processing steps and image augmentation methods to increase the performance of the system.

I/ We Claim:

1. An automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology, comprising:
i) an input module (1) for receiving a set of input images related to inventory from a camera placed in an unmanned aircraft (3);
ii) a digital processing module (2) connected with said input module (1) for performing a set of operations on said input images to detect one or more inventory products via one or more artificial intelligence techniques, wherein said digital processing module (2) is configured to:
a) apply an annotation technique on said set of input images, whereby said annotated images are exported and converted into a dataset format;
b) divide said annotated images into a test data set and a training data set;
c) train said system (100) using said training data set to produce an output of initial results which are adjusted to improve the performance of said system (100);
d) merge a set of classes of said results to increase quantity and quality of said detection of one or more inventory products;
e) reduce an area from said initial results upon merging said set of classes;
f) perform an image augmentation on said results upon reducing said area to detect said one or more inventory products from said set of input images precisely; and
iii) an output module (4) linked with said digital processing module (2) for displaying said detected one or more inventory products to an operator of said system (100).
2. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said unmanned aircraft (3) is preferably a drone.
3. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said input module (1) includes input means such as a camera, scanner, phone, or tablet.
4. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said digital processing module (2) is a hardware processing unit.
5. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said artificial intelligence techniques include, but are not limited to, merging of classes, reduction of image area, and image augmentation.
6. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said output module (4) includes, but is not limited to, an LED or LCD display screen.
7. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said system detects said products in a time ranging from 0.50 to 1.72 seconds.
8. The automated warehouse inventory system (100) based on unmanned aircraft and artificial intelligence technology as claimed in claim 1, wherein said system achieves an efficiency ranging from 96% to 98%.

Documents

Application Documents

# Name Date
1 202311008565-COMPLETE SPECIFICATION [09-02-2023(online)].pdf 2023-02-09
2 202311008565-STATEMENT OF UNDERTAKING (FORM 3) [09-02-2023(online)].pdf 2023-02-09
3 202311008565-DECLARATION OF INVENTORSHIP (FORM 5) [09-02-2023(online)].pdf 2023-02-09
4 202311008565-REQUEST FOR EARLY PUBLICATION(FORM-9) [09-02-2023(online)].pdf 2023-02-09
5 202311008565-DRAWINGS [09-02-2023(online)].pdf 2023-02-09
6 202311008565-PROOF OF RIGHT [09-02-2023(online)].pdf 2023-02-09
7 202311008565-FIGURE OF ABSTRACT [09-02-2023(online)].pdf 2023-02-09
8 202311008565-FORM 1 [09-02-2023(online)].pdf 2023-02-09