Abstract: Provided is a method for determining an erroneous segmentation mask from an image using a deep learning model. The method includes (i) receiving the image and pre-processing the image, (ii) generating segmentation masks from the pre-processed image by comparing an intensity of a first pixel with an intensity of a second pixel and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same, (iii) determining labels for the segmentation masks, (iv) generating a one-hot encoded vector for each label using a one-hot encoding method, (v) identifying, using a deep learning model, a noisy label from the labels by evaluating the one-hot encoded vector of each label based on a performance measure, and (vi) predicting the erroneous segmentation mask from the segmentation masks based on the identified noisy label using the deep learning model. FIGS. 5A & 5B
BACKGROUND
Technical Field
[0001] The embodiments herein generally relate to datapoint importance and detection of mask annotation errors, and more specifically to a system and method for determining at least one erroneous segmentation mask from at least one image using a deep learning model.
Description of the Related Art
[0002] In general, for supervised training, machine learning (ML) models need to be provided with ground truth annotations. Large labeled datasets are required for supervised machine learning models, especially in image analysis. For classification, these annotations are class labels; for segmentation, they are masks (typically freehand shape annotations). Because annotations are collected manually, the labels and masks may contain various errors arising from inter-observer variability and erroneous predictions. This results in poor-quality data and poor training of machine learning (ML) models. To overcome this, various techniques have been proposed to identify noisy and error-prone labels for classification.
[0003] In computer vision, image segmentation is employed to differentiate between objects with the highest degree of accuracy. Generally, image segmentation is accomplished by using a 'connected pixel' algorithm to find an 'object matrix', in which the value of each pixel specifies the object to which the pixel belongs. Image segmentation creates a pixel-wise mask for each object in the image. Machine learning-based segmentation methods require annotated examples in the form of a segmentation mask. Existing segmentation methods yield non-trivial amounts of segmentation errors that may propagate to further analysis, introducing systematic noise that is difficult to quantify. Existing approaches employ leave-one-out (LOO) cross-validation, influence functions (infinitesimal jackknife, etc.), and prototype methods. LOO cross-validation is prohibitively time-consuming, especially for deep learning, and prototype methods are computationally intensive.
[0004] Therefore, there arises a need to address the aforementioned technical drawbacks in existing technologies for a system and method for detecting mask annotation errors for image segmentation.
SUMMARY
[0005] In view of the foregoing, an embodiment herein provides a system for determining at least one erroneous segmentation mask from at least one image using a deep learning model. The system includes an image capturing device that captures at least one image. The server receives the at least one image from the image capturing device. The server comprises a memory that stores a database and a set of modules, and a processor in communication with the memory. The processor retrieves and executes machine-readable program instructions from the memory which, when executed by the processor, enable the processor to (i) pre-process the at least one image by (a) determining, using an encoder, a plurality of features of an object in the at least one image by capturing context information, (b) determining, using a decoder, a localization of the object based on the plurality of features, (ii) generate a plurality of segmentation masks from the at least one image that is pre-processed by comparing an intensity of a first pixel with an intensity of a second pixel of the at least one image and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same, each segmentation mask comprising a specific portion of the image that is isolated from the rest of the at least one image; (iii) determine a plurality of labels for the plurality of segmentation masks; (iv) generate a one-hot encoded vector for each label using a one-hot encoding method, the one-hot encoded vector being a binary representation of each label; (v) identify, using a deep learning model, at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure, the at least one identified noisy label being a label that is identified as an incorrect label from the plurality of labels; and (vi) predict the at least one erroneous segmentation mask from the plurality of segmentation masks based on the at least one noisy label that is identified using the deep learning model.
[0006] In some embodiments, the deep learning model is trained by (i) providing historical data points of the historical images with historical labels that are similar to ground truth data points, (ii) calculating and assigning historical performance measures for each historical data point recursively, and (iii) correlating the historical data points with the historical performance measures.
[0007] In some embodiments, the processor is configured to determine a similarity measure by comparing the at least one noisy label that is identified and the historical labels, wherein the similarity measure ranges between values of 0 and 1, the similarity measure being provided for identifying the at least one noisy label accurately using the deep learning model.
[0008] In some embodiments, the processor is configured to identify the at least one noisy label for the one-hot encoded vector when the similarity measure is close to a value of 1.
[0009] In some embodiments, the processor is configured to assign a binary value of 0 or 1 for each one-hot encoded vector of each label using the one-hot encoding method.
[0010] In some embodiments, the processor is configured to determine the plurality of features by reducing a size of the at least one image and increasing a depth of the at least one image using the encoder.
[0011] In some embodiments, the processor is configured to determine an output of the transposed convolutions by increasing the size of the at least one image and decreasing the depth of the at least one image, wherein the transposed convolutions are implemented to upsample the plurality of features for determining the output.
[0012] In some embodiments, the processor is configured to determine the localization of the object in the at least one image by concatenating the output of the transposed convolutions with the plurality of features using skip connections.
[0013] In some embodiments, the processor is configured to implement two consecutive regular convolutions after every concatenation to assemble a precise output.
[0014] In one aspect, there is provided a method for determining at least one erroneous segmentation mask from at least one image using a deep learning model. The method includes capturing, using an image capturing device, at least one image. The method includes pre-processing the at least one image by (i) determining, using an encoder, a plurality of features of an object in the at least one image by capturing context information, (ii) determining, using a decoder, a localization of the object based on the plurality of features. The method includes generating a plurality of segmentation masks from the at least one image that is pre-processed by comparing an intensity of a first pixel with an intensity of a second pixel of the at least one image and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same. Each segmentation mask comprises a specific portion of the image that is isolated from the rest of the at least one image. The method includes determining a plurality of labels for the plurality of segmentation masks. The method includes generating a one-hot encoded vector for each label using a one-hot encoding method, the one-hot encoded vector being a binary representation of each label. The method includes identifying, using the deep learning model, at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure. The at least one identified noisy label is a label that is identified as an incorrect label from the plurality of labels in the at least one image. The method includes predicting the at least one erroneous segmentation mask from the plurality of segmentation masks based on the at least one noisy label that is identified using the deep learning model.
[0015] The system achieves shorter and smarter iterations of data labelling in the detection of mask annotation errors. The system trains machine learning models for image segmentation in an effective manner.
[0016] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0018] FIG. 1 illustrates a system for determining at least one erroneous segmentation mask from at least one image using a deep learning model according to some embodiments herein;
[0019] FIG. 2 is a block diagram of a server according to some embodiments herein;
[0020] FIG. 3 is a block diagram of a deep learning model according to some embodiments herein;
[0021] FIG. 4 illustrates a graphical representation of comparison of nuclei segmentation in hematoxylin-eosin stained histology images using an existing system and the system of FIG. 1 according to some embodiments herein;
[0022] FIGS. 5A & 5B are flow diagrams that illustrate a method for determining at least one erroneous segmentation mask from at least one image using a deep learning model according to some embodiments herein;
[0023] FIG. 6 illustrates an exploded view of the image capturing device of FIG. 1 according to some embodiments herein; and
[0024] FIG. 7 is a schematic diagram of a computer architecture in accordance with the embodiments herein.
DETAILED DESCRIPTION OF THE DRAWINGS
[0025] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0026] As mentioned, there remains a need for a system and method for determining at least one erroneous segmentation mask from at least one image using a deep learning model. The embodiments herein achieve this by proposing a system and method for determining at least one erroneous segmentation mask from at least one image using a deep learning model, using Data Shapley through datapoint importance assessment. Referring now to the drawings, and more particularly to FIGS. 1 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.
[0027] FIG. 1 illustrates a system for determining at least one erroneous segmentation mask from at least one image using a deep learning model according to some embodiments herein. The system 100 includes an image capturing device 102 and a server 104. The server 104 may be a handheld device, a mobile phone, a Kindle, a PDA (Personal Digital Assistant), a tablet, a music player, a computer, an onsite/remote server, an electronic notebook, or a smartphone. The image capturing device 102 is communicatively connected to the server 104 through a network 106. The network 106 may be a wired or wireless network. The image capturing device 102 may be at least one of a camera, an IR camera, a thermal camera, a night vision camera, an optical sensor, a mobile phone, a smartphone, or any kind of imaging device.
[0028] The image data receiving module 110 receives an image captured by the image capturing device 102 through the network 106.
[0029] The server 104 receives the at least one image and pre-processes the at least one image by determining a plurality of features of an object in the at least one image by capturing context information using an encoder. The encoder may implement regular convolutions. The context information of the at least one image may include objects, the arrangement of objects, the location of objects, and the physical size of objects relative to other objects. The context information of the at least one image may also include the time or location at which the image is captured. The regular convolutions transform the image by applying a kernel matrix over each pixel and its local neighboring pixels, for example for blurring.
[0030] The server 104 determines a precise localization of the object based on the plurality of features using a decoder. The decoder may implement transposed convolutions.
[0031] The transposed convolutions are implemented to upsample the plurality of features of the at least one image for determining the output. The upsampling of the plurality of features includes an increase of a spatial resolution while keeping the two-dimensional representation of the at least one image. In some embodiments, the processor is configured to determine the output of the transposed convolutions by increasing the size of the at least one image and decreasing the depth of the at least one image.
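The size and depth behaviour of the encoder and decoder described above can be illustrated with a minimal sketch. The following is a hedged example in PyTorch (an assumed framework; the channel counts, kernel sizes, and image size are illustrative choices, not part of the disclosure):

```python
# Minimal sketch of the encoder/decoder shape behaviour; all layer
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 256, 256)            # (batch, depth, height, width)

# Encoder: a regular strided convolution reduces the size of the image
# and increases its depth while capturing context information.
encoder = nn.Conv2d(in_channels=3, out_channels=64,
                    kernel_size=3, stride=2, padding=1)
features = encoder(x)
print(features.shape)                      # torch.Size([1, 64, 128, 128])

# Decoder: a transposed convolution upsamples the features, increasing
# the size and decreasing the depth, while keeping the two-dimensional
# representation of the image.
decoder = nn.ConvTranspose2d(in_channels=64, out_channels=3,
                             kernel_size=2, stride=2)
output = decoder(features)
print(output.shape)                        # torch.Size([1, 3, 256, 256])
```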
[0032] The server 104 generates a plurality of segmentation masks from the pre-processed image by comparing an intensity of a first pixel with an intensity of a second pixel and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same, as sketched below. Each segmentation mask is a specific portion of the image that is isolated from the rest of the at least one image. The server 104 may implement a segmentation model to determine a pixel-wise segmentation mask for each object in the image, and thereby segmentation masks for one or more objects in the image are determined. The segmentation model may be tuned to improve the segmentation results. The segmentation model may be a U-Net, a backbone U-Net, or a multiscale U-Net.
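The pixel-combining step can be read as a connected-component style procedure that builds the 'object matrix' mentioned in paragraph [0003]. The following is a minimal sketch of that reading (an assumption about the implementation, using plain NumPy and 4-connectivity for illustration):

```python
# Minimal sketch: merge adjacent pixels into one region when their
# intensities are the same, producing an 'object matrix' of region ids.
import numpy as np
from collections import deque

def segmentation_masks(image):
    """Label 4-connected regions of equal intensity."""
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)   # 0 means "not yet assigned"
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx]:
                continue                   # pixel already belongs to a region
            next_label += 1
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # Combine neighbouring pixels when intensities match.
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and image[ny, nx] == image[y, x]):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels

print(segmentation_masks(np.array([[5, 5, 7],
                                   [5, 7, 7],
                                   [2, 2, 7]])))
# [[1 1 2]
#  [1 2 2]
#  [3 3 2]]
```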
[0033] The server 104 determines a plurality of labels for the plurality of segmentation masks that are generated.
[0034] The server 104 generates a one-hot encoded vector for each of the plurality of labels using a one-hot encoding method. The one-hot encoded vector is a binary representation of each label. One-hot encoding is done for the presence or absence of an object in a local region. Table 1 below depicts a plurality of labels generated in an image, and Table 2 shows the binary representation of the labels after implementing the one-hot encoding method; a minimal code sketch follows Table 2.
Table 1

| Animal |
|---|
| Cat |
| Dog |
| Duck |

Table 2

| Cat | Dog | Duck |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |
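A minimal sketch of the one-hot encoding step of Tables 1 and 2 (the label set is the illustrative one from Table 1):

```python
# Minimal sketch of the one-hot encoding method shown in Tables 1 and 2.
labels = ["Cat", "Dog", "Duck"]
categories = sorted(set(labels))           # fixed category order

def one_hot(label):
    """Binary representation of a label: 1 marks presence, 0 absence."""
    return [1 if label == category else 0 for category in categories]

for label in labels:
    print(label, one_hot(label))
# Cat [1, 0, 0]
# Dog [0, 1, 0]
# Duck [0, 0, 1]
```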
[0035] The server 104 predicts the at least one erroneous segmentation mask from the plurality of segmentation masks based on at least one identified noisy label using a deep learning model.
[0036] In some embodiments, the deep learning model is trained by (i) providing historical data points of the historical images with historical labels that are similar to ground truth data points, (ii) calculating and assigning historical performance measures for each historical data point recursively, and (iii) correlating the historical data points with the historical performance measures.
[0037] In some embodiments, the processor is configured to determine a similarity measure by comparing the at least one noisy label that is identified and the historical labels, wherein the similarity measure ranges between values of 0 and 1, the similarity measure being provided for identifying the at least one noisy label accurately using the deep learning model.
[0038] In some embodiments, the processor is configured to identify the at least one noisy label for the one-hot encoded vector when the similarity measure is close to a value of 1.
[0039] In some embodiments, the processor is configured to assign a binary value of 0 or 1 for each one-hot encoded vector of each label using the one-hot encoding method.
[0040] In some embodiments, the processor is configured to determine the plurality of features by reducing a size of the at least one image and increasing a depth of the at least one image using the encoder.
[0041] In some embodiments, the processor is configured to determine an output of the transposed convolutions by increasing the size of the at least one image and decreasing the depth of the at least one image, wherein the transposed convolutions are implemented to upsample the plurality of features for determining the output.
[0042] In some embodiments, the processor is configured to determine the localization of the object in the at least one image by concatenating the output of the transposed convolutions with the plurality of features using skip connections.
[0043] The skip connections may be part of neural networks that skip some of the neural network layers and feed the output of one layer as the input to a later layer.
[0044] In some embodiments, the processor is configured to implement two consecutive regular convolutions after every concatenation to assemble a precise output, as sketched below.
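The decoder step described in paragraphs [0042] through [0044] can be sketched as follows: the transposed-convolution output is concatenated with the encoder features via a skip connection, then passed through two consecutive regular convolutions. This is a minimal PyTorch example under assumed channel counts, not the exact disclosed architecture:

```python
# Minimal sketch of one decoder step: upsample, concatenate via a skip
# connection, then apply two consecutive regular convolutions.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.double_conv = nn.Sequential(   # two regular convolutions
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                      # upsample: size up, depth down
        x = torch.cat([x, skip], dim=1)     # skip connection: concatenate
        return self.double_conv(x)          # assemble a precise output

block = DecoderBlock(in_ch=128, skip_ch=64, out_ch=64)
out = block(torch.randn(1, 128, 64, 64), torch.randn(1, 64, 128, 128))
print(out.shape)                            # torch.Size([1, 64, 128, 128])
```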
[0045] In some embodiments, a Data Shapley application is used for identifying noisy labels. The Data Shapley application determines which data point ground truth masks are erroneous/mislabeled/noisy, thereby making it easier to identify bad-quality data points in the context of segmentation machine learning models. The Data Shapley application may be a k-nearest neighbors (KNN) Shapley, a gradient Shapley, or a kernel Shapley. The Data Shapley application may be a framework for evaluating data within the context of a supervised learning algorithm, for example, KNN. In KNN, there are n data points that are trained on classification and regression techniques. The KNN-Shapley application acts as a metric to evaluate each training data point with respect to the performance of the classification and regression techniques. The server 104 loops through various layers of the segmentation model and picks the layer which gives the best discrimination for noisy labels in the image segmentation masks. The looping process may be simulated for picking the best layer. The server 104 uses the best embedding with the Data Shapley to identify the erroneous segmentation mask. Each layer of the segmentation model is examined for its ability to achieve good performance in terms of Data Shapley to identify noisy labels. The layer which does best is selected, and segmentation Data Shapley is run to identify the erroneous segmentation mask. In some embodiments, the one-hot encoded vector includes a category where 1 indicates that the individual object is of a particular type and 0 indicates that the individual object is not of a particular type.
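For illustration, the following is a minimal sketch of the KNN-Shapley valuation named above, using the closed-form recursion of Jia et al. (2019) for a single validation point. In practice the feature embeddings would come from the selected segmentation-model layer; the names, distance metric, and toy data here are illustrative assumptions:

```python
# Minimal sketch of KNN-Shapley: value each training point for one
# validation point under a KNN classifier (closed-form recursion).
import numpy as np

def knn_shapley(X_train, y_train, x_val, y_val, K=5):
    n = len(X_train)
    # Rank training points by distance to the validation point.
    order = np.argsort(np.linalg.norm(X_train - x_val, axis=1))
    match = (y_train[order] == y_val).astype(float)   # 1[y_alpha_i == y_val]
    s = np.zeros(n)
    s[n - 1] = match[n - 1] / n                       # farthest point first
    for i in range(n - 2, -1, -1):                    # recurse toward nearest
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    values = np.zeros(n)
    values[order] = s                                 # undo the distance sort
    return values

# Toy usage: the correctly labeled nearest neighbour gets the highest value;
# low (or negative) values flag candidate noisy labels / erroneous masks.
X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array([0, 1, 1, 1])
print(knn_shapley(X, y, x_val=np.array([0.0]), y_val=0, K=2))
# [0.5 0.  0.  0. ]
```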
[0046] FIG. 2 is a block diagram of a server according to some embodiments herein. The server 104 includes a database 108, an image receiving module 202, a pre-processing module 204, a segmentation masks generating module 206, a labels generating module 208, a one-hot encoded vectors generating module 210, a noisy label identifying module 212, an erroneous segmentation predicting module 214, and a deep learning model 216.
[0047] The image receiving module 202 receives the at least one image. The pre-processing module 204 pre-processes the at least one image by determining a plurality of features of an object in the at least one image by capturing context information using regular convolutions. The pre-processing module 204 determines a localization of the object based on the plurality of features using the transposed convolutions of a decoder. The segmentation masks generating module 206 generates a plurality of segmentation masks from the pre-processed image by comparing an intensity of a first pixel with an intensity of a second pixel and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same. The at least one segmentation mask is a specific portion of the image that is isolated from the rest of the at least one image.
[0048] The labels generating module 208 determines a plurality of labels for the plurality of segmentation masks. The one-hot encoded vectors generating module 210 generates a one-hot encoded vector for each of the plurality of labels using a one-hot encoding method. The one-hot encoded vector is a binary representation of each label. The noisy label identifying module 212 identifies at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure.
[0049] The erroneous segmentation predicting module 214 predicts the at least one erroneous segmentation mask from the plurality of segmentation masks based on at least one identified noisy label using a deep learning model. The deep learning model 216 is trained by (i) providing historical data points of the historical images with historical labels that are similar to ground truth data points, (ii) calculating and assigning historical performance measures for each historical data point recursively, and (iii) correlating the historical data points with the historical performance measures.
[0050] FIG. 3 is a block diagram of a deep learning model 216 according to some embodiments herein. The deep learning model 216 is trained by (i) providing historical data points of the historical images with historical labels that are similar to ground truth data points, (ii) calculating and assigning historical performance measures for each historical data point recursively, and (iii) correlating the historical data points with the historical performance measures.
[0051] The historical performance measures are calculated recursively based on the following formulation.
[0052] Given a single validation point $x_{\mathrm{val}}$ with the label $y_{\mathrm{val}}$, the KNN classifier determines the top-$K$ training points $(x_{\alpha_1}, \cdots, x_{\alpha_K})$ most similar to $x_{\mathrm{val}}$ and assigns the probability of $x_{\mathrm{val}}$ taking the label $y_{\mathrm{val}}$ as

$$P[x_{\mathrm{val}} \rightarrow y_{\mathrm{val}}] = \frac{1}{K} \sum_{i=1}^{K} \mathbb{1}[y_{\alpha_i} = y_{\mathrm{val}}].$$
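As an illustrative worked instance of this probability (the labels are hypothetical, not taken from the disclosure): with $K = 3$, nearest-neighbour labels $(y_{\alpha_1}, y_{\alpha_2}, y_{\alpha_3}) = (\text{cat}, \text{cat}, \text{dog})$, and $y_{\mathrm{val}} = \text{cat}$,

$$P[x_{\mathrm{val}} \rightarrow \text{cat}] = \frac{1}{3}\big(\mathbb{1}[\text{cat}=\text{cat}] + \mathbb{1}[\text{cat}=\text{cat}] + \mathbb{1}[\text{dog}=\text{cat}]\big) = \frac{2}{3}.$$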
[0053] FIG. 4 illustrates a graphical representation of a comparison of nuclei segmentation in hematoxylin-eosin stained histology images using an existing system and the system 100 of FIG. 1 according to some embodiments herein. The graphical representation depicts the percentage of data points inspected on the X-axis and the percentage of noisy labels on the Y-axis. The graphical representation depicts nuclei segmentation in hematoxylin-eosin stained histology images using the system 100 at 402. The graphical representation depicts nuclei segmentation in hematoxylin-eosin stained histology images using the existing system at 404.
[0054] The label noise scenario is that not all cell nuclei within a region are annotated. To simulate such a scenario, 40% of the cells are randomly removed in 20% of the segmentation masks, as sketched below. The detection performance of the system 100 is better than that of the existing system.
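The simulation can be sketched as follows (a minimal example under the assumption that each ground-truth mask stores integer instance ids with 0 as background; the names and encoding are illustrative):

```python
# Minimal sketch of the label-noise simulation: remove 40% of the cell
# instances in a randomly chosen 20% of the segmentation masks.
import numpy as np

rng = np.random.default_rng(seed=0)

def corrupt_masks(masks, mask_frac=0.2, cell_frac=0.4):
    """Return noisy copies of instance masks plus the corrupted indices."""
    noisy = [mask.copy() for mask in masks]
    picked = rng.choice(len(masks), size=int(mask_frac * len(masks)),
                        replace=False)
    for idx in picked:
        cell_ids = np.unique(noisy[idx])
        cell_ids = cell_ids[cell_ids != 0]            # 0 is background
        drop = rng.choice(cell_ids, size=int(cell_frac * len(cell_ids)),
                          replace=False)
        noisy[idx][np.isin(noisy[idx], drop)] = 0     # erase the chosen cells
    return noisy, picked
```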
[0055] FIGS. 5A & 5B are flow diagrams that illustrate a method for determining at least one erroneous segmentation mask from at least one image using a deep learning model according to some embodiments herein. At step 502, the method includes capturing, using an image capturing device, at least one image. At step 504, the method includes pre-processing the at least one image by (i) determining, using an encoder, a plurality of features of an object in the at least one image by capturing context information, (ii) determining, using a decoder, a localization of the object based on the plurality of features. At step 506, the method includes generating a plurality of segmentation masks from the at least one image that is pre-processed by comparing an intensity of a first pixel with an intensity of a second pixel of the at least one image and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same. Each segmentation mask comprises a specific portion of the image that is isolated from the rest of the at least one image. At step 508, the method includes determining a plurality of labels for the plurality of segmentation masks. At step 510, the method includes generating a one-hot encoded vector for each label using a one-hot encoding method, the one-hot encoded vector being a binary representation of each label. At step 512, the method includes identifying, using the deep learning model, at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure. The at least one identified noisy label is a label that is identified as an incorrect label from the plurality of labels in the at least one image. At step 514, the method includes predicting the at least one erroneous segmentation mask from the plurality of segmentation masks based on the at least one noisy label that is identified using the deep learning model.
[0056] FIG. 6 illustrates an exploded view of the image capturing device 102 of FIG. 1 according to some embodiments herein. The image capturing device 102 includes a memory 602 having a set of computer instructions, a bus 604, a display 606, a speaker 608, and a processor 610 capable of processing a set of instructions to perform any one or more of the methodologies herein, according to an embodiment herein. The processor 610 may also enable digital content to be consumed in the form of a video for output via one or more displays 606 or audio for output via the speaker and/or earphones 608. The processor 610 may also carry out the methods described herein and in accordance with the embodiments herein.
[0057] The embodiments herein may include a computer program product configured to include a pre-configured set of instructions, which when performed, can result in actions as stated in conjunction with the methods described above. In an example, the pre-configured set of instructions can be stored on a tangible non-transitory computer readable medium or a program storage device. In an example, the tangible non-transitory computer readable medium can be configured to include the set of instructions, which when performed by a device, can cause the device to perform acts similar to the ones described here. Embodiments herein may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer executable instructions or data structures stored thereon.
[0058] Generally, program modules utilized herein include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
[0059] The embodiments herein can include both hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
[0060] A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
[0061] Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
[0062] A representative hardware environment for practicing the embodiments herein is depicted in FIG. 7, with reference to FIGS. 1 through 6. This schematic drawing illustrates a hardware configuration of a server 104/computer system/ image capturing device 102 in accordance with the embodiments herein. The image capturing device includes at least one processing device 10 and a cryptographic processor 11. The special-purpose CPU 10 and the cryptographic processor (CP) 11 may be interconnected via system bus 14 to various devices such as a random access memory (RAM) 15, read-only memory (ROM) 16, and an input/output (I/O) adapter 17. The I/O adapter 17 can connect to peripheral devices, such as disk units 12 and tape drives 13, or other program storage devices that are readable by the system. The image capturing device can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments herein. The image capturing device further includes a user interface adapter 20 that connects a keyboard 18, mouse 19, speaker 25, microphone 23, and/or other user interface devices such as a touch screen device (not shown) to the bus 14 to gather user input. Additionally, a communication adapter 21 connects the bus 14 to a data processing network 26, and a display adapter 22 connects the bus 14 to a display device 24, which provides a graphical user interface (GUI) 30 of the output data in accordance with the embodiments herein, or which may be embodied as an output device such as a monitor, printer, or transmitter, for example. Further, a transceiver 27, a signal comparator 28, and a signal converter 29 may be connected with the bus 14 for processing, transmission, receipt, comparison, and conversion of electric or electronic signals.
[0063] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
CLAIMS
I/We Claim:
1. A system (100) for determining at least one erroneous segmentation mask from at least one image using a deep learning model (108), wherein the system (100) comprises:
an image capturing device (102) that captures at least one image;
a server (104) that receives the at least one image from the image capturing device (102), wherein the server (104) comprises,
a memory that stores a database and a set of modules;
a processor in communication with the memory, the processor retrieving and executing machine-readable program instructions from the memory which, when executed by the processor, enable the processor to:
pre-process the at least one image by (i) determining, using an encoder, a plurality of features of an object in the at least one image by capturing context information, (ii) determining, using a decoder, a localization of the object based on the plurality of features;
generate a plurality of segmentation masks from the at least one image that is pre-processed by comparing an intensity of a first pixel with an intensity of a second pixel of the at least one image and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same, wherein each segmentation mask comprises a specific portion of the image that is isolated from the rest of the at least one image;
determine a plurality of labels for the plurality of segmentation masks;
generate a one-hot encoded vector for each label using a one-hot encoding method, wherein the one-hot encoded vector is a binary representation of each label;
characterized in that, identify, using a deep learning model, at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure, wherein the at least one identified noisy label is a label that is identified as an incorrect label from the plurality of labels; and
predict the at least one erroneous segmentation mask from the plurality of segmentation masks based on the at least one noisy label that is identified using the deep learning model.
2. The system as claimed in claim 1, wherein the deep learning model is trained by
providing historical data points of the historical images with historical labels that are similar to ground truth data points;
calculating and assigning historical performance measures for each historical data point recursively; and
correlating the historical data points with the historical performance measures.
3. The system as claimed in claim 1, wherein the processor is configured to determine a similarity measure by comparing the at least one noisy label that is identified and the historical labels, wherein the similarity measure ranges between values of 0 and 1, wherein the similarity measure is provided for identifying the at least one noisy label accurately using the deep learning model.
4. The system as claimed in claim 3, wherein the processor is configured to identify the at least one noisy label accurately when the similarity measure is close to a value of 1.
5. The system as claimed in claim 1, wherein the processor is configured to assign a binary value of 0 or 1 for each one-hot encoded vector of each label using the one-hot encoding method.
6. The system as claimed in claim 1, wherein the processor is configured to determine the plurality of features by reducing a size of the at least one image and increasing a depth of the at least one image using the encoder.
7. The system as claimed in claim 4, wherein the processor is configured to determine an output of transposed convolutions by increasing the size of the at least one image and decreasing the depth of the at least one image, wherein the transposed convolutions are implemented to upsample the plurality of features of the at least one image for determining the output.
8. The system as claimed in claim 7, wherein the processor is configured to determine the localization of the object in the at least one image by concatenating the output of the transposed convolutions with the plurality of features using skip connections.
9. The system as claimed in claim 1, wherein the processor is configured to implement two consecutive regular convolutions after every concatenation to assemble a precise output.
10. A method for determining at least one erroneous segmentation mask from at least one image using a deep learning model (108), comprising:
capturing, using an image capturing device (102), at least one image and communicating the at least one image to a server (104);
pre-processing the at least one image by (i) determining, using an encoder, a plurality of features of an object in the at least one image by capturing context information, (ii) determining, using a decoder, a localization of the object based on the plurality of features;
generating a plurality of segmentation masks from the at least one image that is pre-processed by comparing an intensity of a first pixel with an intensity of a second pixel of the at least one image and combining the first pixel and the second pixel when the intensity of the first pixel and the intensity of the second pixel are the same, wherein each segmentation mask comprises a specific portion of the image that is isolated from the rest of the at least one image;
determining a plurality of labels for the plurality of segmentation masks;
generating a one-hot encoded vector for each label using a one-hot encoding method, wherein the one-hot encoded vector is a binary representation of each label;
characterized in that, identifying, using a deep learning model, at least one noisy label from the plurality of labels by evaluating the one-hot encoded vector of each label based on a performance measure, wherein the at least one identified noisy label is a label that is identified as an incorrect label from the plurality of labels; and
predicting the at least one erroneous segmentation mask from the plurality of segmentation masks based on the at least one noisy label that is identified using the deep learning model.
Dated this March 02nd, 2023
Arjun Karthik Bala
(IN/PA 1021)
Agent for Applicant