Abstract: IDENTIFICATION OF PLANT LEAF DISEASE PREDICTION USING MACHINE LEARNING ALGORITHMS. The timely identification and early prevention of crop diseases are essential for improving production. In this paper, deep convolutional neural network (CNN) models are implemented to identify and diagnose diseases in plants from their leaves, since CNNs have achieved impressive results in the field of machine vision. Standard CNN models require a large number of parameters and a high computation cost. In this paper, we replaced standard convolution with depthwise separable convolution, which reduces the parameter number and computation cost. The implemented models were trained with an open dataset consisting of 14 different plant species and 38 different categorical classes of diseased and healthy plant leaves. To evaluate the performance of the models, different parameters such as batch size, dropout, and number of epochs were varied. The implemented models achieved disease-classification accuracy rates of 98.42%, 99.11%, 97.02%, and 99.56% using InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0, respectively, which were greater than those of traditional handcrafted-feature-based approaches. In comparison with other deep-learning models, the implemented model achieved better accuracy and required less training time. Moreover, with optimized parameters, the MobileNetV2 architecture is compatible with mobile devices. The accuracy results in the identification of diseases showed that the deep CNN model is promising, can greatly impact the efficient identification of diseases, and may have potential for detecting diseases in real-time agricultural systems.
Claims:1. A system of Identification of Plant Leaf Disease Prediction Using Machine Learning Algorithms comprises four different DL models (InceptionV3, InceptionResnetV2, MobileNetV2, EfficientNetB0) for the detection of plant diseases using healthy- and diseased-leaf images of plants.
2. The system as claimed in claim 1, wherein to train and test the model, we used the standard Plant Village dataset with 53,407 images, which were all captured in laboratory conditions.
3. The system as claimed in claim 1, wherein this dataset consists of 38 different classes of healthy- and diseased-leaf images of 14 different species; and after splitting the dataset 80–20 (80% of the images for training, 20% of the images for testing), we achieved the best accuracy rate of 99.56% with the EfficientNetB0 model.
4. The system as claimed in claim 1, wherein on average, less time was required to train the images in the MobileNetV2 and EfficientNetB0 architectures, and it took 565 and 545 s/epoch, respectively, on colored images.
5. The system as claimed in claim 1, wherein in comparison with other deep-learning approaches, the implemented deep-learning model has better predictive ability in terms of both accuracy and loss; and the time required to train the model was much less than that of other machine-learning approaches.
6. The system as claimed in claim 1, wherein the MobileNetV2 architecture is an optimized deep convolutional neural network that limits the parameter number and operations as much as possible, and can easily run on mobile devices.
Description:
TITLE OF THE INVENTION
IDENTIFICATION OF PLANT LEAF DISEASE PREDICTION USING MACHINE LEARNING ALGORITHMS
FIELD OF THE INVENTION
This invention relates to Identification of Plant Leaf Disease Prediction Using Machine Learning Algorithms.
BACKGROUND OF THE INVENTION
Introduction
The automated identification of plant diseases based on plant leaves is a major landmark in the field of agriculture. Moreover, the early and timely identification of plant diseases positively impacts crop yield and quality. Because a large number of crop products are cultivated, even agriculturists and pathologists may often fail to identify diseases in plants by visually inspecting disease-affected leaves. However, in the rural areas of developing countries, visual observation is still the primary approach to disease identification. It also requires continuous monitoring by experts. In remote areas, farmers may need to travel far to consult an expert, which is time-consuming and expensive. Automated computational systems for the detection and diagnosis of plant diseases assist farmers and agronomists with their high throughput and precision. To overcome the above problems, researchers have proposed several solutions. Various types of feature sets can be used in machine learning for the classification of plant diseases.
On the other hand, deep-learning-based techniques, particularly CNNs, are the most promising approach for automatically learning decisive and discriminative features. Deep learning (DL) models stack multiple layers, such as convolutional layers, that learn feature representations from the data. Plant-disease detection can be accomplished using a deep-learning model. Deep learning also has some drawbacks, as it requires large amounts of data to train the network. If an available dataset does not contain enough images, performance is worse. Transfer learning has several advantages; for example, it does not need a large amount of data to train the network. Transfer learning improves the learning of a new task through knowledge transfer from a similar task that has already been learned. Many studies have used transfer learning in their disease-detection approach. The benefits of using transfer learning are a decrease in the training time, generalization error, and computational cost of building a DL model. In this work, we use different DL models to identify plant diseases. The inception module can extract more specific and relevant features, as it allows for simultaneous multilevel feature extraction. We replaced the standard convolution of an inception block with depthwise separable convolution to reduce the parameter number. Multiple feature extraction improves the performance of the model. A residual network has shortcut connections that feed the output of a previous layer to a later layer, which strengthens features and improves accuracy. To evaluate performance on a lightweight, memory-efficient interface, the MobileNet model is used. The MobileNetV2 architecture can achieve high accuracy rates while keeping the parameter number and computation as low as possible.
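The parameter saving from depthwise separable convolution can be illustrated with a short calculation. The following is a minimal sketch, not the implementation used in this work: it counts the weights of a single convolutional layer (bias terms omitted) for both variants. The function names and the layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are assumptions chosen only for illustration.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For these assumed sizes, the separable layer needs 8,768 weights against 73,728 for the standard layer. In general the ratio is 1/out_channels + 1/kernel_size^2, so the saving grows with the number of output channels.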
Related Work
The implementation of proper techniques to identify healthy and diseased leaves helps in controlling crop loss and increasing productivity. This section comprises different existing machine-learning techniques for the identification of plant diseases.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1: Basic architectures of implemented DL Models.
Fig. 2: Basic block diagram of InceptionResNetV2 model.
Fig. 3: (a) Modified structures of InceptionResNet-A. (b) Structures of InceptionResNet-B of InceptionResNetV2 model.
Fig. 4: Structures of InceptionResNet-C in InceptionResNetV2.
Fig. 5: Basic Block diagram of EfficientNet model.
Fig. 6: Sample images of colour, grayscale, and segmented versions of PlantVillage image dataset.
Fig. 7: (a) Performance accuracy of implemented model. (b) Performance loss of implemented model. (c) F1 score of InceptionV3. (d) Accuracy of InceptionResNetV2 grouped by training
Fig. 8: Performance accuracy with different dropout values.
Fig. 9: Example of correct classification from test image set.
DETAILED DESCRIPTION OF THE INVENTION
2.1. Shape- and Texture-Based Identification
In [30], the authors identified diseases using tomato-leaf images. They used different geometric and histogram-based features from segmented diseased portions and applied an SVM classifier with different kernels for classification. S. Kaur et al. identified three different soybean diseases using different color and texture features. P. Babu et al. used a feed-forward neural network and backpropagation to identify plant leaves and their diseases. S. S. Chouhan et al. used a bacterial-foraging-optimization-based radial-basis-function neural network (BRBFNN) for the identification of leaves and fungal diseases in plants. In their approach, they used a region-growing algorithm to extract features from a leaf on the basis of seed points having similar attributes. The bacterial-foraging optimization technique is used to speed up the network and improve classification accuracy.
2.2. Deep-Learning-Based Identification
Mohanty et al. used AlexNet and GoogleNet CNN architectures in the identification of 26 different plant diseases. Ferentinos et al. used different CNN architectures to identify 58 different plant diseases, achieving high levels of classification accuracy. In their approach, they also tested the CNN architecture with real-time images. Sladojevic et al. [26] designed a DL architecture to identify 13 different plant diseases. They used the Caffe DL framework to perform CNN training. Kamilaris et al. exhaustively researched different DL approaches and their drawbacks in the field of agriculture. In another study, the authors proposed a nine-layer CNN model to identify plant diseases. For experimentation purposes, they used the PlantVillage dataset and data-augmentation techniques to increase the data size, and analyzed performance. The authors reported better accuracy than that of a traditional machine-learning-based approach.
For training and testing purposes, we used the standard open-access PlantVillage dataset, which consists of 54,305 images of healthy and infected plant leaves. Detailed database information, the number of classes and images in each class, their common and scientific names, and the disease-causing viruses are shown in Tables 5 and 6. The database contains 38 different classes of 14 different plant species with healthy- and disease-affected-leaf images. All images were captured in laboratory conditions. Figure 6 shows some sample leaf images from the PlantVillage dataset. In our experiment, we used three different formats of the PlantVillage dataset. First, we ran the experiment with colored leaf images, and then with segmented leaf images of the same dataset. In the segmented images, the background was smoothed, so that it could provide more meaningful information that would be easier to analyze. Lastly, we used grayscale images of the same dataset to evaluate the performance of the implemented methods. All leaf images were divided into two sets, a training set and a testing set. To evaluate performance, we used three different training-testing splits, namely 80–20 (80% training images and 20% testing images), 70–30 (70% training images and 30% testing images), and 60–40 (60% training images and 40% testing images).
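The three training-testing ratios described above can be sketched as a reproducible random split. This is only an illustration under stated assumptions: the text does not say how the images were shuffled or whether the split was stratified by class, so the simple seeded shuffle, the function name `split_dataset`, and the placeholder file names below are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_fraction: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and split them into training and testing lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Placeholder file names; a real run would list the dataset images instead.
    image_paths = [f"leaf_{i:03d}.jpg" for i in range(100)]
    for fraction in (0.8, 0.7, 0.6):
        train, test = split_dataset(image_paths, fraction)
        print(f"{fraction:.0%} train: {len(train)} images, test: {len(test)} images")
```

A stratified split, which keeps the class proportions equal in both sets, may be preferable when class sizes differ widely; that choice is left open here because the text does not specify it.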
A NASNet-based deep CNN architecture was used to identify leaf diseases in plants, and an accuracy rate of 93.82% was achieved. Rice- and maize-leaf diseases were identified by Chen et al. using the INC-VGGN method. In their approach, they replaced the last convolutional layer of VGG19 with two inception layers and one global average pooling layer. A shallow CNN (SCNN) was used by Yang Li et al. in the identification of maize, apple, and grape diseases. First, they extracted CNN features and classified them using SVM and RF classifiers. Sethy et al. used different deep-learning models to extract features and classified them using an SVM classifier. Using ResNet50 with SVM, they achieved the highest performance accuracy. VGG16, ResNet, and DenseNet models were used by Yafeng Zhao et al. to identify plant diseases from the PlantVillage dataset. To increase the dataset size, they used a double generative adversarial network (DoubleGAN), which improved the performance results. A summary of the related work on plant-disease identification based on leaf images is shown in Table 1.
Results
The implemented CNN architectures, as described in the previous section, used the parameters in Table 7. EfficientNetB0 achieved the best accuracy in comparison with InceptionV3, MobileNetV2, and InceptionResNetV2. To evaluate performance, we used different metrics, for example, performance accuracy, F1 score, precision, recall, training loss, and time required per epoch. In our experiment, we used three different representations (i.e., color, grayscale, segmented) of the PlantVillage image data, and the performance metrics differed across them. The color-image dataset performed better than the grayscale and segmented images; the same number of CNN network parameters was maintained in all cases. Figure 7a–c shows the graphs of testing accuracy, loss, and F1 score against the number of epochs for the implemented models. Figure 7d represents the accuracy graph of the InceptionResNetV2 model with different training and testing splits. A summary of the performance comparisons of the implemented models based on testing accuracy and testing loss is presented in Table 8. The performance metrics that are considered in our proposed work are as follows.
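• Performance accuracy: the ratio of the number of correctly classified images to the total number of images.
• Loss function: a measure of how far the model's predictions are from the true labels (lower values indicate that the architecture models the data better).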
• Precision: the ratio of the number of correctly predicted observations (true positives) to the total number of positive predictions (true positives + false positives).
• Recall: the ratio of correctly predicted observations (true positives) to all observations in that class (true positives + false negatives).
• F1 score: the harmonic mean of precision and recall.
• Time requirement (in sec) per epoch for training each DL model.
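The accuracy, precision, recall, and F1 definitions listed above can be written out directly from confusion counts. The sketch below is a minimal illustration for a single class treated as "positive"; the function name and the example counts are hypothetical and are not results from this work. For the 38-class problem, such values would be computed per class and then averaged, and the averaging scheme is an assumption since the text does not state it.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    for name, value in classification_metrics(tp=90, fp=10, fn=5, tn=95).items():
        print(f"{name}: {value:.4f}")
```

The guards against zero denominators return 0.0 by convention, which avoids a division error when a class receives no positive predictions.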
Table 8 indicates that the implemented techniques achieved better performance than other techniques in terms of the combination of accuracy and average time per epoch. The highest classification accuracy, obtained by EfficientNetB0, was 99.56%, and its training time was much less than that of the InceptionV3, InceptionResNetV2, and MobileNetV2 architectures. The decrease in time per epoch was because the number of parameters in these models was considerably smaller than that of other existing models. A comparison between the number of parameters used in different models is highlighted in Table 1. The novelty of the implemented model lies in the fact that we used depthwise separable convolution, which reduces the network parameters. We considered different deep-learning models: a model with an inception layer, a model with residual connections, a model with depthwise separable convolution, and a model that scales depth, width, and resolution. We fine-tuned the network parameters to achieve better performance accuracy in less time, as shown in Table 8.
To check for overfitting, we divided the dataset in phases into different training and testing ratios. In the case of 80% training and 20% testing image data, we achieved an accuracy of 98.42% with InceptionV3, 99.11% with InceptionResNetV2, 97.02% with MobileNetV2, and 99.56% with EfficientNetB0 for color images. After splitting the dataset into different training and testing ratios, there was not much variation in the accuracy of the models. Hence, they did not suffer from the problem of overfitting. The accuracy of all models for different image types, with loss and number of epochs, is shown in Table 9. Table 10 presents the precision, recall, and F1 score of the implemented models when splitting the dataset into the 80–20 training and testing ratio. EfficientNetB0 had a precision of 0.9953, recall of 0.9971, and F1 score of 0.9961, which were higher than those of the other models.
The accuracy of the MobileNetV2 architecture decreased to 91% when a dropout value of 0.8 was used. Figure 8 shows performance accuracy with respect to the different dropout values used in the network. Figure 9 shows correctly classified results from the test image dataset with their predicted and source classes. The predicted class was returned together with the confidence for that class.
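Returning a predicted class together with its confidence can be sketched as follows. This assumes the final layer of the network produces one raw score (logit) per class and that the confidence is the softmax probability of the top class, which is typical for such architectures but is not stated in the text. The function names, the scores, and the three class labels are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits: Sequence[float],
                            class_names: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable class name and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Hypothetical scores and labels for a three-class example.
    names = ["Tomato healthy", "Tomato early blight", "Tomato late blight"]
    label, confidence = predict_with_confidence([0.5, 3.2, 1.1], names)
    print(f"predicted: {label} (confidence {confidence:.2%})")
```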
Conclusion
Many methods have been developed for the detection and classification of plant diseases using diseased leaves of plants. However, there is still no efficient and effective commercial solution that can be used to identify the diseases. In our work, we used four different DL models (InceptionV3, InceptionResNetV2, MobileNetV2, EfficientNetB0) for the detection of plant diseases using healthy- and diseased-leaf images of plants. To train and test the model, we used the standard PlantVillage dataset with 53,407 images, which were all captured in laboratory conditions. This dataset consists of 38 different classes of healthy- and diseased-leaf images of 14 different species. After splitting the dataset 80–20 (80% of the images for training, 20% of the images for testing), we achieved the best accuracy rate of 99.56% with the EfficientNetB0 model. On average, less time was required to train on the images with the MobileNetV2 and EfficientNetB0 architectures, which took 565 and 545 s/epoch, respectively, on colored images. In comparison with other deep-learning approaches, the implemented deep-learning model has better predictive ability in terms of both accuracy and loss. The time required to train the model was much less than that of other machine-learning approaches. Moreover, the MobileNetV2 architecture is an optimized deep convolutional neural network that limits the parameter number and operations as much as possible, and can easily run on mobile devices.
| # | Name | Date |
|---|---|---|
| 1 | 202141041021-STATEMENT OF UNDERTAKING (FORM 3) [09-09-2021(online)].pdf | 2021-09-09 |
| 2 | 202141041021-REQUEST FOR EARLY PUBLICATION(FORM-9) [09-09-2021(online)].pdf | 2021-09-09 |
| 3 | 202141041021-FORM-9 [09-09-2021(online)].pdf | 2021-09-09 |
| 4 | 202141041021-FORM 1 [09-09-2021(online)].pdf | 2021-09-09 |
| 5 | 202141041021-DRAWINGS [09-09-2021(online)].pdf | 2021-09-09 |
| 6 | 202141041021-DECLARATION OF INVENTORSHIP (FORM 5) [09-09-2021(online)].pdf | 2021-09-09 |
| 7 | 202141041021-COMPLETE SPECIFICATION [09-09-2021(online)].pdf | 2021-09-09 |