
Classification of Fruits using Deep Learning

Keywords: Adam, Fruit classification, Gaussian filter, MobileNetV2, TSR transformation.

Posted Date: April 1st, 2022

License: This work is licensed under a Creative Commons Attribution 4.0 International License.



Classification of fruit varieties is an important task in agriculture for import and export, and many algorithms have been employed for it. During pre-processing, a Gaussian filter is applied to remove noise from the images. A mixed set of fruits is classified into different classes such as apple, orange and banana, and fruit quality is also taken into account to prevent health hazards. In this work, various combinations of fruits are first classified into the proper variety, and the quality of each fruit is then checked as defective or non-defective. For the first part, a Convolutional Neural Network, AlexNet and MobileNetV2 are employed; MobileNetV2 achieved 100% accuracy for fruit type classification. In the second part, fruits of the same kind are fed into a classifier for quality checking. The same classifiers are used for defect classification, where MobileNetV2 gives 99.89% accuracy for orange and 100% for apple.

1. Introduction

Throughout the world, people rely on fruits for survival. Fruits are used in a range of businesses, including the food industry, which aims to deliver a high-quality product. The quality of fruits is vitally important to people's health, and the most difficult part of the process is manually identifying fruit flaws. Researchers are using automated image processing techniques to overcome these obstacles. Consumers benefit greatly from this type of research, and such inspections also help determine market pricing. Computer vision systems and image processing are burgeoning academic fields, and they are a crucial component of fruit analysis. There are numerous methods and techniques for determining image quality. Three types of CNN models are used in this research to determine the optimum accuracy for fruit classification and fruit defect classification.

Juan M. Ponce et al. [1] proposed marker-controlled watershed segmentation for olive fruit variety classification. They investigated six different models: AlexNet, Inception-ResNetV2, Inception V1, Inception V3, ResNet-50 and ResNet-101; among them, Inception-ResNetV2 gave the best accuracy of 95.91%. Hamdi Altaheri et al. [2] organized their work into three classification models for date fruit type, maturity stage and harvesting decision. Multiclass classifiers built on AlexNet and VGGNet were used for type and maturity classification; of these, VGGNet achieved 99% accuracy. Two deep learning architectures, a light model of six CNN layers and VGG16, were proposed by M. Shamim Hossain et al. [3] for classifying fruit varieties, categorizing fruit species based on shape, colour and texture. Three subsystems, estimating date type, maturity and weight, were proposed by Mohammed Faisal et al. [4]. The date type and maturity subsystems made use of four models (ResNet, VGG-19, Inception V3 and NasNet), while SVM regression was used for date weight estimation; ResNet achieved 99% accuracy. Aifeng Ren et al. [5] gathered data from apple and mango fruit slices in order to analyse and classify moisture content over time. Features were extracted in three domains: the frequency domain, the time domain and the time-frequency domain. Wavelets, a time-frequency feature, were used to analyse short-duration pulses with quick and unpredictable variations. SVM, KNN and Decision Tree were the machine learning models used for classification; among the three, the Decision Tree gave 95.45% accuracy for apple and 94.32% for mango.

Guoxiong Zhou et al. [6] proposed the Fuzzy C-Means and Non-Linear Programming algorithms in combination with multivariate image analysis to detect defective apple fruit. FCM is used in the segmentation stage, and the multivariate image analysis method achieved an accuracy of 98%. Gang Xue et al. [7] constructed a hybrid deep learning fruit image categorization framework (CAE-ADN): an attention-based densely connected convolutional network with a convolutional autoencoder. They experimented with three different models, ResNet-50, DenseNet-169 and ADN; the CAE-ADN model achieved an accuracy of 95.86%. Jose Naranjo-Torres et al. [8] proposed a fruit recognition approach using several models such as AlexNet, VGG16, MobileNet, Inception V3 and ResNet-50. They trained their models over 10 epochs using the stochastic gradient descent (SGD) algorithm; AlexNet gave an accuracy of 95.86%. Himer Avila-George et al. [9] classified gooseberry fruit by ripeness level using three colour spaces (RGB, HSV and L*a*b*) to visually classify the fruit. ANN, Decision Tree, SVM and KNN models were trained, and among these the SVM classifier gave an accuracy of 92.47%.

An autonomous vision-based technique using the OpenCV library was proposed by Shital A. et al. [10]; it was used for fruit sorting, grading and flaw identification. Shushree et al. [11] applied the SFTA algorithm for extracting features from the dataset, and a DNN approach was used to train it. The examination of diseases found in fruits was proposed by Ananthi N. et al. [12]. Preprocessing, feature extraction and image de-noising techniques were investigated for analyzing fruit infections; the median filter was employed in the image de-noising process, blob detection was used to improve the image and convert it to binary format, and the common agglomeration problem was solved by K-Means segmentation. A DNN classification technique was used to classify fruits based on whether they were tainted. Meshwa Patel et al. [13] identified fruit quality in orange fruit using image processing with a Support Vector Machine classifier. The first stage was image preparation; Gray Level Co-occurrence Matrix features were used in feature extraction to remove extraneous components and simplify the process, a median filter was applied for de-noising, and morphological image processing was used to enhance the images. Pushpavalli M. [14] advocated a computer vision technique to grade mango fruits. Image preprocessing, segmentation, feature extraction, selection and classification were applied for better results; a median filter was utilized to minimize noise, and segmentation was performed after converting the image to binary representation using the OTSU thresholding method. SoummoSupriya et al. [15] developed a machine vision-based expert system for fruit recognition in order to retrieve the cultural past. Image segmentation was carried out using the K-Means clustering algorithm, and statistical and Grey Level Co-occurrence Matrix features were extracted from the images.

Yogesh et al. [16] introduced three primary steps: model building, model testing and model configuration. Apple and mangosteen fruits were analyzed using a CNN algorithm, an ANN classifier was used for pear categorization, and strawberries were classified using SVM, with a high level of precision. Deepti C. et al. [17] set out to discover the most efficient and cost-effective method of detecting artificially ripened fruits. Functional needs, non-functional requirements, assumptions and dependencies, and restrictions were the four kinds of system requirements. To determine whether a fruit is fake or real, a CNN algorithm with appropriate distance and orientation was applied. RashminPriya et al. [18] proposed a method to name the disease in orange fruits. Shape, color and texture were taken into account for feature extraction; two types of filters, the median filter and the box filter, were employed, and segmentation was done using the OTSU method. The K-Means method was utilized for data mining and cluster analysis. ShaliniGnanavel et al. [19] set out to determine fruit quality and provide organic consumption levels; a conductivity sensor can be used for a variety of purposes, including quality monitoring, detecting artificially ripened fruit, and detecting pesticide residue levels in fruits.

Nareshkumar C. et al. [20] proposed an automatic vision-based technology for fruit defect detection to replace the manual approach. Image acquisition, image noise removal and image segmentation were the three procedures covered, and a Gaussian filter was applied to remove background noise. The solution proposed by Neha et al. [21] used computer vision to detect bruising in tomatoes in a non-destructive manner. The computer vision system has a total of 12 layers: one input layer, one output layer, and ten hidden layers, with convolutional neural networks used to extract features from the image. A temperature sensor, a humidity sensor and a light-dependent resistor make up the IoT-based food storage unit monitoring system; a Wi-Fi module connects to the internet and updates the room's status. The CNN model extracts and categorizes features and patterns in the images.

An image processing-based technique for identifying apple fruit quality loss was proposed by Ramya C. et al. [22]. There are three steps: data gathering, CNN development and data augmentation. The pooling layer shrinks the image in the second step, reducing a 7x7 matrix to a 4x4 matrix by stacking layers. The fully connected layer is useful for training the network, predicting outcomes and categorizing inputs, and the fruit disease is identified in the final stage. IshdeepSingla et al. [23] compared a number of approaches. Image acquisition, image preprocessing, image segmentation, feature extraction and classification were the five steps required to train the apple model. Filters, contrast stretching, grouping, histogram equalization, grayscale conversion and RGB-to-binary conversion are used to improve the image; in the segmentation phase, morphological filtration, thresholding and clustering were utilized. Features such as color, shape, texture and size were extracted to help categorize the fruits. Sumati M. et al. [24] presented an automatic vision-based system for sorting and grading fruits by color and size, using color-based fuzzy reasoning to sort mango fruit. A. K. Dubey et al. [25] defined segmentation methodologies for detecting exterior faults in pome fruits; the notion of marker-controlled watershed segmentation is founded on flooding. The approach of Manjesh R. et al. [26] was capable of recognizing fruit based on shape, color and texture. Texture was examined using GLCM, which considers pairs of pixels with certain values; the color histogram converts a color image to an HSV image while keeping the hue and saturation, and the HOG feature was used for object detection. Vibhute A. S. et al. [27] proposed a technique for evaluating fruits based on their quality. Color was detected from the fruit's RGB values, with the RGB image transformed into HSV during color detection. The major and minor axis lengths were calculated using the Euclidean distance method for size detection, and these external parameters were used to grade and sort the objects; color and size were detected via thresholding.

Kumar Mondal et al. [28] used a cutting-edge detection framework in their work: a CNN-based fruit recognition system operating on visual data captured by a smartphone. During feature extraction, the network used a variety of convolution and pooling operations to detect features, and the fully connected layer was utilized as the classifier. Deepika Bairwa et al. [29] designed a system to classify fruits based on shape, color and texture using image processing techniques; pre-processing eliminates noise, and GLCM was used for feature extraction. Malini V. L. et al. [30] created their data by filming the fruits while they were spun by a motor and then extracting frames. The technique proposed by Misha et al. [31] was completely automated and capable of grading a large number of fruits, and was used to predict the maturity and quality of purchased fruits. During image acquisition and pre-processing, a Gaussian filter was employed to reduce noise; fuzzy segmentation simplifies the appearance of the image, and intensity values are adjusted to improve image quality. Feature extraction and color features were considered by this system. A sparse representation classifier represented the fruit image sparsely using a part of the training data, so the classifier can assign a class label to test samples directly based on the training data.

2. Proposed Work

Farmers, purchasers, and shopkeepers can utilize the deep learning techniques used in this work to identify fruit type and quality. The purpose of this work is to eliminate health concerns associated with tainted fruits.

2.1 Fruit type classification

To expand the size of the datasets, the input images are augmented. The images are then preprocessed before being categorized using classifier models such as DCNN, AlexNet, and MobileNetV2; Figure 1 shows the block diagram for fruit type classification.

The datasets for fruit type classification are sourced from Google Images. The different fruit types, apple, orange, and banana, are taken as classes. There are 668 images in total for this framework.

2.1.1 Preprocessing

The original input image is converted to a grayscale image as shown in Fig. 2. Next, the images are subjected to the TSR transformation: Translation, Scaling, and Rotation are performed as shown in Fig. 3.

Translation is the movement of an object in a straight line from one point to another, calculated using equations (1) and (2). The object is moved from one coordinate location to another in this step.

x1 = x + Tx        (1)

y1 = y + Ty        (2)

Assume P is a point with coordinates (x, y). To translate the point from one coordinate position (x, y) to another (x1, y1), the translation distances Tx and Ty are added to the original coordinates, giving the translated point (x1, y1).

Scaling is a tool for adjusting or changing the size of objects. The change is calculated using scaling factors, as in equations (3) and (4).

x1 = x * Sx        (3)

y1 = y * Sy        (4)

Sx is the scaling factor in the x-direction and Sy the scaling factor in the y-direction. If (x, y) is the starting point, then (x1, y1) are the coordinates after scaling.

Rotation is the process of adjusting an object's angle. The rotation can be done in either a clockwise or anticlockwise direction. For rotation, the angle of rotation and the rotation point must be given. The rotation point, also called a pivot point, is the point about which the object is turned; θ denotes the angle of rotation.

Figure 3(a) shows the translated image, Figure 3(b) the scaled image, and Figure 3(c) the rotated image.
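As an illustration of equations (1)-(4) and the rotation step, the following sketch applies the TSR transformation with OpenCV affine warps; the translation offsets, scaling factors, and rotation angle are placeholder values, not the ones used in this work.

```python
import cv2
import numpy as np

def tsr_transform(img, tx=20, ty=10, sx=1.2, sy=1.2, angle=30):
    """Apply Translation, Scaling and Rotation (TSR) to an image."""
    h, w = img.shape[:2]

    # Translation: x1 = x + Tx, y1 = y + Ty  (equations 1 and 2)
    M_t = np.float32([[1, 0, tx], [0, 1, ty]])
    translated = cv2.warpAffine(img, M_t, (w, h))

    # Scaling: x1 = x * Sx, y1 = y * Sy  (equations 3 and 4)
    M_s = np.float32([[sx, 0, 0], [0, sy, 0]])
    scaled = cv2.warpAffine(img, M_s, (w, h))

    # Rotation by `angle` degrees about the image centre (the pivot point)
    M_r = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M_r, (w, h))

    return translated, scaled, rotated
```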

A Gaussian filter is applied to minimize noise, as shown in Figure 4. A Gaussian blur removes outlier pixels and high-frequency components from the image. After these preprocessing stages, normalization is carried out as shown in Figure 5; it improves the pixel range and intensity values.
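A minimal sketch of this noise-removal and normalization stage, assuming OpenCV and a 5x5 Gaussian kernel (the kernel size is not stated in the text):

```python
import cv2
import numpy as np

def denoise_and_normalize(img):
    """Smooth high-frequency noise with a Gaussian blur, then rescale pixels to [0, 1]."""
    # Gaussian filter: suppresses outlier pixels / high-frequency components
    blurred = cv2.GaussianBlur(img, (5, 5), 0)

    # Normalization: map the 0-255 pixel range to [0, 1]
    normalized = blurred.astype(np.float32) / 255.0
    return normalized
```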

2.2 Classification

After preprocessing, the images are fed into the classifiers. In this work, Deep CNN, AlexNet, and MobileNetV2 are employed as classifiers.

2.2.1 Deep convolutional neural network

A CNN is one of the most popular deep learning models. It learns local and spatial features and patterns directly from raw data such as pictures, video, text, and sound using deep convolutional layers. A DCNN learns features from data automatically, removing the need to extract them manually. Through a sequence of successive convolutional layers, it can create complex features by combining simple characteristics: the early layers learn to recognize low-level elements such as edges and curves, while the deeper layers learn to recognize complicated high-level features such as complete objects in a picture.


Figure 6 displays the DCNN architecture. The Conv2D function accepts four arguments: the first is the number of filters, which is 32; the second is the shape of each filter, which is 3x3; the third is the input shape, which is 224; and the fourth is the image type (RGB). The activation function ReLU stands for Rectified Linear Unit: if the function receives a negative value it returns 0, and if it receives a positive value it returns that value, which keeps the computation required to run the neural network from growing exponentially. The formula used for the ReLU activation function is y = max(0, x). To avoid overfitting, three convolution layers each followed by a max-pooling layer are used, and a dropout layer is inserted after the max-pooling operation. Sparse categorical cross-entropy is used as the loss function to compile the model. For a smoother curve, a low learning rate of 0.000001 is used. The model is trained and classified, and then the accuracy rate is calculated.
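The description above translates into a Keras model roughly as follows; the filter counts of the second and third convolution layers and the dropout rate are assumptions, since they are not specified in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dcnn(num_classes=3):
    """Three Conv2D + MaxPooling blocks, dropout, then a dense softmax classifier."""
    model = keras.Sequential([
        layers.Input(shape=(224, 224, 3)),                    # 224x224 RGB input
        layers.Conv2D(32, (3, 3), activation="relu"),         # 32 filters of size 3x3
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),         # filter counts assumed
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),                                   # dropout rate assumed
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-6),  # low rate for a smoother curve
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```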

2.2.2 AlexNet

The AlexNet model's input has dimensions of 227x227x3, followed by a convolutional layer with 96 filters of size 11x11 and a stride of 4. A MaxPooling layer of size (3, 3) with a stride of 2 reduces the output to 27x27x96, followed by another convolutional layer with 256 filters of size (5, 5) and 'same' padding, retaining the height and width of the previous layer and producing a 27x27x256 output. MaxPooling is applied again, lowering the size to 13x13x256. A convolutional layer with 384 filters of size (3, 3) and 'same' padding is applied twice, yielding 13x13x384, followed by another convolutional layer with 256 filters of size (3, 3) and 'same' padding, yielding 13x13x256. MaxPooling is applied once more, and the dimensions are reduced to 6x6x256. The output is then flattened and two fully connected layers with 4096 units each are created, connected to a 1000-unit softmax layer, as shown in Figure 7. The network can be adapted to classify any number of classes; since six classes are required here, the output softmax layer is built with six units. The softmax layer calculates the probability of each class to which an input image could belong. The Adam optimizer is used in this model. The model is trained and classified, and the accuracy rate is calculated.
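A sketch of this AlexNet layout in Keras is given below; the first convolution is left unpadded here so that the layer sizes match the 27x27x96 stated after the first pooling step, and the number of output units is a parameter.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_alexnet(num_classes=6):
    """AlexNet-style stack: five convolution layers, three max-pools, two 4096-unit dense layers."""
    model = keras.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, (11, 11), strides=4, activation="relu"),      # -> 55x55x96
        layers.MaxPooling2D((3, 3), strides=2),                          # -> 27x27x96
        layers.Conv2D(256, (5, 5), padding="same", activation="relu"),   # -> 27x27x256
        layers.MaxPooling2D((3, 3), strides=2),                          # -> 13x13x256
        layers.Conv2D(384, (3, 3), padding="same", activation="relu"),   # -> 13x13x384
        layers.Conv2D(384, (3, 3), padding="same", activation="relu"),   # -> 13x13x384
        layers.Conv2D(256, (3, 3), padding="same", activation="relu"),   # -> 13x13x256
        layers.MaxPooling2D((3, 3), strides=2),                          # -> 6x6x256
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),
        layers.Dense(4096, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),                 # six units in this work
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```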

2.2.3 MobileNetV2

MobileNetV2 is a CNN architecture model for image classification and mobile vision. MobileNetV2 requires relatively little computational power to run or to apply transfer learning. This makes it ideal for mobile devices, embedded systems, and PCs with limited computing capability or no GPU, without affecting the accuracy of the results. It is also well suited to web browsers, which have limitations in terms of computation, graphics processing, and storage.

MobileNetV2 is based on a streamlined design that leverages depthwise separable convolutions to build lightweight deep neural networks for mobile and embedded vision applications, with two simple global hyper-parameters that efficiently trade off latency and accuracy. Depthwise separable filters, also known as depthwise separable convolutions, are the foundation of MobileNet, as shown in Figure 8; the network structure is another component that improves performance. The fruit type and defect datasets are partitioned into training and testing sets in an 80:20 ratio. The batch size is 32 and the images are resized to 224x224. For fruit type classification, the last fully connected layer is frozen and three layers are placed before it; this model is called the fruit model. It has 157 layers in total, with 3.5 million parameters. The MobileNetV2 model includes 1000 output neurons, but in this study only three are used, corresponding to the categories learned. Two new layers are included for fruit defect classification, and this model is termed the fruit defect model; it has 156 layers in total, with 3.5 million parameters. The model is compiled using Adam as the optimizer and categorical cross-entropy as the loss function. The softmax function is utilized as the activation function in the output layer, as it is in neural network models that predict a multinomial probability distribution. The model is trained and classified, and the accuracy rate is calculated.
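A minimal transfer-learning sketch along these lines is shown below; the exact layers added on top of the frozen base (here a global average pooling layer and a 128-unit dense layer) are assumptions, since the text does not name them.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mobilenetv2_classifier(num_classes=3):
    """Frozen MobileNetV2 base with a small classification head on top."""
    base = keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False                                 # freeze the pretrained layers

    model = keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),                    # added layers (assumed)
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),    # 3 fruit types or 2 defect classes
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Fruit type model: 3 classes; fruit defect model: 2 classes (defective / non-defective)
fruit_model = build_mobilenetv2_classifier(num_classes=3)
defect_model = build_mobilenetv2_classifier(num_classes=2)
```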

2.2.4 Adam

Adam is a neural network optimization solver that is computationally efficient, requires little memory, and is suited to problems with a lot of data and parameters. Adam is an extension of stochastic gradient descent: it adjusts the learning rate for each weight of the neural network using estimates of the first and second moments of the gradient. The Adam optimizer is utilized in all three models: DCNN, AlexNet and MobileNetV2.
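The update rule this describes can be written out as a short NumPy sketch using the commonly quoted default hyper-parameters (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a weight array w given its gradient (t is the step count, starting at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentred variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected moment estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-weight adaptive step
    return w, m, v
```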

The images are trained with these three models, and the fruit type is classified into three classes: apple, orange and banana.

3. Classi cation Of Images As Defect Or Non-defect

Fruit quality categorization helps with import and export, as well as numerous fruit-related businesses. Defects are spotted easily, which saves time for workers and ensures that fruits reach consumers fresh. This fruit quality classification method is used to locate defective fruits in a short amount of time.

3.1 Data Augmentation

The datasets for fruit defect classification are acquired from Google Images. The fruits chosen for this framework are apple and orange. There are 1728 images in the apple dataset and 2331 images in the orange dataset, for a total of 4059 images. This framework distinguishes between defective and non-defective fruits. To expand the dataset, data augmentation is performed, which also reduces overfitting. Rotation, shear, zoom, brightness adjustment, and horizontal flip are applied to produce the augmented images.
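A sketch of this augmentation step using Keras' ImageDataGenerator is given below; the specific ranges and the directory layout are illustrative assumptions, as the text does not state them.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation, shear, zoom, brightness and horizontal flip, as listed above
augmenter = ImageDataGenerator(
    rotation_range=30,
    shear_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.8, 1.2),
    horizontal_flip=True,
    rescale=1.0 / 255,
)

train_gen = augmenter.flow_from_directory(
    "dataset/train",                # hypothetical directory layout
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```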

3.2 Classification

The images are fed into the three classifier models described above: DCNN, AlexNet and MobileNetV2. In this framework, the apple and orange fruits are labeled and classified into two types, defective and non-defective.


4. Results And Discussion

4.1 Result for fruit type classification

Three models are trained in the proposed framework for fruit type classification: DCNN, AlexNet and MobileNetV2. The Deep Convolutional Neural Network model has a learning rate of 0.000001 and is trained over 110 epochs in 3 hours; its validation accuracy is 87.5 percent and its training accuracy is 100 percent. The AlexNet model is trained across 100 epochs in 1 hour 40 minutes; its validation accuracy is 99.25 percent and its training accuracy is 99.81 percent. The MobileNetV2 model is trained over a period of 12 minutes using 10 epochs; its validation accuracy is 100 percent and its training accuracy is 100 percent. Of these three models, MobileNetV2 has the best fruit type accuracy. It recognizes the fruit type and categorizes it as shown in Figure 9.

Figure 10 shows the accuracy rate of training and validation with various epochs for fruit type classification.

The loss rate for the training and validation process is shown in Figure 11, plotted as cross-entropy versus epochs, for fruit type classification.

Table 1. Comparison of accuracy for fruit type classification with different epochs

Model                               Epochs   Training time   Validation accuracy
Deep convolutional neural network   110      3 hrs           87.5%
AlexNet                             100      1 hr 40 mins    99.25%
MobileNetV2                         10       12 mins         100%

Table 1 compares the accuracy of the three models for fruit type classification. Of the three, the MobileNetV2 model provided 100 percent accuracy in a short period of time, making it the best model for fruit type classification. Figure 12 shows this comparison graphically based on accuracy; among the three models, MobileNetV2 has the best accuracy.

4.2 Result for defect classification

Three models are trained in this work for the classification of fruits as defective or non-defective: DCNN, AlexNet and MobileNetV2. For the apple dataset, the Deep Convolutional Neural Network model has a learning rate of 0.000001 and is trained across 300 epochs in 3 hours 25 minutes; its validation accuracy is 68 percent and its training accuracy is 84.55 percent. The AlexNet model is trained over 100 epochs in 1 hour 6 minutes; its validation accuracy is 70.59 percent and its training accuracy is 99.36 percent. The MobileNetV2 model is trained over a 10-minute period using 20 epochs; its validation accuracy is 100 percent and its training accuracy is 100 percent. Comparing the accuracy of the three models, MobileNetV2 performs best. This model is quite useful for distinguishing between defective and non-defective fruits, as shown in Figure 13.

Figure 14 shows the accuracy rate of training and validation with various epochs for apple defect classification.

The loss rate for the training and validation process is shown in Figure 15, plotted as cross-entropy versus epochs, for apple defect classification.

Table 2. Comparison of accuracy for the defect classification of apple fruit with different epochs

Model                               Epochs   Training time    Validation accuracy
Deep convolutional neural network   300      3 hrs 25 mins    68%
AlexNet                             100      1 hr 6 mins      70.59%
MobileNetV2                         20       10 mins          100%

Table 2 compares the three proposed models for defect and non-defect classification based on accuracy. The three models were developed for two fruits, apple and orange. For apple, the MobileNetV2 model provided the most accurate results, reaching 100 percent accuracy in a short amount of time. The best model for apple defect classification is shown graphically in Figure 16, based on its accuracy.

For the orange dataset, the Deep Convolutional Neural Network model has a learning rate of 0.000001 and takes 35 minutes to train across 10 epochs; its training accuracy is 72.63 percent and its validation accuracy is 90 percent. The AlexNet model is trained across 10 epochs in 40 minutes; its training accuracy is 97 percent, while its validation accuracy is 53 percent. The MobileNetV2 model is trained using 10 epochs over a 22-minute period; its training accuracy is 100 percent and its validation accuracy is 99.89 percent. When the accuracy of the three models is compared, the MobileNetV2 model comes out on top. This model is very effective for determining whether a fruit is defective or not, as shown in Figure 17.

Figure 18 shows the accuracy rate of training and validation with various epochs for orange defect classification.
