Project report, Introduction to Artificial Intelligence: facial recognition and classification between Vietnamese and foreigners


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

Hanoi, 2023

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

4.2 Experimental result

4.3 Future development

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

Facial recognition technology is a rapidly advancing field that has a significant impact on many industries. This technology enables critical emerging applications such as security and surveillance, authentication, and human-computer interaction. As the technology continues to evolve, it will continue to transform various fields in the future.

In the "Introduction to Artificial Intelligence – IT3160E" course, we gained fundamental knowledge about facial recognition technology. In our project, our goal was to distinguish between Vietnamese and foreign faces using facial recognition algorithms.

This report explains the theoretical basis of the algorithms we used and how we implemented them. We then analyze the effectiveness of the models in correctly classifying Vietnamese and foreign faces and identify areas for improvement.

Our experiments involved machine learning algorithms such as convolutional neural networks (CNNs) and support vector machines (SVMs). CNNs can automatically learn relevant features from image data, while SVMs perform classification based on learned patterns.

We trained and tested the models on a dataset containing photos of Vietnamese and foreign faces. We evaluated the models based on their accuracy and ability to generalize to new images. However, we faced some challenges, such as the need for a larger and more diverse dataset to improve performance.

Keywords: Facial Recognition – ResNet50

</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

Chapter 1 Introduction

1.1 History of facial recognition technology

Facial recognition technology is a powerful tool that uses facial features to identify and verify individuals. It has numerous applications, including access control, surveillance, law enforcement, and targeted advertising. One of its primary functions is access control, where it is used to verify the identity of individuals. This technology has also become more accurate with the help of machine learning and AI.

However, there are concerns about potential biases, privacy, and security issues. Facial recognition algorithms can recognize facial landmarks like the eyes, eyebrows, nose, and mouth shape, and by measuring the distances and relative sizes of these features, a mathematical representation of the face is created for comparison and matching. These algorithms can also introduce biases and errors in the identification process. As a result, governments and organizations must establish guidelines and regulations to ensure ethical and responsible development and use of the technology.

In conclusion, facial recognition technology has the potential to transform various fields, but it also raises important questions and concerns. While it has numerous applications in access control, surveillance, law enforcement, and targeted advertising, there are concerns about potential biases, privacy, and security issues. By establishing ethical guidelines and responsible development, we can ensure that this technology is used in a way that benefits society as a whole.

</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">


1.2 Customer classification

Separating customers into Vietnamese and foreign groups can help businesses improve their strategies, marketing and services. Foreign customers in Vietnam offer new opportunities, so distinguishing them from Vietnamese customers enables businesses to develop customized plans to meet their unique needs. Vietnamese and foreign customers often differ in language, culture, expectations and spending habits. A better understanding of these differences can enhance the experience for both customer types.

Facial recognition could potentially allow businesses to automatically categorize customers as Vietnamese or foreign upon entry, in real time. While this raises privacy and bias concerns, if implemented responsibly it could supply data to optimize service for different nationalities. To build an effective categorization system, companies will need a large and diverse dataset of Vietnamese and foreign customer facial images to train their systems. They must also implement policies, disclosures and consent processes to gain customer trust and address ethical issues around the technology’s use. This will be key to realizing the benefits of customer categorization while minimizing potential harm.

1.3 Problem description

The overall goal of this research is to create a facial recognition and classification model that can accurately differentiate between Vietnamese and foreign faces. The model will be trained on a large dataset containing thousands of facial images of Vietnamese and foreign people.

The model will perform facial recognition and classification on two types of input images: stored facial images as well as real-time image frames continuously captured by cameras and other image capturing devices.

The intended functionality of the final model is to provide key facial attributes for any given input image, including:

• Age

• Gender

• Ethnicity as either Vietnamese or foreigner

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">


• Emotional state

Through extensive training and testing, the goal is to optimize the model’s performance metrics, such as facial recognition accuracy and classification accuracy when distinguishing between Vietnamese and foreign faces. This designed functionality has promising applications for use cases such as targeted advertising, access control, and customer insights and segmentation.

</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

Chapter 2 Model

Our project will utilize three sets of pre-trained model weights: age detection, gender detection, and emotion detection. We will specifically train a model of our own to differentiate between Vietnamese and foreign individuals.

To train the model for distinguishing between Vietnamese and foreign individuals, we need to gather and process the data, followed by model construction and training.

1. Data collection: We will collect a diverse dataset consisting of images or relevant data from both Vietnamese and foreign individuals. This dataset should adequately represent the characteristics and variations present in both groups.

2. Data preprocessing: The collected data will undergo preprocessing steps such as image resizing, normalization, and data augmentation techniques to enhance the quality and variety of the dataset. This helps the model generalize well to unseen data.

3. Model construction: We will design and build a suitable neural network architecture for the task of differentiating between Vietnamese and foreign individuals. This architecture may involve various layers, such as convolutional layers for image analysis, followed by fully connected layers for classification.

4. Model training: The constructed model will be trained using the preprocessed dataset. The training process involves feeding the model with input data and adjusting its internal parameters through an optimization algorithm (e.g., stochastic gradient descent).

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">


The goal is to minimize the difference between the model’s predictions and the ground truth labels in the training dataset.

During training, it’s crucial to validate the model’s performance on a separate validation dataset to monitor its progress and prevent overfitting. Tuning the model’s hyperparameters, such as the learning rate and regularization strength, may also be necessary to optimize its performance.
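As a rough illustration of what "minimizing the difference between predictions and ground truth labels" can mean in practice, the sketch below computes the categorical cross-entropy for a single sample in plain Python. The loss name is the one passed to the compile call in Section 2.2.3; the helper name `categorical_cross_entropy`, the class order (index 0 = Vietnamese, index 1 = foreigner) and the example probabilities are assumptions made only for this illustration.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Clip predictions away from 0 and 1 so that log() is always defined.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))

# Hypothetical example: the true class is index 0.
label = [1.0, 0.0]
confident = categorical_cross_entropy(label, [0.9, 0.1])
unsure = categorical_cross_entropy(label, [0.6, 0.4])
wrong = categorical_cross_entropy(label, [0.1, 0.9])
print(confident, unsure, wrong)
```

The loss grows as the probability assigned to the true class shrinks, which is the quantity the optimizer tries to drive down during training.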

2.1 Data Collection and Data preprocessing

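import cv2
import face_recognition
import numpy as np

def get_face_img(img):
    # Load the image from its path and convert OpenCV's BGR order to RGB
    img = cv2.imread(img)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Each detected box is a (top, right, bottom, left) tuple
    boxes_face = face_recognition.face_locations(img)
    face_image = None
    if len(boxes_face) != 0:
        for box_face in boxes_face:
            box_face_fc = box_face
            x0, y1, x1, y0 = box_face
            box_face = np.array([y0, x0, y1, x1])
            # Crop rows top:bottom and columns left:right;
            # if several faces are found, the last one is returned
            face_image = img[x0:x1, y0:y1]
    return face_image

</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">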
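The cropping step depends on the box order returned by `face_recognition.face_locations`, which is (top, right, bottom, left). The following minimal sketch isolates that slicing logic on a synthetic array so it can be checked without a real photo; the helper name `crop_face`, the image size and the box coordinates are assumptions for illustration.

```python
import numpy as np

def crop_face(img, box):
    """Crop a face from an H x W x C image given a (top, right, bottom, left) box."""
    top, right, bottom, left = box
    # NumPy indexes rows (vertical axis) first, then columns (horizontal axis).
    return img[top:bottom, left:right]

# Synthetic 100 x 200 RGB image standing in for a real photo (assumption).
img = np.zeros((100, 200, 3), dtype=np.uint8)
box = (10, 150, 60, 50)  # top, right, bottom, left
face = crop_face(img, box)
print(face.shape)  # (50, 100, 3)
```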

Then, to make the data usable for training the model, we further process the output of the get_face_img function:

from PIL import Image
import torchvision.transforms as transforms

face_image = get_face_img(image_path)
if face_image is not None:
    # Transform the image to a tensor that can be fed into the model
    data_transform = transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    # Add a leading batch dimension: (3, 128, 128) -> (1, 3, 128, 128)
    input_tensor = data_transform(Image.fromarray(face_image)).unsqueeze(0)

The main framework we use to build our model is TensorFlow, but the preprocessing code shown here relies on torchvision. The image arrives as a NumPy array and must be converted to a tensor before it can be fed into the model. The data_transform pipeline completes this task:

import torchvision.transforms as transforms

data_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
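To make the effect of the last two transforms concrete, here is a small plain-Python sketch of the arithmetic they apply to a single pixel value: `ToTensor` rescales 8-bit values to the range [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then maps that range to [-1, 1]. The helper names `to_unit_range` and `normalize` are made up for this illustration and are not part of torchvision.

```python
def to_unit_range(pixel):
    """Mimic the scaling done by ToTensor: uint8 [0, 255] -> float [0.0, 1.0]."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize for one channel value: (x - mean) / std."""
    return (value - mean) / std

for pixel in (0, 128, 255):
    print(pixel, normalize(to_unit_range(pixel)))
```

Black pixels end up at -1, white pixels at 1, and mid-gray close to 0, which keeps the network inputs roughly centered around zero.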

Now, we can load the training data and test data and start processing:

</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">


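from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Each sub-folder of data/train and data/test is treated as one class
train_data = ImageFolder("data/train", transform=data_transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_data = ImageFolder("data/test", transform=data_transform)
# The test loader must wrap test_data (not train_data) and keeps a fixed order
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)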

After completing all these steps, we can finally train our deep learning model with the processed data.

2.2 Neural network structure

2.2.1 ResNet50 model

ResNet (short for Residual Network) is a popular deep learning model architecture known for its ability to train very deep neural networks effectively. It introduces skip connections that allow the network to learn residual mappings, enabling the training of deeper models without suffering from the vanishing gradient problem.

In summary, the code defines a ResNet model architecture with multiple residual blocks

</div><span class="text_page_counter">Trang 11</span><div class="page_container" data-page="11">


for image classification.

Initial convolutional layer

This initial block receives the input tensor and prepares it before it is fed into the residual blocks.

• Convolution: expands the input tensor from 3 channels to 32 channels.

• Batch Normalization: normalizes the distribution of the activations within each mini-batch.

• Activation: ReLU.

• Max Pooling: takes the maximum value within each (2, 2) window of the receptive field, separately for each channel of the tensor.
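The max pooling step can be sketched in a few lines of NumPy. This is only an illustration of the operation on a tiny array, assuming a stride of 2 (the report gives the (2, 2) window but not the stride); the function name `max_pool_2x2` is hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on an H x W x C array (H and W even)."""
    h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = max_pool_2x2(x)
print(pooled[:, :, 0])  # [[ 5.  7.] [13. 15.]]
```

Each spatial dimension is halved while the number of channels is unchanged.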

Residual block

A residual block is a fundamental building block used in residual neural networks (ResNet). It is designed to address the vanishing gradient problem and enable the training of very deep neural networks effectively. The key idea behind a residual block is to introduce skip connections that allow the network to learn residual mappings, which are the difference between the input and output of a block (this construction will be explained later).

This network uses 8 residual blocks, 4 of which downsample the feature map to half its size.

</div><span class="text_page_counter">Trang 12</span><div class="page_container" data-page="12">


Global Average Pooling

Takes the mean value of each channel of the tensor:

• This layer converts all channels of the residual block’s output into nodes of the fully connected layer (512 channels = 512 nodes).

• Like the Flatten function, Average Pooling turns the feature maps into a vector for the fully connected layer. Unlike Flatten, which lays out every value of the input in a one-dimensional vector, Average Pooling reduces the spatial dimensions of the input by taking the average value within each pooling region.
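The difference between the two operations is easiest to see from the output shapes. The sketch below uses a random array as a stand-in for the last residual block's output; the 512 channels come from the text above, while the 4 x 4 spatial size is an assumption chosen only for illustration.

```python
import numpy as np

# Hypothetical output of the last residual block: a 4 x 4 map with 512 channels.
features = np.random.default_rng(0).normal(size=(4, 4, 512))

# Global average pooling: one mean per channel -> 512 values.
gap = features.mean(axis=(0, 1))

# Flatten: keeps every spatial position -> 4 * 4 * 512 values.
flat = features.reshape(-1)

print(gap.shape, flat.shape)  # (512,) (8192,)
```

Global average pooling therefore gives the fully connected layer far fewer inputs than Flatten would, regardless of the spatial size of the feature map.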

Fully Connected Network

The prediction is a vector of predicted class probabilities, laid out like the one-hot labels, with each element in the vector corresponding to a class. The predicted probabilities fall within the range (0, 1). The number of elements in the vector is equal to the number of classes in the problem. Hence, we use the sigmoid function as the activation function of the output layer.

</div><span class="text_page_counter">Trang 13</span><div class="page_container" data-page="13">


Additionally, the final layer of the network, which produces the predictions, has the same number of nodes as the number of classes. Each node in the final layer corresponds to a specific class, and the output of the network represents the probabilities of each class based on the input data, for example:

• pred_1 = [[0.873, 0.233]] → pred_1[0][0] > pred_1[0][1] → Vietnamese.

• pred_2 = [[0.551, 0.773]] → pred_2[0][0] < pred_2[0][1] → Foreigner.
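The comparison in the two examples amounts to picking the index of the largest score. A minimal sketch of that decoding step, using the example values from the report, could look as follows; the names `CLASS_NAMES` and `decode_prediction` are hypothetical.

```python
CLASS_NAMES = ["Vietnamese", "Foreigner"]  # index 0 and index 1, as in the examples

def decode_prediction(pred):
    """Return the class name with the highest score in a (1, 2) prediction."""
    scores = pred[0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best]

pred_1 = [[0.873, 0.233]]
pred_2 = [[0.551, 0.773]]
print(decode_prediction(pred_1))  # Vietnamese
print(decode_prediction(pred_2))  # Foreigner
```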

2.2.2 Residual Block

</div><span class="text_page_counter">Trang 14</span><div class="page_container" data-page="14">


Convolution block

This block applies another convolutional layer with a 3 × 3 kernel size and the specified number of filters. It is followed by:

• Convolution.

• Batch normalization.

• 1 × 1 convolution to match the dimensions of the shortcut with the number of filters.

• Batch normalization is applied to the shortcut.

</div><span class="text_page_counter">Trang 15</span><div class="page_container" data-page="15">


Addition and activation

The output of the identity block is added to the shortcut, and ReLU activation is applied to the sum. Dropout regularization is then applied.

2.2.3 Optimizer and Loss function
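The shortcut projection and the add-then-ReLU step can be sketched with NumPy, under simplifying assumptions: the main convolution path is replaced by a random stand-in tensor, and batch normalization and dropout are left out. A 1 × 1 convolution is just a linear map over the channel axis applied at every pixel, so it is written as a matrix product here. All function names, shapes and channel counts are hypothetical.

```python
import numpy as np

def conv1x1(x, weights):
    """1 x 1 convolution: a per-pixel linear map over the channel axis."""
    # x: (H, W, C_in), weights: (C_in, C_out) -> (H, W, C_out)
    return x @ weights

def relu(x):
    return np.maximum(x, 0.0)

def residual_add(main_path, shortcut_input, shortcut_weights):
    """Project the shortcut to the main path's channel count, add, then ReLU."""
    shortcut = conv1x1(shortcut_input, shortcut_weights)
    return relu(main_path + shortcut)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 32))       # block input with 32 channels
main = rng.normal(size=(8, 8, 64))    # stand-in for the conv/BN path output
w_short = rng.normal(size=(32, 64))   # 1 x 1 projection: 32 -> 64 channels
out = residual_add(main, x, w_short)
print(out.shape)  # (8, 8, 64)
```

Without the projection, the 32-channel input could not be added to the 64-channel main path, which is why the shortcut needs its own 1 × 1 convolution whenever the number of filters changes.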

# Assuming the Keras API bundled with TensorFlow
from tensorflow.keras.optimizers import Adam

input_shape = (128, 128, 3)
num_classes = 2

optimizer = Adam(learning_rate=0.0003)
# resnet_model builds the network described in Section 2.2.1
model = resnet_model(input_shape, num_classes)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

</div><span class="text_page_counter">Trang 16</span><div class="page_container" data-page="16">

Optimizer

Adam is a combination of Momentum and RMSprop. During training with this type of data, we observed that Adam provides higher accuracy than the other optimization algorithms we tried.

Learning rate (custom): 0.0003, the value we found to work best on our data.
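To show how Adam combines the two ideas, the following plain-Python sketch performs one update on a single scalar parameter. The learning rate 0.0003 is the value quoted above; the remaining hyperparameters (`beta1`, `beta2`, `eps`) are commonly used defaults and are assumptions here, since the report does not state them. The function name `adam_step` is hypothetical.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad         # momentum-style average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # RMSprop-style average of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=4.0, m=m, v=v, t=1)
print(param)
```

On the first step the update has a magnitude of roughly the learning rate whatever the size of the gradient, because the gradient is divided by the square root of its own squared average.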

</div><span class="text_page_counter">Trang 17</span><div class="page_container" data-page="17">

Chapter 3 Analysis

In this section, we discuss the execution and flow of the project, focusing on the face recognition application. The application offers two modes: image mode and real-time camera mode. In image mode, when the "input image" parameter is received, the application automatically retrieves the image file from the provided file path. In real-time camera mode, the application activates the device’s camera to capture live video frames for processing.

The image or video frames are then passed into the f_face_info class, which contains several sub-classes responsible for different tasks. These sub-classes include vietnamese_detection, emotion_detection, gender_detection, and age_prediction. Each sub-class loads its corresponding model and provides predictions based on the input data.

# instantiate the detectors
age_detector = f_my_age.Age_Model()
gender_detector = f_my_gender.Gender_Model()
race_detector = f_my_race.Race_Model()
emotion_detector = f_emotion_detection.predict_emotions()

</div><span class="text_page_counter">Trang 18</span><div class="page_container" data-page="18">

Within the f_face_info class, there is a face cropping function that extracts the facial region from the input image. This step ensures that the processed image aligns with the face images used during model training, enhancing the accuracy of predictions.

Once the image is processed and the face is cropped, the sub-classes, including the pre-trained models, perform their respective tasks. For example, the vietnamese_detection sub-class predicts the ethnicity of the detected face, while the emotion_detection sub-class predicts the emotional state, and the gender_detection sub-class predicts the gender. The age_prediction sub-class estimates the age range of the individual.

However, it’s important to note that the age, emotion, and gender models used in our application rely on pre-trained weights. Therefore, the input to these models must conform to their specific requirements to generate accurate predictions.
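The flow described above, one cropped face passed to several independent detectors, can be sketched as follows. This is not the project's code: `StubDetector`, `describe_face` and the placeholder labels are hypothetical stand-ins for the real loaded models, used only to show how the results might be gathered in one place.

```python
class StubDetector:
    """Placeholder for a loaded model; always returns a fixed label (hypothetical)."""

    def __init__(self, label):
        self.label = label

    def predict(self, face_image):
        return self.label

def describe_face(face_image, detectors):
    """Run every detector on the same cropped face and collect the results."""
    return {name: detector.predict(face_image) for name, detector in detectors.items()}

detectors = {
    "ethnicity": StubDetector("placeholder ethnicity"),
    "emotion": StubDetector("placeholder emotion"),
    "gender": StubDetector("placeholder gender"),
    "age": StubDetector("placeholder age"),
}
face_image = [[0]]  # placeholder for a cropped face array
result = describe_face(face_image, detectors)
print(result)
```

In a real application each detector would first resize or normalize the crop to match the input its own weights expect, which is the constraint noted above for the pre-trained models.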

Additionally, the application includes a box bound function, which creates a bounding box around the detected face. This bounding box serves as a visual indicator on the image or real-time camera feed, highlighting the recognized object.

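box = data_face["bbx_frontal_face"]
if len(box) == 0:
    pass  # no face detected: nothing to draw
else:
    x0, y0, x1, y1 = box
    # Draw a green rectangle with line thickness 2 around the face
    img = cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
    thickness = 1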

</div>