Final Project: Introduction to Digital Image Processing


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

VIETNAM GENERAL CONFEDERATION OF LABOR

<b>TON DUC THANG UNIVERSITY, FACULTY OF INFORMATION TECHNOLOGY</b>

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

2.1.3 Find the edges...9

2.1.4 Crop the chessboard...11

2.2 Transform image board to text board...14

3 Conclusion...16

4 Reference...17

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

<b>1 Introduction</b>

<b>1.1 Overview</b>

Computer vision has revolutionized the way we approach problems in many different fields. From autonomous vehicles to medical imaging, computer vision has provided new and innovative solutions that have improved our lives in countless ways. One such application of computer vision is in the recognition and solution of sudoku puzzles.

Sudoku is a popular puzzle game that requires the player to fill a 9x9 grid with numbers such that each column, row, and 3x3 sub-grid contains all the digits from 1 to 9. Solving sudoku puzzles requires a combination of logic and patience, and it can be a time-consuming and challenging task for even the most experienced players.

The idea of using computer vision to solve sudoku puzzles is not new, but with the advancements in computer vision techniques and the availability of powerful computing resources, it is now possible to develop systems that can recognize and solve sudoku puzzles with a high degree of accuracy. In this thesis, we will explore the use of computer vision techniques to recognize and solve sudoku puzzles.

</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

this research is to develop a robust and efficient computer vision system that can accurately recognize sudoku puzzles.

In conclusion, this thesis will contribute to the field of computer vision by demonstrating the feasibility of using computer vision techniques to recognize and solve sudoku puzzles. The results of this research will provide valuable insights into the challenges of recognizing and solving sudoku puzzles through computer vision and the effectiveness of different approaches in overcoming these challenges. The end goal is to develop a system that can accurately recognize and solve sudoku puzzles, making the process of solving sudoku puzzles faster, more efficient, and more accessible to everyone.

<b>1.2 Problem formulation</b>

The input to the problem is an image of a 9x9 sudoku chessboard.

</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">

We need to partition the chessboard image to obtain a binary image representing the contours and numbers belonging to the chessboard. This result will be saved to the file “output.txt”.

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">

Next, we need to determine whether each cell contains a number or is empty, then convert the chessboard to text. Cells with numbers are marked with an “X” and other cells are marked with spaces. The results are saved to the file “output.txt”. Examples are as follows:

<b>2 Method</b>

<b>2.1 Chessboard segmentation</b>

The goal of this section is to crop the chessboard out of the binary image. At the same time, the chessboard must be brought to a front view, so that each square of the chessboard can be cropped and its number predicted later.

</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

<b>2.1.1 Preprocess</b>

To remove noise and increase accuracy, we will blur the input image with Gaussian Blur [1].

<small>Figure 3: Blurring the image</small>
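The blurring step can be illustrated with a small, dependency-free sketch. It is not the report's implementation: the kernel size, the sigma value, and the edge-replication border handling are assumptions chosen for illustration, and in practice a library routine such as OpenCV's `cv2.GaussianBlur` would normally be used instead.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalized 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    half = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - half, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, size=5, sigma=1.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(size, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


# A single bright noise pixel on a dark background (hypothetical data).
noisy = [[0] * 7 for _ in range(7)]
noisy[3][3] = 255
smoothed = gaussian_blur(noisy)
```

The isolated spike is spread over its neighbours, so its peak value drops sharply while the total brightness of the image stays the same.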

Next, we threshold the image to convert it to a binary image. The algorithm used is adaptive thresholding, in either its Gaussian or its mean variant.
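As a rough illustration of the mean variant, the sketch below compares each pixel with the average of its neighbourhood minus a constant. The block size, the offset `c`, and the test image are assumptions made for this example; OpenCV exposes the same idea as `cv2.adaptiveThreshold`.

```python
def adaptive_mean_threshold(image, block=3, c=2):
    """Mark a pixel white (255) if it is brighter than its local mean minus c."""
    height, width = len(image), len(image[0])
    half = block // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            count = 0
            # Average over the block, clipped at the image borders.
            for yy in range(max(0, y - half), min(height, y + half + 1)):
                for xx in range(max(0, x - half), min(width, x + half + 1)):
                    total += image[yy][xx]
                    count += 1
            local_mean = total / count
            out[y][x] = 255 if image[y][x] > local_mean - c else 0
    return out


# Hypothetical image: brightness rises from left to right (uneven lighting),
# with one dark vertical line in column 2.
row = [100 + 20 * x for x in range(5)]
row[2] -= 80
image = [row[:] for _ in range(5)]

binary = adaptive_mean_threshold(image, block=3, c=15)
# Each row becomes [255, 255, 0, 255, 255]: only the dark line is black.
```

A single global threshold of 110 would also turn the dim left column (value 100) black, which is the kind of lighting artefact adaptive thresholding is meant to avoid.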

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">

What we are interested in are the lines and numbers on the chessboard, so we convert them to white pixels by inverting the image.

<small>Figure 5: Inverted image</small>

To make sure the lines are not broken, we apply the dilation morphological transformation [3].
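The inversion and dilation steps can be sketched as follows. This is a simplified illustration, assuming a square structuring element of radius 1 (a 3 x 3 window); the report does not state which kernel was used. OpenCV offers the equivalent operations as `cv2.bitwise_not` and `cv2.dilate`.

```python
def invert(binary):
    """Swap black and white in a 0/255 image."""
    return [[255 - p for p in row] for row in binary]


def dilate(binary, radius=1):
    """Grow every white pixel into the square window around it."""
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0:
                continue
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[yy][xx] = 255
    return out


# A dark line on a white page, broken by a one-pixel gap (hypothetical data).
page = [
    [255, 255, 255, 255, 255],
    [0, 0, 255, 0, 0],
    [255, 255, 255, 255, 255],
]
inverted = invert(page)      # the line is now white, with a gap at x = 2
dilated = dilate(inverted)   # the gap is closed and the line is thicker
```

Dilation thickens the lines as a side effect, which is why an erosion is typically applied later to restore the original thickness.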

</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">

We are interested in the main lines (the bold outline around the chessboard). Here they are clear, so we can move on to the next part.

<b>2.1.2 Detect outerbox.</b>

We will find the main contour of the chessboard. The idea is to use the Flood Fill algorithm to find the connected component with the largest size. Specifically, we visit each pixel of the chessboard, color the component connected to that pixel, and record the pixel position that yields the largest connected component. That component is the main contour of the chessboard. We'll call it the "outerbox".
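The search for the largest connected component can be sketched with a breadth-first flood fill. This is one possible formulation, assuming 4-connectivity and a 0/255 binary image; the function names are hypothetical, and a library routine such as OpenCV's `cv2.floodFill` could play the same role.

```python
from collections import deque


def flood_fill(binary, seen, start_y, start_x):
    """Collect all white pixels 4-connected to the start pixel."""
    height, width = len(binary), len(binary[0])
    queue = deque([(start_y, start_x)])
    seen[start_y][start_x] = True
    component = []
    while queue:
        y, x = queue.popleft()
        component.append((y, x))
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and binary[ny][nx] and not seen[ny][nx]):
                seen[ny][nx] = True
                queue.append((ny, nx))
    return component


def largest_component(binary):
    """Return a mask that keeps only the largest white component."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] and not seen[y][x]:
                component = flood_fill(binary, seen, y, x)
                if len(component) > len(best):
                    best = component
    mask = [[0] * width for _ in range(height)]
    for y, x in best:
        mask[y][x] = 255
    return mask


# A square outline with one isolated "digit" pixel inside (hypothetical data).
board = [
    [255, 255, 255, 255, 255],
    [255, 0, 0, 0, 255],
    [255, 0, 255, 0, 255],
    [255, 0, 0, 0, 255],
    [255, 255, 255, 255, 255],
]
outerbox = largest_component(board)
```

In the result the outline survives and the isolated pixel in the middle is removed, because each pixel is visited once and only the biggest component is kept.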

</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">

Apply the erosion transformation to return the outerbox to its original thickness (before dilation).

<small>Figure 8: Eroded outerbox</small>

<b>2.1.3 Find the edges</b>

Apply the Hough Transform to detect straight lines in the image.

</div><span class="text_page_counter">Trang 11</span><div class="page_container" data-page="11">

In practice, however, this transformation returns many near-duplicate candidates for each straight line, so we need to cluster them to settle on 8 lines.

The result of the Hough Transform is a set of straight lines, each described by 2 components: (r, θ). Here r is the perpendicular distance from the origin pixel (0, 0) to the line, and θ is the angle that, in the standard Hough parameterization, the normal of the line makes with the horizontal axis.

</div><span class="text_page_counter">Trang 12</span><div class="page_container" data-page="12">

clustering, we need to normalize r and θ to the interval [0, 1] using the Min-Max Normalization technique to get more accurate results. (The reason is that r is measured in pixels, so its values are much larger than those of θ, which is measured in radians.)

After clustering is complete, we take the centers of the 8 clusters as the 8 lines we are looking for.

<small>Figure 11: 8 merged lines</small>
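The normalize-then-cluster idea can be sketched as below, using k-means as the clustering method. This is an illustrative sketch under several assumptions: the deterministic farthest-point initialization, the iteration count, and the function names are choices made here, and the wrap-around of θ near 0 and π is not handled. The example uses 2 clusters for brevity, where the text above uses 8.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def k_means(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members))
    return [nearest(p) for p in points]


def merge_lines(lines, k):
    """Cluster (r, theta) candidates and return one averaged line per cluster."""
    rs = min_max_normalize([r for r, _ in lines])
    thetas = min_max_normalize([t for _, t in lines])
    labels = k_means(list(zip(rs, thetas)), k)
    merged = []
    for j in range(k):
        members = [line for line, label in zip(lines, labels) if label == j]
        if members:
            # Averaging the original values equals de-normalizing the center,
            # because min-max normalization is a linear rescaling.
            merged.append((sum(r for r, _ in members) / len(members),
                           sum(t for _, t in members) / len(members)))
    return merged


# Hypothetical Hough output: three noisy candidates for each of two lines.
candidates = [(100.0, 0.01), (102.0, 0.00), (98.0, 0.02),
              (300.0, 1.57), (303.0, 1.58), (297.0, 1.56)]
merged = merge_lines(candidates, k=2)
```

Without the normalization, distances would be dominated by r (hundreds of pixels) and differences in θ (at most a few radians) would barely influence the clustering.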

<b>2.1.4 Crop the chessboard.</b>

Next, we take the 2 outermost horizontal and the 2 outermost vertical borders of the chessboard in order to find the 4 corners of the chessboard. The extraction of these horizontal and vertical boundaries uses basic computational logic.

</div><span class="text_page_counter">Trang 13</span><div class="page_container" data-page="13">

Then find their 4 intersection points, using basic geometric calculations:
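One way to compute such an intersection is to solve a 2 x 2 linear system. The sketch below assumes the standard Hough normal form, in which a line (r, θ) satisfies x·cos θ + y·sin θ = r with θ being the angle of the line's normal (this is the convention used by, for example, OpenCV's `cv2.HoughLines`). The function name and the sample border lines are hypothetical.

```python
import math


def intersect(line_a, line_b, eps=1e-9):
    """Intersect two lines given as (r, theta); return None if parallel."""
    r1, t1 = line_a
    r2, t2 = line_b
    # Solve  x*cos(t1) + y*sin(t1) = r1
    #        x*cos(t2) + y*sin(t2) = r2   with Cramer's rule.
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)
    if abs(det) < eps:
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return x, y


# Hypothetical outer borders: vertical lines x = 20 and x = 220,
# horizontal lines y = 30 and y = 230.
left, right = (20.0, 0.0), (220.0, 0.0)
top, bottom = (30.0, math.pi / 2), (230.0, math.pi / 2)

corners = [intersect(v, h) for h in (top, bottom) for v in (left, right)]
rounded = [(round(x), round(y)) for x, y in corners]
# rounded == [(20, 30), (220, 30), (20, 230), (220, 230)]
```

The determinant equals sin(θ2 − θ1), so it vanishes exactly when the two lines are parallel, which is why that case is returned as `None`.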

</div><span class="text_page_counter">Trang 14</span><div class="page_container" data-page="14">

Finally, threshold it to return the binary image. Then save the result.

<small>Figure 15: Binary image of segmented sudoku chessboard</small>

</div><span class="text_page_counter">Trang 15</span><div class="page_container" data-page="15">

<b>2.2 Transform image board to text board.</b>

Take the result from the previous section, resize it to 252 x 252 and cut out each 28 x 28 cell.

<small>Figure 16: Cropped cells</small>
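Since 252 / 9 = 28, the resized board divides evenly into 81 cells. A minimal sketch of the slicing, assuming the board has already been resized and is stored as a list of pixel rows (the function name is hypothetical):

```python
def split_into_cells(board, grid=9):
    """Cut a square board into grid x grid equally sized cells."""
    cell = len(board) // grid
    return [
        [
            [row[c * cell:(c + 1) * cell]
             for row in board[r * cell:(r + 1) * cell]]
            for c in range(grid)
        ]
        for r in range(grid)
    ]


# Synthetic 252 x 252 board whose pixel value encodes the cell index,
# so the slicing can be checked by eye.
board = [[(y // 28) * 9 + (x // 28) for x in range(252)] for y in range(252)]
cells = split_into_cells(board)
# cells[4][7] is a 28 x 28 block whose pixels all equal 4 * 9 + 7 = 43.
```

Here `cells[r][c]` is the cell in row `r` and column `c` of the chessboard, counted from the top-left corner.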

We can decide whether a cell contains a number by checking whether its count of white pixels exceeds some threshold. However, the edges of these cells contain white pixels left over from the grid lines:

</div><span class="text_page_counter">Trang 16</span><div class="page_container" data-page="16">

So, we will only crop the inside of the cells.

<small>Figure 18: Inner cropped cells</small>

Then we find a suitable threshold (statistically) to determine whether the cell has a number or not. For example, in this figure we choose a threshold of 30. That

</div><span class="text_page_counter">Trang 17</span><div class="page_container" data-page="17">

is, if the image has more than 30 white pixels, it contains a number; otherwise it does not contain a number.

Finally, represent the result as text and then write it to the file “output.txt”.
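The cell test and the text conversion can be sketched as follows. The threshold of 30 and the “X” / space encoding come from the text above; the 4-pixel inner margin, the function names, and the synthetic cells are assumptions made for this illustration.

```python
def has_digit(cell, margin=4, threshold=30):
    """Count white pixels inside the cell, ignoring a border of `margin` pixels."""
    inner = [row[margin:len(row) - margin]
             for row in cell[margin:len(cell) - margin]]
    white = sum(1 for row in inner for p in row if p > 0)
    return white > threshold


def board_to_text(cells):
    """Render a 9 x 9 grid of cells as lines of 'X' (digit) and ' ' (empty)."""
    return "\n".join(
        "".join("X" if has_digit(cell) else " " for cell in row)
        for row in cells
    )


def save_board(text, path="output.txt"):
    """Write the text board to a file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")


def make_cell(with_digit):
    """Synthetic 28 x 28 cell: 2-pixel white border, optional 6 x 8 blob."""
    cell = [[255 if min(x, y, 27 - x, 27 - y) < 2 else 0 for x in range(28)]
            for y in range(28)]
    if with_digit:
        for y in range(10, 18):
            for x in range(11, 17):
                cell[y][x] = 255  # 48 white pixels, above the threshold
    return cell


# Checkerboard of filled and empty cells (hypothetical data).
cells = [[make_cell((r + c) % 2 == 0) for c in range(9)] for r in range(9)]
text = board_to_text(cells)
```

Calling `save_board(text)` would then write the nine lines to “output.txt”. Cropping the margin first matters here: every synthetic cell has a white border of 208 pixels, far above the threshold, so counting over the full cell would mark all 81 cells as filled.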

<b>3 Conclusion</b>

In conclusion, this thesis has presented a comprehensive study on the use of computer vision techniques to recognize sudoku puzzles. The proposed method utilized a combination of image processing and machine learning algorithms to accurately detect and recognize the digits in a sudoku puzzle.

This study opens up a number of potential avenues for further research, such as improving the recognition accuracy, increasing the robustness of the method to different types of distortions, and incorporating more advanced computer vision

</div><span class="text_page_counter">Trang 18</span><div class="page_container" data-page="18">

<b>4 Reference</b>

[1] K. Kaur and S. Kaur, "Gaussian Blur for Image Smoothing and Noise Reduction", Journal of Advanced Research in Dynamical and Control Systems, vol. 9, no. 2, pp. 782-787, 2017.

[2] Huang, Zhi-Kai, and Kwok-Wing Chau. "A new image thresholding method based on Gaussian mixture model." Applied mathematics and computation 205.2 (2008): 899-907.

[3] [4]

[5] Ahmed, M., Seraj, R., & Islam, S. M. S. (2020). The k-means algorithm: A comprehensive survey and performance evaluation. Electronics, 9(8), 1295.

</div>