Research and development of several image denoising techniques based on sparse representation and the linear regression model


FORM 14/KHCN

(Issued with Decision No. 3839/QĐ-ĐHQGHN dated 24 October 2014
of the President of Vietnam National University, Hanoi)

VIETNAM NATIONAL UNIVERSITY, HANOI

FINAL REPORT
ON THE RESULTS OF THE SCIENCE AND TECHNOLOGY PROJECT
AT VIETNAM NATIONAL UNIVERSITY LEVEL

Project title: Research and development of several image denoising techniques based on
sparse representation and the linear regression model
Project code: QG18.04
Principal investigator: Dr. Đỗ Thanh Hà

Hanoi, 2020

PART I. GENERAL INFORMATION
1.1. Project title: Research and development of several image denoising techniques based on sparse representation and the linear regression model
1.2. Project code: QG18.04
1.3. Principal investigator and members participating in the project
No. | Title, degree, full name | Affiliation | Role in the project
1 | Dr. Đỗ Thanh Hà | ĐHKHTN (VNU University of Science) | Lead researcher (principal investigator)
2 | Dr. Vũ Tiến Dũng | ĐHKHTN | Scientific secretary
3 | Dr. Nguyễn Thị Bích Thuỷ | ĐHKHTN | Member
4 | Dr. Nguyễn Thị Minh Huyền | ĐHKHTN | Member
5 | Hà Mỹ Linh, PhD student | ĐHKHTN | Member
6 | Trần Thị Huyền, BSc | Lý Thái Tổ High School, Hanoi | Member

1.4. Host institution:
1.5. Implementation period:
1.5.1. Under the contract: 24 months, from 3 January 2018 to 3 January 2020
1.5.2. Extension (if any): the project was not extended
1.5.3. Actual implementation: 24 months, from 3 January 2018 to 3 January 2020
1.6. Changes from the original proposal (if any):
(Regarding objectives, content, methods, research results and organisation of implementation; causes; opinion of the managing agency)
1.7. Total approved budget of the project: 300 million VND.
PART II. OVERVIEW OF RESEARCH RESULTS
Written in the structure of a scientific review article of 6-15 pages (this report will be published in the VNU Journal of Science after the project has been accepted), comprising the following sections:
1. Introduction
Denoising is one of the important and challenging problems that has attracted, and continues to attract, considerable attention from researchers in the image processing and machine learning communities. Its purpose is to improve image quality by removing noise from the image. In practice, image noise typically arises during scanning or digitisation. In systems such as self-driving cars, noise in camera images is usually caused by external environmental conditions: rain or fog, dust on the vehicle's camera, or blur caused by the motion of the vehicle on the road. Noise directly affects the performance of applications such as pattern recognition, feature extraction and object segmentation. Image denoising therefore plays an important role and is commonly regarded as an indispensable preprocessing step in recognition systems and in specific applications such as self-driving cars. Many noise reduction methods have been studied and published; however, the problem remains a challenge, especially when images are captured under poor conditions with high noise levels.
Broadly, denoising methods fall into two groups: spatial-domain filtering methods and frequency-domain filtering methods. Spatial-domain filters usually give good results, but they have the drawback that edges in the image lose sharpness after processing. Some filters, such as the Wiener filter, require information about the type of noise in the image, which is usually unknown or very difficult to estimate. More recently, newer filters such as the weighted median filter and the RCRS (Rank Conditioned Rank Selection) filter have been developed to overcome the drawbacks of earlier filters. However, these newer filters usually perform well only for certain types of noise, and none is sufficiently effective against the noise introduced by printing, photocopying and scanning, or by image capture on the cameras of autonomous vehicles.
Recently, frequency-domain filtering methods such as sparse transforms and multi-resolution analysis have also given fairly good denoising results. In addition, the successful use of sparse transforms to remove noise along image edges has shown that sparse transforms can be used effectively for denoising. Because a sparse transform represents an image as a linear combination of basis functions from a predefined dictionary, the performance of these methods depends on two factors: first, knowledge of the image type, which guides the choice of suitable basis functions; and second, the type of noise in the image. Moreover, sparse transforms such as curvelet, contourlet and wedgelet are dictionaries predefined for specific kinds of images, so they cannot be applied to a new image type with an arbitrary noise model. To overcome these limitations, the project focuses on the application of sparse representation to image denoising. In this new approach, the basis functions are determined by learning from the noisy images themselves, and so adapt better to the characteristics of the images and to the types of noise present in them.
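A common way to write this model down is sketched below. The notation is ours, not the report's: y is a vectorised image patch, D is the dictionary whose columns are the basis functions (atoms), x is the coefficient vector, and ε is the tolerated residual energy, which plays the role of the noise energy parameter discussed later in the report.

```latex
y \approx D x, \qquad
\hat{x} = \arg\min_{x} \|x\|_0
\quad \text{subject to} \quad \|y - D x\|_2^2 \le \varepsilon
```

With a predefined dictionary such as curvelet, D is fixed in advance; in the learned-dictionary setting, D is itself estimated from patches of the noisy image.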
The results of the project will be new contributions, in both theory and application, to the field of image processing. On the theoretical side, the newly proposed algorithm makes it possible to solve the denoising problem on many types of images with different types of noise. The algorithm aims to improve significantly on the denoising results of earlier algorithms. In addition, deriving a formula that determines the value of the noise energy parameter from a linear regression model is an entirely new research direction in the image processing community. On the application side, the project will build a software program that generates a database of noisy images, serving as benchmark data for assessing the quality of denoising methods.
2. Objectives
To reach the goal of an effective new denoising method with high practical applicability, in particular one capable of real-time processing, the author who proposed the project studied the strengths of the various approaches. In addition, with more than 7 years of research experience in applying sparse representation to image processing problems, the proposing author finds that the sparse representation approach is one of the promising research directions, giving better denoising results than the best current denoising algorithms such as curvelet or deep learning. Fast computation is a further advantage of sparse representation over deep learning methods in real applications such as autonomous vehicles.
General objectives:
- Propose a new image denoising method using sparse representation. The strength of the new method compared with other approaches is that it needs no assumption about the type of noise in the image and can be applied effectively to images captured by cameras mounted on autonomous vehicles. Furthermore, the noise energy in the image can be estimated using a linear regression model. This is an entirely new research direction in the field of denoising.
- Build a software program that generates a database of noisy images, serving as benchmark data for assessing the quality of denoising methods as well as other image processing applications.

Specific objectives:
- Study and analyse the advantages and disadvantages of denoising methods that use predefined dictionaries such as curvelet, contourlet and wedgelet.
- Study models of noise generation in images, in order to understand how noise is distributed and how the parameters and the noise spread function affect noise generation. On that basis, aim to find a formula that accurately estimates the noise energy parameter. In addition, through this study, the project will build a software program that creates a database of noisy images.
- Study and carry out experiments comparing the effectiveness of methods for finding sparse representations and of dictionary learning methods. Formulate the denoising problem as a BPDN (Basis Pursuit Denoising) problem in which the dictionary is a learned dictionary built from patches of the noisy image.
- Use the noise energy parameter value estimated above, together with the learned dictionary, to propose a new denoising algorithm. To reduce the computation time of the sparse-solution problems and the size of the dictionary, the proposed method works on noisy patches of the image rather than on the whole noisy image. Reconstructing the image from the denoised patches therefore also needs to be studied.
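The patch-based workflow in the last objective can be sketched as follows. This is a minimal illustration under our own assumptions, not the project's implementation: the image is a plain list of lists, patches are square, and overlapping patches are merged by simple averaging (the report does not say which merging rule it uses). The function names `extract_patches` and `reconstruct` are hypothetical.

```python
def extract_patches(img, size, step):
    """Return (top, left, patch) triples cut from a 2-D list-of-lists image."""
    h, w = len(img), len(img[0])
    patches = []
    for top in range(0, h - size + 1, step):
        for left in range(0, w - size + 1, step):
            patch = [row[left:left + size] for row in img[top:top + size]]
            patches.append((top, left, patch))
    return patches


def reconstruct(patches, h, w):
    """Rebuild an h x w image by averaging the patches that cover each pixel."""
    acc = [[0.0] * w for _ in range(h)]   # running sum of patch values
    cnt = [[0] * w for _ in range(h)]     # how many patches cover each pixel
    for top, left, patch in patches:
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                acc[top + i][left + j] += value
                cnt[top + i][left + j] += 1
    # Pixels not covered by any patch (possible when step > 1) are set to 0.
    return [[acc[i][j] / cnt[i][j] if cnt[i][j] else 0.0 for j in range(w)]
            for i in range(h)]
```

In a denoising pipeline, each patch would be replaced by its sparse approximation between the two calls; with unmodified patches and full coverage, `reconstruct` returns the original image.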

3. Research methods
- Study the properties of sparse representation and the methods for finding the sparse representation of an image, including the greedy matching pursuit algorithms (Matching Pursuit (MP), Orthogonal-MP, Weak-MP) and Basis Pursuit (Iterative Reweighted Least Squares (IRLS), linear programming). Compare experimentally the accuracy and sparsity of the solutions obtained by these methods.
- Study dictionary construction algorithms such as K-SVD, MOD (Method of Optimal Directions), ODL (Online Dictionary Learning) and RLS-DLA (the Recursive Least Squares Dictionary Learning Algorithm). Run experiments on benchmark datasets to identify a good dictionary learning algorithm.
- Study several noise generation models, such as Kanungo and Noise Spread, as a basis for building the noise energy estimator.
- Test the algorithm on different databases such as GREC, DIBCO and Tobacco800, and on a dataset of real images. Compare the effectiveness of the proposed method with other methods such as the median filter, morphological filtering and curvelet. The methods are compared using similarity measures such as SSIM (Structural Similarity Measure) and MSE (Mean Square Error).
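Of the greedy algorithms listed in the first item, plain Matching Pursuit is the simplest to write down. The sketch below is a textbook version under our own assumptions (atoms stored as unit-norm Python lists, a fixed iteration budget as the stopping rule); it is not the code used in the project, and the function name `matching_pursuit` is ours.

```python
def matching_pursuit(atoms, signal, n_iter, tol=1e-10):
    """Greedy Matching Pursuit.

    atoms  : list of unit-norm atoms, each a list of floats
    signal : list of floats with the same length as each atom
    n_iter : maximum number of greedy steps
    Returns (coefficients, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Inner product of every atom with the current residual.
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Pick the atom most correlated with the residual.
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < tol:
            break  # nothing left to explain
        coeffs[k] += corr[k]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual
```

Orthogonal-MP differs in that, after each selection, it re-fits all selected coefficients by least squares instead of updating only one of them.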

Using sparse representation over a learned dictionary, combined with automatic determination of the noise energy function, for image denoising is an entirely new problem. The scientific results of the project will therefore be a good contribution to the image denoising research community.
4. Summary of research results
Within the scope of this project, we carried out the following studies:
a. Studied and implemented methods for finding sparse solutions. We showed, both mathematically and experimentally, that replacing the objective function defined with the l_{0} norm by other norms such as the l_{1} norm still guarantees the sparsity of the solution obtained. The experimental results are analysed in detail in the report on the implementation, testing and comparison of methods for finding sparse representations.
b. Studied and implemented dictionary learning algorithms such as MOD and K-SVD. The learned dictionaries obtained from the different algorithms were applied to the denoising problem in order to compare them and find the most effective algorithm, under the assumption that the noise energy is fixed and known. The advantages and disadvantages in terms of dictionary quality and computation time are also analysed in detail in the report comparing dictionary learning algorithms.
The application of sparse representation and learned dictionaries to scene images was also tested, and the results were published in 01 peer-reviewed international paper.
c. Studied models for generating noisy images and noise energy functions such as Kanungo and Noise Spread; on that basis, proposed a formula that accurately estimates the parameter of the noise energy value based on normalised cross-correlation analysis. The project also proposed a new method for reducing image noise that uses this estimation formula together with a learned dictionary. The experimental results were evaluated on international benchmark databases, and the proposed method was compared with denoising methods regarded as the best currently available, such as the curvelet transform. The project also built a database of real noisy images obtained by scanning and tested the new method on it. The results of this research were published in 01 international journal (SCIE).
d. The application of deep learning to computer vision problems has produced very notable results. However, deep learning based methods also have limitations beyond the requirement for large and diverse data. Therefore, while testing on document images, the project also examined and evaluated the ability of convolutional neural networks to "understand" and "remember" in the problem of character recognition in text. The results show that the CNN can only remember and cannot understand. This result was published at 01 peer-reviewed international conference.
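Item (a) above states that replacing the l_{0} norm by the l_{1} norm still yields sparse solutions. One way to see this numerically is iterative soft-thresholding (ISTA), which minimises 0.5*||y - Dx||_2^2 + lam*||x||_1. ISTA is our choice for illustration only; the report itself names IRLS and linear programming as its Basis Pursuit solvers. The sketch assumes D is given as a non-zero list of rows, and the names `soft_threshold` and `ista` are ours.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; any |v| <= t becomes exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, y, lam, n_iter=200):
    """Minimise 0.5*||y - D x||_2^2 + lam*||x||_1 by iterative soft-thresholding.

    D is a list of m rows with n entries each (assumed non-zero), y has length m.
    """
    m, n = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of D^T D,
    # so 1/L is a safe (if conservative) step size.
    L = sum(v * v for row in D for v in row)
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = y - D x and gradient step direction g = D^T r.
        r = [y[i] - sum(D[i][j] * x[j] for j in range(n)) for i in range(m)]
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] + g[j] / L, lam / L) for j in range(n)]
    return x
```

The thresholding step sets small coefficients to exactly zero rather than merely making them small, which is the mechanism behind the sparsity of l_{1} solutions.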
In summary, the project achieved the following results:
- Scientific publications: the project published 01 international paper (SCIE) and 02 international papers (selected and published in CCIS, indexed in Scopus).
- Database and software products: built a database of more than 500 noisy images and 01 image denoising software program.
- Training: 01 master's student has defended and 01 PhD student was admitted (in 2018).

5. Evaluation of the results achieved and conclusion
- Number of products: requirements met with 01 ISI paper and 01 peer-reviewed international conference paper; the 01 domestic paper was replaced by 01 peer-reviewed international conference paper. The project successfully built a noisy image database and an image denoising software program. In terms of training, the project supported 01 master's student, who successfully defended a thesis directly related to the project's research content, and supported 01 PhD student in study and research at the VNU University of Science.
- Product quality: meets the requirements of the registered products.
- Requirements on publishing results and acknowledging the address and funding of VNU Hanoi: all publications include the acknowledgement and meet the requirements.

6. Summary of results (in Vietnamese and English)
Vietnamese version (in translation)
The project studies and evaluates the performance of methods for finding sparse representations and of dictionary learning algorithms. Building on the theory of sparse representation, the project studies the BPDN (basis pursuit denoising) problem and from it develops a new image denoising algorithm that uses a learned dictionary, with the set of noisy image patches as the input signal. The BPDN problem is also extended to determine the noise model in the image automatically using a linear regression model. The research results were validated on public databases and on a database built by the project, in order to confirm that the new method can be applied well to any type of image noise.
Besides the denoising results, during the project the research team also tested the effectiveness of applying sparse representation to the extraction of text regions from scene images. The results show that sparse representation over a learned dictionary built from a set of image descriptors gives better results than extraction methods that rely on image descriptors alone.
The research team also examined the effectiveness of deep learning methods on image problems. Experiments evaluating convolutional neural networks (CNNs) for character recognition revealed a weakness of CNNs in understanding the meaning of images. A CNN can memorise well but cannot understand well; the project therefore proposes that, instead of training a CNN on the images themselves, a deep learning model should be built whose training input is the semantics of the image.

English version
The project studies sparse representation over a learned dictionary. Based on the theory of sparsity, the project developed a BPDN (basis pursuit denoising) algorithm to denoise images without information on the noise energy. To do so, the project proposed a new algorithm that estimates the noise energy automatically using normalized cross-correlation and a linear regression model. Experiments on several public databases and a self-built database show that the proposed method outperforms the state of the art.
In addition, the project studies how sparse representation over a learned dictionary can be used for text detection in scene images. Here the dictionary is not built on image patches but directly on local descriptors of the images. The dictionary can therefore retain the invariance of the local descriptors under some linear transformations and improve the performance of text detection.

The project also evaluates the performance of a deep learning network (a CNN) on character recognition. This study indicates that a CNN can remember well but cannot understand the meaning of images. To take advantage of deep learning networks, we propose training the network on image descriptors instead of the images themselves.

PART III. PRODUCTS, PUBLICATIONS AND TRAINING RESULTS OF THE PROJECT
3.1. Research results
No. | Product name | Scientific requirements and/or economic-technical targets: registered | Achieved
1 | Noisy image database | A large database for assessing the quality of denoising techniques. | Met the registered requirements
2 | Image denoising software based on sparse representation and the linear regression model | Software able to remove many types of image noise without any assumption about the noise type | Met the registered requirements

3.2. Form and level of publication of results
No. | Product | Status (printed / accepted for printing / application filed / application accepted as valid / IP certificate granted / product use confirmed) | VNU address and funding acknowledgement stated as required | Overall evaluation (Pass / Fail)
1 | Works published in international scientific journals indexed in ISI/Scopus | | |
1.1 | T.H. Do, O. Ramos Terrades, S. Tabbone, DSD: document sparse-based denoising algorithm, Pattern Analysis and Applications, Vol 22, pp. 177-186, 2019 | Printed ( 018-0714-3) | VNU address stated and project acknowledged | Pass
2 | Monographs published or under publishing contract | | |
2.1 | | | |
3 | Intellectual property registrations | | |
3.1 | | | |
4 | International papers not indexed in ISI/Scopus | | |
4.1 | Thanh-Ha Do, Thi Minh Huyen Nguyen, K.C. Santosh, Text Extraction Using Sparse Representation over Learning Dictionary, Communications in Computer and Information Science (No 1037), Recent Trends in Image Processing and Pattern Recognition, pp. 3-13, 2019 | Printed (Communications in Computer and Information Science, indexed in Scopus) | VNU address stated and project acknowledged | Pass
4.2 | Thanh-Ha Do, Nguyen T. V. Anh, Nguyen T. Dat, K.C. Santosh, Can We Understand Image Semantics from Conventional Neural Networks, Communications in Computer and Information Science (No 1035), Recent Trends in Image Processing and Pattern Recognition, pp. 509-519, 2019 | Printed (Communications in Computer and Information Science, indexed in Scopus) | VNU address stated and project acknowledged | Pass
5 | Papers in VNU scientific journals, national specialised scientific journals, or scientific reports published in international conference proceedings | | |
5.1 | | | |
6 | Scientific reports with recommendations or policy advice commissioned by the user organisation | | |
6.1 | | | |
7 | Results expected to be applied at policy-making bodies or science and technology application units | | |
7.1 | | | |

Notes:
- Science and technology product column: list the information on the S&T products in the order work, identifier of the work in the journal/monograph (DOI), journal type (ISI/Scopus)>

3.3. Training results
No. | Full name | Time and funding for participation in the project (months / amount) | Related published work (S&T product, dissertation, thesis) | Defended
PhD students
1 | Hà Mỹ Linh | ~3 months / 21,450,000 | |
Master's students
1 | Trần Thị Huyền | ~2 months / 14,950,000 | | Defended

PART IV. SUMMARY OF THE SCIENCE AND TECHNOLOGY PRODUCTS AND TRAINING RESULTS OF THE PROJECT
No. | Product | Number registered | Number completed
1 | Papers published in international scientific journals indexed in ISI/Scopus | 01 | 01
2 | Monographs published or under publishing contract | |
3 | Intellectual property registrations | |
4 | International papers not indexed in ISI/Scopus | 01 | 02 papers published in CCIS, indexed in Scopus
5 | Papers in VNU scientific journals, national specialised scientific journals, or scientific reports published in international conference proceedings | 01 | 00
6 | Scientific reports with recommendations or policy advice commissioned by the user organisation | |
7 | Results expected to be applied at policy-making bodies or science and technology application units | |
8 | PhD training / training support | 01 | 01
9 | Master's training | 01 | 01
PART V. USE OF FUNDS
No. | Expenditure item | Approved budget (million VND) | Actual expenditure (million VND) | Notes
A | Direct costs | 282.5 | 282.5 |
1 | Contracted professional labour | 245.882 | 245.882 |
2 | Raw materials, fuel, plants and animals, etc. | | |
3 | Equipment and tools | | |
4 | Travel expenses | | |
5 | Outsourced services | | |
6 | Conferences, workshops, progress reviews, acceptance | 34.8 | 34.8 |
7 | Printing, stationery | 1.818 | 1.818 |
8 | Other costs | | |
B | Indirect costs | 17.5 | 17.5 |
1 | Management fee | 17.5 | 17.5 |
2 | Electricity and water | | |
 | Total | 300 | 300 |

PART V. RECOMMENDATIONS (on developing the project's research results; on management and organisation of implementation at all levels)
We request that VNU evaluate and accept the project, and consider accepting 01 paper published in international conference proceedings (peer-reviewed and indexed in Scopus) in place of the registered product of 01 domestic paper. This substitution was raised by the principal investigator at the stage-1 progress report meeting and was considered and supported by the inspection team.

PART VI. APPENDIX (evidence for the products listed in Part III)

Hanoi, date ........ month ........ year .......
Host institution
(Signature and seal of the head of the institution)

Principal investigator
(Full name, signature)

Pattern Analysis and Applications
Editor-in-Chief: Sameer Singh

ISSN: 1433-7541 (print version)
ISSN: 1433-755X (electronic version)
Journal no. 10044

This journal presents original research that describes novel pattern analysis techniques as well as
industrial and medical applications. It details new technology and methods for pattern recognition
and analysis in applied domains, including computer vision and image processing, speech analysis,
robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern
recognition, fractal analysis, and intelligent control.
Pattern Analysis and Applications (PAA) also examines the use of advanced methods, including
statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine
learning, and hardware implementations which are either relevant to the development of pattern
analysis as a research area or detail novel pattern analysis applications.
The journal contains case-studies as well as reviews on benchmarks, evaluations of tools, and
important research activities at international centers of excellence.
IMPACT FACTOR: 1.352 (2016) *

Journal Citation Reports®
ABSTRACTED/INDEXED IN

Science Citation Index Expanded (SciSearch), Journal Citation Reports/Science Edition,
SCOPUS, Zentralblatt Math, Google Scholar, Academic OneFile, ACM Digital Library, CNKI,
Current Abstracts, Current Contents/Engineering, Computing and Technology, DBLP, EBSCO
Academic Search, EBSCO Applied Science & Technology Source, EBSCO Computer Science
Index, EBSCO Computers & Applied Sciences Complete, EBSCO Engineering Source, EBSCO
STM Source, EBSCO TOC Premier, Gale, Mathematical Reviews, OCLC, PASCAL, ProQuest
Advanced Technologies & Aerospace Database, ProQuest SciTech Premium Collection,
ProQuest Technology Collection, Referativnyi Zhurnal (VINITI), SCImago, Summon by
ProQuest


Pattern Analysis and Applications
SHORT PAPER

DSD: document sparse‑based denoising algorithm
T. H. Do¹ · O. Ramos Terrades² · S. Tabbone³
Received: 18 July 2017 / Accepted: 5 May 2018
© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract
In this paper, we present a sparse-based denoising algorithm for scanned documents. This method can be applied to any kind of scanned document with satisfactory results. Unlike other approaches, the proposed approach encodes noisy documents through sparse representation and visual dictionary learning techniques without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to state-of-the-art methods for document denoising.
Keywords Document denoising · Sparse representations · Sparse dictionary learning · Document degradation models

1 Introduction
Image denoising is a well-studied problem that continually attracts researchers [1, 13, 23, 27, 29, 33, 37, 40]. This interest is partially explained by the fact that the performance of many pattern recognition techniques depends on accurate control of noise in images. Thus, image denoising is a basic image processing technique used to enhance image quality before applying other image processing methods such as feature extraction, optical flow estimation and image registration. Most of these methods focus on denoising images of natural scenes, which are collected without any specific acquisition protocol. Despite the good performance of many of these methods, there is room for improvement in image denoising [4, 43].

* O. Ramos Terrades
T. H. Do
S. Tabbone

1 Department Informatics, Faculty of Mathematics Mechanics Informatics, VNU University of Science, Hanoi, Vietnam
2 Computer Vision Center, Computer Science Department, Engineering School, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
3 LORIA - UMR 7503, Université de Lorraine, Campus Scientifique, BP 239, 54506 Vandoeuvre-lès-Nancy, France

Moreover, most general-purpose methods assume additive white Gaussian (AWG) noise. They can be divided into two groups: spatial filtering methods and transform domain filtering methods. The AWG model assumes that noise lies in the high frequencies of the signal, so low-pass filters are applied to remove it. An example of a noise filtering method is the mean filter, which performs reasonably well in many scenarios. However, its performance decreases on signal-dependent noise, and sharp edges are blurred. The Wiener filter requires the kind of noise in the images to be known beforehand, which is hard to estimate in practice or even unknown [25]. To overcome some of these drawbacks, several other filters have been developed, such as the weighted median [42], the conditioned rank selection [21], and the relaxed median [20]. These filters, though, only work well for AWG noise.
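The edge-blurring behaviour of the mean filter mentioned above can be seen on a one-dimensional step edge. The sketch below is only an illustration under our own assumptions (window width 3, border samples replicated); it uses an unweighted median, not the weighted, rank-selection or relaxed variants cited in the text, and the helper name `sliding` is ours.

```python
from statistics import median


def sliding(signal, width, reduce_fn):
    """Apply reduce_fn over a centred window, replicating the border samples."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [reduce_fn(padded[i:i + width]) for i in range(len(signal))]


# An ideal step edge: three background samples followed by three foreground samples.
step_edge = [0, 0, 0, 10, 10, 10]

# The mean filter replaces the two samples next to the edge by intermediate values.
mean_out = sliding(step_edge, 3, lambda window: sum(window) / len(window))

# The median filter leaves this noise-free edge untouched.
median_out = sliding(step_edge, 3, median)
```

On this input the median output equals the original signal, while the mean output ramps gradually across the edge, which is the blurring the text refers to.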
Transform domain filtering methods are based on the definition of a family of functions that encode images. These functions are called vocabulary, or atoms, depending on the context, and they compose a dictionary. The main difference between these methods is whether such atomic functions are defined beforehand, using some a priori knowledge of the acquisition device, or learned from the data itself. We refer to them as non-adaptive and adaptive methods, respectively. Non-adaptive methods, such as multi-resolution analysis (MRA), have been applied to a wide range of image processing problems such as image compression, image restoration, and image denoising [33, 39, 40]. The overall performance of these methods depends on two factors: the a priori knowledge of images, which leads to the choice of dictionary functions, and the kind of noise found in them [3, 7, 11, 17, 35, 36]. Recent works have studied directional functions, such as contourlets, in depth to improve image denoising around edges [29, 37]. These methods improve on spatial filtering methods. However, they are still unsatisfactory for document analysis tasks. The main reason is that all these methods assume AWG noise, whereas scanned document degradation is well known to be non-AWG [2, 26].
In contrast, adaptive methods learn the vocabulary from data, and consequently the learned dictionaries are better adapted to the image properties [9, 15, 43]. These approaches have shown promising performance for noise removal. However, their limitation is that denoising performance may degrade quickly as noise levels increase. Moreover, for different reasons, very few are dedicated to scanned documents. Indeed, scanned documents suffer from specific image degradations in which highly degraded regions are located near text contours. Therefore, noise in scanned documents is not white Gaussian and, consequently, most existing methods, which implicitly assume AWG noise, perform poorly. This has a huge impact on the performance of OCR systems [2].
In this paper, we address the task of document image denoising without any a priori assumption about the noise degradation model. Our working hypotheses are twofold: first, we think that adaptive methods will better fit document degradations; second, we have to be able to fix an adaptive threshold to cope with non-AWG noise. Consequently, in this paper, we propose a new method for document image denoising, called document sparse-based denoising (DSD). Our main contributions are aligned with our working hypotheses as follows:
– First, the DSD method is an adaptive method, based on well-known sparse theory and dictionary learning. To the best of our knowledge, we are the first to apply it to denoising scanned documents.
– Second, we propose a precision parameter estimator that does not assume any particular kind of noise model and which is defined automatically from a regression model. Thus, noise is estimated from training data with pairs of noisy and noise-free samples and, consequently, it can be applied to any kind of document and also to natural scene images.
We evaluate the efficiency of our approach through an exhaustive evaluation on benchmark document collections of different kinds, such as engineering, administrative and historical documents. The reported results show the ability of the approach to denoise documents and that the regression model is sound, since the estimator is near the optimal solution. Moreover, comparison with the state of the art on denoising scanned documents shows that our method is competitive.


The remainder of this paper is organized as follows.
Section 2 briefly reviews document degradation models.
Section 3 recalls sparse representation methods, their link
to the denoising problem and the dictionary learning algorithm used. Sections 4 and 5 contain our main contributions:
Sect. 4 presents the DSD algorithm for denoising scanned
documents, while Sect. 5 explains the precision parameter
estimator. Section 6 is devoted to all the experiments carried out on benchmark datasets. The main conclusions are
drawn in Sect. 7.

2 Document degradation models
Noise in images is different depending on the acquisition
devices. Moreover, document degradations do not only
come from scanning processes but also from printing and
photocopy processes [26]. Document degradation is seen
as a non-AWG noise that can be modeled. This noise is
mainly located around image contours. On text regions, it is
attached to letter contours. Thus, letter segmentation errors
increase on highly degraded documents and consequently,
decrease the performance of OCR systems.

The first statistical model for document degradation, the
Kanungo model, was introduced in 1993 [26]. Two reasons
motivated this degradation model. First, the model makes it possible to
evaluate recognition algorithms as a function of the perturbation
of the input data. Second, a precise degradation model can help
to improve the performance of image restoration algorithms.
The Kanungo model is applied to binary images, f0. This
model computes the distance from each pixel to the letter boundaries. It is
defined by (1) and it has six parameters: 𝛼0, 𝛼, 𝛽0, 𝛽, 𝜂, and k.

$$f(x, y) = \alpha_0 e^{-\alpha d_i^2(x,y)} \mathbf{1}_{\{f_0(x,y)=1\}} + \beta_0 e^{-\beta d_i^2(x,y)} \mathbf{1}_{\{f_0(x,y)=0\}} + \eta \tag{1}$$

𝛼 and 𝛼0 control the probability of changing a foreground
pixel to a background pixel. Similarly, parameters 𝛽 and
𝛽0 control the probability of changing a background pixel
to a foreground pixel. The parameter 𝜂 is a constant value
added to all pixels regardless of their position relative to letter boundaries. Note that this parameter adds AWG noise
to the degradation model. Finally, the last parameter
k is the size of the disk used in the morphological closing
operation.
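As a minimal sketch of how (1) can be evaluated for a single pixel, the snippet below assumes that the distance d to the nearest letter boundary is already known and that the exponent uses the squared distance. The function names and the parameter values in the usage lines are illustrative assumptions, not taken from the paper, and the morphological closing controlled by k is left out.

```python
import math
import random


def kanungo_flip_probability(d, is_foreground, alpha0, alpha, beta0, beta, eta):
    """Flip probability of one pixel following (1).

    d is the distance from the pixel to the nearest letter boundary.
    """
    if is_foreground:
        return alpha0 * math.exp(-alpha * d ** 2) + eta
    return beta0 * math.exp(-beta * d ** 2) + eta


def flip_pixel(value, d, params, rng=random):
    """Return the (possibly flipped) binary pixel value (1 = foreground)."""
    p = kanungo_flip_probability(d, value == 1, **params)
    return 1 - value if rng.random() < p else value


# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha0=1.0, alpha=2.0, beta0=1.0, beta=2.0, eta=0.0)
p_near = kanungo_flip_probability(0.0, True, **params)  # pixel on the boundary
p_far = kanungo_flip_probability(3.0, True, **params)   # pixel deep inside a stroke
```

With these toy values, pixels on a contour are almost always flipped while pixels far from any contour are almost never touched, which matches the observation that this noise concentrates around letter contours.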
Research on scanned document degradations has shown
that the document edge noise depends on the optical system, the additive noise and a thresholding value. The Noise
Spread (NS) model can quantify this edge noise [2], and it
is inspired by the physics of the image acquisition process.
More precisely, the main assumption behind this model is
that documents are uniformly lit and the captured light
is proportional to the paper reflectance. CCD sensors capture
the light that comes from the document. Image generation is
modeled by the Point Spread (PS) function, which is linked
to the impulse response of the optical system. The final
document is obtained by the convolution of the PS function
with the original document. Then, noise coming from sensors, or other sources, is added to the scanned document at
acquisition time.
There are several choices for the PS function. For the sake
of simplicity, the PS function is usually chosen to be circular
with radius r, such as a bivariate Gaussian distribution or a
bivariate Cauchy distribution. In fact, a bivariate Cauchy
distribution represents the physics of a scanner more accurately than a bivariate Gaussian distribution, but its slow decay
requires a large support in the numerical simulation. Consequently, a Gaussian distribution is preferred.
To sum up, both degradation models include the blurring
effects caused by the optical system. While the distance to
contours is used in the Kanungo model, the PS function is
used to quantify the noise spread, which has also been used to
denoise graphical document images [24]. However, we will
see in Sect. 5 that the NS model requires prior knowledge
of the noise energy.

3 Sparse‑based denoising algorithms
As briefly summarized in the Introduction, many of the
recent advances in denoising algorithms are based on overcomplete representations of input images. Some of them
use pre-defined dictionaries such as wavelets, curvelets or
contourlets, to name but a few, while others learn a
dictionary from a training data collection [15, 29, 37, 40].
The proposed method, the DSD algorithm, also learns a dictionary from a collection of document data.
A signal x ∈ ℝ^L is sparse if most of its coefficients are
equal to zero. Given an arbitrary input signal h and an overcomplete dictionary A, the problem of sparse representation
arises when we look for a sparse representation x such that
h ≈ Ax.
In general, A = {a_1, a_2, …, a_M} ∈ ℝ^{L×M}, M ≫ L, is a full-rank matrix and hence the under-determined system h = Ax
has infinitely many solutions. An objective function f(x) measuring
the degree of sparsity is added to find the sparsest solution. The
following constrained optimization problem is then defined:

$$\hat{x} = \arg\min_{x} f(x) \quad \text{subject to} \quad Ax = h \tag{2}$$

Differences between sparse representation algorithms come
from the choices of f and the numerical schemes used to
solve (2). If f(x) = ‖x‖_0 is the ℓ0-norm, the above problem
is NP-hard in general, but greedy algorithms find sub-optimal solutions [31, 34, 41]. Other algorithms use ℓp-norms,
p ∈ (0, 2], as objective functions f(x) [5, 6, 10, 19, 30].

Sometimes, the exact constraint h = Ax is replaced by a bound on the
quadratic penalty function Q(x) = ‖Ax − h‖_2, namely Q(x) ≤ 𝜀, where 𝜀 ≥ 0
is the precision parameter:

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|h - Ax\|_2 \le \varepsilon \tag{3}$$

This is the case when sparse representations are used for
noise removal. Assume that the signal h = Ax + e has noise
e with finite energy ‖e‖_2^2 ≤ 𝜀^2. The optimal solution x̂ of (3),
with ĥ = Ax̂, is the denoised sparse representation
of h. When the objective function f(x) is the ℓ1-norm, as in (3), the problem is
known as the Basis Pursuit denoising (BPDN) method. Naturally, the BPDN method needs a proper dictionary A and an
accurate estimation of the precision parameter 𝜀.
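The error-constrained problem (3) is solved later in the paper with OMP. The following is a minimal NumPy sketch of such a greedy solver, which adds atoms until the residual satisfies ‖h − Ax‖_2 ≤ 𝜀. It is an illustration under stated assumptions rather than the authors' implementation: the function name `omp`, the random toy dictionary and the noise level are all hypothetical.

```python
import numpy as np


def omp(A, h, eps, max_atoms=None):
    """Greedy approximation of (3): add atoms until ||h - A x||_2 <= eps."""
    L, M = A.shape
    max_atoms = L if max_atoms is None else max_atoms
    x = np.zeros(M)
    support = []
    residual = h.astype(float).copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual)))
        if k in support:
            break  # no further progress is possible
        support.append(k)
        # Least-squares fit of h on the selected atoms (orthogonal projection).
        coef, *_ = np.linalg.lstsq(A[:, support], h, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = h - A @ x
    return x


# Toy example: a 2-sparse signal over a random unit-norm dictionary.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 64))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(64)
x_true[[3, 40]] = [1.5, -2.0]
h = A @ x_true + 0.01 * rng.standard_normal(16)  # noisy observation
x_hat = omp(A, h, eps=0.1)
h_hat = A @ x_hat  # denoised signal
```

A larger `eps` stops the loop earlier and yields a sparser but coarser approximation, while a very small `eps` forces the solver to fit the noise as well, which is why the choice of 𝜀 matters for denoising.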
The choice of a suitable dictionary A is critical for
BPDN performance. Translation-invariant dictionaries,
including curvelets, contourlets, wedgelets, bandelets, and
steerable wavelets, are used for image denoising. These
dictionaries perform reasonably well for a wide range of
images. However, if the type of images in the collection is particularly constrained, dictionaries adapted to the data have been shown to
outperform these pre-defined translation-invariant dictionaries. There are several algorithms that can be used for learning dictionaries, namely K-SVD [1], MOD [16], ODL [32]
and RLS-DLA [38]. All these algorithms find a dictionary A using a two-step iterative scheme: a sparse coding stage and
a dictionary update stage. In the sparse coding stage, these
algorithms use the current dictionary A to find a new sparse
representation of the training data. Then, during the dictionary update stage, an updating rule is applied to change A so as to find
better sparse representations of all training examples.
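The two-step scheme can be sketched in a few lines of NumPy. For brevity, this sketch uses the MOD update rule (a single least-squares update of the whole dictionary) instead of the atom-by-atom SVD update of K-SVD, and a fixed number of non-zero coefficients in the sparse coding stage. All function names and the toy data are assumptions made for illustration.

```python
import numpy as np


def sparse_code(A, H, n_nonzero):
    """Sparse coding stage: OMP with a fixed number of atoms per signal."""
    X = np.zeros((A.shape[1], H.shape[1]))
    for j in range(H.shape[1]):
        h, support, residual = H[:, j], [], H[:, j].copy()
        for _ in range(n_nonzero):
            k = int(np.argmax(np.abs(A.T @ residual)))
            if k in support:
                break
            support.append(k)
            coef, *_ = np.linalg.lstsq(A[:, support], h, rcond=None)
            residual = h - A[:, support] @ coef
        if support:
            X[support, j] = coef
    return X


def update_dictionary(A, H, X):
    """Dictionary update stage (MOD rule): A = H X^+, then renormalize atoms."""
    A_new = H @ np.linalg.pinv(X)
    norms = np.linalg.norm(A_new, axis=0)
    unused = norms < 1e-12
    A_new[:, unused] = A[:, unused]  # keep atoms that no signal used
    norms[unused] = 1.0
    return A_new / norms


def learn_dictionary(H, n_atoms, n_nonzero, n_iter=10, seed=0):
    """Alternate sparse coding and dictionary update for n_iter rounds."""
    rng = np.random.default_rng(seed)
    # Initialize with randomly chosen training signals, normalized to unit norm.
    A = H[:, rng.choice(H.shape[1], n_atoms, replace=False)].astype(float)
    A /= np.linalg.norm(A, axis=0)
    for _ in range(n_iter):
        X = sparse_code(A, H, n_nonzero)
        A = update_dictionary(A, H, X)
    return A


rng = np.random.default_rng(1)
H = rng.standard_normal((16, 200))  # toy training signals stored as columns
A = learn_dictionary(H, n_atoms=32, n_nonzero=3, n_iter=5)
```

K-SVD differs only in the second stage, where each atom and its coefficients are refreshed in turn from a rank-one SVD of the residual restricted to the signals that use that atom.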
Some experiments have been performed in order to evaluate the complexity of these algorithms and to find out the
advantages and disadvantages in the sparse coding and the
update dictionary stages [8, 14]. In these evaluations, the
OMP algorithm [34] is used to find the sparse representation. The K-SVD algorithm outperforms the other algorithms in these evaluations.

4 The DSD algorithm
The key idea of the DSD algorithm is to apply dictionary
learning algorithms, such as the K-SVD, taking into account
the estimation of the precision parameter 𝜀 , which is linked
to the document noise.

The DSD method works at patch level. It divides an input
document into patches of size w × w , where it applies the
BPDN algorithm with the learned dictionary A. Then, it
merges each denoised patch to compose the full denoised
image of the input image. More specifically, we decompose
the DSD learning algorithm into the following three steps:

1. Creation of a training database. Using a sliding window
of size w × w to scan noisy images and to create a training set of patches.
2. Estimation of the precision parameter. Applying the precision parameter estimator that is introduced in the next
section.
3. Learning of a visual dictionary. Applying the K-SVD
algorithm to learn the visual dictionary A from the training data created in the first step, using the estimated
precision parameter 𝜀.
These three steps allow the DSD algorithm to learn an adaptive dictionary from noisy images that we will use later for
document image denoising. Note that the DSD algorithm
learns the sparse visual dictionary from the noisy images.
Then, at the denoising step, the DSD works as follows:
1. It splits the input image y up into patches h_j of size
w × w. For each of them:
a. It finds the solution x̂_j of the optimization problem (3)
using OMP with the precision parameter 𝜀 calculated from (5),
b. It computes the denoised version of each patch h_j by
ĥ_j = A x̂_j,
2. It merges the denoised patches ĥ_j, as an average of overlapping windows, to get the denoised image ŷ, and binarizes ŷ to get the final result ỹ.
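The patch-level pipeline described in these steps (split, denoise each patch, average the overlaps, binarize) can be sketched as follows. The per-patch denoiser is passed in as a callable, so the sketch does not depend on a particular sparse solver. The function names, the unit stride and the fixed binarization threshold of 0.5 are assumptions made for illustration only.

```python
import numpy as np


def extract_patches(img, w, step=1):
    """Slide a w x w window over img; return patches as columns plus positions."""
    rows, cols = img.shape
    positions = [(i, j)
                 for i in range(0, rows - w + 1, step)
                 for j in range(0, cols - w + 1, step)]
    patches = np.stack([img[i:i + w, j:j + w].ravel() for i, j in positions], axis=1)
    return patches, positions


def merge_patches(patches, positions, shape, w):
    """Average overlapping patches back into an image of the given shape."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for col, (i, j) in zip(patches.T, positions):
        acc[i:i + w, j:j + w] += col.reshape(w, w)
        count[i:i + w, j:j + w] += 1
    return acc / np.maximum(count, 1)


def denoise_image(y, w, denoise_patch, threshold=0.5):
    """Denoise every patch, merge by averaging, then binarize (assumed threshold)."""
    patches, positions = extract_patches(y, w)
    denoised = np.stack([denoise_patch(p) for p in patches.T], axis=1)
    y_hat = merge_patches(denoised, positions, y.shape, w)
    return y_hat, (y_hat > threshold).astype(np.uint8)


# Sanity check with an identity "denoiser": merging must reproduce the input.
img = np.arange(36, dtype=float).reshape(6, 6) / 35.0
y_hat, y_bin = denoise_image(img, 3, lambda p: p)
```

In an actual run, `denoise_patch` would map a patch h_j to A x̂_j, with x̂_j obtained from an OMP solver and the learned dictionary A.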
In other words, the DSD algorithm denoises the input document by restoring it from a set of visual words. These visual
words have been learned directly from the training data,
which provides an adaptive dictionary. As a consequence,
the DSD algorithm is able to remove AWG noise, other document
distortions coming from acquisition devices, and other kinds
of document degradation, as shown in the experimental
section. The next section explains how to estimate the precision
parameter 𝜀.

5 Estimation of precision parameter
As we have seen, the DSD method needs an accurate estimation of the precision parameter 𝜀. Therefore, a good performance of the DSD method directly depends on an accurate
estimate of the threshold 𝜀. A value that is too small leads to
noisy reconstructions of the input data, while high values of 𝜀
cause low-quality sparse approximations with a high
reconstruction error.
To the best of our knowledge, there are no real studies
about the relation between 𝜀 and the noise level in documents. Most patch-level denoising schemes assume
that 𝜀 follows a Gaussian distribution and they focus on
cleaning as much AWG noise as possible, without regard to the physical degradation introduced by the image sensor. Under the AWG noise
assumption with standard deviation 𝜎_Noise, the precision
parameter is usually defined as 𝜀 = c_0 w 𝜎_Noise^2, with w the size
of the patch and c_0 ∈ [0.5, 1.5] [14]. This model is slightly
modified in [40], since the authors find that 𝜀 is better modeled by the line 0.6√w + 1.02√w 𝜎_Noise, as a function of
𝜎_Noise.
The NS model takes into account both kinds of degradations, namely the PS function and AWG noise [24]. With this
model, it was empirically found that 𝜀 is linearly correlated
with the NS function, i.e. 𝜀 = c_1 ⋅ NS, with NS defined as:

$$NS = \frac{\sqrt{2\pi} \cdot r \cdot \sigma_{Noise}}{LS(ES^{-1}(\Theta))} \tag{4}$$

where 𝛩 is the binarization threshold, ES is the
cumulative function of the PS function and LS is the Line
Spread function, which is the one-dimensional version of
the PS function. Moreover, the NS function depends on the
model parameters: the radius r of the PS function and the standard deviation 𝜎_Noise of the white Gaussian noise.
The main drawbacks of these models are that they rely on prior
knowledge of the underlying noise 𝜎, which is assumed to
follow an AWG law, and that a constant c_i has to be set.
Instead, we propose the maximum peak of the normalized
cross-correlation (NCC) as a measure of similarity between
the original document and its noisy version. The NCC computation does not require any a priori knowledge about the
noise model but needs pairs of non-noisy and noisy images.
High NCC values denote relatively clean images, while low
NCC values correspond to noisy images.
We have to estimate the precision parameter 𝜀 before
training the visual vocabulary. Thus, we need training data
D composed of pairs of documents: {D_i^o, D_i^n}. Each pair is an
original image D_i^o and its noisy version D_i^n. We define
r_i = 1 − max{NCC(D_i^o, D_i^n)}, where max{NCC(D_i^o, D_i^n)} is the maximum value achieved
by the NCC between the original and the noisy image. The precision parameter is then estimated as:

$$\hat{\varepsilon} = c_2 \cdot \bar{r} + \beta, \tag{5}$$

where r̄ is the mean value of r_i. Thus, r̄ increases with
the level of noise, as 𝜎_Noise and NS do for the AWG
and NS models, respectively. We set c_2 and 𝛽 by means of a linear
regression.
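The NCC-based noise level and the linear model (5) can be sketched as follows. This is a simplified illustration: it assumes that each clean/noisy pair is already aligned, so that the NCC peak is taken at zero shift instead of being searched over all shifts, and it fits c_2 and 𝛽 by ordinary least squares. All function names are hypothetical.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two same-size images at zero shift."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def noise_level(clean, noisy):
    """r_i = 1 - NCC peak; assumes the peak occurs at zero shift (aligned pair)."""
    return 1.0 - ncc(clean, noisy)


def fit_precision_model(r_bar, eps_opt):
    """Least-squares fit of eps = c2 * r_bar + beta on (r_bar, optimal eps) pairs."""
    c2, beta = np.polyfit(r_bar, eps_opt, deg=1)
    return float(c2), float(beta)


def estimate_eps(r_bar, c2, beta):
    """Evaluate the linear precision model (5)."""
    return c2 * r_bar + beta


# Toy data lying exactly on a line, used only to check the fit.
r = np.array([0.0, 0.1, 0.2, 0.3])
c2, beta = fit_precision_model(r, 2.0 * r + 1.0)
```

Once c_2 and 𝛽 are fitted, estimating 𝜀 for a new document only requires its noise level r̄, for instance `estimate_eps(0.2411, 4.77, 1.12)` with the PSNR coefficients reported in Table 1.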


6 Experiments
In this section, we evaluate the performance of the proposed method compared to other related state-of-the-art
methods. In these experiments, unless otherwise specified,
we use the Peak Signal-to-Noise Ratio (PSNR). We evaluate these methods on benchmark document image datasets:
the document dataset Tobacco-800 [28], the symbol image
dataset GREC 2005 [12] and the old document dataset
DIBCO 2009 [18].
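For reference, the PSNR between a reference image and an estimate is 10·log10(peak² / MSE). A minimal sketch, assuming images scaled to [0, 1] so that the peak value is 1 (the function name and the default peak are assumptions):

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB; peak is the maximum possible pixel value."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


# One wrong pixel out of four in a binary image: MSE = 0.25.
reference = np.zeros((2, 2))
estimate = np.array([[0.0, 0.0], [0.0, 1.0]])
value = psnr(reference, estimate)
```

Higher values mean that the denoised image is closer to the reference.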
For each set of noisy images, we randomly sample 𝜀 in the
interval [0, 5]. For each 𝜀 value, we train a visual dictionary
on the target image. We experimentally set the following
K-SVD parameters: the maximum number of iterations to
50 and the ratio of the dictionary to 1/4. We extract 8 × 8
patches from the training images. Thus, the size of the input
patches stacked into a column vector is 64 and the size of the
dictionary is fixed to 256 for all the experiments. We carry
out two sets of experiments:
– Evaluation of the precision model. We study how the precision model, proposed in Sect. 5, generalizes to image
datasets with AWG and non-AWG noise. To this end,
we estimate the confidence interval (CI) of the precision model for each performance evaluation measure at
a significance level 𝛼 = 0.05.
– Comparison to reference state-of-the-art methods. We
compare our approach with the state of the art on benchmark datasets. For each performance evaluation measure,
we use the paired Wilcoxon signed-rank test to assess significant differences. We choose this statistical test because it
does not assume a Gaussian distribution of the performance values. In the reported results, we compare the
DSD method with the estimated 𝜀̂ against the others.
Marks (−) and (+) on the right side of the performance
values indicate that the corresponding method performs
significantly worse and significantly better, respectively,
than our method at a significance level of 5%. The
mark (=) denotes that we do not have enough statistical
evidence to conclude that both methods perform significantly differently.
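A paired Wilcoxon signed-rank test can be sketched with the standard library alone. This version is a simplified illustration: it uses the normal approximation for the p-value, without continuity or tie corrections, so it is rough for very small samples. The function name and the toy scores are assumptions.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative rank sums.
    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_pos, w_neg)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, min(p, 1.0)


# Toy per-image scores of two methods; the first is better on every image.
scores_a = [12, 15, 11, 18, 14, 20, 17, 13, 19, 16]
scores_b = [11, 13, 8, 14, 9, 14, 10, 5, 10, 6]
w, p = wilcoxon_signed_rank(scores_a, scores_b)
```

A p-value below 0.05 would be reported with a (+) or (−) mark depending on the direction of the difference, and with (=) otherwise.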

6.1 Evaluation of the precision model
In our first experiment, we estimate the precision parameter model introduced in Sect. 5, which is the linear model
given by (5). This evaluation is similar to the approach
in [22], where regression lines and the CI of their parameters
are estimated to evaluate an image segmentation task. Similarly, we have taken a random sample of images from
the Tobacco-800 dataset [28], which is composed of 1290
images of documents, scanned using a wide variety of scanner devices over time and at resolutions ranging from 150 to
300 DPI, see Fig. 1. In this collection, the reference images
are the scanned documents and, consequently, document
degradations caused by the scanner devices are not taken
into account in this experiment. We compare the reference
images to noisy versions of them, generated by adding
AWG noise with variances ranging from 0.01 to 0.50.
The DSD method assumes that r̄ was previously estimated. Moreover, it assumes that all the images acquired
with the same device have the same noise level r̄, since it
depends on the physical features of the acquisition device. To
emulate this configuration in these experiments, we follow
a leave-one-out scheme to estimate the r̄ value of a given
image.
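One plausible reading of this leave-one-out scheme is that the noise level assigned to an image is the mean of the r_i values measured on all the other images of the same set. A minimal sketch under that assumption (the function name and the toy values are hypothetical):

```python
def leave_one_out_means(r_values):
    """For each image i, average the noise levels r_j of all other images (j != i)."""
    n = len(r_values)
    if n < 2:
        raise ValueError("need at least two images")
    total = sum(r_values)
    return [(total - r) / (n - 1) for r in r_values]


# Toy per-image noise levels.
r_bar_per_image = leave_one_out_means([1.0, 2.0, 3.0])
```

This keeps the image under test out of its own noise estimate, which mimics a device whose noise level was measured beforehand on other documents.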
Then, we seek the optimal 𝜀∗ by brute force, randomly
sampling the interval [0, 5]. We fixed the upper bound after
observing that the performance of the method was always worse for
higher values. Fig. 2a shows the PSNR plot of an image from
the Tobacco-800 dataset as a function of 𝜀. We estimate the
CI from a t-test of pairs (r̄, 𝜀_i), where r̄ is the estimated noise
level and 𝜀_i are all the randomly sampled 𝜀 values near the maximum.
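The brute-force search can be sketched as a plain random search over the interval. Here `score` stands for any callable that returns the quality (for example the PSNR) obtained with a given 𝜀. In the experiments that would involve training a dictionary and denoising the image, so a toy function is used instead. All names and the toy score are assumptions.

```python
import random


def random_search_eps(score, low=0.0, high=5.0, n_samples=100, seed=0):
    """Sample eps uniformly in [low, high] and keep the best-scoring value."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    scored = [(score(eps), eps) for eps in candidates]
    best_score, best_eps = max(scored)
    return best_eps, best_score, scored


def toy_score(eps):
    """Stand-in for the PSNR curve, peaking at eps = 2."""
    return -(eps - 2.0) ** 2


best_eps, best_score, scored = random_search_eps(toy_score)
```

The list `scored` keeps every sampled pair, which is what is needed afterwards to collect the 𝜀 values close to the maximum and derive a confidence interval.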
In the case of AWG noise, all the performance measures have a similar shape. There is a relatively
narrow CI, which shifts with the noise level. This fact is
better seen in Fig. 2b, which shows a scatter plot of pairs (r̄, 𝜀∗)
and the regression line fitting them for the PSNR measure.
In that plot, dots are the mean values of (r̄, 𝜀∗) and
vertical lines are the CI of the optimal 𝜀∗, with a confidence of
95%. For AWG, the CI width ranges approximately between
0.1 and 0.4.
Table 1 shows the reliability of the regression model.
To better evaluate the quality of our regression model, this
table also shows three other performance measures: the
mean square error (MSE), the Structural Similarity measure (SSIM) and the Jaccard index (JI). The parameters and
CI for AWG noise (Tobacco-800 dataset) are almost the
same regardless of the performance measure. Moreover, the
coefficient of determination R^2 is quite high for all of them.
It means that the regression model is able to provide a rough
estimation of the optimal precision parameter 𝜀∗.

Fig. 1 Some signature documents from the Tobacco-800 dataset

Fig. 2 Performance plot for the PSNR measure for the Tobacco-800 dataset. a In the x-axis, 𝜀 is randomly sampled. Vertical dashed lines represent the CI bounds of the noise level. b Scatter plot between pairs (r̄, 𝜀∗) and linear fitting

Table 1 CI for the precision models for each performance measure for the Tobacco-800 dataset

| Ev. M. | c2 | 𝛽 | R^2 | p value |
| --- | --- | --- | --- | --- |
| PSNR | 4.77 ∈ [4.52, 5.02] | 1.12 ∈ [1.00, 1.24] | 0.97 | 0 |
| MSE | 4.75 ∈ [4.51, 5.00] | 1.12 ∈ [1.00, 1.24] | 0.97 | 0 |
| SSIM | 4.93 ∈ [4.69, 5.16] | 1.16 ∈ [1.05, 1.16] | 0.97 | 0 |
| JI | 4.63 ∈ [4.39, 4.88] | 1.13 ∈ [1.02, 1.26] | 0.97 | 0 |

6.2 Comparison with synthetic image datasets
The above experiments show that we can use the precision parameter model to set the optimal 𝜀 for documents
with AWG noise. In this experiment, we study whether the
precision parameter model learned for AWG noise is good
enough to be applied to document images with unknown
noise. Thus, we apply the DSD algorithm to a new random
collection of samples from the Tobacco-800 dataset and to
the GREC2005 dataset. The GREC2005 dataset was created
to evaluate symbol recognition systems under several kinds of
document degradations. These degradations were generated
using the Kanungo model, reviewed in Sect. 2, with 6 sets of
parameters to obtain 6 levels of degradation.
We compare the DSD method to other benchmark methods for binary images: median filtering, morphological operators, and the method based on the curvelet transform (CT)
[24]. The median filter is selected because of its simplicity and relatively good performance with AWG noise. In the
document image analysis community, morphological operators are usually applied to clean up document images. Thus,
the method based on morphological operators, referred to
as the OC method, is included to compare the DSD
method in the context of document image processing. The
OC method consists of applying opening (O) and closing (C)
operators with structuring elements of size 3 × 3. Finally, the
method based on the curvelet transform (CT) is a state-of-the-art method, which also requires the estimation of a
precision parameter 𝜀_CT that depends on the amount of noise
in the input image. For a fair comparison, we run the CT method
for random values of 𝜀 in the interval [0, 0.5] and we report the
best result in each experiment. The upper bound 0.5 was
selected after running a few experiments and observing that,
in general, the performance of the CT method was worse for
higher values.
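The OC baseline can be sketched with plain NumPy. The sketch assumes a full 3 × 3 square structuring element and applies the opening first and the closing second, as the name suggests. The text does not specify these details, so they are assumptions, as are the function names.

```python
import numpy as np


def _neighborhood_stack(img, pad_value):
    """Stack the nine shifted copies of img that form each 3 x 3 neighborhood."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    rows, cols = img.shape
    return np.stack([padded[i:i + rows, j:j + cols]
                     for i in range(3) for j in range(3)])


def erode(img):
    # Pad with foreground so that the image border alone does not erode objects.
    return _neighborhood_stack(img, 1).min(axis=0)


def dilate(img):
    return _neighborhood_stack(img, 0).max(axis=0)


def opening(img):
    """Erosion followed by dilation: removes small foreground specks."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills small background holes."""
    return erode(dilate(img))


def oc_filter(img):
    """Opening followed by closing with a 3 x 3 square structuring element."""
    return closing(opening(img))


# A 3 x 3 square plus one isolated noise pixel in a corner.
clean = np.zeros((7, 7), dtype=np.uint8)
clean[2:5, 2:5] = 1
noisy = clean.copy()
noisy[6, 0] = 1
filtered = oc_filter(noisy)
```

Because the opening removes every structure thinner than the structuring element, thin strokes can disappear together with the noise.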
The DSD with optimal 𝜀∗ clearly outperforms the other
methods on the Tobacco-800 dataset, as shown in Table 2.
Moreover, the DSD method is able to perform well with
highly degraded images. The mean PSNR for an AWG noise level
of 0.1 is 22.74 dB with our approach, while the Median filter
achieves 20.35 dB, the CT 17.68 dB, and the OC 5.73 dB.
Moreover, when we use the estimated precision parameter
𝜀̂, the DSD method also outperforms the other methods. We
can reject the null hypothesis of the Wilcoxon signed-rank test
for the OC and CT methods at all noise levels. However, the
Median filter performs at the same level as the DSD with
the estimated 𝜀̂ when the noise level increases.
For the GREC 2005 dataset, the DSD method with
optimal 𝜀∗ also outperforms the other methods, as shown
in Table 3. Regarding the precision model, the performance of the DSD algorithm decreases compared to the
DSD with the optimal 𝜀∗ for all noise levels. Moreover,
for low noise levels, the differences in terms of PSNR
performance are not significant. However, the performance
for level 2 and level 6 is quite significant in most cases.
This difference for these two noise levels can easily be
explained by looking at the kind of degradations
generated at each level, see Fig. 3. Level 2 noise is mostly
AWG, the precision estimator model provides a good 𝜀̂
estimator, and consequently the performance of the DSD
for both parameter estimations is almost the same. However, Level 6 noise highly degrades images, so benchmark
methods have great difficulty in properly denoising images
without losing relevant information. In this setting, the
DSD method performs better than benchmark methods.

Table 2 Mean value of the PSNR measure for the Tobacco-800 dataset

| 𝜎 | r̄ | Median | OC | CT | DSD (𝜀∗) | DSD (𝜀̂) |
| --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.0000 | 23.9479 (+) | 19.0379 (−) | 19.3375 (−) | 35.0925 (+) | 22.0605 |
| 0.02 | 0.0011 | 23.9272 (−) | 18.8589 (−) | 19.3223 (−) | 37.0225 (+) | 32.6892 |
| 0.03 | 0.0106 | 23.7706 (−) | 17.7113 (−) | 19.2640 (−) | 32.1329 (=) | 31.9949 |
| 0.04 | 0.0333 | 23.3982 (−) | 15.7142 (−) | 19.0969 (−) | 29.0753 (=) | 28.8321 |
| 0.05 | 0.0654 | 22.8764 (−) | 13.5084 (−) | 18.8759 (−) | 26.6730 (+) | 25.5029 |
| 0.06 | 0.1020 | 22.3126 (=) | 11.4474 (−) | 18.6501 (−) | 25.4106 (+) | 22.9924 |
| 0.07 | 0.1396 | 21.7636 (=) | 9.6158 (−) | 18.3956 (−) | 24.5455 (+) | 21.1942 |
| 0.08 | 0.1756 | 21.2500 (=) | 8.0568 (−) | 18.1483 (−) | 23.8356 (+) | 20.8807 |
| 0.09 | 0.2096 | 20.7758 (=) | 6.7801 (−) | 17.9285 (−) | 23.0268 (+) | 20.5877 |
| 0.10 | 0.2411 | 20.3455 (=) | 5.7286 (−) | 17.6756 (−) | 22.7350 (+) | 20.2928 |

Fig. 3 a Original binary symbol; from b to g examples of six levels of Kanungo noise of the GREC 2005 dataset

Table 3 Mean value of the PSNR measure for the GREC 2005 dataset

| Lev. | r̄ | Median | OC | CT | DSD (𝜀∗) | DSD (𝜀̂) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.012 | 30.3633 (=) | 33.2105 (+) | 28.8332 (−) | 33.2013 (+) | 30.6896 |
| 2 | 0.478 | 22.1449 (−) | 2.6631 (−) | 9.8653 (−) | 25.3381 (=) | 25.0783 |
| 3 | 0.024 | 26.2114 (=) | 32.4281 (+) | 25.4901 (=) | 28.4470 (+) | 26.1445 |
| 4 | 0.062 | 24.6269 (+) | 15.2183 (−) | 20.9548 (=) | 27.2243 (+) | 21.5266 |
| 5 | 0.003 | 36.6187 (=) | 35.6024 (−) | 36.1567 (=) | 37.8624 (=) | 37.0888 |
| 6 | 0.714 | 3.6285 (=) | 3.4102 (−) | 3.6237 (=) | 5.7283 (+) | 3.6384 |

6.3 Comparison with real datasets
We conclude the evaluation of the DSD method by carrying
out experiments on degraded images where the noise is clearly
non-AWG. We apply the DSD method to the DIBCO 2009
dataset [18].
The experiment with the DIBCO 2009 dataset was
devised to evaluate the DSD method in a broader scenario,
different from a denoising task, since it is composed of only
5 old and degraded handwritten documents, see Fig. 4.
Compared to other methods, the DSD algorithm with
optimal 𝜀∗ is able to achieve the best results for all images,
see Table 4. Although we cannot draw any definitive
conclusion given the size of the dataset, the DSD algorithm shows a reconstruction capacity that deserves further study. In addition, when we apply the DSD method
together with the precision model, which was estimated on
the Tobacco-800 dataset in Sect. 6.1, we achieve a similar
performance to other methods, see Table 4.
Overall, our experiments show that the DSD algorithm
is able to outperform other related approaches on a wide
range of scanned documents. Moreover, the proposed precision model estimated with AWG images can be applied
with good results to document images with unknown
noise and non-AWG noise.

Table 4 Results on the DIBCO 2009 dataset with the PSNR measure

| | r̄ | Median | OC | CT | DSD (𝜀∗) | DSD (𝜀̂) |
| --- | --- | --- | --- | --- | --- | --- |
| H01 | 0.3052 | 12.47 | 12.22 | 12.58 | 13.94 | 11.79 |
| H02 | 0.2848 | 20.79 | 22.87 | 24.06 | 24.51 | 23.49 |
| H03 | 0.2944 | 14.17 | 14.74 | 15.73 | 15.85 | 11.49 |
| H04 | 0.242 | 10.2 | 10.58 | 11.43 | 12.85 | 10.14 |
| H05 | 0.2132 | 13.84 | 14.56 | 12.05 | 15.64 | 9.654 |
| Mean | 0.2679 | 14.99 (=) | 14.29 (=) | 15.17 (=) | 16.56 (=) | 13.31 |

Fig. 4 a Noisy DIBCO documents used in Table 4, b denoised documents obtained by our approach before binarization and c after binarization using Otsu’s method

7 Conclusions
In this paper, we propose a novel denoising algorithm based
on the learning of adaptive dictionaries, called the DSD algorithm. This algorithm is designed on the basis of sound
sparse theory and the learning of sparse dictionaries. It has
some advantages compared to other similar approaches.
First, no a priori noise model is assumed. This allows the
removal of noise other than AWG. Second, the sparse dictionary is learned directly from noisy images, even if
clean samples are not available for the learning step. Third,
the precision parameter 𝜀 is obtained from a linear regression
model that only depends on the normalized cross-correlation
(NCC). Again, there is no need for any a priori noise model
assumption.
We applied this algorithm to several kinds of degraded
documents from different fields, such as architectural drawings, business and historical documents. The obtained
results show a significant improvement of the DSD algorithm in comparison to related methods. However, there is
room for improvement. For instance, dictionary learning is
based on sharing dictionaries from similar documents. Thus,
recovering highly degraded documents might be possible by
combining sparse patches. Also, the linear regression of the
precision parameter can be improved by taking into consideration other factors of variability.
Acknowledgements This work was partially supported by the European project SCANPLAN (A0806017L), the Spanish ConCORDIA
Project (TIN2015-70924-C2-2-R) and the Vietnam National University, Hanoi (VNU) under project number QG.18.04.

References
1. Aharon M, Elad M, Bruckstein A (2006) K-SVD: An algorithm
for designing overcomplete dictionaries for sparse representation.
Sig Process 54(11):4311–4322
2. Barney E (2008) Modeling image degradations for improving
OCR. In: Proceedings of the 16th European signal processing
conference (EUSIPCO), pp 1–5
3. Candès EJ, Donoho DL (2000) Curvelets: a surprisingly effective
nonadaptive representation for objects with edges. In: Rabut C,
Cohen A, Schumaker L (eds) Curve and Surface Fitting: Saint-Malo 1999 (Innovations in Applied Mathematics), Vanderbilt
University Press, pp 105–120
4. Chatterjee P, Milanfar P (2010) Is denoising dead? Trans Image
Process 19(4):895–911
5. Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33–61
6. Daubechies I, Devore R, Fornasier M, Gunturk CS (2009) Iteratively reweighted least squares minimization for sparse recovery.
Commun Pure Appl Math 63(1):1–38
7. Do M, Vetterli M (2005) The contourlet transform: an efficient
directional multiresolution image representation. Image Process
14(12):2091–2106

8. Do TH (2014) Sparse representations over learned dictionary for
document analysis. PhD thesis, Université de Lorraine
9. Dong W, Zhang L, Shi G, Li X (2013) Nonlocally centralized
sparse representation for image restoration. IEEE Trans Image
Process 22(4):1620–1630
10. Donoho D, Elad M (2003) Optimally sparse representation in
general (nonorthogonal) dictionaries via 𝓁 1 minimization. PNAS
100(5):2197–2202
11. Donoho DL (1999) Wedgelets: nearly minimax estimation of
edges. Ann Stat 27(3):782–1117
12. Dosch P, Valveny P (2005) Report on the second symbol recognition contest. In: Liu W, Lladós J (ed) Graphics recognition. Ten
years review and future perspectives, volume 3926 of Lecture
notes in computer science, Springer, pp 381–397
13. Eksioglu EM (2014) Online dictionary learning algorithm with
periodic updates and its application to image denoising. Expert
Syst Appl 41:3682–3690
14. Elad M (2010) Sparse and redundant representation: from theory
to applications in signal and images processing. Springer, New
York
15. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. Image Process
54(12):3736–3745

16. Engan K, Skretting K, Husoy JH (2007) Family of iterative
LS-based dictionary learning algorithm, ITS-DLA, for sparse
signal representation. Digit Signal Proc 17(1):32–49
17. Eslami R, Radha H (2003) The contourlet transform for image
de-noising using cycle spinning. In: Proceedings of Asilomar
conference on signals, systems, and computers, pp 1982–1986
18. Gatos B, Ntirogiannis K, Pratikakis I (2011) DIBCO 2009:
document image binarization contest. Int J Doc Anal Recognit
14(1):35–44
19. Gonzalez I, Rao B (1997) Sparse signal reconstruction from
limited data using focuss: a re-weighted minimum norm algorithm. Sig Process 45(3):600–616
20. Hamza AB, Luque P, Martinez J, Roman R (1999) Removing
noise and preserving details with relaxed median filters. Math
Imag Vis 11(2):161–177
21. Hardie RC, Barner KE (1994) Rank conditioned rank selection
filters for signal restoration. Image Process 3:192–206
22. Hernandez-Sabate A, Gil D, Roche D, Matsumoto M, Furuie
S (2012) Inferring the performance of medical imaging algorithms. In: 14th International conference on computer analysis
of images and patterns, vol 6854. pp 520–528
23. Hoang T, Barney E, Tabbone S (2011) Edge noise removal in
bilevel graphical document images using sparse representation.
In: Proceedings of the international conference on image processing, pp 3610–3613
24. Hoang TV, Smith EHB, Tabbone S (2014) Sparsity-based edge
noise removal from bilevel graphical document images. IJDAR
17(2):161–179
25. Jain AK (1989) Fundamentals of digital image processing.
Prentice-Hall, Upper Saddle River
26. Kanungo T, Haralick RM, Phillips IT (1993) Global and local
document degradation models. In: Proceedings of the second
international conference on document analysis and recognition,
pp 730–734
27. Kuang Y, Zhang L, Yi Z (2014) An adaptive rank-sparsity
K-SVD algorithm for image sequence denoising. Pattern Recogn Lett 45:46–54
28. Lewis D, Agam G, Argamon S, Frieder O, Grossman D, Heard
J (2006) Building a test collection for complex document information processing. In: Proceedings of 29th annual international
ACM SIGIR conference, pp 665–666
29. Liu J, Wang Y, Su K, He W (2016) Image denoising with
multidirectional shrinkage in directionlet domain. Sig Process
125:64–78
30. Mallat S (2009) A wavelet tour of signal processing: The sparse
way, third edn. Academic Press, Cambridge
31. Mallat SG, Zhang Z (1993) Matching pursuits with time-frequency dictionaries. Sig Process 41(12):3397–3415
32. Mairal J, Bach F, Ponce J, Sapiro G (2009) Online dictionary
learning for sparse coding. In: 26th Annual international conference on machine learning, pp 689–696
33. Om H, Biswas M (2014) MMSE based map estimation for image
denoising. Opt Laser Technol 57:252–264
34. Pati Y, Rezaiifar R, Krishnaprasad P (1993) Orthogonal matching pursuit: recursive function approximation with applications to
wavelet decomposition. In: 27th Annual Asilomar conference on
signals, systems, and computers, pp 40–44
35. Le Pennec E, Mallat S (2005) Sparse geometric image representations with bandelets. Image Process 14(4):423–438
36. Peyré G, Mallat S (2007) A review of bandlet methods for geometrical image representation. Numer Algorithms 44(3):205–234
37. Sadreazami H, Omair Ahmad M, Swamy MNS (2016) A study
on image denoising in contourlet domain using the alpha-stable
family of distributions. Sig Process 128:459–473
38. Skretting K, Engan K (2010) Recursive least squares dictionary
learning algorithm. Sig Process 58(4):2121–2130

39. Starck J-L, Candès EJ, Donoho DL (2002) The curvelet transform
for image denoising. Image Process 11(6):670–684
40. Sun D, Gao Q, Lu Y, Huang Z, Li T (2014) A novel image denoising algorithm using linear Bayesian map estimation based on
sparse representation. Sig Process 100:132–145
41. Temlyakov VN (2000) Weak greedy algorithms. Adv Comput
Math 12(2–3):213–227

42. Yang R, Yin L, Gabbouj M, Astola J, Neuvo Y (1995) Optimal
weighted median filters under structural constraints. Sig Process
43:591–604
43. Zha Z, Zhang X, Wang Q, Bai Y, Chen Y, Tang L, Liu X
(2018) Group sparsity residual constraint for image denoising
with external nonlocal self-similarity prior. Neurocomputing
275:2294–2306
