Saturday, June 23, 2018

A Novel Hybrid Binarization Technique for Images of
Historical Arabic Manuscripts

Khaled M. AMIN³, Sherihan MOHAMED²

1 Faculty of Computers and Information,
Cairo University, Egypt
2 Faculty of Computers and Information,
Mansoura University, Egypt
3 Faculty of Computers and Information,
Menofia University, Egypt

Abstract: In this paper, a novel binarization approach based on neutrosophic sets and Sauvola's approach is presented. The approach targets historical Arabic manuscript images that suffer from various types of noise. The input RGB image is transformed into the neutrosophic set (NS) domain, where it is represented by three subsets: the percentage of truth, the percentage of indeterminacy, and the percentage of falsity. The entropy in the NS domain is used to evaluate indeterminacy, and the most important operation, the "λ-mean" operation, is applied to reduce indeterminacy and thereby reduce noise. Finally, the manuscript is binarized using an adaptive thresholding technique. The main advantage of the proposed approach is that it preserves weak connections and produces smooth, continuous strokes. The performance of the proposed approach is evaluated both objectively and subjectively on standard datasets and on a manually collected dataset. The proposed method compares favourably with other well-known binarization approaches.
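The NS mapping, the entropy measure and the λ-mean step summarized in the abstract can be illustrated with the following minimal plain-Python sketch. It rests on stated assumptions: it takes a grayscale image as a 2-D list of 0-255 values (the RGB conversion is omitted) and uses a commonly used NS image mapping in which T is the normalized local mean, I is the normalized absolute difference between a pixel and its local mean, and F = 1 - T; the λ-mean step replaces T by its local mean wherever I >= λ. The window size, λ = 0.85, the 256-bin entropy histogram and the stopping rule are illustrative assumptions, all function names are hypothetical, and the paper's own definitions (Sections 2 and 4) may differ.

```python
import math


def local_mean(img, w):
    """Mean of a w x w window centred on each pixel (window clipped at the borders)."""
    h, wd = len(img), len(img[0])
    r = w // 2
    out = []
    for i in range(h):
        row = []
        for j in range(wd):
            vals = [img[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(wd, j + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def normalize(img):
    """Min-max normalise a 2-D list to [0, 1]; a constant image maps to zeros."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in img]


def abs_diff(a, b):
    """Element-wise |a - b| for two equally sized 2-D lists."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_ns(gray, w=3):
    """Map a grayscale image to the (T, I, F) subsets (assumed mapping)."""
    g_bar = local_mean(gray, w)
    T = normalize(g_bar)                        # truth: normalised local mean
    I = normalize(abs_diff(gray, g_bar))        # indeterminacy: local deviation
    F = [[1.0 - t for t in row] for row in T]   # falsity: complement of truth
    return T, I, F


def entropy(I, bins=256):
    """Shannon entropy (natural log) of the histogram of indeterminacy values."""
    counts = [0] * bins
    for row in I:
        for v in row:
            counts[min(int(v * bins), bins - 1)] += 1
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c)


def lambda_mean(T, I, lam=0.85, w=3):
    """Replace T by its local mean where I >= lam, then recompute I and F."""
    T_bar = local_mean(T, w)
    T_new = [[tb if i >= lam else t for t, i, tb in zip(rt, ri, rb)]
             for rt, ri, rb in zip(T, I, T_bar)]
    I_new = normalize(abs_diff(T_new, local_mean(T_new, w)))
    F_new = [[1.0 - t for t in row] for row in T_new]
    return T_new, I_new, F_new


def reduce_indeterminacy(gray, lam=0.85, w=3, eps=1e-4, max_iter=20):
    """Repeat lambda_mean until the entropy of I stops changing (assumed stopping rule)."""
    T, I, F = to_ns(gray, w)
    prev = entropy(I)
    for _ in range(max_iter):
        T, I, F = lambda_mean(T, I, lam, w)
        cur = entropy(I)
        if abs(cur - prev) < eps:
            break
        prev = cur
    return T, I, F
```

Because F is tied to T here, smoothing T where indeterminacy is high also smooths F, which is one simple way to make the three subsets stay consistent after each λ-mean pass.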

Keywords: Document image binarization, Historical manuscript image, Neutrosophic theory, Pixel classification.

Aboul Ella HASSANIEN, Mohamed ABDELFATTAH, Khaled M. AMIN, Sherihan MOHAMED, A Novel Hybrid Binarization Technique for Images of Historical Arabic Manuscripts, Studies in Informatics and Control, ISSN 1220-1766, vol. 24 (3), pp. 271-282, 2015.

1. Introduction

Libraries and archives around the world store a huge number of old and historically important manuscripts and documents. These historical documents embody a significant part of the human heritage accumulated over time [1]. Digital images of historical documents typically suffer from various degradations caused by uncontrolled storage conditions and ageing [2]. The main degradations are non-uniform illumination, stains, smears, bleed-through, faint characters, and shadows [1; 2; 3].

The binarization process is a key step in all document image processing workflows. It converts an image into bi-level form such that the background is represented by white pixels and the foreground (text) by black pixels [3;4]. Although document image binarization has been studied for many years, thresholding historical document images is still a challenging problem due to the complexity of the images and the degradations mentioned above. Moreover, the binarization of ancient Arabic manuscripts poses extra problems such as decorations, diacritics, and characters written in multiple colors [2]. Figure 1 shows examples of degraded historical Arabic manuscript images.


a) Dirty document with spots, stains, smears or smudges

b) Wet-ink characters visible on both sides

c) Broken characters, light handwriting


d) Documents with poor quality paper

e) Multicolored background

f) Poor contrast between foreground and background

Figure 1. Examples of manuscript images containing multi-colored text lines with different degradations [5; 6].
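To make the bi-level convention concrete, the following minimal Python sketch maps dark ink pixels to black (0) and background pixels to white (255). The threshold is chosen with Otsu's global method, used here only as a familiar stand-in for a single global threshold; it is not the method proposed in this paper, and the function names are hypothetical. One threshold for the whole page is exactly what tends to break down under the non-uniform illumination and stains shown in Figure 1.

```python
def otsu_threshold(gray):
    """Otsu's global threshold for a 2-D list of 8-bit gray levels (0-255)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]              # number of pixels with level <= t
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        m_dark = sum_dark / w_dark
        m_light = (sum_all - sum_dark) / w_light
        # Between-class variance, unnormalised; the argmax is unchanged.
        var_between = w_dark * w_light * (m_dark - m_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize_global(gray, t):
    """Foreground (ink, level <= t) -> 0 (black); background -> 255 (white)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]
```

For a tiny image such as `[[10, 12, 200], [11, 198, 202]]`, the threshold falls between the dark and light groups, so only the three dark pixels become black.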

In this paper, a new hybrid algorithm for the binarization of degraded Arabic manuscript images is proposed. It combines Sauvola's well-known adaptive algorithm [8] and the neutrosophic set (NS) binarization algorithm of [9] into a single hybrid method. The NS approach is relatively new and has proven useful for various image processing tasks such as segmentation, thresholding, and denoising [7]. Experimental results show that the proposed approach selects appropriate thresholds automatically and effectively, is less sensitive to noise, and performs better than other binarization algorithms.
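Sauvola's method computes a separate threshold for every pixel from the mean m and standard deviation s of the gray levels in a local window: t = m * (1 + k * (s / R - 1)), where R is the dynamic range of the standard deviation. The pure-Python sketch below implements this formula directly. The values k = 0.5 and R = 128 are commonly quoted defaults for 8-bit images, the 15 x 15 window is an assumption, and the function name is hypothetical; the sketch shows only the Sauvola component, not how this paper combines it with the NS stage.

```python
import math


def sauvola_binarize(gray, w=15, k=0.5, R=128.0):
    """Binarize a 2-D list of 8-bit gray levels with Sauvola's local threshold.

    t(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1)), computed over a w x w
    window centred on each pixel (clipped at the image borders).
    Ink (level <= t) -> 0 (black); background -> 255 (white).
    """
    h, wd = len(gray), len(gray[0])
    r = w // 2
    out = []
    for i in range(h):
        row = []
        for j in range(wd):
            vals = [gray[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(wd, j + r + 1))]
            n = len(vals)
            m = sum(vals) / n
            var = sum(v * v for v in vals) / n - m * m
            s = math.sqrt(max(var, 0.0))   # guard against tiny negative rounding
            t = m * (1.0 + k * (s / R - 1.0))
            row.append(0 if gray[i][j] <= t else 255)
        out.append(row)
    return out
```

In a flat background window s = 0, so the threshold drops to m * (1 - k) and background pixels stay white; near a stroke, s grows and the threshold rises toward m. A direct window scan costs O(w²) per pixel; practical implementations usually compute m and s with integral images instead.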

The remainder of the paper is structured as follows. Section 2 briefly discusses the NS approach. Section 3 reviews previous work on the binarization of historical document images and the generation of ground-truth images.

In Section 4, we present our proposed hybrid method. Section 5 reports the experimental results. Finally, Section 6 presents our conclusions and some directions for future research.


