Saturday, June 23, 2018

A Novel Hybrid Binarization Technique for Images of
Historical Arabic Manuscripts

Aboul Ella HASSANIEN1, Mohamed ABDELFATTAH2,
Khaled M. AMIN3, Sherihan MOHAMED2

1 Faculty of Computers and Information,
Cairo University, Egypt
aboitcairo@gmail.com
2 Faculty of Computers and Information,
Mansoura University, Egypt
3 Faculty of Computers and Information,
Menofia University, Egypt

Abstract: In this paper, a novel binarization approach based on neutrosophic sets (NS) and Sauvola's approach is presented. The approach targets images of historical Arabic manuscripts that suffer from various types of noise. The input RGB image is transformed into the NS domain, where it is represented by three subsets: the percentage of truth, the percentage of indeterminacy, and the percentage of falsity in a subset. The NS entropy is used to evaluate the indeterminacy, and the most important operation, the λ-mean operation, is applied to minimize indeterminacy and thereby reduce noise. Finally, the manuscript is binarized using an adaptive thresholding technique. The main advantage of the proposed approach is that it preserves weak connections and produces smooth, continuous strokes. The performance of the proposed approach is evaluated both objectively and subjectively on standard databases and on a manually collected database. The proposed method achieves better results than other well-known binarization approaches.

Keywords: Document image binarization, Historical manuscript image, Neutrosophic theory, Pixel classification.
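The abstract names the NS steps without giving formulas. As a rough illustration only, the sketch below follows one common formulation from the NS image-processing literature (Cheng and Guo, cf. [11; 12]); it is not necessarily the exact variant used in this paper. A local mean ḡ over a w × w window gives the truth subset T = (ḡ − ḡmin) / (ḡmax − ḡmin); the absolute deviation δ = |g − ḡ| gives the indeterminacy subset I = (δ − δmin) / (δmax − δmin); and F = 1 − T. In a λ-mean step, pixels whose indeterminacy reaches λ have T replaced by its local mean, and the entropy of the I histogram measures how much indeterminacy remains. The window size w = 5, λ = 0.85, the 256-bin histogram, the base-2 logarithm, the replicated borders, the way I is recomputed after the step, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical helpers sketching one common NS formulation (cf. Cheng and Guo).
# Window size, lambda, histogram bins and border handling are illustrative assumptions.

def local_mean(img, w=5):
    """Mean over a w x w window; border pixels are replicated (an assumption)."""
    h, wd, r = len(img), len(img[0]), w // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), wd - 1)
                    total += img[ii][jj]
            out[i][j] = total / (w * w)
    return out


def normalize(m):
    """Min-max normalise a 2-D list to [0, 1]; a constant input maps to all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0] * len(row) for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def to_ns(gray, w=5):
    """Map a grayscale image to truth (T), indeterminacy (I) and falsity (F) subsets."""
    g_bar = local_mean(gray, w)
    T = normalize(g_bar)
    delta = [[abs(g - m) for g, m in zip(gr, mr)] for gr, mr in zip(gray, g_bar)]
    I = normalize(delta)
    F = [[1.0 - t for t in row] for row in T]
    return T, I, F


def entropy(m, bins=256):
    """Shannon entropy (base 2) of a [0, 1]-valued subset, histogrammed into `bins` bins."""
    counts = [0] * bins
    for row in m:
        for v in row:
            counts[min(int(v * bins), bins - 1)] += 1
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def lambda_mean(T, I, lam=0.85, w=5):
    """One lambda-mean step: where I >= lam, replace T by its local mean.
    F stays 1 - T; I is recomputed from |T - local mean of T| (an assumption)."""
    T_bar = local_mean(T, w)
    T_new = [[tb if iv >= lam else t for t, tb, iv in zip(tr, tbr, ir)]
             for tr, tbr, ir in zip(T, T_bar, I)]
    T_new_bar = local_mean(T_new, w)
    I_new = normalize([[abs(t - tb) for t, tb in zip(tr, tbr)]
                       for tr, tbr in zip(T_new, T_new_bar)])
    F_new = [[1.0 - t for t in row] for row in T_new]
    return T_new, I_new, F_new


# Toy 6x6 page: light paper (200) with one dark vertical stroke (30).
gray = [[200] * 6 for _ in range(6)]
for row in gray:
    row[2] = 30
T, I, F = to_ns(gray)
print(entropy(I))
T, I, F = lambda_mean(T, I)
print(entropy(I))
```

In practice such a step would be repeated until the change in indeterminacy entropy becomes small; the stopping rule is left out here because the abstract does not specify it.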

CITE THIS PAPER AS:
Aboul Ella HASSANIEN, Mohamed ABDELFATTAH, Khaled M. AMIN, Sherihan MOHAMED, A Novel Hybrid Binarization Technique for Images of Historical Arabic Manuscripts, Studies in Informatics and Control, ISSN 1220-1766, vol. 24 (3), pp. 271-282, 2015.

  1. Introduction

Libraries and archives around the world store a huge number of old and historically important manuscripts and documents. These historical documents preserve a significant part of the human heritage accumulated over time [1]. Digital images of historical documents typically suffer from various degradations caused by uncontrolled storage conditions and ageing [2]. The main degradations are non-uniform illumination, stains, smears, bleed-through, faint characters, and shadows [1; 2; 3].

The binarization process is a key step in all document image processing workflows. It converts an image into bi-level form such that the background is represented by white pixels and the foreground by black ones [3; 4]. Although document image binarization has been studied for many years, thresholding historical document images is still a challenging problem because of the complexity of the images and the degradations mentioned above. Moreover, binarization of ancient Arabic manuscripts faces additional problems such as decorations, diacritics, and characters written in multiple colors [2]. Figure 1 shows examples of degraded historical Arabic manuscript images.
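To make the output convention concrete, the minimal sketch below applies a single global threshold and maps ink to black (0) and paper to white (255). The threshold of 128 and the function name are arbitrary choices for illustration. The one-row example has a well-lit left half and a shadowed right half: the single threshold classifies the shadowed paper (gray level 110) as ink, which is one reason non-uniform illumination makes historical documents hard to binarize with a global threshold.

```python
def binarize_global(gray, threshold=128):
    """Return a bi-level image: 0 (black) for foreground ink, 255 (white) for background.

    Pixels darker than `threshold` are treated as ink; 128 is an arbitrary example value.
    """
    return [[0 if v < threshold else 255 for v in row] for row in gray]


# One row: well-lit paper (220) with ink (40) on the left,
# shadowed paper (110) with ink (30) on the right.
row = [[220, 40, 220, 110, 30, 110]]
print(binarize_global(row))  # -> [[255, 0, 255, 0, 0, 0]]
```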

[Figure 1 panels: (a) dirty document with spots, stains, smears, or smudges; (b) wet-ink characters visible on both sides; (c) broken characters and light handwriting; (d) documents on poor-quality paper; (e) multicolored background; (f) poor contrast between foreground and background.]

Figure 1. Examples of manuscript images containing multicolored text lines with different degradations [5; 6].

In this paper, a new hybrid algorithm for the binarization of degraded Arabic manuscript images is proposed. It combines Sauvola's well-known adaptive algorithm [8] and the neutrosophic set (NS) binarization algorithm of [9] into a hybrid method. The NS approach is relatively new and has been applied to various image processing tasks such as segmentation, thresholding, and denoising [7]. Experimental results show that the proposed approach selects appropriate thresholds automatically and effectively, is less sensitive to noise, and performs better than other binarization algorithms.
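Sauvola's method [8] computes a separate threshold for each pixel from the mean m and standard deviation s of a local window: T = m · (1 + k · (s / R − 1)), where R is the dynamic range of the standard deviation; k = 0.5 and R = 128 are the values suggested in [8]. The minimal pure-Python sketch below uses integral images (the speed-up discussed in [22]) so that each window sum costs O(1). The default window size of 15, the clipping of windows at the image border, and the function names are assumptions. It shows plain Sauvola thresholding only, not this paper's hybrid with the NS stage.

```python
import math


def integral(img, square=False):
    """Summed-area table with a zero first row/column; optionally of squared values."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            v = img[i][j]
            row_sum += v * v if square else v
            S[i + 1][j + 1] = S[i][j + 1] + row_sum
    return S


def box(S, i0, j0, i1, j1):
    """Sum over rows i0..i1-1 and columns j0..j1-1 using the summed-area table S."""
    return S[i1][j1] - S[i0][j1] - S[i1][j0] + S[i0][j0]


def sauvola(gray, w=15, k=0.5, R=128.0):
    """Binarize with T = m * (1 + k * (s / R - 1)); ink -> 0, paper -> 255.

    Windows are clipped at the image border (an assumption; other paddings exist).
    """
    h, wd = len(gray), len(gray[0])
    S, S2 = integral(gray), integral(gray, square=True)
    r = w // 2
    out = [[255] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            i0, i1 = max(0, i - r), min(h, i + r + 1)
            j0, j1 = max(0, j - r), min(wd, j + r + 1)
            n = (i1 - i0) * (j1 - j0)
            m = box(S, i0, j0, i1, j1) / n
            var = max(box(S2, i0, j0, i1, j1) / n - m * m, 0.0)  # guard rounding
            t = m * (1.0 + k * (math.sqrt(var) / R - 1.0))
            out[i][j] = 0 if gray[i][j] <= t else 255
    return out


# Same shadowed row as a single global threshold would mislabel.
row = [[220, 40, 220, 110, 30, 110]]
print(sauvola(row, w=3))  # -> [[255, 0, 255, 255, 0, 255]]
```

Because the threshold follows the local mean, the shadowed paper (110) stays white while both ink pixels become black. In flat background regions s is close to 0, so the threshold drops to about m · (1 − k), well below the paper level.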

The remainder of the paper is structured as follows. Section 2 briefly discusses the NS approach. Section 3 reviews previous work on the binarization of historical document images and on the generation of ground-truth images. Section 4 presents the proposed hybrid method, and Section 5 reports experimental results. Finally, Section 6 presents our conclusions and some directions for future research.

REFERENCES

  1. NAFCHI, H., R. F. MOGHADDAM, M. CHERIET, Phase-Based Binarization of Ancient Document Images: Model and Applications, IEEE Trans. on Image Processing, vol. 23(7), pp. 2916-2930, 2014.
  2. NTIROGIANNIS, K., B. GATOS, I. PRATIKAKIS, Performance Evaluation Methodology for Historical Document Image Binarization, IEEE Trans. on Image Processing, vol. 22(2), pp. 595-609, 2013.
  3. AMIN, K. M., M. A. AHMAD, A. ALI, A Novel Binarization Algorithm for Historical Arabic Manuscripts using Wavelet Denoising, Int. J. of Computing and Inf. Sciences, vol. 13(1), 2013.
  4. AMIN, K. M., M. ELFATTAH, A. E. HASSANIEN, G. SCHAEFER, A Binarization Algorithm for Historical Arabic Manuscript Images using a Neutrosophic Approach, Computer Engineering Systems (ICCES), 9th Intl. Conf., pp. 266-270, 2014.
  5. http://www.wqf.me.com, last accessed at 9 P.M., 10 Jan. 2015.
  6. http://ocp.hul.harvard.edu/ihp/manuscripts.html, last accessed at 9 P.M., 10 Sep. 2015.
  7. STATHIS, P., E. KAVALLIERATOU, N. PAPAMARKOS, An Evaluation Technique for Binarization Algorithms, Journal of Universal Computer Science, vol. 14(18), pp. 3011-3030, 2008.
  8. SAUVOLA, J., M. PIETIKAINEN, Adaptive Document Image Binarization, Pattern Recognition, vol. 33(2), pp. 225-236, 2000.
  9. MOHAN, J., V. KRISHNAVENI, Y. GUO, A New Neutrosophic Approach of Wiener Filtering for MRI Denoising, Measurement Science Review, vol. 13(4), pp. 177-168, 2013.
  10. SMARANDACHE, F., A Unifying Field in Logics: Neutrosophic Logic. Neutrosophy, Neutrosophic Set, Neutrosophic Probability, 3rd ed., American Research Press, 2003.
  11. CHENG, H. D., Y. GUO, A New Neutrosophic Approach To Image Thresholding, New Mathematics and Natural Computation, vol. 4(3), pp. 291-308, 2008.
  12. GUO, Y., H. D. CHENG, New Neutrosophic Approach To Image Segmentation, Pattern Recognition, vol. 42(5), pp. 587-595, 2009.
  13. ZHANG, M., Novel Approaches to Image Segmentation Based on Neutrosophic Logic, Doctoral Dissertation, Utah State University, 2010.
  14. BILLER, O., WebGT: An Interactive Web-Based System for Historical Document Ground Truth Generation, ICDAR, pp. 305-308, 2013.
  15. YANIKOGLU, B., L. VINCENT, Pink Panther: A Complete Environment for Ground Truthing and Benchmarking Document Page Segmentation, Pattern Recognition, vol. 31(9), pp. 1191-1204, 1998.
  16. LEE, C. H., T. KANUNGO, The Architecture of TRUEVIZ: A groundTRUth/metadata Editing and VIsualiZing toolkit, Pattern Recognition, vol. 36(3), pp. 811-825, 2003.
  17. YACOUB, S., S. VINAY, S. S. NUSRULLA, Perfectdoc: A Ground Truthing Environment for Complex Documents, Proc. 8th Intl. Conf. on Document Analysis and Recognition (ICDAR), IEEE, pp. 452-456, 2005.
  18. SAUND, E., J. LIN, P. SARKAR, PIXLABELER: User Interface for Pixel-level Labeling of Elements in Document Images, Proc. 10th Intl. Conf. on Document Analysis and Recognition (ICDAR'09), IEEE, pp. 646-650, 2009.
  19. FISCHER, A., Ground Truth Creation for Handwriting Recognition in Historical Documents, Proc. 9th IAPR Intl. Workshop on Document Analysis Systems, ACM, p. 310, 2010.
  20. CLAUSNER, C., S. PLETSCHACHER, A. ANTONACOPOULOS, Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments, Intl. Conf. on Document Analysis and Recognition (ICDAR), pp. 48-52, 2011.
  21. FENG, M.-L., Y.-P. TAN, Adaptive Binarization Method for Document Image Analysis, IEEE Intl. Conf. on Multimedia and Expo, vol. 1, pp. 339-342, 2004.
  22. SHAFAIT, F., D. KEYSERS, T. M. BREUEL, Efficient Implementation of Local Adaptive Thresholding Techniques Using Integral Images, Electronic Imaging, International Society for Optics and Photonics, pp. 681510-681510, 2008.
  23. OTSU, N., A Threshold Selection Method from Gray-Level Histograms, IEEE Trans. on Systems, Man and Cybernetics, vol. 9(1), pp. 62-66, 1979.
  24. SU, B., S. LU, C. L. TAN, Binarization of historical handwriting document images using local maximum and minimum filter, Proc. Intl. Workshop on Document Analysis Systems, pp. 159-166, 2010.
  25. TRIER, O. D., T. TAXT, Evaluation of binarization methods for document images, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, pp. 312-315, March 1995.
  26. BADEKAS, E., N. PAPAMARKOS, Automatic evaluation of document binarization results, 10th Iberoamerican Congress on Pattern Recognition, Havana, Cuba, pp. 1005-1014, 2005.
  27. SEZGIN, M., B. SANKUR, Survey Over Image Thresholding Techniques and Quantitative Performance Evaluation, Journal of Electronic Imaging, vol. 13(1), pp. 146-165, 2004.
  28. NIBLACK, W., An Introduction to Digital Image Processing, Prentice Hall, Englewood Cliffs, 1986.
  29. KHURRAM, K., Comparison of Niblack inspired Binarization methods for ancient documents, IS&T/SPIE Electronic Imaging, pp. 72470U-72470U, 2009.
  30. GATOS, B., I. PRATIKAKIS, S. PERANTONIS, An Adaptive Binarization Technique for Low Quality Historical Documents, DAS 2004, LNCS 3163, pp. 102-113, 2004.
  31. NAFCHI, H., S. AYATOLLAHI, R. FARRAHI, M. CHERIET, An efficient ground truthing tool for binarization of historical manuscripts, 12th Intl. Conf. on Document Analysis and Recognition, pp. 807-811, 2013.
  32. JAIN, A., Fundamentals of Digital Image Processing, Prentice Hall, 1989.
  33. SMARANDACHE, F., A Unifying Field in Logics: Neutrosophic Logic. Neutrosophy, Neutrosophic Set, Neutrosophic Probability, 3rd ed., American Research Press, 2003.
  34. GATOS, B., K. NTIROGIANNIS, I. PRATIKAKIS, ICDAR 2009 document image binarization contest (DIBCO 2009), Intl. Conf. on Document Analysis and Recognition, pp. 1375-1382, 2009.
  35. LU, H., A. C. KOT, Y. Q. SHI, Distance-Reciprocal Distortion Measure for Binary Document Images, IEEE Signal Processing Letters, vol. 11(2), pp. 228-231, 2004.
  36. SU, B., S. LU, C. L. TAN, Robust document image binarization technique for degraded document images, IEEE Trans. on Image Processing, vol. 22(4), pp. 1408-1417, 2013.
  37. MAYNARD, D., W. PETERS, Y. LI, Metrics for evaluation of ontology-based information extraction, International World Wide Web Conference, 2006.
  38. LU, H., A. C. KOT, Y. Q. SHI, Distance-reciprocal distortion measure for binary document images, IEEE Signal Processing Letters, vol. 11(2), pp. 228-231, 2004.
  39. AGUILERA, J., H. WILDENAUER, M. KAMPEL, M. BORG, D. THIRDE, J. FERRYMAN, Evaluation of motion segmentation quality for aircraft activity surveillance, 2nd Joint IEEE Intl. Workshop on Visual Surveillance and Performance Evaluation of Tracking Surveillance, pp. 293-300, 2005.
  40. PRATIKAKIS, I., B. GATOS, K. NTIROGIANNIS, ICDAR 2011 Document Image Binarization Contest, Intl. Conf. on Document Analysis and Recognition, pp. 1506-1510, 2011.

https://doi.org/10.24846/v24i3y201504