Full Research Information, University Data Repository

Paper / Research Title


NEURAL NETWORK BASED SEGMENTATION ALGORITHM


Author / Editor / Publisher

 
Nada Abdullah Rasheed Al-Jubouri (ندى عبد الله رشيد الجبوري)

Citation Information


Nada Abdullah Rasheed Al-Jubouri, NEURAL NETWORK BASED SEGMENTATION ALGORITHM, Time 5/4/2011 6:54:07 PM: College of Basic Education

Abstract Description


A simple heuristic segmentation algorithm is used

Full Abstract


NEURAL NETWORK BASED SEGMENTATION ALGORITHM FOR ARABIC CHARACTERS RECOGNITION
Nada A. Rasheed
University of Babylon
College of Basic Education
 
Abstract
    
           This paper presents a novel holistic technique for classifying Arabic handwritten text documents, performed in several steps. First, the Arabic handwritten document images are segmented into their connected parts, using a simple heuristic segmentation algorithm that finds segmentation points in printed and cursive handwritten words. Second, several features are extracted from these connected parts and combined to represent each word with a single consolidated feature vector. Finally, a Neocognitron-type neural network learns to classify words written in different fonts into word classes.

1. Introduction
      Object recognition is a very difficult task. Despite many efforts to solve this problem, there are still no perfect solutions (Khalid et al., 2003). Character recognition is a long-standing, fundamental problem in pattern recognition. It has been the subject of a considerable number of studies and serves many useful applications (Ehsan et al., 2003).
       Artificial Neural Networks (ANNs) have proven successful in many areas of pattern recognition. Some researchers have used conventional methods for segmentation and recognition, while others have used ANN-based methods for character recognition. Segmentation plays an important role in the overall process of handwriting recognition; unfortunately, although it is a vital step, it has not yet achieved highly accurate results. This research attempts to integrate conventional and intelligent methods for segmenting difficult printed and handwritten words, followed by accurate recognition of the characters.
      A simple heuristic segmentation algorithm finds segmentation points in printed and cursive handwritten words. A neural network trained with valid segmentation points from a database of scanned handwritten words then assesses the correctness of the segmentation points found by the heuristic algorithm. Following segmentation and verification, the resulting characters are identified by a second Artificial Neural Network.
The remainder of the paper is organized into five sections. Section 2 briefly describes the characteristics of Arabic writing, Section 3 describes the preprocessing steps, Section 4 presents segmentation using a heuristic algorithm and a neural network trained with the Neocognitron algorithm, Section 5 draws conclusions, and Section 6 gives recommendations.
 
2. Characteristics of the Arabic Writing
        Arabic is a widely used language: more than one billion people use it in their daily activities or in religious activities (Salama & Zaher, 2008). Arabic is written from right to left, and both printed and handwritten Arabic text is cursive. It has 29 basic letters and eight diacritics (Gheith et al., 2008). An Arabic character can take up to four different shapes depending on its position within the word, and its shape can also change dramatically between fonts (Mostafa, 2004). The table below shows the 29 letters and their various forms. Each letter is drawn in an isolated form when it is written alone, and in up to three other forms when it is connected to other letters in the word. For example, the letter Ain has four forms: isolated (ع), initial (عـ), medial (ـعـ), and final (ـع). Moreover, the letters Hamza, Teh, and Alef have additional forms, as shown in the table below. Within a word, every letter can connect from the right with the previous letter. However, six letters (ا, د, ذ, ر, ز, و) do not connect from the left with the next letter (see the table) (Gheith et al., 2008).
 
The shapes of Arabic characters in different positions.
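The connection rule above implies that a word breaks into separate connected parts after each non-connecting letter. The following minimal Python sketch illustrates this on Unicode text rather than on images, so it only predicts how many connected parts a word image should contain. The names `NON_CONNECTING` and `connected_parts` are hypothetical, and treating the Hamza and Madda variants of Alef (أ, إ, آ) like Alef is an addition to the six letters listed in the text.

```python
# The six letters that join only the preceding letter, never the next one,
# plus the Hamza/Madda variants of Alef, which behave like Alef.
NON_CONNECTING = set("ادذرزو" + "أإآ")


def connected_parts(word):
    """Split an Arabic word (Unicode string) into its connected parts."""
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            # The next letter cannot attach to this one, so a part ends here.
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts
```

For example, `connected_parts("مدرسة")` yields the three parts `"مد"`, `"ر"` and `"سة"`, matching how the word is actually written.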


3. Preprocessing
        After the images were acquired, they were converted into monochrome bitmap (BMP) form. Before any segmentation or processing could take place, the images had to be converted into binary representations of the handwriting. Each image used in this work is 250 × 250 pixels.
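The text does not say how the images were binarized or brought to 250 × 250 pixels. A minimal sketch, assuming 8-bit grayscale input (0 = black, 255 = white), a fixed global threshold of 128, and nearest-neighbour scaling; all three are assumptions, and `binarize` and `resize_nearest` are hypothetical helpers:

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (list of rows) to 1 for ink and 0 for background."""
    # Dark pixels (below the assumed threshold) are treated as handwriting.
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def resize_nearest(img, size=250):
    """Scale a binary image to size x size pixels by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    # Each output pixel copies the source pixel its position falls into.
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]
```

Usage: `word = resize_nearest(binarize(gray))`, where `gray` is a list of pixel rows. An adaptive threshold such as Otsu's method would usually cope better with uneven scans, at the cost of more code.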
       The word images require some manipulation before any segmentation is applied. This step, known as preprocessing, prepares the image and improves its quality in order to eliminate irrelevant information and ease the selection of the features that matter for recognition; it also makes the extracted features more robust.
Moreover, preprocessing steps are performed to reduce noise in the input images and to remove most of the variability of the handwriting. A person's handwriting is not uniform: the slant varies from one word to the next, even when the words are written at the same time. This leads to different inclination angles among the same person's words. To overcome this problem, a rotation algorithm is used to bring every word to a horizontal orientation, which requires computing the angle θ (theta) used in the rotation operation. Rotating an image requires calculating a new position for each point of the image after the transformation: each image point is rotated through the angle θ about the origin, where θ varies from one word to another and is calculated from the word's inclination angle. The following algorithm is used for this purpose.
Rotation Algorithm:
     
 

Figure 1. Image rotation (before and after).
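The rotation algorithm itself is not reproduced in this copy of the paper, and the text does not specify how the inclination angle is measured, so the sketch below is not the author's method. It assumes the angle can be estimated from the principal axis of the ink pixels, θ = ½·atan2(2μ11, μ20 − μ02), computed from second-order central moments, and it rotates the word by −θ about its centroid (rather than the image origin) using x' = x cos θ − y sin θ, y' = x sin θ + y cos θ. To avoid gaps in the output, each output pixel is mapped back through the inverse rotation with nearest-neighbour sampling. All function names are hypothetical.

```python
import math


def inclination_angle(img):
    """Estimate a word's inclination (radians) and centroid from central moments."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pts:
        return 0.0, (len(img[0]) / 2, len(img) / 2)
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Orientation of the principal axis (the image y axis points down).
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02), (cx, cy)


def rotate(img, theta, centre):
    """Rotate a binary image by theta about centre (inverse mapping)."""
    h, w = len(img), len(img[0])
    cx, cy = centre
    c, s = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Source position = output position rotated by -theta.
            ix = round(c * dx + s * dy + cx)
            iy = round(-s * dx + c * dy + cy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def deskew(img):
    """Rotate a word so that its principal axis becomes horizontal."""
    theta, centre = inclination_angle(img)
    return rotate(img, -theta, centre)
```

A principal-axis estimate is sensitive to ascenders and descenders, so a projection-profile search over candidate angles is a common, slower alternative.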
 
4. Segmentation using a heuristic algorithm
          A simple heuristic segmentation algorithm was implemented that scanned handwritten words for important features to identify valid segmentation points between characters. The algorithm first scanned the word looking for minima or arcs between letters, which are common in handwritten cursive script. In many cases these arcs are the ideal segmentation points; however, for letters such as "ص", "م", and "ة", an erroneous segmentation point could be identified. Therefore, the algorithm incorporated a "hole seeking" component that attempted to prevent invalid segmentation points from being found.
If an arc was found, the algorithm checked whether the cut would split a letter in half by looking for a "hole". Holes are found in letters that are totally or partially closed, such as "ص", "ن", and similar letters. If such a letter was found, segmentation at that point did not occur. Finally, the algorithm checked that the new segmentation point was not too close to the previous one, by testing whether the distance between the last segmentation point and the position being checked was equal to or greater than the average character width of the word. If the segmentation point in question was too close to the previous one, segmentation at that point was aborted. Conversely, if the distance between the position being checked and the last segmentation point was greater than the average character width, a segmentation point was forced.
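A minimal sketch of the three checks described above, operating on a binary word image (1 = ink). Several details are assumptions rather than the paper's: an "arc" is taken to be a column whose vertical projection is a local minimum no thicker than `thin` pixels, a "hole" is any background pixel not reachable from the image border, the average character width is passed in as `avg_width`, and, because the text leaves the forcing threshold implicit, a cut is forced only after a hypothetical `max_gap` (default twice `avg_width`). Columns are scanned from right to left, following the Arabic writing direction. `hole_mask` and `segment_columns` are hypothetical names.

```python
from collections import deque


def hole_mask(img):
    """Mark background pixels not reachable from the border (enclosed holes)."""
    h, w = len(img), len(img[0])
    outside = [[False] * w for _ in range(h)]
    q = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not img[y][x]:
                outside[y][x] = True
                q.append((y, x))
    while q:  # 4-connected flood fill of the background from the border
        y, x = q.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and not img[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                q.append((ny, nx))
    return [[not img[y][x] and not outside[y][x] for x in range(w)] for y in range(h)]


def segment_columns(img, avg_width, thin=1, max_gap=None):
    """Return cut columns, scanned right to left as Arabic is written."""
    h, w = len(img), len(img[0])
    if max_gap is None:
        max_gap = 2 * avg_width  # assumed forcing threshold (not given in the text)
    proj = [sum(img[y][x] for y in range(h)) for x in range(w)]  # ink per column
    ink = [x for x in range(w) if proj[x]]
    if not ink:
        return []
    holes = hole_mask(img)
    has_hole = [any(holes[y][x] for y in range(h)) for x in range(w)]
    cuts, last = [], ink[-1]  # start from the rightmost inked column
    for x in range(ink[-1] - 1, ink[0], -1):
        # "Arc": a thin stroke that is a local minimum of the vertical projection.
        is_arc = 0 < proj[x] <= thin and proj[x] <= min(proj[x - 1], proj[x + 1])
        gap = last - x
        if is_arc and not has_hole[x] and gap >= avg_width:
            cuts.append(x)  # hole-free arc, not too close to the last cut
            last = x
        elif gap > max_gap:
            cuts.append(x)  # forced cut after an unusually long stretch
            last = x
    return cuts
```

Each returned column is only a candidate; in the pipeline described in the introduction, a neural network would then verify it.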