A Review on Feature Extraction for Indian and American Sign Language


Neelam K. Gilorkar, Manisha M. Ingle
Department of Electronics & Telecommunication, Government College of Engineering, Amravati, India
(IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (1), 2014, pp. 314-318

Abstract- In the field of human computer interaction (HCI), sign language has been the focus of significant real-time research. Such systems are advanced, and they are meant to substitute for human interpreters. Recently, several techniques have been developed in this area alongside improvements in image processing and artificial intelligence. Indian Sign Language (ISL) is two-handed and hence more intricate than the one-handed American Sign Language (ASL). Many researchers use only a single ASL or ISL sign set for creating their database. In this paper, recent research and development in sign language recognition is reviewed, based on manual communication and body language. A sign language recognition system typically involves three steps: preprocessing, feature extraction and classification. Classification methods used for recognition include Neural Networks (NN), Support Vector Machines (SVM), Hidden Markov Models (HMM), the Scale Invariant Feature Transform (SIFT), etc.

Keywords- Feature extraction, Hidden Markov Model (HMM), Neural Network (NN), Support Vector Machine (SVM), Scale Invariant Feature Transform (SIFT)

1. INTRODUCTION
Sign language is the medium of communication generally used by the deaf-mute community. It uses gestures instead of sound patterns to convey meaning. The various gestures are composed of movements and orientations of the hands and body, facial expressions and lip patterns, for communicating information or emotions. Sign language is not universal; just like spoken language, it has its own regional dialects. American Sign Language (ASL), British Sign Language (BSL) and Indian Sign Language (ISL) are some of the common sign languages in the world. Millions of people across the world are deaf. They find it difficult to communicate with hearing people, as hearing people are generally unaware of sign language. This creates the need for sign language interpreters who can translate sign language to spoken language and vice versa. However, such interpreters are scarce and expensive, and cannot accompany a deaf person throughout life. This has motivated the development of automatic sign language recognition systems, which can translate signs into the corresponding text or voice without the help of a sign language interpreter.
Such systems can aid the development of the deaf community through human computer interaction and can bridge the gap between deaf people and hearing people in society.

In ISL a sign is expressed with both hands, whereas in ASL it is expressed with a single hand, as illustrated in Figure 1. A sign language consists of both word-level gestures and fingerspelling. Fingerspelling is used to form words by letter-by-letter coding. Letter-by-letter signing can be used to express words for which no sign exists, words for which the signer does not know the gesture, or to emphasize or clarify a particular word. The recognition of fingerspelling therefore has key importance in sign language recognition. This paper reviews methods for the automatic recognition of static gestures of the Indian and American sign language alphabets and numbers. The signs considered for recognition are the 26 letters of the alphabet and the numbers 0-9. Some ASL and ISL gestures are shown in Figure 1.

[Figure 1 images omitted]
Figure 1. ASL signs for (a) 3, (b) A, (c) C and ISL signs for (d) Q, (e) R, (f) S

2. DIFFERENT APPROACHES FOR SIGN LANGUAGE RECOGNITION
Many researchers have proposed various techniques for sign language recognition systems. Broadly, these systems fall into two approaches, namely the glove-based and the vision-based approach [1]. The first category requires the user to wear a sensor glove or a colored glove; wearing the glove simplifies the task of segmentation during processing. Glove-based approaches use sensor devices to digitize hand and finger movements into multi-parametric data, and the additional sensors facilitate detection of the hand configuration and movement. However, the sensor devices are quite costly, and wearing gloves or trackers is uncomfortable and increases the "time-to-interface," or setup time. The data glove and its attached wires are also inconvenient and awkward for the user to wear.

Moreover, the cost of the data glove is often too high for regular users. The shortcoming of this approach is that the user has to wear the sensor hardware along with the glove during operation of the system. Vision-based methods, on the other hand, are more natural and useful for real-time applications. The vision-based approach uses image processing algorithms to detect and track hand signs as well as the facial expressions of the user. This approach is easier on the user, since there is no need to wear any extra hardware. However, there are accuracy problems related to the image processing algorithms, and these problems are yet to be resolved. Vision-based systems also need application-specific image processing algorithms, programming and machine learning. The necessity for all of these to work in a variety of environments raises several issues, because such systems must be user and camera independent and invariant to background and lighting changes in order to attain real-time performance. From the perspective of the features used to represent the hand, vision-based hand tracking and gesture recognition algorithms are classified into two categories:
- 3D hand model-based approach
- Appearance-based approach

2.1 3D Hand Model Based Approach
Many methodologies use a 3D kinematic hand model with a significant number of degrees of freedom (DOF), and estimate the hand parameters by matching the input frames against the appearance projected by the 3D hand model. 3D model-based methods make use of 3D information about key elements of the body parts. Using this information, several important parameters, such as palm position and joint angles, can be obtained. This approach uses volumetric or skeletal models, or a combination of the two. For the computer animation industry and for computer vision purposes, the volumetric approach is better suited. The approach is very computationally intensive, and systems for live analysis are still to be developed. Moreover, since the hand is a deformable object with many DOFs, the 3D hand model method requires a huge database covering all characteristic shapes under several views.

2.2 Appearance Based Approach
Appearance-based systems use images or videos as inputs and learn from them directly. They do not use a spatial representation of the body; instead, the parameters are derived directly from the images or videos using a template database. In this method, image features are extracted to model the visual appearance of the hand, and these features are compared with the features extracted from the incoming video frames. Such systems achieve real-time performance because of the simpler 2-D image features that are used [2].

The prime issues in hand gesture recognition are: (i) hand localization, (ii) scale and rotational invariance, and (iii) viewpoint and person/user independence. Earlier works expected the gestures to be performed against a uniform background, which required only a simple thresholding technique to obtain the hand silhouette. For a non-uniform background, skin color detection is the most popular and general method for hand localization. The skin color cues are combined with motion cues and edge information to improve the efficiency of hand detection. Segmented hand images are usually normalized for size, orientation and illumination variations. The features can then be extracted directly from the intensity images, the binary silhouettes or the contours.
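As a concrete illustration of the skin-color localization and cleanup step just described, the following minimal Python/OpenCV sketch thresholds a frame in HSV space and filters the resulting mask morphologically. It is not taken from any of the reviewed systems, and the HSV bounds are illustrative assumptions that would have to be tuned to the camera, lighting and signer.

import cv2
import numpy as np

def segment_hand(frame_bgr):
    """Skin-color thresholding followed by morphological cleanup."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Assumed skin-color range in HSV; illustrative values only.
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Opening removes speckle noise; closing fills small holes
    # in the hand silhouette.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask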
A general block diagram of a sign language recognition system is shown in Figure 2. The recognition process is divided into two phases: training and testing. In the training phase, the classifier has to be trained using the training dataset. The database can either be created by the researchers themselves or an available database can be used. An external webcam, a digital camera or the webcam built into a laptop can be used to capture the training images. Most sign language recognition systems classify signs according to hand gestures only; in other words, facial expressions are omitted. The important steps involved in the training phase are creation of the database, preprocessing, feature extraction and training of the classifier. The testing phase consists of video/image acquisition (the input can be videos or images), preprocessing, feature extraction and classification.

[Figure 2 image omitted]
Figure 2. Generalized Block Diagram of Sign Language Recognition System

3. DATA ACQUISITION
Data acquisition should be as accurate as possible for efficient hand gesture recognition, and a suitable input device should be selected for it. There are many input devices for data acquisition; among them are data gloves, markers, and hand images from a webcam or stereo camera.

4. PREPROCESSING
A preprocessing step is carried out on the training images to extract the region of interest (ROI). The ROI can be the hands alone, if only hand gestures are considered, or both the face and the hands, if facial gestures are also included. Usually the preprocessing step consists of filtering, image enhancement, image resizing, segmentation, edge detection, normalization and morphological filtering. Filtering and image enhancement can be any of the commonly used methods. For segmentation, the algorithm that best suits the input video/images has to be selected. Thresholding, background subtraction, skin-based and motion-based segmentation, noise removal and statistical models are the commonly used segmentation techniques. During the testing phase, the test images or videos are also preprocessed to extract the region of interest.
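To make the preprocessing chain in this section concrete, here is a minimal sketch in Python/OpenCV that filters, segments and size-normalizes one image. The Gaussian filter, Otsu thresholding and the 64x64 output size are illustrative choices assuming a fairly uniform background, not decisions made by the surveyed systems.

import cv2

def preprocess(image_bgr, size=(64, 64)):
    """Filter, segment and normalize one image to a fixed-size
    binary hand silhouette (illustrative pipeline)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Filtering / image enhancement: suppress sensor noise.
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    # Segmentation: global Otsu thresholding, which assumes a
    # roughly uniform background (see section 2.2 for alternatives).
    _, silhouette = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Normalization: fixed output size so every image yields a
    # feature vector of equal length.
    return cv2.resize(silhouette, size, interpolation=cv2.INTER_NEAREST)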

5. FEATURE EXTRACTION
Under different conditions, the performance of different feature detectors differs significantly. The features should be extracted efficiently and reliably, so that shapes are found robustly irrespective of changes in illumination, position, orientation and size of the object in a video/image. In an image, objects are represented as groups of pixels, and to recognize an object the properties of these groups of pixels must be described. The features can be obtained in different ways: wavelet decomposition, Haar wavelets, Haar-like features [6], texture features, Euclidean distance [7], the scale invariant feature transform [11], Principal Component Analysis (PCA) [10], Fourier descriptors, etc. The feature vector obtained with any one of these feature extraction methods is used to train the classifier. Feature extraction is thus the most crucial step of sign language recognition, since the inputs to the classifier are the feature vectors produced by this step.

6. CLASSIFICATION
The task of a classifier is to assign a new feature vector to one of a set of predefined categories in order to recognize the sign. Each category consists of a set of features obtained during the training phase from a number of training images. Classification mainly concentrates on finding the best match for a feature vector among the set of reference features, and then displaying the text or playing the sound. The test inputs can be images or videos. The most frequently used classifiers are Hidden Markov Models (HMM), Neural Networks (NN), multiclass Support Vector Machines (SVM), fuzzy systems, K Nearest Neighbors (KNN), etc. The performance of a classifier is measured in terms of its recognition rate.

6.1 Hidden Markov Models (HMM)
As a first preference, researchers tend to use the Hidden Markov Model (HMM) [3] for data describing dynamic hand gestures. The HMM is a doubly stochastic model and is appropriate for dealing with the stochastic properties of gesture recognition. The first well-known application of HMM technology was speech recognition. A Hidden Markov Model is a collection of finite states connected by transitions. Each state is characterized by two sets of probabilities: a transition probability, and a discrete or continuous output probability density function which, given the state, defines the conditional probability of each output symbol from a finite alphabet or of a continuous random vector. HMMs are employed to represent the gestures, and their parameters are learned from the training data. Gestures can then be recognized by evaluating the trained HMMs and selecting the most likely one. Another advantage of the HMM is its high recognition rate. Some researchers have combined HMMs with other classifiers. The topology of an initial HMM can be determined by estimating how many different states are involved in specifying a sign.

[Figure 3 image omitted]
Figure 3. The four state HMM
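To make the evaluation step above concrete, the sketch below implements the standard forward algorithm for a discrete-output HMM in NumPy; a recognizer runs it once per trained gesture model and picks the model with the highest likelihood. The variable names and the per-gesture model layout are illustrative assumptions, not taken from the reviewed systems.

import numpy as np

def forward_likelihood(obs, pi, A, B):
    """Forward algorithm: P(obs | model) for a discrete-output HMM.
    obs : sequence of observation symbol indices
    pi  : (N,)   initial state probabilities
    A   : (N, N) state transition probabilities
    B   : (N, M) per-state output symbol probabilities
    (No scaling is done; long sequences would need log-space math.)"""
    alpha = pi * B[:, obs[0]]               # initialize with first symbol
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]  # propagate one time step
    return alpha.sum()

def recognize(obs, models):
    """models: dict mapping gesture label -> (pi, A, B) triple."""
    return max(models, key=lambda g: forward_likelihood(obs, *models[g]))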
6.2 Haar-like Technique
In [6], Haar-like features are applied for hand detection. This approach considers adjacent rectangular regions at a specific location in a detection window. Haar-like features concentrate on the information within a certain area of the image rather than on each single pixel. To enhance classification accuracy and attain real-time performance, the AdaBoost learning algorithm was used, which adaptively chooses the best feature in each step and combines the chosen features into a strong classifier. Haar-like techniques use the Haar wavelet for successful gesture recognition. Each Haar-like feature comprises two or three connected "black" and "white" rectangles. Figure 4 shows the extended Haar-like feature set proposed by Lienhart and Maydt [13]. The value of a Haar-like feature is the difference between the sums of the pixel values in the black and white rectangles, i.e.

    f = \sum_{(x,y) \in R_black} I(x,y) - \sum_{(x,y) \in R_white} I(x,y)    (1)

[Figure 4 image omitted]
Figure 4. Extended set of Haar-like features
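As a sketch of how equation (1) is evaluated efficiently, the snippet below computes rectangle sums with an integral image (summed-area table), the standard trick behind Haar-like features; the two-rectangle edge feature shown here is one illustrative layout from the extended set.

import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended, so any
    rectangle sum costs only four lookups."""
    ii = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    """Sum of pixel values inside the w-by-h rectangle whose top-left
    corner is at (x, y)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Equation (1) for a two-rectangle edge feature: black rectangle
    on the left, equal-sized white rectangle on the right."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)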