Reliable Tracking of Human Arm Dynamics by Multiple Cue Integration and Constraint Fusion

Yusuf Azoz, Lalitha Devi, Rajeev Sharma
Department of Computer Science & Engineering
The Pennsylvania State University, University Park, PA 16802
{azoz, devi, rsharma}@cse.psu.edu

Abstract

The use of hand gestures provides an attractive means of interacting naturally with a computer generated display. Using one or more video cameras, the
hand movements can potentially be interpreted as meaningful gestures. One key problem in building such an interface without a restricted setup is the ability to localize and track the human arm robustly in image sequences. This paper proposes a multiple cue-based localization scheme combined with a tracking framework to reliably track the human arm dynamics in unconstrained environments. The localization scheme integrates the
multiple cues of motion, shape, and color for locating a set of key image features. Using constraint fusion, these features are tracked by a modified Extended Kalman Filter that exploits the articulated structure of
the arm. We also propose an interaction scheme between tracking and localization for improving the estimation process while reducing the computational requirements. The performance of the frameworks is validated with the help of extensive experiments and simulations.

1 Introduction

There has been a recent explosion of technologies that make various novel forms of computer generated display, such as virtual reality, augmented reality, and other 3-D displays, widely accessible. However, the technologies available for human-computer interaction (HCI) have lagged behind. This has spawned very active research toward new devices and techniques in HCI. These emerging techniques include computer vision-based interpretation of human body motion and speech recognition.

In particular, hand gestures provide an attractive alternative to the cumbersome interface devices presently used for HCI. The HCI interpretation of gestures requires that dynamic and/or static configurations of the human hand, arm and, sometimes, body be measurable by the machine. The challenge in acquiring
the arm dynamics with a vision-based approach is to put as few restrictions on the setup and the user of the system as possible.

Past work on hand motion analysis uses simple backgrounds or places restrictions on the user, such as gloves and markers [4, 7]. The localization problem is ignored by making simple assumptions that will not hold in a complex environment. There are several face localization techniques that use cues of motion,
color, or shape individually [6, 10]. Graf et al. proposed using multiple cues of motion, shape and color in conjunction to segment the face [1], which results in a more robust segmentation. These methods cannot
be directly extended to hand/arm localization for several reasons: the hand is generally moving, it could be occluded, the clothing could result in arbitrary shape and color, etc. Thus a combination of a multimodal localization scheme and effective tracking may be essential.

Several human body tracking schemes have been proposed in the literature. Lee and Kunii incorporated the constraints that control the behavior of human hands [5] in hand motion analysis. Several researchers used the Kalman filter in human body motion analysis [8, 9], but none of these schemes exploits the constraints of the physical model in the estimation process. Further, no detailed tracking of the arm dynamics has been reported. Yet it is an important problem, since arm dynamics can play an important role in gesture analysis.

In this paper, we propose a multiple cue-based localization scheme combined
with a tracking framework based on constraint fusion, to perform tracking of arm dynamics in unconstrained environments. The localization scheme uses multiple cues of motion, shape and color. It performs robustly in
a cluttered environment, and provides the locations of the shoulder, elbow and hand points with their corresponding uncertainties to the tracking framework. The geometric constraints of the arm are fused with a modified Extended Kalman filter in tracking in order to fine-tune and complete the data obtained from the localization. The arm is tracked in spite of occlusions, and the complete dynamics of the arm are obtained as an output of the tracking. The tracking framework also provides the boundaries of an uncertainty region for each feature point as feedback that reduces the search area for the localization. Performance analysis with the help of several experiments and simulations verifies the practical feasibility of the localization, tracking, and interaction schemes.

2 Tracking with Constraint Fusion

The hand tracking framework is based on an Extended Kalman filter with an additional fusion loop. The framework exploits the articulated structure of the arm by fusing the constraints to the system in this additional loop. The output of the tracking consists
of the position, velocity and acceleration parameters of the arm feature points. These parameters are crucial ...

... color is uniquely identified by chromatic colors even in varying illumination. Chromatic colors $(r, g)$, known as “pure” colors without brightness, are defined as follows:

$$r = \frac{R}{R + G + B} \qquad (4)$$

$$g = \frac{G}{R + G + B} \qquad (5)$$

Here $R$, $G$, $B$ are respectively the red, green and blue values of an RGB image. The above two equations define an $\mathbb{R}^3 \to \mathbb{R}^2$ mapping. The color blue is redundant after normalization, as $r + g + b = 1$.

We apply a color histogram-based approach for the segmentation of the hand and face from the rest of the scene. The color space table is populated during the initialization procedure by manually marking out a skin region. The result is a two-dimensional color histogram of the user’s skin color, which is then normalized to one. The histogram gives the probability of each color in the chromatic color space.
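The chromatic mapping and skin-color histogram described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the standard normalized form r = R/(R+G+B), g = G/(R+G+B), and the function names, the bin count of 32, the reading of "normalized to one" as "entries sum to one", and the example threshold are all assumptions introduced here.

```python
def chromaticity(R, G, B):
    """Map an RGB triple to chromatic (r, g). Black maps to (0, 0) by convention."""
    total = R + G + B
    if total == 0:
        return 0.0, 0.0
    return R / total, G / total


def _bin(value, bins):
    """Index of the histogram bin holding a chromatic value in [0, 1]."""
    return min(int(value * bins), bins - 1)


def build_skin_histogram(skin_pixels, bins=32):
    """2-D (r, g) histogram of hand-marked skin pixels, scaled so entries sum to one."""
    if not skin_pixels:
        raise ValueError("at least one marked skin pixel is required")
    hist = [[0.0] * bins for _ in range(bins)]
    for R, G, B in skin_pixels:
        r, g = chromaticity(R, G, B)
        hist[_bin(r, bins)][_bin(g, bins)] += 1.0
    n = len(skin_pixels)
    return [[count / n for count in row] for row in hist]


def skin_probability(hist, pixel):
    """Look up the histogram probability for one RGB pixel."""
    bins = len(hist)
    r, g = chromaticity(*pixel)
    return hist[_bin(r, bins)][_bin(g, bins)]


# Three samples of one hue at different brightness levels (illustrative values).
samples = [(200, 150, 120), (100, 75, 60), (220, 165, 132)]
hist = build_skin_histogram(samples)
# A darker pixel of the same hue falls in the same (r, g) bin.
is_skin = skin_probability(hist, (40, 30, 24)) > 0.05  # threshold is an assumption
```

Because brightness cancels in the normalization, all three samples and the darker test pixel share a single histogram bin, which is the illumination tolerance the text attributes to chromatic colors.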
A pixel is marked as a part of the skin if the probability of the corresponding chromatic color values in the histogram is above a certain threshold.

Motion Cue.

Motion is another important cue in the localization, since generally the arm moves the most in a typical HCI scene. We use edges for shape analysis to extract the arm from the image, but due to the presence of a cluttered background the edge detection results in edges other than those of the arm. The motion cue is used in order to eliminate the non-moving edges belonging to the background and other parts of the human body.

The motion cue is used in conjunction with the edges to find a set of time varying edges (TVE). The absolute difference in a neighborhood (5x5 pixels) surrounding each pixel gives the temporal gradient of the image. Given the image intensity at each image point and time, the TVE are computed by combining the temporal gradient with the spatial gradient (equation (6)). If the magnitude of the result is above a certain threshold, the pixel is classified as a TVE. Since the arm has the most mobility in a scene, it can be easily extracted from the background. This gives us a way of segmenting the edges of the arm.

Shape cue and fusion of multiple cues.

Various approaches to combining the outputs of multiple channels running on the same input exist in the literature. One method is to evaluate the outputs of each channel independently and combine them at the very end. Another method is to combine the outputs of different classifiers at varying weights based on the error rates. The approach to integration we have followed is to combine the results of different channels as early as possible. The shape information is incorporated after the integration of the color and motion cues.

The color segmentation results in various major color clusters. The motion cue is exploited in order to eliminate the clusters resulting from background noise. Shape filters of varied size are used to detect the clusters that belong to the hand and the face. The cluster belonging to the hand is confirmed by checking the strongest time varying edges in the search region in order to eliminate false clusters. The hand point is located as the center of mass of this cluster. The cluster belonging to the face is used to find the shoulder point. Since the face moves relatively less than the other parts of the body, it is assumed that a constant distance is kept between the shoulder point and the center of the face. The shoulder point is computed using this constant distance, which is obtained during the initialization.

The centers of mass of the hand and face clusters reduce the search region for locating the elbow point to an area constrained by these two points. The time varying edges in the reduced search region are grouped together using contour following. The edges that start around the hand region with a high pixel count are smoothed and linearized using the algorithm described in [11]. This selective process is done to exclude the time varying edges which might have been generated by noise or by the movement of the whole body itself. These pixels are now searched for the elbow location. The elbow location is found by searching for the best fit of the arm model on this set of edges. The geometry of the arm is taken into consideration while locating these features. The pixel with the longest lines on either side and with the maximum distance from the hypothetical line between the hand and the shoulder is assigned to be the elbow pixel. Higher weight is given to the length than to the distance. If there is a pixel farther away than the “elbow” pixel, which might be caused by folds in the clothing the person is wearing, it will be rejected, as the length of the lines around it will be very small compared to the “real elbow” pixel.

The uncertainty associated with the location of the hand and shoulder is found using the clusters for the moving hand and face. The weighted center of mass of both clusters is computed by using the probabilities associated with the chromatic color values of the pixels belonging to the cluster. The distance of this center from the actual center of mass gives a measure of uncertainty associated with the locations. The weighted center of mass is ...
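The uncertainty measure described for the hand and face clusters (the distance between the probability-weighted center of mass and the plain center of mass) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, pixels are taken as (x, y) tuples, and the weights stand in for the skin-histogram probabilities of each pixel.

```python
import math


def center_of_mass(points):
    """Unweighted mean position of the cluster pixels."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_center_of_mass(points, weights):
    """Mean position with each pixel weighted by its skin probability."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy


def location_uncertainty(points, weights):
    """Euclidean distance between the weighted and unweighted centers."""
    cx, cy = center_of_mass(points)
    wx, wy = weighted_center_of_mass(points, weights)
    return math.hypot(wx - cx, wy - cy)


# A square cluster: uniform probabilities give no offset between the two
# centers, while one dominant pixel pulls the weighted center toward it.
cluster = [(0, 0), (2, 0), (0, 2), (2, 2)]
uniform = location_uncertainty(cluster, [0.5, 0.5, 0.5, 0.5])
skewed = location_uncertainty(cluster, [0.1, 0.1, 0.1, 0.7])
```

With this measure, a cluster whose skin probabilities are spread evenly yields a small uncertainty, and a cluster dominated by a few high-probability pixels away from its geometric center yields a larger one.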