Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning

Steven L. Waslander, Gabriel M. Hoffmann
Ph.D. Candidate, Aeronautics and Astronautics, Stanford University
{stevenw, gabeh}@stanford.edu

Jung Soon Jang
Research Associate, Aeronautics and Astronautics, Stanford University
jsjang@stanford.edu

Claire J. Tomlin
Associate Professor, Aeronautics and Astronautics, Stanford University
tomlin@stanford.edu

Abstract: The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently comprising two quadrotors, also called X4-flyers, with capacity for eight. This paper compares control design techniques, specifically for outdoor altitude control in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airflow induced by the four interacting rotors, classical linear techniques failed to provide…
I. INTRODUCTION

As first introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and to present real-world problems for further investigation. The base vehicle for STARMAC is a four-rotor aircraft with fixed-pitch blades, referred to as a quadrotor or an X4-flyer. The vehicles are capable of 15-minute outdoor flights in a 100 m square area [1].

Fig. 1. One of the STARMAC quadrotors in action.

There have been numerous projects involving quadrotors to date, with the first known hover occurring in October 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote-control versions, such as the DraganFlyer IV [3]. Many groups [4]–[7] have had significant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi-vehicle quadrotor platform capable of autonomous outdoor flight without tethers or motion guides.
The first major milestone for STARMAC was autonomous hover, with closed-loop control of attitude, altitude and position. Using inertial sensing, the attitude of the aircraft is simple to control by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, with careful design to ensure spectral separation…

Unfortunately, altitude control proves less straightforward. Many factors affect the altitude loop specifically that do not lend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of the four interacting rotor downwashes. In our experience, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual flight revealed a noticeable loss in thrust upon descent through the highly…

In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] assumes that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model from recorded inputs and responses, without any knowledge of the underlying
dynamics, and…

II. SYSTEM DESCRIPTION

STARMAC consists of a fleet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. At the core of each aircraft are microcontroller circuit boards designed and assembled at Stanford for this project. The microcontrollers run real-time control code, interface with the sensors and the ground station, and supervise the system.

The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band and providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low-cost, lightweight IMU that delivers 76 Hz attitude, attitude rate, and acceleration readings. The distance from the ground is measured by ultrasonic ranging at 12 Hz.
The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual flight when desired.

III. QUADROTOR DYNAMICS

The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body-fixed coordinates. Let {eN, eE, eD} denote the inertial axes, and {xB, yB, zB} denote the body axes, as defined in Figure 2. The Euler angles of the body axes are {φ, θ, ψ} with respect to the eN, eE and eD axes, respectively, and are referred to as roll,
pitch and yaw. Let r be defined as the position vector from the inertial origin to the vehicle center of gravity (CG), and let ωB be defined as the angular velocity…

Fig. 2. Free body diagram of a quadrotor aircraft.

The rotors, numbered 1–4, are mounted outboard on the xB, yB, −xB and −yB axes, respectively, with position vectors ri with respect to the CG. Each rotor produces an aerodynamic torque, Qi, and a thrust, Ti, both parallel to the rotor's axis of rotation, and both used for vehicle control. Here, Ti is approximated as a function of ui, the voltage applied to the motors, as determined from a load cell test. In flight, Ti can vary greatly from this approximation. The torques, Qi, are proportional to the rotor thrust, and are given by…

The body drag force is defined as DB, the vehicle mass is m, the acceleration due to gravity is g, and the inertia matrix is I ∈ ℝ^(3×3). A free body diagram is depicted in Figure 2. The total force, F, and moment, M, can be summed as

(1)

(2)
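To make the summation in Equations (1) and (2) concrete, here is a minimal Python sketch that adds up the rotor contributions in body axes for the layout described above. It is not the paper's model: it assumes each thrust acts along −zB, that each reaction torque is Qi = ±kQ·Ti along zB with opposite signs for the two counter-rotating pairs, and it leaves out drag and gravity. The arm length L_ARM, the ratio K_Q and the spin signs are hypothetical.

```python
import numpy as np

# Hypothetical parameters (not given in the text): arm length l [m] and Q_i/T_i ratio [m].
L_ARM = 0.3
K_Q = 0.02

# Rotor positions r_i relative to the CG in body axes (x_B forward, y_B right, z_B down).
ROTOR_POS = np.array([
    [L_ARM, 0.0, 0.0],    # rotor 1 on +x_B
    [0.0, L_ARM, 0.0],    # rotor 2 on +y_B
    [-L_ARM, 0.0, 0.0],   # rotor 3 on -x_B
    [0.0, -L_ARM, 0.0],   # rotor 4 on -y_B
])
# Assumed spin directions: rotors 1 and 3 turn one way, rotors 2 and 4 the other.
SPIN = np.array([1.0, -1.0, 1.0, -1.0])


def rotor_force_and_moment(thrusts):
    """Sum rotor forces and moments about the CG in body axes (drag and gravity omitted)."""
    z_b = np.array([0.0, 0.0, 1.0])
    force = np.zeros(3)
    moment = np.zeros(3)
    for r_i, t_i, spin in zip(ROTOR_POS, np.asarray(thrusts, dtype=float), SPIN):
        f_i = -t_i * z_b                  # thrust acts upward, i.e. along -z_B
        q_i = spin * K_Q * t_i * z_b      # reaction torque, parallel to the rotor axis
        force += f_i
        moment += np.cross(r_i, f_i) + q_i
    return force, moment


# Equal thrusts: zero net moment, and total thrust T = T1 + T2 + T3 + T4 along -z_B.
F, M = rotor_force_and_moment([2.5, 2.5, 2.5, 2.5])
print("F =", F, "M =", M)
```

With these conventions the sum works out to a roll moment of l(T4 − T2), a pitch moment of l(T1 − T3) and a yaw moment of kQ(T1 − T2 + T3 − T4); a different axis or spin convention flips some of these signs.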
The full nonlinear dynamics can be described as

(3)

where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions by rolling moment and drag can be neglected in Equations (1) and (2). Define the total thrust as T = T1 + T2 + T3 + T4. The translational motion is defined by

(4)

where Rφ, Rθ, and Rψ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small-angle approximation to the rotation matrices,

(5)

Finally, assuming that the total thrust approximately counteracts gravity, except in the eD axis,

(6)

For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second-order term, ω × Iω, and expanding the thrust into its four constituents. The angular equations become

(7)

where the moment arm length l = ||ri × zB|| is identical for all rotors due to symmetry. The resulting linear models can now be used for control design.
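The small-angle step behind Equation (5) can be checked numerically. The sketch below assumes the common ZYX convention, with the body-to-NED rotation written as Rψ Rθ Rφ (the paper's own matrices are not reproduced here), and compares the full product against its first-order approximation. It then projects a thrust of magnitude T along −zB into NED axes, which to first order gives (−Tθ, Tφ, −T).

```python
import numpy as np


def rot_x(phi):
    """Roll rotation about the x axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta):
    """Pitch rotation about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(psi):
    """Yaw rotation about the z axis."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def body_to_ned(phi, theta, psi):
    """Full rotation, assuming the ZYX Euler convention R = R_psi R_theta R_phi."""
    return rot_z(psi) @ rot_y(theta) @ rot_x(phi)


def body_to_ned_small_angle(phi, theta, psi):
    """First-order approximation (identity plus skew matrix), valid for small angles."""
    return np.array([[1.0, -psi, theta],
                     [psi, 1.0, -phi],
                     [-theta, phi, 1.0]])


phi, theta, psi = 0.02, -0.03, 0.01   # radians, near hover
R_full = body_to_ned(phi, theta, psi)
R_lin = body_to_ned_small_angle(phi, theta, psi)
# The neglected terms are products of angles, i.e. second order.
print("max |R_full - R_lin| =", np.abs(R_full - R_lin).max())

T = 9.81                              # hypothetical thrust per unit mass
print("thrust in NED:", R_lin @ np.array([0.0, 0.0, -T]))
```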
IV. ESTIMATION AND CONTROL DESIGN

Applying the concept of spectral separation, inner-loop control of attitude and altitude is performed by commanding motor voltages, and outer-loop position control is performed by commanding attitude requests to the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller design to account for thrust biases.
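As a rough illustration of integral LQR, the sketch below augments a single-axis attitude model with the integral of the angle and computes a discrete-time LQR gain by iterating the Riccati recursion. The double-integrator plant, the period DT, the inertia J, the weights Q and R, and the constant torque BIAS (standing in for a thrust bias) are all hypothetical; this is neither Equation (7) nor the gains flown on STARMAC.

```python
import numpy as np

DT = 0.01     # hypothetical control period [s]
J = 0.01      # hypothetical moment of inertia about one axis [kg m^2]
BIAS = 0.05   # hypothetical constant torque bias, e.g. from mismatched rotor thrusts [N m]

# Single-axis attitude model (double integrator) augmented with the integral of the angle.
# State x = [angle, rate, integral of angle]; input u = commanded differential torque.
A = np.array([[1.0, DT, 0.0],
              [0.0, 1.0, 0.0],
              [DT, 0.0, 1.0]])
B = np.array([0.5 * DT**2 / J, DT / J, 0.0])
Q = np.diag([100.0, 1.0, 100.0])   # hypothetical state weights
R = 1.0                            # hypothetical input weight


def dlqr_gain(A, B, Q, R, tol=1e-10, max_iter=200_000):
    """Single-input discrete-time LQR gain K (u = -K x) via Riccati iteration."""
    P = Q.copy()
    for _ in range(max_iter):
        K = (B @ P @ A) / (R + B @ P @ B)              # (R + B'PB)^-1 B'PA
        P_next = Q + A.T @ P @ (A - np.outer(B, K))
        if np.max(np.abs(P_next - P)) < tol * max(1.0, np.max(np.abs(P))):
            return K
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")


K = dlqr_gain(A, B, Q, R)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A - np.outer(B, K)))))

# Regulate the angle to zero from 0.1 rad while the constant bias acts on the plant.
x = np.array([0.1, 0.0, 0.0])
for _ in range(5000):              # 50 s of simulated time
    u = -K @ x
    x = A @ x + B * (u + BIAS)
print("final angle:", x[0], "final command:", -K @ x)
```

Because the integral state stops changing only when the angle error is zero, the commanded torque settles at −BIAS and cancels the bias; a proportional-derivative LQR without the integral state would leave a constant angle offset instead.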
Position estimation is performed by a navigation filter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman filter that includes bias estimates. Integral LQR techniques are applied to the horizontal components of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6.

As described above, altitude control suffers greatly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which produces frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, infeasible measurements are rejected, which removes much of the non-Gaussian noise component. This is followed by altitude and altitude rate…

Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying
spikes, false echoes and dropouts. Powered flight commences at 185 seconds.

Integral Sliding Mode Control

A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by

(8)

where {x1, x2} = {(rz,des − rz), (ṙz,des − ṙz)} are the altitude error states, u is the control input, and ξ(·) is a bounded model of disturbances and dynamic uncertainty. It is assumed that ξ(·) satisfies ||ξ|| ≤ γ, where γ is the upper bound on the norm of ξ(·).

In early attempts to stabilize this system, it was observed that LQR control could not address the instability and performance degradation caused by ξ(g, x). Sliding Mode Control (SMC) was adopted to provide a systematic approach to the problem of maintaining stability and
consistent performance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding manifold, the robustness properties of SMC are not assured. In order to provide robust control th…

(9)

where Kp and Kd are proportional and derivative loop gains that stabilize the linear dynamics in the absence of disturbances. For disturbance rejection, a sliding surface, s, is designed,

(10)

such that state trajectories are forced towards the manifold s = 0. Here, s0 is a conventional sliding mode design, z is an additional term that enables integral control to be included, and α, k ∈ ℝ are positive constants. Based on a Lyapunov function candidate, V, the control component, ud, can be determined such that V̇ < 0, guaranteeing convergence to the sliding manifold.

(11)

The above condition holds if ż = −α(up + k x2) and ud can be guaranteed to satisfy
(12)

Since the disturbances, ξ(g, x), are bounded by γ, define ud to be ud = −λs with a positive constant λ ∈ ℝ. Equation (11) becomes

(13)

and it can be seen that λ|s| − γ > 0 is required. As a result, for up and ud as above, the sliding mode condition holds when

(14)

With the input derived above, the dynamics are guaranteed to evolve such that s decays to within the boundary layer, |s| ≤ γ/λ, of the sliding manifold. Additionally, unlike conventional sliding mode controllers, the system does not suffer from input chatter, because the control law does not include a switching function along the sliding mode.

V. REINFORCEMENT LEARNING CONTROL

An alternative approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is first constructed from flight data, approximating the system as a stochastic Markov process [15], [16]. A model-based reinforcement learning algorithm then uses the model in policy iteration to search for an optimal control policy that can be implemented on the embedded…

In order to model the aircraft dynamics as a stochastic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, S(t) ∈ ℝ^ns, and input, u(t) ∈ ℝ^nu, onto
the subsequent state estimate, Ŝ(t + 1).

In this application, the state S(t) includes the battery level V. In the altitude loop, the input, u ∈ ℝ, is the total motor power. The subsequent state mapping is the sum of the traditional LWLR estimate, computed from the current state and input, and the random vector v ∈ ℝ^ns, which represents unmodeled noise. The value of v is drawn from the distribution of output error as determined by using a
maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfect…

The LWLR method [17] is well suited to this problem, as it fits a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning a weight to each training data point according to its proximity to the input value for which the output is to be computed. The technique requires a sizable set of training data in order to reflect the full dynamics of the system; this data is captured from flights flown under both automatic and manually controlled thrust, with the attitude…
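A minimal LWLR predictor in the spirit of the description above might look as follows. It assumes a Gaussian kernel on the Euclidean distance between each training row x(i) and the query x, wi = exp(−||x(i) − x||² / (2τ²)), which is the usual LWLR choice but may differ from the paper's Equation (16). It also uses the truncated-SVD inverse described later in this section (keeping singular values with σ1/σi < Cmax) and adds Gaussian noise v whose per-output standard deviation is the maximum-likelihood estimate from residuals. The function names and the toy data are hypothetical.

```python
import numpy as np


def lwlr_predict(X, Y, x_query, tau, c_max=1e6):
    """LWLR estimate at one query point.

    X: (m, d) training inputs whose last column is all ones; Y: (m, k) outputs;
    x_query: (d,) query row with a trailing 1. Returns a (k,) estimate.
    """
    # Gaussian weight on Euclidean distance (assumed kernel form, bandwidth tau).
    dist2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-dist2 / (2.0 * tau**2))
    # Scale each row of X by its weight instead of forming the m x m matrix W.
    WX = X * w[:, None]
    M = X.T @ WX                                  # X^T W X, shape (d, d)
    # Truncated-SVD inverse: keep singular values with sigma_1 / sigma_i < c_max.
    U, s, Vt = np.linalg.svd(M)
    keep = s > s[0] / c_max
    M_inv = (Vt[keep].T / s[keep]) @ U[:, keep].T
    theta = M_inv @ (WX.T @ Y)                    # local linear coefficients, (d, k)
    return x_query @ theta


def estimate_noise_std(X, Y, tau, c_max=1e6):
    """Per-output Gaussian noise std from in-sample LWLR residuals (ML, 1/m normalization)."""
    preds = np.array([lwlr_predict(X, Y, x, tau, c_max) for x in X])
    return np.sqrt(np.mean((Y - preds) ** 2, axis=0))


def markov_step(X, Y, x_query, tau, noise_std, rng, c_max=1e6):
    """One stochastic Markov step: LWLR estimate plus a Gaussian noise draw v."""
    return lwlr_predict(X, Y, x_query, tau, c_max) + rng.normal(0.0, noise_std)


# Toy 1-D example (hypothetical data): learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 3.0, 200)
X = np.column_stack([x_train, np.ones_like(x_train)])
Y = (np.sin(x_train) + rng.normal(0.0, 0.01, x_train.size))[:, None]
query = np.array([1.0, 1.0])
print(lwlr_predict(X, Y, query, tau=0.1))        # close to sin(1.0)
sigma = estimate_noise_std(X, Y, tau=0.1)
print(markov_step(X, Y, query, 0.1, sigma, rng))
```

The default c_max here is deliberately loose. The text reports Cmax ≈ 10 on the flight data, chosen by cross-validation; on this unscaled toy input such a small value would discard the second singular value and bias the fit, which is one reason column scaling matters. In-sample residuals also somewhat underestimate the noise; a leave-one-out variant would be more conservative.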
For m training data points, the input training samples are stored in X ∈ ℝ^(m×(ns+nu+1)), and the outputs corresponding to those inputs are stored in Y ∈ ℝ^(m×ns). These matrices are defined as

(15)

The column of ones in X enables the inclusion of a constant offset in the solution, as in linear regression. The diagonal weighting matrix W ∈ ℝ^(m×m), which acts on X, has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the S(t) and u(t) for which Ŝ(t + 1) is to be computed.

The distance measure used in this work is

(16)

where x(i) is the ith row of X, x is the corresponding query vector for S(t) and u(t), and the fit parameter τ is used to adjust the range
of influence of the training points. The value of τ can be tuned by cross-validation to prevent over- or under-fitting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm, to prevent undue influence of one state on the W matrix.

The subsequent state estimate is computed by summing the LWLR estimate with v,

(17)

Because W is a continuous function of x and X, as x is varied, the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal
matrix W: as each Wi,i is computed, it is multiplied by row x(i) and stored in WX.

The matrix being inverted is poorly conditioned, because weakly related data points have little influence, so their contribution cannot be inverted accurately. To compute the inverse more accurately, one can perform a singular value decomposition, XᵀWX = UΣVᵀ. Then, numerical error
during inversion can be avoided by using only the n singular values σi that satisfy σ1/σi < Cmax, where the value of Cmax is chosen by cross-validation. In this work, Cmax ≈ 10 was found to minimize numerical error, and was typically satisfied by n = 1. The inverse can be computed directly from the n largest singular values in the diagonal matrix Σn ∈ ℝ^(n×n) and the corresponding singular vectors in Un ∈ ℝ^((ns+nu+1)×n) and Vn ∈ ℝ^((ns+nu+1)×n). Thus, the stochastic Markov model becomes

(18)

Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used,

(19)

where R: ℝ^(2ns) → ℝ, and C1 > 0 and C2 > 0 are constants giving reward for accurate tracking and good damping, respectively, measured against the reference state desired for the system.

The control policy maps the observed state S onto the input command u. In this work, the state space has the constraint rz ≥ 0, and the input command has the constraint 0 ≤ u ≤ umax. The control policy is chosen to be

(20)

where w ∈ ℝ^nc is the vector of policy coefficients w1, …, wnc. Linear functions were sufficient to achieve good stability and performance. Additional terms, such as battery level and the integral of altitude error, could be included to make the policy more resilient to differing flight conditions. Policy iteration is performed as explained in Algorithm 1. The algorithm aims to find the value of w that yields the greatest total reward, Rtotal, as determined by simulating the system over a finite horizon…
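The policy search of Algorithm 1 can be sketched as a random-perturbation hill climb over w. To keep the example self-contained, the learned LWLR model is replaced by a hypothetical noisy altitude model (toy_model_step), the reward is assumed to take the form −C1(rz − rz,ref)² − C2·ṙz², and the policy is assumed to be a linear function of a bias term, the height error and the vertical speed, clipped to [0, umax]. Following the text, the same random initial states and references are reused at every iteration; reusing the same noise samples as well is an additional assumption made here.

```python
import numpy as np

DT, U_MAX, C1, C2 = 0.05, 1.0, 1.0, 0.1   # hypothetical constants
T_MAX = 80                                # simulation horizon in steps


def toy_model_step(state, u, v):
    """Hypothetical stand-in for LWLR(S, u) + v; state = [height r_z, vertical speed]."""
    rz, rz_dot = state
    rz_dot = rz_dot + DT * 20.0 * (u - 0.5) + v   # u = 0.5 hovers in this toy model
    rz = max(rz + DT * rz_dot, 0.0)                # enforce r_z >= 0
    return np.array([rz, rz_dot])


def policy(state, ref, w):
    """Linear policy on [1, height error, vertical speed], clipped to [0, U_MAX]."""
    features = np.array([1.0, ref - state[0], state[1]])
    return float(np.clip(w @ features, 0.0, U_MAX))


def reward(state, ref):
    """Assumed quadratic reward: tracking term weighted by C1, damping term by C2."""
    return -C1 * (state[0] - ref) ** 2 - C2 * state[1] ** 2


def total_reward(w, starts, refs, noise):
    """R_total for policy w over the fixed scenarios (steps 6-14 of Algorithm 1)."""
    total = 0.0
    for s0, ref, v_seq in zip(starts, refs, noise):
        s = s0.copy()
        for t in range(T_MAX):
            s = toy_model_step(s, policy(s, ref, w), v_seq[t])
            total += reward(s, ref)
    return total


def policy_search(n_iter=150, n_scenarios=5, step_std=0.05, seed=0):
    """Random-perturbation policy search following the structure of Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Fixed random scenarios, reused at every iteration.
    starts = [np.array([rng.uniform(0.0, 2.0), 0.0]) for _ in range(n_scenarios)]
    refs = rng.uniform(0.5, 2.0, size=n_scenarios)            # constant references here
    noise = rng.normal(0.0, 0.02, size=(n_scenarios, T_MAX))
    w = np.array([0.5, 0.1, -0.05])                           # "reasonable" initial w
    w_best, r_best = w.copy(), -np.inf
    history = []
    for _ in range(n_iter):
        r_total = total_reward(w, starts, refs, noise)
        history.append(r_total)
        if r_total > r_best:
            r_best, w_best = r_total, w.copy()
        w = w_best + rng.normal(0.0, step_std, size=w.shape)  # Gaussian step from w_best
    return w_best, r_best, history


w_best, r_best, history = policy_search()
print("initial R_total:", history[0], "best R_total:", r_best, "w_best:", w_best)
```

A fixed iteration budget replaces the convergence test of step 19, and each initial state is paired with one reference rather than looping over every combination; both are simplifications for brevity.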
Algorithm 1 Model-Based Reinforcement Learning
1: Generate set S0 of random initial states
2: Generate set T of random reference trajectories
3: Initialize w to reasonable values
4: Rbest ← −∞, wbest ← w
5: repeat
6:   Rtotal ← 0
7:   for s0 ∈ S0, Sref ∈ T do
8:     S(0) ← s0
9:     for t = 0 to tmax − 1 do
10:      u(t) ← π(S(t), w)
11:      S(t + 1) ← LWLR(S(t), u(t)) + v
12:      Rtotal ← Rtotal + R(S(t + 1), Sref(t + 1))
13:    end for
14:  end for
15:  if Rtotal > Rbest then
16:    Rbest ← Rtotal, wbest ← w
17:  end if
18:  Add a Gaussian random vector to wbest, store as w
19: until wbest converges

In policy iteration, a fixed set of
random initial conditions and reference trajectories is used to simulate flights at each iteration, with a given policy parameterized by w. The same random set must be used at each iteration for convergence to be possible [15]. After each iteration, the new value of w is stored as wbest if it outperforms the previous best policy, as determined by comparing Rtotal to Rbest, the best reward encountered so far. Then, a Gaussian random vector is added to wbest…

By using a Gaussian update rule for the policy weights, w, it is possible to escape local maxima of Rtotal. The highest-probability steps are small, and result in refinement of a solution near a local maximum of Rtotal. However, if the algorithm is not at the global maximum and is allowed to continue, there is a finite probability that a sufficiently large Gaussian step will be taken such that the algorithm can keep ascending.

VI. FLIGHT TEST RESULTS

A. Integral Sliding Mode

The results of an outdoor flight test with ISM control can be
seen in Figure 4. The response time is on the order of 1–2 seconds, with a 5-second settling time and little to no steady-state offset. An oscillatory character can also be seen in the response, which is most likely triggered by the nonlinear aerodynamic effects and sensor data spikes described earlier.

Fig. 4. Integral sliding mode step response in outdoor flight test.

Compared to the linear control design techniques implemented on the aircraft, ISM control proves a significant improvement. By explicitly incorporating bounds on the unknown disturbance forces in the derivation of the control law, it is possible to maintain stable altitude on a system that has evaded standard approaches.

B. Reinforcement Learning Control

One of the most exciting aspects of RL control design is
its ease of implementation. The policy iteration algorithm arrived at the implemented control law after only 3 hours on a Pentium IV computer. Figure 5 presents flight test results for the controller. The high-fidelity model of the system, used for RL control design, provides a useful tool for comparing the RL control law with other controllers. In fact, in simulation with linear controllers that proved unstable on the quadrotor, flight p…

The locally weighted linear regression model showed many relations that were not reflected in the linear model, but that reflect the physics of the system well. For instance, with all other states
held fixed, an upward velocity results in more acceleration at the subsequent time step for a given throttle level, and a downward velocity yields the opposite effect. This is essentially negative damping. The model also shows a strong ground effect. That is, with all other states held fixed, the closer the vehicle…

Fig. 5. Reinforcement learning controller response to manually applied step input, in outdoor flight test. Spikes in state estimates are from sensor noise passing through the Kalman filter.

The reinforcement learning control law is susceptible to system disturbances for which it was not trained. In particular, varying battery levels and blade degradation may cause a reduction in stability or a steady-state offset. Adding an integral error term to the control policy may prove an effective means of mitigating steady-state dis…
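To illustrate the suggestion above, the sketch below compares a linear altitude policy with and without an integral-of-error term on a hypothetical double-integrator altitude model with a constant power deficit, standing in for a sagging battery that the nominal model did not see. The gains, the deficit and the model are illustrative assumptions, not results from the flight tests.

```python
DT = 0.02          # hypothetical time step [s]
A_GAIN = 20.0      # hypothetical vertical acceleration per unit of normalized power
U_HOVER = 0.5      # hover power assumed by the nominal (trained) model
DEFICIT = 0.05     # hypothetical unmodeled power loss, e.g. from a sagging battery


def final_error(kp, kd, ki, ref=1.0, steps=5000):
    """Simulate a saturated linear altitude policy and return the final height error."""
    rz, rz_dot, integ = 0.0, 0.0, 0.0
    for _ in range(steps):
        err = ref - rz
        integ += DT * err                                   # integral of altitude error
        u = U_HOVER + kp * err - kd * rz_dot + ki * integ
        u = min(max(u, 0.0), 1.0)                           # input constraint 0 <= u <= 1
        rz_dot += DT * A_GAIN * (u - U_HOVER - DEFICIT)
        rz = max(rz + DT * rz_dot, 0.0)                     # height constraint r_z >= 0
    return ref - rz


print("without integral term:", final_error(kp=0.2, kd=0.1, ki=0.0))  # about DEFICIT / kp = 0.25
print("with integral term:   ", final_error(kp=0.2, kd=0.1, ki=0.1))  # about 0
```

Without the integral term the proportional gain must absorb the deficit, so the height settles DEFICIT/kp below the reference; with it, the integral state grows until it supplies the missing power and the offset vanishes.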