外文翻譯--基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第1頁
已閱讀1頁,還剩23頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

1、<p><b>  畢業(yè)設計(論文)</b></p><p><b>  外文資料翻譯 </b></p><p>  學生姓名: </p><p>  學 號: </p>&l

2、t;p>  所在學院: </p><p>  專 業(yè): </p><p>  指導教師: </p><p>  2013年 4 月 18 日</p><p>

3、;  Hybrid Evolutionary Algorithm based solution for Register Allocation for Embedded Systems</p><p>  Anjali Mahajan</p><p>  G H Raisoni College of Engineering, Nagpur, India</p><p&g

4、t;  Email : armahajan@rediffmail.com</p><p><b>  M S Ali</b></p><p>  Prof. Ram Meghe Institute of Technology and Research,</p><p>  Badnera, Amravati, India</p>

5、<p>  Email : softalis@hotmail.com</p><p>  Abstract— Embedded systems have an ever-increasing need for optimizing compilers to produce high quality codes with a limited general purpose register set. Ei

6、ther memory or registers are used to store the results of computation of a program. As compared to memory, accessing a register is much faster, but they are scarce resources and have to be utilized very efficiently. The

7、optimization goal is to hold as many live variables as possible in registers in order to avoid expensive memory accesses. </p><p>  I. INTRODUCTION</p><p>  Register allocation is one of the mos

8、t important optimizations a compiler performs and is becoming increasingly important as the gap between processor speed and memory access time widens. Its goal is to find a way to map the temporary variables used in a pr

9、ogram into physical memory locations (either main memory or machine registers). </p><p>  Accessing a register is much faster than accessing memory; therefore one tries to use registers as much as possible.

10、Of course, this is not always possible, thus some variables must be transferred (spilled) to and from memory. This has a cost, the cost of load and store operations, which should be avoided as much as possible. Typically

11、, this degrades runtime performance and increases power consumption. Therefore, an efficient mapping should generally minimize the register requirements and the nu</p><p>  This approach attempts to combine

12、the flexibility of general-purpose programmable processors with the performance achieved by domain-specific architectureoptimizations. Compilers for embedded processors must cope with these architectural optimizations a

13、nd be able to exploit them.</p><p>  This paper presents a heuristic algorithm for graph coloring register allocation problem for embedded systems based on a new crossover operator and a new local search fun

14、ction. </p><p>  Graph coloring abstracts the problem of assigning registers to live ranges in a program into the problem of assigning colors to nodes in an interference graph. The register allocator attempt

15、s to “color” the graph with a finite number of machine registers, with one constraint that any two nodes connected by an interference edge must be colored with different registers.</p><p>  To model register

16、 allocation as a graph coloring problem, the compiler first constructs an interference graph G. The nodes in G correspond to live ranges, and the edges represent interferences. Thus, there is an edge in G from node i to

17、node j if live range of i interferes with live range of j, that is, if they are simultaneously live at some point and cannot occupy the same register.</p><p>  To find an allocation from G, the compiler look

18、s for a k-coloring of G, that is, an assignment of k colors to the nodes of G such that adjacent nodes always have distinct colors. If we choose k to match the number of machine registers, then we can map a k-coloring o

19、f G into a feasible register assignment for the underlying code. Because graph coloring is NP-complete, the compiler uses a heuristic method to search for a coloring; it is not guaranteed to find a k-coloring for all k-c

20、olorable grap</p><p>  Spilling one or more live ranges creates a new and different interference graph. The compiler proceeds by iteratively spilling some live ranges and attempting to color the resulting ne

21、w graph. In the context of embedded processors, the most important restriction of the graph coloring approach is that it is based on the assumption of a homogenous register set. The different phases of register alloca

22、tion are differentlyimpacted By embedded processor irregularities.</p><p>  II. RELATED WORK</p><p>  Register allocation has been widely addressed in literature, and many approaches have been p

23、roposed. Most of them are related to Chaitin’s [6] approach, but very few address problems raised by embedded processor’s architecture like support for irregular registers via register classes and register set concatenat

24、ion. Although [13] proposed an approach using integer-programming supporting irregular register sets, this approach requires modeling of register constraints by inequations, making it diff</p><p>  Embedded

25、processors are often characterized by a small number of registers. Algorithms and heuristics have been adapted in our infrastructure to limit spill when few registers are available. Ref. [17] presents a generalization of

26、 the degree < K test , called the <p, q> test, to handle irregular register sets and register classes. VPO [13] is a portable optimizer for DSPs, based on the Zephyr retargetable compiler infrastructure. VPO add

27、ress the problem of heterogeneous register classes by comput</p><p>  Some researchers attempt to use non-graph-coloring methods. Koes[15] proposes a progressive register allocator which uses a multi-commodi

28、ty networkflow mode tp represent the intricacies of irregular architectures. Scholz[18] formulates register allocation as a partitioned boolean quadratic optimization problem that allows generic modeling of processors pe

29、culiarities. Hirnschrott [11] used Partitioned Boolean Quadratic Programming for register allocation shows better results in spill costs and co</p><p>  There are many local search algorithms for graph color

30、ing, such as simulated annealing [7], tabu search[10],etc Galinier and Hao[8] introduced a algorithm based on hybrid evolutionary algorithm. An important feature of this algorithm is a specialized crossover but the mutat

31、ion operator of the GA is replaced by an LS operator.</p><p>  III. A HYBRID EVOLUTIONARY ALGORITHM</p><p>  Evolutionary algorithms involve natural evolution and genetics. The genetic algorithm

32、 is a classical method in this category. It has been applied to various optimization problems. There are several other methods like genetic programming, ant colony optimization, etc. The simple evolutionary algorithms ar

33、e not generally efficient for complex combinatorial problems. The performance of the evolutionary algorithms is improved with the addition of problem specific knowledge. Specialized operators are</p><p>  An

34、 approach for combinatorial optimization is to embed local search into the framework of population based evolutionary algorithm, leading to hybrid evolutionary algorithm. HEA is based on two elements: an efficient local

35、search (LS) operator and a highly specialized crossover operator. The basic idea consists in using the crossover operator to create new and potentially interesting configurations which are then improved by the LS operato

36、r.</p><p>  We present a hybrid evolutionary algorithm for graph coloring based on a new crossover operator for register allocation problem and a new local search function. This section presents our algorith

37、m- Hybrid evolutionary Algorithm for Graph coloring Register Allocation with proposed operator called crossover by conflict-free sets(CCS).</p><p>  A. The Algorithm</p><p>  The HEA consists of

38、 a genetic component and a local search (LS). The genetic component initializes and evolves a population of solutions. The general algorithm is as given below:</p><p>  Input : Interference graph , IG = ( V,

39、E ) ; number of</p><p>  registers , k</p><p>  Output : best configuration</p><p><b>  begin</b></p><p>  P = generate_population(|P|)</p><p>

40、<b>  iter = 0</b></p><p>  while ( iter < MaxIter or popu-diversity>0 ) do</p><p>  (p1, p2) = select_parents(P)</p><p>  p = crossover (p1, p2)</p><p&g

41、t;  p = local_search( p, L )</p><p>  P = update_population(P,p)</p><p>  iter =iter + 1</p><p><b>  endwhile</b></p><p><b>  end</b></p>

42、<p>  The algorithm first builds an initial population of configurations (generate_population) and then performs a series of cycles called generation. At each generation, two configurations p1 and p2 are chosen in

43、the population (select_parents). A crossover is then used to produce an offspring p from p1 and p2 (crossover). The LS operator is applied to improve p for a fixed number L of iterations (local_search). Finally, the impr

44、oved configuration p is inserted in the population by replacing another </p><p>  Population diversity is calculated as the average distance between all configurations in the population. For two configuratio

45、ns p1 and p2, the distance between p1 and p2 is the minimum number of elementary transformations necessary to transform p1 into p2.</p><p>  In our approach, we consider the partition method for string repre

46、sentation [8]. Each solution Pi partitions the variables into register classes, Pi = {R1 , R2 , …. Rk} where each class Ri includes the live ranges of variables that are mapped to the registers ri and k is the total numb

47、er of registers. Given two parents p1= { R11 , R12 , …. R1k} and p2= { R21 , R22 , …. R2k}, the partial configuration will be a set { R1 , R2 , …. Rk} of disjoint sets of nodes where each subset Ri is included in a</p

48、><p>  B. Initial population generation</p><p>  Generally the initial population is generated using the DSatur algorithm [5], which is a graph coloring heuristic. It considers the saturation degre

49、e of each node, which can be defined as the number of different colors to which the node is adjacent to. In the register allocation problem, both the spill costs and the degree of the nodes in a given interference graph

50、are considered. We adopt the new metric called the spill degree proposed by [17] that can be used for ordering the nodes. The spill </p><p>  SDegree1(i) = SCost(i) x Degree(i)</p><p>  SDegree2

51、(i) = SCost(i) x Degree2(i)</p><p>  SDegree3(i) = SCost(i) (1)</p><p>  In these expressions, SCost(i) is the spill cost and Degree(i) is the number of edges incident to node

52、i. In order to generate the initial population, the spill degrees are set using a combination of the three equations given in (1). The nodes are sorted in decreasing order of spill degrees. At each iteration, an unsigned

53、 node with the maximum spill degree is mapped to the register class with the lowest possible index value, where the selected node should be conflict free with the other nodes in</p><p>  C. The crossover ope

54、rator</p><p>  The crossover used here is the new proposed operator Crossover by Conflict-free Sets (CCS). Given two parent configurations p1= { R11 , R12 , …. R1k} and p2= { R21 , R22 , …. R2k}, chosen rand

55、omly by the select_parents operator from the population, the algorithm crossover (p1, p2) builds an offspring p= {R1 , R2 , …. Rk} as follows</p><p>  Input: configurations p1= { R11 , R12 , …. R1k}</p>

56、;<p>  and p2= { R21 , R22 , …. R2k}</p><p>  Output: configuration p= { R1 , R2 , …. Rk}</p><p><b>  begin</b></p><p>  // consider p1</p><p>  if c

57、onflict(p1)>0 then</p><p>  call function conflict_free(p1)</p><p>  // consider p2</p><p>  if conflict(p2)>0 then</p><p>  call function conflict_free(p2)</p

58、><p>  for i (1< i < k ) do</p><p>  CfR1n= max q?p1 Cf_Individual(q)</p><p>  // CfR1n is the partition with maximum number of</p><p>  //conflict-free variables fr

59、om p1</p><p>  CfR2n= max q?p2 Cf_Individual(q)</p><p>  // CfR2n is the partition with maximum number of</p><p>  // conflict-free variables from p2</p><p>  choose j

60、such that | CfR1n∩?CfR2j| is maximum</p><p>  Ri = |CfR1n ∪?CfR2j|</p><p>  SpillQuality(Ri)=Spillcost(Ri)*Reg_class_Conflict(Ri)</p><p>  remove the nodes of Ri from p1 and p2</

61、p><p><b>  endfor</b></p><p>  assign randomly the nodes of R- (Rl U…U Rk)</p><p><b>  end</b></p><p>  function conflict_free(p)</p><p&

62、gt;<b>  begin</b></p><p>  Cf_Individual=?</p><p>  for i ( 1< i < k) do</p><p>  if Reg_class_Conflict(i) >0 then</p><p>  // Reg_class_Conflict(i

63、) is the total number of conflicts</p><p>  // in ith register class</p><p><b>  begin</b></p><p>  initialize t to 0</p><p>  while (t < (Reg_class_Conf

64、lict(i)/2)) do</p><p>  select a variable m from Ri where</p><p>  Conflict(m)=maxc?Ri{Conflict(c)} or</p><p>  spillcost(m)=minc?Ri{spillcost(c)}</p><p>  remove varia

65、ble m from Ri</p><p>  t=t+Conflict(m)</p><p><b>  endwhile</b></p><p>  CfR(i)= Ri</p><p><b>  endif</b></p><p>  Cf_Individual= C

66、f_Individual∪ CfR(i)</p><p><b>  endfor</b></p><p><b>  end</b></p><p>  He algorithm builds step by step the k classes Rl , R2 … Rk of the offspring. If any

67、 register set of the parent has conflict, the algorithm first determines the conflict free sets of all the register classes of each parent. The function to find the conflict free set is based on the heuristic from [20] f

68、or determining the conflict-free set with the maximum number of nodes of each register class. In that, initially the conflict-free set includes all variables in Rl. The total number of conflicts o</p><p>  A

69、t the end of k steps, some nodes may remain unassigned. These are then assigned to a class which has minimum conflict factor.</p><p>  D. The Local Search LS operator</p><p>  After a solution i

70、s generated using the crossover operator, the local search phase improves it before inserting it into the population for a maximum of L iterations. We have used a Local search operator LS. It uses a 1-exchange neighborho

71、od. Formally, given a partition { R1 , R2 , …. Rk}, a neighboring partition is obtained by assigning a single variable to other register set, i.e., a variable vi is removed from the original register class Ri to which it

72、 belongs to and where vi is already in con</p><p>  A variable vi which has maximum conflict factor is selected. Assume that vi is already assigned to class k. If the fitness value of the register class k wi

73、th vi is more than the fitness of register class j without considering vi, then vi is moved from register class k to register class j. This process is repeated for other register classes also. The register class which ha

74、s minimum value is the final register class for variable vi.</p><p>  The same process is repeated with a new variable according to the decreasing order of conflict factor. In this way the quality of generat

75、ed offspring is improved. It is then inserted in the population by replacing the worst of the two parents.</p><p>  E. Fitness function calculation</p><p>  To solve a register allocation proble

76、m, we consider a set of k - register classes. In the interference graph IG = (V, E), all variables (i.e., nodes) vi are assigned to k - register classes. The nodes i and j in an interference graph are said to be conflict

77、-free, when there is no edge connecting them. Conflict(m) denotes the conflict of a variable m in register class Ri. Spillcost is the spill cost of variable m. The fitness of a register class is given as</p><p

78、>  fitness(Ri)=∑??∈??Conflict(m)*Spillcost(m)/degree(m) </p><p><b>  (3)</b></p><p>  The total sum of fitness values of all register classes gives the fitness value of the given

79、individual </p><p><b>  k</b></p><p>  Fitness = ∑ fitness(i) (4)</p><p><b>  i=1</b></p><p>  The goal of the optimization process is t

80、o minimize fitness until zero.</p><p>  IV. EXPERIMENTAL EVALUATION</p><p>  The experiments are based on MachineSUIF [19] compiler research framework. The register allocator of MachineSUIF impl

81、ements George-Appel’s ‘Iterated Register Coalescing’ algorithm [9]. Our experimental analysis includes three algorithms We compare our algorithm with these algorithms. The first is the MachineSUIF’s default algorithm[9]

82、with extenstions by Smith-Ramsey and Holloway [19] for handling register aliasing. We denote it by IRC. The second is based on GPX operator[8]. The third is optima</p><p>  We applied the algorithms to 6 emb

83、edded and real time applications. For each application, the population size is set to 20 and LS iterations are all set from 500 to 2000. For each application, we run the allocator five times and average the results.</

84、p><p>  Performance evaluation was done with respect to the following parameters : number of memory accesses required , spill loads, spill costs, the compile time needed by the allocator, the execution time of

85、the generated code, including the number of load/store generated , size of allocator itself.</p><p>  Figure 1. Number of memory accesses</p><p><b>  TABLE I </b></p><p>

86、;  SPILL COST OF INSTRUCTION</p><p><b>  TABLE II</b></p><p>  RATIO OF THE SPILL LOADS PRODUCED</p><p>  Comparison of number of memory accesses is the comparison of t

87、otal static number of load and store instructions inserted by each register allocator. Figure 1. compares the number of load/store instructions in the assembly code. The HEA inserts fewer memory- access instruction than

88、ORA, 1.6 % fewer memory- access instruction than IRC and 8.9 % fewer memory- access instruction than GPX.</p><p>  Table I give the spill costs of all the algorithms. For lowest population size the spill cos

89、t is less. For each benchmark given, the spill cost of each variable is set to the number of occurrences of the variables. The spill costs of the variables are the average values. The IRC algorithm gives the highest tota

90、l spill cost. The HEA algorithm produce less spill cost than the maximum in four tests and it outperforms the IRC and GPX algorithms in all tests. Genetic operator systematically eliminate</p><p>  Spill loa

91、ds refers to additional number of loads incurred by the allocation algorithm. Spill loads give an indication of how well the allocator is able to perform the task. The number of spill loads is highly correlated with appl

92、ication running time.</p><p>  We calculate the dynamic number of spill loads added to each module of the program by multiplying the number of spill loads added to each block by the number of times that bloc

93、k is executed. Then we sum the number of dynamic spill loads added to each block. We obtain the dynamic number of spill loads for the entire program by summing the number of dynamic spill loads added to each module.</

94、p><p>  Table II shows the spill loads for each allocator as a ratio to spill loads generated by IRC allocator (considered as base allocator for comparison). The numbers are given as geometric mean. We see impr

95、ovements of HEA over other allocator. Table III gives results in terms of compile time and run time. We observe that in most of the cases, the performance of different allocators were almost similar in terms of compile t

96、imes; however, the execution time of the code generated by our method is less </p><p>  Code quality is measured in terms of number of loads/stores generated for of the program and the number of static instr

97、uctions generated. Table IV give the total number of loads/stores generated for 8 registers. Our algorithm is superior to all other algorithms.</p><p>  V. CONCLUSION</p><p>  Register allocatio

98、n for embedded systems is complex having to deal with irregular architectural characteristics and to meet strict requirements.</p><p>  In this paper, we propose a novel crossover operator for graph coloring

99、 register allocation problem for embedded systems, based on hybrid evolutionary algorithm. Compared to classical graph coloring register allocation algorithms, this technique can deal with irregular architectural charact

100、eristics of embedded processors more uniformly. Our experimental results indicate that compared to other algorithms, HEA is one of the most efficient algorithms with highly specialized, domain specific crossov</p>

101、<p>  Population size of the problem should be set properly. Compared to the size of population, the length of iterations L of LS is more critical on the performance of the algorithm.</p><p>  In the f

102、uture, we intend to test our HEA heuristic on graphs using coalescing and on chordal interference graphs generated from a large set of applications. HEA can also be applied to other compiler optimization problems such as

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經(jīng)權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
  • 5. 眾賞文庫僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
  • 6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論