Advanced Computer Architecture - Microprocessor Research and Development Center, Peking University

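(Lecture 1) Introduction
Advanced Computer Architecture
Cheng Xu, February 21, 2011

Textbook and Instructor
- Main textbook: Computer Architecture: A Quantitative Approach, 4th Edition (Oct. 2006), Hennessy and Patterson
- Instructor: Cheng Xu
- Time and place: Mondays, 15:10-18:00, Room 102, Teaching Building No. 2
- http://mprc.pku.edu.cn

Teaching Objectives of "Advanced Computer Architecture"
Learn and master the design techniques, machine organizations, technology factors, and evaluation methods that will determine the concrete form of 21st-century computers.
- What do computer applications need?
- What functional support does the operating system need?
- Which features can optimizing compilers exploit and implement?
- What kinds of machines can we build?
- What will future computers look like?
Computer architecture researchers must have broad and deep professional knowledge!
- Computer fundamentals: data structures, application basics, C programming
- Digital logic: basic logic units
- Computer organization and architecture: processors
- Operating systems: memory management, scheduling, concurrency
- Compiler technology: code generation, optimization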

The Place of This Course in the Curriculum
- Foundational knowledge: how to implement it! Concrete details: knowing how things are.
- Advanced Computer Architecture: analysis + evaluation: knowing why they are so.
- Parallel Computer Architecture

Charles Babbage
- Charles Babbage, 1791-1871
- Lucasian Professor of Mathematics, Cambridge University, 1828-1839
- Difference Engine, 1823
- Analytic Engine, 1833
- The forerunner of the modern digital computer!
- Application: mathematical tables for astronomy; nautical tables for the navy
- Background: any continuous function can be approximated by a polynomial (Weierstrass)
- Technology: mechanical (gears, Jacquard's loom, simple calculators)

Difference Engine
A machine to compute mathematical tables.
- Weierstrass: any continuous function on a closed interval can be approximated arbitrarily well by a polynomial.
- Any polynomial can be computed from difference tables.
- An example:
  f(n) = n^2 + n + 41
  d1(n) = f(n) - f(n-1) = 2n
  d2(n) = d1(n) - d1(n-1) = 2
  f(n) = f(n-1) + d1(n) = f(n-1) + (d1(n-1) + 2)
- All you need is an adder!
(Photo: Babbage's Difference Engine 1, 1832)

Analytic Engine
- 1833: Babbage's paper was published. The engine was conceived during a hiatus in the development of the Difference Engine.
- Inspiration: Jacquard looms
  - Looms were controlled by punched cards.
  - The set of cards with fixed punched holes dictated the pattern of the weave → program.
  - The same set of cards could be used with different colored threads → numbers.
- 1871: Babbage dies. The machine remains unrealized.
- It is not clear whether the Analytic Engine could be built even today using only mechanical technology.
(Photos: Babbage's Difference Engine 2 and Analytical Engine)

Babbage Analytical Engine (1834)
- The Store: memory unit consisting of counter wheels
- The Mill: arithmetic unit capable of 4 operations. It used a pair of registers and produced results stored in another register in the store.
- Operation Cards: specified one of the four operations
- Variable Cards: specified the memory location to be used
- Output: printer or punch

Analytic Engine: the first conception of a general-purpose computer
- The store: where all variables to be operated upon, as well as all quantities that have arisen from the results of the operations, are placed.
- The mill: where the quantities about to be operated upon are always brought.

The First Programmer
- Ada Byron, a.k.a. "Lady Lovelace", 1815-1852
- Babbage himself was Ada's mentor!

Alan Turing (1937)
While not using the practical technology of the era, Alan Turing developed the idea of a "Universal Machine" capable of executing any describable algorithm, which formed the basis for the concept of "computability". Perhaps more importantly, Turing's ideas differed from those of others who were solving arithmetic problems: he introduced the concept of "symbol processing".
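
The short Python sketch below checks Babbage's difference-table example (f(n) = n^2 + n + 41, d1(n) = 2n, d2(n) = 2). The function names are hypothetical. The seed values f(0) = 41 and d1(0) = 0 are assumptions derived from the formula; the slide does not give them. After seeding, every new table entry is produced by additions alone, which is the point of "all you need is an adder".

```python
def poly(n: int) -> int:
    """Direct evaluation of the example polynomial f(n) = n^2 + n + 41."""
    return n * n + n + 41


def difference_engine(count: int) -> list[int]:
    """Tabulate f(0) .. f(count - 1) for f(n) = n^2 + n + 41 using only additions.

    Relations from the slide:
        d1(n) = f(n) - f(n-1) = 2n
        d2(n) = d1(n) - d1(n-1) = 2   (constant)
    Seeds (assumed, derived from the formula): f(0) = 41, d1(0) = 0.
    """
    f = 41       # current table value, starts at f(0)
    d1 = 0       # first difference, starts at d1(0) = 0
    d2 = 2       # second difference, constant for a quadratic
    table = []
    for _ in range(count):
        table.append(f)
        d1 += d2     # d1(n) = d1(n-1) + 2
        f += d1      # f(n)  = f(n-1) + d1(n)
    return table


if __name__ == "__main__":
    values = difference_engine(6)
    print(values)  # [41, 43, 47, 53, 61, 71]
    assert values == [poly(n) for n in range(6)]
```

Each difference order works like one column of the engine. Every step adds a column's next-higher difference into it, so a quadratic needs only two additions per new table entry.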

The First General-Purpose Electronic Computer: ENIAC
- February 14, 1946
- J. Presper Eckert & John Mauchly, Moore School, University of Pennsylvania
- Size: 80 feet long, 8.5 feet high
- 18,000 vacuum tubes
- 5,000 additions/sec.
- The world's first general-purpose electronic computer. Its conditional jumps and programmability distinguished it from earlier machines.
- Used for computing artillery firing tables
- Electronic Numerical Integrator and Computer
(Photo: Accumulator, 28 vacuum tubes)

ENIAC's Application: Ballistic Calculations
angle = f(location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ...)
(A WW-2 effort)

ENIAC Was NOT a "Stored Program" Device
- For each problem, someone analyzed the arithmetic processing needed and prepared wiring diagrams for the operators (then called "computors") to use when wiring the machine.
- The process was time-consuming and error-prone.
- Cleaning personnel often knocked cables out of their places and just put them back somewhere.
(Photo: Wiring the machine)

Electronic Discrete Variable Automatic Computer (EDVAC)
- ENIAC's programming system was external.
  - Sequences of instructions were executed independently of the results of the calculation.
  - Human intervention was required to take instructions "out of order".
- Eckert, Mauchly, John von Neumann, and others designed EDVAC (1944) to solve this problem.
  - The solution was the stored-program computer → "the program can be manipulated as data".
- The First Draft of a Report on the EDVAC was published in 1945, but it carried only von Neumann's signature!
- In 1973 a U.S. federal court in Minneapolis (Honeywell v. Sperry Rand) attributed the invention of the electronic digital computer to John Atanasoff.

The von Neumann Machine
Stored-program computer: the IAS (Institute for Advanced Study) Computer, 1946
(Diagram: Main Memory, Arithmetic Logic Unit, Program Control Unit, I/O Equipment)

The stored-program concept: the instructions that make up a program are stored in memory in advance, just like data. The computer then fetches and executes them one at a time by itself. This idea naturally led to branch instructions and to the concept of modifying the address part of an instruction. Together these let a sequence of instructions be executed automatically, and meaningfully, many times.

The First Full-Scale, Operational, Stored-Program Computer: EDSAC
- The world's first full-scale, operational, stored-program computer
- Maurice Wilkes, Cambridge University
- EDSAC: Electronic Delay Storage Automatic Calculator
- EDSAC began operating in 1949. Its accumulator-based architecture and its instruction set design strongly influenced machine design for some time afterward.

Bell Labs
- 1940: Ohl develops the PN junction.
- 1945: Shockley's laboratory is established.
- 1947: Bardeen and Brattain create the point-contact transistor (U.S. Patent 2,524,035).
- 1951: Shockley develops a junction transistor manufacturable in quantity (U.S. Patent 2,623,105).
(Diagrams from the patent applications)

The Integrated Circuit
- 1959: Jack Kilby, working at TI, dreams up the idea of a monolithic "integrated circuit".
- Components were connected by hand-soldered wires and isolated by "shaping"; PN diodes were used as resistors (U.S. Patent 3,138,743).
(Diagram from the patent application)

Integrated Circuits
- 1961: TI and Fairchild introduce the first logic ICs ($50 in quantity).
- 1962: RCA develops the first MOS integrated circuit.
(Photos: RCA 16-transistor MOSFET IC; Fairchild bipolar RTL flip-flop)

The Microprocessor
- 1971: Intel introduces the 4004.
- It was a general-purpose programmable computer instead of a custom chip for a Japanese calculator company.

Microprocessor Performance
- 4004: 108 kHz, 0.06 MIPS
- 8080: 2 MHz, 0.64 MIPS
- 8088: 8 MHz, 0.75 MIPS
- Intel386 SX CPU: 33 MHz, 2.9 MIPS
- Intel486 DX CPU: 50 MHz, 41 MIPS
- Pentium® processor: 233 MHz
- Intel® Celeron® processor: 1.3 GHz
- Pentium® 4 processor: 3 GHz

Sea Change in Chip Design
- Intel 4004 (1971): 4-bit processor, 2,312 transistors, 0.4 MHz, 10-micron PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3-micron NMOS, 60 mm² chip
- A 125 mm² chip in 0.065-micron CMOS = 2,312 copies of RISC II + FPU + I-cache + D-cache
  - RISC II shrinks to ~0.02 mm² at 65 nm.
- Is the processor the new transistor?

Multicore
- Small number of cores, shared memory
- Some systems have multithreaded cores.
- Trend toward simplicity in cores (e.g., no branch prediction)
- Multiple threads share resources (L2 cache, maybe FP units).
- Deployment in the embedded market as well as other sectors
(Examples: IBM Power4, 2001; Sun T1 (Niagara), 2005; AMD true quad-core die, 2007; Cell from IBM and Sony)

Intel 80-Core Chip (2007)
- 80 processing cores
- 1 teraflop; 10 billion operations per watt
- 3.1 GHz clock; die area 300 mm² (13.75 mm × 22 mm)
- Each CPU core is connected one-to-one to memory, with 256 MBps of memory bandwidth per core; 32 MB of on-chip static RAM
- Aggregate memory bandwidth of the single chip reaches 1 TB/s
(Photo: IBM POWER7, 2010)

A Brief History of CPU Technology
- Charles Babbage's engines (1823)
- Turing machine (1937)
- ENIAC (1946), EDSAC (1949)
- CPU, microprocessor
- General-purpose microprocessor vs. special CPU for HPC
- Multicore, manycore, or …

Improvements in Internet Access
(Chart: standalone PCs (email) → connected PCs (WWW) → information appliances: hand-helds, wireless phones (cellphones & phone access), game consoles, set-tops & network computers (NCs); 9 million, 60 million, and 250 million units in 1985, 1995, and 2005. Sources: Network Computer Inc. & IDC)
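
The stored-program idea from the EDVAC and EDSAC slides can be made concrete with a toy accumulator machine, sketched minimally in Python below. The instruction set, the decimal encoding (opcode × 100 + address), and all names are hypothetical; this is not EDSAC's actual order code. Instructions and data share one memory list. The loop walks through an array by adding 1 to the address field of its own ADD instruction, which is the address-modification technique described above.

```python
# Hypothetical toy accumulator machine: one memory holds both code and data.
# Instruction word encoding (an assumption for this sketch): opcode * 100 + address.
HALT, LOAD, ADD, STORE, SUB, JPOS = range(6)


def encode(op: int, addr: int) -> int:
    """Pack an opcode and a two-digit address into one instruction word."""
    assert 0 <= addr < 100
    return op * 100 + addr


def run(memory: list[int], max_steps: int = 10_000) -> list[int]:
    """Fetch-decode-execute loop; returns the (mutated) memory after HALT."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        op, addr = divmod(memory[pc], 100)   # fetch the word at pc and decode it
        pc += 1
        if op == HALT:
            return memory
        elif op == LOAD:
            acc = memory[addr]
        elif op == ADD:
            acc += memory[addr]
        elif op == SUB:
            acc -= memory[addr]
        elif op == STORE:
            memory[addr] = acc
        elif op == JPOS:                     # conditional branch: jump if acc > 0
            if acc > 0:
                pc = addr
        else:
            raise ValueError(f"bad opcode {op} at address {pc - 1}")
    raise RuntimeError("step limit exceeded")


def sum_program(values: list[int]) -> list[int]:
    """Build a memory image that sums `values` into address 13."""
    if not 1 <= len(values) <= 86:           # array must fit in addresses 14..99
        raise ValueError("need 1 to 86 values (two-digit addresses)")
    ONE, COUNT, SUM, ARR = 11, 12, 13, 14    # data addresses, right after the code
    code = [
        encode(LOAD, SUM),     # 0: acc = sum
        encode(ADD, ARR),      # 1: acc += arr[i]  (this word's address gets rewritten)
        encode(STORE, SUM),    # 2: sum = acc
        encode(LOAD, 1),       # 3: read instruction 1 as ordinary data
        encode(ADD, ONE),      # 4: add 1 to its address field
        encode(STORE, 1),      # 5: write the modified instruction back
        encode(LOAD, COUNT),   # 6: count -= 1
        encode(SUB, ONE),      # 7
        encode(STORE, COUNT),  # 8
        encode(JPOS, 0),       # 9: loop while count > 0
        encode(HALT, 0),       # 10
    ]
    return code + [1, len(values), 0] + list(values)


if __name__ == "__main__":
    memory = run(sum_program([3, 1, 4, 1, 5]))
    print(memory[13], memory[1])   # 14 219
```

Running `run(sum_program([3, 1, 4, 1, 5]))` leaves the sum 14 at address 13. Address 1 then holds the rewritten instruction word 219 (ADD 19). Later machines added index registers, so programs no longer had to rewrite their own instructions to step through an array.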

Cyberspace: An Upward Spiral
(Diagram: computing, communication, and digitization; platforms and content; LANs and home networks; public and private WANs; the Internet: a network of networks; digital interfaces between cyberspace and people and the rest of the physical world)

The Post-PC Era (PC+ Era)
Two technologies drive the post-PC era:
1) Mobile consumer devices, e.g., new-generation PDAs, new-generation mobile communication devices, wearable computers
2) The infrastructure that supports these devices, e.g., new-generation "big fat" web servers and database servers

The New Wave: Microprocessors Will Be Everywhere
(Source: Richard Newton)

Embedded Microprocessors
What?
- A programmable processor whose programming interface is not accessible to the end user of the product.
- The only user interaction is through the actual application.
Examples:
- Sharp PDAs are encapsulated products with fixed functionality.
- 3COM Palm Pilots were originally intended as embedded systems. Opening up the programmer's interface turned them into more generic computer systems.

Some Interesting Numbers
- The Intel 4004 was intended for an embedded application (a calculator).
- Of today's microprocessors, 95% go into embedded applications.
- SH3/4 (Hitachi): best-selling RISC microprocessor (1997)
- ARM: best-selling embedded microprocessor (2001-)
- 50% of microprocessor revenue stems from embedded systems.
- Embedded processors are often focused on a particular application area: microcontrollers, DSPs, media processors, graphics processors, network and communication processors.

Different Evaluation Criteria
- Flexibility
- Power
- Cost
- Performance as a functionality constraint ("Just-in-Time Computing")
Components of cost:
- Area of die / yield
- Code density (memory is the major part of die size)
- Packaging
- Design effort
- Programming cost
- Time-to-market
- Reusability

(Figure: VLSI process technology is advancing faster (gate length))
(Figure: chip manufacturing flow)

Cost of Integrated Circuits
- If α (the process-complexity parameter in the die-yield model) = 3, die cost grows roughly as the fourth power of die size.
- Packaging cost depends on the number of pins and on heat-dissipation requirements.

Chip          Die cost  Pins  Package type  Package cost  Test & assembly  Total
386DX         $4        132   QFP           $1            $4               $9
486DX2        $12       168   PGA           $11           $12              $35
PowerPC 601   $53       304   QFP           $3            $21              $77
HP PA 7100    $73       504   PGA           $35           $16              $124
DEC Alpha     $149      431   PGA           $30           $23              $202
SuperSPARC    $272      293   PGA           $20           $34              $326
Pentium       $417      273   PGA           $19           $37              $473

Other Costs
Cost/performance: what is the relationship of cost to price?
- Component costs
- Direct costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross margin (add 82% to 186%): nonrecurring costs such as R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average discount to get list price (add 33% to 66%): volume discounts and/or retailer markup

iPad: Apple's Profit Comes from Margins in Hardware
(Chart, source iSuppli: $499 retail price; cost of materials and manufacturing $230; other segments of $70, $90, and $110; labels: average industry margin (approx. 30%), cost of sales (approx. 30%), + Apple margin; margin: 40%)

Power Density Keeps Getting Worse
- Surpassed hot-plate power density at 0.5 μm
- Not too long to reach nuclear reactor
(Figure: power density in watts/cm² (log scale, 1 to 1000) vs. process generation (1.5 μm down to 0.07 μm) for the i386, i486, Pentium, Pentium Pro, Pentium II, and Pentium III, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface)

Optimizing Energy
- High-performance general-purpose microprocessors (e.g., Pentiums): 10-100 W, 100-1000 MIPS = 0.01 MIPS/mW
- Energy-efficient general-purpose microprocessors (e.g., StrongARM): 0.5 W, 160 MIPS = 0.3 MIPS/mW
- Energy-efficient special-purpose processors (e.g., MPEG2): 100 MOPS/mW
(Figure: switching energy)

(Figure: MIMD: multiprocessors and multicomputer clusters)

Nine Computer Price Tiers (2000)
- Super server: costs more than $100,000
- "Mainframe": costs more than $1 million; an array of processors, disks, tapes, and comm ports
- $1: embeddables, e.g., greeting cards
- $10: wristwatch and wallet computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
- $100,000,000: national centers

What Is Computer Architecture?
(Figure: Application at the top, Physics at the bottom; but there are exceptions, e.g., the magnetic compass)
In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information-processing applications efficiently using available manufacturing technologies.

Abstraction Layers in Modern Systems (top to bottom)
- Application
- Algorithm
- Programming Language
- Operating System/Virtual Machine
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Circuits
- Devices
- Physics

The End of the Uniprocessor Era
The single biggest change in the history of computing systems

Conventional Wisdom (CW) in Computer Architecture
- Old CW: Power is free, transistors are expensive.
- New CW: "Power wall": power is expensive, transistors are free (we can put more on a chip than we can afford to turn on).
- Old CW: Instruction-level parallelism (ILP) can be increased sufficiently via compilers and innovation (out-of-order execution, speculation, VLIW, …).
- New CW: "ILP wall": the law of diminishing returns on more hardware for ILP.
- Old CW: Multiplies are slow, memory access is fast.
- New CW: "Memory wall": memory is slow, multiplies are fast (200 clock cycles to DRAM memory, 4 clocks for a multiply).
- Old CW: Uniprocessor performance doubles every 1.5 years.
- New CW: Power wall + ILP wall + Memory wall = Brick wall.
  - Uniprocessor performance now doubles every 5(?) years.
- → Sea change in chip design: multiple "cores" (2X processors per chip every ~2 years).
  - More, simpler processors are more power-efficient.

Uniprocessor Performance
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present
(From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006)

Problems with the Sea Change
- Algorithms, programming languages, compilers, operating systems, architectures, libraries, … are not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs/chip.
- Architectures are not ready for 1000 CPUs/chip.
- Unlike instruction-level parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of architects.
- The 4th edition of the textbook "Computer Architecture: A Quantitative Approach" explores the shift from instruction-level parallelism to thread-level parallelism / data-level parallelism.

Instruction Set Architecture: A Critical Interface
(Diagram: software / instruction set / hardware)
Properties of a good abstraction:
- Lasts through many generations (portability)
- Used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels

Instruction Set Architecture
"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." - Amdahl, Blaauw, and Brooks, 1964
- Organization of programmable storage
- Data types & data structures: encodings & representations
- Instruction formats
- Instruction (or operation code) set
- Modes of addressing and accessing data items and instructions
- Exceptional conditions

Example: MIPS
(Figure: registers r0, r1, …, r31; PC, lo, hi)
Programmable storage:
- 2^32 bytes of memory
- 31 × 32-bit GPRs (R0 = 0)
- 32 × 32-bit FP registers (paired for double precision)
- HI, LO, PC
Data types? Format? Addressing modes?
Arithmetic/logical: Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, …
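
The register conventions listed for MIPS can be illustrated with a small Python model. It is a sketch, not a MIPS simulator. The class and function names are hypothetical, registers hold unsigned 32-bit values, and only four of the listed instructions (Add, AddU, SLT, SLTU) are modeled. The model shows three behaviors:
- R0 always reads as 0.
- Add traps on signed overflow, while AddU silently wraps.
- SLT compares as signed, while SLTU compares as unsigned.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(word: int) -> int:
    """Interpret a 32-bit register value as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x8000_0000 else word


class OverflowTrap(Exception):
    """Stands in for the integer-overflow exception raised by MIPS add."""


class RegisterFile:
    """32 general-purpose registers; r0 is hardwired to zero."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, r: int) -> int:
        return 0 if r == 0 else self._regs[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:                       # writes to r0 are silently discarded
            self._regs[r] = value & MASK32


def addu(rf: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """rd = rs + rt modulo 2^32; never traps."""
    rf.write(rd, rf.read(rs) + rf.read(rt))


def add(rf: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """rd = rs + rt; traps on signed overflow and leaves rd unchanged."""
    result = to_signed(rf.read(rs)) + to_signed(rf.read(rt))
    if not -(1 << 31) <= result < (1 << 31):
        raise OverflowTrap(f"add r{rd}, r{rs}, r{rt}")
    rf.write(rd, result)


def slt(rf: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """rd = 1 if rs < rt as signed integers, else 0."""
    rf.write(rd, int(to_signed(rf.read(rs)) < to_signed(rf.read(rt))))


def sltu(rf: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """rd = 1 if rs < rt as unsigned integers, else 0."""
    rf.write(rd, int(rf.read(rs) < rf.read(rt)))


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(1, 0x7FFF_FFFF)   # largest positive 32-bit value
    rf.write(2, 1)
    addu(rf, 3, 1, 2)
    print(hex(rf.read(3)))     # 0x80000000
    try:
        add(rf, 4, 1, 2)
    except OverflowTrap:
        print("add trapped; r4 =", rf.read(4))   # add trapped; r4 = 0
```

With r1 = 0x7FFFFFFF and r2 = 1, `addu` produces 0x80000000, whereas `add` raises `OverflowTrap` and leaves the destination unchanged. With r5 = 0xFFFFFFFF, `slt` sees -1 < 1, while `sltu` sees 4294967295 > 1.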
