Document summary
Data volumes are ever growing, from traditional applications such as databases and scientific computing to emerging applications like Web 2.0 and online social networks. This growth has driven intensive research on scalable data-intensive systems, including MapReduce and Dryad. Among those systems, Hadoop, an open-source MapReduce implementation, is widely adopted by companies such as Facebook and Google, and by academia. Recently, MapReduce has been deployed in the cloud as software-as-a-service. Due to its wide adoption, the performance of Hadoop in particular (and MapReduce in general) has received much attention in systems research. Meanwhile, virtual machines (VMs) have become increasingly important for supporting efficient and flexible resource provisioning. By means of this technique, cloud computing gives users the ability to perform elastic computation using large pools of VMs, without the burden of owning or maintaining physical infrastructure. To this end, when building large-scale data-intensive systems (data-intensive cloud computing), developers need to understand the principles of designing large systems in order to obtain performance guarantees, load balancing, and fair charging for resource use. Performance in data-intensive cloud computing depends on many factors, including data locality, application types, and the underlying cloud infrastructure, which is mainly VM-based.

First, a novel replica-aware map execution scheme named Maestro is presented to overcome non-local map execution in MapReduce systems. In Maestro, map tasks are scheduled in two phases. The first, first-wave scheduling, schedules map tasks when the job initializes to fill all the empty slots. The second, the run-time scheduler, schedules map tasks according to data locality, node availability, and block weight, which is the probability of the best replication to schedule the task. Interestingly, Maestro not only achieves higher locality in MapReduce-like systems, but also reduces unnecessary map task speculation and balances the intermediate data distribution before the shuffle phase.

The existing MapReduce system overlooks the data skew problem that occurs when there is significant variance in both the frequencies of intermediate keys and their distribution among the data nodes, referred to as partitioning skew. Experimental results with Hadoop demonstrate that, in the presence of partitioning skew, applications suffer performance degradation due to long data transfers during the shuffle phase along with computation skew, particularly in the reduce phase. To address this problem, a novel algorithm for locality-aware and fairness-aware key partitioning in MapReduce, referred to as LEEN, is developed. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. LEEN not only achieves higher locality and reduces the amount of shuffled data, but also guarantees a fair distribution of the reduce inputs.

In the cloud, the computing unit is virtual machine (VM) based; therefore, it is important to demonstrate the applicability of data-intensive computing on a virtualized data center. Although virtualization brings many benefits, such as resource utilization and isolation, VM interference makes performance predictability and system throughput a challenging problem in large-scale virtualized environments. To this end, a quantitative analysis of the impact of interference on system fairness is presented. Because the cloud is an economics-based distributed system, the concept of pricing fairness is adopted from microeconomics. The analysis shows that the current pay-as-you-go scheme is neither personally nor socially fair. Accordingly, to solve the unfairness caused by interference, a new pricing scheme (pay-as-you-consume) is proposed. In the pay-as-you-consume pricing scheme, users are charged according to their effective resource consumption, excluding interference. The key idea behind the scheme is a machine-learning-based prediction model of the relative cost of interference. Preliminary experimental results with Xen demonstrate the accuracy of the prediction model and the fairness of the pay-as-you-consume pricing scheme.

The introduction of virtualization in Hadoop clusters poses new challenges due to the architectural design of the hypervisor. A series of experiments is conducted to measure and analyze the performance of Hadoop on VMs in terms of Hadoop Distributed File System (HDFS) throughput, performance variation with different VM consolidation and configuration, and task speculation. As a result, this dissertation outlines several issues that need to be considered when implementing MapReduce to fit completely on virtual machines, such as decoupling the storage system (HDFS) from the computation unit (VMs). Later, a novel MapReduce framework that runs on virtual machines, called Cloudlet, is proposed.

Virtualization interference results from intertwined factors, including the application's type, the number of concurrent VMs, and the VM scheduling algorithms used within the host. Further studies revealed that selecting the appropriate disk I/O scheduler pairs can significantly affect application performance. Furthermore, a typical Hadoop application consists of different interleaving stages, each requiring different I/O workloads and patterns. As a result, a single disk scheduler pair is sub-optimal not only across different MapReduce applications, but also across the different sub-phases of a single job. Accordingly, a novel approach for adaptively tuning the disk scheduler pairs in both the hypervisor and the virtual machines during the execution of a single MapReduce job is proposed. Experimental results show that MapReduce performance can be significantly improved; specifically, adaptive tuning of disk scheduler pairs achieves a 25% performance improvement on a sort benchmark with Hadoop.

Keywords: Cloud computing; Virtualization; MapReduce; Hadoop; Replica-a
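
The pay-as-you-consume idea summarized in the abstract can be illustrated with a small arithmetic sketch. The abstract does not give the charging formula, so everything here is an assumption made for illustration: the function names, the flat hourly rate, and the reading of "relative cost of interference" as a fractional slowdown r, so that measured time = interference-free time × (1 + r). In the dissertation the value of r comes from a machine-learning prediction model; in this sketch it is simply passed in as a number.

```python
def pay_as_you_go(rate_per_hour, measured_hours):
    """Conventional charge: bill for the wall-clock time the VM was held."""
    return rate_per_hour * measured_hours


def pay_as_you_consume(rate_per_hour, measured_hours, predicted_relative_cost):
    """Hypothetical charge that excludes predicted interference overhead.

    predicted_relative_cost is ASSUMED to be a fractional slowdown r >= 0,
    i.e. measured_hours = interference_free_hours * (1 + r). The abstract
    does not define the quantity this precisely.
    """
    if predicted_relative_cost < 0:
        raise ValueError("predicted_relative_cost must be non-negative")
    # Remove the share of the measured time attributed to interference.
    effective_hours = measured_hours / (1.0 + predicted_relative_cost)
    return rate_per_hour * effective_hours


if __name__ == "__main__":
    # A job held a VM for 12 hours; the (assumed) model predicts 20% slowdown.
    print(pay_as_you_go(0.10, 12.0))
    print(pay_as_you_consume(0.10, 12.0, 0.2))
```

Under these assumptions, a user whose job was slowed by co-located VMs pays for roughly the time the job would have taken alone, and the two schemes coincide when the predicted interference is zero.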