Energy efficient resource allocation in data centers

Bibliographic Details
Main Author: Dai, Xiangming
Format: Thesis
Language: English
Published: 2016
Subjects:
Online Access: http://repository.ust.hk/ir/Record/1783.1-86634
https://doi.org/10.14711/thesis-b1610740
http://repository.ust.hk/ir/bitstream/1783.1-86634/1/th_redirect.html
Description
Summary: Recent years have witnessed a tremendous increase in the popularity of cloud computing services to support business, communication, and online customer service, and to help make life more productive and efficient. Naturally, this has been accompanied by a constant expansion of data centers' scale and global geographical reach, resulting in a dramatic growth of the energy consumed to power them. Several studies have shown that the energy consumed today by data centers in the US alone is roughly equivalent to the annual output of 34 large (500-megawatt) coal-fired power plants, and this consumption is forecast to double in less than 10 years. This not only costs data center providers billions of dollars in energy bills, but also generates hundreds of millions of tonnes of carbon pollution per year. Energy consumption in data centers comes from several sources: i) computing and networking equipment, which take the lion's share, ii) cooling equipment, and iii) power delivery and other ancillary equipment. Any reduction of this consumption is seen as such a boon that, to cut cooling costs for example, heavy data center users/providers such as Facebook and Google have built data centers in areas as far-flung as the Arctic Circle, while others, like Microsoft, are considering undersea data centers.

In this thesis, we consider several important resource allocation problems in data centers while optimizing the energy consumed by computing and networking equipment. The thesis consists of three parts. The first falls within the area of the platform-as-a-service (PaaS) cloud service model and deals with job scheduling in the MapReduce massive-data parallel-processing framework. In this part, we treat energy efficiency as a by-product of minimizing the makespan of jobs. More specifically, we first propose a new scheduling algorithm, the Multiple Queue Scheduler (MQS), which improves the data locality rate of map tasks as a means to curb costly data-migration delays. Thereafter, to account for intricate details of the MapReduce framework, such as the early-shuffle problem, we propose the Dynamic Priority Multiple Queue Scheduler (DPMQS), which dynamically increases the priority of jobs that are close to completing their map phase so that their reduce phase can start sooner, further reducing the expected job holding time and the makespan. We implemented both algorithms in Hadoop and compared their performance against existing algorithms.

The second part falls within the realm of infrastructure-as-a-service (IaaS) and deals with energy-efficient virtual machine (VM) scheduling in data centers. We formulate the minimum energy VM scheduling problem as a non-convex optimization problem, prove its NP-hardness, and then propose two greedy approximation algorithms, the minimum energy VM scheduling algorithm (MinES) and the minimum communication VM scheduling algorithm (MinCS), to reduce energy consumption while satisfying the tenants' service-level agreements.
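
As a rough, hypothetical illustration of the greedy consolidation idea underlying such energy-aware VM scheduling (and not the MinES or MinCS algorithm from the thesis), the following Python sketch packs VM demands onto as few active servers as possible under a simplified power model in which every powered-on server draws a fixed idle power plus a term proportional to its load; all names, capacities, and power figures here are assumptions.

```python
# Illustrative only: a simplified greedy, energy-aware VM placement.
# Not the thesis's MinES/MinCS; power model and parameters are assumed.

from dataclasses import dataclass, field

@dataclass
class Server:
    capacity: float              # normalized CPU capacity
    idle_power: float            # power drawn just for being switched on
    dynamic_power: float         # extra power per unit of utilization
    load: float = 0.0
    vms: list = field(default_factory=list)

def place_vms_greedy(vms: list[float], servers: list[Server]) -> float:
    """Place VM demands (largest first) on the active server with the smallest
    energy increase, powering on a new server only when necessary.
    Returns the total power of the resulting placement."""
    active: list[Server] = []
    idle = sorted(servers, key=lambda s: s.idle_power)
    for demand in sorted(vms, reverse=True):
        # Cost of adding the VM to an already-active server: only dynamic power grows.
        candidates = [(s.dynamic_power * demand, s)
                      for s in active if s.load + demand <= s.capacity]
        if candidates:
            _, best = min(candidates, key=lambda c: c[0])
        elif idle:
            best = idle.pop(0)   # pay the idle power of a newly switched-on server
            active.append(best)
        else:
            raise RuntimeError("demand exceeds total capacity")
        best.load += demand
        best.vms.append(demand)
    return sum(s.idle_power + s.dynamic_power * s.load for s in active)

# Example: three identical servers, four VM demands; two servers suffice.
servers = [Server(capacity=1.0, idle_power=100.0, dynamic_power=50.0) for _ in range(3)]
print(place_vms_greedy([0.5, 0.4, 0.3, 0.2], servers))  # 270.0
```

Consolidating load onto fewer active servers is what shrinks the idle-power term, which is typically the dominant one in such models.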

Finally, in the third part, still under the IaaS service model, we explore the potential for cloud providers to support services with more intricate network topologies than is currently practised. In particular, we consider the problem of embedding virtual clusters specified by the tenants into a data center in an energy-efficient manner. We carefully construct a mathematical optimization model of this problem, prove its NP-hardness, and then propose an approximation algorithm, the minimum energy virtual cluster embedding (MinE-VCE) algorithm, to solve it. We tested all algorithms proposed in the latter two parts on real data traces as well as synthetic workloads to demonstrate their performance.
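
As a similarly rough illustration of the virtual cluster embedding problem (using the common hose-model abstraction of a cluster request <N, B>, which may differ from the model used in the thesis), the sketch below greedily places a request's VMs into as few racks as possible while keeping each rack's uplink traffic feasible; the topology, capacities, and greedy rule are assumptions for illustration, not the MinE-VCE algorithm.

```python
# Illustrative only: embedding a hose-model virtual cluster <N, B> -- N VMs,
# each with bandwidth B to a virtual switch -- into racks of servers so that
# as few racks/servers as possible are used. Topology and rule are assumed.

from dataclasses import dataclass

@dataclass
class Rack:
    servers: int           # number of servers in the rack
    slots_per_server: int  # VM slots per server
    uplink: float          # capacity of the rack's uplink (same units as B)

def embed_cluster(n_vms: int, bw: float, racks: list[Rack]) -> list[int]:
    """Return the number of VMs placed in each rack, packing racks as fully as
    possible while keeping each rack's uplink traffic min(m, N - m) * B feasible."""
    placement = [0] * len(racks)
    remaining = n_vms
    # Prefer roomier racks so the cluster spans as few racks as possible.
    order = sorted(range(len(racks)),
                   key=lambda i: racks[i].servers * racks[i].slots_per_server,
                   reverse=True)
    for i in order:
        if remaining == 0:
            break
        slots = racks[i].servers * racks[i].slots_per_server
        m = min(remaining, slots)
        # Shrink m until the hose-model uplink constraint holds for this rack.
        while m > 0 and min(m, n_vms - m) * bw > racks[i].uplink:
            m -= 1
        placement[i] = m
        remaining -= m
    if remaining > 0:
        raise RuntimeError("request cannot be embedded in this topology")
    return placement

# Example: a <6 VMs, 1 Gbps> cluster over two racks with 4 slots each and 2 Gbps uplinks.
racks = [Rack(servers=2, slots_per_server=2, uplink=2.0),
         Rack(servers=2, slots_per_server=2, uplink=2.0)]
print(embed_cluster(6, 1.0, racks))  # [4, 2]
```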