Scale-Right Provisioning Architectures for Next Generation Datacenters
Overview
Current datacenter deployments bind servers tightly to the applications running on them. This one-to-one binding has been adopted mainly to keep servers and applications manageable and to enforce isolation and security. However, it has resulted in many inefficiencies including, but not limited to: (i) provisioning for peak loads, and sometimes for multiple peak loads, resulting in low average resource utilization; (ii) a lack of autonomic features, such as the inability to self-optimize when runtime workloads vary, to self-configure when resources are added or removed, and to self-heal when failures happen; and (iii) ad-hoc datacenter manageability that adds considerable overhead to the datacenter Total Cost of Ownership (TCO). It is projected that 70 cents of each dollar is spent on managing the datacenter, with the majority of that going towards labor. This research is motivated by these and many other datacenter inefficiencies, and addresses them as follows: (i) understand the workloads' resource requirements, including compute, memory, network and storage, as well as their constraints such as Service-Level Agreements (SLAs); (ii) map each workload to the best set of physical or virtual resources to run it, where "best" refers not only to performance and guaranteed SLAs, but also to minimal power envelopes and no thermal hotspots; and (iii) continuously monitor the runtime resource requirements of the workloads, and scale resources up or down accordingly while keeping the above goals in mind.
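Step (iii) above, monitoring demand and scaling resources up or down, can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the notion of a fixed-capacity "resource unit", and the 25% headroom are all assumptions made here for illustration only.

```python
import math


def units_needed(demand: float, unit_capacity: float, headroom: float = 0.25) -> int:
    """Smallest number of resource units that covers demand plus headroom.

    `unit_capacity` and `headroom` are hypothetical parameters: the capacity
    of one virtual resource unit and the spare fraction kept for bursts.
    """
    if unit_capacity <= 0:
        raise ValueError("unit_capacity must be positive")
    if demand < 0:
        raise ValueError("demand must be non-negative")
    # Always keep at least one unit so the workload can keep running.
    return max(1, math.ceil(demand * (1 + headroom) / unit_capacity))


def scale_delta(current_units: int, demand: float, unit_capacity: float,
                headroom: float = 0.25) -> int:
    """Signed change in units: positive means scale up, negative scale down."""
    return units_needed(demand, unit_capacity, headroom) - current_units
```

A monitoring loop would call `scale_delta` with the latest measured demand, instead of holding a fixed allocation sized for the peak.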
Project Goals
A Scale-Right Provisioning Architecture (SRPA) is intended to optimize datacenter deployments and operations. Such an architecture needs to combine a broad scope of technologies including, but not limited to, autonomic computing, virtualization, provisioning, workload monitoring and prediction, and workload-driven platform architectures. Its basic ideas include: (i) decompose the physical resources of a datacenter into a pool of virtual compute/memory, network and storage devices (explained in detail in section 1.2.2); (ii) build a comprehensive hierarchical management infrastructure (explained in detail in section 1.2.1) that provides continuous monitoring (e.g., performance, health and events), accepts workloads and their requirements, and supports global scheduling, policy-based optimization and a user interface; (iii) select virtual resources based on workload requirements, assemble them and launch them as a Virtual Machine (VM) to run the workload; and (iv) scale virtual resources up or down as runtime demand changes. The figure below shows a high-level view of an SRPA.
Autonomic Performance/Power Optimizations: Most early work on server power management has focused on specific components such as the processor, has used heuristics to address base power consumption in server clusters, or has ignored thermal ramifications. This motivates us to adopt a holistic approach to system-level power management within an enclosure, in which we exploit the interactions and dependencies among the different devices that constitute a whole computing system. The synergy between system components (processor and memory) has been clearly presented in previous work that addresses base power consumption for web servers by using a power-shifting technique, which dynamically distributes and maintains the power budget among components using workload-sensitive policies. In our approach, we plan to optimize performance/watt for the platform through power and thermal management.
We apply this technique at different levels of the hierarchy, from the enclosure to the platform to the component. This holistic, multi-variable, mathematically rigorous optimization approach to determining the optimal performance/watt lends itself best to achieving the desired goals.
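As a rough illustration of distributing a power budget down the enclosure, platform and component hierarchy, the sketch below splits a budget in proportion to each node's demand. The proportional rule, the function names and the nested-dictionary layout are assumptions chosen for brevity; they stand in for, and do not reproduce, the performance/watt optimization described above.

```python
def split_budget(budget_watts, demands):
    """Split a power budget across children in proportion to their demand.

    `demands` maps a child name to a non-negative demand weight
    (a hypothetical workload-sensitive metric, e.g. recent utilization).
    """
    if not demands:
        return {}
    total = sum(demands.values())
    if total <= 0:
        # No demand signal: fall back to an even split.
        share = budget_watts / len(demands)
        return {name: share for name in demands}
    return {name: budget_watts * d / total for name, d in demands.items()}


def total_demand(node):
    """Demand of a subtree: a leaf is a number, an inner node is a dict."""
    if isinstance(node, dict):
        return sum(total_demand(child) for child in node.values())
    return float(node)


def split_hierarchy(budget_watts, tree):
    """Apply split_budget recursively, one hierarchy level at a time."""
    shares = split_budget(
        budget_watts, {name: total_demand(child) for name, child in tree.items()}
    )
    return {
        name: split_hierarchy(shares[name], child) if isinstance(child, dict)
        else shares[name]
        for name, child in tree.items()
    }
```

For example, calling `split_hierarchy(200, {"platform_a": {"cpu": 3, "mem": 1}, "platform_b": {"cpu": 4}})` gives each platform 100 W, and then divides platform_a's share 75 W to 25 W between its processor and memory.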
Broader Impacts
The success of the proposed research will significantly reduce the TCO of datacenters by reducing both their capital (Capex) and operational (Opex) expenses. Opex will be reduced by improving resource utilization, minimizing power consumption, reducing cooling cost and reducing human touch as a result of the integrated autonomic capabilities. Capex, on the other hand, will be reduced because higher resource utilization means fewer resources need to be purchased (e.g., servers, network switches, routers and storage). Furthermore, implementing the various policy-based optimizations, such as autonomic power and performance management, will significantly reduce the overall electricity consumption of datacenters and reduce the vulnerability of Internet services to energy supply disruptions (e.g., wholesale electricity recently spiked to $800 per MWh on the California spot market). As an additional side effect, such performance-power optimizations will also have a significant impact on the environment by reducing the amount of CO2 released. Another benefit is that businesses can spend the money saved on datacenter operations elsewhere, for example on new initiatives and businesses, research into new technologies, or other causes.
Phone Number: (520) 621-9915 Room 251, ECE Dept. 1230 E. Speedway Tucson, AZ 85721-0104 ACL - © Copyright 2009, Webmaster: Youssif Al-Nashif All Rights Reserved