Microsoft researchers explore 'job-centric' cloud model

Model would let tenants buy cloud services based on jobs and required completion time instead of resources, which could benefit both tenants and providers

Elasticity — the ability to ramp computing resources up or down depending on need — is one of the key benefits of cloud computing. Not being shackled to their own in-house hardware means organisations can dial up the resources they need to crunch big data sets, run their Web presence through spikes and troughs in demand, and process periodic jobs, all without maintaining the internal capacity necessary to cope with peak demand.

But although the IT department may be less likely to have to run out and buy a new server or two, it will still have to make decisions about purchasing resources from a cloud provider. Cloud computing does, of course, let an organisation change its mind and respond to shifts in demand, but, according to a group of Microsoft researchers, there may be an easier way to go about it, with benefits for both providers and 'cloud consumers'.


In a paper — presented at the ACM Symposium on Cloud Computing 2012 earlier this month and titled Bridging the Tenant-Provider Gap in Cloud Services (PDF) — the researchers make the case for allowing cloud tenants to purchase resources based on a "job-centric" model. A job-centric cloud would add another layer of abstraction to the cloud by having an interface that lets tenants specify performance and cost goals, instead of an interface that lets them more directly allocate resources.

The researchers — Virajith Jalaparti, Hitesh Ballani, Paolo Costa, Thomas Karagiannis and Ant Rowstron from the Microsoft Research lab in Cambridge, UK — make the case that not only would a job-centric interface make life easier for cloud tenants by removing the burden of translating their "high-level goals into the corresponding resource requirements", but it would add flexibility for cloud providers by allowing them to allocate which and how many resources to a job, as well as when to allocate these resources.

Cloud tenants are mostly concerned about predictable application completion times and costs. "However," the paper states, "with today's setup, tenants are responsible for mapping such high-level completion time goals down to specific resources needed for their jobs." Although this issue has been tackled, it's normally been from the point of view of working out the size and number of virtual machines, not dealing with shared provider infrastructure such as the network.

At its heart, from the providers' point of view, the appeal of a job-centric interface is that different combinations of resources can be used to complete the same job for a customer. An example the paper cites is a job that could be run over a small number of virtual machines with a lot of network bandwidth or a lot of virtual machines with a little bandwidth, "or somewhere in between".

To illustrate the idea, the researchers designed a cloud framework for MapReduce jobs, using the open source big data platform Hadoop, which they dubbed Bazaar. Bazaar allows tenants to specify MapReduce job constraints and then determines the best resource combination from the provider for completing the job within those constraints.

The framework translates the goals of the cloud users into a series of resource tuples (comprising a number of virtual machines and an amount of network bandwidth) in a predictive analytics phase, and then chooses a tuple to complete the job based on the state of the data centre at that point in time, with an eye to allowing "the cloud provider to accept more requests in the future, thus maximising its revenue."

The translation process involves analysing the details of a job to work out which resource tuples would meet the tenant's goals, followed by a second phase of selecting one of those tuples. The researchers note that the Bazaar model was intentionally limited: "Though our core ideas apply to general multi-resource settings, we begin by focusing on two specific resources, compute instances (N) and the network bandwidth (B) between them," the paper notes. They add that there are benefits to expanding the model to allow the allocation of idle resources, enabling earlier completion of tenants' jobs.
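The two-phase process described above might be sketched roughly as follows. To be clear, this is an illustrative toy, not code from the paper: the completion-time model, the resource limits and the function names are all assumptions made for the sake of the example.

```python
# Toy sketch of a two-phase, job-centric resource allocator in the
# spirit of Bazaar. All names and the completion-time model are
# illustrative assumptions, not taken from the paper.

def feasible_tuples(deadline_s, job_size, max_vms=64, max_bw=8):
    """Phase 1: translate a high-level goal (a completion deadline)
    into the candidate (N, B) resource tuples that could meet it."""
    candidates = []
    for n in range(1, max_vms + 1):          # N: number of compute instances
        for b in range(1, max_bw + 1):       # B: per-VM network bandwidth units
            # Toy model: compute time shrinks with N; data-shuffle
            # time shrinks with aggregate bandwidth N * B.
            est_time = job_size / n + job_size / (n * b)
            if est_time <= deadline_s:
                candidates.append((n, b))
    return candidates

def select_tuple(candidates):
    """Phase 2: pick one feasible tuple. A real provider would consult
    the current data-centre state; here we simply minimise the total
    resources consumed, leaving the most headroom for future requests."""
    return min(candidates, key=lambda nb: nb[0] * nb[1])

tuples = feasible_tuples(deadline_s=100, job_size=1000)
print(select_tuple(tuples))  # → (20, 1)
```

The point of the sketch is the interface: the tenant supplies only the deadline and job size, and the provider retains the freedom to choose among many (N, B) combinations that all satisfy the goal.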

Employing Bazaar, the researchers showed that the 'provider' used during their tests could accept 3-14 per cent more requests, as well as more resource-intensive requests, increasing the data centre's goodput — the throughput of usefully completed work — by 7-87 per cent.

The paper notes that a Bazaar-style model would also have implications for cloud pricing models: instead of charging based on the resources allocated to a client, a provider could move towards job-based pricing that takes account of things such as the amount of data to be processed and the desired completion time.
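Such a pricing scheme could take a very simple shape, along these lines. The formula, the rates and the function name below are purely illustrative assumptions, not figures from the paper; the point is only that the tenant pays a fixed, upfront price derived from the job's characteristics rather than from whatever resources the provider ends up using.

```python
# Hypothetical job-based pricing: price depends on data volume and
# deadline, not on the resources allocated. Rates are made up.

def job_price(data_gb, deadline_hours, base_rate=0.05, urgency_factor=2.0):
    """Price grows with data volume and with how tight the deadline is."""
    return data_gb * base_rate * (1 + urgency_factor / deadline_hours)

# The same 500 GB job costs more with a tighter deadline.
print(job_price(500, 10))  # 500 * 0.05 * 1.2 = 30.0
print(job_price(500, 24))
```

Under a scheme like this, the provider can quote a price before deciding how to run the job, which is precisely the flexibility the job-centric interface is meant to create.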

"A job-centric cloud, coupled with job-based pricing, can thus enable a symbiotic tenant provider relationship where tenants benefit due to fixed costs upfront, and better-than-desired performance [because providers can complete jobs quicker if there are idle data centre resources] while providers use the increased flexibility to improve goodput and, consequently, total revenue," the paper argues.

Rohan Pearce is the editor of Techworld Australia and Computerworld Australia.
