Tuesday, April 06, 2010

Reducing the debug cycle during Amazon Elastic Map Reduce development

The cost model that Amazon has published for Amazon Elastic Map Reduce is totally unfair during the development process. The minimum billing unit is an hour, and these hours quickly run up your bill. If you are doing anything serious with Amazon Elastic Map Reduce, that is, running something other than the word count example, and you choose to develop against the Hadoop in Amazon Elastic Map Reduce rather than install Hadoop yourself, you will end up making a lot of debug runs to get the configuration right. In each of these runs, if Hadoop is launched even for a minute, you are charged for an entire hour times the number of instances you launched. If you are using the Amazon Management Console in particular, you have to start a new Job Flow every time you change your application and want to test it. These costs add up quickly if you are not careful.
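To make the arithmetic concrete, here is a minimal sketch of the rounding described above, assuming every started hour is charged in full for every instance. The run lengths and the cluster size are made-up numbers for illustration, not measurements.

```python
import math

def billed_instance_hours(run_minutes, instance_count):
    """Instance-hours billed for one Job Flow when each started hour is charged in full."""
    return math.ceil(run_minutes / 60) * instance_count

# Hypothetical debug session: six short runs, each launched as its own Job Flow.
debug_runs_minutes = [1, 5, 3, 10, 2, 4]
instance_count = 4  # hypothetical cluster size

billed = sum(billed_instance_hours(m, instance_count) for m in debug_runs_minutes)
used = sum(debug_runs_minutes) * instance_count / 60

print(billed)          # 24 instance-hours billed
print(f"{used:.2f}")   # 1.67 instance-hours actually used
```

In this made-up session, 25 minutes of actual work on 4 instances is billed as 24 instance-hours.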

Tips to reduce the costs

Avoid using the extra large machines

During development, avoid using extra large instances: they cost much more, and because they have 8 cores you will be billed 8 normalized CPU hours as soon as the instance is launched.
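A rough sketch of how the normalized hours scale with instance size. The factor of 8 for extra large comes from the paragraph above; the factor of 1 for a small instance, the cluster size, and the number of runs are assumptions for illustration only.

```python
def normalized_hours(factor, instance_count, billed_hours):
    """Normalized CPU hours = size factor x number of instances x billed hours."""
    return factor * instance_count * billed_hours

EXTRA_LARGE_FACTOR = 8  # from the text above
SMALL_FACTOR = 1        # assumption: small instances as the baseline

instance_count = 4      # hypothetical cluster size
debug_runs = 10         # hypothetical: ten debug runs, each billed one hour

print(normalized_hours(EXTRA_LARGE_FACTOR, instance_count, debug_runs))  # 320
print(normalized_hours(SMALL_FACTOR, instance_count, debug_runs))        # 40
```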

Programmatically launch Job Flow with keep alive

I wrote an earlier blog post showing how to launch a Job Flow programmatically, and in it I showed how to keep the Job Flow and its instances alive after your map reduce application finishes. You can then simply add a Job Flow Step to the already running Job Flow. This reduces the debug cycle, because instance boot-up time no longer applies to subsequent Job Flow Steps. It also reduces cost: you can launch multiple map reduce runs as Job Flow Steps within an hour and still pay for only one hour of CPU per instance, because you are not shutting down the instances after each run.
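As a rough illustration of the idea (not the code from the earlier post), the sketch below builds the two requests as plain Python dictionaries. The field names follow the Elastic Map Reduce RunJobFlow and AddJobFlowSteps API; the bucket, jar path, job names, and instance type are hypothetical, and fields a current account would also need (release label, IAM roles) are left out to keep it short.

```python
def keep_alive_job_flow_request(name, instance_type, instance_count, log_uri):
    """Request for a Job Flow that stays up after its steps finish."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # The key setting: do not terminate when there are no steps left.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Steps": [],
    }

def debug_step(name, jar, args):
    """One map reduce run, added as a step to the running Job Flow."""
    return {
        "Name": name,
        # If the step fails, keep the instances up so the next attempt can reuse them.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }

# All names and paths below are hypothetical.
request = keep_alive_job_flow_request(
    "debug-flow", "m1.small", 2, "s3://my-bucket/logs/")
step = debug_step(
    "debug-run-1", "s3://my-bucket/jars/my-job.jar",
    ["s3://my-bucket/input/", "s3://my-bucket/output/run-1/"])

# With boto3 installed and credentials configured, these would be sent as:
#   emr = boto3.client("emr")
#   flow_id = emr.run_job_flow(**request)["JobFlowId"]
#   emr.add_job_flow_steps(JobFlowId=flow_id, Steps=[step])
print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
print(step["Name"])
```

Remember to terminate the Job Flow yourself when the debug session is over, since with keep alive it will not shut down on its own.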

Do not develop on Amazon Elastic Map Reduce

One option is to install Hadoop locally and test there before coming to Amazon, so you do not end up paying an hour's price for every few minutes of debug run.
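Even before a local Hadoop install, the map and reduce logic itself can be checked on a laptop. The sketch below is a plain Python stand-in for the map, shuffle, and reduce phases, not Hadoop; the function names and the sample data are hypothetical, and it only catches logic bugs, not configuration problems.

```python
from collections import defaultdict

def run_locally(records, mapper, reducer):
    """Run mapper and reducer in-process: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)

sample = ["Hadoop on Amazon", "hadoop locally"]
result = run_locally(sample, word_mapper, count_reducer)
print(result)  # {'amazon': 1, 'hadoop': 2, 'locally': 1, 'on': 1}
```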

Amazon should provide development instances billed per minute

The best scenario is for Amazon to either provide cheaper instances for development or bill per minute during development.
