Chathura Herath's Blog

Monday, September 06, 2010

How to set up an Eclipse project and development environment for changing the Hadoop framework

Here are the three easy steps that I followed when I wanted to hack the Hadoop code for my research. I have seen this question asked and answered a few times, and I personally think the approach I took is worth blogging.

First of all you need to decide on the version that you want to hack on. I usually use the 0.20.2 tag, so I download and install the binary release of that version and follow the installation instructions posted for it. The idea is that you then check out the same tag from svn, set up an Eclipse project around it, build that project, and replace the jar in the binary installation with the newly built jar.

Step 1: Install - Download and install the stable version of Hadoop that you have chosen to develop on.
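For example, if you went with the 0.20.2 release, downloading and unpacking it might look roughly like the lines below. The mirror URL and archive name here are assumptions for illustration; grab whichever release and mirror you actually chose.

    wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
    tar -xzf hadoop-0.20.2.tar.gz
    cd hadoop-0.20.2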

Step 2: Setup the Hadoop Eclipse project - To set up the Hadoop Eclipse project you need Eclipse with an svn plug-in. Add an svn repository using the URL https://svn.apache.org/repos/asf/hadoop/common/. Once the repository is added, go into the tags in the repository tree, find the tag matching the version you installed in step 1, and check it out using Eclipse. You will now have an Eclipse project with a lot of errors.

The first step in getting the project to compile is to run the ant jar target in the build script. This will pull in all the dependency jars, which end up in the lib and other folders. You can try to add the jars one by one, but I would rather suggest using a file explorer like Konqueror, searching for all the jar files under the project root folder, and copying all of them into a folder named libn. You might have to manually download ant.jar and add it to libn as well. Then add all of these jar files to the Eclipse classpath. Once this is done you are almost all set, except for selecting the src folders that you want compiled. Most of the exciting stuff in Hadoop happens in the src/mapreduce, src/core and src/hdfs source trees, so for a start I would only add these as source directories. By now the Eclipse project should compile without errors. A command-line sketch of the same checkout and jar-collection dance follows below.
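If you prefer to do the checkout, the ant build and the jar collection from the command line before importing into Eclipse, it could look something like this. The exact tag name under tags/ is an assumption; browse the repository to find the one that matches your installed release. The find command just mirrors what the Konqueror search does.

    svn checkout https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.2/ hadoop-0.20.2-src
    cd hadoop-0.20.2-src
    ant jar
    mkdir libn
    find . -name "*.jar" -exec cp {} libn/ \;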

Step 3: Deployment - Once the Eclipse project is set up and you are done with your hacking and have changed, say, the Hadoop core, you can run the jar target of build.xml to build the project. This generates a jar file inside the build directory named hadoop-0.xx.x-dev-core.jar. Now go to the installation directory of Hadoop from step 1, delete the core jar there, and replace it with the new jar you just built. Once you have done that you can restart Hadoop, and you will have your own version of the Hadoop installation running.
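Assuming, for the sake of illustration, that the binary installation from step 1 lives in ~/hadoop-0.20.2 and the source checkout is in hadoop-0.20.2-src (both paths are assumptions, and the dev jar name will carry whatever version ant stamps on it, so check what actually lands in build/), the swap and restart might look like:

    cd hadoop-0.20.2-src
    ant jar
    rm ~/hadoop-0.20.2/hadoop-*-core.jar
    cp build/hadoop-*-dev-core.jar ~/hadoop-0.20.2/
    ~/hadoop-0.20.2/bin/stop-all.sh
    ~/hadoop-0.20.2/bin/start-all.sh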

Happy Hadoop hacking!!!
