[Hadoop] Building the wordcount Jar in IntelliJ IDEA

In this post, I would like to share how to build the jar file so that we can test our program on a distributed cluster.

I am using Hadoop 2.3 (CDH 5.0.0), but this program can also be used with Hadoop 2.4. There is very little material on the Internet about using IDEA to write Hadoop programs. I tried several different approaches before I finally found a way to run the program successfully.

First, create a project and add the wordcount example code. Here is the program I used in this example.

Of course, we need to add the libraries. In this example, we only need 3 jars.
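
As a minimal sketch of the logic such a program implements, the map and reduce steps of word count can be written in plain Java as shown here. This is not the listing from this post: the class name WordCountSketch and its methods are hypothetical, and it deliberately avoids Hadoop classes so that it runs on a plain JDK without the cluster libraries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, dependency-free sketch of the word count logic.
// It is NOT the post's program and uses no Hadoop classes.
class WordCountSketch {

    // "Map" step: split one line on whitespace and emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step over every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello idea");
        // Print one "word<TAB>count" line per distinct word, sorted by word.
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real Hadoop job the same two steps would live in Mapper and Reducer subclasses, and the framework would handle grouping the pairs by word between them.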

Libraries

Then we need to add a new artifact. Click File -> Project Structure and select Artifacts on the left. Click the Add button -> Jar -> From modules with dependencies. Choose the module you created. If you specify the Main class here, you don’t need to add its class name in the command that runs the job later. Click OK to save the settings.

Artifacts setting

Now click Build -> Build Artifacts. Select wordcount.jar and click Build.
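
If you are not sure whether a Main class ended up in the jar, one way to check is to read the Main-Class attribute from the jar's manifest. The following is only a sketch: the class name MainClassCheck is hypothetical, and the default path wordcount.jar assumes the jar sits in the current directory.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports the Main-Class recorded in a jar's manifest.
class MainClassCheck {

    // Returns the Main-Class attribute, or null if the manifest is missing
    // or does not name a Main class.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the jar path is given as the first argument,
        // falling back to wordcount.jar in the current directory.
        String jarPath = args.length > 0 ? args[0] : "wordcount.jar";
        try (JarFile jar = new JarFile(jarPath)) {
            String mainClass = mainClassOf(jar.getManifest());
            if (mainClass == null) {
                System.out.println("No Main-Class in manifest: pass the class name on the command line.");
            } else {
                System.out.println("Main-Class: " + mainClass);
            }
        }
    }
}
```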

Now we have our jar file. We can use the following command to run the MapReduce program, in which input is the input path and output is the output path.

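Be careful here if you specified the Main class in the artifact settings. In that case, leave the class name out of the command; otherwise Hadoop passes it to the program as its first argument, ahead of the input and output paths.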
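
To make the two cases concrete, here is a small sketch that assembles the command line both ways. The general form of the Hadoop launcher is: hadoop jar <jar> [mainClass] args... The class name WordCount and the helper HadoopJarCommand are assumptions for illustration only; substitute the name of your own Main class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles a "hadoop jar" command line.
class HadoopJarCommand {

    // mainClass may be null or empty when the jar's manifest already names the Main class.
    static List<String> build(String jar, String mainClass, String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jar);
        if (mainClass != null && !mainClass.isEmpty()) {
            // Only needed when the manifest has no Main-Class entry.
            cmd.add(mainClass);
        }
        cmd.add(input);
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        // Main class set in the artifact: no class name on the command line.
        System.out.println(String.join(" ", build("wordcount.jar", null, "input", "output")));
        // Main class not set: the class name (assumed here to be WordCount) follows the jar.
        System.out.println(String.join(" ", build("wordcount.jar", "WordCount", "input", "output")));
    }
}
```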

After the job is done, you can check the result in the output path.