Sunday, 13 December 2015

How did you debug your Hadoop code?

There are several ways of doing this, but the most common are:
 - Using counters (see the sketch after this list).
 - Using the web interface provided by the Hadoop framework.
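A minimal sketch of the counter approach, using the new MapReduce API; the enum and field names here are illustrative, not from the original answer. Custom counters are aggregated across all tasks and reported in the job's web UI and the client-side job summary, which makes them a cheap way to spot bad records without attaching a debugger.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Custom counters: totals are aggregated across all map tasks and
    // reported in the web UI and the client console when the job finishes.
    enum ParseErrors { EMPTY_RECORD, MALFORMED_RECORD }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.isEmpty()) {
            context.getCounter(ParseErrors.EMPTY_RECORD).increment(1);
            return;
        }
        String[] fields = line.split(",");
        if (fields.length < 2) {
            // Count the bad record instead of failing the task, so the
            // totals can be inspected after the job completes.
            context.getCounter(ParseErrors.MALFORMED_RECORD).increment(1);
            return;
        }
        context.write(new Text(fields[0]), new LongWritable(1));
    }
}
```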

How will you write a custom partitioner for a Hadoop job?

To have Hadoop use a custom partitioner you will have to do, at minimum, the following three things (see the sketch after this list):
 - Create a new class that extends the Partitioner class.
 - Override the getPartition method.
 - In the wrapper that runs the MapReduce job, either add the custom partitioner to the job programmatically using the setPartitionerClass method, or add it to the job as a configuration setting (if your wrapper reads its settings from a config file or Oozie).
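A minimal sketch of such a partitioner, again using the new MapReduce API; the class name and the first-letter routing logic are illustrative assumptions.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

    // Route each key to a reducer based on its first letter, so that keys
    // sharing a first letter always end up in the same output partition.
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        int first = Character.toLowerCase(key.toString().charAt(0));
        return first % numPartitions;
    }
}
```

In the wrapper, the programmatic route is then a single call: job.setPartitionerClass(FirstLetterPartitioner.class).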

How can you set an arbitrary number of reducers to be created for a job in Hadoop?

You can either do it programmatically, by calling the setNumReduceTasks method of the JobConf (or Job) class, or set it up as a configuration setting, as sketched below.
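A minimal sketch of both routes, assuming the new MapReduce API; mapreduce.job.reduces is the current name of the older mapred.reduce.tasks property.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Configuration route: equivalent to passing
        //   -D mapreduce.job.reduces=10
        // on the command line.
        conf.setInt("mapreduce.job.reduces", 10);

        Job job = Job.getInstance(conf, "reducer count example");
        // Programmatic route: ask for exactly ten reduce tasks.
        job.setNumReduceTasks(10);
    }
}
```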

How can you set an arbitrary number of mappers to be created for a job in Hadoop?

You cannot set it directly. The number of mappers is determined by the number of input splits, which in turn depends on the input data and the InputFormat being used; you can only influence it indirectly, for example by tuning the split size, as sketched below.
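A minimal sketch of influencing (not fixing) the mapper count by bounding the split size; the 32 MB and 64 MB values are arbitrary illustrations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split size example");

        // Smaller maximum split size -> more splits -> more map tasks;
        // larger minimum split size -> fewer splits -> fewer map tasks.
        FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024); // 32 MB
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024); // 64 MB
    }
}
```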

What will a Hadoop job do if you try to run it with an output directory that is already present? Will it
 - Overwrite it
 - Warn you and continue
 - Throw an exception and exit

The Hadoop job will throw an exception (a FileAlreadyExistsException) and exit.
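Because of this, a common pattern is to delete the output directory from the driver before submitting the job. A minimal sketch, assuming the output path arrives as the second command-line argument:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputDirCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path outputDir = new Path(args[1]); // assumed: output path is the second argument

        // The output format rejects an existing output directory when the job
        // is submitted, so remove it (recursively) beforehand.
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputDir)) {
            fs.delete(outputDir, true);
        }
    }
}
```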

Is it possible to have Hadoop job output in multiple directories? If yes, how?

Yes, by using the MultipleOutputs class, as sketched below.
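A minimal sketch of a reducer that routes records into subdirectories of the job's output directory with MultipleOutputs; the class name and the digit-vs-letter routing are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class RoutingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // The base output path is resolved relative to the job's output
        // directory, so these records land under numeric/ or words/.
        String k = key.toString();
        String base = (!k.isEmpty() && Character.isDigit(k.charAt(0)))
                ? "numeric/part" : "words/part";
        mos.write(key, new IntWritable(sum), base);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush and close the extra outputs
    }
}
```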

Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to the Hadoop job?

Yes, the input format class provides methods to add multiple directories as input to a Hadoop job: FileInputFormat.addInputPath can be called once per directory, and MultipleInputs additionally lets each path have its own input format and mapper. A sketch follows.
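A minimal sketch of adding several input directories in the driver; the paths are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultipleInputDirsExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple input dirs");

        // Each call appends another directory to the job's list of input paths.
        FileInputFormat.addInputPath(job, new Path("/data/2015/11"));
        FileInputFormat.addInputPath(job, new Path("/data/2015/12"));

        // Or pass several comma-separated paths in one call:
        // FileInputFormat.addInputPaths(job, "/data/2015/11,/data/2015/12");
    }
}
```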