Post by Admin on Mar 14, 2014 6:44:08 GMT
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks in the input files. Generally it is around 10-100 maps per node. Task setup takes a while, so it is best if each map takes at least a minute to execute. For example, if you expect 10 TB of input data with a block size of 128 MB, you'll end up with about 82,000 maps (10 TB / 128 MB ≈ 81,920 blocks). To control the number of maps you can use the mapreduce.job.maps parameter, which only provides a hint to the framework. Ultimately, the number of map tasks is determined by the number of splits returned by the InputFormat.getSplits() method (which you can override).
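
As a minimal sketch of passing that hint when setting up a job (the job name below is just a placeholder), it might look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Suggest a map count; this is only a hint -- the actual number of map
// tasks comes from the splits computed by InputFormat.getSplits().
conf.setInt("mapreduce.job.maps", 100);
Job job = Job.getInstance(conf, "example job"); // hypothetical job name

If you need to actually change how many splits are produced, you would adjust the block size or override getSplits() in a custom InputFormat rather than rely on this parameter.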