Monday, 15 August 2011

hadoop - Is InputSplit size or number of map tasks affected by the number of input files -

Would a job spawn a different number of map tasks if the input is a lot of little files (each around the HDFS block size or smaller) rather than a few big files holding the same data?

It depends on the InputFormat you use, because the InputFormat determines how input splits are computed, and hence the number of map tasks.

If you use the default TextInputFormat, each file will produce at least one split, and therefore at least one mapper per file. If these files are only a few KB each, every mapper does very little work, which introduces a lot of overhead in the Map/Reduce framework. That said, if you can guarantee these "small" files are close to the block size, it doesn't matter much.
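To make the per-file behavior concrete, here is a small back-of-the-envelope model (my own simplification, not Hadoop code) of how FileInputFormat-style splitting scales with file count. The real implementation also honors configurable min/max split sizes and a ~10% slop factor, which this sketch ignores:

```python
import math

def default_splits(file_sizes, block_size=128 * 1024 * 1024):
    """Simplified model of per-file splitting: every non-empty file
    yields at least one split, and larger files are chopped into
    roughly block-sized pieces."""
    splits = 0
    for size in file_sizes:
        if size > 0:
            splits += math.ceil(size / block_size)
    return splits

# 1000 tiny 4 KB files -> 1000 splits, i.e. 1000 mappers doing almost no work
print(default_splits([4 * 1024] * 1000))      # -> 1000
# the same ~4 MB of data in a single file -> just 1 split
print(default_splits([4 * 1024 * 1000]))      # -> 1
```

The point of the toy model: with one split per file as a floor, mapper count tracks file count for small files, not data volume.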

If you have no control over the files and they might be small, I would advise using a different InputFormat called CombineFileInputFormat, which combines several input files into the same split. The number of maps in that case depends on the overall amount of data, regardless of the number of files. An implementation can be found here.
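The combining behavior can be sketched the same way. The model below is my own simplification: it assumes small files are packed together until a split reaches a configured maximum size, so the split count tracks total data volume rather than file count (the real CombineFileInputFormat also takes node and rack locality into account when packing):

```python
import math

def combined_splits(file_sizes, max_split_size=128 * 1024 * 1024):
    """Simplified model of combine-style splitting: files are packed
    into splits of up to max_split_size bytes, so the number of splits
    depends on total input size, not on how many files there are."""
    total = sum(file_sizes)
    if total == 0:
        return 0
    return math.ceil(total / max_split_size)

# 10,000 files of 64 KB each = 640 MB total
sizes = [64 * 1024] * 10_000
print(combined_splits(sizes))  # -> 5 splits with a 128 MB target, not 10,000
```

Compare this with the one-split-per-file floor of the default InputFormat: the same 10,000 small files would otherwise spawn 10,000 mappers.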

hadoop mapreduce hdfs
