Monday, 15 June 2015

java - Heap error when using custom RecordReader with large file




I've written a custom file reader so that my input files are not split, because they are big gzipped files and I want my first mapper job to simply gunzip them. I followed the example in 'Hadoop: The Definitive Guide', but I get a heap error when trying to read the file into the BytesWritable. I believe this is because the byte array has a size of 85713669, but I'm not sure how to overcome the issue.
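A rough estimate supports that belief. It assumes the reader first copies the whole file into a byte array and then calls BytesWritable.set() on it, and that set() grows the BytesWritable's own buffer to 1.5x the requested size (the behaviour of BytesWritable.setSize() in Hadoop 2.x; check the source of your Hadoop version). Both arrays are alive during the copy. The class and method names below are hypothetical and only do the arithmetic.

```java
public class WholeFileMemoryEstimate {

    // File size reported in the question, in bytes.
    static final long FILE_BYTES = 85_713_669L;

    // Assumption: BytesWritable.setSize() grows its backing buffer to 1.5x the
    // requested size (Hadoop 2.x behaviour); verify against your Hadoop version.
    static long bytesWritableCapacity(long size) {
        return Math.min(Integer.MAX_VALUE, (3L * size) / 2L);
    }

    // Heap held by the two arrays alive while value.set() copies:
    // the temporary contents array plus the BytesWritable's own buffer.
    static long peakArrayBytes(long size) {
        return size + bytesWritableCapacity(size);
    }

    public static void main(String[] args) {
        long peak = peakArrayBytes(FILE_BYTES);
        System.out.printf("contents array:       %,d bytes%n", FILE_BYTES);
        System.out.printf("BytesWritable buffer: %,d bytes%n", bytesWritableCapacity(FILE_BYTES));
        System.out.printf("peak (approx.):       %.1f MiB%n", peak / (1024.0 * 1024.0));
    }
}
```

Under those assumptions the two arrays need about 204 MiB together, which is more than a task JVM running with the old default of -Xmx200m (mapred.child.java.opts) can hold.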

Here is my code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Emits each unsplit file as one record: NullWritable key, whole file as value.
public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void close() throws IOException {
        // nothing to close
    }

    @Override
    public NullWritable getCurrentKey() throws IOException, InterruptedException {
        return NullWritable.get();
    }

    @Override
    public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!processed) {
            // Buffers the entire file in memory at once.
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }
        return false;
    }
}
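Separately from the heap limit, the cast (int) fileSplit.getLength() in nextKeyValue() puts a hard ceiling on this approach: a Java byte array cannot hold more than Integer.MAX_VALUE elements, and a plain narrowing cast silently wraps larger lengths to negative numbers. The sketch below is a hypothetical illustration (the class and method names are not from the original) contrasting the plain cast with Math.toIntExact, which throws instead of wrapping.

```java
public class SplitLengthCheck {

    // Mirrors the expression in nextKeyValue(): (int) fileSplit.getLength().
    // The narrowing cast wraps around for lengths above Integer.MAX_VALUE.
    static int unsafeArrayLength(long splitLength) {
        return (int) splitLength;
    }

    // Math.toIntExact (Java 8+) throws ArithmeticException instead of wrapping,
    // so an oversized file fails with a clear error rather than a negative size.
    static int checkedArrayLength(long splitLength) {
        return Math.toIntExact(splitLength);
    }

    public static void main(String[] args) {
        long threeGb = 3_000_000_000L;
        System.out.println(unsafeArrayLength(threeGb)); // prints -1294967296
        try {
            checkedArrayLength(threeGb);
        } catch (ArithmeticException e) {
            System.out.println("too large for one byte[]: " + threeGb);
        }
    }
}
```

With the plain cast, new byte[-1294967296] would throw NegativeArraySizeException, which hides the real cause; failing early makes the size limit explicit.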

In general you cannot load a whole file into the memory of the Java VM. You should find a streaming solution for processing big files: read the data chunk by chunk and save the results without holding the whole data set in memory. This specific task, unzipping, is not well suited to MapReduce, since there is no logical partitioning of the data into records. Please also note that Hadoop handles gzip automatically: with the standard input formats, the input stream is already decompressed for you.
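A minimal sketch of the streaming approach, using only the JDK: it decompresses with java.util.zip.GZIPInputStream and copies through a fixed 64 KiB buffer, so memory use does not grow with the file size. The class name and the helper maybeDecompress are hypothetical; its ".gz" check only loosely imitates how Hadoop picks a codec from the file name (CompressionCodecFactory). In a real job the streams would come from FileSystem.open() and FileSystem.create() rather than the in-memory byte arrays used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingGunzip {

    // Fixed-size buffer: the copy holds at most 64 KiB regardless of input size.
    static final int CHUNK_SIZE = 64 * 1024;

    // Hypothetical helper: decompress only when the name ends in ".gz",
    // loosely mimicking extension-based codec selection.
    static InputStream maybeDecompress(String name, InputStream raw) throws IOException {
        return name.endsWith(".gz") ? new GZIPInputStream(raw, CHUNK_SIZE) : raw;
    }

    // Copies chunk by chunk; returns the number of bytes written to out.
    static long copyInChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzipped sample in memory in place of a real input file.
        byte[] original = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(gz)) {
            zip.write(original);
        }

        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InputStream in = maybeDecompress("sample.gz",
                new ByteArrayInputStream(gz.toByteArray()))) {
            long written = copyInChunks(in, restored);
            System.out.println("decompressed " + written + " bytes"); // prints "decompressed 14 bytes"
        }
    }
}
```

The buffer size is a trade-off: larger chunks mean fewer read() calls, but either way the memory bound is fixed rather than proportional to the file.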

