Java - Hadoop TextInputFormat reads only one line per file
I wrote a simple map task for Hadoop 0.20.2. The input dataset consists of 44 files, each 3-5 MB. Each line of every file has the format int,int. The input format is the default TextInputFormat, and the mapper's job is to parse the input Text into integers.
After the task ran, the Hadoop framework's statistics showed that the number of map input records was 44. I tried to debug it and found that the only input records passed to the map method are the first line of each file.
Does anybody know what the problem is and where I can find a solution?
Thanks in advance.
Edit 1
The input data was generated by a different map-reduce task whose output format is TextOutputFormat<NullWritable, IntXInt>. The toString() method of IntXInt should give a string of the form int,int.
Edit 2
My mapper looks like the following:
    static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) {
            // Each input value is expected to be one line of the form int,int
            String[] s = value.toString().split(",");
            IntXInt x = new IntXInt(s[0], s[1]);
            output.collect(x.firstInt(), x.secondInt());
        }
    }
Edit 3
I have checked: the mapper reads only 1 line from each file, not the whole file as 1 Text value.
The InputFormat defines how to read data from a file into the Mapper instances. The default TextInputFormat reads lines of text files. The key it emits for each record is the byte offset of the line read (as a LongWritable), and the value is the contents of the line up to the terminating '\n' character (as a Text object). If you have multi-line records each separated by a $ character, you should write your own InputFormat that parses files into records split on that character instead.
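To make the record boundaries concrete, here is a minimal plain-Java sketch of the splitting behaviour described above. It is not Hadoop's implementation and uses no Hadoop classes; the names RecordSplitSketch, Record and splitRecords are hypothetical and exist only for this illustration. It shows how a byte buffer turns into (byte offset, text) records when split on '\n', and how the same data yields different records when split on '$'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics the (byte offset, text) records
// described above without using any Hadoop classes.
class RecordSplitSketch {

    /** One record: the byte offset where it starts and its text, delimiter stripped. */
    static final class Record {
        final long offset;
        final String text;

        Record(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }

        @Override
        public String toString() {
            return offset + "\t" + text;
        }
    }

    /** Splits data into records, ending each record at the given delimiter byte. */
    static List<Record> splitRecords(byte[] data, byte delimiter) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delimiter) {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        // A trailing record that is not followed by a delimiter still counts.
        if (start < data.length) {
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Line-oriented data: one int,int pair per line.
        byte[] lines = "1,2\n30,40\n5,6\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : splitRecords(lines, (byte) '\n')) {
            System.out.println(r);
        }

        // The same kind of data with '$' as the record separator.
        byte[] dollars = "1,2$30,40$5,6".getBytes(StandardCharsets.UTF_8);
        for (Record r : splitRecords(dollars, (byte) '$')) {
            System.out.println(r);
        }
    }
}
```

If a file contains no '\n' bytes at all, a line-based split like this yields a single record for the whole file, so it is worth checking which byte actually separates the records in the input files.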