java - Hadoop TextInputFormat reads only one line per file

May 15, 2012

I wrote a simple map task for Hadoop 0.20.2. The input dataset consists of 44 files, each 3-5 MB. Each line of a file has the format int,int. The input format is the default TextInputFormat, and the mapper's job is to parse the input text into integers.

After the task ran, the Hadoop framework's statistics showed that the number of map input records was 44. I tried debugging and found that the map method received only the first line of each file as an input record.
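With TextInputFormat every line of every input file becomes one record, so the "Map input records" counter should equal the total line count of the 44 files. One way to confirm what the files actually contain is to copy them locally (for example with `hadoop fs -get`) and count lines in plain Java. The sketch below assumes such a local directory; the class name `InputLineCount` is hypothetical, and it skips names starting with `_` or `.` because FileInputFormat ignores those as hidden files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InputLineCount {
    // Counts lines in one file; '\n', '\r' and "\r\n" all end a line,
    // and a trailing newline does not produce an extra empty line.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    // Sums line counts over the regular files in a directory, skipping names
    // that start with '_' or '.' (FileInputFormat treats those as hidden).
    static long countLinesInDir(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return !name.startsWith("_") && !name.startsWith(".");
                    })
                    .collect(Collectors.toList());
        }
        long total = 0;
        for (Path f : files) {
            total += countLines(f);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a local directory holding copies of the input files (assumed)
        System.out.println(countLinesInDir(Paths.get(args[0])));
    }
}
```

If this total is far larger than 44, the files themselves are fine and the problem lies in how the job reads them.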

Does anyone know what the problem is, or where I can find a solution?

Thanks in advance.

Edit 1

The input data was generated by a different MapReduce task whose output format is TextOutputFormat<NullWritable, IntXInt>. The toString() method of IntXInt should produce a string of the form int,int.
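When the key is NullWritable, TextOutputFormat writes only value.toString() for each record, followed by a newline (with a real key, key and value are separated by a tab). Each line of the generated files is therefore exactly whatever IntXInt.toString() returns. A minimal sketch of that contract, assuming IntXInt simply holds two ints; the real class is not shown and would also implement Writable, which is omitted here so the example runs without Hadoop:

```java
import java.util.Arrays;
import java.util.List;

public class IntXIntOutputSketch {
    // Hypothetical stand-in for the poster's IntXInt (Writable methods omitted).
    static final class IntXInt {
        final int first;
        final int second;

        IntXInt(int first, int second) {
            this.first = first;
            this.second = second;
        }

        // Must produce exactly "int,int", with no embedded newlines.
        @Override
        public String toString() {
            return first + "," + second;
        }
    }

    // Mimics TextOutputFormat with a NullWritable key:
    // each record is written as value.toString() followed by '\n'.
    static String render(List<IntXInt> records) {
        StringBuilder sb = new StringBuilder();
        for (IntXInt r : records) {
            sb.append(r.toString()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(new IntXInt(1, 2), new IntXInt(3, 4))));
    }
}
```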

Edit 2

My mapper looks like the following:

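    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

        // Called once per record; with TextInputFormat a record is one line.
        public void map(LongWritable key, Text value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) throws IOException {
            // Each line is expected to look like "int,int"
            String[] s = value.toString().split(",");
            // IntXInt is the poster's own class (not shown); its accessors
            // are assumed to return IntWritable so collect() type-checks.
            IntXInt x = new IntXInt(s[0], s[1]);
            output.collect(x.firstInt(), x.secondInt());
        }
    }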
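If the parsing step itself is a suspect, a defensive version can make malformed lines visible instead of silently mis-parsing them. The helper below is a hypothetical, Hadoop-free sketch (`PairParser.parsePair` is not part of any library): it trims whitespace, requires exactly two comma-separated fields, and returns null otherwise.

```java
public class PairParser {
    // Parses a line of the form "int,int" into a two-element array.
    // Returns null when the line does not hold exactly two integer fields,
    // so a mapper could skip or count it instead of throwing.
    static int[] parsePair(String line) {
        // limit -1 keeps trailing empty fields, so "1,2," is rejected
        String[] parts = line.trim().split(",", -1);
        if (parts.length != 2) {
            return null;
        }
        try {
            int a = Integer.parseInt(parts[0].trim());
            int b = Integer.parseInt(parts[1].trim());
            return new int[] {a, b};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int[] p = parsePair(" 12, 34 ");
        System.out.println(p[0] + " " + p[1]);
    }
}
```

Inside map(), the result could then be emitted with output.collect(new IntWritable(p[0]), new IntWritable(p[1])), skipping the line when parsePair returns null.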

Edit 3

I have checked: the mapper really does read only one line from each file; it is not receiving the whole file as a single Text value.

The InputFormat defines how data is read from a file into Mapper instances. The default TextInputFormat reads lines of text files. The key it emits for each record is the byte offset of the line read (as a LongWritable), and the value is the contents of the line up to the terminating '\n' character (as a Text object). If you have multi-line records, each separated by a $ character, you should write your own InputFormat that parses files into records split on that character instead.
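The record-splitting behavior described above can be modeled in a few lines of plain Java. This is a simplified sketch, not Hadoop's LineRecordReader: it ignores '\r' handling and input splits, and the class names are hypothetical. Splitting on '\n' reproduces the default (byte offset, line) pairs; splitting on '$' illustrates the custom-delimiter case, which in Hadoop would live in a RecordReader returned by your own InputFormat.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimitedRecordSketch {
    // One record: the byte offset where it starts (the key) and its text (the value).
    static final class Record {
        final long offset;
        final String value;

        Record(long offset, String value) {
            this.offset = offset;
            this.value = value;
        }

        @Override
        public String toString() {
            return offset + "\t" + value;
        }
    }

    // Splits data into records on a single-byte delimiter; the delimiter
    // itself is not included in the value.
    static List<Record> split(byte[] data, byte delimiter) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delimiter) {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        // A final record without a trailing delimiter is still emitted.
        if (start < data.length) {
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "1,2\n3,4\n5,6\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : split(data, (byte) '\n')) {
            System.out.println(r);
        }
    }
}
```

For the input "1,2\n3,4\n5,6\n", main prints three records with keys 0, 4 and 8, each followed by a tab and the line text.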
