awk doesn't work in Hadoop's mapper
This is my Hadoop job:

    hadoop streaming \
        -D mapred.map.tasks=1 \
        -D mapred.reduce.tasks=1 \
        -mapper "awk '{if(\$0<3)print}'" \
        -reducer "cat" \
        -input "/user/***/input/" \
        -output "/user/***/out/"

The -mapper line is the one that doesn't work. The job fails with this error:

    sh: -c: line 0: syntax error near unexpected token `('
    sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '

But if I change the -mapper to -mapper "awk '{print}'", it works without error. What's the problem with the if(..)?
Update:

Thanks to @paxdiablo for the detailed answer.
What I want to do is filter out data whose 1st column is greater than x before piping the input data into my custom bin, so the -mapper looks like this:

    -mapper "awk -v x=$x '{if(\$0<x)print}' | ./bin"

Is there another way to achieve that?
The problem's not the if per se, it's the fact that the quotes have been stripped from your awk command.
You'll realise this when you look at the error output:

    sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '

and when you try to execute the quote-stripped command directly:
    pax> echo hello | awk {if($0<3)print}
    bash: syntax error near unexpected token `('

    pax> echo hello | awk {print}
    hello

The reason the {print} version works is that it doesn't contain the shell-special ( character.
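If you want to reproduce this outside Hadoop, one option is to hand the quote-stripped string to sh -c yourself. This is only a sketch that assumes the mapper string ends up being parsed by a second shell, as the "sh: -c:" prefix in the error suggests; it uses "2" as sample input in place of the real data.

```shell
# Quote-stripped program: the inner sh sees "(" as a syntax error,
# just like the Hadoop log shows.
if sh -c 'echo 2 | awk {if($0<3)print}' 2>/dev/null; then
    stripped_status=ok
else
    stripped_status=failed
fi
echo "stripped: $stripped_status"

# Same program with its single quotes intact: inside the double quotes, \$ becomes $,
# so the inner sh receives awk '{if($0<3)print}' and awk gets one intact argument.
quoted_out=$(sh -c "echo 2 | awk '{if(\$0<3)print}'")
echo "quoted: $quoted_out"
```

Running it should print "stripped: failed" followed by "quoted: 2".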
One thing you might want to try is escaping the special characters to ensure the shell doesn't try to interpret them:

    {if\(\$0\<3\)print}

It may take some effort to get a correctly escaped string, but you can look at the error output to see what is generated. I've had to escape the () since they're the shell's sub-shell creation commands, the $ to prevent variable expansion, and the < to prevent input redirection.
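To check that the escaped string survives one round of shell parsing, you can again feed it through sh -c. This is a minimal local sketch with made-up input lines (1, 5, 2). Note that it only models the inner shell: inside the double-quoted -mapper "..." argument, your outer shell also removes some backslashes itself (for example, \$ turns into $ inside double quotes), so the string you actually type there may need another layer of escaping.

```shell
# Single quotes keep the outer shell out of it; the inner sh strips the backslashes
# and hands awk the program {if($0<3)print} as a single word.
#   \( \) stop sub-shell parsing, \$ stops $0 expansion, \< stops input redirection.
escaped_out=$(printf '1\n5\n2\n' | sh -c 'awk {if\(\$0\<3\)print}')
echo "$escaped_out"
```

Only the lines less than 3 should come out, i.e. 1 and 2.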
Also keep in mind there may be other ways to filter, depending on your needs, that avoid shell-special characters altogether. If you specify what your needs are, we can possibly help further.
For example, you can create a shell script (e.g., pax.sh) to do the actual awk work for you:
    #!/bin/bash
    awk -v x="$1" '{if($1<x)print}'

then use that shell script in the mapper without any special shell characters:
    hadoop streaming \
        -D mapred.map.tasks=1 -D mapred.reduce.tasks=1 \
        -mapper "pax.sh 3" -reducer "cat" \
        -input "/user/***/input/" -output "/user/***/out/"
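Before submitting the job, it can help to run the wrapper locally on a few sample lines, the same way the mapper feeds it on stdin. The sketch below is a hypothetical local check: the temp directory and the sample records are made up, and the awk program keeps the if inside an action block { ... }, since a bare if(...) at the top level of an awk program is a syntax error. For the filter-then-pipe case from the question, the same idea applies: append | ./bin to the awk line inside the script (./bin being your own program), so the -mapper argument stays free of special characters. The script also has to be present on the task nodes; Hadoop streaming's -file pax.sh option is one way to ship it (newer releases prefer the generic -files option).

```shell
# Hypothetical local check; the temp directory stands in for the task's working dir.
workdir=$(mktemp -d)

# Write the wrapper. The quoted 'EOF' keeps $1 literal so the script sees its own argument.
cat > "$workdir/pax.sh" <<'EOF'
#!/bin/bash
# Print only the lines whose first field is less than the threshold passed as $1.
awk -v x="$1" '{if($1<x)print}'
EOF
chmod +x "$workdir/pax.sh"

# Feed sample records on stdin, the way a streaming mapper receives its input.
wrapper_out=$(printf '1 a\n5 b\n2 c\n' | "$workdir/pax.sh" 3)
echo "$wrapper_out"

rm -rf "$workdir"
```

With a threshold of 3, the lines "1 a" and "2 c" should be printed and "5 b" dropped.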