awk doesn't work in hadoop's mapper -


this hadoop job:

hadoop streaming \ -d mapred.map.tasks=1\ -d mapred.reduce.tasks=1\ -mapper "awk '{if(\$0<3)print}'" \  # doesn't work -reducer "cat" \ -input "/user/***/input/" \ -output "/user/***/out/" 

this job fails, error saying:

sh: -c: line 0: syntax error near unexpected token `(' sh: -c: line 0: `export tmpdir='..../work/tmp'; /bin/awk { if ($0 < 3) print } ' 

but if change -mapper this: -mapper "awk '{print}'" works without error. what's problem if(..) ?

update:

thank @paxdiablo detailed answer.

what want filter out data 1st column greater x, before piping input data custom bin. -mapper looks this:

-mapper "awk -v x=$x{if($0<x)print} | ./bin"  

is there other way achieve that?

the problem's not if per se, it's fact quotes have been stripped awk command.

you'll realise when @ error output:

sh: -c: line 0: `export tmpdir='..../work/tmp'; /bin/awk { if ($0 < 3) print } ' 

and when try execute quote-stripped command directly:

pax> echo hello | awk {if($0<3)print} bash: syntax error near unexpected token `('  pax> echo hello | awk {print} hello 

the reason {print} 1 works because doesn't contain shell-special ( character.

one thing might want try escape special characters ensure shell doesn't try interpret them:

{if\(\$0\<3\)print} 

it may take effort correctly escaped string can @ error output see generated. i've had escape () since they're shell sub-shell creation commands, $ prevent variable expansion, , < prevent input redirection.


also keep in mind there may other ways filter depending on needs, ways can avoid shell-special characters. if specify needs are, can possibly further.

for example, create shell script (eg, pax.sh) actual awk work you:

#!/bin/bash awk -v x=$1 'if($1<x){print}' 

then use shell script in mapper without special shell characters:

hadoop streaming \   -d mapred.map.tasks=1 -d mapred.reduce.tasks=1 \   -mapper "pax.sh 3" -reducer "cat" \   -input "/user/***/input/" -output "/user/***/out/" 

Comments

Popular posts from this blog

python - How to create a legend for 3D bar in matplotlib? -

java - Multi-Label Document Classification -

php - Dynamic url re-writing using htaccess -