hadoop - Error running a MapReduce streaming job using Python


I'm trying to run the following mapper and reducer code (*disclaimer - this is part of a solution for a training course).

mapper.py

import sys

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) == 6:
        date, time, store, item, cost, payment = data
        print "{0}\t{1}".format(1, cost)

reducer.py

import sys

stotal = 0
trans = 0

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        continue

    stotal += float(data_mapped[1])
    trans += 1

print transactions, "\t", salestotal

It keeps throwing this error:

undef/bin/hadoop job -Dmapred.job.tracker=0.0.0.0:8021 -kill job_201404041914_0012
14/04/04 23:13:53 INFO streaming.StreamJob: Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201404041914_0012
14/04/04 23:13:53 ERROR streaming.StreamJob: Job not successful. Error: NA
14/04/04 23:13:53 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

I've tried both calling Python explicitly and specifying the Python interpreter in the scripts (i.e. /usr/bin/env python).
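By "specifying the Python interpreter" I mean a shebang line at the top of each script so Hadoop Streaming can execute the file directly, e.g.:

#!/usr/bin/env python
# first line of mapper.py (and reducer.py) when the script is run
# directly rather than via an explicit "python mapper.py" command
import sys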

Any idea what's going wrong?

The job is failing because your reducer.py raises a NameError.

The problem is this line:

print transactions, "\t", salestotal 

There are no variables named transactions or salestotal defined in the script.

If you execute it locally, you get this error:

Traceback (most recent call last):
  File "r.py", line 14, in <module>
    print transactions, "\t", salestotal
NameError: name 'transactions' is not defined

The correct line should be:

print trans, "\t", stotal 
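For reference, here is the full reducer with that one line fixed. A quick way to sanity-check it outside Hadoop is to pipe some sample input through the mapper, sort, and the reducer; the purchases.txt file name in the comment below is just an illustration, not something from the original job.

#!/usr/bin/env python
# reducer.py with the corrected final print statement.
# Quick local test, assuming some tab-separated sample data exists:
#   cat purchases.txt | python mapper.py | sort | python reducer.py
import sys

stotal = 0
trans = 0

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        continue

    stotal += float(data_mapped[1])
    trans += 1

# use the variable names that were actually defined above
print trans, "\t", stotal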
