collections - MongoDB mapReduce doesn't return the right results in a sharded cluster
Very interesting: mapReduce works fine on a single instance, but not on a sharded collection. As you may see below, I have a collection and wrote a simple map-reduce function:
mongos> db.tweets.findOne()
{
    "_id" : ObjectId("5359771dbfe1a02a8cf1c906"),
    "geometry" : {
        "type" : "Point",
        "coordinates" : [ 131.71778292855996, 0.21856835860911106 ]
    },
    "type" : "Feature",
    "properties" : {
        "isflu" : 1,
        "cell_id" : 60079,
        "user_id" : 35,
        "time" : ISODate("2014-04-24T15:42:05.048Z")
    }
}
mongos> db.tweets.find({"properties.user_id":35}).count()
44247
mongos> map_flow
function () {
    var key=this.properties.user_id;
    var value={ "cell_id":1};
    emit(key,value);
}
mongos> reduce2
function (key,values){
    var ros={flows:[]};
    values.forEach(function(v){ros.flows.push(v.cell_id);});
    return ros;
}
mongos> db.tweets.mapReduce(map_flow,reduce2, { out:"flows2", sort:{"properties.user_id":1,"properties.time":1} })
But the results are not what I want:
mongos> db.flows2.find({"_id":35})
{ "_id" : 35, "value" : { "flows" : [ null, null, null ] } }
I got lots of nulls, and interestingly every result has exactly three of them. Does MongoDB mapReduce not work correctly on a sharded collection?
The number one rule of MapReduce is:
- Thou shall emit a value of the same type as thy reduce function returneth.
You broke this rule, so your MapReduce only works on a small collection, where reduce is called just once for each key (that's the second rule of MapReduce: the reduce function may be called zero, once or many times).
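One way to see whether a reduce function respects both rules is to feed its own output back into it and compare that with reducing everything in one go. The sketch below does this in plain JavaScript, outside MongoDB. `isReReducible` and `sumReduce` are hypothetical names made up for this illustration; `reduce2` is copied from the question.

```javascript
// Hypothetical helper (not part of MongoDB): checks that reducing partial
// results gives the same answer as reducing all values at once.
function isReReducible(reduce, key, values) {
  var mid = Math.floor(values.length / 2);
  var all = reduce(key, values);
  var partial = reduce(key, [
    reduce(key, values.slice(0, mid)),
    reduce(key, values.slice(mid))
  ]);
  return JSON.stringify(all) === JSON.stringify(partial);
}

// Illustrative reduce that returns the same type the map would emit (a number).
function sumReduce(key, values) {
  var total = 0;
  values.forEach(function (v) { total += v; });
  return total;
}

// The reduce from the question: it takes {cell_id: ...} but returns {flows: [...]}.
function reduce2(key, values) {
  var ros = { flows: [] };
  values.forEach(function (v) { ros.flows.push(v.cell_id); });
  return ros;
}

var sumOk = isReReducible(sumReduce, 35, [1, 1, 1, 1]);
var reduce2Ok = isReReducible(
  reduce2, 35, [{ cell_id: 1 }, { cell_id: 1 }, { cell_id: 1 }, { cell_id: 1 }]
);

console.log(sumOk);     // true
console.log(reduce2Ok); // false
```

The check is only a sanity test on sample input, not a proof, but a reduce that fails it will misbehave as soon as MongoDB calls it more than once for a key.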
Your map function emits the value {cell_id:1} for each document.
How does your reduce function use this value? Well, it returns a document containing an array, into which it pushes the cell_id value. That is strange already, because the value is always 1, so I'm not sure why you wouldn't just emit 1 (if you wanted a count).
But look at what happens when multiple shards each push a bunch of 1's into their flows arrays (whether or not it's what you intended, that's what your code is doing), and reduce is then called on several already-reduced values:
reduce(key, [ {flows:[1,1,1,1]},{flows:[1,1,1,1,1,1,1,1,1]}, etc ] )
Your reduce function now tries to take each member of the values array (which is a document with a single field, flows) and push v.cell_id into the flows array. There is no cell_id field here, so of course you end up with null. And you get three nulls because you have three shards?
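The effect can be reproduced outside MongoDB with a simplified model: each shard reduces its own values, then the partial results are reduced once more. This is only a sketch. The three-shard split and the number of documents per shard are assumptions for illustration, and the real server may call reduce more often than this.

```javascript
// reduce2 exactly as in the question.
function reduce2(key, values) {
  var ros = { flows: [] };
  values.forEach(function (v) { ros.flows.push(v.cell_id); });
  return ros;
}

// Assumption: three shards, each holding a few emitted values for user 35.
var shardValues = [
  [{ cell_id: 1 }, { cell_id: 1 }],
  [{ cell_id: 1 }, { cell_id: 1 }, { cell_id: 1 }],
  [{ cell_id: 1 }]
];

// Step 1: each shard reduces the values emitted by its own documents.
var partials = shardValues.map(function (values) {
  return reduce2(35, values);
});

// Step 2: the per-shard results are reduced again into the final value.
// Each partial is {flows: [...]} and has no cell_id, so undefined is pushed.
var finalValue = reduce2(35, partials);

// undefined array entries serialize as null.
console.log(JSON.stringify(finalValue)); // {"flows":[null,null,null]}
```

One null per partial result matches the three nulls in the question's output, which is consistent with the guess about three shards.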
I recommend that you articulate to yourself what you are trying to aggregate in your code, and then rewrite your map and reduce to comply with the rules that mapReduce in MongoDB expects your code to follow.
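As a sketch of what a compliant version could look like, assume the goal was to collect the cell_id values seen for each user. That goal is a guess, not something the question states. The map then emits the same {flows: [...]} shape that reduce returns, and reduce concatenates arrays. `reduce_flow` is a hypothetical name, and the `emit` defined here is only a local stand-in so the functions can be tried outside the shell.

```javascript
// Assumption: the intended result is the list of cell_id values per user.
function map_flow() {
  // Emit the same shape that reduce returns: {flows: [...]}.
  emit(this.properties.user_id, { flows: [this.properties.cell_id] });
}

function reduce_flow(key, values) {
  var result = { flows: [] };
  values.forEach(function (v) {
    // v is either a map output or an earlier reduce output; both have flows.
    result.flows = result.flows.concat(v.flows);
  });
  return result;
}

// Local stand-in for MongoDB's emit, for trying the functions outside the shell.
var emitted = [];
function emit(key, value) { emitted.push(value); }

// Made-up sample documents for user 35 (only 60079 appears in the question).
[60079, 60080, 60081].forEach(function (cell) {
  map_flow.call({ properties: { user_id: 35, cell_id: cell } });
});

// Reducing everything at once and reducing partial results should agree.
var direct = reduce_flow(35, emitted);
var reReduced = reduce_flow(35, [
  reduce_flow(35, emitted.slice(0, 1)),
  reduce_flow(35, emitted.slice(1))
]);

console.log(JSON.stringify(direct));    // {"flows":[60079,60080,60081]}
console.log(JSON.stringify(reReduced)); // {"flows":[60079,60080,60081]}
```

In the shell, the stand-in `emit` would be dropped and the two functions passed to db.tweets.mapReduce in place of the originals. Whether the entries in flows come back in the order requested by the sort option is not addressed by this sketch and would need to be checked separately.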