collections - mongodb mapreduce doesn't return right in a sharded cluster -


very interesting, mapreduce works fine in single instance, not on sharded collection. below, may see got collection , write simple map-reduce function,

mongos> db.tweets.findone() {     "_id" : objectid("5359771dbfe1a02a8cf1c906"),     "geometry" : {         "type" : "point",         "coordinates" : [             131.71778292855996,             0.21856835860911106         ]     },     "type" : "feature",     "properties" : {         "isflu" : 1,         "cell_id" : 60079,         "user_id" : 35,         "time" : isodate("2014-04-24t15:42:05.048z")     } } mongos> db.tweets.find({"properties.user_id":35}).count() 44247 mongos> map_flow function () { var key=this.properties.user_id; var value={ "cell_id":1}; emit(key,value); } mongos> reduce2 function (key,values){ var ros={flows:[]}; values.foreach(function(v){ros.flows.push(v.cell_id);});return ros;} mongos> db.tweets.mapreduce(map_flow,reduce2, { out:"flows2", sort:{"properties.user_id":1,"properties.time":1} }) 

but results not want

mongos> db.flows2.find({"_id":35}) { "_id" : 35, "value" : { "flows" : [  null,  null,  null ] } } 

i got lots of null , interesting have 3 ones. mongodb mapreduce seems not right on sharded collection?

the number 1 rule of mapreduce is:

  • thou shall emit value of same type reduce function returneth

you broke rule, mapreduce works small collection reduce called once each key (that's second rule of mapreduce - reduce function may called zero, once or many times).

your map function emits value {cell_id:1} each document.

how reduce function use value? well, return value document array, push cell_id value. strange already, because value 1, i'm not sure why wouldn't emit 1 (if wanted count).

but happens when multiple shards push bunch of 1's flows array (whether it's intended, that's code doing) , reduce called on several reduced values:

reduce(key, [ {flows:[1,1,1,1]},{flows:[1,1,1,1,1,1,1,1,1]}, etc ] ) 

your reduce function tries take each member of values array (which document single field flows) , push v.cell_id flows array. there no cell_id field here, of course end null. , 3 nulls because have 3 shards?

i recommend articulate trying aggregate in code, , rewrite map , reduce comply rules mapreduce in mongodb expects code follow.


Comments

Popular posts from this blog

java - Intellij Synchronizing output directories .. -

git - Initial Commit: "fatal: could not create leading directories of ..." -