Spark Streaming - Is each document in MongoDB's local.oplog.rs a standard BSONObject structure?
I use Spark with the mongo-connector to sync data from a MongoDB collection to an HDFS file. The code works fine if the collection is read through mongos, but when it comes to local.oplog.rs, the replication collection that is read through mongod, it gives me this exception:
Caused by: com.mongodb.hadoop.splitter.SplitFailedException: Unable to calculate input splits: couldn't find index on splitting key { _id: 1 }
I think the data structure differs between oplog.rs and a normal collection: oplog.rs doesn't have an "_id" property, so newAPIHadoopRDD cannot work normally, right?
Yes, the document structure is a bit different in oplog.rs. You will find the actual document in the "o" field of the oplog document.
Example oplog document:
{
    "_id" : ObjectId("586e74b70dec07dc3e901d5f"),
    "ts" : Timestamp(1459500301, 6436),
    "h" : NumberLong("5511242317261841397"),
    "v" : 2,
    "op" : "i",
    "ns" : "urdb.urcollection",
    "o" : {
        "_id" : ObjectId("567ba035e4b01052437cbb27"),
        ....
        .... original document.
    }
}
Use the "ns" and "o" fields of the oplog.rs entry to get the expected collection and document.
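As a rough illustration of that last point, here is a minimal Python sketch that unwraps one oplog entry into its database, collection, and original document. It works on a plain dict and does not touch Spark or the connector. The function name unwrap_oplog_entry and the "name" field in the sample are made up for the example. It only handles insert entries ("op" equal to "i"), because for updates and deletes the "o" field does not hold the full document.

```python
def unwrap_oplog_entry(entry):
    """Return (database, collection, document) for an insert entry, else None.

    Hypothetical helper: only "i" (insert) entries are handled, since for
    other operations "o" is not the full original document.
    """
    if entry.get("op") != "i":
        return None
    # "ns" has the form "<database>.<collection>"; split on the first dot only,
    # because collection names may themselves contain dots.
    database, _, collection = entry["ns"].partition(".")
    return database, collection, entry["o"]


# Sample entry shaped like the oplog document above; values are simplified
# to plain Python types and the "name" field is invented for illustration.
sample = {
    "ts": (1459500301, 6436),
    "v": 2,
    "op": "i",
    "ns": "urdb.urcollection",
    "o": {"_id": "567ba035e4b01052437cbb27", "name": "example"},
}

result = unwrap_oplog_entry(sample)
```

In a Spark job the same function could be mapped over the records read from oplog.rs, dropping the None results before writing to HDFS.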