Spark Streaming - is each document in MongoDB local.oplog.rs a standard BSONObject structure?


I use the Spark mongo-hadoop connector to sync data from a MongoDB collection to an HDFS file. The code works fine when the collection is read through mongos, but when it comes to local.oplog.rs, a replica-set collection read directly through mongod, it gives me this exception:

Caused by: com.mongodb.hadoop.splitter.SplitFailedException: unable to calculate input splits: couldn't find index on splitting key { _id: 1 }

I think the data structure is different between oplog.rs and a normal collection: oplog.rs doesn't have an "_id" property, so newAPIHadoopRDD cannot work normally, right? A minimal sketch of the kind of job I mean is below (hosts, database, and paths are placeholders):
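import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

val sc = new SparkContext(new SparkConf().setAppName("mongo-to-hdfs"))

val mongoConf = new Configuration()
// Reading a normal collection through mongos works: the default
// splitter can calculate input splits from the { _id: 1 } index.
mongoConf.set("mongo.input.uri", "mongodb://mongos-host:27017/urdb.urcollection")

val rdd = sc.newAPIHadoopRDD(
  mongoConf,
  classOf[MongoInputFormat],
  classOf[Object],     // document key
  classOf[BSONObject]) // the document itself

rdd.map { case (_, doc) => doc.toString }
  .saveAsTextFile("hdfs:///tmp/urcollection")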

Yes, the document structure is a bit different in oplog.rs. You will find the actual document in the "o" field of the oplog document.

An example oplog document:

{ "_id" : objectid("586e74b70dec07dc3e901d5f"), "ts" : timestamp(1459500301, 6436), "h" : numberlong("5511242317261841397"), "v" : 2, "op" : "i", "ns" : "urdb.urcollection", "o" : {     "_id" : objectid("567ba035e4b01052437cbb27"),       ....       .... original document.        } 

}

use "ns" , "o" of oplog.rs expected collection , document.

