I am trying to use mongo-spark-connector_2.11:2.2.7.
After the Spark context is initialized and MongoDB accepts the connection, when I try to read a collection from it I get:
INFO MongoClientCache: Closing MongoClient: [localhost:27017]
INFO connection: Closed connection [connectionId{localValue:2,
serverValue:17}] to localhost:27017 because the pool has been closed.
Output of the MongoDB container:
I NETWORK [initandlisten] connection accepted from 172.18.0.1:65513 #6 (2 connections now open)
I NETWORK [initandlisten] connection accepted from 172.18.0.1:65515 #7 (3 connections now open)
I NETWORK [conn7] end connection 172.18.0.1:65515 (2 connections now open)
I NETWORK [conn6] end connection 172.18.0.1:65513 (1 connection now open)
All components (MongoDB, Spark master & worker) are Docker containers (the necessary ports are exposed, and I can connect to all of them with a shell).
So I really don't know what is going wrong.
I have a running Spark cluster with one master and one worker, and all nodes have the necessary dependencies to connect Spark to MongoDB:

from pyspark.sql import SparkSession

MongoDBsession = SparkSession \
    .builder \
    .appName("MongoDB Export to Hive") \
    .master("spark://localhost:7077") \
    .config("spark.mongodb.input.uri", "mongodb://localhost:27017/db_name.collection_name?readPreference=primaryPreferred") \
    .config("spark.mongodb.input.partitioner", "MongoSamplePartitioner") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.7") \
    .getOrCreate()

df_mongoDB_messageLogs = MongoDBsession.read \
    .format("mongo") \
    .option("database", "db_name") \
    .option("collection", "collection_name") \
    .load()
Update:
This only happens when the Spark application is submitted to the Spark cluster (localhost:7077). If I run spark-submit with master=local, reading data from MongoDB works without any problem. Any ideas?
OK, in the end it was neither a MongoDB nor a Spark error. It was just Docker: I could not reach MongoDB via localhost.
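For reference, the fix amounts to pointing the connection URI at the MongoDB container's name on a shared Docker network instead of localhost: with master=local the driver runs on the host where the port is published, but executors on the cluster run inside their own containers, where "localhost" is the executor container itself. A minimal sketch, assuming the MongoDB service is named `mongodb` on the shared network (a hypothetical name, not taken from the setup above):

```python
# "localhost" only works from the Docker host, where the port is published.
# Inside other containers, Docker's embedded DNS resolves the service name
# of the MongoDB container on a shared user-defined network instead.
bad_uri = "mongodb://localhost:27017/db_name.collection_name"
good_uri = "mongodb://mongodb:27017/db_name.collection_name"  # "mongodb" = service name

# The same substitution would then go into the Spark config, e.g.:
#   .config("spark.mongodb.input.uri", good_uri + "?readPreference=primaryPreferred")
print(good_uri)
```

The executors then resolve the hostname through Docker's DNS, so the same URI works from every container in the cluster.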