多进程+PyMongo导致[Erno 111]



您好!

我刚开始玩pymongo和多处理。我收到了一个用于实验的多核单元,它运行Ubuntu 18.04.4 LTS, codename: bionic。为了进行实验,我在python 3.8python 3.10上都尝试过,不幸的是,结果相似:

>7lvv_E mol:na length:29  DNA (28-MER)
ELSE 7lvv_E
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "LoadDataOnSequence.py", line 54, in createCollectionPDB
x = newCol.insert_one(dict2Write)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 698, in insert_one
self._insert(document,
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 613, in _insert
return self._insert_one(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 602, in _insert_one
self.__database.client._retryable_write(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1497, in _retryable_write
with self._tmp_session(session) as s:
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1829, in _tmp_session
s = self._ensure_session(session)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
server_session = self._get_server_session()
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
return self._topology.get_server_session()
File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 496, in get_server_session
self._select_servers_loop(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 215, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "LoadDataOnSequence.py", line 82, in <module>
myPool.map(createCollectionPDB, listFile("datum/pdb_seqres.txt"))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>

我已经尝试过多次以不同的方式修改代码,但没有成功。此外,我还尝试过通过SSH从PyCharm运行代码,以及创建包含所有必要文件的本地(在多核机器上(文件夹。

我计算核心的数量并创建我的MongoClient:

from multiprocessing import *
from pymongo import MongoClient

#Number of cores
x = cpu_count()
print(x)

myClient = MongoClient('mongodb://127.0.0.1:27017/')

我准备了一个要通过的列表,使用这个函数:

def listFile(fileName):
fOpen = open(fileName)
listFile = fOpen.readlines()
arrOfArrs = []
tmp1 = []
for i in listFile:
# print(i)
if i.startswith(">"):
if len(tmp1) > 1:
arrOfArrs.append(tmp1)
tmp1 = []
tmp1.append(i.strip())
else:
tmp1.append(i.strip())
#print(listFile)
return arrOfArrs

这就是我准备一个大文本文件的方法(实际上会有一个更大的文件,我只是使用其中一个PDB文件进行测试:https://www.wwpdb.org/ftp/pdb-ftp-sites我使用seqres文件,我没有链接确切的文件,因为它会立即下载(。我想在那之前一切都会好起来的。接下来是将在Pool:中使用的函数

def createCollectionPDB(fP):
lineName = ""
lineFASTA = ""
colName = ""
PDBName = ""
chainIDName = ""
typeOfMol = ""
molLen = ""
proteinName = ""
for i in fP:
print("test", i)
print(lineName)
if ">" in i:
lineName = i.strip()
print("LINE NAME")
colName = lineName.split(" ")[0].strip()[1:]
print("COLNAME", colName)
PDBName = lineName.split("_")[0].strip()
chainIDName = colName.split("_")[-1].strip()
typeOfMol = lineName.split(" ")[1].strip().split(":")[1].strip()
molLen = lineName.split(" ")[2].strip().split(":")[-1].strip()#[3].split(" ")[0].strip()
proteinName = lineName.split(" ")[-1].strip()
print(colName, PDBName, chainIDName, typeOfMol, molLen, proteinName)
else:
print("ELSE", colName)
lineFASTA = i.strip()
dict2Write={"PDB_ID" : PDBName, "Chain_ID" : chainIDName, "Molecule Type" : typeOfMol, "Length" : molLen, "Protein_Name" : proteinName, "FASTA" : lineFASTA}
myNewDB = myClient["MyPrjPrj_PDBs"]
newCol = myNewDB[colName]
x = newCol.insert_one(dict2Write)
print("PDB", x.inserted_id)#'''

那个过去也一样。最后Imultiprocess:

f1 = listFile("datum/pdb_seqres.txt")
myPool = Pool(processes=x)
myPool.map(createCollectionPDB, f1)
myPool.join()
myPool.close()

我一直在寻找各种解决方案,比如更改Python版本,尝试不同版本的mongo(5.0和4.x(,以及重新启动mongo。我还尝试过更改进程的数量,这让我遇到了几乎相同的错误,尽管停在了不同的行。我尝试过的另一个选项是使用ssh_pymongo,但也没有成功。它也可以在没有多处理的情况下工作,尽管没有多处理,我在较小的文件上使用它。

每个进程都需要有自己的客户端,因此您很可能需要在每个进程中创建客户端,而不是在调用多处理之前创建一个客户端。

分叉进程:套接字传递失败:管道破裂包含MongoDB驱动程序如何处理分叉的一般信息。

相关内容

  • 没有找到相关文章

最新更新