Multithreading with Python/Flask/Waitress produces duplicate output



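I have a Python Flask server that receives document files on one route and saves them to a job-specific folder. Once all the .docx files have been uploaded, a second route triggers a batch multithreaded job that converts them to PDF.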

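The problem is that if I send a second request for a different job from the same client, the first job completes correctly, but the second and subsequent jobs process all the files requested in both the first and second jobs and copy them into the second (or later) output folder.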

The route:

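    from datetime import datetime
    from threading import Thread, local

    from flask import request

    # `app`, `paths` and `mtconvDOCX` are defined elsewhere in the server.
    @app.route('/docxgroupproc2', methods=['GET'])
    def docgroupproc2():
        startTime = datetime.now()
        jobid = request.args.get('jobid')
        newPath, jobPath, outPath = paths('', str(jobid))
        localdata = local()
        localdata2 = local()
        localdata.value = jobPath
        localdata2.value = outPath
        # daemon=True already marks the thread as a daemon, so the
        # deprecated setDaemon(True) call is not needed.
        mtThread = Thread(target=mtconvDOCX.bleck, args=(localdata.value, localdata2.value), daemon=True)
        mtThread.start()
        print("Thread Started")
        mtThread.join()
        endTime = datetime.now()
        print(endTime - startTime)
        return {'completed': "status"}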

The multithreading module:

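    import os
    from functools import partial
    from multiprocessing.pool import ThreadPool

    import comtypes
    import comtypes.client
    from comtypes import COMError

    wdFormatPDF = 17  # Word's WdSaveFormat value for PDF

    def parseDOCS(outPath, file):
        # Each worker thread has to initialize COM before creating Word.
        comtypes.CoInitialize()
        word = comtypes.client.CreateObject('Word.Application')
        word.Visible = False
        doc = word.Documents.Open(file, Visible=False)
        outFile = os.path.join(outPath, os.path.splitext(os.path.basename(file))[0] + ".PDF")
        try:
            doc.SaveAs(outFile, FileFormat=wdFormatPDF)
        except COMError:
            res = "FAIL"
        else:
            res = "SUCCESS"
        finally:
            doc.Close()
            word.Quit()
        return res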

    def setupParse(dir, fileCounter=0, TotalFileCounter=0, fileslist=[]):
        "return number of files in dir"
        for files in os.scandir(dir):
            if files.is_file():
                fileCounter += 1
                TotalFileCounter += 1
                fileslist.append(files.path)
        text = "DOCX Files" + " : " + str(fileCounter) + "\ntotal files: " + str(TotalFileCounter)
        print(fileCounter)
        #dictlist=map([(x,outPath) for x in [fileslist]])
        return text, fileCounter, TotalFileCounter, fileslist

    def bleck(dir, outPath):
        text, fileCounter, TotalFileCounter, dictlist = setupParse(dir)
        pool = ThreadPool(4)
        #result=pool.starmap_async(parseDOCS,zip(dictlist, repeat(outPath)),chunksize=1)
        result = pool.map_async(partial(parseDOCS, outPath), dictlist)
        # Poll until every file has been converted, rewriting one status line.
        while not result.ready():
            print("\rNumber of Files Processed: {}".format(fileCounter - result._number_left + 1), end='           ')
        pool.close()
        pool.join()
        return "completed"
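This line was the problem:

    def setupParse(dir,fileCounter=0,TotalFileCounter=0,fileslist=[]):

Python evaluates default argument values once, when the function is defined, so the same fileslist list is reused by every call that does not pass its own. Moving the declarations out of the def line and initializing them to the correct values inside the function stopped the file list from being appended to by every thread/job.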
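
A minimal sketch of the effect and of one common fix, assuming a stripped-down file-listing function. The names `list_files_buggy` and `list_files_fixed` are hypothetical and not part of the original module; they only isolate the `fileslist=[]` default from the rest of setupParse.

```python
import os


def list_files_buggy(directory, files_list=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits files_list appends to the same object.
    for entry in os.scandir(directory):
        if entry.is_file():
            files_list.append(entry.path)
    return files_list


def list_files_fixed(directory, files_list=None):
    # None acts as a sentinel; a fresh list is built on every call
    # unless the caller explicitly passes one in.
    if files_list is None:
        files_list = []
    for entry in os.scandir(directory):
        if entry.is_file():
            files_list.append(entry.path)
    return files_list
```

Calling the buggy version on a second job folder returns the files of both folders, which matches the symptom in the question. The fixed version returns only the files of the folder it was given. The same pattern could be applied to setupParse by defaulting fileslist to None and creating the list inside the function body.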
