Extracting text from messy real-world files


WET READ: ___ ___ ___ 7:31 PM
Persistent right lower lung opacity and right pleural effusion. Effusion
perhaps slightly decreased since radiograph dated ___.
______________________________________________________________________________
FINAL REPORT
PA AND LATERAL CHEST RADIOGRAPH.
TECHNIQUE:  AP upright portable radiograph of chest was reviewed in comparison
to prior radiograph from ___.

As compared to ___ there is interval improvement of pulmonary edema. 
Right lower lobe consolidation with internal cavitation surrounded by pleural
effusion appears to be grossly unchanged in the short interim.  There is no
evidence of progression of left consolidation.  Small amount of left pleural
effusion is noted.  

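I have text files like this, and I want to extract the data that comes after "FINAL REPORT". In other words, I want to delete everything in the file up to and including the "FINAL REPORT" line.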

I tried regular expressions but couldn't find a way to make it work.
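Since the question mentions regular expressions, here is a minimal regex sketch. It assumes that "FINAL REPORT" starts its own line (as in the sample above) and that the whole file fits comfortably in memory. The function name `text_after_final_report` is hypothetical, chosen for illustration.

```python
import re

# "^" anchors the marker to the start of a line (re.MULTILINE); "[^\n]*\n?"
# consumes the rest of the marker line; re.DOTALL lets "(.*)" span newlines,
# so group 1 captures everything after the "FINAL REPORT" line.
FINAL_REPORT_RE = re.compile(r"^FINAL REPORT[^\n]*\n?(.*)", re.MULTILINE | re.DOTALL)


def text_after_final_report(text: str) -> str:
    """Return everything after the "FINAL REPORT" line, or "" if it is absent."""
    match = FINAL_REPORT_RE.search(text)
    return match.group(1) if match else ""
```

Usage, with a hypothetical file name: `with open("report.txt") as f: print(text_after_final_report(f.read()))`. Anchoring with `^` avoids false matches when the phrase appears in the middle of a sentence.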

Process the file line by line:

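    import os

    def copy(infile):
        """Rewrite `infile` in place, keeping only the lines after "FINAL REPORT"."""
        tempname = infile + '.tmp'
        keep = False
        # Mode 'w' truncates any leftover temp file, so no existence check is needed.
        with open(infile) as fin, open(tempname, 'w') as fout:
            for line in fin:
                if keep:
                    print(line.strip(), file=fout)
                # Set the flag after the check above, so the "FINAL REPORT"
                # line itself is dropped along with everything before it.
                if "FINAL REPORT" in line:
                    keep = True
        # Replace the original with the filtered copy; os.replace overwrites
        # the destination (atomically on POSIX).
        os.replace(tempname, infile)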
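The same line-by-line idea can also be written as a small lazy generator with `itertools.dropwhile`, which keeps memory use flat for large files and is easy to test without touching the disk. This is a sketch, not the answer's original code; the name `lines_after_marker` and its `marker` parameter are hypothetical.

```python
from itertools import dropwhile
from typing import Iterable, Iterator


def lines_after_marker(lines: Iterable[str], marker: str = "FINAL REPORT") -> Iterator[str]:
    """Yield the lines that follow the first line containing `marker`.

    The marker line itself is skipped; if the marker never appears,
    nothing is yielded.
    """
    # Discard lines until reaching the first one that contains the marker.
    remaining = dropwhile(lambda line: marker not in line, lines)
    # Drop the marker line itself (no-op if the iterator is already exhausted).
    next(remaining, None)
    yield from remaining
```

Because an open file object iterates over its lines, it can be plugged straight into the file-copy loop, e.g. `fout.writelines(lines_after_marker(fin))`. Unlike the `print(line.strip(), ...)` version, this writes each line unchanged.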
