How do I extract a specific paragraph from a text file using Python?



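I need to extract the paragraph that starts with "SUBSTITUTION OF TRUSTEE" and ends with "under said Deed of Trust".

  1. Because the same fields are repeated elsewhere in the file, I only need to look for data inside this paragraph.

  2. The data can be dates, document numbers, etc.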

sample.txt
Inst #: 2021
Fees: $42.00
06/24/2021 06:54:48 AM
Receipt #: 4587188
Requestor:
FINANCIAL CORPORATION OF
After recording return to: Src: MAIL
Mail Tax Statements to:
SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE
The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.

Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
This JNO aay of June 2021,
Financial Corporation
wy luo Rtn rae
import re
mylines = []
pattern = re.compile(r"SUBSTITUTION OF TRUSTEE", re.IGNORECASE)
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:
        mylines.append(line)
    for line in mylines:
        if(line == "SUBSTITUTION OF TRUSTEE "):
            print(line)
            break
        else:
            mylines.remove(line)

print("my lines",mylines)

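You can check whether each line starts with the substring SUBSTITUTION OF TRUSTEE and, once it is found, set a flag variable to True. While the flag is true, keep appending lines to the mylines list. Then, once you reach the line containing "under said deed of trust", stop adding lines and print the result: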

mylines = []
flag = False
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:
        # Start collecting at the heading line (case-insensitive)
        if line.strip().upper().startswith("SUBSTITUTION OF TRUSTEE"):
            flag = True
        if flag:
            mylines.append(line)
            # Stop right after the line that closes the paragraph
            if "under said deed of trust" in line.strip().lower():
                break
print("".join(mylines))

See the Python demo.

Output:

SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE
The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.

Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
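
If the whole file fits in memory, the same start and end markers can also be matched with a single regular expression. This is only a sketch: the name extract_section is made up for illustration, and it assumes the heading sits at the start of a line, as it does in sample.txt.

```python
import re

# Start and end markers are taken from the question; everything else is an assumption.
SECTION_RE = re.compile(
    r"^[ \t]*SUBSTITUTION OF TRUSTEE.*?under said deed of trust[^\n]*",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

def extract_section(text):
    """Return the text from the heading line through the closing line, or None."""
    match = SECTION_RE.search(text)
    return match.group(0) if match else None
```

Calling extract_section(open('sample.txt', encoding='utf-8').read()) should give the same paragraph as above, minus the trailing newline. The lazy `.*?` stops at the first occurrence of the end marker, so a later paragraph with the same ending is not swallowed.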

Here is a naive approach that does what you want:

extracted_lines = []
extract = False
with open("sample.txt") as f:
    for line in f:
        if not extract and "substitution of trustee" in line.lower():
            extract = True

        if extract:
            extracted_lines.append(line)
            if "under said deed of trust" in line.lower():
                extract = False  # or break

print("".join(extracted_lines))
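
Once the paragraph is isolated, the fields mentioned in the question can be searched for inside it only. As a rough sketch, assuming dates look like the two forms seen in sample.txt ("March 1, 2013" and "06/24/2021"), a hypothetical find_dates helper could be:

```python
import re

# Assumed date formats, based only on the sample file:
# "Month D, YYYY" and "MM/DD/YYYY".
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
    r"|\b\d{2}/\d{2}/\d{4}\b"
)

def find_dates(paragraph):
    """Return every date-like string found in the paragraph, in order."""
    return DATE_RE.findall(paragraph)
```

Passing it the joined paragraph text returns the dates found there. The recording date 06/24/2021 sits above the SUBSTITUTION OF TRUSTEE heading in the sample, so it is not picked up when only the extracted paragraph is searched. Other fields would need their own patterns.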
