https://gist.github.com/SirChurchill9999/1baee3adc055d8c1c76ca2f2be417c8f
Leanne Hazard
1. Leanne Hazard
Expert Witness AR
Position: Forensic Toxicologist
Areas of Expertise: Medical & Surgical - Toxicology; Employment & Vocational - Forensics
Expert Evaluator Report
James M. Pape, M.D.
2.James M. Pape, M.D.
Expert Witness AR
Position: Orthopedic Surgeon
Areas of Expertise: Medical & Surgical - Surgery -- General; Medical & Surgical - Orthopedics
Expert Evaluator Report
Wayne E. Williams
3. Wayne E. Williams
Expert Witness AR
Position: Forestry Expert
Areas of Expertise: Environmental - Trees
Expert Challenge Report
Expert Evaluator Report
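I am trying to move the grouped items into separate columns, with all of a group's items aligned with the person's name. For example:
Leanne Hazard ..... 1. Leanne Hazard ..... Expert Witness AR ..... Position: Forensic Toxicologist
The problem I am running into is that some of the groups end with "Expert Evaluator Report" while others also have "Expert Challenge Report". See the sample above for an example.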
Since a record always ends with "Expert Evaluator Report":
from urllib.request import urlopen

with urlopen('https://gist.githubusercontent.com/SirChurchill9999/1baee3adc055d8c1c76ca2f2be417c8f/raw/25a05aad8ee315f20970ac3af20b21bf1231b97b/Sample.xlsx') as f:
    while True:
        try:
            # Every record has at least six lines.
            record = [next(f).decode().strip() for __ in range(6)]
            # If the sixth line is not the closing line, read one more line.
            if record[-1] != 'Expert Evaluator Report':
                record.append(next(f).decode().strip())
            print(record)
        except StopIteration:
            # End of file: leave the loop instead of spinning forever.
            break
Result:
['Leanne Hazard', '1. Leanne Hazard', 'Expert Witness\xa0AR', 'Position:\xa0Forensic Toxicologist', 'Areas of Expertise:\xa0Medical & Surgical - Toxicology; Employment & Vocational - Forensics', 'Expert Evaluator Report']
['James M. Pape, M.D.', '2.James M. Pape, M.D.', 'Expert Witness\xa0AR', 'Position:\xa0Orthopedic Surgeon', 'Areas of Expertise:\xa0Medical & Surgical - Surgery -- General; Medical & Surgical - Orthopedics', 'Expert Evaluator Report']
['Wayne E. Williams', '3. Wayne E. Williams', 'Expert Witness\xa0AR', 'Position:\xa0Forestry Expert', 'Areas of Expertise:\xa0Environmental - Trees', 'Expert Challenge Report', 'Expert Evaluator Report']
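If the number of lines per record might vary by more than one, a variation on the same idea is to split on the closing line rather than count lines. The following is only a sketch under that assumption: group_records and END_MARKER are names made up for this example, the sample list is hard-coded from the data above so it runs without a network connection, and the non-breaking spaces (U+00A0) are replaced with ordinary spaces along the way.

```python
import csv
import io

END_MARKER = 'Expert Evaluator Report'  # assumed to close every record


def group_records(lines, end_marker=END_MARKER):
    """Yield one list of fields per record, splitting after each end marker."""
    record = []
    for line in lines:
        # Replace non-breaking spaces with ordinary spaces and trim the line.
        text = line.replace('\xa0', ' ').strip()
        if not text:
            continue  # ignore blank lines
        record.append(text)
        if text == end_marker:
            yield record
            record = []
    if record:
        yield record  # trailing lines without a closing marker, if any


# Hard-coded sample taken from the data above (two of the three records).
sample = [
    'Leanne Hazard',
    '1. Leanne Hazard',
    'Expert Witness\xa0AR',
    'Position:\xa0Forensic Toxicologist',
    'Areas of Expertise:\xa0Medical & Surgical - Toxicology; Employment & Vocational - Forensics',
    'Expert Evaluator Report',
    'Wayne E. Williams',
    '3. Wayne E. Williams',
    'Expert Witness\xa0AR',
    'Position:\xa0Forestry Expert',
    'Areas of Expertise:\xa0Environmental - Trees',
    'Expert Challenge Report',
    'Expert Evaluator Report',
]

rows = list(group_records(sample))

# Write one record per row; each field becomes its own column.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Rows that include "Expert Challenge Report" come out one column longer than the others. If the columns need to line up exactly, one option is to insert an empty cell before the last field of the shorter rows before writing them.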
Your file has the extension .xlsx, but it appears to be a regular text file?