Splitting a single column of data into multiple columns on a single row



https://gist.github.com/SirChurchill9999/1baee3adc055d8c1c76ca2f2be417c8f

Leanne Hazard
1. Leanne Hazard
Expert Witness AR
Position: Forensic Toxicologist
Areas of Expertise: Medical & Surgical - Toxicology; Employment & Vocational - Forensics
Expert Evaluator Report
James M. Pape, M.D.
2.James M. Pape, M.D.
Expert Witness AR
Position: Orthopedic Surgeon
Areas of Expertise: Medical & Surgical - Surgery -- General; Medical & Surgical - Orthopedics
Expert Evaluator Report
Wayne E. Williams
3. Wayne E. Williams
Expert Witness AR
Position: Forestry Expert
Areas of Expertise: Environmental - Trees
Expert Challenge Report 
Expert Evaluator Report

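I am trying to move each group of items into separate columns, with every item in a group lined up on the same row as the person's name. For example: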

Leanne Hazard ..... 1. Leanne Hazard ..... Expert Witness AR ..... Position: Forensic Toxicologist

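The problem I am running into is that some of the groups end with "Expert Evaluator Report" while others end with "Expert Challenge Report". See the sample data above for examples.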

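Since every record ends with "Expert Evaluator Report", you can read six lines at a time and fetch one more line whenever the sixth is not that terminator: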

    from urllib.request import urlopen

    URL = 'https://gist.githubusercontent.com/SirChurchill9999/1baee3adc055d8c1c76ca2f2be417c8f/raw/25a05aad8ee315f20970ac3af20b21bf1231b97b/Sample.xlsx'

    with urlopen(URL) as f:
        while True:
            try:
                # A record is normally six lines long.
                record = [next(f).decode().strip() for __ in range(6)]
                # Some records carry an extra "Expert Challenge Report" line.
                if record[-1] != 'Expert Evaluator Report':
                    record.append(next(f).decode().strip())
                print(record)
            except StopIteration:
                # End of input: leave the loop instead of spinning forever.
                break

Result:

['Leanne Hazard', '1. Leanne Hazard', 'Expert Witness\xa0AR', 'Position:\xa0Forensic Toxicologist', 'Areas of Expertise:\xa0Medical & Surgical - Toxicology; Employment & Vocational - Forensics', 'Expert Evaluator Report']
['James M. Pape, M.D.', '2.James M. Pape, M.D.', 'Expert Witness\xa0AR', 'Position:\xa0Orthopedic Surgeon', 'Areas of Expertise:\xa0Medical & Surgical - Surgery -- General; Medical & Surgical - Orthopedics', 'Expert Evaluator Report']
['Wayne E. Williams', '3. Wayne E. Williams', 'Expert Witness\xa0AR', 'Position:\xa0Forestry Expert', 'Areas of Expertise:\xa0Environmental - Trees', 'Expert Challenge Report', 'Expert Evaluator Report']
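
If counting lines feels fragile, a possible variation is to split on the terminator line itself and then pad the shorter records so every row has the same number of columns. This is only a sketch: `group_records`, `TERMINATOR` and the inline `sample` list are names made up for illustration, and it assumes that every record really does end with "Expert Evaluator Report". It also swaps the non-breaking spaces (`\xa0`) for ordinary spaces.

```python
import csv
import io

TERMINATOR = "Expert Evaluator Report"  # assumption: closes every record


def group_records(lines, terminator=TERMINATOR):
    """Yield one list of fields per record, cutting after each terminator."""
    record = []
    for raw in lines:
        # Turn non-breaking spaces into plain spaces and trim whitespace.
        field = raw.replace("\xa0", " ").strip()
        if not field:
            continue  # ignore blank lines
        record.append(field)
        if field == terminator:
            yield record
            record = []
    if record:
        yield record  # leftover lines that never reached a terminator


# Two records copied from the sample data: one short, one with the extra line.
sample = [
    "Leanne Hazard",
    "1. Leanne Hazard",
    "Expert Witness\xa0AR",
    "Position:\xa0Forensic Toxicologist",
    "Areas of Expertise:\xa0Medical & Surgical - Toxicology; Employment & Vocational - Forensics",
    "Expert Evaluator Report",
    "Wayne E. Williams",
    "3. Wayne E. Williams",
    "Expert Witness\xa0AR",
    "Position:\xa0Forestry Expert",
    "Areas of Expertise:\xa0Environmental - Trees",
    "Expert Challenge Report",
    "Expert Evaluator Report",
]

rows = list(group_records(sample))
width = max(len(r) for r in rows)

# Insert empty cells before the last field so the final column lines up.
padded = [r[:-1] + [""] * (width - len(r)) + r[-1:] for r in rows]

buffer = io.StringIO()
csv.writer(buffer).writerows(padded)
print(buffer.getvalue())
```

Padding before the last field keeps "Expert Evaluator Report" in the final column, so the optional "Expert Challenge Report" gets a column of its own that is simply empty for most people.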

Your file has an .xlsx extension, but it appears to be a regular text file?
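
One way to check this, as a rough sketch: a genuine .xlsx workbook is a ZIP archive, so the standard library's `zipfile.is_zipfile` can tell it apart from plain text. The helper name `looks_like_xlsx` and the in-memory test data are made up for illustration, and passing the check only shows that the bytes are a ZIP archive, not that they hold a valid workbook.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive (necessary for .xlsx, not sufficient)."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Plain text, like the file in the question.
plain = b"Leanne Hazard\n1. Leanne Hazard\nExpert Witness AR\n"

# A tiny in-memory ZIP archive standing in for a real workbook.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(looks_like_xlsx(plain))
print(looks_like_xlsx(archive.getvalue()))
```

If the check fails, the file can be read line by line as text, as in the answer above, rather than through a spreadsheet library.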
