I have a pandas df that looks like this:
ID EpisodeID Origin Destination
1 1 A B
1 2 B A
2 1 C D
2 2 D E
2 3 E C
3 1 A D
3 2 D A
I want to produce a txt file using this df as the source, so I use code like this:
with open("output.txt", "w+") as f:
    for index, row in df.iterrows():
        # one <person> block is written per row, which is the problem
        f.write(' <person id="%s">\n' % (row['ID']))
        f.write('  <activity O="%s" D="%s">\n' % (row['Origin'], row['Destination']))
        f.write('  </activity>\n')
        f.write(' </person>\n')
The output looks something like:
<person id="1">
<activity O="A" D="B">
</activity>
</person>
<person id="1">
<activity O="B" D="A">
</activity>
</person>
However, that is not what I want. How can I iterate or write the code so that the output looks like this instead:
<person id="1">
<activity O="A" D="B">
</activity>
<activity O="B" D="A">
</activity>
</person>
<person id="2">
<activity O="C" D="D">
</activity>
<activity O="D" D="E">
</activity>
<activity O="E" D="C">
</activity>
</person>
So I'm trying to do something once per ID rather than once per index (if that makes sense).
Please help :(
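Write a nested loop: first group by the ID column and write a person tag for each group, then, within each group, loop over the rows and write an activity tag for each one: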
with open("output.txt", "w+") as f:
    for _id, g in df.groupby('ID'):
        f.write(f' <person id ="{_id}">\n')
        for t in g.itertuples():  # use itertuples since it's faster than iterrows
            f.write(f'  <activity O="{t.Origin}" D="{t.Destination}">\n')
            f.write("  </activity>\n")
        f.write(" </person>\n")
Output:
with open("output.txt", "r") as f:
    print(''.join(f.readlines()))
<person id ="1">
<activity O="A" D="B">
</activity>
<activity O="B" D="A">
</activity>
</person>
<person id ="2">
<activity O="C" D="D">
</activity>
<activity O="D" D="E">
</activity>
<activity O="E" D="C">
</activity>
</person>
<person id ="3">
<activity O="A" D="D">
</activity>
<activity O="D" D="A">
</activity>
</person>
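
For anyone who wants to reproduce the grouping end to end, here is a minimal self-contained sketch built on the sample data from the question. A few things in it are my own assumptions rather than part of the code above: `write_persons` is a hypothetical helper name, it writes to an in-memory `io.StringIO` buffer instead of `output.txt`, it passes `sort=False` to `groupby` so IDs keep their order of first appearance, and it uses the standard library's `xml.sax.saxutils.quoteattr` to quote attribute values in case they ever contain characters such as `&` or `"`.

```python
import io
from xml.sax.saxutils import quoteattr

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "EpisodeID": [1, 2, 1, 2, 3, 1, 2],
    "Origin": ["A", "B", "C", "D", "E", "A", "D"],
    "Destination": ["B", "A", "D", "E", "C", "D", "A"],
})


def write_persons(frame, out):
    """Write one <person> block per ID, with one <activity> per row.

    Hypothetical helper for illustration; `out` is any object with .write().
    """
    # sort=False keeps the IDs in order of first appearance (an assumption
    # about what is wanted; the default would sort the group keys).
    for person_id, group in frame.groupby("ID", sort=False):
        out.write(f"<person id={quoteattr(str(person_id))}>\n")
        for row in group.itertuples(index=False):
            # quoteattr adds the surrounding quotes and escapes special characters.
            out.write(
                f"  <activity O={quoteattr(row.Origin)} D={quoteattr(row.Destination)}>\n"
            )
            out.write("  </activity>\n")
        out.write("</person>\n")


buffer = io.StringIO()
write_persons(df, buffer)
xml_text = buffer.getvalue()
print(xml_text)
```

To write to a file instead, the same helper can be given a file handle, e.g. `with open("output.txt", "w") as f: write_persons(df, f)`.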