How to create a column of repeated-value counts, in dictionary format, from a pandas DataFrame

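I'm confused about how to do this (I'm still a beginner). I need to convert this DataFrame into a dictionary that includes a column with the number of times each value is repeated: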

import pandas as pd
df = pd.DataFrame({'Name': [['John', 'hock'], ['John','pepe'],['Peter', 'wdw'],['Peter'],['John'], ['Stef'], ['John']],
'Age': [38, 47, 63, 28, 33, 45, 66]
})

I need something like this:

Name  Age  Repeated
John   38         4

Thanks!

I can think of something like this:

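resultDict = {}
for _, row in df.iterrows():
    # each cell in "Name" holds a list, so count every name inside it
    for value in row["Name"]:
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
resultDict  # in a notebook or REPL this displays the dict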

Output

{'John': 4, 'Peter': 2, 'Stef': 1, 'hock': 1, 'pepe': 1, 'wdw': 1}
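The loop above does by hand what the standard library's collections.Counter does. A minimal sketch, assuming every cell in Name is a list as in the question's example, flattens the lists with a generator and lets Counter do the counting:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'],
                            ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})

# The generator yields each name from each row's list; Counter tallies them.
# This assumes every cell is a list (a bare string would be counted per character).
counts = Counter(name for names in df['Name'] for name in names)

# Convert to a plain dict to match the format asked for in the question
result = dict(counts)
print(result)
```

Counter keeps first-seen order, so the keys come out in the order the names first appear in the rows rather than alphabetically.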

If you want it as a DataFrame instead of a dictionary:

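resultDict = {}
for _, row in df.iterrows():
    # each cell in "Name" holds a list, so count every name inside it
    for value in row["Name"]:
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
# list() turns the dict views into plain lists for the DataFrame constructor
pd.DataFrame({"Name": list(resultDict.keys()), "Repeated": list(resultDict.values())})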

Output

    Name  Repeated
0   John         4
1   hock         1
2   pepe         1
3  Peter         2
4    wdw         1
5   Stef         1
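If the dictionary from the question is the goal, a shorter sketch (not the answerer's code) skips iterrows entirely: Series.explode puts each list element on its own row, and Series.value_counts does the counting. The same counts can also be turned back into a Name/Repeated DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'],
                            ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})

# One row per name; the original index labels are repeated for split lists
names = df['Name'].explode()

# value_counts orders by count (descending); to_dict gives {name: count}
result = names.value_counts().to_dict()
print(result)

# Same counts as a two-column DataFrame
counts_df = names.value_counts().rename_axis('Name').reset_index(name='Repeated')
print(counts_df)
```

Because value_counts sorts by frequency, the most repeated name (John) comes first; the order among names with equal counts should not be relied on.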

Use DataFrame.explode together with GroupBy.size:

df = df.explode('Name').groupby(['Name']).size().reset_index(name='Repeated')
print(df)

Output

    Name  Repeated
0   John         4
1  Peter         2
2   Stef         1
3   hock         1
4   pepe         1
5    wdw         1
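The desired output in the question also has an Age column, which neither answer produces. A hedged sketch extending the explode approach with named aggregation is shown below. It assumes that the first Age seen for each name is the one to keep; that matches the sample row (John, 38) but is a guess, since a name can appear in several rows with different ages:

```python
import pandas as pd

df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'],
                            ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})

# Each exploded row keeps the Age of the row it came from
exploded = df.explode('Name')

# Assumption: keep the first Age per name; count rows per name with 'size'.
# sort=False keeps names in order of first appearance.
out = (exploded.groupby('Name', sort=False)
       .agg(Age=('Age', 'first'), Repeated=('Age', 'size'))
       .reset_index())
print(out)
```

Swapping 'first' for another aggregation such as 'max' or 'mean' changes which Age is reported. Only the Repeated column is fixed by the data.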
