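I'm confused about how to do this (I'm still a beginner). I need to convert this DataFrame into a dictionary with a column counting how many times each value is repeated: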
import pandas as pd
df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'], ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})
I need something like this:
Name  Age  Repeated
John 38 4
Thanks!
I can think of something like this:
# Count how many times each name appears across all the lists in the "Name" column
resultDict = {}
for index, row in df.iterrows():
    for value in row["Name"]:
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
resultDict
Output
{'John': 4, 'Peter': 2, 'Stef': 1, 'hock': 1, 'pepe': 1, 'wdw': 1}
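The same counts can also be produced without an explicit loop by using the standard library's `collections.Counter`. This is a swapped-in alternative, not the approach in the answer above; it assumes each cell of `Name` is a list, as in the question, and uses `itertools.chain.from_iterable` to flatten those lists before counting:

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'], ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})

# Flatten the per-row lists into one stream of names, then count each name.
counts = Counter(chain.from_iterable(df['Name']))

# Counter is a dict subclass; dict() turns it into a plain dictionary like resultDict.
result_dict = dict(counts)
print(result_dict)
```

Printed with plain `print`, the keys come out in first-seen order rather than sorted, but the counts match the loop version.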
If you want it as a DataFrame instead of a dictionary:
resultDict = {}
for index, row in df.iterrows():
    for value in row["Name"]:
        if value not in resultDict:
            resultDict[value] = 0
        resultDict[value] += 1
# Build a two-column DataFrame from the dictionary's keys and values
pd.DataFrame({"Name": list(resultDict.keys()), "Repeated": list(resultDict.values())})
Output
Name | Repeated |
---|---|
John | 4 |
hock | 1 |
pepe | 1 |
Peter | 2 |
wdw | 1 |
Stef | 1 |
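As a sketch of another way to go from the counts dictionary to the same two-column DataFrame, you can let `pd.Series` take the dictionary directly and then move its index into a column. The dictionary below is hard-coded as a stand-in for `resultDict`; this is an alternative to the answer's constructor call, not a replacement for it:

```python
import pandas as pd

# Stand-in for resultDict from the loop above (same counts, first-seen order).
result_dict = {'John': 4, 'hock': 1, 'pepe': 1, 'Peter': 2, 'wdw': 1, 'Stef': 1}

# The Series index holds the names and its values hold the counts.
# rename_axis names the index "Name", so reset_index turns it into a "Name"
# column, and name="Repeated" labels the column built from the values.
repeated = (pd.Series(result_dict)
            .rename_axis('Name')
            .reset_index(name='Repeated'))
print(repeated)
```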
Use DataFrame.explode with GroupBy.size:
df = df.explode('Name').groupby(['Name']).size().reset_index(name='Repeated')
print(df)
Name Repeated
0 John 4
1 Peter 2
2 Stef 1
3 hock 1
4 pepe 1
5 wdw 1
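The question's desired output (`John 38 4`) also has an `Age` column. Assuming `Age` should be the age from the first row in which each name appears (an interpretation, not something the question states), the `explode` approach can be extended with named aggregation in `groupby(...).agg(...)`, which needs pandas 0.25 or later:

```python
import pandas as pd

df = pd.DataFrame({'Name': [['John', 'hock'], ['John', 'pepe'], ['Peter', 'wdw'], ['Peter'], ['John'], ['Stef'], ['John']],
                   'Age': [38, 47, 63, 28, 33, 45, 66]})

# One row per name; each exploded row keeps the Age of the row it came from.
exploded = df.explode('Name')

# Named aggregation: the first Age seen for each name (assumed meaning of "Age"),
# plus the number of rows per name as "Repeated".
out = (exploded.groupby('Name')
       .agg(Age=('Age', 'first'), Repeated=('Age', 'size'))
       .reset_index())
print(out)
```

Because `groupby` sorts the keys by default, the rows come out in the same order as in the answer above, with an extra `Age` column.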