How do I apply a map to a DataFrame based on the value of a field?



I have a script in which I loop over the values of one field of a DataFrame.

Something like this:
import pandas as pd
import numpy as np

data = {
    "thevalue": [0, 0, 1, 2, 2, 3, 5, 5, 5],
    "firstname": ["Sally", "Mary", "John", "Peter", "Julius", "Cornelius", "Athos", "Porthos", "Aramis"],
    "age": [50, 40, 30, 20, 10, 20, 11, 12, 23]
}
df = pd.DataFrame(data)
print(df)
print(max(df['thevalue']))
limi = max(df['thevalue'])
print("=============")

def get_result(df, f):
    # Select the rows whose thevalue equals f
    n_df = df.query('thevalue==@f')
    print(n_df)
    suma = sum(n_df['age'])
    # No rows for this value: there is no average to report
    if n_df.empty:
        return np.nan
    ave = suma / len(n_df['age'])
    return ave

lista = []
for f in range(limi + 1):   #<---replace from here
    print(f)
    #print(df.query('thevalue ==@f'))
    res = get_result(df, f)
    lista.append(res)
print(lista)

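I want to replace the final for loop with map. If I were applying a map to every row of the DataFrame this would not be a problem, but how do I apply it in blocks based on thevalue?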

Edit: the output of the first script (the one with the loop) is:

   thevalue  firstname  age
0         0      Sally   50
1         0       Mary   40
2         1       John   30
3         2      Peter   20
4         2     Julius   10
5         3  Cornelius   20
6         5      Athos   11
7         5    Porthos   12
8         5     Aramis   23
5
=============
0
   thevalue firstname  age
0         0     Sally   50
1         0      Mary   40
1
   thevalue firstname  age
2         1      John   30
2
   thevalue firstname  age
3         2     Peter   20
4         2    Julius   10
3
   thevalue  firstname  age
5         3  Cornelius   20
4
Empty DataFrame
Columns: [thevalue, firstname, age]
Index: []
5
   thevalue firstname  age
6         5     Athos   11
7         5   Porthos   12
8         5    Aramis   23
[45.0, 30.0, 15.0, 20.0, nan, 15.333333333333334]

I would like the same output, but using map: that is, the final list [45.0, 30.0, 15.0, 20.0, nan, 15.333333333333334] and, if possible, the intermediate printouts as well, like this:

0
   thevalue firstname  age
0         0     Sally   50
1         0      Mary   40

You can split the DataFrame into groups with the following code:

g = df.groupby('thevalue')
[g.get_group(x) for x in g.groups]
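
To make the behaviour of the grouping concrete, here is a minimal sketch using the data from the question. The names present_keys and group_sizes are only illustrative. It shows that groupby produces one group per value that actually occurs, so there is no group for 4:

```python
import pandas as pd

df = pd.DataFrame({
    "thevalue": [0, 0, 1, 2, 2, 3, 5, 5, 5],
    "firstname": ["Sally", "Mary", "John", "Peter", "Julius",
                  "Cornelius", "Athos", "Porthos", "Aramis"],
    "age": [50, 40, 30, 20, 10, 20, 11, 12, 23],
})

g = df.groupby("thevalue")

# Iterating over a GroupBy yields (key, sub-frame) pairs,
# one pair per distinct value found in the column.
present_keys = [key for key, _ in g]
group_sizes = {key: len(frame) for key, frame in g}

print(present_keys)  # no entry for 4, since no row has thevalue == 4
print(group_sizes)
```

This is why the missing value has to be handled separately if you want an entry for every number in the range.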

Let's use this groupby approach to get the desired output:

g = df.groupby('thevalue')
range_v = range(df['thevalue'].min(), df['thevalue'].max() + 1)
[(x, g.get_group(x)) if x in g.groups else (x, pd.DataFrame(columns=df.columns)) for x in range_v]

Result:

[(0,
   thevalue firstname  age
0         0     Sally   50
1         0      Mary   40),
(1,
   thevalue firstname  age
2         1      John   30),
(2,
   thevalue firstname  age
3         2     Peter   20
4         2    Julius   10),
(3,
   thevalue  firstname  age
5         3  Cornelius   20),
(4,
Empty DataFrame
Columns: [thevalue, firstname, age]
Index: []),
(5,
   thevalue firstname  age
6         5     Athos   11
7         5   Porthos   12
8         5    Aramis   23)]

I returned the results as tuples, but if you want a different type (a list or a dictionary), adapt the code accordingly.
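
The question also asks for the final list of averages. One possible way to get it with map, building on the same groupby, is sketched below. The helper mean_age and the name lista_reindexed are hypothetical names introduced here, and the range starts at the column minimum (which is 0 for this data, the same as in the question's loop):

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "thevalue": [0, 0, 1, 2, 2, 3, 5, 5, 5],
    "firstname": ["Sally", "Mary", "John", "Peter", "Julius",
                  "Cornelius", "Athos", "Porthos", "Aramis"],
    "age": [50, 40, 30, 20, 10, 20, 11, 12, 23],
})

g = df.groupby("thevalue")
range_v = range(df["thevalue"].min(), df["thevalue"].max() + 1)


def mean_age(x):
    """Mean age of group x, or NaN if no row has that value (hypothetical helper)."""
    if x not in g.groups:
        return np.nan
    return float(g.get_group(x)["age"].mean())


# map applies the helper once per value in the range, replacing the for loop.
lista = list(map(mean_age, range_v))
print(lista)

# Alternative without an explicit map: compute every group mean at once,
# then reindex so that values with no rows show up as NaN.
lista_reindexed = g["age"].mean().reindex(range_v).tolist()
print(lista_reindexed)
```

Both lists should match the one produced by the original loop. If you also want the intermediate printouts, you could add print(x) and a print of the group inside mean_age.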
