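I am new to Python and pandas. I have a CSV file of dates and temperature values covering more than 30 years, and for my purposes I compute a quantile for each group, grouped by month.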
import pandas as pd
input_file = pd.read_csv("DailyMax_lat-14_lon35.csv")
sQ = input_file.groupby(['Month']).quantile(0.5)['TmaxScaled']
print(sQ)
Month
1 297.977336
2 298.190348
3 298.433919
4 298.580322
5 298.221629
6 296.736598
7 296.463704
8 298.701436
9 302.380452
10 304.102163
11 303.562688
12 299.231298
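Now I would like to split the DataFrame into two DataFrames: one holding the values below the computed quantile and one holding the values above it.
Can you help me?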
print(input_file) gives:
Year Month Day TmaxScaled
0 1980 1 3 296.941457
1 1980 1 4 296.978455
2 1980 1 5 296.654368
3 1980 1 6 296.732218
4 1980 1 7 297.468730
5 1980 1 8 298.330566
6 1980 1 9 297.844157
7 1980 1 10 297.228007
8 1980 1 11 296.916066
9 1980 1 12 297.247884
10 1980 1 13 297.851888
11 1980 1 14 298.854523
You may need transform (df here is the DataFrame read from the CSV):
df['New']=df.groupby(['Month'])['TmaxScaled'].transform(lambda x :x.quantile(0.5) )
df1,df2=df.loc[df.TmaxScaled>df.New],df.loc[df.TmaxScaled<=df.New]
df1
Out[43]:
Year Month Day TmaxScaled New
4 1980 1 7 297.468730 297.237946
5 1980 1 8 298.330566 297.237946
6 1980 1 9 297.844157 297.237946
9 1980 1 12 297.247884 297.237946
10 1980 1 13 297.851888 297.237946
11 1980 1 14 298.854523 297.237946
df2
Out[44]:
Year Month Day TmaxScaled New
0 1980 1 3 296.941457 297.237946
1 1980 1 4 296.978455 297.237946
2 1980 1 5 296.654368 297.237946
3 1980 1 6 296.732218 297.237946
7 1980 1 10 297.228007 297.237946
8 1980 1 11 296.916066 297.237946
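Because transform returns a result aligned with the original rows, the same comparison should carry over to data with several months. Here is a minimal sketch, assuming a small made-up two-month sample (the values and the names above and at_or_below are illustrative, not taken from the question's file):

```python
import pandas as pd

# Hypothetical sample with two months; values are made up for illustration.
df = pd.DataFrame({
    "Year": [1980] * 6,
    "Month": [1, 1, 1, 2, 2, 2],
    "Day": [1, 2, 3, 1, 2, 3],
    "TmaxScaled": [296.0, 297.0, 298.0, 300.0, 301.0, 305.0],
})

# Broadcast each month's 0.5 quantile back onto the rows of that month.
median = df.groupby("Month")["TmaxScaled"].transform(lambda x: x.quantile(0.5))

# Each row is compared against the quantile of its own month.
above = df[df["TmaxScaled"] > median]
at_or_below = df[df["TmaxScaled"] <= median]

print(above)
print(at_or_below)
```

Keeping the quantile in a separate Series instead of a new column leaves df unchanged, which may be convenient if the helper column is not needed afterwards.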
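Compute sQ as you already did (note that .item() only works when the result holds a single value, as it does for this one-month sample):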
sQ = df.groupby(['Month']).quantile(0.5)['TmaxScaled'].item()
print(sQ)
297.2379455
Now perform a groupby and split df into groups:
df_dict = {k : g for k, g in df.groupby(df.TmaxScaled > sQ)}
df_dict[True] returns the DataFrame whose values exceed the quantile, and df_dict[False] returns the remaining rows:
df_dict[True]
Year Month Day TmaxScaled
4 1980 1 7 297.468730
5 1980 1 8 298.330566
6 1980 1 9 297.844157
9 1980 1 12 297.247884
10 1980 1 13 297.851888
11 1980 1 14 298.854523
df_dict[False]
Year Month Day TmaxScaled
0 1980 1 3 296.941457
1 1980 1 4 296.978455
2 1980 1 5 296.654368
3 1980 1 6 296.732218
7 1980 1 10 297.228007
8 1980 1 11 296.916066
Note that this preserves the order of the rows within each group.
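The snippet above relies on a single scalar quantile, which fits the one-month sample. One possible way to extend the same dictionary approach to several months is to look up each row's monthly quantile with map before grouping. This is a sketch under assumptions: the sample values are made up, and is_above is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample with two months; values are made up for illustration.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "TmaxScaled": [296.0, 297.0, 298.0, 300.0, 301.0, 305.0],
})

# One quantile per month: a Series indexed by Month.
sQ = df.groupby("Month")["TmaxScaled"].quantile(0.5)

# Look up the quantile of each row's month, then compare row by row.
is_above = df["TmaxScaled"] > df["Month"].map(sQ)

# Keys are False (at or below the monthly quantile) and True (above it).
df_dict = {k: g for k, g in df.groupby(is_above)}

print(df_dict[True])
print(df_dict[False])
```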