GroupBy and compare elements against the group's quantile in Python



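I am new to Python and Pandas. I have a CSV of dates and temperature values covering more than 30 years, and for my purposes I compute the quantile of each group, grouped by month.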

import pandas as pd

input_file = pd.read_csv("DailyMax_lat-14_lon35.csv")
sQ = input_file.groupby(['Month']).quantile(0.5)['TmaxScaled']
print(sQ)
Month
1     297.977336
2     298.190348
3     298.433919
4     298.580322
5     298.221629
6     296.736598
7     296.463704
8     298.701436
9     302.380452
10    304.102163
11    303.562688
12    299.231298

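Now I want to split the DataFrame into two DataFrames: one with the values below the computed quantile for their month, and one with the values above it.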

Can you help me?

print(input_file) gives:
Year  Month  Day  TmaxScaled
0      1980      1    3  296.941457
1      1980      1    4  296.978455
2      1980      1    5  296.654368
3      1980      1    6  296.732218
4      1980      1    7  297.468730
5      1980      1    8  298.330566
6      1980      1    9  297.844157
7      1980      1   10  297.228007
8      1980      1   11  296.916066
9      1980      1   12  297.247884
10     1980      1   13  297.851888
11     1980      1   14  298.854523

You may need transform (df here is the DataFrame read from the CSV, called input_file in the question):

df['New'] = df.groupby(['Month'])['TmaxScaled'].transform(lambda x: x.quantile(0.5))
df1, df2 = df.loc[df.TmaxScaled > df.New], df.loc[df.TmaxScaled <= df.New]
df1
Out[43]: 
Year  Month  Day  TmaxScaled         New
4   1980      1    7  297.468730  297.237946
5   1980      1    8  298.330566  297.237946
6   1980      1    9  297.844157  297.237946
9   1980      1   12  297.247884  297.237946
10  1980      1   13  297.851888  297.237946
11  1980      1   14  298.854523  297.237946
df2
Out[44]: 
Year  Month  Day  TmaxScaled         New
0  1980      1    3  296.941457  297.237946
1  1980      1    4  296.978455  297.237946
2  1980      1    5  296.654368  297.237946
3  1980      1    6  296.732218  297.237946
7  1980      1   10  297.228007  297.237946
8  1980      1   11  296.916066  297.237946
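
For data that spans several months, the same transform idea can be sketched as below. The small DataFrame is made up for illustration (it is not the asker's CSV), and the names `q`, `above`, and `below` are hypothetical. The point is that `transform` returns one value per row, aligned with the original index, so each row is compared against the quantile of its own month.

```python
import pandas as pd

# Made-up sample with two months (illustrative values only)
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 2, 2, 2, 2],
    "TmaxScaled": [296.0, 297.0, 298.0, 299.0, 300.0, 301.0, 302.0, 303.0],
})

# Broadcast each month's 0.5 quantile back onto every row of that month
q = df.groupby("Month")["TmaxScaled"].transform(lambda x: x.quantile(0.5))

# Rows strictly above their month's quantile, and the remaining rows
above = df[df["TmaxScaled"] > q]
below = df[df["TmaxScaled"] <= q]

print(above)
print(below)
```

Keeping the quantile in a separate Series rather than a new column leaves the original DataFrame unchanged; storing it as a column, as in the answer above, works just as well.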

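Compute sQ as you already did. The sample data shown contains only one month, so .item() can extract the single quantile as a scalar; with more than one month, .item() raises a ValueError: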

sQ = df.groupby(['Month']).quantile(0.5)['TmaxScaled'].item()
print(sQ)
297.2379455

Now perform a groupby and split df into groups:

df_dict = {k : g for k, g in df.groupby(df.TmaxScaled > sQ)}

df_dict[True] returns the DataFrame whose values exceed the quantile, and df_dict[False] returns the rest:

df_dict[True]
Year  Month  Day  TmaxScaled
4   1980      1    7  297.468730
5   1980      1    8  298.330566
6   1980      1    9  297.844157
9   1980      1   12  297.247884
10  1980      1   13  297.851888
11  1980      1   14  298.854523

df_dict[False]    
Year  Month  Day  TmaxScaled
0  1980      1    3  296.941457
1  1980      1    4  296.978455
2  1980      1    5  296.654368
3  1980      1    6  296.732218
7  1980      1   10  297.228007
8  1980      1   11  296.916066

Note that this preserves the order within each group.
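
Because `.item()` only works when the grouped result holds a single value, a dataset with several months needs the per-month quantiles aligned to the rows before comparing. A minimal sketch, assuming made-up sample data and using `Series.map` to look up each row's monthly quantile (the name `per_row_q` is illustrative and not part of the original answer):

```python
import pandas as pd

# Made-up sample with two months (illustrative values only)
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "TmaxScaled": [296.5, 297.5, 298.5, 300.5, 301.5, 302.5],
})

# One quantile per month, as a Series indexed by Month
sQ = df.groupby("Month")["TmaxScaled"].quantile(0.5)

# Look up the quantile that belongs to each row's month
per_row_q = df["Month"].map(sQ)

# Group on the boolean mask; the resulting keys are False and True
df_dict = {k: g for k, g in df.groupby(df["TmaxScaled"] > per_row_q)}

print(df_dict[True])
print(df_dict[False])
```

If every value happens to fall on one side of the quantile, only one of the two keys exists, so `df_dict.get(True)` may be the safer lookup.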
