所以我试图应用不同的条件,这取决于一个日期,月是具体的。例如,对于1月,替换TEMP中高于45的数据,但对于2月,则替换高于30的数据,以此类推。我已经用下面的代码这样做了,但问题是上个月的数据被替换为nan。
这是我的代码:
meses = ["01", "02"]
for i in var_vars:
if i in dataframes2.columns.values:
for j in range(len(meses)):
test_prueba_mes = dataframes2[i].loc[dataframes2['fecha'].dt.month == int(meses[j])]
test_prueba = test_prueba_mes[dataframes2[i]<dataframes.loc[i]["X"+meses[j]+".max"]]
dataframes2["Prueba " + str(i)] = test_prueba
输出:dataframes2.tail(5)
fecha TEMP_C_Avg RH_Avg Prueba TEMP_C_Avg Prueba RH_Avg
21 2020-01-01 22:00:00 46.0 103 NaN NaN
22 2020-01-01 23:00:00 29.0 103 NaN NaN
23 2020-01-02 00:00:00 31.0 3 NaN NaN
24 2020-01-02 12:00:00 31.0 2 NaN NaN
25 2020-02-01 10:00:00 29.0 5 29.0 5.0
我希望的输出是:
输出:fecha TEMP_C_Avg RH_Avg Prueba TEMP_C_Avg Prueba RH_Avg
21 2020-01-01 22:00:00 46.0 103 NaN NaN
22 2020-01-01 23:00:00 29.0 103 29.0 NaN
23 2020-01-02 00:00:00 31.0 3 31.0 3.0
24 2020-01-02 12:00:00 31.0 2 31.0 2.0
25 2020-02-01 10:00:00 29.0 5 29.0 5.0
如果有人能帮助我,我很感激。
更新:6个月的规则集为1月45日,2月30日,3月45日,4月10日,5月15日,6月30日
数据示例:
fecha TEMP_C_Avg RH_Avg
25 2020-02-01 10:00:00 29.0 5
26 2020-02-01 11:00:00 32.0 105
27 2020-03-01 10:00:00 55.0 3
28 2020-03-01 11:00:00 40.0 5
29 2020-04-01 10:00:00 10.0 20
30 2020-04-01 11:00:00 5.0 15
31 2020-05-01 10:00:00 20.0 15
32 2020-05-01 11:00:00 5.0 106
33 2020-06-01 10:00:00 33.0 107
34 2020-06-01 11:00:00 20.0 20
清晰理解
- 已将月限制编码为
dict
限制 - 使用numpy
select()
,当条件匹配时取第二个参数条件对应的值。默认为第三个参数 - 从限制动态构建条件
dict
- 第二个参数需要与条件
list
相同的长度。构建np.nan
作为list
理解列表,使其长度正确 - 考虑所有列,使用
dict
推导式将**kwarg
参数构建为assign()
df = pd.read_csv(io.StringIO(""" fecha TEMP_C_Avg RH_Avg
25 2020-02-01 10:00:00 29.0 5
26 2020-02-01 11:00:00 32.0 105
27 2020-03-01 10:00:00 55.0 3
28 2020-03-01 11:00:00 40.0 5
29 2020-04-01 10:00:00 10.0 20
30 2020-04-01 11:00:00 5.0 15
31 2020-05-01 10:00:00 20.0 15
32 2020-05-01 11:00:00 5.0 106
33 2020-06-01 10:00:00 33.0 107
34 2020-06-01 11:00:00 20.0 20"""), sep="ss+", engine="python")
df.fecha = pd.to_datetime(df.fecha)
# The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
limits = {1:45, 2:30, 3:45, 4:10, 5:15, 6:30}
df = df.assign(**{f"Prueba {c}":np.select( # construct target column name
# build a condition for each of the month limits
[df.fecha.dt.month.eq(m) & df[c].gt(l) for m,l in limits.items()],
[np.nan for m in limits.keys()], # NaN if beyond limit
df[c]) # keep value if within limits
for c in df.columns if "Avg" in c}) # do calc for all columns that have "Avg" in name
功能RH_Avg南3南南南南