Fill NaN with the corresponding row values in Python



I have the following DataFrame:

      Region                   Date         Confirmed   Deaths  Recovered   Latitude    Longitude
0     Mainland China Anhui     2020-01-22   1.0         0.0     0.0         NaN         NaN
1     Mainland China Beijing   2020-01-22   14.0        0.0     0.0         NaN         NaN
2     Mainland China Chongqing 2020-01-22   6.0         0.0     0.0         NaN         NaN
3     Mainland China Fujian    2020-01-22   1.0         0.0     0.0         NaN         NaN
4     Mainland China Gansu     2020-01-22   0.0         0.0     0.0         NaN         NaN
2825  Mainland China Anhui     2020-03-01   990.0       6.0     873.0       31.8257     117.2264
567   Mainland China Anhui     2020-02-05   1.0         0.0     0.0         NaN         NaN
2951  Mainland China Anhui     2020-03-02   990.0       6.0     917.0       31.8257     117.2264
4273  Mainland China Fujian    2020-03-07   296.0       1.0     295.0       26.0789     117.9874
4541  Mainland China Fujian    2020-03-07   296.0       1.0     295.0       26.0789     117.9874

I want to fill the NaN values in Latitude and Longitude with the corresponding values from other rows of the same Region.

I tried:

df = df.groupby(['Region']).ffill()
df

But this only gives me:

        Date        Confirmed   Deaths  Recovered   Latitude    Longitude
0       2020-01-22  1.0         0.0     0.0         NaN         NaN
1       2020-01-22  14.0        0.0     0.0         NaN         NaN
2       2020-01-22  6.0         0.0     0.0         NaN         NaN
3       2020-01-22  1.0         0.0     0.0         NaN         NaN
4       2020-01-22  0.0         0.0     0.0         NaN         NaN

Thanks in advance!

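I would just rely on the fact that max ignores NaN values, so (assuming each region has a single Latitude/Longitude pair) this should be enough: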

df.loc[:,['Latitude', 'Longitude']] = df.groupby('Region')[['Latitude', 'Longitude']].transform('max')
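
As a rough check of the idea above, here is a minimal sketch on a small made-up frame. The shortened region names and the all-NaN "Gansu" rows are illustrative assumptions, not part of the original data. It shows that `transform('max')` broadcasts the one known coordinate of each region back to every row of that region, and that a region with no known coordinate simply stays NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data only: same column names as the question, shortened regions.
df = pd.DataFrame({
    "Region": ["Anhui", "Fujian", "Anhui", "Fujian", "Gansu"],
    "Latitude": [np.nan, np.nan, 31.8257, 26.0789, np.nan],
    "Longitude": [np.nan, np.nan, 117.2264, 117.9874, np.nan],
})

coords = ["Latitude", "Longitude"]

# max skips NaN within each group; transform returns a result aligned
# to the original rows, so it can be assigned straight back.
df[coords] = df.groupby("Region")[coords].transform("max")

print(df)
```

Note that this only works as a fill because every non-NaN value within a region is the same. If a region could carry several different coordinates, max would overwrite the smaller ones.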

You can use backward and forward fill within each group.

# bfill then ffill inside each group, so values never leak across regions
df['Latitude'] = df.groupby('Region')['Latitude'].transform(lambda s: s.bfill().ffill())
df['Longitude'] = df.groupby('Region')['Longitude'].transform(lambda s: s.bfill().ffill())
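
To see the two-direction fill at work, here is a small self-contained sketch. The data and the helper name `fill_within_group` are hypothetical, chosen only for illustration. Running both fills inside `transform` keeps each pass confined to one region: a fill applied to the whole column after the groupby step would no longer know about the groups and could copy a coordinate from a neighbouring region.

```python
import numpy as np
import pandas as pd

# Illustrative data only: NaN appears both before and after the known value.
df = pd.DataFrame({
    "Region": ["Anhui", "Fujian", "Anhui", "Gansu", "Fujian"],
    "Latitude": [np.nan, 26.0789, 31.8257, np.nan, np.nan],
})

def fill_within_group(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill gaps from later rows, then from earlier rows."""
    return s.bfill().ffill()

# transform applies the helper to each region's values separately and
# returns a Series aligned to the original index.
df["Latitude"] = df.groupby("Region")["Latitude"].transform(fill_within_group)

print(df)
```

A region with no known value at all (the made-up "Gansu" row here) is left as NaN rather than borrowing a value from another region.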
