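I have a DataFrame with some NaN values, as shown below, and I want to fill the NaN values in each column with values picked at random from that same column. For example, the NaN values in Col1 should be filled with values randomly picked from Col1.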
Col1 Col2 Col3 Col4 Col5
0 -0.671603 -0.792415 0.783922 NaN Blue
1 0.207720 NaN 0.996131 Tom Yellow
2 -0.892115 -1.282333 NaN Julia NaN
3 -0.315598 -2.371529 -1.959646 NaN Pink
4 NaN NaN -0.584636 NaN Orange
5 0.314736 -0.692732 -0.303951 Jim NaN
6 0.355121 NaN NaN NaN Red
7 NaN -1.900148 1.230828 Sophia NaN
8 -1.795468 0.490953 NaN Anne Blue
9 -0.678491 -0.087815 NaN NaN NaN
10 0.755714 0.550589 -0.702019 NaN Pink
11 0.951908 -0.529933 0.344544 Tobi Yellow
12 NaN 0.075340 -0.187669 Jon Red
13 NaN 0.314342 -0.936066 NaN Yellow
14 NaN 1.293355 0.098964 Peter Orange
Any ideas?
I tried this:
import numpy as np
import pandas as pd
num_nan = df[col_name].isna().sum()
for n in len(range(num_nan)):
    # pick random value from e.g. col1 that's not NaN
    df[col_name] = df[col_name].where((pd.notnull(df)), None).sample(random_state=1)
    # replace NaN-value in e.g. col1 with picked value
    df[col_name] = df.fillna('value')
To replace the NaN values in a column with random picks from the same column, you can try:
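for c in df:
    # Boolean mask of the rows where this column is NaN
    mask = df[c].isna()
    # Draw one value per NaN row (with replacement) from the column's non-NaN values
    df.loc[mask, c] = np.random.choice(df.loc[~mask, c], size=mask.sum())
print(df)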
This prints (for example):
Col1 Col2 Col3 Col4 Col5
0 -0.671603 -0.792415 0.783922 Jon Blue
1 0.207720 -1.900148 0.996131 Tom Yellow
2 -0.892115 -1.282333 -0.702019 Julia Red
3 -0.315598 -2.371529 -1.959646 Tobi Pink
4 -0.892115 0.075340 -0.584636 Jon Orange
5 0.314736 -0.692732 -0.303951 Jim Pink
6 0.355121 -0.792415 0.344544 Tom Red
7 -0.892115 -1.900148 1.230828 Sophia Red
8 -1.795468 0.490953 -0.303951 Anne Blue
9 -0.678491 -0.087815 0.344544 Jon Yellow
10 0.755714 0.550589 -0.702019 Peter Pink
11 0.951908 -0.529933 0.344544 Tobi Yellow
12 -0.678491 0.075340 -0.187669 Jon Red
13 0.951908 0.314342 -0.936066 Julia Yellow
14 -0.892115 1.293355 0.098964 Peter Orange
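If you need reproducible results (the attempt in the question passes random_state=1), the same idea can be wrapped in a small function that takes a seed. This is only a sketch under a few assumptions: the function name fill_nan_from_same_column and its seed parameter are made up for illustration, it draws with NumPy's np.random.default_rng, it returns a copy instead of modifying df in place, and it leaves a column untouched when that column has no non-NaN values to draw from.

```python
import numpy as np
import pandas as pd


def fill_nan_from_same_column(df, seed=None):
    """Return a copy of df where each NaN is replaced by a random
    non-NaN value taken from the same column (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for c in out.columns:
        mask = out[c].isna()
        pool = out.loc[~mask, c].to_numpy()
        # Assumption: skip columns with nothing to fill or nothing to draw from.
        if not mask.any() or pool.size == 0:
            continue
        # Sample with replacement, one draw per missing row.
        out.loc[mask, c] = rng.choice(pool, size=int(mask.sum()))
    return out


# Small made-up frame in the spirit of the one in the question.
df = pd.DataFrame({
    "Col1": [-0.671603, 0.207720, np.nan, -0.315598, np.nan],
    "Col4": [np.nan, "Tom", "Julia", np.nan, "Jim"],
})
filled = fill_nan_from_same_column(df, seed=1)
print(filled)
```

Calling the function twice with the same seed gives the same filled frame, and the original df keeps its NaN values. Because values are drawn with replacement, the same value can be used to fill several rows.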