Split a pandas DataFrame into multiple DataFrames wherever a row is all NaN



I have the following DataFrame:

       a          b           c          d
0   4.65  30.572857  133.899994  23.705000
1   4.77  30.625713  134.690002  23.225000
2   4.73  30.138571  132.250000  23.040001
3   5.07  30.082857  130.000000  23.290001
4   4.98  30.282858  133.520004  23.389999
5    NaN        NaN         NaN        NaN
6   4.82  29.674286  127.349998  23.700001
7   4.83  30.092857  129.110001  24.254999
8   4.85  29.918571  127.349998  24.695000
9   4.70  29.418571  127.139999  24.424999
10  4.69  30.719999  127.610001  25.200001
11   NaN        NaN         NaN        NaN
12  4.43  29.724285  126.620003  24.764999
13   NaN        NaN         NaN        NaN
14  4.29  29.010000  120.309998  24.730000
15  4.11  29.420000  119.480003  25.035000

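I want to split this df into multiple dfs wherever a row consists entirely of NaN.

I looked at the following links but couldn't work out how to apply them to my problem:

Split a pandas dataframe in two if it has more than 10 rows

Splitting dataframe into multiple dataframes

In my example I would end up with 4 DataFrames, with 5, 5, 1 and 2 rows respectively. Please suggest a way forward.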

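Use isna, all, cumsum and groupby.

First we check whether all values in a row are NaN, then we use cumsum to build a group indicator (every all-NaN row bumps the group number by one), and finally we use groupby to collect the resulting DataFrames in a list: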

grps = df.isna().all(axis=1).cumsum()
# Each group after the first starts with its all-NaN separator row;
# dropna(how="all") removes only that row, not rows that are merely partly NaN
dfs = [g.dropna(how="all") for _, g in df.groupby(grps)]
for d in dfs:
    print(d)
      a          b           c          d
0  4.65  30.572857  133.899994  23.705000
1  4.77  30.625713  134.690002  23.225000
2  4.73  30.138571  132.250000  23.040001
3  5.07  30.082857  130.000000  23.290001
4  4.98  30.282858  133.520004  23.389999
       a          b           c          d
6   4.82  29.674286  127.349998  23.700001
7   4.83  30.092857  129.110001  24.254999
8   4.85  29.918571  127.349998  24.695000
9   4.70  29.418571  127.139999  24.424999
10  4.69  30.719999  127.610001  25.200001
       a          b           c          d
12  4.43  29.724285  126.620003  24.764999
       a      b           c       d
14  4.29  29.01  120.309998  24.730
15  4.11  29.42  119.480003  25.035
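To make the cumsum trick concrete, here is a small self-contained sketch that rebuilds the question's frame and prints the intermediate group ids. It is a slight variation on the answer above (my assumption, not the answerer's code): instead of grouping first and then calling dropna, it removes the separator rows first and groups what is left, so two consecutive all-NaN rows would not produce an empty DataFrame.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post)
nan_row = [np.nan] * 4
rows = [
    [4.65, 30.572857, 133.899994, 23.705000],
    [4.77, 30.625713, 134.690002, 23.225000],
    [4.73, 30.138571, 132.250000, 23.040001],
    [5.07, 30.082857, 130.000000, 23.290001],
    [4.98, 30.282858, 133.520004, 23.389999],
    nan_row,
    [4.82, 29.674286, 127.349998, 23.700001],
    [4.83, 30.092857, 129.110001, 24.254999],
    [4.85, 29.918571, 127.349998, 24.695000],
    [4.70, 29.418571, 127.139999, 24.424999],
    [4.69, 30.719999, 127.610001, 25.200001],
    nan_row,
    [4.43, 29.724285, 126.620003, 24.764999],
    nan_row,
    [4.29, 29.010000, 120.309998, 24.730000],
    [4.11, 29.420000, 119.480003, 25.035000],
]
df = pd.DataFrame(rows, columns=list("abcd"))

# True for rows in which every column is NaN (the separators)
is_sep = df.isna().all(axis=1)
# Running count of separators seen so far = group id of each row
grps = is_sep.cumsum()
print(grps.tolist())
# → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3]

# Drop the separators first, then group the remaining rows by their id
dfs = [g for _, g in df[~is_sep].groupby(grps[~is_sep])]
print([len(g) for g in dfs])
# → [5, 5, 1, 2]
```

Because the separator rows are gone before grouping, only ids that still have at least one data row produce a group.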

Something like this should do it:

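import pandas as pd
import numpy as np

data_frame = pd.DataFrame({"a":[1,np.nan,3,np.nan,4,np.nan,5],
                           "b":[1,np.nan,3,np.nan,4,np.nan,5],
                           "c":[1,np.nan,3,np.nan,4,np.nan,5],
                           "d":[1,np.nan,3,np.nan,4,np.nan,5],
                           "e":[1,np.nan,3,np.nan,4,np.nan,5],
                           "f":[1,np.nan,3,np.nan,4,np.nan,5]})
# Labels of the rows in which every value is NaN
# (with the default RangeIndex, labels equal positions)
all_nan = data_frame.index[data_frame.isnull().all(axis=1)]
df_list = []
prev = 0
for i in all_nan:
    # Rows between the previous all-NaN row and this one
    df_list.append(data_frame.iloc[prev:i])
    prev = i + 1
# Keep the rows after the last all-NaN row as well
df_list.append(data_frame.iloc[prev:])
for chunk in df_list:
    print(chunk)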
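The loop above relies on index labels being the same as row positions. If your frame has some other index (dates, strings, a filtered subset), a position-based variant avoids that assumption. The sketch below wraps the same idea in a hypothetical helper, `split_on_all_nan`, which is my own name and not a pandas function; it uses `np.flatnonzero` to get positions and `iloc` to slice, and it skips empty chunks caused by consecutive all-NaN rows.

```python
from typing import List

import numpy as np
import pandas as pd


def split_on_all_nan(frame: pd.DataFrame) -> List[pd.DataFrame]:
    """Hypothetical helper: split `frame` at rows where every value is NaN.

    Works by position (iloc), so it does not assume a default RangeIndex.
    Empty chunks (e.g. from two separator rows in a row) are skipped.
    """
    # Positions (not labels) of the all-NaN separator rows
    sep_positions = np.flatnonzero(frame.isna().all(axis=1).to_numpy())
    chunks = []
    start = 0
    for pos in sep_positions:
        if pos > start:  # skip empty slices between adjacent separators
            chunks.append(frame.iloc[start:pos])
        start = pos + 1
    if start < len(frame):  # rows after the last separator
        chunks.append(frame.iloc[start:])
    return chunks


# A frame with a non-numeric index and two adjacent all-NaN rows (q and r)
demo = pd.DataFrame(
    {"a": [1, np.nan, np.nan, 3, np.nan, 5], "b": [1, np.nan, np.nan, 3, np.nan, 5]},
    index=list("pqrstu"),
)
parts = split_on_all_nan(demo)
print([p.index.tolist() for p in parts])
# → [['p'], ['s'], ['u']]
```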

Another way of doing the same thing:

import numpy as np

# Labels of the all-NaN rows, used as split positions (assumes a default RangeIndex)
nan_indices = df.index[df.isna().all(axis=1)]
# Each piece after the first starts with its separator row; dropna(how="all") removes it
df_list = [part.dropna(how="all") for part in np.split(df, nan_indices)]
df_list
[      a          b           c          d
0  4.65  30.572857  133.899994  23.705000
1  4.77  30.625713  134.690002  23.225000
2  4.73  30.138571  132.250000  23.040001
3  5.07  30.082857  130.000000  23.290001
4  4.98  30.282858  133.520004  23.389999,
       a          b           c          d
6   4.82  29.674286  127.349998  23.700001
7   4.83  30.092857  129.110001  24.254999
8   4.85  29.918571  127.349998  24.695000
9   4.70  29.418571  127.139999  24.424999
10  4.69  30.719999  127.610001  25.200001,
       a          b           c          d
12  4.43  29.724285  126.620003  24.764999,
       a      b           c       d
14  4.29  29.01  120.309998  24.730
15  4.11  29.42  119.480003  25.035]
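To see why a dropna step is needed after splitting, the sketch below reproduces the same cut points with plain `iloc` slices instead of calling `np.split` on the DataFrame. This is my own positional equivalent of the boundaries (`[0, cut1, cut2, ..., len(df)]`), not how NumPy implements it internally. It also shows why `dropna(how="all")` is safer than a bare `dropna()`: the latter also drops rows that are only partly NaN.

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 is partly NaN, rows 2 and 4 are all-NaN separators
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 3.0, np.nan, 4.0],
                   "b": [1.0, np.nan, np.nan, 3.0, np.nan, 4.0]})

# Positions of the all-NaN separator rows
cuts = np.flatnonzero(df.isna().all(axis=1).to_numpy())
# Boundaries of the pieces: [0, cut1, cut2, ..., len(df)]
bounds = [0, *cuts.tolist(), len(df)]
pieces = [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
# Every piece after the first starts with its separator row
print([p.index.tolist() for p in pieces])
# → [[0, 1], [2, 3], [4, 5]]

# Dropping only all-NaN rows keeps the partly-NaN row 1
cleaned = [p.dropna(how="all") for p in pieces]
print([p.index.tolist() for p in cleaned])
# → [[0, 1], [3], [5]]

# A bare dropna() would silently lose row 1 as well
print(pieces[0].dropna().index.tolist())
# → [0]
```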
