Move data from the top of a DataFrame to the bottom (the columns of df hold different numbers of values)



I have a df that looks like this:

Index   Col1   Col2  Col3  Col4   Col5      
0      12     121   346   abc    747
1      156    121   146   68     75967
2      234   121    346   567   
3      gj    161    646   
4      214   171   
5      fhg   

….

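I want to rearrange the DataFrame so that, in every column that has empty values, the data is moved/shifted to the bottom of the DataFrame. It should look like this: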

Index   Col1   Col2  Col3  Col4   Col5      
0      12     
1      156    121   
2      234   121    346   
3      gj    121    146   abc 
4      214   161    346   68    747
5      fhg   171    646   567   75967

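I have been thinking along the lines of shift and/or justify. However, I am not sure how to do this in the most efficient way for a large DataFrame.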

You can use a slightly modified justify function that also works with non-numeric values:

import numpy as np
import pandas as pd

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as missing (np.nan is detected with pd.notnull)
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    # True marks the valid (non-missing) cells
    if invalid_val is np.nan:
        mask = pd.notnull(a)
    else:
        mask = a != invalid_val
    # Sorting a boolean mask puts False first, so valid cells end up last
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # dtype=object keeps mixed numeric and string values intact
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

# Push the valid values of every column down to the bottom
arr = justify(df.values, invalid_val=np.nan, side='down', axis=0)
df = pd.DataFrame(arr, columns=df.columns, index=df.index).astype(df.dtypes)
print(df)
  Col1 Col2 Col3 Col4   Col5
0   12  NaN  NaN  NaN    NaN
1  156  121  NaN  NaN    NaN
2  234  121  346  NaN    NaN
3   gj  121  346  567    NaN
4  214  121  346  567  75967
5  fhg  121  346  567  75967
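
To try the mask-sorting idea on the exact table from the question, here is a minimal sketch restricted to the axis=0, side='down' case. The way the sample DataFrame is built (missing cells as NaN, no separate Index column) is an assumption, since the question only shows the printed table; `result` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (assumption:
# empty cells are NaN and "Index" is the ordinary row index).
df = pd.DataFrame({
    "Col1": [12, 156, 234, "gj", 214, "fhg"],
    "Col2": [121, 121, 121, 161, 171, np.nan],
    "Col3": [346, 146, 346, 646, np.nan, np.nan],
    "Col4": ["abc", 68, 567, np.nan, np.nan, np.nan],
    "Col5": [747, 75967, np.nan, np.nan, np.nan, np.nan],
})

a = df.to_numpy(dtype=object)
mask = pd.notna(a)                    # True where a cell holds a value
justified = np.sort(mask, axis=0)     # per column: False first, True last

# Start from an all-NaN array, then fill the bottom slots column by column.
out = np.full(a.shape, np.nan, dtype=object)
out.T[justified.T] = a.T[mask.T]

result = pd.DataFrame(out, columns=df.columns, index=df.index)
print(result)
```

Working on the transposed arrays matters here: boolean indexing walks an array row by row, so transposing makes the values get read and written one column at a time, which keeps each column's original order.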

I tried this:

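# Number of missing values in each column
t = df.isnull().sum()
# Shift every column down by its own count of missing values
for val in zip(t.index.values, t.values):
    df[val[0]] = df[val[0]].shift(val[1])
print(df)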

Output:

   Index Col1   Col2   Col3 Col4        Col5
0      0   12    NaN    NaN  NaN         NaN
1      1  156  121.0    NaN  NaN         NaN
2      2  234  121.0  346.0  NaN         NaN
3      3   gj  121.0  146.0  abc         NaN
4      4  214  161.0  346.0   68       747.0
5      5  fhg  171.0  646.0  567     75967.0

Note: I used a loop here, which may not be the best solution, but it should give you an idea of how to approach the problem.
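
The same shift idea can probably be written without the explicit loop by using `DataFrame.apply`. This is only a sketch: the sample data is reconstructed from the question's table (an assumption), `shifted` is an illustrative name, and the approach relies on all missing values sitting at the bottom of each column, as they do in the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (assumption:
# empty cells are NaN and "Index" is the ordinary row index).
df = pd.DataFrame({
    "Col1": [12, 156, 234, "gj", 214, "fhg"],
    "Col2": [121, 121, 121, 161, 171, np.nan],
    "Col3": [346, 146, 346, 646, np.nan, np.nan],
    "Col4": ["abc", 68, 567, np.nan, np.nan, np.nan],
    "Col5": [747, 75967, np.nan, np.nan, np.nan, np.nan],
})

# Shift each column down by the number of missing values it contains.
# Only valid when the NaNs are trailing; a NaN in the middle of a column
# would make shift() push real values off the end.
shifted = df.apply(lambda col: col.shift(int(col.isna().sum())))
print(shifted)
```

If missing values can appear anywhere in a column, the mask-based justify function is the safer choice, because it repositions only the valid cells instead of shifting the whole column.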
