如何在这个pandas数据框架中基于列条件创建新的索引行?



我有如下的pandas数据框架:

import pandas as pd 
import numpy as np
data = [['Apple', 1, 1, 1 ,1,], ['Orange', np.nan, 1, 1, np.nan], ['Banana', 1, np.nan, 1, np.nan]]
df = pd.DataFrame(data, columns = ['Type of fruit', 'Paris', "Boston", "Austin", "New York"])

输出:

Type of fruit   Paris   Boston  Austin  New York
0   Apple       1.0     1.0     1       1.0
1   Orange      NaN     1.0     1       NaN
2   Banana      1.0     NaN     1       NaN

我想创建一个名为"Location"的新列,并基于四个列Paris, Boston, Austin, New York创建新的索引,例如:

理想输出:

Location    Type of fruit
0   Paris       Apple
1   Boston      Apple
2   Austin      Apple
3   New York    Apple
4   Boston      Orange
5   Austin      Orange
6   Paris       Banana
7   Austin      Banana

我可以过滤每个位置列以保持非空索引(例如Paris):

df_paris = df.loc[df["Paris"].notna(),["Type of fruit"]]
df_paris["Location"] = "Paris"

,然后连接每个位置的数据帧:

pd.concat([df_paris, df_boston, df_austin, df_new_york])

但是我确信有更好的方法使用pandas函数来做这些事情。

使用DataFrame.set_indexDataFrame.stack,默认会删除缺失值:

df1 = (df.set_index('Type of fruit')
.rename_axis('Location', axis=1)
.stack()
.reset_index()[['Location','Type of fruit']])

或将MultiIndex转换为新的DataFrame:

df1 = (df.set_index('Type of fruit')
.rename_axis('Location', axis=1)
.stack()
.swaplevel(1,0)
.index
.to_frame(index=False))

或使用DataFrame.melt为unpivot删除缺失行DataFrame.dropna:

df1 = (df.melt('Type of fruit', var_name='Location', ignore_index=False)
.sort_index()
.dropna(subset=['value'])[['Location','Type of fruit']]
.reset_index(drop=True))

print (df1)
Location Type of fruit
0     Paris         Apple
1    Boston         Apple
2    Austin         Apple
3  New York         Apple
4    Boston        Orange
5    Austin        Orange
6     Paris        Banana
7    Austin        Banana
(
df
.set_index("Type of fruit")      # Moves the 'Type of fruit' column to the index
.rename_axis('Location', axis=1) # Sets the name of the column index to 'Location'
.stack()                         # Moves the remaining columns to be a second index level
.index                           # Select just the index (which is a MultiIndex with `Type of fruit` and `Location` as levels)
.swaplevel()                     # Swaps the level order so that `Location` is first
.to_frame(index=False)           # Convert the MultiIndex to a DataFrame. 
# `index=False` means that the resulting DF will just have a numeric index. 
# If index=True, the input MultiIndex would be used as both the index and content of the DataFrame
)

最新更新