Creating pandas DataFrames in a loop



I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'name':['apple', 'banana', 'apple', 'mandarin', 'kiwi', 'kiwi'], 'val':[1,5,3,4,5,3]})

I want to create separate dataframes by passing the fruit name to the following function:

def split_all_fruits(df, fruit):
    return df[df['name'] == fruit]

I can then create the dataframes simply by calling the function:

apple_df = split_all_fruits(df, 'apple')
banana_df = split_all_fruits(df, 'banana')
#and other fruits

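But I would like to create the dataframes in a for loop, without calling the function separately for each fruit. Is it possible to write a for loop that generates all the dataframes at once, one per fruit in a list, each assigned to the corresponding dataframe name? Something like this: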

#name of dfs
name_of_dfs = ['apple_df', 'banana_df', 'mandarin_df', 'kiwi_df']
#fruit names in df
fruit_name = ['apple', 'banana', 'mandarin', 'kiwi']
#loop to create dfs, but I don't know how to assign the names in name_of_dfs to the respective fruit df
for fruit in fruit_name:
    df_final = split_all_fruits(df, fruit)
    print(df_final)
#it prints all dfs together, but not a separate df with a name from the list

Thanks!

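Although this can be done with eval() or globals(), a better approach (cleaner and easier to manage programmatically) is to use a dict instead, for example: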

dfs = {name: split_all_fruits(df, name) for name in fruit_name}

Access them like this, for example:

dfs['apple']
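
For reference, a self-contained version of the dictionary approach might look like the sketch below. It simply repeats the question's `df`, `split_all_fruits` and `fruit_name` so that it runs on its own; the final loop is only an illustration of how the resulting dict could be iterated.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['apple', 'banana', 'apple', 'mandarin', 'kiwi', 'kiwi'],
    'val': [1, 5, 3, 4, 5, 3],
})

def split_all_fruits(df, fruit):
    # Keep only the rows whose 'name' column equals the given fruit.
    return df[df['name'] == fruit]

fruit_name = ['apple', 'banana', 'mandarin', 'kiwi']

# One DataFrame per fruit, keyed by the fruit name.
dfs = {name: split_all_fruits(df, name) for name in fruit_name}

# Iterate over the results without needing one variable per fruit.
for name, frame in dfs.items():
    print(name, len(frame))
```

Because the keys are ordinary strings, the set of fruits can change at run time without any new variable names having to be created.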

EDIT: If you really want variables named apple_df and so on, one way is to modify the globals() dictionary, for example:

globals().update({name + '_df': split_all_fruits(df, name) for name in fruit_name})

Access them like this, for example:

apple_df

Note that this is considered bad practice!

If you want a dataframe for each fruit, you can use the following (note that the results will be ordered alphabetically by fruit name):

[d for name, d in df.groupby("name")]

If you need the list in a specific order:

[df.query("name == '{}'".format(f)) for f in fruit_name]
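
To make the ordering difference concrete, here is a small sketch. The reordered `fruit_name` list is a hypothetical example, and the boolean mask is used here in place of `query` only to avoid building a query string by hand; both select the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['apple', 'banana', 'apple', 'mandarin', 'kiwi', 'kiwi'],
    'val': [1, 5, 3, 4, 5, 3],
})

# Hypothetical custom order, deliberately not alphabetical.
fruit_name = ['mandarin', 'kiwi', 'apple', 'banana']

# groupby sorts the group keys by default, so these come out alphabetically.
alphabetical = [name for name, _ in df.groupby("name")]

# Building the list from fruit_name keeps the caller's order instead.
ordered = [df[df["name"] == f] for f in fruit_name]

print(alphabetical)
print([d["name"].iloc[0] for d in ordered])
```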

If you want to access them by name, you can use a dictionary comprehension and look them up by fruit name:

grouped = {name: d for name, d in df.groupby("name")}
# e.g. for apple:
apple_df = grouped['apple']

Finally, if you need specific named variables and know in advance which ones they should be:

grouped = {name: d for name, d in df.groupby("name")}
apple_df, banana_df, mandarin_df, kiwi_df = [grouped[fruit] for fruit in fruit_name]
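
One thing to watch with this unpacking: `grouped[fruit]` raises a `KeyError` if a fruit in the list has no rows in the data. A possible workaround, sketched below, is to fall back to an empty frame via `dict.get`. The fruit `'pear'` and the names `empty` and `pear_df` are hypothetical, introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['apple', 'banana', 'apple', 'mandarin', 'kiwi', 'kiwi'],
    'val': [1, 5, 3, 4, 5, 3],
})

# 'pear' is a hypothetical fruit that does not occur in df.
fruit_name = ['apple', 'banana', 'pear']

grouped = {name: d for name, d in df.groupby("name")}

# An empty frame with the same columns, used when a fruit has no rows.
empty = df.iloc[0:0]

# dict.get returns the fallback instead of raising KeyError for 'pear'.
apple_df, banana_df, pear_df = [grouped.get(fruit, empty) for fruit in fruit_name]

print(pear_df.empty)
```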

If you want variable names rather than dictionary keys (not a good idea, but it is what the OP asked for), you can use exec():

#name of dfs
name_of_dfs = ['apple_df', 'banana_df', 'mandarin_df', 'kiwi_df']
fruit_name = ['apple', 'banana', 'mandarin', 'kiwi']
for fruit, df_name in zip(fruit_name, name_of_dfs):
    exec(f'{df_name} = split_all_fruits(df, "{fruit}")')
    print(globals()[df_name])

Now you can access the variables directly:

print(apple_df)
    name  val
0  apple    1
2  apple    3

You can modify your last snippet as follows:

name_of_dfs = ['apple_df', 'banana_df', 'mandarin_df', 'kiwi_df']
#fruit names in df
fruit_name = ['apple', 'banana', 'mandarin', 'kiwi']
dfs = {}
#loop to create dfs and store each one in the dict under its fruit name
for fruit in fruit_name:
    df_final = split_all_fruits(df, fruit)
    dfs[fruit] = df_final
    print(df_final)

Then access any dataframe by its fruit name:

dfs['apple']
