Conditionally filter on multiple columns, otherwise return the entire DataFrame



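I have a CSV with several people in it. I want to build a function that filters on all of the arguments it is given, or returns the entire DataFrame if no arguments are passed.

So given this CSV: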

FirstName    LastName   City
Matt          Fred      Austin
Jim           Jack      NYC
Larry         Bob       Houston
Matt          Spencer   NYC

If I call my function find, here is what I expect to see depending on the arguments I pass:

find(first="Matt", last="Fred")
Output: Matt   Fred Austin
find()
Output: Full Dataframe
find(last="Spencer")
Output: Matt Spencer NYC
find(city="NYC")
Output: All people living in NYC in dataframe

Here is what I tried:

def find(first=None, last=None, city=None):
    file = pd.read_csv(list)
    searched = file.loc[(file["FirstName"] == first) & (file["LastName"] == last) & (file["City"] == city)]
    return searched

If only the first name is passed, this returns an empty result.

You can do it like this:

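import numpy as np

# df is assumed to be the DataFrame already loaded from the CSV
def find(**kwargs):
    # every keyword must be an existing column name
    assert np.isin(list(kwargs.keys()), df.columns).all()
    # keep the rows where every requested column equals the requested value
    return df.loc[df[list(kwargs.keys())].eq(list(kwargs.values())).all(axis=1)]

search = find(FirstName="Matt", LastName="Fred")
print(search)
#   FirstName LastName    City
# 0      Matt     Fred  Austin
find(LastName="Spencer")
#   FirstName LastName City
# 3      Matt  Spencer  NYC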
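
To see why this works, it can help to look at the intermediate mask. The following is only a sketch: it builds the sample data inline instead of reading the CSV, and the names filters, matches, row_mask and empty_mask are made up for illustration.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: no CSV file at hand).
df = pd.DataFrame({
    "FirstName": ["Matt", "Jim", "Larry", "Matt"],
    "LastName": ["Fred", "Jack", "Bob", "Spencer"],
    "City": ["Austin", "NYC", "Houston", "NYC"],
})

# Hypothetical filters, standing in for the **kwargs of find().
filters = {"FirstName": "Matt", "City": "NYC"}

# One boolean column per filter: True where the cell equals the requested value.
matches = df[list(filters)].eq(list(filters.values()))

# A row is kept only if all of its filter columns are True.
row_mask = matches.all(axis=1)
result = df.loc[row_mask]
print(result)

# With no filters there are no columns to check, so the mask is True for
# every row and the whole DataFrame is selected.
empty_mask = df[[]].all(axis=1)
print(df.loc[empty_mask])
```

Because all() over zero columns is True, calling find() with no arguments should return every row, which is the behaviour the question asks for.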

If you want to use "first", "last" and "city" as the argument names:

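def find(**kwargs):
    # same columns under the short names used as keyword arguments
    df_index = df.rename(columns={"FirstName": "first",
                                  "LastName": "last",
                                  "City": "city"})
    assert np.isin(list(kwargs.keys()), df_index.columns).all()
    # build the mask on the renamed frame, but return rows of the original df
    return df.loc[df_index[list(kwargs.keys())]
                  .eq(list(kwargs.values())).all(axis=1)]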
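
The empty result in the question comes from the None defaults: a comparison such as file["LastName"] == None is False for every row, so the combined condition never matches. If you would rather keep the original signature, one possible sketch is to skip the arguments that were not passed. The inline DataFrame and the names requested and mask are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: no CSV file at hand).
df = pd.DataFrame({
    "FirstName": ["Matt", "Jim", "Larry", "Matt"],
    "LastName": ["Fred", "Jack", "Bob", "Spencer"],
    "City": ["Austin", "NYC", "Houston", "NYC"],
})


def find(first=None, last=None, city=None):
    """Filter df on the arguments that were given; no arguments returns all rows."""
    requested = {"FirstName": first, "LastName": last, "City": city}
    # Start with every row selected, then narrow down once per given argument.
    mask = pd.Series(True, index=df.index)
    for column, value in requested.items():
        if value is not None:
            mask &= df[column] == value
    return df[mask]


print(find(first="Matt", last="Fred"))
print(find(city="NYC"))
print(find())
```

One limitation of this design is that None can no longer be used as a value to search for, since it is the marker for "argument not passed".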

Another way to filter on columns:

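import os
import pandas as pd

csv_path = os.path.abspath('test.csv')
df = pd.read_table(csv_path, sep=r'\s+')  # columns are separated by runs of whitespace

def find_by_attrs(df, **attrs):
    # reject keywords that are not column names
    if set(attrs) - set(df.columns):
        raise KeyError('Improper column name(s)')
    return df[df[list(attrs)].eq(list(attrs.values())).all(axis=1)]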

print(find_by_attrs(df, City="NYC"))

Output:

  FirstName LastName City
1       Jim     Jack  NYC
3      Matt  Spencer  NYC
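
If you want to try this without creating test.csv, the sample from the question can be read from a string. This is only a sketch: it assumes the file is whitespace-aligned exactly as shown in the question, and the names sample and people are made up for illustration.

```python
import io

import pandas as pd

# The question's sample, whitespace-aligned rather than comma-separated.
sample = """FirstName    LastName   City
Matt          Fred      Austin
Jim           Jack      NYC
Larry         Bob       Houston
Matt          Spencer   NYC
"""

# sep=r"\s+" treats any run of whitespace as a single column separator.
people = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(people)

# Same kind of filter as above, on the freshly parsed frame.
print(people[people["City"].eq("NYC")])
```

Note that a whitespace separator only works while no value contains a space; a city such as "New York" would be split into two columns.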
