我有一个数据框架,它有一个部门、它的函数、它的子函数和它的子函数。数据框架的一个例子是:
d = [['Dept1' , 'HR' , 'Talent' , 'Good Employee 3'],
['Dept2' , 'IT' , 'Garbage' , 'Analysis HR 2'],
['Dept3' , 'IT' , 'Tech Sup' , 'IT Tech 2'],
[ 'Dept4' , 'HR' , 'Hardware' , 'Trash Data' ] ,
[ 'Dept5' , 'MKT' , 'Sales' , 'Facebook Promo 1'],
['Dept6' , 'MKT' , NaN, 'Car profit ']]
df = pd.DataFrame(d, columns=['Department', 'Function' , 'Subfunction' , 'Sub-subfunction'])
Department Function Subfunction Sub-subfunction
0 Dept1 HR Talent Good Employee 3
1 Dept2 IT Garbage Analysis HR 2
2 Dept3 IT Tech Sup IT Tech 2
3 Dept4 HR Hardware Trash Data
4 Dept5 MKT Sales Facebook Promo 1
5 Dept6 MKT NaN Car profit
我需要创建一个规则来检查一个部门是否在函数中有某个值,它只允许在子函数中有一个可能的值列表。然后,在同一子函数中,每个唯一值只允许子函数中一个可能的值列表。
映射如下:
subfunction = {'HR': ['Talent', 'Analysis Human'],
'IT': ['Tech Sup', 'Hardware'],
'MKT': ['Sales', 'Communication']}
sub_subfunction = {'Talent': ['Good Employee 1', 'Good Employee 2', 'Good Employee 3'],
'Analysis Human': [ 'Analysis HR 1', 'Analysis HR 2', 'Analysis HR 3'],
'Tech Sup': ['IT Tech 1', 'IT Tech 2', 'IT Tech 3', 'Tech Master'],
'Hardware': ['PC pieces', 'Phone pieces'],
'Sales': ['Car profit', 'Bolt profit'],
'Communication': ['Facebook Promo 1', 'Instagram Promo 1']}
在这个数据框中,这个函数将返回那些不遵守这个规则的,在这个例子中它将返回:
Department Function Subfunction Sub-subfunction
1 Dept2 IT Garbage Analysis HR 2
3 Dept4 HR Hardware Trash Data
4 Dept5 MKT Sales Facebook Promo 1
5 Dept6 MKT NaN Car profit
应用这些规则的最佳方式是什么?正如您所看到的,数据框架的值可以在映射字典的值之外(尽管我可以使用函数)。所有的值都映射到子函数字典中(如果这对你有帮助的话)。
感谢您的支持!
将您的字典读入DataFrame,重塑并合并它们,以便您形成['Function', 'Subfunction', 'Sub-Subfunction']
所有允许组合的DataFrame。然后使用'left'
与指示符合并,我们可以查询不匹配的并只返回该子集。
df1 = (pd.DataFrame.from_dict(subfunction, orient='index')
.stack()
.reset_index()
.drop(columns='level_1')
.rename(columns={'level_0': 'Function', 0: 'Subfunction'}))
df2 = (pd.DataFrame.from_dict(sub_subfunction, orient='index')
.stack()
.reset_index()
.drop(columns='level_1')
.rename(columns={'level_0': 'Subfunction', 0: 'Sub-subfunction'}))
# All allowable combinations
key = df1.merge(df2)
res = (df.merge(key, indicator=True, how='left')
.query('_merge == "left_only"')
.drop(columns='_merge'))
print(res)
Department Function Subfunction Sub-subfunction
1 Dept2 IT Garbage Analysis HR 2
3 Dept4 HR Hardware Trash Data
4 Dept5 MKT Sales Facebook Promo 1
5 Dept6 MKT Communication Car profit