Mapping a dictionary of lists onto a pandas DataFrame



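I have a dictionary that I want to map onto a large pandas DataFrame. The problem is that the column I want to map on is wrapped in double quotes, and a cell sometimes holds one item and sometimes several.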

Original:

dict_id = {
'College1': ['1256511'],
'College2': ['1200582'],
'College3': ['1256618'], 
'College10': ['1256621']
}

id1  id2            college_name
0   01   01    "College1, College2"
1   01   02  "College10, College12"
2   01   03             "College19"
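
For anyone who wants to reproduce the problem, the input above can be rebuilt roughly as follows. This is a minimal sketch that assumes `id1` and `id2` are stored as strings (to keep the leading zeros) and that the double quotes are literally part of each `college_name` value.

```python
import pandas as pd

# Mapping from college name to a one-element list of ids, as in the question.
dict_id = {
    'College1': ['1256511'],
    'College2': ['1200582'],
    'College3': ['1256618'],
    'College10': ['1256621'],
}

# Assumption: the ids are strings and the quotes belong to the cell values.
df = pd.DataFrame({
    'id1': ['01', '01', '01'],
    'id2': ['01', '02', '03'],
    'college_name': ['"College1, College2"',
                     '"College10, College12"',
                     '"College19"'],
})
print(df)
```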

Desired:

id1 id2    college_name              id_college
01   01     "College1, College2"    1256511, 1200582
01   02     "College10, College12"  1256621
01   03     "College19"

Your data is easier to work with once it has been exploded, but I put everything back into its original shape at the end:

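import pandas as pd

# Strip the surrounding quotes and split each entry into a list of names.
df.college_name = df.college_name.str[1:-1].str.split(', ')
# One row per college name.
df = df.explode('college_name')
# Lookup table indexed by college name.
df2 = pd.DataFrame.from_dict(dict_id, 'index', columns=['id_college'], dtype=str)
df = df.merge(df2, left_on='college_name', right_index=True, how='left')
# Collapse back to one row per (id1, id2); skipping empty strings avoids a trailing ", ".
df = df.fillna('').groupby(['id1', 'id2'], as_index=False).agg(lambda s: ', '.join(x for x in s if x))
df.college_name = '"' + df.college_name + '"'
print(df)

Output:

  id1 id2            college_name        id_college
0  01  01    "College1, College2"  1256511, 1200582
1  01  02  "College10, College12"           1256621
2  01  03             "College19"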
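
To see why the exploded form is easier to work with, the sketch below stops at the intermediate step and prints the long-form frame. It is only an illustration under a few assumptions: the sample data is rebuilt from the question, `flat_ids` is a hypothetical helper name, and `Series.map` stands in for the merge because each dictionary value holds exactly one id.

```python
import pandas as pd

dict_id = {
    'College1': ['1256511'],
    'College2': ['1200582'],
    'College3': ['1256618'],
    'College10': ['1256621'],
}
df = pd.DataFrame({
    'id1': ['01', '01', '01'],
    'id2': ['01', '02', '03'],
    'college_name': ['"College1, College2"',
                     '"College10, College12"',
                     '"College19"'],
})

# One row per college: strip the quotes, split on ', ', then explode.
exploded = df.assign(
    college_name=df['college_name'].str.strip('"').str.split(', ')
).explode('college_name')

# Assumption: every value in dict_id is a one-element list, so take item 0.
flat_ids = {name: ids[0] for name, ids in dict_id.items()}

# In long form a plain dictionary lookup is enough; names without an
# entry in dict_id come back as NaN.
exploded['id_college'] = exploded['college_name'].map(flat_ids)
print(exploded)
```

With one name per row, unmatched colleges show up as missing values that can be dropped or filled before the rows are grouped back together.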

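Let DF1 be your dictionary of college names and ids, and let DF2 be the large DataFrame that contains the college names, which are sometimes comma-separated lists of names.

You will want to set a new column in DF2 from a function that builds a list of ids from DF1 and from the college-name column of DF2: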

def genIds(id_map, college_names):
    """Return one comma-separated id string per entry in college_names."""
    id_list = []
    for entry in college_names:
        # Drop the surrounding double quotes, then split on commas and
        # trim the space that follows each comma.
        names = [name.strip() for name in entry.strip('"').split(',')]
        # If id_map were a pandas DataFrame rather than a dict, test membership
        # with `name in id_map['college_names'].values` instead.
        # Each dictionary value is a list of ids, so flatten while collecting.
        ids = [i for name in names if name in id_map for i in id_map[name]]
        # Entries with no known college produce an empty string.
        id_list.append(', '.join(ids))
    return id_list

df2['ids'] = genIds(df1, df2['college_name'].values)
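
The same per-row lookup can also be written with `Series.apply`, which avoids building the result list by hand. This is a minimal sketch, assuming the mapping is the plain dictionary from the question (every value a list of ids) and that names are wrapped in double quotes; `lookup_ids` is a hypothetical helper name.

```python
import pandas as pd

dict_id = {
    'College1': ['1256511'],
    'College2': ['1200582'],
    'College3': ['1256618'],
    'College10': ['1256621'],
}
df2 = pd.DataFrame({
    'id1': ['01', '01', '01'],
    'id2': ['01', '02', '03'],
    'college_name': ['"College1, College2"',
                     '"College10, College12"',
                     '"College19"'],
})

def lookup_ids(entry, id_map=dict_id):
    """Hypothetical helper: turn one quoted, comma-separated cell into ids."""
    names = [name.strip() for name in entry.strip('"').split(',')]
    # Unknown names contribute nothing because .get falls back to [].
    return ', '.join(i for name in names for i in id_map.get(name, []))

df2['ids'] = df2['college_name'].apply(lookup_ids)
print(df2)
```

On a very large frame the explode-and-merge route is likely to be faster, since `apply` calls the Python function once per row.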
