Extract two specified keywords from a DataFrame into a new column, then drop the rows that do not contain them



This is the DataFrame:

import pandas as pd

# Each cell holds a list: one company name, and a list of skill phrases.
data = {"Company" : [["ConsenSys"] , ["Cognizant"], ["IBM"], ["IBM"], ["Reddit, Inc"], ["Reddit, Inc"], ["IBM"]],
"skills" : [['services', 'scientist technical expertise', 'databases'], ['datacomputing tools experience', 'deep learning models', 'cloud services'], ['quantitative analytical projects', 'financial services', 'field experience'],
['filesystems server architectures', 'systems', 'statistical analysis', 'data analytics', 'workflows', 'aws cloud services'], ['aws services'], ['data mining statistics', 'statistical analysis', 'aws cloud', 'services', 'data discovery', 'visualization'], ['communication skills experience', 'services', 'manufacturing environment', 'sox compliance']]}
dff = pd.DataFrame(data)
dff
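  • I need to create a new column by extracting specific keywords from the skills column.
  • Rows that do not contain these keywords should be dropped.
  • Keywords: "services", "statistical analysis"

Expected output: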

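   Company        skills                                                                                                              new_col
0  [ConsenSys]    [services, scientist technical expertise, databases]                                                               [services]
1  [IBM]          [filesystems server architectures, systems, statistical analysis, data analytics, workflows, aws cloud services]   [services, statistical analysis]
2  [Reddit, Inc]  [data mining statistics, statistical analysis, aws cloud, services, data discovery, visualization]                 [statistical analysis]
3  [IBM]          [communication skills experience, services, manufacturing environment, sox compliance]                             [services]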

You can use a lambda with a list comprehension:

words = ["services", "statistical analysis"]
dff["found"] = dff["skills"].apply(lambda x: ", ".join(set([i for i in x if i in words])).split(", "))
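One thing to watch with the join/split round trip above: for a row without any match, `"".split(", ")` returns `['']` rather than an empty list, so the row still looks non-empty, and going through `set` does not guarantee the order of the matches. The following is a minimal sketch of a variant that keeps the matches as plain lists and then drops the rows without a match. The three-row `dff` is a shortened stand-in for the question's data, and the name `filtered` is only illustrative.

```python
import pandas as pd

# Shortened stand-in for the question's data (three rows only).
dff = pd.DataFrame({
    "Company": [["ConsenSys"], ["Cognizant"], ["Reddit, Inc"]],
    "skills": [
        ["services", "scientist technical expertise", "databases"],
        ["datacomputing tools experience", "deep learning models", "cloud services"],
        ["data mining statistics", "statistical analysis", "aws cloud", "services"],
    ],
})

words = ["services", "statistical analysis"]

# Keep matches in keyword order; rows without a match get an empty list.
dff["found"] = dff["skills"].apply(lambda skills: [w for w in words if w in skills])

# An empty list has length 0, so this mask drops the rows without a match.
filtered = dff[dff["found"].map(len) > 0]
print(filtered)
```

Here `w in skills` is an exact element comparison against the list, so 'cloud services' in the second row does not count as a match for 'services'.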
word = ['services', 'statistical analysis']
# For each row, keep the keywords that appear as exact elements of the skills list.
s1 = dff['skills'].apply(lambda x: [i for i in word if i in x])
Output (s1):
0                          [services]
1                                  []
2                                  []
3              [statistical analysis]
4                                  []
5    [services, statistical analysis]
6                          [services]
Name: skills, dtype: object

Assign s1 as new_col, then filter the rows with boolean indexing:

dff.assign(new_col=s1)[lambda x: x['new_col'].astype('bool')]

Result:

Company skills  new_col
0   [ConsenSys] [services, scientist technical expertise, data...   [services]
3   [IBM]   [filesystems server architectures, systems, st...   [statistical analysis]
5   [Reddit, Inc]   [data mining statistics, statistical analysis,...   [services, statistical analysis]
6   [IBM]   [communication skills experience, services, ma...   [services]
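Note that the result above relies on exact matches against the list elements, which is why rows whose skills only contain phrases such as 'cloud services' or 'financial services' are dropped. If partial matches should count as well (the question does not say so explicitly, so this is an assumption), a minimal sketch could look like the following. The helper `find_keywords`, the names `s2` and `result`, and the two-row `dff` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Two-row stand-in for the question's data.
dff = pd.DataFrame({
    "Company": [["IBM"], ["Cognizant"]],
    "skills": [
        ["filesystems server architectures", "statistical analysis", "aws cloud services"],
        ["datacomputing tools experience", "deep learning models"],
    ],
})

word = ["services", "statistical analysis"]

def find_keywords(skills, keywords):
    """Return the keywords that occur inside at least one skill phrase."""
    return [k for k in keywords if any(k in skill for skill in skills)]

# Substring matching: 'services' is found inside 'aws cloud services'.
s2 = dff["skills"].apply(lambda skills: find_keywords(skills, word))

# Same assign-then-filter pattern as above; empty lists are falsy.
result = dff.assign(new_col=s2)[lambda x: x["new_col"].map(bool)]
print(result)
```

With substring matching, every phrase that merely contains a keyword counts, so more rows survive than with the exact comparison; which behaviour is wanted depends on the data.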

I think you should provide a simpler example.
