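I am iterating over a pandas DataFrame (originally a CSV file) and checking each row of a certain column for specific keywords. If a keyword appears at least once, I add 1 to a score. There are about 7 keywords, and if the score is > 6, I want to assign a string (here "Software and application developer") to an item in another column of the same row, and also save the score. Unfortunately, the score ends up the same everywhere, which I find hard to believe. This is my code so far: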
for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
    data["score"]=devScore
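Here you are assigning a constant to the entire column, so every row ends up with whatever value was written last: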
data["occupation"]="Software and application developer"
data["score"]=devScore
Instead, assign to the current row only:
for idx, row in data.iterrows():
    # blah blah
    #
    .
    .
    data.loc[idx, "occupation"]="Software and application developer"
    data.loc[idx, "score"]=devScore
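For reference, here is a minimal runnable sketch of the whole loop with per-row assignment. The sample data, the column name `description`, and the `keyword_groups` structure are assumptions made for illustration and are not from the question; the sketch also checks a single text column rather than every cell of the row.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "php symfony javascript java html developer of web application",
        "sales manager",
    ]
})

# Each tuple is one keyword group; a group is worth at most one point.
keyword_groups = [
    ("developer", "developpeur"),
    ("symfony",),
    ("javascript",),
    ("java", "jee"),
    ("php",),
    ("html", "html5"),
    ("application", "applications"),
]

# Create the target columns up front so every row has a defined value.
data["occupation"] = None
data["score"] = 0

for idx, row in data.iterrows():
    text = str(row["description"]).lower()
    # Count the groups that have at least one keyword in the text.
    dev_score = sum(any(word in text for word in group) for group in keyword_groups)
    # .loc[idx, column] writes to this row only, not to the whole column.
    data.loc[idx, "score"] = dev_score
    if dev_score >= 6:
        data.loc[idx, "occupation"] = "Software and application developer"
```

Note that these are plain substring checks, as in the question, so "javascript" also satisfies the "java" group.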
Just maintain a list of the wanted words, goodwords, and the following will perform the logic you are looking for.
import random
import numpy as np
import pandas as pd

goodwords = ["developer","developpeur","symfony","javascript","java","jee","php","html","html5", "application","applications"]
prefix = ["a","the","junior"]
company = ["apple", "facebook", "alibaba", "grab"]

def random_text():
    # about two times out of three use a goodword, otherwise "manager"
    word = random.choice(goodwords) if random.randint(0,2)<=1 else "manager"
    return f"{random.choice(prefix)} {word} at {random.choice(company)}"

# build a dataframe where wanted text may occur in a number of columns
df = pd.DataFrame([{col: random_text() for col in "abcdefgh"} for r in range(10)])
# start with a truth matrix that only contains false
matches = np.zeros(df.shape)==1
# build up trues where a goodword is in the text
for w in goodwords:
    matches = matches | df.apply(lambda r: r.str.contains(w))
# spec shows only set score column if it's >=6
# score is the sum across the row of the truth matrix (True==1)
df = (df.assign(match=matches.sum(axis=1),
                score=lambda dfa: np.where(dfa["match"].ge(6), dfa["match"], np.nan),
                occupation=lambda dfa: np.where(dfa["match"].ge(6), "Software and application developer", "wannabe"))
      .drop(columns="match"))
Output
a | b | c | d | e | f | g | h | score | occupation
the java at grab | junior manager at grab | the html5 at apple | the applications at grab | junior manager at grab | junior application at grab | junior manager at grab | junior applications at alibaba | NaN | wannabe
a manager at facebook | junior application at grab | junior manager at grab | junior symfony at grab | the applications at grab | junior symfony at alibaba | junior developer at apple | a javascript at grab | 6.0 | Software and application developer
junior applications at apple | a php at grab | a manager at grab | junior applications at grab | junior manager at facebook | a php at facebook | the jee at facebook | junior javascript at apple | 6.0 | Software and application developer
the html5 at grab | a jee at apple | junior html5 at apple | a manager at grab | a manager at apple | the manager at grab | the javascript at facebook | the php at apple | NaN | wannabe
a applications at grab | junior developer at grab | a manager at grab | the manager at alibaba | a php at grab | junior manager at facebook | the manager at grab | a javascript at apple | NaN | wannabe
a manager at grab | junior manager at apple | a manager at grab | junior manager at alibaba | the javascript at alibaba | junior java at apple | a applications at grab | the manager at apple | NaN | wannabe
the jee at facebook | the html at apple | junior applications at grab | junior developpeur at facebook | the manager at apple | the javascript at grab | junior jee at grab | a developpeur at facebook | 7.0 | Software and application developer
junior developer at alibaba | the manager at facebook | a jee at grab | a manager at grab | the manager at facebook | the applications at grab | a manager at alibaba | junior application at grab | NaN | wannabe
the manager at apple | junior application at alibaba | the application at facebook | junior manager at grab | junior manager at apple | junior manager at apple | the manager at apple | the symfony at alibaba | NaN | wannabe
junior html5 at apple | the applications at alibaba | a manager at grab | junior manager at grab | junior html5 at facebook | junior manager at alibaba | junior applications at grab | junior developer at grab | NaN | wannabe
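If the keywords all live in one text column, as the question describes, the same idea can be applied without an explicit loop. The following is only a sketch under assumptions: the column name `description`, the sample rows, and the grouping of alternatives into regex patterns are illustrative and not taken from the original posts. It scores one point per keyword group rather than one point per matching cell.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "Developpeur PHP Symfony, JavaScript, HTML5, applications JEE",
        "java developer",
        "accountant",
    ]
})

# One regex per keyword group; alternatives inside a group are joined with "|".
# "html" also covers "html5" and "application" also covers "applications".
patterns = [
    "developer|developpeur",
    "symfony",
    "javascript",
    "java|jee",
    "php",
    "html",
    "application",
]

text = data["description"].str.lower()

# Each str.contains call yields a boolean Series; adding them up as integers
# gives the number of keyword groups found in each row.
score = sum(text.str.contains(p, regex=True, na=False).astype(int) for p in patterns)

data["score"] = score
data["occupation"] = None
# A boolean mask with .loc assigns only to the rows that reach the threshold.
data.loc[score >= 6, "occupation"] = "Software and application developer"
```

Unlike the answer above, this version stores the score for every row, which matches the question's loop where the score is written regardless of the threshold.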