How to assign an item in a pandas DataFrame after checking a condition



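I am iterating over a pandas DataFrame (originally a CSV file) and checking every row of a certain column for specific keywords. If a keyword appears at least once, I add 1 to a score. There are about 7 keywords, and if the score is >6 I want to assign a string (here "Software and application developer") to an item in another column (but in the same row) and save the score. Unfortunately, the score comes out the same everywhere, which I find hard to believe. Here is my code so far: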

for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore

Here you are assigning a constant to the entire column:

data["occupation"]="Software and application developer"
data["score"]=devScore
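To see why every row ends up with the same value, here is a minimal sketch on a made-up three-row frame (the column name `text` and the values are assumptions, not taken from the question). Each pass through the loop overwrites the whole `score` column, so only the value from the last assignment survives.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration.
df = pd.DataFrame({"text": ["a", "b", "c"]})

for value in [10, 20, 30]:
    # A scalar assigned to a column label is broadcast to every row,
    # replacing whatever the previous iteration wrote.
    df["score"] = value

print(df["score"].tolist())  # [30, 30, 30]
```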

They should be:

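for idx, row in data.iterrows():
    # ... compute devScore for this row as before ...
    # note: row is now the row's Series itself, so use row instead of row[1]
    if devScore>=6:
        # .loc[idx, column] writes to one cell instead of the whole column
        data.loc[idx, "occupation"]="Software and application developer"
        data.loc[idx, "score"]=devScore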
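For a complete picture, here is a minimal runnable sketch of that per-row assignment. The sample rows, the column name `description`, and the `keyword_groups` list are assumptions made for illustration; the groups mirror the `if` branches in the question, with alternatives written as regular expressions.

```python
import re

import pandas as pd

# Hypothetical sample data; "description" is an assumed column name.
data = pd.DataFrame({
    "description": [
        "php developer: symfony, javascript, java, html, application",
        "office manager, scheduling and invoicing",
        "java developer building html applications",
    ]
})

# One pattern per keyword group; a group scores at most one point per row.
keyword_groups = [
    "developer|developpeur",
    "symfony",
    "javascript",
    "java|jee",
    "php",
    "html",
    "application",
]

# Create the target columns up front so every row has a defined value.
data["score"] = 0
data["occupation"] = None

for idx, row in data.iterrows():
    text = row["description"]
    dev_score = sum(1 for group in keyword_groups if re.search(group, text))
    # Write to a single cell: row label idx, one column.
    data.loc[idx, "score"] = dev_score
    if dev_score >= 6:
        data.loc[idx, "occupation"] = "Software and application developer"

print(data[["score", "occupation"]])
```

Because these are substring matches, a row containing "javascript" also scores for the "java" group, exactly as it would in the question's code.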

Just maintain a list of the wanted words, goodwords; the code below then performs the logic you are looking for.

import random
import numpy as np
import pandas as pd

goodwords = ["developer","developpeur","symfony","javascript","java","jee","php","html","html5", "application","applications"]
prefix = ["a","the","junior"]
company = ["apple", "facebook", "alibaba", "grab"]
# build a dataframe where wanted text may occur in a number of columns
df = pd.DataFrame([
    {col: f"{random.choice(prefix)} {random.choice(goodwords) if random.randint(0,2)<=1 else 'manager'} at {random.choice(company)}"
     for col in "abcdefgh"}
    for r in range(10)])
# start with a truth matrix that only contains False
matches = np.zeros(df.shape)==1
# build up Trues where a goodword is in the text
for w in goodwords:
    matches = matches | df.apply(lambda r: r.str.contains(w))
# spec shows only set score column if it's >=6
# score is the sum across the row of the truth matrix (True==1)
df = (df.assign(match=matches.sum(axis=1),
                score=lambda dfa: np.where(dfa["match"].ge(6), dfa["match"], np.nan),
                occupation=lambda dfa: np.where(dfa["match"].ge(6), "Software and application developer", "wannabe"))
      .drop(columns="match"))

Output

a                              b                            c                               d                           e                           f                            g                               h  score                          occupation
the java at grab         junior manager at grab           the html5 at apple        the applications at grab      junior manager at grab  junior application at grab       junior manager at grab  junior applications at alibaba    NaN                             wannabe
a manager at facebook     junior application at grab       junior manager at grab          junior symfony at grab    the applications at grab   junior symfony at alibaba    junior developer at apple            a javascript at grab    6.0  Software and application developer
junior applications at apple                  a php at grab            a manager at grab     junior applications at grab  junior manager at facebook           a php at facebook          the jee at facebook      junior javascript at apple    6.0  Software and application developer
the html5 at grab                 a jee at apple        junior html5 at apple               a manager at grab          a manager at apple         the manager at grab   the javascript at facebook                the php at apple    NaN                             wannabe
a applications at grab       junior developer at grab            a manager at grab          the manager at alibaba               a php at grab  junior manager at facebook          the manager at grab           a javascript at apple    NaN                             wannabe
a manager at grab        junior manager at apple            a manager at grab       junior manager at alibaba   the javascript at alibaba        junior java at apple       a applications at grab            the manager at apple    NaN                             wannabe
the jee at facebook              the html at apple  junior applications at grab  junior developpeur at facebook        the manager at apple      the javascript at grab           junior jee at grab       a developpeur at facebook    7.0  Software and application developer
junior developer at alibaba        the manager at facebook                a jee at grab               a manager at grab     the manager at facebook    the applications at grab         a manager at alibaba      junior application at grab    NaN                             wannabe
the manager at apple  junior application at alibaba  the application at facebook          junior manager at grab     junior manager at apple     junior manager at apple         the manager at apple          the symfony at alibaba    NaN                             wannabe
junior html5 at apple    the applications at alibaba            a manager at grab          junior manager at grab    junior html5 at facebook   junior manager at alibaba  junior applications at grab        junior developer at grab    NaN                             wannabe
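The truth matrix above counts how many cells in a row contain any wanted word. If, as in the question, the keywords sit in a single column and each keyword group should count only once, a loop-free variant might look like the sketch below. The sample rows, the column name `description`, and the `keyword_groups` list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; "description" is an assumed column name.
data = pd.DataFrame({
    "description": [
        "php developer: symfony, javascript, java, html, application",
        "office manager, scheduling and invoicing",
        "java developer building html applications",
    ]
})

# One regular expression per keyword group.
keyword_groups = [
    "developer|developpeur",
    "symfony",
    "javascript",
    "java|jee",
    "php",
    "html",
    "application",
]

# One boolean column per group: True where the group occurs in the text.
hits = pd.DataFrame({
    group: data["description"].str.contains(group, regex=True, na=False)
    for group in keyword_groups
})

# The score is the number of groups found in each row (True counts as 1).
data["score"] = hits.sum(axis=1)

# A boolean mask with .loc assigns only to the rows that pass the threshold.
data["occupation"] = None
data.loc[data["score"] >= 6, "occupation"] = "Software and application developer"

print(data[["score", "occupation"]])
```

Here `na=False` treats missing text as "no match", so empty cells score zero rather than propagating NaN into the sum.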
