I am trying to create a pandas DataFrame column based on multiple conditions



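I tried two different pieces of code to solve this: 1. an if condition inside the DataFrame assignment. 2. a function.

Both give me SyntaxError: invalid syntax.

I am still a beginner with Python.

First approach: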

df['hours_week'] = ['less_than_40' if x < 40 'between_40_and_45' elif x > 40 and x <= 45 'between_40_and_60' elif x >45 and x <= 60 'between_60_and_80' elif x >60 and x <=80 else 'more_than_80' for x in df['hours_per_week']]

Second approach:

def set_value(x):
     for x in df['hours_per_week']:
         if x < 40:
             t == print " less_than_40"
         elif (x > 40 and x <= 45):
             t == print "between_40_and_45"
         elif(x>45 and x <=60):
             t == print "between_40_and_45"
         elif(x>60 and x <= 80):
             t == print "between_60_and_80"
         else:
             t == print "more_than_80"
         return t
df['hours_week'] = df['hours_per_week'].apply(set_value,args=())

This is what I get with the first approach:

 File "<ipython-input-36-e90a4b2f98cc>", line 1
    df['hours_week'] = ['less_than_40' if x < 40 'between_40_and_45' elif x > 40 and x <= 45 'between_40_and_60' elif x >45 and x <= 60 'between_60_and_80' elif x >60 and x <=80 else 'more_than_80' for x in df['hours_per_week']]
                                                                   ^
SyntaxError: invalid syntax

With the second approach:

 File "<ipython-input-44-0a5dc69b4a15>", line 4
    t == print " less_than_40"
                             ^
SyntaxError: invalid syntax

In pandas, we usually use pd.cut for this:

df['hours_week']=pd.cut(df['hours_per_week'],bins=[-np.inf,40,45,60,80,np.inf])

You can also pass labels here: labels=['less_than_40','between_40_and_45'....]
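A minimal, self-contained sketch of pd.cut with labels, in case it helps. The sample values in df are made up for illustration (only the column names come from the question), and the label list is the five-label one used elsewhere in this thread. With the default right=True each bin is closed on the right, so 40 lands in the first bin and 45 in the second.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'hours_per_week': [20, 40, 41, 45, 60, 61, 80, 99]})

# Six edges define five bins: (-inf, 40], (40, 45], (45, 60], (60, 80], (80, inf]
bins = [-np.inf, 40, 45, 60, 80, np.inf]
labels = ['less_than_40', 'between_40_and_45', 'between_45_and_60',
          'between_60_and_80', 'more_than_80']

# right=True is the default: every interval includes its right edge.
df['hours_week'] = pd.cut(df['hours_per_week'], bins=bins, labels=labels)

print(df)
```

The result is a categorical column; call .astype(str) on it if plain strings are needed.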

You can also use searchsorted:

bins = pd.Series([40, 45, 60, 80])
labels = ['less_than_40', 'between_40_and_45', 'between_45_and_60', 
          'between_60_and_80', 'more_than_80']
df['hours_week'] = df['hours_per_week'].map(lambda x: labels[bins.searchsorted(x)])

The first label should really be "less_than_or_equal_to_40".
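As for the two SyntaxErrors: a conditional expression in Python has the form `a if cond else b` and has no `elif`, and `t == print "..."` mixes a comparison (`==`) with the Python 2 print statement, which cannot appear inside an expression. Below is a rough sketch of how the function approach could be made to work. It assumes that exactly 40 hours belongs in the first group (the original conditions left 40 falling through to the else branch), and the sample data is hypothetical.

```python
import pandas as pd

def set_value(x):
    # .apply passes one value at a time, so no loop over the column is needed.
    # Each branch returns the label instead of printing it.
    if x <= 40:
        return 'less_than_or_equal_to_40'
    elif x <= 45:
        return 'between_40_and_45'
    elif x <= 60:
        return 'between_45_and_60'
    elif x <= 80:
        return 'between_60_and_80'
    else:
        return 'more_than_80'

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'hours_per_week': [20, 40, 41, 45, 60, 61, 80, 99]})
df['hours_week'] = df['hours_per_week'].apply(set_value)

print(df)
```

Because the branches are checked in order, each `elif` only needs the upper bound. For large frames pd.cut is likely to be faster than a Python-level apply.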
