How to get the rows whose column value starts with 2 or 3 digits and an inch sign (")



I have a df with rows like these:

index | text
0     | '28,3" LEDTV K98765 AB12345 EU'
1     | '65" LEDTV K98765 AB12345 EU'
2     | '55,3" LEDTV K98765 AB12345 EU'
3     | 'MON 22,8" LED U754 PL333 DE'
4     | 'DAB Radio Work 34RT55 Blue'

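Every TV has its size in inches ("28,3" / "65" / "55,3") at the beginning of the text and "TV" somewhere in the text.

I need to know which products are TVs and, if they are, whether their screen size is larger than 55 inches.

In this example, rows 1 and 2 meet this criterion.

The final result should be: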

index | text                            | tvandbiggerthan55
0     | '28,3" LEDTV K98765 AB12345 EU' | 0 
1     | '65" LEDTV K98765 AB12345 EU'   | 1
2     | '55,3" LEDTV K98765 AB12345 EU' | 1
3     | 'MON 22,8" LED U754 PL333 DE'   | 0
4     | 'DAB Radio Work 34RT55 Blue'    | 0

How can I check the whole column at once?

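Use Series.str.extract to get the number before the ", replace the decimal comma with a dot and convert to float so that it can be compared with Series.gt. For the second mask use Series.str.contains. To map the combined mask to 1/0, convert it with Series.astype (Series.view('i1') also works on older pandas versions, but it is deprecated in newer ones):

m1 = (df['text'].str.extract(r'(\d+,\d+|\d+)"', expand=False)
                .str.replace(',', '.', regex=False)
                .astype(float)
                .gt(55))
m2 = df['text'].str.contains('TV')
df['tvandbiggerthan55'] = (m1 & m2).astype(int)
print(df)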
                              text  tvandbiggerthan55
0  '28,3" LEDTV K98765 AB12345 EU'                  0
1    '65" LEDTV K98765 AB12345 EU'                  1
2  '55,3" LEDTV K98765 AB12345 EU'                  1
3    'MON 22,8" LED U754 PL333 DE'                  0
4     'DAB Radio Work 34RT55 Blue'                  0
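For anyone who wants to reproduce the approach above end to end, here is a minimal self-contained sketch. It assumes the sample strings do not contain the surrounding quotes, and the intermediate names `size`, `is_big`, `is_tv` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed to be stored without the outer quotes).
df = pd.DataFrame({"text": [
    '28,3" LEDTV K98765 AB12345 EU',
    '65" LEDTV K98765 AB12345 EU',
    '55,3" LEDTV K98765 AB12345 EU',
    'MON 22,8" LED U754 PL333 DE',
    'DAB Radio Work 34RT55 Blue',
]})

# Digits, an optional decimal-comma part, then the inch sign.
size = (df["text"].str.extract(r'(\d+(?:,\d+)?)"', expand=False)
        .str.replace(",", ".", regex=False)   # decimal comma -> decimal point
        .astype(float))                       # rows without a size stay NaN

is_big = size.gt(55)                          # NaN compares as False
is_tv = df["text"].str.contains("TV", regex=False)

df["tvandbiggerthan55"] = (is_big & is_tv).astype(int)
result = df["tvandbiggerthan55"].tolist()
print(df)
```

Keeping `size` as its own Series makes it easy to inspect which rows had no inch value before the comparison is applied.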

Try this chained solution:

df['tvandbiggerthan55'] = (
    (df.assign(tvandbiggerthan55=df[df.text.str.contains(r'^\d|TV')])
     ['tvandbiggerthan55'].str.extract(r'(^\d+)'))
    .astype(float) >= 55
).astype(int)
                            text  tvandbiggerthan55
0  28,3" LEDTV K98765 AB12345 EU                  0
1    65" LEDTV K98765 AB12345 EU                  1
2  55,3" LEDTV K98765 AB12345 EU                  1
3    MON 22,8" LED U754 PL333 DE                  0
4     DAB Radio Work 34RT55 Blue                  0

How it works

# Keep the rows whose text begins with a digit or contains TV (all other rows become NaN)
df.assign(tvandbiggerthan55=df[df.text.str.contains(r'^\d|TV')])
# From the column built above, extract the leading digits, i.e. the whole-inch part of the TV size
df.assign(tvandbiggerthan55=df[df.text.str.contains(r'^\d|TV')])['tvandbiggerthan55'].str.extract(r'(^\d+)')
# Convert the extracted inches to float and test whether they are equal to or greater than 55
((df.assign(tvandbiggerthan55=df[df.text.str.contains(r'^\d|TV')])['tvandbiggerthan55'].str.extract(r'(^\d+)')).astype(float)>=55)
# Convert the booleans from above into integers by chaining
.astype(int)
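The chain above only looks at the whole-inch part and accepts rows that start with a digit or contain TV. As a possible variation, here is a sketch that requires both conditions and keeps the decimal part. The `pattern` below (2 or 3 leading digits, optional decimal comma, inch sign) is my own assumption based on the question title, not part of the answer above, and it assumes the strings are stored without surrounding quotes.

```python
import pandas as pd

df = pd.DataFrame({"text": [
    '28,3" LEDTV K98765 AB12345 EU',
    '65" LEDTV K98765 AB12345 EU',
    '55,3" LEDTV K98765 AB12345 EU',
    'MON 22,8" LED U754 PL333 DE',
    'DAB Radio Work 34RT55 Blue',
]})

# Hypothetical stricter pattern: the size must sit at the very start of the text.
pattern = r'^(\d{2,3}(?:,\d+)?)"'

leading_size = pd.to_numeric(
    df["text"].str.extract(pattern, expand=False)
              .str.replace(",", ".", regex=False),
    errors="coerce",                 # anything unparsable becomes NaN
)

# Both conditions must hold: a leading size above 55 AND "TV" somewhere in the text.
flag = (leading_size.gt(55) & df["text"].str.contains("TV", regex=False)).astype(int)
df["tvandbiggerthan55"] = flag
flags = flag.tolist()
print(df)
```

With this pattern the monitor row ('MON 22,8" ...') gets no size at all, because its inch value is not at the start of the string.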
