My output is as follows:
Date Sex Race Cause of Death City state
07/17/2012 Female White 1,1-Difluoroethane Toxicity NaN NaN
10/01/2012 Male White Heroin Toxicity PORTLAND CT
04/28/2013 Male White Acute Heroin Toxicity CT(41.575155 -72.738288) NaN
04/06/2014 Male White Heroin and Cocaine Intoxication WATERBURY CT
04/27/2014 Male White Acute Heroin Intoxication NEW LONDON CT
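One row in my City column has the value CT(41.575155 -72.738288), and I want to delete that row. How can I do that?
If you can assume that a city name never contains a digit: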
>>> df[~df['City'].fillna('').str.findall(r'\d').astype(bool)]
Date Sex Race Cause of Death City state
0 07/17/2012 Female White 1,1-Difluoroethane Toxicity NaN NaN
1 10/01/2012 Male White Heroin Toxicity PORTLAND CT
3 04/06/2014 Male White Heroin and Cocaine Intoxication WATERBURY CT
4 04/27/2014 Male White Acute Heroin Intoxication NEW LONDON CT
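For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the printed output in the question, so the exact construction (and the names `has_digit` and `cleaned`) are assumptions for illustration, not the asker's real loading code.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output in the question (an assumption).
df = pd.DataFrame({
    "Date": ["07/17/2012", "10/01/2012", "04/28/2013", "04/06/2014", "04/27/2014"],
    "Sex": ["Female", "Male", "Male", "Male", "Male"],
    "Race": ["White"] * 5,
    "Cause of Death": [
        "1,1-Difluoroethane Toxicity",
        "Heroin Toxicity",
        "Acute Heroin Toxicity",
        "Heroin and Cocaine Intoxication",
        "Acute Heroin Intoxication",
    ],
    "City": [np.nan, "PORTLAND", "CT(41.575155 -72.738288)", "WATERBURY", "NEW LONDON"],
    "state": [np.nan, "CT", np.nan, "CT", "CT"],
})

# True where the City value contains at least one digit.
has_digit = df["City"].fillna("").str.findall(r"\d").astype(bool)

# Keep only the rows without a digit in City.
cleaned = df[~has_digit]
print(cleaned.index.tolist())  # [0, 1, 3, 4]
```

Note that the row with a missing City (index 0) is kept. If it should go too, something like `cleaned.dropna(subset=["City"])` would be one way to do it.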
Step by step:
# Convert NaN to '' because NaN is a float, not a string :-)
>>> df['City'].fillna('')
0
1 PORTLAND
2 CT(41.575155 -72.738288)
3 WATERBURY
4 NEW LONDON
Name: City, dtype: object
# Now find every digit in the string
>>> df['City'].fillna('').str.findall(r'\d')
0 []
1 []
2 [4, 1, 5, 7, 5, 1, 5, 5, 7, 2, 7, 3, 8, 2, 8, 8]
3 []
4 []
Name: City, dtype: object
# Convert to boolean. An empty list is falsy, so it becomes False
>>> df['City'].fillna('').str.findall(r'\d').astype(bool)
0 False
1 False
2 True
3 False
4 False
Name: City, dtype: bool
# Invert the mask with ~
>>> ~df['City'].fillna('').str.findall(r'\d').astype(bool)
0 True
1 True
2 False
3 True
4 True
Name: City, dtype: bool
# Finally, keep only the rows where the mask is True (the one-liner at the top of this answer)
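As a possible shortcut, the same mask can probably be built in one call with `str.contains`, whose `na=False` argument treats missing values as "no match" and so replaces the `fillna('')` step. The sketch below uses a standalone Series copied from the City column above; `step_mask` and `short_mask` are names made up for this comparison.

```python
import numpy as np
import pandas as pd

# City values copied from the question's output (an assumption).
city = pd.Series(
    [np.nan, "PORTLAND", "CT(41.575155 -72.738288)", "WATERBURY", "NEW LONDON"],
    name="City",
)

# Mask built the step-by-step way: fillna, findall, astype(bool), invert.
step_mask = ~city.fillna("").str.findall(r"\d").astype(bool)

# Mask built in one call: na=False makes NaN count as "contains no digit".
short_mask = ~city.str.contains(r"\d", na=False)

print(short_mask.tolist())  # [True, True, False, True, True]
```

Both masks select the same rows here, so `df[~df['City'].str.contains(r'\d', na=False)]` should behave like the longer expression on this data.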