The following code works in Spyder:
import re

price_num = []
for row in df['price']:
    price_no_nonnum = re.sub('[^0-9]', '', row)  # this line works in Spyder
    price_num.append(int(price_no_nonnum))
In a Jupyter notebook, the same code raises an error:
import re

price_num = []
for row in df['price']:
    price_no_nonnum = re.sub('[^0-9]', '', row)  # this line gives an error in Jupyter
    price_num.append(int(price_no_nonnum))
This is the error shown in Jupyter:
TypeError                                 Traceback (most recent call last)
<ipython-input-13-b3f4fcbe9d89> in <module>()
      3 price_num = []
      4 for row in autos['price']:
----> 5     price_no_nonnum = re.sub("[^0-9]","", row)
      6     price_num.append(int(price_no_nonnum))
      7

/dataquest/system/env/python3/lib/python3.4/re.py in sub(pattern, repl, string, count, flags)
    177     a callable, it's passed the match object and must return
    178     a replacement string to be used."""
--> 179     return _compile(pattern, flags).sub(repl, string, count)
    180
    181 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or buffer
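My guess is that row is not a string but some pandas-specific data type; re.sub raises this TypeError whenever its third argument is not a string. You could try the following, which avoids regular expressions entirely: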
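price_num = []
for row in df['price']:
    try:
        # Already numeric (or a clean numeric string): convert directly.
        price = int(row)
    except (ValueError, TypeError):
        # Otherwise keep only the digit characters, e.g. '$5,000' -> '5000'.
        price_no_nonnum = ''.join(c for c in str(row) if c.isdigit())
        if not price_no_nonnum:
            continue  # no digits at all (e.g. NaN): skip this value
        price = int(price_no_nonnum)
    price_num.append(price)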
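
To check the guess above without the original data, here is a minimal sketch that uses only the standard library. The sample values and the helper name to_int_price are hypothetical and not taken from the question; the point is only that re.sub rejects a non-string argument and that converting with str() first avoids the error.

```python
import re


def to_int_price(value):
    """Return value as an int, or None if it contains no digits (hypothetical helper)."""
    try:
        # Numbers and clean numeric strings convert directly.
        return int(value)
    except (ValueError, TypeError):
        # Convert to str first so re.sub never receives a non-string.
        digits = re.sub(r'[^0-9]', '', str(value))
        return int(digits) if digits else None


# Hypothetical sample: a column that mixes strings, an integer and a missing value.
sample = ['$5,000', 8500, '1.200 EUR', float('nan')]

# Passing the integer straight to re.sub reproduces the TypeError from the question.
error_message = None
try:
    re.sub(r'[^0-9]', '', sample[1])
except TypeError as exc:
    error_message = str(exc)

prices = [to_int_price(v) for v in sample]
```

If the two environments load the column with different types (strings in one, numbers in the other), that would explain why the same loop works in Spyder and fails in Jupyter, although the question does not confirm this.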