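I have a DataFrame with a column of strings, and I want to extract the numbers from those strings. However, some values are in metres and some are in kilometres. How can I detect whether a number is followed by "m" or "km", normalise the unit, and then extract the number into a new column?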
details numbers
Distance 350m
Longest straight 860m
Top speed 305km
Full throttle 61 per cent
Expected output:
details numbers
Distance 350
Longest straight 860
Top speed 305000
Full throttle 61
Use:
m = df['numbers'].str.contains(r'\d+km')
df['numbers'] = df['numbers'].str.extract(r'(\d+)', expand=False).astype(int)
df.loc[m, 'numbers'] *= 1000
print (df)
details numbers
0 Distance 350
1 Longest straight 860
2 Top speed 305000
3 Full throttle 61
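Explanation:
- Build a boolean mask of the km values with contains
- Extract the integer values with extract and cast them to int
- Multiply the km values by 1000 to convert them to metres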
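An alternative sketch (not the method above) captures the number and its unit in a single `str.extract` call using two groups, then looks up a multiplier per unit. The `factors` dict is a hypothetical name, and the sketch assumes the unit, if present, directly follows the number. Adding more units only requires adding entries to that dict:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["350m", "860m", "305km", "61 per cent"],
})

# Hypothetical multiplier table: metres stay as-is, kilometres become metres
factors = {"m": 1, "km": 1000}

# Column 0: the integer; column 1: an optional unit right after it (NaN if absent)
parts = df["numbers"].str.extract(r"(\d+)\s*(km|m)?")

# Rows without a recognised unit (e.g. "61 per cent") keep a factor of 1
factor = parts[1].map(factors).fillna(1).astype(int)
df["numbers"] = parts[0].astype(int) * factor
print(df)
```

Because the unit is captured explicitly, no separate `str.contains` mask is needed, and a value such as "5 minutes" would be treated as metres, so tighten the pattern if your data contains other words starting with "m".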
EDIT: To extract floats, change the regular expression passed to extract as shown below, and convert to float at the end:
print (df)
details numbers
0 Distance 1.7km
1 Longest straight 860.8m
2 Top speed 305km
3 Full throttle 61 per cent
m = df['numbers'].str.contains(r'\d+km')
df['numbers'] = df['numbers'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
df.loc[m, 'numbers'] *= 1000
print (df)
details numbers
0 Distance 1700.0
1 Longest straight 860.8
2 Top speed 305000.0
3 Full throttle 61.0
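The float version can also be wrapped in a reusable helper. This is a sketch that combines the decimal-aware pattern above with the unit lookup. `to_metres` is a hypothetical name, and only the units "m" and "km" are assumed. Note that the decimal alternative `\d*\.\d+` must come before `\d+`: the regex engine tries alternatives left to right, so with the order reversed "860.8m" would match only "860".

```python
import pandas as pd

def to_metres(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse '<number><unit>' strings into float metres."""
    # Named groups: a decimal or integer value, then an optional unit
    parts = s.str.extract(r"(?P<value>\d*\.\d+|\d+)\s*(?P<unit>km|m)?")
    # Unknown or missing units (e.g. "per cent") are left unscaled
    factor = parts["unit"].map({"m": 1.0, "km": 1000.0}).fillna(1.0)
    return parts["value"].astype(float) * factor

df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["1.7km", "860.8m", "305km", "61 per cent"],
})
df["numbers"] = to_metres(df["numbers"])
print(df)
```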