Split a string and apply a locale to every row of a Pandas Series



I want to perform two transformations on the amount column of the following df:

Address                                         type    amount
0   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow 250,000 VSO
1   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow 250,000 VSO
2   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow 250,000 VSO
  1. I want to cut the "VSO" substring from all rows
  2. I want to apply locale.setlocale(locale.LC_ALL, 'en_us') to every row, turning each string into a float that follows that format

My current code is:

locale.setlocale(locale.LC_ALL, 'en_us')
df_test['amount'].str.split(' VSO')[0]
locale.atof((str(df_test['amount'].values)))

This gives me an error:

ValueError: could not convert string to float: "['250000 VSO' '250000 VSO' '250000 VSO' '33333 VSO' '33333 VSO'\n '10400000 VSO' '170833 VSO' '170833 VSO' '170833 VSO' '170833 VSO'\n

Try using apply after removing the trailing " VSO" with rstrip:

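import locale
# Use US conventions so "," is read as the thousands separator (the locale name can vary by platform).
locale.setlocale(locale.LC_ALL, 'en_us')
# rstrip(" VSO") strips any trailing ' ', 'V', 'S', 'O' characters; atof then parses each string.
df["amount"] = df["amount"].str.rstrip(" VSO").apply(locale.atof)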
>>> df
                                      Address     type    amount
0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
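
For context on why the attempt in the question failed: `.str.split(' VSO')[0]` selects the first row's list rather than the first piece of every row, and `str(df_test['amount'].values)` turns the whole array into one long string, which `locale.atof` cannot parse. Below is a minimal sketch that works element-wise and does not depend on the `en_us` locale being installed. It assumes the thousands separator is always a comma, and the sample frame is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's amount column.
df = pd.DataFrame({"amount": ["250,000 VSO", "33,333 VSO", "10,400,000 VSO"]})

# .str.split(...).str[0] takes the first piece of *each* row,
# whereas .str.split(...)[0] would return only row 0's list.
numbers = df["amount"].str.split(" VSO").str[0]

# Drop the thousands separators, then convert the whole column at once.
df["amount"] = numbers.str.replace(",", "", regex=False).astype(float)
```

If the locale-aware parsing matters (for example, for other separators), the apply(locale.atof) approach is probably the better fit.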

I think @not_speshal answered this perfectly.
In case the strings vary slightly (e.g. the VSO part changes), we can use the following regex:

>>> df['amount'] = df.amount.str.extract(r"(\d+,\d+|\d+)")[0].str.replace(',', '').astype(float)
>>> df
Address                                     type        amount
0   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow     250000.0
1   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow     250000.0
2   0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow     250000.0
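
One caveat: the pattern above captures at most one comma group, so a value such as 10,400,000 would be cut short at 10,400. Here is a sketch of a variant that allows any number of comma-separated groups. It assumes commas are only ever thousands separators, and the sample values and suffixes are invented for illustration:

```python
import pandas as pd

# Hypothetical amounts with differing suffixes and group counts.
s = pd.Series(["250,000 VSO", "10,400,000 ABC", "33333 XYZ"])

# Digits, optionally followed by any number of ",ddd" groups.
pattern = r"(\d+(?:,\d{3})*)"

# extract() returns a DataFrame; column 0 holds the single capture group.
amount = s.str.extract(pattern)[0].str.replace(",", "", regex=False).astype(float)
```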