How can I 1) change part of the text in values (e.g., ', ' -> '__') and 2) assign different values to missing values in a Python DataFrame?



I converted one JSON variable into multiple paired variables. As a result, I have a dataset like this:

home_city_1       home_number_1  home_city_2          home_number_2   home_city_3   home_number_3  home_city_4      home_number_4
Coeur D Alene, ID   13.0         Hayden, ID           8.0             Renton, WA     2.0           NaN               NaN
Spokane, WA         3.0          Amber, WA            2.0             NaN            NaN           NaN               NaN
Sioux Falls, SD     9.0          Stone Mountain, GA   2.0             Watertown, SD  2.0           Dell Rapids, SD   2.0
Ludowici, GA        11.0         NaN                  NaN             NaN            NaN           NaN               NaN

This dataset has 600 columns (300 * 2).

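I want to transform the values according to these rules:

  1. Change " " or "," in the home_city_# column values to "_" (underscore). For example, from "Sioux Falls, SD" to "Sioux_Falls_SD".
  2. Convert missing values to "m" (missing in home_city_#) or -1 (missing in home_number_#).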

I tried:

customer_home_city_json_2 = customer_home_city_json_1.replace(',', '_')
customer_home_city_json_2 = customer_home_city_json_2 .apply(lambda x: x.replace('null', "-1"))

Try:

citys = [col  for col in df.columns if 'home_city_' in col]
numbers = [col  for col in df.columns if 'home_number_' in col]
df[citys] = df[citys].replace(r"\s|,", "_", regex=True)
df[citys] = df[citys].fillna('m')
df[numbers] = df[numbers].fillna(-1)

To do this correctly, you first have to collect the column names for "home_city_#" and "home_number_#". This is done in the first two lines.

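To replace " " and "," with "_", I call replace() with regex=True so that the pattern is treated as a regular expression. \s is a shorthand that matches any whitespace character; a literal space " " could be used in its place.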
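A minimal sketch of why regex=True matters, using a tiny made-up frame rather than the full dataset. Without it, DataFrame.replace only matches whole cell values, which is presumably why the attempt in the question left the strings unchanged.

```python
import pandas as pd

# Tiny illustrative frame (an assumption, not the real 600-column dataset).
df = pd.DataFrame({"home_city_1": ["Spokane, WA", "Sioux Falls, SD"]})

# Without regex=True, replace() compares whole cell values,
# so a lone "," never matches "Spokane, WA" and nothing changes.
unchanged = df.replace(",", "_")

# With regex=True, the pattern is applied inside each string.
# \s matches any whitespace character; "," matches a literal comma.
replaced = df.replace(r"\s|,", "_", regex=True)

print(unchanged["home_city_1"].tolist())
print(replaced["home_city_1"].tolist())
```

Because the comma and the space after it are replaced one character at a time, ", " becomes two underscores.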

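To fill the NaNs, I use fillna with the desired value, -1 or "m". I recommend not mixing types within one column, so I use -1 for the number columns and "m" for the city columns.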
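As a variation on the two fillna calls, the fill values can also be passed as one dict that maps each column name to its own value. The sketch below assumes a small two-column frame; the column lists are built the same way as in the answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one city column and one number column.
df = pd.DataFrame({
    "home_city_2": ["Hayden, ID", np.nan],
    "home_number_2": [8.0, np.nan],
})

citys = [col for col in df.columns if "home_city_" in col]
numbers = [col for col in df.columns if "home_number_" in col]

# Map every city column to "m" and every number column to -1,
# then fill all of them in a single fillna() call.
fill_values = {**dict.fromkeys(citys, "m"), **dict.fromkeys(numbers, -1)}
df = df.fillna(fill_values)

# The number column stays numeric because it was filled with a number.
print(df.dtypes)
```

Keeping the fill value of the same kind as the column means the number columns can still be used in arithmetic afterwards.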

Example

This is your DataFrame:

         home_city_1  home_number_1         home_city_2  home_number_2
0  Coeur D Alene, ID           13.0          Hayden, ID            8.0
1        Spokane, WA            3.0           Amber, WA            2.0
2    Sioux Falls, SD            9.0  Stone Mountain, GA            2.0
3       Ludowici, GA           11.0                 NaN            NaN

The output will be:

         home_city_1  home_number_1         home_city_2  home_number_2
0  Coeur_D_Alene__ID           13.0          Hayden__ID            8.0
1        Spokane__WA            3.0           Amber__WA            2.0
2    Sioux_Falls__SD            9.0  Stone_Mountain__GA            2.0
3       Ludowici__GA           11.0                   m           -1.0
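The output above contains double underscores, while the question's example asks for "Sioux_Falls_SD". If a single underscore is wanted, one option is to match a whole run of whitespace and commas at once. This is a sketch under that assumption; the pattern r"[\s,]+" is a swapped-in variation, not the pattern used in the answer.

```python
import numpy as np
import pandas as pd

# The four-row example frame from above, rebuilt by hand.
df = pd.DataFrame({
    "home_city_1": ["Coeur D Alene, ID", "Spokane, WA", "Sioux Falls, SD", "Ludowici, GA"],
    "home_number_1": [13.0, 3.0, 9.0, 11.0],
    "home_city_2": ["Hayden, ID", "Amber, WA", "Stone Mountain, GA", np.nan],
    "home_number_2": [8.0, 2.0, 2.0, np.nan],
})

citys = [col for col in df.columns if "home_city_" in col]
numbers = [col for col in df.columns if "home_number_" in col]

single = df.copy()
# "+" makes each run of whitespace/commas collapse into ONE underscore.
single[citys] = single[citys].replace(r"[\s,]+", "_", regex=True).fillna("m")
single[numbers] = single[numbers].fillna(-1)

print(single)
```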

Assuming df is the name of your DataFrame, you can try the following:

city_cols = df.filter(regex='^home_city').columns
df[city_cols] = (df[city_cols]
                 .replace(r'\s', '_', regex=True)  # whitespace -> underscore
                 .replace(',', '_', regex=True)    # comma -> underscore
                 .fillna('m'))                     # missing city -> 'm'
number_cols = df.filter(regex='^home_number').columns
df[number_cols] = df[number_cols].fillna(-1)       # missing number -> -1

By using pandas.DataFrame.filter with a regex, you can select the columns that share the same prefix.
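A short sketch of the prefix selection on its own. The frame below is empty and the customer_id column is hypothetical, added only to show that columns without the prefix are left out.

```python
import pandas as pd

# Empty illustrative frame; "customer_id" is a hypothetical extra column.
df = pd.DataFrame(columns=[
    "home_city_1", "home_number_1",
    "home_city_2", "home_number_2",
    "customer_id",
])

# "^" anchors the pattern at the start of the column name.
city_cols = df.filter(regex=r"^home_city_").columns.tolist()
number_cols = df.filter(regex=r"^home_number_").columns.tolist()

print(city_cols)
print(number_cols)
```

Anchoring with "^" is slightly stricter than the substring test 'home_city_' in col, since it will not pick up a column that merely contains the text somewhere in the middle of its name.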
