I converted a JSON variable into multiple paired variables, so I now have a dataset like this:
home_city_1        home_number_1  home_city_2         home_number_2  home_city_3    home_number_3  home_city_4      home_number_4
Coeur D Alene, ID  13.0           Hayden, ID          8.0            Renton, WA     2.0            NaN              NaN
Spokane, WA        3.0            Amber, WA           2.0            NaN            NaN            NaN              NaN
Sioux Falls, SD    9.0            Stone Mountain, GA  2.0            Watertown, SD  2.0            Dell Rapids, SD  2.0
Ludowici, GA       11.0           NaN                 NaN            NaN            NaN            NaN              NaN
This dataset has 600 columns (300 × 2).
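I want to transform the values under these conditions:
- Replace " " (space) or "," in the home_city_# values with "_" (underscore). For example, "Sioux Falls,SD" should become "Sioux_Falls_SD".
- Convert missing values to "m" (when missing in home_city_#) or to -1 (when missing in home_number_#).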
I tried:
customer_home_city_json_2 = customer_home_city_json_1.replace(',', '_')
customer_home_city_json_2 = customer_home_city_json_2 .apply(lambda x: x.replace('null', "-1"))
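A likely reason the first line above has no visible effect: without regex=True, DataFrame.replace compares whole cell values, so "," is only replaced in cells whose entire value is ",". The sketch below uses a hypothetical one-column frame (not the real data) to show the difference.

```python
import pandas as pd

# Hypothetical sample resembling one of the question's city columns
df = pd.DataFrame({"home_city_1": ["Spokane, WA", ","]})

# Without regex=True, only cells that are exactly "," are replaced
exact = df.replace(",", "_")
# With regex=True, "," is matched anywhere inside each string
partial = df.replace(",", "_", regex=True)

print(exact["home_city_1"].tolist())    # ['Spokane, WA', '_']
print(partial["home_city_1"].tolist())  # ['Spokane_ WA', '_']
```

The same reasoning applies to the second line: missing values are NaN, not the string 'null', so there is nothing for replace('null', ...) to match.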
Try this:
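# Collect the names of the city and number columns
citys = [col for col in df.columns if 'home_city_' in col]
numbers = [col for col in df.columns if 'home_number_' in col]
# Replace every whitespace character and every comma with "_"
df[citys] = df[citys].replace(r"\s|,", "_", regex=True)
# Fill missing cities with 'm' and missing numbers with -1
df[citys] = df[citys].fillna('m')
df[numbers] = df[numbers].fillna(-1)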
To apply each step to the correct columns, you first need the names of the "home_city_#" and "home_number_#" columns. This is done in the first two lines.
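To replace " " and "," with "_", I call replace() with regex=True so that the pattern is treated as a regular expression. \s is a shortcut for any whitespace character, so all whitespace is replaced; a plain " " would also work here.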
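The pattern can be checked on single strings with Python's re module. In this sketch, r"\s|," replaces the comma and the following space separately, which is why the example output below has a double underscore (Sioux_Falls__SD). If you want the single underscore from the question (Sioux_Falls_SD), one option, an alternative not taken from the answer, is the character class r"[\s,]+", which collapses a run of spaces and commas into one underscore.

```python
import re

import pandas as pd

city = "Sioux Falls, SD"

# Each whitespace character and each comma is replaced on its own
print(re.sub(r"\s|,", "_", city))    # Sioux_Falls__SD
# A run of whitespace/commas is collapsed into a single underscore
print(re.sub(r"[\s,]+", "_", city))  # Sioux_Falls_SD

# The collapsing pattern applied to a hypothetical pandas column;
# non-string cells such as None are left untouched by the regex replace
s = pd.Series(["Sioux Falls, SD", "Coeur D Alene, ID", None])
collapsed = s.replace(r"[\s,]+", "_", regex=True).tolist()
print(collapsed[:2])  # ['Sioux_Falls_SD', 'Coeur_D_Alene_ID']
```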
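To fill the NaN values, I use fillna with the desired value, -1 or m. I recommend not mixing types within one column, which is why the number columns get -1 and the city columns get "m".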
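To see why one type per column matters, compare the dtypes. In this sketch (a hypothetical column, not the real data), filling a number column with -1 keeps it float64, while a column that mixes numbers with a string marker such as "m" ends up with the generic object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical number column with one missing value
numbers = pd.Series([13.0, np.nan, 2.0], name="home_number_1")

# Filling with -1 keeps the column numeric (float64)
filled = numbers.fillna(-1)
print(filled.dtype)     # float64
print(filled.tolist())  # [13.0, -1.0, 2.0]

# A column mixing numbers and a string marker becomes object dtype,
# so it no longer behaves like a plain numeric column
mixed = pd.Series([13.0, "m", 2.0])
print(mixed.dtype)  # object
```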
Example
Here is your DataFrame:
         home_city_1  home_number_1         home_city_2  home_number_2
0  Coeur D Alene, ID           13.0          Hayden, ID            8.0
1        Spokane, WA            3.0           Amber, WA            2.0
2    Sioux Falls, SD            9.0  Stone Mountain, GA            2.0
3       Ludowici, GA           11.0                 NaN            NaN
The output will be:
         home_city_1  home_number_1         home_city_2  home_number_2
0  Coeur_D_Alene__ID           13.0          Hayden__ID            8.0
1        Spokane__WA            3.0           Amber__WA            2.0
2    Sioux_Falls__SD            9.0  Stone_Mountain__GA            2.0
3       Ludowici__GA           11.0                   m           -1.0
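To reproduce this example end to end, the sketch below rebuilds the four-row DataFrame from the table above (values copied from it) and applies the steps of this answer, assuming the r"\s|," pattern for the whitespace/comma replacement.

```python
import numpy as np
import pandas as pd

# The four-row example DataFrame from the table above
df = pd.DataFrame({
    "home_city_1": ["Coeur D Alene, ID", "Spokane, WA", "Sioux Falls, SD", "Ludowici, GA"],
    "home_number_1": [13.0, 3.0, 9.0, 11.0],
    "home_city_2": ["Hayden, ID", "Amber, WA", "Stone Mountain, GA", np.nan],
    "home_number_2": [8.0, 2.0, 2.0, np.nan],
})

cities = [col for col in df.columns if "home_city_" in col]
numbers = [col for col in df.columns if "home_number_" in col]

# Whitespace and commas -> "_", then fill the gaps according to column type
df[cities] = df[cities].replace(r"\s|,", "_", regex=True).fillna("m")
df[numbers] = df[numbers].fillna(-1)

print(df)
```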
Assuming df is the name of your DataFrame, you can try the following:
city_cols = df.filter(regex='^home_city').columns
df[city_cols] = (df[city_cols]
                 .replace(' ', '_', regex=True)  # spaces -> "_"
                 .replace(',', '_', regex=True)  # commas -> "_"
                 .fillna('m'))                   # missing city -> 'm'
number_cols = df.filter(regex='^home_number').columns
df[number_cols] = df[number_cols].fillna(-1)     # missing number -> -1
By using pandas.DataFrame.filter with a regex, you can select all columns that share the same prefix.
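A short sketch of how the prefix selection behaves, on a hypothetical one-row frame that follows the question's naming scheme (only two pairs shown). filter(regex=...) keeps labels for which re.search matches, so the ^ anchor restricts it to labels that start with the prefix; filter(like=...) is the plain-substring variant, comparable to the 'home_city_' in col test used in the other answer.

```python
import pandas as pd

# Hypothetical frame using the question's column naming scheme
df = pd.DataFrame(
    [["Spokane, WA", 3.0, "Amber, WA", 2.0]],
    columns=["home_city_1", "home_number_1", "home_city_2", "home_number_2"],
)

city_cols = df.filter(regex="^home_city").columns
number_cols = df.filter(regex="^home_number").columns
like_cols = df.filter(like="home_city_").columns  # substring match, no regex

print(list(city_cols))    # ['home_city_1', 'home_city_2']
print(list(number_cols))  # ['home_number_1', 'home_number_2']
print(list(like_cols))    # ['home_city_1', 'home_city_2']
```

With 300 pairs of columns the same two calls work unchanged, since the selection depends only on the label prefix.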