Convert a JSON file to a DataFrame and strip whitespace and newlines from the values



I have a JSON file, a.json, with the following structure:

[
{ "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
{ "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
{ "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]

I want to remove all the surrounding whitespace and newlines from every value. Here is my code:

import pandas as pd
df = pd.read_json('a.json')
df = df.replace(r'\n', '', regex=True)

This removes the newlines but not the whitespace, even though I also wrote

df.columns=df.columns.str.replace(' ','')
df.columns=df.columns.str.strip()
df.columns=df.columns.str.lstrip()

My output:

name  age                 car
0       John           30           Bmw
1        Joe           20      mercedes
2       Alex           18         tesla

What should I do?

@chitown88's answer is probably faster, but if you want to use a regex, you can do this:

df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)

Output:

name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla
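
To make the regex approach easier to try without a.json on disk, here is a minimal sketch that builds the frame from in-memory rows. The sample rows are an assumption that mirrors the question's data, and the name `cleaned` is only illustrative.

```python
import pandas as pd

# Sample rows mirroring the question's data (assumed, not read from a.json).
data = [
    {"name": "\n     John\n        ", "age": "30  \n ", "car": "   Bmw   \n   \n"},
    {"name": "\n     Joe\n        ", "age": "20  \n ", "car": "    mercedes   \n   \n"},
]
df = pd.DataFrame(data)

# \s matches spaces, tabs and newlines; the ^ and $ anchors restrict the
# match to the start and end of each value, so inner spaces are kept.
cleaned = df.replace(r'^\s+|\s+$', '', regex=True)
print(cleaned.values.tolist())
# [['John', '30', 'Bmw'], ['Joe', '20', 'mercedes']]
```

Because the pattern is anchored, a value such as "Alfa  Romeo" would keep its inner spaces; only the leading and trailing runs are removed.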

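You can use the pandas applymap function to apply a function to every value (note that pandas 2.1 and later deprecate applymap in favor of DataFrame.map):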

import pandas as pd
df = pd.read_json('a.json')
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)

Output:

name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla
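
If the same code has to run on both older and newer pandas releases, one possible sketch is to pick whichever element-wise method the installed version provides. The helper name `strip_if_str` and the sample frame are assumptions for illustration only.

```python
import pandas as pd

# Illustrative frame with one string column and one numeric column.
df = pd.DataFrame({"name": ["\n  John\n  ", "  Joe "], "age": [30, 20]})

def strip_if_str(value):
    """Strip surrounding whitespace from strings; pass other values through."""
    return value.strip() if isinstance(value, str) else value

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
method_name = "map" if hasattr(pd.DataFrame, "map") else "applymap"
cleaned = getattr(df, method_name)(strip_if_str)
print(cleaned)
```

The isinstance check is what lets the numeric age column pass through untouched instead of raising an error.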

Another very similar, but more compact, way is:

import pandas as pd
df = pd.read_json("a.json")
df_obj = df.select_dtypes(['object'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print(df)

Output:

name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla

One option is to clean the JSON data itself with list and dict comprehensions:

import pandas as pd
data = [
{ "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
{ "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
{ "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]

data = [{k:v.strip() for k,v in each.items()} for each in data]
df = pd.DataFrame(data)
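
The comprehension above works on an in-memory list. A sketch of the same idea applied to the file contents could look like the following; it assumes the file is parsed with the standard json module, and it uses io.StringIO as a stand-in for open('a.json') so the example is self-contained. The isinstance guard is an addition, so that non-string values do not break the strip call.

```python
import io
import json
import pandas as pd

# Stand-in for open("a.json"); a raw string keeps the \n escapes as JSON text.
raw = io.StringIO(r'''[
{ "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
{ "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" }
]''')

records = json.load(raw)

# Strip only string values; leave numbers, None, etc. unchanged.
records = [
    {key: value.strip() if isinstance(value, str) else value
     for key, value in row.items()}
    for row in records
]
df = pd.DataFrame(records)
print(df.values.tolist())
# [['John', '30', 'Bmw'], ['Joe', '20', 'mercedes']]
```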

Alternatively, you can iterate over each column:

import pandas as pd

data = [
{ "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
{ "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
{ "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]

df = pd.DataFrame(data)
# Strip leading and trailing whitespace (including newlines) column by column.
for col in df.columns:
    df[col] = df[col].str.strip()

Output:

print(df)
name age       car
0  John  30       Bmw
1   Joe  20  mercedes
2  Alex  18     tesla
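
The loop above calls .str.strip() on every column, which works here because every value is a string; on a numeric column the .str accessor raises an AttributeError. A hedged variant that skips non-string columns might look like this (the sample frame with an integer age column is an assumption for illustration):

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Illustrative frame: "age" is numeric, so .str would fail on it.
df = pd.DataFrame({"name": ["\n  John\n ", " Joe  "], "age": [30, 20]})

for col in df.columns:
    # Only string-like columns support the .str accessor.
    if is_string_dtype(df[col]):
        df[col] = df[col].str.strip()

print(df)
```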
