Removing whitespace from a list of nested objects with pandas



I have a DataFrame with the following structure:

[
  {
    "name": "Book1",
    "details": [
      {
        "id": 30278752,
        "isbn": " 1594634025",
        "isbn13": "9781594634024    ",
        "text_reviews_count": 417,
        "work_reviews_count": 3313007,
        "work_text_reviews_count": 109912,
        "average_rating": " 3.92"
      }
    ]
  },
  {
    "name": "Book2",
    "details": [
      {
        "id": 34006942,
        "isbn": "  1501173219",
        "isbn13": " 9781501173219   ",
        "text_reviews_count": 565,
        "work_reviews_count": 2142280,
        "work_text_reviews_count": 75053,
        "average_rating": "4.33 "
      }
    ]
  }
]

How can I remove the whitespace? The strings have spaces both before and after them.

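We can create a lambda function that strips whitespace from the string values in a dictionary, then map that function over the details column of the DataFrame: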

strip = lambda d: {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}
df['details'] = df['details'].map(lambda L: [strip(d) for d in L])
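To try the approach end to end, here is a minimal sketch that rebuilds a two-row frame shaped like the one in the question and applies the same idea. The names `records` and `strip_values` are illustrative assumptions, not part of the original post, and the sample keeps only a few of the keys for brevity.

```python
import pandas as pd

# Sample data shaped like the question's input (a subset of the keys).
records = [
    {"name": "Book1", "details": [{"id": 30278752,
                                   "isbn": " 1594634025",
                                   "isbn13": "9781594634024    ",
                                   "average_rating": " 3.92"}]},
    {"name": "Book2", "details": [{"id": 34006942,
                                   "isbn": "  1501173219",
                                   "isbn13": " 9781501173219   ",
                                   "average_rating": "4.33 "}]},
]
df = pd.DataFrame(records)


def strip_values(d):
    """Return a copy of dict d with whitespace stripped from its string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}


# Each cell of 'details' is a list of dicts, so clean every dict in the list.
df["details"] = df["details"].map(lambda items: [strip_values(d) for d in items])

print(df.loc[0, "details"][0]["isbn13"])  # 9781594634024
```

A named function behaves the same as the lambda above; it is only somewhat easier to document and reuse.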

Result

>>> df.to_dict('records')
[{'name': 'Book1',
'details': [{'id': 30278752,
'isbn': '1594634025',
'isbn13': '9781594634024',
'text_reviews_count': 417,
'work_reviews_count': 3313007,
'work_text_reviews_count': 109912,
'average_rating': '3.92'}]},
{'name': 'Book2',
'details': [{'id': 34006942,
'isbn': '1501173219',
'isbn13': '9781501173219',
'text_reviews_count': 565,
'work_reviews_count': 2142280,
'work_text_reviews_count': 75053,
'average_rating': '4.33'}]}]

You can try stripping them this way:

names = ["isbn", "isbn13", "average_rating"]
for i in v:
    for j in names:
        # strip() removes only leading and trailing whitespace
        i["details"][0][j] = i["details"][0][j].strip()

Out[17]: 
[{'name': 'Book1',
'details': [{'id': 30278752,
'isbn': '1594634025',
'isbn13': '9781594634024',
'text_reviews_count': 417,
'work_reviews_count': 3313007,
'work_text_reviews_count': 109912,
'average_rating': '3.92'}]},
{'name': 'Book2',
'details': [{'id': 34006942,
'isbn': '1501173219',
'isbn13': '9781501173219',
'text_reviews_count': 565,
'work_reviews_count': 2142280,
'work_text_reviews_count': 75053,
'average_rating': '4.33'}]}]

This is the output. v is the list you mentioned.
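If the keys are not known in advance, or a details list holds more than one entry, a recursive helper is one possible generalization of the loop. This is a sketch only: `strip_nested` is a hypothetical name, and `v` here is a shortened stand-in for the list from the question.

```python
def strip_nested(obj):
    """Recursively strip leading/trailing whitespace from every string in obj."""
    if isinstance(obj, str):
        return obj.strip()
    if isinstance(obj, list):
        return [strip_nested(item) for item in obj]
    if isinstance(obj, dict):
        return {key: strip_nested(value) for key, value in obj.items()}
    return obj  # numbers, None, etc. are returned unchanged


# Shortened stand-in for the list in the question.
v = [
    {"name": "Book1", "details": [{"id": 30278752, "isbn": " 1594634025",
                                   "average_rating": " 3.92"}]},
    {"name": "Book2", "details": [{"id": 34006942, "isbn": "  1501173219",
                                   "average_rating": "4.33 "}]},
]

cleaned = strip_nested(v)
print(cleaned[1]["details"][0]["isbn"])  # 1501173219
```

Unlike the loop, this returns a new structure and leaves `v` untouched.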

You should spread the JSON data into columns. Here is my approach:

import pandas as pd

# Pull the first element from each list in 'details'
df['spread_details'] = df['details'].map(lambda x: x[0])
# Spread the JSON objects into columns
df1 = pd.json_normalize(df['spread_details'])

merged_df = pd.concat([df, df1], axis=1)
merged_df = merged_df.drop(columns=['details', 'spread_details'])
# Trim the string columns that carry stray whitespace
cols_to_be_trimmed = ["isbn", "isbn13", "average_rating"]
merged_df[cols_to_be_trimmed] = merged_df[cols_to_be_trimmed].apply(lambda x: x.str.strip())
merged_df.head(2)
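When the original list of records is still available, the flattening step might also be done in a single call with the `record_path` and `meta` arguments of `pd.json_normalize`. The sketch below assumes the raw list is held in a variable named `data` (a hypothetical name) and uses a subset of the keys; converting `average_rating` to a number is an extra step that the question did not ask for.

```python
import pandas as pd

# Assumed: the raw records from the question, shortened to a few keys.
data = [
    {"name": "Book1", "details": [{"id": 30278752, "isbn": " 1594634025",
                                   "isbn13": "9781594634024    ",
                                   "average_rating": " 3.92"}]},
    {"name": "Book2", "details": [{"id": 34006942, "isbn": "  1501173219",
                                   "isbn13": " 9781501173219   ",
                                   "average_rating": "4.33 "}]},
]

# One row per entry in 'details', with the parent 'name' carried along.
flat = pd.json_normalize(data, record_path="details", meta=["name"])

# Strip whitespace from the text columns.
text_cols = ["isbn", "isbn13", "average_rating"]
flat[text_cols] = flat[text_cols].apply(lambda s: s.str.strip())

# Optional: the rating is numeric once the padding is gone.
flat["average_rating"] = pd.to_numeric(flat["average_rating"])

print(flat)
```

This also handles details lists with several entries, because every entry becomes its own row.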
