Unable to insert extracted JSON values into a DataFrame column



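I have a DataFrame in which one column holds JSON strings. I want to pick a single field from each JSON value, use it as the column's value, and drop the rest. I have already collected the values I need, but I can't figure out how to put them back into the DataFrame to replace the existing column: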

import json
import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)
lst = []
for d in df['data']:
    # parse each JSON string and keep only its uid
    d = pd.read_json(d, lines=True)['uid'].values[0]
    lst.append(d)
s = pd.DataFrame(lst)
df['data'] = s
print(s)
print(df)

This returns:

     0
0  100
1  100
2  200
3  300
       bank_account  data
Alice           101   NaN
Alice           102   NaN
Bob             201   NaN
Carol           301   NaN

Right now I don't understand why the data column shows NaN for every row. Any help is appreciated.

Updated question: some rows hold a list of JSON objects rather than a single one. Here is what I have so far:

import json
import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)
# df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])
df["data"] = df["data"].apply(lambda array : (",".join(list(map(lambda x : pd.read_json(x, lines=True)["uid"][0], array),(df['data'])))))
print(df)

This works for me:

import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0},  "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0},  "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)
# parse each JSON string and keep only the uid of its single record
df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])

Your code doesn't work because the two DataFrames have different indexes. To fix it, set df['data'] = s[0].values (instead of df['data'] = s) before the two print statements.

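As @rachwa pointed out, the problem is an index mismatch: the index of s is numeric, while the index of df consists of names. pandas aligns on index labels when assigning, and since no labels match, every row becomes NaN. If you assign lst directly instead of converting it to a DataFrame, you get the desired result, i.e.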

df['data'] = lst

will work as expected.
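The index alignment described above can be reproduced in isolation. This is a minimal sketch, not the asker's exact code: the name uids and the copies aligned and positional are made up for illustration, and the JSON parsing is left out so that only the assignment behaviour is shown.

```python
import pandas as pd

# Same shape as the question: df is indexed by names.
df = pd.DataFrame(
    {"bank_account": [101, 102, 201, 301]},
    index=["Alice", "Alice", "Bob", "Carol"],
)

# A Series built from a plain list gets the default integer index 0..3.
uids = pd.Series([100, 100, 200, 300])

# Assigning a Series aligns on index labels; none of 0..3 match the
# names in df.index, so every row ends up as NaN.
aligned = df.copy()
aligned["data"] = uids

# Assigning a NumPy array (or a plain list) has no index to align on,
# so the values are placed by position.
positional = df.copy()
positional["data"] = uids.to_numpy()

print(aligned)
print(positional)
```

Here to_numpy() plays the same role as .values in the answer above; either one strips the index so that nothing is aligned.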

You can also use json.loads instead of read_json (it should be faster):

import json
df['data'] = [json.loads(d)['uid'] for d in df['data']]

Output:

       bank_account  data
Alice           101   100
Alice           102   100
Bob             201   200
Carol           301   300
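For the updated question, where some rows hold a JSON array and others a single object, the json.loads approach can probably be extended along these lines. This is only a sketch under assumptions: it assumes the goal is a comma-separated string of uid values per row (which the ",".join in the attempt suggests), extract_uids is a hypothetical helper name, and the sample data is shortened to the fields that matter.

```python
import json
import pandas as pd

def extract_uids(raw):
    """Return the uid values found in one JSON string, joined by commas.

    Hypothetical helper: accepts either a single JSON object or a
    JSON array of objects.
    """
    parsed = json.loads(raw)
    # Wrap a single object in a list so both cases share one code path.
    records = parsed if isinstance(parsed, list) else [parsed]
    return ",".join(str(record["uid"]) for record in records)

# Shortened sample data: one row with an array, one with a single object.
df = pd.DataFrame(
    {
        "bank_account": [101, 102],
        "data": [
            '[{"uid": 100, "user_name": "Alice"}, {"uid": 150, "user_name": "jer"}]',
            '{"uid": 100, "user_name": "Alice"}',
        ],
    },
    index=["Alice", "Alice"],
)

# apply returns a Series that shares df's index, so nothing is misaligned.
df["data"] = df["data"].apply(extract_uids)
print(df)
```

Because the result is a string column, rows with one uid and rows with several stay comparable; if integers or lists are wanted instead, the return value of the helper would need to change accordingly.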
