在加载到json(tojson)之前从pandas数据帧迭代器中删除元素



我正试图从data_json字符串中删除电子邮件和用户ID,因为它是以前提取的,我不希望这些字段列两次,使用python脚本:

credentials = service_account.Credentials.from_service_account_file('/keys/json_poc.txt')
project_id = 'myproject'
bq_conn = bigquery.Client(credentials=credentials,project=project_id)

# setup sql query
bq_sql = ("""
select email , userid, * except (email , userid) from dataset.usertable
""")
# load up dataframe
df = bq_conn.query(bq_sql).to_dataframe()
# iterate over rows
for i, row in df.iterrows():

# pull out email and userid from df
email = df.loc[i]['email']
userid = df.loc[i]['userid']

# build json data fields for iterable
data_json = df.loc[i].to_json(orient="columns") # remove fields

# json format {"email": "email@yahoo.com","dataFields": {},"userId": "99999"}
final_json = '{"email": "' + email + '","dataFields": ' + data_json + ',"userId": "' + userid + '"}'
r = requests.request(method="POST", url=url, headers=headers, data=final_json)
print('added: ' + email)

API(GET(的输出如下所示:

b'{"user":{"email":"email@yahoo.com","dataFields":{"email":"email@yahoo.com","userId":"99999","StateofResidence":"CA",......,"categoryid":99},"userId":"99999"}}'

在上面,您可以看到电子邮件和用户ID字段被列出了两次。我们将为每个用户提供大约500个属性,因此通过手动提取元素来构建json字符串是不现实的(此外,我希望能够在不更新python的情况下向源表中添加列(。

任何想法都将不胜感激。

整理好了,只需要更改以下内容:

def myconverter(o):
if isinstance(o, datetime.datetime):
return o.__str__()
bq_sql = ("""select email , userid, * except (email , userid) from dataset.usertable 
""")
df = bq_conn.query(bq_sql).to_dataframe()
for i, row in df.iterrows():
payload_dict = {"email": row["email"], "dataFields": row.drop(labels=['userid', 'email']).to_dict(), "userid": row["userid"]} 
json_payload = json.dumps(payload_dict, default = myconverter) #cast datetime

r = requests.request(method="POST", url=url, headers=headers, data=json_payload)
print(r.status_code)
print(r.content)

最新更新