如何将数据分离为两个数据帧



我有一个json数据链接,我想将数据从中分离为两个数据帧。

我的代码如下:

import pandas as pd
import requests
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 10000)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
data = requests.get(url, headers=headers).json()
for x in range(len(data['records']['data'])):
print(data['records']['data'][x])

输出包含如下行,其中"CE"&"PE"数据可用:

{'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020PE13900.00', 'openInterest': 1597, 'changeinOpenInterest': 1589, 'pchangeinOpenInterest': 19862.5, 'totalTradedVolume': 9101, 'impliedVolatility': 110.76, 'lastPrice': 1.65, 'change': -1.5, 'pChange': -47.61904761904762, 'totalBuyQuantity': 49800, 'totalSellQuantity': 17560, 'bidQty': 40, 'bidprice': 1.75, 'askQty': 460, 'askPrice': 2.35, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 2, 'impliedVolatility': 162.07, 'lastPrice': 6901.1, 'change': 3502.4000000000005, 'pChange': 103.05116662253218, 'totalBuyQuantity': 2620, 'totalSellQuantity': 2620, 'bidQty': 200, 'bidprice': 6629.85, 'askQty': 200, 'askPrice': 7208.75, 'underlyingValue': 20681.45}}
{'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020PE13900.00', 'openInterest': 989, 'changeinOpenInterest': 12, 'pchangeinOpenInterest': 1.2282497441146367, 'totalTradedVolume': 134, 'impliedVolatility': 98.26, 'lastPrice': 16.3, 'change': -4.899999999999999, 'pChange': -23.113207547169807, 'totalBuyQuantity': 32900, 'totalSellQuantity': 4100, 'bidQty': 20, 'bidprice': 16.3, 'askQty': 20, 'askPrice': 17, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020CE13900.00', 'openInterest': 1, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 5000, 'change': -5000, 'pChange': -100, 'totalBuyQuantity': 2640, 'totalSellQuantity': 2840, 'bidQty': 20, 'bidprice': 6242.05, 'askQty': 20, 'askPrice': 7401.65, 'underlyingValue': 20681.45}}
{'strikePrice': 13900, 'expiryDate': '14-May-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020PE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 100, 'totalSellQuantity': 0, 'bidQty': 100, 'bidprice': 0.3, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 2420, 'totalSellQuantity': 2420, 'bidQty': 2420, 'bidprice': 6223.45, 'askQty': 2420, 'askPrice': 7565.05, 'underlyingValue': 20681.45}}

我想存储CE&列名为的两个独立数据帧中的PE值

['strikePrice','expiryDate', 'underlying', 'identifier', 'openInterest', 'changeinOpenInterest', 'pchangeinOpenInterest', 'totalTradedVolume','impliedVolatility', 'lastPrice', 'change', 'pChange', 'totalBuyQuantity', 'totalSellQuantity', 'bidQty', 'bidprice', 'askQty', 'askPrice', 'underlyingValue']

与一些来自itertools和collections的朋友一起理解列表应该有助于将数据放入单独的数据帧:

headers = {
'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
data = requests.get(url, headers=headers).json()
#list comprehension here
#note that it is nested, since the data we are truly keen on
#is embedded in a list
#this will give us a tuple of the key, and the dataframe of the values
#as the values are dictionaries
res = [[(key, pd.DataFrame.from_dict(value, orient='index').T)
for key, value in entry.items()
if key in ['PE', 'CE']]
for entry in data['records']['data']]
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
#group the values into a pair
#they will be a combined list of dataframes
#belonging to either PE or CE    
for k, v in chain.from_iterable(res):
d[k].append(v)
#now we can merge the values
#and keep our result as a dictionary
#this allows us to access PE or CE via keys   
result = {key: pd.concat(values) for key, values in d.items()}
#now, we can access either PE or CE
#dataframe is quite long, so this is a small part of it
result['PE'].iloc[:3,:5]

strikePrice    expiryDate  underlying  identifier                openInterest
0   13900      23-Apr-2020  BANKNIFTY   OPTIDXBANKNIFTY23-04-2020PE13900.00 1597
0   13900      30-Apr-2020  BANKNIFTY   OPTIDXBANKNIFTY30-04-2020PE13900.00 989
0   13900      14-May-2020  BANKNIFTY   OPTIDXBANKNIFTY14-05-2020PE13900.00 0

假设json_obj是上面粘贴的输出,转换可以简单地使用-来完成

PE

pe = pd.DataFrame.from_dict(json_obj['PE'], orient='index').reset_index()
pe.columns = ['param', 'value'] # Optional renaming of columns
print(pe)
"""
param                                value
0             strikePrice                                13900
1              expiryDate                          23-Apr-2020
2              underlying                            BANKNIFTY
3              identifier  OPTIDXBANKNIFTY23-04-2020PE13900.00
4            openInterest                                 1597
5    changeinOpenInterest                                 1589
6   pchangeinOpenInterest                              19862.5
7       totalTradedVolume                                 9101
8       impliedVolatility                               110.76
9               lastPrice                                 1.65
10                 change                                 -1.5
11                pChange                              -47.619
12       totalBuyQuantity                                49800
13      totalSellQuantity                                17560
14                 bidQty                                   40
15               bidprice                                 1.75
16                 askQty                                  460
17               askPrice                                 2.35
18        underlyingValue                              20681.5
"""

类似地,重复上述步骤将CE转换为数据帧。

我已经解决了以下问题。有没有其他有效的方法来获得想要的结果?

import pandas as pd
import requests
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 10000)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
records = requests.get(url, headers=headers).json()
df_PE = pd.DataFrame()
df_CE = pd.DataFrame()
# print(len(records['records']['data']))
for x in range(len(records['records']['data'])):
if records['records']['data'][x]['expiryDate'] == "23-Apr-2020":
try:
PE = pd.DataFrame([records['records']['data'][x]['PE']])
df_PE = df_PE.append(PE)
except Exception as err:
pass
try:
CE = pd.DataFrame([records['records']['data'][x]['CE']])
df_CE = df_CE.append(CE)
except Exception as err:
pass
print(df_PE)
print(df_CE)

这里有一个例子,展示了实现您想要的目标的一种方法:

# initialize dataframes to hold data
pe_df = pd.DataFrame()
ce_df = pd.DataFrame()
# loop through records
for x in range(len(data['records']['data'])):
# extract the record
record = data['records']['data'][x]
# print the record
print(record)
# extract the 'PE' and 'CE' data
pe_row = record['PE']
ce_row = record['CE']
# append the row to the appropriate dataframe
pe_df = pe_df.append(pd.Series(pe_row), ignore_index=True)
ce_df = ce_df.append(pd.Series(ce_row), ignore_index=True)

我假设记录数据是您在问题中显示的形式。

最新更新