我有一个json数据链接,我想将数据从中分离为两个数据帧。
我的代码如下:
import pandas as pd
import requests
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 10000)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
data = requests.get(url, headers=headers).json()
for x in range(len(data['records']['data'])):
print(data['records']['data'][x])
输出包含如下行,其中"CE"&"PE"数据可用:
{'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020PE13900.00', 'openInterest': 1597, 'changeinOpenInterest': 1589, 'pchangeinOpenInterest': 19862.5, 'totalTradedVolume': 9101, 'impliedVolatility': 110.76, 'lastPrice': 1.65, 'change': -1.5, 'pChange': -47.61904761904762, 'totalBuyQuantity': 49800, 'totalSellQuantity': 17560, 'bidQty': 40, 'bidprice': 1.75, 'askQty': 460, 'askPrice': 2.35, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 2, 'impliedVolatility': 162.07, 'lastPrice': 6901.1, 'change': 3502.4000000000005, 'pChange': 103.05116662253218, 'totalBuyQuantity': 2620, 'totalSellQuantity': 2620, 'bidQty': 200, 'bidprice': 6629.85, 'askQty': 200, 'askPrice': 7208.75, 'underlyingValue': 20681.45}}
{'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020PE13900.00', 'openInterest': 989, 'changeinOpenInterest': 12, 'pchangeinOpenInterest': 1.2282497441146367, 'totalTradedVolume': 134, 'impliedVolatility': 98.26, 'lastPrice': 16.3, 'change': -4.899999999999999, 'pChange': -23.113207547169807, 'totalBuyQuantity': 32900, 'totalSellQuantity': 4100, 'bidQty': 20, 'bidprice': 16.3, 'askQty': 20, 'askPrice': 17, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020CE13900.00', 'openInterest': 1, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 5000, 'change': -5000, 'pChange': -100, 'totalBuyQuantity': 2640, 'totalSellQuantity': 2840, 'bidQty': 20, 'bidprice': 6242.05, 'askQty': 20, 'askPrice': 7401.65, 'underlyingValue': 20681.45}}
{'strikePrice': 13900, 'expiryDate': '14-May-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020PE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 100, 'totalSellQuantity': 0, 'bidQty': 100, 'bidprice': 0.3, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 2420, 'totalSellQuantity': 2420, 'bidQty': 2420, 'bidprice': 6223.45, 'askQty': 2420, 'askPrice': 7565.05, 'underlyingValue': 20681.45}}
我想存储CE&列名为的两个独立数据帧中的PE值
['strikePrice','expiryDate', 'underlying', 'identifier', 'openInterest', 'changeinOpenInterest', 'pchangeinOpenInterest', 'totalTradedVolume','impliedVolatility', 'lastPrice', 'change', 'pChange', 'totalBuyQuantity', 'totalSellQuantity', 'bidQty', 'bidprice', 'askQty', 'askPrice', 'underlyingValue']
与一些来自itertools和collections的朋友一起理解列表应该有助于将数据放入单独的数据帧:
headers = {
'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
data = requests.get(url, headers=headers).json()
#list comprehension here
#note that it is nested, since the data we are truly keen on
#is embedded in a list
#this will give us a tuple of the key, and the dataframe of the values
#as the values are dictionaries
res = [[(key, pd.DataFrame.from_dict(value, orient='index').T)
for key, value in entry.items()
if key in ['PE', 'CE']]
for entry in data['records']['data']]
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
#group the values into a pair
#they will be a combined list of dataframes
#belonging to either PE or CE
for k, v in chain.from_iterable(res):
d[k].append(v)
#now we can merge the values
#and keep our result as a dictionary
#this allows us to access PE or CE via keys
result = {key: pd.concat(values) for key, values in d.items()}
#now, we can access either PE or CE
#dataframe is quite long, so this is a small part of it
result['PE'].iloc[:3,:5]
strikePrice expiryDate underlying identifier openInterest
0 13900 23-Apr-2020 BANKNIFTY OPTIDXBANKNIFTY23-04-2020PE13900.00 1597
0 13900 30-Apr-2020 BANKNIFTY OPTIDXBANKNIFTY30-04-2020PE13900.00 989
0 13900 14-May-2020 BANKNIFTY OPTIDXBANKNIFTY14-05-2020PE13900.00 0
假设json_obj
是上面粘贴的输出,转换可以简单地使用-来完成
PE
pe = pd.DataFrame.from_dict(json_obj['PE'], orient='index').reset_index()
pe.columns = ['param', 'value'] # Optional renaming of columns
print(pe)
"""
param value
0 strikePrice 13900
1 expiryDate 23-Apr-2020
2 underlying BANKNIFTY
3 identifier OPTIDXBANKNIFTY23-04-2020PE13900.00
4 openInterest 1597
5 changeinOpenInterest 1589
6 pchangeinOpenInterest 19862.5
7 totalTradedVolume 9101
8 impliedVolatility 110.76
9 lastPrice 1.65
10 change -1.5
11 pChange -47.619
12 totalBuyQuantity 49800
13 totalSellQuantity 17560
14 bidQty 40
15 bidprice 1.75
16 askQty 460
17 askPrice 2.35
18 underlyingValue 20681.5
"""
类似地,重复上述步骤将CE
转换为数据帧。
我已经解决了以下问题。有没有其他有效的方法来获得想要的结果?
import pandas as pd
import requests
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 10000)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
records = requests.get(url, headers=headers).json()
df_PE = pd.DataFrame()
df_CE = pd.DataFrame()
# print(len(records['records']['data']))
for x in range(len(records['records']['data'])):
if records['records']['data'][x]['expiryDate'] == "23-Apr-2020":
try:
PE = pd.DataFrame([records['records']['data'][x]['PE']])
df_PE = df_PE.append(PE)
except Exception as err:
pass
try:
CE = pd.DataFrame([records['records']['data'][x]['CE']])
df_CE = df_CE.append(CE)
except Exception as err:
pass
print(df_PE)
print(df_CE)
这里有一个例子,展示了实现您想要的目标的一种方法:
# initialize dataframes to hold data
pe_df = pd.DataFrame()
ce_df = pd.DataFrame()
# loop through records
for x in range(len(data['records']['data'])):
# extract the record
record = data['records']['data'][x]
# print the record
print(record)
# extract the 'PE' and 'CE' data
pe_row = record['PE']
ce_row = record['CE']
# append the row to the appropriate dataframe
pe_df = pe_df.append(pd.Series(pe_row), ignore_index=True)
ce_df = ce_df.append(pd.Series(ce_row), ignore_index=True)
我假设记录数据是您在问题中显示的形式。