Need help unpacking a nested array into a pandas DataFrame



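I am running some code that generates an array of shape (18433, 17, 600-885). I need to unpack it into a pandas DataFrame with 17 columns and rows for 18433 entities, each of which has 600 to 885 time-series entries. The code that generates the array is shown below. I am relatively new to Python and have reached the limit of my technical ability. I tried unpacking it with a for loop, but that takes a very long time. Is there a more efficient library or method?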

import numpy as np

# Generate full monthly cash flow arrays
# define constant input parameters
eloss = 0
weight = 1.0
prod_wt = 1.0
inv_wt = 1.0
stx_oil = 0.0795
stx_gas = 0.0795
stx_ngl = 0.0795
adval = 0
aban = 150000

# Create function for slicing the volume array and calculating the monthly cash flow
# (econ_cf, prop_list, oilprice, gasprice, gasdiff and R are not shown here)
def econ_ncf_iter(r):
    return econ_cf(
        index=r,
        uid=prop_list.loc[r, 'PROPNUM'],
        wi=prop_list.loc[r, 'WI'],
        nri=prop_list.loc[r, 'NRI'],
        roy=prop_list.loc[r, 'Royalty'],
        eloss=eloss, weight=weight, prod_wt=prod_wt, inv_wt=inv_wt,
        shrink=np.round(prop_list.loc[r, 'SHRINK'] / 100, 6),
        btu=np.round(prop_list.loc[r, 'BTU'] / 1000, 6),
        ngl_yield=np.round(prop_list.loc[r, 'NGL/GAS'], 6),
        pri_oil=np.extract(oilprice[r][0] == prop_list.loc[r, 'PROPNUM'], oilprice[r][1]),
        pri_gas=np.extract(gasprice[r][0] == prop_list.loc[r, 'PROPNUM'], gasprice[r][1]),
        paj_oil=prop_list.loc[r, 'PAJ_OIL'],
        paj_gas=np.extract(gasdiff[r][0] == prop_list.loc[r, 'PROPNUM'], gasdiff[r][1]),
        paj_ngl=prop_list.loc[r, 'PAJ_NGL'],
        stx_oil=stx_oil, stx_gas=stx_gas, stx_ngl=stx_ngl,
        adval=adval,
        opc_fix=np.round(prop_list.loc[r, 'OPC/T'], 2),
        opc_oil=np.round(prop_list.loc[r, 'OIL_OPEX'], 2),
        opc_gas=np.round(prop_list.loc[r, 'GAS_OPEX'], 2),
        capex=np.round(prop_list.loc[r, 'CAPITAL'] * 1000, 2),
        aban=aban,
    )

# generate net cash flow array (one object element per entity in R)
vecon_ncf = np.vectorize(econ_ncf_iter, otypes=[object])
ncf_arr_packed = vecon_ncf(R)

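I figured it out, and it was easy.

import numpy as np
import pandas as pd
from tqdm import tqdm

ncf_pd_dflist = []
columns = ['UID', 'Month', 'Grs Oil', 'Grs Gas', 'Net Oil', 'Net Gas', 'Net NGL', 'Oil Revenue', 'Gas Revenue',
           'NGL Revenue', 'Total Revenue', 'Total Tax', 'OPEX', 'Operating Income', 'Cumulative Op CF', 'Net Cashflow',
           'Cumulative Net CF']
pbar = tqdm(total=len(R))  # total= lets the bar report progress out of len(R)
for r in R:
    # transpose each element so its 17 fields become columns
    ncf_pd_dflist.append(pd.DataFrame(np.transpose(ncf_arr_packed[r])))
    pbar.update()
ncf_pd = pd.concat(ncf_pd_dflist)
ncf_pd.columns = columns
pbar.close()

Simple code that loops over the array and builds a list of pandas DataFrames. After the loop, I concatenate the list into a single DataFrame. This took about 5 seconds to complete.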

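A possible variation on the same idea, sketched here on synthetic data: transpose each per-entity block, stack them all with a single np.concatenate call, and build one DataFrame at the end. The array `packed`, the column names, and the extra `entity` column are all hypothetical stand-ins for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for ncf_arr_packed: 4 entities, 3 fields each,
# with a different number of monthly entries per entity (ragged).
lengths = [5, 7, 6, 8]
packed = np.empty(len(lengths), dtype=object)
for i, n in enumerate(lengths):
    packed[i] = rng.random((3, n))

columns = ["A", "B", "C"]  # placeholder column names

# Transpose each (3, n_i) block to (n_i, 3) and stack them vertically.
stacked = np.concatenate([block.T for block in packed], axis=0)
ncf_pd = pd.DataFrame(stacked, columns=columns)

# Optional: record which entity each row came from.
ncf_pd["entity"] = np.repeat(np.arange(len(packed)), lengths)
```

This avoids creating thousands of intermediate DataFrames, which may help on larger inputs, though the gain over the list-and-concat version has not been measured here.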
Although you have already found a solution, here is a generic alternative without an explicit loop. It takes a few simple steps:

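  • If the axis you want as the horizontal (column) axis, here the middle axis, is not the last one, swap it with the last axis
  • Reshape the result into a 2D array whose rows are the horizontal rows
  • Build a DataFrame with a MultiIndex generated from the Cartesian product of the other axes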

Assuming the array is arr:

x, y, z = arr.shape
df = pd.DataFrame(arr.swapaxes(1, 2).reshape(x*z, -1),
                  index=pd.MultiIndex.from_product([np.arange(x), np.arange(z)]))
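
To make the three steps concrete, here is a minimal sketch on a small regular array. The sizes, the index level names (`entity`, `step`), and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical regular array: 2 entities, 3 fields, 4 time steps.
x, y, z = 2, 3, 4
arr = np.arange(x * y * z).reshape(x, y, z)

# Step 1: move the field axis last. Step 2: flatten (entity, step) into rows.
flat = arr.swapaxes(1, 2).reshape(x * z, -1)

# Step 3: index the rows by the Cartesian product of the remaining axes.
index = pd.MultiIndex.from_product(
    [np.arange(x), np.arange(z)], names=["entity", "step"]
)
df = pd.DataFrame(flat, index=index, columns=["a", "b", "c"])

# Row (entity 1, step 2) holds arr[1, :, 2].
row = df.loc[(1, 2)].tolist()
```

Note that this relies on arr.shape having three entries, so it applies when every entity has the same number of time steps. A ragged object array like the one in the question (600 to 885 entries per entity) would presumably need to be padded to a common length first.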
