Shannon entropy



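I wrote a short script to compute a stock's log returns and the Shannon entropy of the data. However, the Shannon entropy I get is negative, which is very strange. I am using S = -p log p. Is it a problem that p is not defined over discrete bins? How can I divide p into bins so that the entropy is computed as S = -SUM_k(p_k log p_k)?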

import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm

plot_lreturnshist = False
plot_lreturns = True
#Import the data from yfinance. What Ticker, what period of time we want
AAPL = yf.Ticker("AAPL")
history = AAPL.history(period = "5y")
#Extract only the close data
Close = history["Close"]

#Set up a recurrence to add a column in our dataframe for the logarithmic returns of the stock
#Log returns are calculated as log_2(Close(day x)/Close(day x-1))  
logreturn = []
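for i in range(len(Close)):
    if i == 0:
        #No previous close on the first day, so the return is set to 0
        logreturn.append(0)
    else:
        #.iloc makes the positional lookup explicit on the date-indexed series
        x = np.log2(abs(Close.iloc[i]))-np.log2(abs(Close.iloc[i-1]))
        logreturn.append(x)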
#Now we have an array with the logarithmic returns, we add it to the pandas dataframe
history["logreturn"] = logreturn
#We then pull it out for ease of use
lreturn = history["logreturn"]
if plot_lreturns == True:
    fig,ax = plt.subplots()
    ax.plot(lreturn, color = "dodgerblue")

#We plot the data in a histogram, by 
if plot_lreturnshist == True:
    mu, std = norm.fit(lreturn)
    plt.hist(lreturn, bins=50, density=True, alpha=0.6, color='g', ec = 'black')

    xmin, xmax = plt.xlim()
    x = np.linspace(xmin, xmax, 100)
    p = norm.pdf(x, mu, std)
    plt.plot(x, p, 'k', linewidth=2)
    title = r"Fit results: $\mu$ = $%.2f$,  $\sigma$ = $%.2f$" % (mu, std)
    plt.title(title)
    #The returns above are computed with log base 2, so the label says log_2
    plt.xlabel(r"$\log_2(Y_{t+1}/Y_t)$")
    plt.show()
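mu, std = norm.fit(lreturn)
#Evaluate the fitted normal density on a grid spanning the observed returns,
#so that x is defined even when the histogram plot is switched off
x = np.linspace(lreturn.min(), lreturn.max(), 100)
p = norm.pdf(x, mu, std)
S = np.sum(-p*np.log(p))
print(S)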
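
On the binning question: `norm.pdf` returns density values, not probabilities, and a density can exceed 1 when the standard deviation is small, which makes `-p*log(p)` negative at those points. The following is only a minimal sketch of the binned form S = -SUM_k(p_k log p_k). It assumes synthetic, normally distributed returns in place of the AAPL data, and `binned_entropy` is a hypothetical helper name, not something from the code above.

```python
import numpy as np
from scipy.stats import norm

def binned_entropy(samples, bins=50):
    """Shannon entropy in bits of samples after histogram binning (hypothetical helper)."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()   # bin probabilities p_k, which sum to 1
    p = p[p > 0]                # drop empty bins, treating 0*log(0) as 0
    return -np.sum(p * np.log2(p))

# Synthetic stand-in for daily log returns (assumption: not real AAPL data)
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.02, size=1250)

# Discrete entropy of the binned returns: never negative
S_binned = binned_entropy(returns, bins=50)

# Differential entropy (in nats) of the fitted normal: negative for a small std
mu, std = norm.fit(returns)
h_continuous = float(norm.entropy(mu, std))

print(S_binned, h_continuous)
```

The binned value depends on the number of bins, so entropies are only comparable when computed with the same binning.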

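I built an entropy indicator that uses a rolling volume histogram as the probability input, and I also got negative values. In thermodynamics, negative entropy means you are gaining heat, so it might mean that market activity is increasing, but it does not tell you the direction.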

You can find my attempt at this indicator in my indicator library on GitHub. It is simply called "entropy".

Edit: following your comment, I modified my entropy function, and it now gives positive values.

def entropy(c_close, c_volume, period, bins=2):
    size = len(c_close)
    out = np.full(size, np.nan)
    # ROLLING WINDOW
    for i in range(period - 1, size):
        e = i + 1
        s = e - period
        close_w = c_close[s:e]
        volume_w = c_volume[s:e]
        # HISTO BASED ON CLOSE / VOLUME
        min_w = np.min(close_w)
        span = np.max(close_w) - min_w
        if span == 0:
            # FLAT WINDOW: NO BIN WIDTH CAN BE DEFINED, LEAVE NaN
            continue
        sum_h = np.zeros(bins)
        for j in range(period):
            # CLAMP SO THE WINDOW MAXIMUM FALLS IN THE LAST BIN
            k = min(int((close_w[j] - min_w) * bins / span), bins - 1)
            sum_h[k] += volume_w[j] ** 2
        count = np.sqrt(sum_h)
        # NORMALIZE HISTO COUNT (CONVERT TO PROBA)
        count = count / np.sum(count)
        # DELETE PROBAS = 0 TO AVOID log2(0)
        count = count[np.nonzero(count)]
        # ENTROPY
        out[i] = -np.sum(count * np.log2(count))
    return out
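
A quick sanity check for any entropy computed from a normalized histogram like the one above: with `bins` bins the result should lie between 0 (all weight in one bin) and log2(bins) (weight spread evenly), so the default `bins=2` gives values between 0 and 1 bit. The sketch below illustrates those two limits. `shannon_entropy_bits` is a hypothetical helper written for this illustration and is not part of the indicator library.

```python
import numpy as np

def shannon_entropy_bits(p):
    """Entropy in bits of a probability vector p that sums to 1 (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # empty bins contribute nothing
    return -np.sum(p * np.log2(p))

bins = 4
uniform = np.full(bins, 1.0 / bins)       # weight spread evenly over the bins
one_bin = np.array([1.0, 0.0, 0.0, 0.0])  # all weight in a single bin

upper = shannon_entropy_bits(uniform)     # equals log2(bins)
lower = shannon_entropy_bits(one_bin)     # equals 0

print(lower, upper, np.log2(bins))
```

A value outside this range usually means the inputs were not probabilities, for example unnormalized counts or density values.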
