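So, the code I'm trying to write should work out which days of the month have historically been the best days to buy and sell a stock. The stock I'm specifically looking at is UVXY. I'm trying to find which days have historically been the monthly low and which have been the monthly high, and then average them. So far my code doesn't work, because in some months the 20th or the 10th is not a trading day. The real string is much longer and has many more dates. I'm open to using yfinance to get historical prices, but I'm not sure how it works. Thanks!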
from bs4 import BeautifulSoup
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436
Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888
Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960
....
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220
</pre>"""
soup = BeautifulSoup(content, "html.parser")
stuff = soup.find('pre').text
lines = stuff.split("\n")
listOfStuff=[]
openPriceOfTrades=[]
closePriceOfTrades=[]
difference=[]
for line in lines:
    if(line[7:9]=="20"):
        closePriceOfTrades.append(line[20:-46])
    if line[7:9]=="10":
        openPriceOfTrades.append(line[20:-46])
difference = [] # initialization of result list
for i in range(len(openPriceOfTrades)-1):
    print(len(openPriceOfTrades))
    difference.append(float(closePriceOfTrades[i])-float(openPriceOfTrades[i]))
print(difference)
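Offsets like `line[20:-46]` depend on exact character positions. Those positions shift when prices have a different number of digits (compare `22.22` with `9150.00`) or when volumes differ in length. Splitting each line on whitespace avoids that. Below is a minimal sketch. It assumes the columns are day name, date, open, high, low, close and volume; the data fits that order, since the fourth number is always the smallest and the third the largest. `parse_line` is a hypothetical name.

```python
def parse_line(line: str) -> dict:
    """Split one 'Day MM-DD-YYYY open high low close volume' line on whitespace."""
    # Assumed column order: day name, date, open, high, low, close, volume.
    day_name, date, open_, high, low, close, volume = line.split()
    return {
        "day_name": day_name,
        "day_of_month": int(date[3:5]),  # MM-DD-YYYY -> DD
        "open": float(open_),
        "high": float(high),
        "low": float(low),
        "close": float(close),
        "volume": int(volume),
    }


sample = "Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920"
row = parse_line(sample)
print(row["day_of_month"], row["open"], row["close"])  # prints: 20 26.26 27.31

# With the question's `lines`, skip blank lines first:
# rows = [parse_line(line) for line in lines if line.strip()]
```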
You should learn `pandas.DataFrame`.
First, I remove the `<pre>` and `</pre>` lines so that only the data is left.
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436
Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888
Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220
</pre>"""
# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()
print(content)
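The result looks like a CSV file with whitespace as the separator. You can use `io.StringIO` to simulate a file in memory and read it with `pd.read_csv`: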
import pandas as pd
import io
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
Then you can create a `date-day` column holding the day of the month:
df['date-day'] = df['date'].str[3:5]
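Slicing with `str[3:5]` works because every date here uses the fixed `MM-DD-YYYY` layout, but it produces strings such as `'07'`. A slightly sturdier variant parses the column with `pd.to_datetime` and takes the integer day with `.dt.day`. Comparisons then become `== 20` instead of `== '20'`. The shortened `content` below is just three rows copied from the data above.

```python
import io

import pandas as pd

content = """Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485"""

df = pd.read_csv(io.StringIO(content), sep=r"\s+",
                 names=["day", "date", "A", "B", "C", "D", "volumen"])

# Parse the MM-DD-YYYY strings into real datetimes instead of slicing characters.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["date-day"] = df["date"].dt.day  # integer 1..31

print(df[df["date-day"] == 20])
```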
Then select all rows that have `20` in the `date-day` column and calculate the average (`mean`):
day_20 = df[ df['date-day'] == '20' ]
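print(day_20.mean(numeric_only=True))  # numeric_only=True skips the text columns; pandas >= 2.0 raises TypeError without it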
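Selecting `date-day == '20'` finds nothing in months where the 20th falls on a weekend or holiday. That is the problem described in the question. One possible workaround is to take the last trading day on or before the target day in each month. The sketch below assumes column `A` is the open and `D` is the close. The data fits that reading: `B` is always the highest value and `C` the lowest. The helper name `price_on_or_before` is hypothetical.

```python
import pandas as pd


def price_on_or_before(df: pd.DataFrame, day: int, column: str = "D") -> pd.Series:
    """Per calendar month, `column` from the last trading day whose day-of-month <= `day`.

    Hypothetical helper. It expects the answer's `df`, whose 'date' column holds
    MM-DD-YYYY strings. Months with no trading day on or before `day` are omitted.
    """
    dates = pd.to_datetime(df["date"], format="%m-%d-%Y")
    # Keep only rows up to the target day, oldest first.
    eligible = df.assign(_date=dates)[dates.dt.day <= day].sort_values("_date")
    months = eligible["_date"].dt.to_period("M")
    # The last remaining row in each month is the closest trading day at or before `day`.
    return eligible.groupby(months)[column].last()


# Usage with the answer's df (A = open, D = close are assumptions):
# open_10 = price_on_or_before(df, 10, "A")
# close_20 = price_on_or_before(df, 20, "D")
# print((close_20 - open_10).dropna())  # one difference per month
```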
Or you can use `groupby` to work with all the days at the same time:
for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean())
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())
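The loop prints each column's mean separately. If a single table is enough, you can select the price columns and call `.mean()` on the groupby to get the same numbers in one step. Here is a sketch on three rows copied from the data above:

```python
import io

import pandas as pd

content = """Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220"""

df = pd.read_csv(io.StringIO(content), sep=r"\s+",
                 names=["day", "date", "A", "B", "C", "D", "volumen"])
df["date-day"] = df["date"].str[3:5]

# One row per day of the month, one column per price column.
means = df.groupby("date-day")[["A", "B", "C", "D"]].mean()
print(means)
```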
Full working code:
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436
Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888
Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220
</pre>"""
# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()
import pandas as pd
import io
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
df['date-day'] = df['date'].str[3:5]
print(df)
day_20 = df[ df['date-day'] == '20' ]
print(day_20)
for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean())
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())
Result:
day date A B C D volumen date-day
0 Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392 24
1 Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436 23
2 Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888 22
3 Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928 21
4 Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920 20
5 Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336 17
6 Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960 16
7 Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485 07
8 Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220 06
day date A B C D volumen date-day
4 Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920 20
--- date-day: 06 ---
mean "A": 9530.0
mean "B": 9660.0
mean "C": 9130.0
mean "D": 9210.0
--- date-day: 07 ---
mean "A": 9150.0
mean "B": 9390.0
mean "C": 8780.0
mean "D": 9270.0
--- date-day: 16 ---
mean "A": 21.91
mean "B": 22.66
mean "C": 21.04
mean "D": 21.38
--- date-day: 17 ---
mean "A": 21.56
mean "B": 23.58
mean "C": 21.33
mean "D": 23.48
--- date-day: 20 ---
mean "A": 26.26
mean "B": 30.81
mean "C": 25.36
mean "D": 27.31
--- date-day: 21 ---
mean "A": 26.03
mean "B": 28.18
mean "C": 25.2
mean "D": 25.86
--- date-day: 22 ---
mean "A": 24.88
mean "B": 25.37
mean "C": 22.88
mean "D": 23.68
--- date-day: 23 ---
mean "A": 22.52
mean "B": 22.63
mean "C": 21.32
mean "D": 21.48
--- date-day: 24 ---
mean "A": 22.22
mean "B": 22.27
mean "C": 20.38
mean "D": 20.49
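The 2016 rows (prices in the thousands) and the 2021 rows (around 20 to 30) sit on very different scales. Once many months are mixed, a plain per-day average is dominated by the older, larger prices. If the goal is to find which day of the month tends to be cheap or expensive, one option is to divide each close by that month's mean close before grouping by day. This normalization is a suggestion, not part of the answer's method. Column `D` is assumed to be the close. Values below 1.0 mean the day was cheaper than its month's average.

```python
import io

import pandas as pd

content = """Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220"""

df = pd.read_csv(io.StringIO(content), sep=r"\s+",
                 names=["day", "date", "A", "B", "C", "D", "volumen"])
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
month = df["date"].dt.to_period("M")

# Close (assumed to be column D) relative to its own month's average close.
df["rel_close"] = df["D"] / df.groupby(month)["D"].transform("mean")

# Average relative close per day of month; lower = historically cheaper day.
rel_by_day = df.groupby(df["date"].dt.day)["rel_close"].mean()
print(rel_by_day)
```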
If you use `yfinance`, you get the data directly as a `pandas.DataFrame`.
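Below is a sketch of that route. It also does what the question asks: for each month, find the day with the lowest low and the day with the highest high, then average those days. It assumes `yfinance` is installed. It also assumes `yf.Ticker(...).history(period="max")` returns a frame indexed by date with `High` and `Low` columns, which is how recent versions behave. Both helper names are hypothetical. Because the code picks whichever trading day actually had the extreme, non-trading days such as a 20th that falls on a weekend never come up.

```python
import pandas as pd


def monthly_extreme_days(df: pd.DataFrame) -> pd.DataFrame:
    """Day of month on which each calendar month's lowest Low and highest High occurred.

    Hypothetical helper; assumes `df` has a DatetimeIndex plus 'Low' and 'High' columns.
    """
    keys = [df.index.year, df.index.month]
    low_dates = df.groupby(keys)["Low"].idxmin()    # timestamp of each month's minimum
    high_dates = df.groupby(keys)["High"].idxmax()  # timestamp of each month's maximum
    return pd.DataFrame({
        "low_day": low_dates.map(lambda ts: ts.day),
        "high_day": high_dates.map(lambda ts: ts.day),
    })


def load_history(ticker: str) -> pd.DataFrame:
    """Hypothetical loader; needs `pip install yfinance` and network access."""
    import yfinance as yf
    return yf.Ticker(ticker).history(period="max")


# Usage (needs network access):
# days = monthly_extreme_days(load_history("UVXY"))
# print(days.mean())                            # average low day / high day
# print(days["low_day"].value_counts().head())  # most frequent low days
```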