我有这种格式的分钟分辨率库存数据
19980102 930 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0 0
列在哪里"日期","时间","开放","高","低","关闭","音量","分裂因子","收入","股息"
(线终止于0x0a(lineFeed)不cr lf。它看起来像pandas.read_csv cops,读取此数据的读数非常好。)
数据来自一家名为www.quantquote.com
的公司我是Pandas和Python的新手,但我已经阅读了Wes McKinney的" Python进行数据分析"。这是我的第一个Python项目,但我已经写了很多年的C,C ,C#,汇编器等代码。
我的目的是为每个股票创建一个数据框,并将日期和时间合并为时间表" dateTime"索引。然后,我打算将其重新置于每日和每月的数据范围。
我已经尝试了多种方法来获取pandas.read_csv来一步读取此信息,但我找不到一种方法来读取"时间"列。它是一个奇怪的
930,931,932 ... 17:28,17:29,17:30
格式
有更好的方法吗?如果可能
这是我到目前为止所拥有的。
from pandas import Series, DataFrame
import pandas as pd
import os
mypath = "c:PythonStuffQuantQuoteDatatable_aapl.txt"
columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']
minData = pd.read_csv(mypath, names=columnHeadings, dtype= {"Date":str, "Time":str}, parse_dates = False, sep=' ')
minData["Time"] = minData["Time"].map(lambda x : x[:-2] + ":" + x[-2:] + ":00+00:00")
minData["DateTimeStr"] = minData["Date"] + " " + minData["Time"]
minData["DateTime"] = pd.to_datetime(minData["DateTimeStr"])
minData.index = minData["DateTime"]
这是" table_aapl.txt"文件
的前几行19980102 930 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0 0 19980102 931 3.29473 3.29473 3.2778 3.29473 263522 4 0 0 0 19980102 932 3.29473 >
任何帮助都非常感谢
编辑:最后,这是我可以提出的最好的解决方案。回想起来,一切都很明显:)
感谢您的帮助:)
columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']
minData = pd.read_csv(
myFile,
header = None,
names = columnHeadings,
parse_dates = [["Date", "Time"]],
date_parser = lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'),
index_col = "Date_Time",
sep=' ')
我使用您的代码遇到了问题,但是以下情况对我有用,应该对您进行一些次要调整,重要的一步是使用Strptime从字符串中创建DateTime:p>导入数据:
minData = pd.read_csv(r'c:data.txt', parse_dates = [[0,1]], header=None,sep=' ')
minData
Out[17]:
0_1 2 3 4 5 6 7 8 9 10
0 19980102 930 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0 NaN
1 19980102 931 3.29473 3.29473 3.27780 3.29473 263522.0 4 0 0 NaN
2 19980102 932 3.29473 3.29473 3.27780 3.29473 120384.0 4 0 0 NaN
3 19980102 933 3.29473 3.29473 3.26330 3.26330 82738.3 4 0 0 NaN
4 19980102 934 3.26330 3.29473 3.26330 3.27780 11169.6 4 0 0 NaN
5 19980102 935 3.29473 3.29473 3.27780 3.27780 11997.0 4 0 0 NaN
6 19980102 936 3.26330 3.29473 3.26330 3.27780 109628.0 4 0 0 NaN
[7 rows x 10 columns]
#now convert the string using datetime.datetime.strptime:
# rename the first column (may not be necessary for you)
In [20]:
minData.rename(columns={'0_1':columnHeadings[0], 1:columnHeadings[1]},inplace=True)
minData
Out[20]:
Date 2 3 4 5 6 7 8 9 10
0 19980102 930 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0 NaN
1 19980102 931 3.29473 3.29473 3.27780 3.29473 263522.0 4 0 0 NaN
2 19980102 932 3.29473 3.29473 3.27780 3.29473 120384.0 4 0 0 NaN
3 19980102 933 3.29473 3.29473 3.26330 3.26330 82738.3 4 0 0 NaN
4 19980102 934 3.26330 3.29473 3.26330 3.27780 11169.6 4 0 0 NaN
5 19980102 935 3.29473 3.29473 3.27780 3.27780 11997.0 4 0 0 NaN
6 19980102 936 3.26330 3.29473 3.26330 3.27780 109628.0 4 0 0 NaN
[7 rows x 10 columns]
# now use strptime to format the string into a datetime object
In [21]:
import datetime
minData['Date'] = minData['Date'].apply(lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'))
minData
Out[21]:
Date 2 3 4 5 6 7 8 9
0 1998-01-02 09:30:00 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0
1 1998-01-02 09:31:00 3.29473 3.29473 3.27780 3.29473 263522.0 4 0 0
2 1998-01-02 09:32:00 3.29473 3.29473 3.27780 3.29473 120384.0 4 0 0
3 1998-01-02 09:33:00 3.29473 3.29473 3.26330 3.26330 82738.3 4 0 0
4 1998-01-02 09:34:00 3.26330 3.29473 3.26330 3.27780 11169.6 4 0 0
5 1998-01-02 09:35:00 3.29473 3.29473 3.27780 3.27780 11997.0 4 0 0
6 1998-01-02 09:36:00 3.26330 3.29473 3.26330 3.27780 109628.0 4 0 0
10
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
[7 rows x 10 columns]
#confirm that we have converted the dates:
In [22]:
minData.dtypes
Out[22]:
Date datetime64[ns]
2 float64
3 float64
4 float64
5 float64
6 float64
7 int64
8 int64
9 int64
10 float64
dtype: object
您可以设置索引:
In [24]:
minData.set_index('Date')
Out[24]:
2 3 4 5 6 7 8 9
Date
1998-01-02 09:30:00 3.29473 3.30923 3.29473 3.29473 76119.2 4 0 0
1998-01-02 09:31:00 3.29473 3.29473 3.27780 3.29473 263522.0 4 0 0
1998-01-02 09:32:00 3.29473 3.29473 3.27780 3.29473 120384.0 4 0 0
1998-01-02 09:33:00 3.29473 3.29473 3.26330 3.26330 82738.3 4 0 0
1998-01-02 09:34:00 3.26330 3.29473 3.26330 3.27780 11169.6 4 0 0
1998-01-02 09:35:00 3.29473 3.29473 3.27780 3.27780 11997.0 4 0 0
1998-01-02 09:36:00 3.26330 3.29473 3.26330 3.27780 109628.0 4 0 0
10
Date
1998-01-02 09:30:00 NaN
1998-01-02 09:31:00 NaN
1998-01-02 09:32:00 NaN
1998-01-02 09:33:00 NaN
1998-01-02 09:34:00 NaN
1998-01-02 09:35:00 NaN
1998-01-02 09:36:00 NaN
[7 rows x 9 columns]
尝试以下:
import pandas as pd
import dateutil.parser as parser
def f(d):
if len(d)==12:
d = d[:9]+"0"+d[9:]
return parser.parse(d)
columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']
minData = pd.read_csv("table_aapl.txt", names=columnHeadings, sep=' ', parse_dates=[[0,1]], index_col=0, date_parser=f)
print minData
输出:
Open High Low Close Volume
Date_Time
1998-01-02 09:30:00 3.29473 3.30923 3.29473 3.29473 76119.2
1998-01-02 09:31:00 3.29473 3.29473 3.27780 3.29473 263522.0
1998-01-02 09:32:00 3.29473 3.29473 3.27780 3.29473 120384.0
1998-01-02 09:33:00 3.29473 3.29473 3.26330 3.26330 82738.3
1998-01-02 09:34:00 3.26330 3.29473 3.26330 3.27780 11169.6
1998-01-02 09:35:00 3.29473 3.29473 3.27780 3.27780 11997.0
1998-01-02 09:36:00 3.26330 3.29473 3.26330 3.27780 109628.0
Split Factor Earnings Dividends
Date_Time
1998-01-02 09:30:00 4 0 0
1998-01-02 09:31:00 4 0 0
1998-01-02 09:32:00 4 0 0
1998-01-02 09:33:00 4 0 0
1998-01-02 09:34:00 4 0 0
1998-01-02 09:35:00 4 0 0
1998-01-02 09:36:00 4 0 0
[7 rows x 8 columns]
好吧,我刚刚尝试了两个参数函数,它收到了两个值的值,所以我设法做到了:
def g(d,t):
res = []
for dd,tt in zip(d,t):
date_time = dd
if len(tt)==3:
date_time += " 0"
else:
date_time += " "
date_time += tt
res.append(parser.parse(date_time))
return pd.DatetimeIndex(res)
现在只称pd.read_csv
为pd.read_csv("table_aapl.txt", names=columnHeadings, sep=' ', parse_dates=[[0,1]], index_col=0, date_parser=g)
输出:
Open High Low Close Volume
Date_Time
1998-01-02 09:30:00 3.29473 3.30923 3.29473 3.29473 76119.2
1998-01-02 09:31:00 3.29473 3.29473 3.27780 3.29473 263522.0
1998-01-02 09:32:00 3.29473 3.29473 3.27780 3.29473 120384.0
1998-01-02 09:33:00 3.29473 3.29473 3.26330 3.26330 82738.3
1998-01-02 09:34:00 3.26330 3.29473 3.26330 3.27780 11169.6
1998-01-02 09:35:00 3.29473 3.29473 3.27780 3.27780 11997.0
1998-01-02 09:36:00 3.26330 3.29473 3.26330 3.27780 109628.0
Split Factor Earnings Dividends
Date_Time
1998-01-02 09:30:00 4 0 0
1998-01-02 09:31:00 4 0 0
1998-01-02 09:32:00 4 0 0
1998-01-02 09:33:00 4 0 0
1998-01-02 09:34:00 4 0 0
1998-01-02 09:35:00 4 0 0
1998-01-02 09:36:00 4 0 0
[8 rows x 8 columns]