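I tried to do this by passing "Apple" to iloc, but it gave me a traceback. I know that with iloc whatever goes inside the [] has to be an integer, so how can I find a cell like "Apple"?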
import pandas as pd
file1 = pd.read_csv('SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv', usecols=['ticker','datekey','assets','eps','pe','price','revenue'])
print(file1.iloc['Apple'])
Error message:
Traceback (most recent call last):
File "C:/Users/George Adamopoulos/Desktop/All My Files/Neptune Financial Inc/The White Tiger JV/Research/20 Variables Research Code/DataReader.py", line 16, in <module>
print(file1.iloc['Apple'])
File "C:\Users\George Adamopoulos\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1767, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\George Adamopoulos\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2134, in _getitem_axis
raise TypeError("Cannot index by location index with a non-integer key")
TypeError: Cannot index by location index with a non-integer key
A few rows of the CSV:
ticker,dimension,calendardate,datekey,lastupdated,assets,assetsavg,cashneq,debt,debtc,debtusd,divyield,deposits,eps,epsusd,equity,equityavg,liabilities,netinc,pe,price,revenue
A,ARQ,1999-12-31,2000-03-15,2020-09-01,7107000000,,1368000000,665000000,111000000,665000000,0,0,0.3,0.3,4486000000,,2621000000,131000000,,114.3,2246000000
A,ARQ,2000-03-31,2000-06-12,2020-09-01,7321000000,,978000000,98000000,98000000,98000000,0,0,0.37,0.37,4642000000,,2679000000,166000000,,66,2485000000
A,ARQ,2000-06-30,2000-09-01,2020-09-01,7827000000,,703000000,129000000,129000000,129000000,0,0,0.34,0.34,4902000000,,2925000000,155000000,46.877,61.88,2670000000
A,ARQ,2000-09-30,2001-01-17,2020-09-01,8425000000,,996000000,110000000,110000000,110000000,0,0,0.67,0.67,5265000000,,3160000000,305000000,37.341,61.94,3372000000
A,ARQ,2000-12-31,2001-03-19,2020-09-01,9208000000,,433000000,556000000,556000000,556000000,0,0,0.34,0.34,5541000000,,3667000000,154000000,21.661,36.99,2841000000
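The pandas.read_csv documentation is confusing, and IMHO the function behaves in unexpected ways. By default, pandas infers the header, the index and the column data types from the CSV file itself. If the header row has one field fewer than the first data row, pandas assumes the first column is the DataFrame's index (also called its labels).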
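A minimal sketch of that inference rule, using two tiny inline CSV strings (made-up data, read through io.StringIO instead of a real file):

```python
import io

import pandas as pd

# Header has 2 names but each data row has 3 fields,
# so pandas uses the first column of the data as the index.
short_header = "price,revenue\nA,114.3,2246000000\nB,66.0,2485000000\n"
df_implicit = pd.read_csv(io.StringIO(short_header))

# Header and data rows have the same field count,
# so pandas creates a default integer index 0, 1, ...
full_header = "ticker,price,revenue\nA,114.3,2246000000\nB,66.0,2485000000\n"
df_default = pd.read_csv(io.StringIO(full_header))

print(df_implicit.index.tolist())  # ['A', 'B']
print(df_default.index.tolist())   # [0, 1]
```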
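If the header and the first data row have the same number of fields, pandas generates an integer index starting at 0. That is your case. A DataFrame has two indexers: .loc, which selects rows by label (i.e. by index value), and .iloc, which selects the "i-th" row by integer position, regardless of the index.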
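To see the difference between the two indexers, here is a sketch on a tiny hand-built DataFrame whose labels are deliberately out of order (the labels and values are made up for illustration):

```python
import pandas as pd

# Hypothetical prices; index labels are intentionally not in alphabetical order.
df = pd.DataFrame({"price": [61.88, 114.3, 66.0]}, index=["C", "A", "B"])

# .loc looks up the row whose label is "A" (the second row here).
by_label = df.loc["A", "price"]

# .iloc takes the first row by position, whatever its label is ("C").
by_position = df.iloc[0, 0]

print(by_label, by_position)  # 114.3 61.88
```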
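If you try file1.iloc["A"], that is integer-position indexing, and "A" is not an integer. If you try file1.loc["A"], it does not work either (it raises a KeyError), because the DataFrame is not indexed by ticker.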
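Both failures can be reproduced with a trimmed copy of the sample rows, read from an in-memory string rather than the original file (the `errors` dict is only there to record which exception each call raises):

```python
import io

import pandas as pd

# Header and data rows have the same field count,
# so pandas assigns a default integer index (0, 1).
csv_text = (
    "ticker,datekey,price\n"
    "A,2000-03-15,114.3\n"
    "A,2000-06-12,66\n"
)
file1 = pd.read_csv(io.StringIO(csv_text))

errors = {}
try:
    file1.iloc["A"]  # position-based: a string is not a valid position
except TypeError as exc:
    errors["iloc"] = type(exc).__name__
try:
    file1.loc["A"]  # label-based: "A" is not a label in the integer index
except KeyError as exc:
    errors["loc"] = type(exc).__name__

print(errors)  # {'iloc': 'TypeError', 'loc': 'KeyError'}
```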
The fix is to name the index column, either afterwards with file1 = file1.set_index("ticker") or directly when reading the CSV:
import pandas as pd
file1 = pd.read_csv('test.csv',
                    usecols=['ticker','datekey','assets','eps','pe','price','revenue'],
                    index_col="ticker")
print(file1.loc['A'])
This prints:
           datekey      assets   eps      pe   price     revenue
ticker
A       2000-03-15  7107000000  0.30     NaN  114.30  2246000000
A       2000-06-12  7321000000  0.37     NaN   66.00  2485000000
A       2000-09-01  7827000000  0.34  46.877   61.88  2670000000
A       2001-01-17  8425000000  0.67  37.341   61.94  3372000000
A       2001-03-19  9208000000  0.34  21.661   36.99  2841000000
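Because the same ticker appears on many rows, loc['A'] returns several rows. If the goal is a single cell, one option (an extension of this answer, not something it proposes) is a two-level index on ticker and datekey, i.e. a pandas MultiIndex, with the full key passed as a tuple. A sketch with a trimmed copy of the sample rows:

```python
import io

import pandas as pd

csv_text = (
    "ticker,datekey,price\n"
    "A,2000-03-15,114.3\n"
    "A,2000-06-12,66\n"
    "A,2000-09-01,61.88\n"
)

# Two-level row index: (ticker, datekey) identifies one report per row.
file1 = pd.read_csv(io.StringIO(csv_text), index_col=["ticker", "datekey"])

# Partial key: every row for ticker "A", now indexed by datekey alone.
rows_a = file1.loc["A"]

# Full key as a tuple plus a column name: exactly one cell.
cell = file1.loc[("A", "2000-09-01"), "price"]
print(cell)
```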
All this "index" vs. "label" and "loc" vs. "iloc" terminology can be distracting.
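Setting the index at read time with index_col and setting it afterwards with set_index should produce the same DataFrame. A quick check, again on a trimmed in-memory copy of the sample rows (variable names are illustrative):

```python
import io

import pandas as pd

csv_text = (
    "ticker,datekey,price\n"
    "A,2000-03-15,114.3\n"
    "A,2000-06-12,66\n"
)

# Option 1: make "ticker" the index while reading.
at_read = pd.read_csv(io.StringIO(csv_text), index_col="ticker")

# Option 2: read with the default integer index, then move "ticker" into it.
afterwards = pd.read_csv(io.StringIO(csv_text)).set_index("ticker")

print(at_read.equals(afterwards))  # True
print(afterwards.loc["A"])  # both rows labeled "A"
```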
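Use loc instead of iloc: iloc selects by integer position, whereas loc selects by label, so it accepts index labels and column names. Also make sure the DataFrame actually has a row labeled Apple in its index.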
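A sketch of the loc approach without changing the index: build a boolean mask on the ticker column and pass it to loc together with column names. It assumes the ticker column holds ticker symbols rather than company names, so Apple would presumably be looked up by its symbol (likely "AAPL" in this dataset, which is an assumption), not by "Apple"; the trimmed sample below only contains "A":

```python
import io

import pandas as pd

csv_text = (
    "ticker,datekey,price\n"
    "A,2000-03-15,114.3\n"
    "A,2000-06-12,66\n"
    "A,2000-09-01,61.88\n"
)
file1 = pd.read_csv(io.StringIO(csv_text))  # default integer index

# Boolean mask on the "ticker" column; no set_index needed.
is_a = file1["ticker"] == "A"

# .loc takes the mask for rows and a list of column names for columns.
a_prices = file1.loc[is_a, ["datekey", "price"]]

# A single cell: filter by ticker and date, then take the first matching price.
cell = file1.loc[is_a & (file1["datekey"] == "2000-09-01"), "price"].iloc[0]
print(a_prices)
print(cell)
```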