How to get all the data in a CSV row by specifying the text in the row's first cell



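I tried to do this by passing "Apple" to iloc, but it gives me a traceback. I know that whatever goes inside the [] of iloc has to be an integer, so how do I look up a row by a cell value like "Apple"?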

file1 = pd.read_csv('SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv', usecols=['ticker','datekey','assets','eps','pe','price','revenue'])
print(file1.iloc['Apple'])

Error message:

Traceback (most recent call last):
  File "C:/Users/George Adamopoulos/Desktop/All My Files/Neptune Financial Inc/The White Tiger JV/Research/20 Variables Research Code/DataReader.py", line 16, in <module>
    print(file1.iloc['Apple'])
  File "C:\Users\George Adamopoulos\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1767, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\George Adamopoulos\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2134, in _getitem_axis
    raise TypeError("Cannot index by location index with a non-integer key")
TypeError: Cannot index by location index with a non-integer key

A few rows of the CSV:

ticker,dimension,calendardate,datekey,lastupdated,assets,assetsavg,cashneq,debt,debtc,debtusd,divyield,deposits,eps,epsusd,equity,equityavg,liabilities,netinc,pe,price,revenue
A,ARQ,1999-12-31,2000-03-15,2020-09-01,7107000000,,1368000000,665000000,111000000,665000000,0,0,0.3,0.3,4486000000,,2621000000,131000000,,114.3,2246000000
A,ARQ,2000-03-31,2000-06-12,2020-09-01,7321000000,,978000000,98000000,98000000,98000000,0,0,0.37,0.37,4642000000,,2679000000,166000000,,66,2485000000
A,ARQ,2000-06-30,2000-09-01,2020-09-01,7827000000,,703000000,129000000,129000000,129000000,0,0,0.34,0.34,4902000000,,2925000000,155000000,46.877,61.88,2670000000
A,ARQ,2000-09-30,2001-01-17,2020-09-01,8425000000,,996000000,110000000,110000000,110000000,0,0,0.67,0.67,5265000000,,3160000000,305000000,37.341,61.94,3372000000
A,ARQ,2000-12-31,2001-03-19,2020-09-01,9208000000,,433000000,556000000,556000000,556000000,0,0,0.34,0.34,5541000000,,3667000000,154000000,21.661,36.99,2841000000

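The pandas.read_csv documentation is confusing, and IMHO the function behaves in unexpected ways. By default, pandas infers the header, the index and the data types from the first few rows of the CSV file. If the header has one fewer cell than the first data row, pandas assumes that the first column is the DataFrame's index (also called its labels).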
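To make the inference rule concrete, here is a minimal sketch. The two tiny CSV strings are made up for illustration (they are not from the asker's file); the only thing being shown is how the header width changes the index that read_csv produces.

```python
import io

import pandas as pd

# Hypothetical miniature CSVs, invented for this illustration.
same_width = "ticker,price\nA,114.3\nA,66.0\n"  # header has as many cells as the data rows
short_header = "price\nA,114.3\nB,66.0\n"       # header has one cell fewer than the data rows

df_same = pd.read_csv(io.StringIO(same_width))
df_short = pd.read_csv(io.StringIO(short_header))

# Same width: pandas generates an integer index starting at 0.
print(df_same.index.tolist())

# Short header: the first column is taken as the index.
print(df_short.index.tolist())
```

The asker's file falls into the first case, which is why the ticker ends up as an ordinary column rather than as the index.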

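If the header and the first row have the same number of columns, pandas generates an integer index starting at 0. That is your case. A DataFrame has two accessors: .loc returns the rows with a given label (that is, a given index value), and .iloc returns the "i-th" row by integer position, regardless of the index.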

If you try file1.iloc["A"], that is positional indexing, and "A" is not an integer. If you try file1.loc["A"], the DataFrame is not indexed by ticker, so that does not work either.
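The two failure modes can be reproduced on a small in-memory DataFrame. This is only a sketch; the frame and its values are hypothetical stand-ins for the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data, with the default integer index.
df = pd.DataFrame({"ticker": ["A", "A", "B"], "price": [114.3, 66.0, 10.0]})

# .iloc only accepts integer positions, so a string raises TypeError.
try:
    df.iloc["A"]
except TypeError as exc:
    iloc_error = type(exc).__name__

# .loc looks up labels, but the labels here are 0, 1, 2, so "A" raises KeyError.
try:
    df.loc["A"]
except KeyError as exc:
    loc_error = type(exc).__name__

# Once ticker is the index, .loc finds every row labelled "A".
rows_a = df.set_index("ticker").loc["A"]

print(iloc_error, loc_error, len(rows_a))
```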

The solution is to name the index column when reading the CSV, or to set it afterwards with file1 = file1.set_index("ticker").

import pandas as pd

# index_col makes the ticker column the row labels, so .loc can look them up
file1 = pd.read_csv('test.csv',
                    usecols=['ticker','datekey','assets','eps','pe','price','revenue'],
                    index_col="ticker")
print(file1.loc['A'])

Result:

           datekey      assets   eps      pe   price     revenue
ticker                                                          
A       2000-03-15  7107000000  0.30     NaN  114.30  2246000000
A       2000-06-12  7321000000  0.37     NaN   66.00  2485000000
A       2000-09-01  7827000000  0.34  46.877   61.88  2670000000
A       2001-01-17  8425000000  0.67  37.341   61.94  3372000000
A       2001-03-19  9208000000  0.34  21.661   36.99  2841000000
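The other route mentioned above, setting the index after the file has been read, might look like the sketch below. It assumes a cut-down copy of the sample rows (only ticker, datekey and price) held in a string, so it runs without the original file.

```python
import io

import pandas as pd

# Three of the sample rows from the question, reduced to three columns.
csv_text = """ticker,datekey,price
A,2000-03-15,114.3
A,2000-06-12,66
A,2000-09-01,61.88
"""

file1 = pd.read_csv(io.StringIO(csv_text))  # integer index 0, 1, 2 at this point
file1 = file1.set_index("ticker")           # ticker now provides the row labels

# Passing a list to .loc always returns a DataFrame, even for a single match.
rows = file1.loc[["A"]]
print(rows)
```

Either approach gives the same DataFrame; index_col simply saves the extra step.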

This whole business of "index", "label", "loc" and "iloc" can be confusing.

Use loc instead of iloc: iloc takes integer positions, whereas loc takes index labels. Also make sure that your DataFrame actually has an index label named Apple.
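A quick way to check whether a label exists before calling loc, and to select rows without touching the index at all, is sketched here. The frame is hypothetical; a boolean mask on the ticker column is an alternative to the index-based lookup, not something the answers above prescribe.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"ticker": ["A", "A", "B"], "price": [114.3, 66.0, 10.0]})

# Membership test on the index tells you whether .loc[label] can succeed.
indexed = df.set_index("ticker")
has_apple = "Apple" in indexed.index  # no row carries this label
has_a = "A" in indexed.index

# Boolean mask: keeps the default integer index and filters on the column.
rows_a = df[df["ticker"] == "A"]

print(has_apple, has_a, len(rows_a))
```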
