Working with NumPy arrays and DataFrame columns



I have the following DataFrame:

dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-01 03:00,10.572
2014-10-01 04:00,10.56
2014-10-01 05:00,10.564
2014-10-01 06:00,10.65
2014-10-01 07:00,10.801
2014-10-01 08:00,10.977
2014-10-01 09:00,11.316
2014-10-01 10:00,11.88
2014-10-01 11:00,12.427
2014-10-01 12:00,12.751
2014-10-01 13:00,12.863
2014-10-01 14:00,12.823
2014-10-01 15:00,12.686
2014-10-01 16:00,12.499
2014-10-01 17:00,12.293
2014-10-01 18:00,12.086
2014-10-01 19:00,11.89
2014-10-01 20:00,11.712
2014-10-01 21:00,11.552
2014-10-01 22:00,11.413
2014-10-01 23:00,11.292
2014-10-02 00:00,11.188
2014-10-02 01:00,11.1

Suppose I want to select all the data for a particular date, for example 2014-10-01. These are the operations I use in my code:

dfr = pd.read_csv(f_name, parse_dates=True, index_col=0,
                  infer_datetime_format=True)
yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy()

This is what I get:

array([[10.606],
[10.595],
[10.583],
[10.572],
[10.56 ],
[10.564],
[10.65 ],
[10.801],
[10.977],
[11.316],
[11.88 ],
[12.427],
[12.751],
[12.863],
[12.823],
[12.686],
[12.499],
[12.293],
[12.086],
[11.89 ],
[11.712],
[11.552],
[11.413],
[11.292]])

However, I would like yy to have the following form:

array([10.606,10.595,10.583,10.572,10.56 ,10.564,10.65 ,10.801,10.977, 11.316,11.88 ,12.427,12.751,12.863,12.823,12.686,12.499,12.293,12.086,11.89 ,11.712,11.552,11.413,11.292])

In fact, I have to use it together with another vector xx, which is:

xx=array([ 2.91833891,  2.84972246,  0.50386982,  5.35302713,  4.81822114,
3.33330121,  5.63819964, 11.20447123, 12.98512414,  9.95449998,
5.78945234,  9.90594599,  1.25708361,  3.02603884,  1.02683686,
3.84912813,  1.55641116, 13.04097404,  9.6277124 , 10.73849736,
5.39958019,  3.43633323, 13.5965677 ,  7.31914519])

This would let me use np.sum and so on, without having to write loops.

Thanks in advance

Use loc:

yy = dfr.loc[dfr.index.floor('D') == ' 2014-10-01 00:00:00', 'values'].to_numpy()

Or use flatten():

yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy().flatten()
# yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy().ravel()
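The two calls above give the same 1-D result but differ in whether the data is copied. A minimal sketch of that difference, where `col` is a hypothetical stand-in for the (n, 1) array returned by `to_numpy()` and holds the first three values from the question:

```python
import numpy as np

# Hypothetical stand-in for the (n, 1) array that DataFrame.to_numpy() returns
col = np.array([[10.606], [10.595], [10.583]])

flat = col.flatten()  # always returns a copy
rav = col.ravel()     # returns a view when the memory layout allows it

print(flat.shape, rav.shape)        # both are 1-D
print(np.shares_memory(col, flat))  # flatten does not share memory with col
print(np.shares_memory(col, rav))   # ravel shares memory for a contiguous array
```

If yy is only read (for example passed to np.sum), either call should do; ravel just avoids the extra copy in the common contiguous case.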

Another solution, using df.loc to select only one column:

yy = dfr.loc[
    dfr.index.floor("D") == " 2014-10-01 00:00:00", "values"
].to_numpy()
print(yy)

Prints:

[10.606 10.595 10.583 10.572 10.56  10.564 10.65  10.801 10.977 11.316
11.88  12.427 12.751 12.863 12.823 12.686 12.499 12.293 12.086 11.89
11.712 11.552 11.413 11.292]
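Why the 1-D shape matters when yy is combined with xx can be sketched as follows. This is only an illustration: the arrays are cut down to the first three values of yy and xx from the question, and `yy_2d`, `yy_1d` and `total` are names made up for the example.

```python
import numpy as np

# First three values only, to keep the sketch short
yy_2d = np.array([[10.606], [10.595], [10.583]])      # shape (3, 1), as from DataFrame.to_numpy()
yy_1d = yy_2d.ravel()                                 # shape (3,)
xx = np.array([2.91833891, 2.84972246, 0.50386982])   # shape (3,)

# (3,) * (3, 1) broadcasts to a (3, 3) outer product, usually not what is wanted
print((xx * yy_2d).shape)

# (3,) * (3,) is a plain elementwise product
print((xx * yy_1d).shape)

# Loop-free reduction over the elementwise product
total = np.sum(xx * yy_1d)
print(total)
```

With the column-shaped array, NumPy raises no error; it silently broadcasts, so checking `.shape` before reducing is a cheap safeguard.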

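Actually, what you need is a Series rather than a DataFrame:

  1. At the file level, read the csv with the squeeze=True parameter (this parameter was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the result of read_csv instead):
dfr = pd.read_csv(f_name, parse_dates=True,index_col=0,
infer_datetime_format=True, squeeze=True)
  2. Use NumPy's ravel function:
>>> dfr[dfr.index.floor('D')  == ' 2014-10-01 00:00:00'].to_numpy().ravel()
array([10.606, 10.595, 10.583, 10.572, 10.56 , 10.564, 10.65 , 10.801,
10.977, 11.316, 11.88 , 12.427, 12.751, 12.863, 12.823, 12.686,
12.499, 12.293, 12.086, 11.89 , 11.712, 11.552, 11.413, 11.292])
  3. Use one of the solutions proposed by @AnuragDabas or @AndrejKesely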
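A minimal sketch of the Series route, assuming a pandas version that provides `DataFrame.squeeze`. The CSV text is a four-row excerpt of the data in the question, read from an in-memory buffer instead of `f_name`, and the day is selected with a partial date string passed to `.loc` rather than with `floor('D')`; that selection style is a swapped-in alternative, not what the answers above use.

```python
import io

import pandas as pd

# Four rows taken from the question's data, for illustration only
csv_text = """dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 23:00,11.292
2014-10-02 00:00,11.188
"""

# Read the single-column frame, then collapse it to a Series
ser = pd.read_csv(
    io.StringIO(csv_text), parse_dates=True, index_col=0
).squeeze("columns")

# Partial-string indexing on a DatetimeIndex selects the whole day
yy = ser.loc["2014-10-01"].to_numpy()

print(type(ser).__name__)
print(yy.shape)  # 1-D: one entry per row on that day
```

Because a Series holds a single column, `to_numpy()` is already 1-D and no flatten or ravel step is needed.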
