How to sum subsets of a CSV column in Python (adding hourly rows into daily data / 8000+ rows into 365)



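I'm new to Python and am using it for research. I need to process a set of files (an example file is below; only the first 49 rows are shown to keep it short), because I want to merge two CSV files: one has data once for each day of the year, and the other provides similar data but with hourly rows (24 per day, 365 days). I have tried many things from SO, but my lack of knowledge seems to keep me from successfully combining multiple commands (or data types?).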

My code:

import pandas as pd

data = pd.read_csv('HourlySurfaceEmissions.csv', header=0)
i = data['Total CH4 oxidized in Cover (g/m2/day)'].count()
g = 0
for h in range(i):
    for j in range(24):
        g = g + data.iloc[j, 3]
        l = data.iloc[j, 0]
        if j == 24:
            data.append(g)

I also tried this:

test_list = pd.read_csv("HourlySurfaceEmissions.csv")
res = [sum(test_list[x : x + 24])
       for x in range(0, len(test_list), 24)]

The first 49 rows (the header plus 2 x 24 hours = 2 days/DOY) look like this:

DOY,Surface emission with oxidation (g/m2/day),Surface emissions without oxidation(g/m2/day),Total CH4 oxidized in Cover (g/m2/day)
0.006944444444444444,0.0,0.009640456293691613,-11.050865124798417
0.048611111111111105,0.0,0.00965194619432311,-11.076678943428105
0.09027777777777778,0.0,0.009670805122605135,-11.109966479947506
0.1319444444444444,0.0,0.00968945315690706,-11.14340896370453
0.17361111111111105,0.0,0.009705649616079596,-11.174967827473246
0.2152777777777778,0.0,0.009717268095524405,-11.203014257215516
0.25694444444444453,0.0,0.009722552966172965,-11.225477458228605
0.29861111111111127,0.0,0.009724124256746654,-11.24121710215802
0.34027777777777796,0.0,0.009721011637697558,-11.249792693463574
0.3819444444444445,0.0,0.009710336075834235,-11.25189835695853
0.423611111111111,0.0,0.009693294362800385,-11.24758026308563
0.4652777777777775,0.0,0.009671063350622646,-11.236969394828964
0.5069444444444441,0.0,0.009645088049109159,-11.220564696750134
0.5486111111111106,0.0,0.009617185341758622,-11.199264953875893
0.5902777777777771,0.0,0.009589224265734546,-11.174114980089575
0.6319444444444436,0.0,0.009563049606687848,-11.146251643323737
0.6736111111111102,0.0,0.009540407718328061,-11.116685043446322
0.7152777777777771,0.0,0.009523258672310022,-11.08680098084354
0.7569444444444441,0.0,0.009512384905543625,-11.057545055124129
0.798611111111111,0.0,0.009508670518647058,-11.029278794405451
0.840277777777778,0.0,0.009512661727449441,-11.002881748121855
0.881944444444445,0.0,0.00952351344623122,-10.97773709236616
0.9236111111111119,0.0,0.00954086056094301,-10.953696342508493
0.9652777777777789,0.0,0.009563077061452775,-10.930675397066185
1.0069444444444458,0.0,0.009589258645691398,-10.908521124174303
1.0486111111111127,0.0,0.009612332930178632,-10.888018955865018
1.0902777777777797,0.0,0.009633980489113781,-10.865257726415996
1.1319444444444466,0.0,0.00965520708644411,-10.840145827335935
1.1736111111111136,0.0,0.00967502739071609,-10.81119149049327
1.2152777777777806,0.0,0.009688774630252922,-10.778337840301566
1.2569444444444475,0.0,0.00969569215820134,-10.74019647299494
1.2986111111111145,0.0,0.00969592668116943,-10.696893492706971
1.3402777777777812,0.0,0.00968931970890368,-10.648206749669301
1.3819444444444473,0.0,0.00967607201768951,-10.594140416915286
1.4236111111111134,0.0,0.009656874404941456,-10.535050797245228
1.4652777777777795,0.0,0.009632855346341005,-10.471615970651886
1.5069444444444455,0.0,0.009605489625697603,-10.404775922333677
1.5486111111111116,0.0,0.009576486960140714,-10.33565555817896
1.5902777777777777,0.0,0.00954766999289027,-10.265477645235027
1.6319444444444438,0.0,0.00952105413826517,-10.195657825444055
1.6736111111111098,0.0,0.009497910541739782,-10.126978222694369
1.715277777777776,0.0,0.009480190056013282,-10.060470694066083
1.756944444444442,0.0,0.009469075701263357,-9.997509432663001
1.798611111111108,0.0,0.009464933381231574,-9.938139854922564
1.8402777777777741,0.0,0.009468689077078025,-9.883065879435845
1.8819444444444402,0.0,0.009479602575435297,-9.831948683825386
1.9236111111111063,0.0,0.009497056058672237,-9.784327743717228
1.9652777777777724,0.0,0.009520302331279774,-9.739751373553043

Expected result:

1,0.0,0.228,-9.739,264.456
2,0.0,0.227,-9.539,264.356
3,0.0,0.229,-9.839,264.256

I made these numbers up, but they are roughly what I should get. Please help.

If I understand correctly, you just want to sum every 24 rows, so the following code will produce the desired result:

slim = df.groupby(df.index // 24).sum()
print(slim)
DOY  ...  Total CH4 oxidized in Cover (g/m2/day)
0  11.666667  ...                             -267.107334
1  35.666667  ...                             -249.341336

We can see that the result is as expected.

slim.iloc[0]
DOY                                               11.666667
Surface emission with oxidation (g/m2/day)         0.000000
Surface emissions without oxidation(g/m2/day)      0.230957
Total CH4 oxidized in Cover (g/m2/day)          -267.107334
Name: 0, dtype: float64
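
Note that the summed DOY value (11.666667) is not a usable day label. A minimal sketch of one way around that, assuming synthetic data in place of HourlySurfaceEmissions.csv and shortened, hypothetical column names (`emission`, `oxidized`): sum the value columns per 24-row block, but take the first DOY of each block and truncate it to a 1-based day number.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly file: 2 days x 24 rows (assumption).
hours = np.arange(48)
df = pd.DataFrame({
    "DOY": (hours + 1 / 6) / 24,      # same spacing as the sample rows
    "emission": np.full(48, 0.01),    # hypothetical short column name
    "oxidized": np.full(48, -11.0),   # hypothetical short column name
})

# Rows 0-23 get label 0, rows 24-47 get label 1, and so on.
block = df.index // 24

# Sum the value columns; keep only the first DOY of each block.
daily = df.groupby(block).agg(
    DOY=("DOY", "first"),
    emission=("emission", "sum"),
    oxidized=("oxidized", "sum"),
)

# Truncate the fractional DOY and shift to a 1-based day number.
daily["DOY"] = daily["DOY"].astype(int) + 1
print(daily)
```

This relies on the file holding exactly 24 rows per day with no gaps; if rows are missing, grouping on the truncated DOY column itself may be the safer choice.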

Finally, let's also change the index so that it starts at 1, as shown in your OP.

slim.index = range(1, len(slim) + 1)
DOY  ...  Total CH4 oxidized in Cover (g/m2/day)
1  11.666667  ...                             -267.107334
2  35.666667  ...                             -249.341336
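
The question also mentions merging the result with a second file that has one row per day. A rough sketch of how that could look, assuming the daily file has a day-number column; the column names `day` and `temperature` and all values here are made up for illustration and do not come from the original files.

```python
import pandas as pd

# Hypothetical daily sums, indexed 1..n as in the snippet above.
hourly_sums = pd.DataFrame(
    {"oxidized": [-267.1, -249.3, -240.0]},  # made-up values
    index=[1, 2, 3],
)
hourly_sums.index.name = "day"

# Hypothetical daily file: one row per day of the year.
daily = pd.DataFrame({
    "day": [1, 2, 3],
    "temperature": [4.5, 5.0, 3.8],  # made-up column and values
})

# Match the daily file's "day" column against the index of the sums.
merged = daily.merge(hourly_sums, left_on="day", right_index=True, how="left")
print(merged)
```

With `how="left"` every row of the daily file is kept, and days without hourly sums would show NaN in the added column.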

In [3]: data.head(5)
Out[3]:
DOY  Surface emission with oxidation (g/m2/day)  Surface emissions without oxidation(g/m2/day)  Total CH4 oxidized in Cover (g/m2/day)
0  0.006944                                         0.0                                       0.009640                              -11.050865
1  0.048611                                         0.0                                       0.009652                              -11.076679
2  0.090278                                         0.0                                       0.009671                              -11.109966
3  0.131944                                         0.0                                       0.009689                              -11.143409
4  0.173611                                         0.0                                       0.009706                              -11.174968
In [4]: data.index = data.index.values.astype('timedelta64[h]')
In [5]: data.resample('D').sum()
Out [5]: 
DOY  Surface emission with oxidation (g/m2/day)  Surface emissions without oxidation(g/m2/day)  Total CH4 oxidized in Cover (g/m2/day)
0 days  11.666667                                         0.0                                       0.230957                             -267.107334
1 days  35.666667                                         0.0                                       0.230030                             -249.341336
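
The same idea as a self-contained sketch, assuming synthetic data and a hypothetical column name `oxidized`. It uses `pd.to_timedelta` instead of the NumPy `astype` cast to turn the row number into an elapsed-hours index; this is a swapped-in but equivalent way to build the index, not the code from the session above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly file: 2 days x 24 rows (assumption).
data = pd.DataFrame({"oxidized": np.full(48, -11.0)})  # hypothetical column name

# Treat row number n as "n hours after the start of the series".
data.index = pd.to_timedelta(data.index, unit="h")

# Bin the hourly rows into days and sum each bin.
daily = data.resample("D").sum()
print(daily)
```

Like the `// 24` approach, this assumes one row per hour with no gaps, since the time index is derived from the row position rather than from the DOY column.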
