Resampling a pandas Series



I have the following pandas Series:

dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21))/10))

This produces the following Series:

0.0   -10
0.1    -9
0.2    -8
0.3    -7
0.4    -6
0.5    -5
0.6    -4
0.7    -3
0.8    -2
0.9    -1
1.0     0
1.1     1
1.2     2
1.3     3
1.4     4
1.5     5
1.6     6
1.7     7
1.8     8
1.9     9
2.0    10

If I want to resample it, how do I do that? I read the documentation, which suggests:

dummy_array.resample('20S').mean()

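But it doesn't work. Any ideas?

Thanks.

Edit:

I would like the final Series to have twice the sampling frequency of the original. Something like this: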

0.0   -10
0.05   -9.5
0.1    -9
0.15    -8.5
0.2    -8
0.25    -7.5
etc.
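
On why the `resample` call in the question fails: `resample` only works on a `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`, and raises a `TypeError` for a plain float index. If the index really is a time axis, one option is to convert it to a `TimedeltaIndex` first. The following is a minimal sketch, assuming the float labels are seconds; the names `millis`, `ts` and `upsampled` are only illustrative.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.arange(-10, 11), index=np.arange(0, 21) / 10)

# Assumption: the float index is a time axis in seconds.
# Round to whole milliseconds first so float noise cannot shift a label.
millis = np.round(dummy_array.index.to_numpy() * 1000).astype("int64")
ts = pd.Series(dummy_array.to_numpy(), index=pd.to_timedelta(millis, unit="ms"))

# Upsample from a 100 ms step to a 50 ms step and fill the gaps linearly.
upsampled = ts.resample("50ms").interpolate()

# Convert the index back to float seconds.
upsampled.index = upsampled.index.total_seconds()
print(upsampled.head())
```

If the index is not a time axis at all, `resample` is probably the wrong tool and an approach based on `reindex` or `np.interp` is a better fit.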

Here is a solution using np.linspace(), .reindex() and .interpolate():

Create the Series dummy_array as shown in the question.

# get properties of original index
start = dummy_array.index.min()
end = dummy_array.index.max()
num_gridpoints_orig = dummy_array.index.size
# calc number of grid-points in new index
num_gridpoints_new = (num_gridpoints_orig  * 2) - 1 
# create new index, with twice the number of grid-points (i.e., smaller step-size)
idx_new = np.linspace(start, end, num_gridpoints_new)
# re-index the Series.  New grid-points get the value NaN,
# and we replace these NaNs with linearly interpolated values
df2 = dummy_array.reindex(index=idx_new).interpolate()
print(df2.head())
0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
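
One caveat with the `reindex` approach: `reindex` matches labels exactly, so a label produced by `np.linspace` that differs from the original label by floating-point rounding is treated as a new point and the original value there is dropped. For linear data like this the result is the same, but for other data it may not be. A possible alternative, sketched here under the assumption that plain linear interpolation is what is wanted, is to evaluate the values on the new grid with `np.interp`, which does not depend on exact label matches.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.arange(-10, 11), index=np.arange(0, 21) / 10)

# New grid with twice the sampling density over the same range.
idx_new = np.linspace(
    dummy_array.index.min(),
    dummy_array.index.max(),
    2 * dummy_array.index.size - 1,
)

# np.interp evaluates the piecewise-linear function through the original
# points, so no label in idx_new has to equal an original label exactly.
# It assumes the original index is sorted in increasing order.
upsampled = pd.Series(
    np.interp(idx_new, dummy_array.index.to_numpy(), dummy_array.to_numpy()),
    index=idx_new,
)
print(upsampled.head())
```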

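Create a list of offset points based on the original array. Then split it into values and index labels to build a "pd.Series". Concatenate the new pd.Series with the original and re-sort. Note that the offsets 0.05 and 0.5 are hard-coded for this particular data (half the index step and half the value step), and that the result contains one extra point at 2.05.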

# new list of [index + 0.05, value + 0.5] pairs
ups = [[x+0.05,y+0.5] for x,y in zip(dummy_array.index, dummy_array)]
idx = [i[0] for i in ups]
val = [i[1] for i in ups]
d2 = pd.Series(val, index=idx)
d3 = pd.concat([dummy_array,d2], axis=0)
# sort by index label (sorting by value only works for monotonic data)
d3 = d3.sort_index()
d3
0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
0.25    -7.5
0.30    -7.0
0.35    -6.5
0.40    -6.0
0.45    -5.5
0.50    -5.0
0.55    -4.5
0.60    -4.0
0.65    -3.5
0.70    -3.0
0.75    -2.5
0.80    -2.0
0.85    -1.5
0.90    -1.0
0.95    -0.5
1.00     0.0
1.05     0.5
1.10     1.0
1.15     1.5
1.20     2.0
1.25     2.5
1.30     3.0
1.35     3.5
1.40     4.0
1.45     4.5
1.50     5.0
1.55     5.5
1.60     6.0
1.65     6.5
1.70     7.0
1.75     7.5
1.80     8.0
1.85     8.5
1.90     9.0
1.95     9.5
2.00    10.0
2.05    10.5
dtype: float64
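
The hard-coded offsets can be avoided by computing the midpoints from neighbouring samples instead. This is a minimal sketch of that variation, assuming the index is sorted and that the midpoint value should be the average of its two neighbours; the names `mid_idx`, `mid_val` and `midpoints` are only illustrative.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.arange(-10, 11), index=np.arange(0, 21) / 10)

idx = dummy_array.index.to_numpy()
val = dummy_array.to_numpy()

# Midpoint between each pair of neighbouring samples, for labels and values.
mid_idx = (idx[:-1] + idx[1:]) / 2
mid_val = (val[:-1] + val[1:]) / 2

midpoints = pd.Series(mid_val, index=mid_idx)

# Combine the original and midpoint samples and order them by index label.
upsampled = pd.concat([dummy_array, midpoints]).sort_index()
print(upsampled.head())
```

Because only points between existing samples are added, this version does not create a point beyond the last original label.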

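Thanks everyone for your contributions. After reading the answers and thinking about it some more, I found a more general solution that can handle all possible cases. Here, I want to upsample dummy_arrayA onto the same index as dummy_arrayB. I create a new index that contains the labels of both A and B. Then I use the reindex and interpolate functions to compute the new values, and finally I drop the old index labels so that the result has the same size as dummy_arrayB.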

import pandas as pd
import numpy as np
# Create dummy Series
dummy_arrayA = pd.Series(np.array(range(0, 4)), index=[0,0.5,1.0,1.5])
dummy_arrayB = pd.Series(np.array(range(0, 5)), index=[0,0.4,0.8,1.2,1.6])
# Create a new index based on Series A
new_ind = pd.Index(dummy_arrayA.index)
# Merge the indexes of A and B
new_ind = new_ind.union(dummy_arrayB.index)
# reindex copies the existing values and fills the new labels with NaN.
# interpolate(method="index") then fills the NaNs using the index values as x-coordinates.
df2 = dummy_arrayA.reindex(index=new_ind).interpolate(method="index")
# Labels that A and B have in common; these must not be deleted.
New_ind_inter = dummy_arrayA.index.intersection(dummy_arrayB.index)
# Labels that occur only in A
old_ind = dummy_arrayA.index.difference(New_ind_inter)
# Delete the A-only labels so that the final Series matches the index of dummy_arrayB
df2 = df2.drop(old_ind)
print(df2)
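
The same idea can be wrapped in a small helper that selects the target labels directly instead of dropping the old ones. This is a sketch rather than a definitive implementation; `interpolate_onto` is a hypothetical name, not a pandas function. Note that target labels beyond the last source label (1.6 here) are not extrapolated: with the default settings, `interpolate` carries the last valid value forward.

```python
import numpy as np
import pandas as pd


def interpolate_onto(source: pd.Series, target_index: pd.Index) -> pd.Series:
    """Linearly interpolate `source` at the labels in `target_index`."""
    # Union of both indexes, so the source values are kept as support points.
    combined = source.index.union(target_index)
    # New labels start as NaN and are filled using the index as x-coordinate.
    filled = source.reindex(combined).interpolate(method="index")
    # Keep only the labels of the target index.
    return filled.reindex(target_index)


dummy_arrayA = pd.Series(np.arange(4), index=[0, 0.5, 1.0, 1.5])
dummy_arrayB = pd.Series(np.arange(5), index=[0, 0.4, 0.8, 1.2, 1.6])

resampled = interpolate_onto(dummy_arrayA, dummy_arrayB.index)
print(resampled)
```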
