How to combine data based on timestamps

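I have a series of text files containing data with two independent timestamps, and I want to find the sum of all values at a given time. The files can have different numbers of rows, but they always have three columns, value timestamp1 timestamp2, with entries such as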

6.2 1 4 
4.3 2 9 
7.2 3 10

or

1.2 2 3 
0.3 3 9 
0.1 5 12

Here is how the output is formed:

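1. The timestamps from the two inputs are merged into a vector of unique values (for the examples above, {1,2,3} ∪ {2,3,5} -> {1,2,3,5}, and likewise {4,9,10} ∪ {3,9,12} -> {3,4,9,10,12}).
2. For each unique timestamp, one data point is selected from each input such that:
   - if the queried timestamp is lower than the smallest available timestamp, the first data value is taken;
   - otherwise, the value with the latest timestamp that is lower than or equal to the queried one is taken.
3. The two values are added, and processing moves on to the next unique timestamp (if there is one).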
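The selection rule in step 2 can be sketched for a single input as follows. This is a minimal illustration, not code from the question; it assumes the timestamps of each input are sorted ascending (as in the examples), and the names times1, values1 and queries are hypothetical.

```matlab
% Minimal sketch of the per-input selection rule (assumes times1 is sorted ascending)
times1  = [1 2 3];          % timestamp1 column of the first example file
values1 = [6.2 4.3 7.2];    % value column of the first example file
queries = [1 2 3 5];        % union of both files' timestamp1 values

picked = zeros(size(queries));
for k = 1:numel(queries)
    % last sample whose timestamp is <= the query time
    idx = find(times1 <= queries(k), 1, 'last');
    if isempty(idx)
        idx = 1;            % query precedes the first sample: use the first value
    end
    picked(k) = values1(idx);
end
% picked is [6.2 4.3 7.2 7.2]
```

These are exactly the first summands in the timestamp1 output below.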

If I apply this algorithm to the example data above using timestamp1, I get:

7.4 1  %6.2+1.2
5.5 2  %4.3+1.2
7.5 3  %7.2+0.3
7.3 5  %7.2+0.1

For timestamp2:

7.4 3  %6.2+1.2
7.4 4  %6.2+1.2
4.6 9  %4.3+0.3
7.5 10 %7.2+0.3
7.3 12 %7.2+0.1
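As a check on the two tables above, the whole procedure (union of timestamps, per-input selection, sum) fits in a few vectorized lines. This is a sketch under stated assumptions, not code from either answer: the data are assumed to be in memory as N-by-3 matrices d1 and d2 (hypothetical names) with sorted timestamp columns, and the comparison relies on implicit expansion (MATLAB R2016b or later, or GNU Octave).

```matlab
% Example data from the question: columns are value, timestamp1, timestamp2
d1 = [6.2 1 4; 4.3 2 9; 7.2 3 10];
d2 = [1.2 2 3; 0.3 3 9; 0.1 5 12];

results = cell(1, 2);
for col = 2:3                                   % 2 = timestamp1, 3 = timestamp2
    t = union(d1(:, col), d2(:, col));          % sorted unique query times (column)
    % Number of samples with timestamp <= query. Because the timestamps are
    % sorted, this count is the index of the last such sample; max(..., 1)
    % falls back to the first sample when the query precedes all timestamps.
    i1 = max(sum(d1(:, col) <= t.', 1), 1);
    i2 = max(sum(d2(:, col) <= t.', 1), 1);
    results{col - 1} = [d1(i1, 1) + d2(i2, 1), t];   % one [sum, time] row per query
end
```

results{1} then holds the timestamp1 table and results{2} the timestamp2 table, in the same [sum, time] layout as above.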

I think I need to do something with time series, so I already have the following converter code:

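logs = dir('log1/*.txt');
t = cell(1, numel(logs));
for k = 1:numel(logs)
    % dir returns bare file names, so prepend the folder before loading
    d = load(fullfile('log1', logs(k).name));
    % value column as the data, timestamp1 column as the time vector
    t{k} = timeseries(d(:,1), d(:,2));
end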

I suppose the next step would be something like sum(t), but that doesn't work. Does anyone know how to combine them as described above?

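For anyone interested: these are CPU and wall-clock timestamps (measured from the start of the algorithm), used to measure the algorithm's performance.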

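I thought about this for quite a while and finally came up with the solution below. Although it is not conceptually different from Steve's answer, at least it is vectorized :)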

%% Preparations:
%{
In the same folder:
data1.txt:
6.2 1 4
4.3 2 9
7.2 3 10
data2.txt:
1.2 2 3
0.3 3 9
0.1 5 12
%}

function out = q47303825(fname1,fname2,whichStamp)
%% Input handling:
if nargin < 3
whichStamp = 1;
end
if nargin == 0
fname1 = 'data1.txt';
fname2 = 'data2.txt';
end
%% Reading the data :
d1 = dlmread(fname1,' ');
d2 = dlmread(fname2,' ');
%% Preallocation:
out = union(d1(:,whichStamp+1), d2(:,whichStamp+1)) .* [NaN,1];
%% Modifying the data slightly to allow vectorization:
d1 = [d1(1), -Inf, -Inf; d1; d1(size(d1,1)), +Inf, +Inf];
d2 = [d2(1), -Inf, -Inf; d2; d2(size(d2,1)), +Inf, +Inf];
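%% Find indices:
% For each query time (one column of the logical mask), min returns the first
% row whose timestamp exceeds the query; the row before it is the last sample
% with timestamp <= query (the -Inf row when the query precedes all data).
[~,I1] = min(d1(:,whichStamp+1) <= out(:,2).',[],1);
[~,I2] = min(d2(:,whichStamp+1) <= out(:,2).',[],1);
I1 = I1-1; I2 = I2-1;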
%% Generate final output:
out(:,1) = d1(I1) + d2(I2);
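To see why min on the logical comparison gives the right index, here is the index step in isolation, applied to the timestamp1 column of data2.txt after the -Inf/+Inf sentinel rows are added. This snippet is only an illustration of that one step (the variable names ts, vals and q are hypothetical); it assumes sorted timestamps and implicit expansion, as the function above does.

```matlab
% timestamp1 column of data2.txt with the -Inf / +Inf sentinel rows added,
% and the matching padded value column (first and last values repeated)
ts   = [-Inf; 2; 3; 5; +Inf];
vals = [1.2; 1.2; 0.3; 0.1; 0.1];
q    = [1 2 3 5];                   % union of both files' timestamp1 values (row)

mask = ts <= q;                     % 5-by-4 logical: column j is true while ts <= q(j)
[~, firstFalse] = min(mask, [], 1); % row of the first sample later than each query
lastLE = firstFalse - 1;            % row of the last sample with ts <= query
picked = vals(lastLE);              % q = 1 precedes 2, so it lands on the -Inf row
```

Here lastLE is [1 2 3 4], so the query at time 1 picks the sentinel row, whose value is the first data value (1.2), exactly as the question requires.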

To me, the two different timestamps are a red herring: you can define the problem for one timestamp and ignore the other.

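As far as I can tell, you want to:

- consider all times that occur in either dataset (here, considering only timestamp1, i.e. [1,2,3,5])
- interpolate/extrapolate any data points missing from either list using the nearest neighbour (5 is missing from the first dataset, 1 from the second)
- return the sum of the values with the missing points filled in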

Leaving out the file reading, this is how I see your processing:

times1 = [1,2,3];
values1 = [6.2, 4.3, 7.2];
times2 = [2, 3, 5];
values2 = [1.2, 0.3, 0.1];
all_times = union(times1, times2)';
values1_interp = interp1(times1, values1, all_times, 'nearest', 'extrap');
values2_interp = interp1(times2, values2, all_times, 'nearest', 'extrap');
v_sum = values1_interp + values2_interp;
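Note that 'nearest' happens to reproduce the requested output for both example files, but it is not the same rule as in the question: it picks the later sample whenever that one is closer. A small made-up case (the data below are invented for illustration, not taken from the question) shows the difference:

```matlab
% Invented data: the query time 9 lies closer to the later sample (t = 10)
times  = [1 10];
values = [5 8];
query  = 9;

% interp1 'nearest' picks the sample at t = 10
nearestVal = interp1(times, values, query, 'nearest', 'extrap');

% Rule from the question: last sample with timestamp <= query, else the first one
idx = find(times <= query, 1, 'last');
if isempty(idx)
    idx = 1;
end
ruleVal = values(idx);   % sample at t = 1
```

Here nearestVal is 8 while the question's rule gives 5.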

Displayed as a table, the result is:

>> table(v_sum, all_times)
ans = 
v_sum    all_times
_____    _________
7.4      1        
5.5      2        
7.5      3        
7.3      5

If we instead use

times1 = [4, 9, 10];
times2 = [3, 9, 12];

then we get

>> table(v_sum, all_times)
ans = 
v_sum    all_times
_____    _________
7.4       3       
7.4       4       
4.6       9       
7.5      10       
7.3      12

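Edit: based on the OP's comment, we don't actually want the nearest neighbour. We want the nearest neighbour to the left (the latest point at or before the query time), except when extrapolating to times before the start, where we use the first point (e.g. extrapolating values1 to time 1 when times1 is [2,3,4]). For that you can use something like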

function [vq] = interp_left(x, v, xq)
%INTERP_LEFT Interpolate to the left-nearest point
% x must be sorted.
vq = nan(size(xq));
for ii = 1:length(xq)
% Index of the last x that is <= xq(ii); since x is sorted, this is the left-nearest point
jj = find(x <= xq(ii), 1, 'last');
% Special case, there are no smaller x; extrapolate using [x(1),v(1)]
if isempty(jj)
vq(ii) = v(1);
else
vq(ii) = v(jj);
end % if
end % for
end % function

and then use

times1 = [1,2,3];
values1 = [6.2, 4.3, 7.2];
times2 = [2, 3, 5];
values2 = [1.2, 0.3, 0.1];
all_times = union(times1, times2)';
values1_interp = interp_left(times1, values1, all_times);
values2_interp = interp_left(times2, values2, all_times);
v_sum = values1_interp + values2_interp;
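Since the question starts from a whole folder of log files, the same left-nearest rule extends from two inputs to any number of them: collect the union of timestamps over all logs, then add one selected value per log at every query time. A sketch, assuming the logs have already been loaded into a cell array of N-by-3 matrices (logData is a hypothetical name) with sorted timestamps, and relying on implicit expansion:

```matlab
% Hypothetical cell array of loaded logs (columns: value, timestamp1, timestamp2).
% In practice it would be filled by looping over dir('log1/*.txt') with load.
logData = {[6.2 1 4; 4.3 2 9; 7.2 3 10], ...
           [1.2 2 3; 0.3 3 9; 0.1 5 12]};
col = 2;                                    % 2 = timestamp1, 3 = timestamp2

% Unique, sorted query times over all logs (column vector)
allTimes = [];
for k = 1:numel(logData)
    allTimes = [allTimes; logData{k}(:, col)]; %#ok<AGROW>
end
allTimes = unique(allTimes);

% Add each log's selected value at every query time
total = zeros(size(allTimes));
for k = 1:numel(logData)
    % count of samples <= t is the index of the last such sample; 1 if there is none
    idx = max(sum(logData{k}(:, col) <= allTimes.', 1), 1);
    total = total + logData{k}(idx, 1);
end
```

With the two example files this yields the same sums as the timestamp1 table in the question (7.4, 5.5, 7.5, 7.3); a third or fourth log is handled without any code changes.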
