How to implement this in Apache Spark with Java or Scala



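When a trip starts, the device in the car does not send a TRIP ID, but it does send one when the trip ends. How can I apply the corresponding trip IDs to the corresponding records?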

09:30,25,DEVICE_1
10:30,55,DEVICE_1
10:25,0,DEVICE_1,TRIP_ID_0
11:30,45,DEVICE_1
10:30,55,DEVICE_2
10:30,55,DEVICE_3
11:30,45,DEVICE_3
12:30,0,DEVICE_3,TRIP_ID_3
10:30,55,DEVICE_4
11:30,45,DEVICE_4
11:30,45,DEVICE_2
12:30,0,DEVICE_2,TRIP_ID_2
12:30,0,DEVICE_4,TRIP_ID_4
10:30,55,DEVICE_5
11:30,45,DEVICE_5
12:30,0,DEVICE_5,TRIP_ID_5
12:30,0,DEVICE_1,TRIP_ID_1

So the above becomes this:

09:30,25,DEVICE_1,TRIP_ID_0
10:25,0,DEVICE_1,TRIP_ID_0
10:30,55,DEVICE_1,TRIP_ID_1
11:30,45,DEVICE_1,TRIP_ID_1
12:30,0,DEVICE_1,TRIP_ID_1
10:30,55,DEVICE_2,TRIP_ID_2
11:30,45,DEVICE_2,TRIP_ID_2
12:30,0,DEVICE_2,TRIP_ID_2
10:30,55,DEVICE_3,TRIP_ID_3
11:30,45,DEVICE_3,TRIP_ID_3
12:30,0,DEVICE_3,TRIP_ID_3
10:30,55,DEVICE_4,TRIP_ID_4
11:30,45,DEVICE_4,TRIP_ID_4
12:30,0,DEVICE_4,TRIP_ID_4
10:30,55,DEVICE_5,TRIP_ID_5
11:30,45,DEVICE_5,TRIP_ID_5
12:30,0,DEVICE_5,TRIP_ID_5

An interesting problem. I had to fix a bug!

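You will need to convert this to spark.sql, since I tried it in Oracle. The WITH clause is supported in spark.sql, though. Also, because it was late, I did not use date strings for the time column but plain numbers, so you need to be aware of that.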
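As a minimal sketch of that last point, one way to turn the question's "HH:mm" strings into numbers that compare correctly is to convert them to minutes since midnight. The class name `TimeKey` and the method `toMinutes` are hypothetical, and this assumes every reading falls on the same day; a trip that crosses midnight would need a full timestamp instead.

```java
import java.time.LocalTime;

final class TimeKey {

    /** Converts an "HH:mm" string such as "09:30" to minutes since midnight. */
    static int toMinutes(String hhmm) {
        // LocalTime.parse accepts ISO local times; the seconds part is optional.
        return LocalTime.parse(hhmm).toSecondOfDay() / 60;
    }
}
```

With this, "09:30" becomes 570 and "10:25" becomes 625, so the numeric comparison in the SQL orders the rows as intended.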

Here is SQL you can adapt.

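-- X holds only the rows that carry a trip ID (the trip-end rows).
WITH X AS (
    SELECT device, time_asc, trip_id
    FROM trips
    WHERE trip_id IS NOT NULL
)
SELECT Y.trip_id, Y.device, Y.time_asc
FROM (
    -- Pair every row with all trip-end rows of the same device at the same or a later time,
    -- then rank those candidates so the nearest trip end gets rank 1.
    SELECT T1.time_asc,
           T1.device,
           X.trip_id,
           X.time_asc AS time_asc_compare,
           RANK() OVER (PARTITION BY T1.time_asc, T1.device ORDER BY X.time_asc) AS rank_val
    FROM trips T1, X
    WHERE T1.device = X.device
      AND T1.time_asc <= X.time_asc
) Y
WHERE Y.rank_val = 1
ORDER BY Y.trip_id, Y.time_asc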

You can drop the ORDER BY; it is only there for display.

With this data as input:

('1','A',null);
('2','A','TRIP_01');
('5','A',null);
('6','A',null);
('7','A',null);
('23','A','TRIP_02');
('56','A',null);
('60','A','TRIP_04');
('8','B',null);
('10','B','TRIP_03');
('1','E',null);
('2','E','TRIP_05');

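Ignore the quotes, which come from exporting the data in this format. The query returns the following, which I think meets your needs. Again, forgive the formatting: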

('TRIP_01','A','1');
('TRIP_01','A','2');
('TRIP_02','A','5');
('TRIP_02','A','6');
('TRIP_02','A','7');
('TRIP_02','A','23');
('TRIP_03','B','8');
('TRIP_03','B','10');
('TRIP_04','A','56');
('TRIP_04','A','60');
('TRIP_05','E','1');
('TRIP_05','E','2');
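To make the logic of the query easier to follow, here is a minimal sketch in plain Java (not Spark) of the same idea: within each device, every row takes the trip ID of the nearest row at the same or a later time that carries one. The names `TripIdBackfill`, `Reading`, and `assign` are hypothetical, and time is assumed to be a plain integer, as in the sample input above.

```java
import java.util.ArrayList;
import java.util.List;

final class TripIdBackfill {

    /** One reading: numeric time, device name, and trip ID (null until the trip ends). */
    static final class Reading {
        final int time;
        final String device;
        final String tripId;

        Reading(int time, String device, String tripId) {
            this.time = time;
            this.device = device;
            this.tripId = tripId;
        }

        @Override
        public String toString() {
            return tripId + "," + device + "," + time;
        }
    }

    /**
     * Returns new readings where each row carries the trip ID of the nearest
     * trip-end row at the same or a later time on the same device. Rows with
     * no later trip end are dropped, as the inner join in the SQL would do.
     */
    static List<Reading> assign(List<Reading> readings) {
        List<Reading> sorted = new ArrayList<>(readings);
        // Walk each device from its latest reading to its earliest.
        sorted.sort((a, b) -> {
            int byDevice = a.device.compareTo(b.device);
            if (byDevice != 0) return byDevice;
            int byTimeDesc = Integer.compare(b.time, a.time);
            if (byTimeDesc != 0) return byTimeDesc;
            // At equal times, visit the row that carries a trip ID first.
            return Boolean.compare(a.tripId == null, b.tripId == null);
        });

        List<Reading> result = new ArrayList<>();
        String currentDevice = null;
        String currentTrip = null;
        for (Reading r : sorted) {
            if (!r.device.equals(currentDevice)) {
                // New device: forget the trip ID carried from the previous one.
                currentDevice = r.device;
                currentTrip = null;
            }
            if (r.tripId != null) {
                currentTrip = r.tripId;
            }
            if (currentTrip != null) {
                result.add(new Reading(r.time, r.device, currentTrip));
            }
        }

        // Same display order as the query: trip ID, then time.
        result.sort((a, b) -> {
            int byTrip = a.tripId.compareTo(b.tripId);
            return byTrip != 0 ? byTrip : Integer.compare(a.time, b.time);
        });
        return result;
    }
}
```

Fed the twelve sample rows, `assign` should yield the same twelve trip/device/time triples as listed above. In Spark itself the sort-and-carry step would have to run per device partition; this sketch only illustrates the rule, not how Spark would execute it.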

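I wonder how Spark handles this under the hood in terms of performance. This took some effort late at night, so some appreciation would be welcome. I enjoyed it, too.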
