Best way to publish large volumes of data from Spring Boot



We have an Elastic DB that holds employee task details, and every day we use a Spring Boot application to publish the employee tasks to Kafka.

Elastic DB index: employee_task

{
  "employeeId":"E001",
  "taskName":"task1",
  "taskDesc":"task desc",
  "startDate":"2022-10-10 11:00:00",
  "endDate":"2022-10-10 16:00:00"
}
{
  "employeeId":"E001",
  "taskName":"task2",
  "taskDesc":"task desc",
  "startDate":"2022-10-10 16:00:00",
  "endDate":"2022-10-10 18:02:00"
}
{
  "employeeId":"E002",
  "taskName":"task3",
  "taskDesc":"task desc",
  "startDate":"2022-10-10 09:00:00",
  "endDate":"2022-10-10 18:00:00"
}

Spring Boot code:

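import com.fasterxml.jackson.core.JsonProcessingException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import org.springframework.scheduling.annotation.Scheduled;

@Scheduled(cron = "${cron.task.expression}")
public void scheduleTasks() {
    // Get the distinct employee IDs from the employee_task index
    List<String> employees = taskService.getAllEmployeeIds();
    // For each employee, fetch their tasks from employee_task and publish them to Kafka
    employees.parallelStream().forEach(employeeId -> {
        Map<String, Object> tasksList = taskService.getAllTasksByEmployeeId(employeeId);
        try {
            kafkaTemplate.send(topicName, mapper.writeValueAsString(tasksList));
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e); // lambdas cannot throw checked exceptions
        }
    });
}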

Every day it publishes the task details to Kafka in the following format:

Message.1
{
  "employeeId":"E001",
  "taskList":[
    {
      "employeeId":"E001",
      "taskName":"task1",
      "taskDesc":"task desc",
      "startDate":"2022-10-10 11:00:00",
      "endDate":"2022-10-10 16:00:00"
    },
    {
      "employeeId":"E001",
      "taskName":"task2",
      "taskDesc":"task desc",
      "startDate":"2022-10-10 16:00:00",
      "endDate":"2022-10-10 18:02:00"
    }
  ]
}
Message.2
{
  "employeeId":"E002",
  "taskList":[
    {
      "employeeId":"E002",
      "taskName":"task3",
      "taskDesc":"task desc",
      "startDate":"2022-10-10 09:00:00",
      "endDate":"2022-10-10 18:00:00"
    }
  ]
}
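Each message above is simply the flat index documents grouped by employeeId. A minimal JDK-only sketch of that grouping step follows; the record names Task and EmployeeTasks are hypothetical stand-ins (the question's code uses Map<String, Object>), and the sketch holds every task in memory at once.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TaskGrouping {

    // One document from the employee_task index (hypothetical type, fields as in the question).
    record Task(String employeeId, String taskName, String taskDesc,
                String startDate, String endDate) {}

    // Shape of one Kafka message: an employee ID plus all of that employee's tasks.
    record EmployeeTasks(String employeeId, List<Task> taskList) {}

    // Groups a flat list of tasks by employeeId; LinkedHashMap keeps first-seen employee order.
    static List<EmployeeTasks> groupByEmployee(List<Task> tasks) {
        Map<String, List<Task>> byEmployee = tasks.stream()
                .collect(Collectors.groupingBy(Task::employeeId,
                        LinkedHashMap::new, Collectors.toList()));
        return byEmployee.entrySet().stream()
                .map(e -> new EmployeeTasks(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("E001", "task1", "task desc", "2022-10-10 11:00:00", "2022-10-10 16:00:00"),
                new Task("E001", "task2", "task desc", "2022-10-10 16:00:00", "2022-10-10 18:02:00"),
                new Task("E002", "task3", "task desc", "2022-10-10 09:00:00", "2022-10-10 18:00:00"));
        for (EmployeeTasks message : groupByEmployee(tasks)) {
            System.out.println(message.employeeId() + " -> " + message.taskList().size() + " task(s)");
        }
    }
}
```

Each EmployeeTasks value would then be serialized and sent as one Kafka message.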

So far, so good, because the data volume was small. But now:

Current no. of employees: 10,000
Average tasks per employee: 100

So when the cron job runs, it queries the Elastic DB 10K times, once per employee. Can anyone suggest the best way to handle this case?
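To put the numbers above in perspective, the sketch below counts round trips. It assumes the current code makes one query for the ID list plus one per employee, and compares that with paging through every task document at a hypothetical page size of 5,000.

```java
public class QueryCount {

    // Per-employee approach from the question: one query for the IDs, then one per employee.
    static long perEmployeeQueries(long employees) {
        return 1 + employees;
    }

    // Bulk approach: page through all task documents (pageSize is an assumption).
    static long pagedQueries(long totalTasks, long pageSize) {
        return (totalTasks + pageSize - 1) / pageSize; // ceiling division
    }

    public static void main(String[] args) {
        long employees = 10_000;
        long tasksPerEmployee = 100;
        long totalTasks = employees * tasksPerEmployee; // 1,000,000 task documents
        long pageSize = 5_000; // hypothetical page size

        System.out.println("Total task documents: " + totalTasks);
        System.out.println("Per-employee queries: " + perEmployeeQueries(employees));
        System.out.println("Paged queries:        " + pagedQueries(totalTasks, pageSize));
    }
}
```

The data volume is the same either way; what changes is the number of round trips to the Elastic DB.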

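There is an old saying in the database world: "row by row = slow by slow". Based on what the question shows, I'd guess the service calls the DB in some kind of loop. You need to write a method that fetches the data from the DB in one go.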
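For about a million documents, "in one go" will in practice probably mean one sorted scan fetched in large pages. The sketch below assumes the query is sorted by employeeId, so each employee's tasks arrive contiguously, and publishes a batch as soon as the employee ID changes. In Elasticsearch this could perhaps be a sorted search paged with search_after (which would need employeeId mapped as a keyword field plus a unique tiebreaker sort field), but that wiring is not shown. TaskPageSource, publishAll and the two-field Task record are hypothetical, and the publish callback stands in for serializing the batch and calling kafkaTemplate.send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class SortedBatchPublisher {

    // Hypothetical, trimmed-down task document.
    record Task(String employeeId, String taskName) {}

    // Hypothetical page source: next page of tasks sorted by employeeId, empty list when done.
    interface TaskPageSource {
        List<Task> nextPage();
    }

    // Calls publish(employeeId, tasks) once per employee; relies on the input being sorted.
    static int publishAll(TaskPageSource source, BiConsumer<String, List<Task>> publish) {
        String currentId = null;
        List<Task> batch = new ArrayList<>();
        int messages = 0;
        List<Task> page;
        while (!(page = source.nextPage()).isEmpty()) {
            for (Task task : page) {
                // A new employee ID means the previous employee's tasks are complete.
                if (currentId != null && !currentId.equals(task.employeeId())) {
                    publish.accept(currentId, batch);
                    messages++;
                    batch = new ArrayList<>();
                }
                currentId = task.employeeId();
                batch.add(task);
            }
        }
        if (currentId != null) { // flush the last employee
            publish.accept(currentId, batch);
            messages++;
        }
        return messages;
    }

    public static void main(String[] args) {
        // Two in-memory pages standing in for two search responses; E001 spans both pages.
        List<List<Task>> pages = List.of(
                List.of(new Task("E001", "task1")),
                List.of(new Task("E001", "task2"), new Task("E002", "task3")));
        int[] next = {0};
        TaskPageSource source = () -> next[0] < pages.size() ? pages.get(next[0]++) : List.<Task>of();

        int sent = publishAll(source, (id, tasks) ->
                System.out.println(id + " -> " + tasks.size() + " task(s)"));
        System.out.println("messages: " + sent);
    }
}
```

Compared with grouping the whole result set in memory, this keeps memory bounded by roughly one page plus one employee's tasks, at the cost of requiring a sort on employeeId.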
