How to get the best bulk insert rate into DynamoDB using the executor framework in Java



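I am using the DynamoDB SDK from Java to do batch writes (about 5.5k items) against a local DynamoDB instance, as a POC. I know that a single batch write cannot contain more than 25 write operations, so I split the whole dataset into chunks of 25 items. I then submit these chunks as Callable tasks to the executor framework. The results are not satisfactory, though: inserting the 5.5k records takes more than 100 seconds.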
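
The chunking step described above can be sketched with the standard library alone. This is a minimal illustration, not the asker's code: the class name `BatchPartitioner` and the item count of 5510 are made up for the example, and the items are plain integers standing in for write requests.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPartitioner {
    // BatchWriteItem accepts at most 25 put/delete requests per call.
    static final int MAX_BATCH_SIZE = 25;

    /** Splits items into consecutive chunks of at most `size` elements; the last chunk may be shorter. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 5510; i++) {
            items.add(i); // stand-ins for WriteRequest objects
        }
        List<List<Integer>> chunks = partition(items, MAX_BATCH_SIZE);
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).size() + " items");
    }
}
```

Partitioning up front, including the short final chunk, avoids the easy mistake of only emitting a batch when it reaches exactly 25 items.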

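I am not sure how to optimize this. When creating the table I set WriteCapacityUnits to 400 (I am not sure what the maximum value I can give is) and experimented with it a bit, but it never made any difference. I also tried changing the number of threads in the executor.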
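
A thread-count experiment like the one described can be made repeatable with a small harness. The sketch below is an assumption-laden illustration: `PoolSizeSweep` and `simulatedBatches` are hypothetical names, and each task just sleeps to imitate request latency rather than calling DynamoDB. In real use the tasks would wrap the batch write calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PoolSizeSweep {
    /** Runs all tasks on a fixed pool of the given size and returns elapsed milliseconds. */
    static long timeRun(int poolSize, List<Callable<Integer>> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows a task failure instead of hiding it
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    /** Builds stand-in tasks that each sleep for latencyMillis (simulated request latency). */
    static List<Callable<Integer>> simulatedBatches(int taskCount, long latencyMillis) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> {
                Thread.sleep(latencyMillis);
                return 25; // pretend 25 items were written
            });
        }
        return tasks;
    }

    /** Times the same task list once per pool size, in the order given. */
    static Map<Integer, Long> sweep(int[] poolSizes, List<Callable<Integer>> tasks) throws Exception {
        Map<Integer, Long> elapsed = new LinkedHashMap<>();
        for (int size : poolSizes) {
            elapsed.put(size, timeRun(size, tasks));
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sweep(new int[] {1, 5, 20}, simulatedBatches(40, 5)));
    }
}
```

If the elapsed time stops improving as the pool grows, the bottleneck is probably on the server side rather than in the client's concurrency.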

Here is the main code that performs the batch writes:


    public static void main(String[] args) throws Exception {
        // Single client pointed at the local endpoint; "x"/"y" are placeholder credentials.
        final AmazonDynamoDB aws = new AmazonDynamoDBClient(new BasicAWSCredentials("x", "y"));
        aws.setEndpoint("http://localhost:8000");
        JSONArray employees = readFromFile();
        Iterator<JSONObject> iterator = employees.iterator();
        List<WriteRequest> batchList = new ArrayList<WriteRequest>();
        ExecutorService service = Executors.newFixedThreadPool(20);
        List<BatchWriteItemRequest> listOfBatchItemsRequest = new ArrayList<>();
        while(iterator.hasNext()) {
            if (batchList.size() == 25) {
                Map<String, List<WriteRequest>> batchTableRequests = new HashMap<String, List<WriteRequest>>();
                batchTableRequests.put("Employee", batchList);
                BatchWriteItemRequest batchWriteItemRequest = new BatchWriteItemRequest();
                batchWriteItemRequest.setRequestItems(batchTableRequests);
                listOfBatchItemsRequest.add(batchWriteItemRequest);
                batchList = new ArrayList<WriteRequest>();
            }
            PutRequest putRequest = new PutRequest();
            putRequest.setItem(ItemUtils.fromSimpleMap((Map) iterator.next()));
            WriteRequest writeRequest = new WriteRequest();
            writeRequest.setPutRequest(putRequest);
            batchList.add(writeRequest);
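        }
        // Flush the final batch; without this the last (up to 25) items are never sent.
        if (!batchList.isEmpty()) {
            Map<String, List<WriteRequest>> batchTableRequests = new HashMap<String, List<WriteRequest>>();
            batchTableRequests.put("Employee", batchList);
            BatchWriteItemRequest batchWriteItemRequest = new BatchWriteItemRequest();
            batchWriteItemRequest.setRequestItems(batchTableRequests);
            listOfBatchItemsRequest.add(batchWriteItemRequest);
        }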
        StopWatch watch = new StopWatch();
        watch.start();
        List<Future<BatchWriteItemResult>> futureListOfResults = listOfBatchItemsRequest.stream().
                map(batchItemsRequest -> service.submit(() -> aws.batchWriteItem(batchItemsRequest))).collect(Collectors.toList());
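        service.shutdown();
        // Block until all submitted batches finish instead of spinning on isTerminated().
        // The 10-minute limit is an arbitrary upper bound; needs java.util.concurrent.TimeUnit.
        service.awaitTermination(10, TimeUnit.MINUTES);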
        watch.stop();
        System.out.println("Total time taken : " + watch.getTotalTimeSeconds());
    }
}
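
One thing the code above does not do is inspect the `BatchWriteItemResult` returned by each call. On a real table, `BatchWriteItem` can report some requests as unprocessed (for example when throttled), and those have to be resent. The sketch below shows the retry loop with the standard library only. The `sender` function is a hypothetical stand-in: in real code it would call `batchWriteItem` and return whatever `getUnprocessedItems()` reports. The class name, attempt count, and delays are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class UnprocessedRetry {
    /**
     * Sends one batch, then resends whatever the sender reports as unprocessed,
     * sleeping with exponential backoff between attempts.
     * Returns the items still unprocessed after maxAttempts (empty on full success).
     */
    static <T> List<T> sendWithRetry(List<T> batch,
                                     Function<List<T>, List<T>> sender,
                                     int maxAttempts,
                                     long baseDelayMillis) throws InterruptedException {
        List<T> pending = new ArrayList<>(batch);
        for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
            if (attempt > 0) {
                // Delay doubles each retry: base, 2 * base, 4 * base, ...
                Thread.sleep(baseDelayMillis << (attempt - 1));
            }
            pending = new ArrayList<>(sender.apply(pending));
        }
        return pending;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical sender that "accepts" only the first 10 items per call.
        Function<List<Integer>, List<Integer>> flaky =
                items -> items.subList(Math.min(10, items.size()), items.size());
        List<Integer> batch = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            batch.add(i);
        }
        List<Integer> leftOver = sendWithRetry(batch, flaky, 5, 1L);
        System.out.println("left over: " + leftOver.size());
    }
}
```

Each `Callable` submitted to the executor could wrap its batch in a loop like this, so that a throttled batch is not silently dropped.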

Here is the code used to create the DynamoDB table:

    public static void main(String[] args) throws Exception {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient().withEndpoint("http://localhost:8000");
        DynamoDB dynamoDB = new DynamoDB(client);
        String tableName = "Employee";
        try {
            System.out.println("Creating the table, wait...");
            Table table = dynamoDB.createTable(tableName, Arrays.asList(new KeySchemaElement("ID", KeyType.HASH)
            ), Arrays.asList(new AttributeDefinition("ID", ScalarAttributeType.S)),
                    new ProvisionedThroughput(1000L, 1000L));
            table.waitForActive();
            System.out.println("Table created successfully.  Status: " + table.getDescription().getTableStatus());
        } catch (Exception e) {
            System.err.println("Cannot create the table: ");
            System.err.println(e.getMessage());
        }
    }

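DynamoDB Local is a tool for developers who need to develop against DynamoDB; it is not designed for scale or performance. It is therefore not intended for scale testing, and if you need to test bulk loads or other high-throughput workloads, it is best to use a real table. The actual cost of development testing against a live table is usually small, because the table only needs to be provisioned at high capacity for the duration of the test run.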
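
For a real provisioned table, a rough lower bound on load time follows from the write capacity: one write capacity unit covers one standard write per second for an item of up to 1 KB. The sketch below does that arithmetic. It assumes every item has the same size and ignores burst capacity, throttling, and network latency, and the class name `WriteTimeEstimate` is made up for the example. DynamoDB Local ignores provisioned throughput settings, which is consistent with the capacity changes in the question having no effect.

```java
final class WriteTimeEstimate {
    /** Write capacity units consumed by one standard write: item size rounded up to the next 1 KB, minimum 1. */
    static long wcuPerItem(long itemBytes) {
        return Math.max(1, (itemBytes + 1023) / 1024);
    }

    /** Lower bound, in seconds, to write itemCount items at a sustained provisionedWcu. */
    static double minSeconds(long itemCount, long itemBytes, long provisionedWcu) {
        return (double) (itemCount * wcuPerItem(itemBytes)) / provisionedWcu;
    }

    public static void main(String[] args) {
        // 5,500 items of at most 1 KB each (the item size is an assumption).
        System.out.println("400 WCU:  " + minSeconds(5500, 1024, 400) + " s");
        System.out.println("1000 WCU: " + minSeconds(5500, 1024, 1000) + " s");
    }
}
```

Under these assumptions the bound is 13.75 seconds at 400 WCU and 5.5 seconds at 1000 WCU, so on a real table the provisioned capacity would be expected to matter.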
