AWS DynamoDB boto3: confusing scan results



Basically, if I loop over a range of dates and run a scan with a one-day date range for each day, like this:

table_hook = dynamodb_resource.Table('table1')
date_filter = Key('date_column').between('2021-01-01T00:00:00+00:00', '2021-01-01T23:59:59+00:00')
response = table_hook.scan(FilterExpression=date_filter)
incoming_data = response['Items']
if (response['Count']) == 0:
    return
_counter = 1
while 'LastEvaluatedKey' in response:
    response = table_hook.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    if (
        parser.parse(response['Items'][0]['date_column']).replace(tzinfo=None) < parser.parse('2021-01-01T00:00:00+00:00').replace(tzinfo=None)
        or
        parser.parse(response['Items'][0]['date_column']).replace(tzinfo=None) > parser.parse('2021-06-07T23:59:59+00:00').replace(tzinfo=None)
    ):
        break

    incoming_data.extend(response['Items'])
    _counter+=1
    print("|->   Getting page %s" % _counter)

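At the end of the loop from Day1 to Day2, it returns X rows.

But if I run the same scan (with the same pagination) over the same range (Day1 to Day2) in a single pass, without the loop, it returns Y rows.

To make things even better, when I run table.describe_table(TableName='table1'), the row_count field shows Z rows. I really don't understand what is going on!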
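A note on the third number, as a minimal sketch: assuming the call goes through the low-level client (boto3.client('dynamodb')), DescribeTable reports the count as Table.ItemCount. DynamoDB refreshes that value only about every six hours, so it is an approximation for the whole table and need not match a live, filtered scan. The helper name approximate_item_count is hypothetical.

```python
def approximate_item_count(client, table_name):
    """Return the ItemCount reported by DescribeTable.

    `client` is assumed to be a low-level DynamoDB client, or any object
    exposing a compatible describe_table(TableName=...) method.
    The value is an approximation that DynamoDB updates periodically.
    """
    description = client.describe_table(TableName=table_name)
    return description['Table']['ItemCount']


# Possible usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client('dynamodb')
#   print(approximate_item_count(client, 'table1'))
```

Because this figure covers every item in the table, it would be expected to differ from both X and Y whenever the scan is filtered by date.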

Thanks to the help from the people above, I found my mistake: I was not passing the filter again on the paginated calls. The fixed code is:

table_hook = dynamodb_resource.Table('table1')
date_filter = Key('date_column').between('2021-01-01T00:00:00+00:00', '2021-01-01T23:59:59+00:00')
response = table_hook.scan(FilterExpression=date_filter)
incoming_data = response['Items']
_counter = 1
while 'LastEvaluatedKey' in response:
    # Pass the same filter on every page; otherwise later pages come back unfiltered.
    response = table_hook.scan(FilterExpression=date_filter,
                               ExclusiveStartKey=response['LastEvaluatedKey'])

    incoming_data.extend(response['Items'])
    _counter+=1
    print("|->   Getting page %s" % _counter)
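The same fix can be wrapped in a small helper so the filter cannot be dropped by accident. This is only a sketch: scan_all is a hypothetical name, and it assumes a table object whose scan() accepts keyword arguments the way a boto3 Table resource does. It also keeps going when a page has no items, because a filtered scan can return an empty page that still carries a LastEvaluatedKey.

```python
def scan_all(table, **scan_kwargs):
    """Collect the items from every page of a scan.

    The same keyword arguments (for example FilterExpression) are sent
    with every request; only ExclusiveStartKey changes between pages.
    """
    items = []
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.scan(**kwargs)
        # A page may be empty after filtering and still not be the last one.
        items.extend(response.get('Items', []))
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no more pages
        kwargs['ExclusiveStartKey'] = last_key
    return items


# Possible usage (requires boto3 and AWS credentials):
#   import boto3
#   from boto3.dynamodb.conditions import Attr
#   table = boto3.resource('dynamodb').Table('table1')
#   date_filter = Attr('date_column').between('2021-01-01T00:00:00+00:00',
#                                             '2021-01-01T23:59:59+00:00')
#   rows = scan_all(table, FilterExpression=date_filter)
```

The usage comment builds the filter with Attr, which is the condition class boto3 documents for filter expressions; the Key-based filter from the post can be passed in the same way.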
