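I want to build a list from an original list so that, wherever the original has a missing value, the output replaces it with the average of the two adjacent values. Assume missing data is represented by -99.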
def clean_missing_data():
    data_list = []
    for number, adjacent in enumerate(a):
        if (number != -99):
            data_list.append(number)
        else:
            adjacent_left = a[number-1]
            adjacent_right = a[number+1]
            fill_in = (adjacent_left + adjacent_right) / 2
            data_list.append(fill_in)
    return data_list

a = [1,2,3,-99,5]
check_data = clean_missing_data()
print('original test case:', a)
print('After clearing, the test case became:', check_data)
Output:
original test case: [1, 2, 3, -99, 5]
After clearing, the test case became: [0, 1, 2, 3, 4]
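For example, in this test case the missing value is the fourth number in the list (represented by -99), so the list should take the average of the adjacent values, 3 and 5, and put it back into the list.
Essentially, it means: [1, 2, 3, (3+5)/2, 5]
Please help!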
The requirements are a bit unclear, so I'm not 100% sure this is what you want, but here is my best guess so far.
def get_right_number(numbers, i):
    """ Recursive function to search for the first valid number to the right """
    if i >= len(numbers) - 1:
        # Ran off the end of the list: no valid number to the right
        right = -99
    else:
        right = numbers[i + 1]
        if right == -99:
            right = get_right_number(numbers, i+1)
    return right

def clean_missing_data(numbers):
    print(f'Input: {numbers}')
    if all(x == -99 for x in numbers):
        print('All values in list are invalid. Could not compute.')
        return
    clean_numbers = []
    for i in range(len(numbers)):
        if numbers[i] != -99:
            clean_numbers.append(numbers[i])
        else:
            # Count how many valid neighbours contribute to the average
            valid_count = 0
            if i == 0:
                left = 0
            else:
                # Use the already-cleaned value on the left
                left = clean_numbers[i - 1]
                valid_count += 1
            right = get_right_number(numbers, i)
            if right == -99:
                right = 0
            else:
                valid_count += 1
            average = (left + right) / valid_count
            clean_numbers.append(average)
    print(f'Output: {clean_numbers}\n')
    return clean_numbers
Here are my test cases (the printing is embedded in the clean method above):
clean_missing_data([1, 2, 3, 4, 5])
clean_missing_data([1, 2, 3, -99, 5])
clean_missing_data([-99, 2, 3, 4, 5])
clean_missing_data([-99, -99, 3, 4, 5])
clean_missing_data([1, 2, 3, 4, -99])
clean_missing_data([1, 2, 3, -99, -99])
clean_missing_data([1, -99, -99, -99, 5])
clean_missing_data([-99, -99, -99, -99, -99])
Here is the output:
Input: [1, 2, 3, 4, 5]
Output: [1, 2, 3, 4, 5]
Input: [1, 2, 3, -99, 5]
Output: [1, 2, 3, 4.0, 5]
Input: [-99, 2, 3, 4, 5]
Output: [2.0, 2, 3, 4, 5]
Input: [-99, -99, 3, 4, 5]
Output: [3.0, 3.0, 3, 4, 5]
Input: [1, 2, 3, 4, -99]
Output: [1, 2, 3, 4, 4.0]
Input: [1, 2, 3, -99, -99]
Output: [1, 2, 3, 3.0, 3.0]
Input: [1, -99, -99, -99, 5]
Output: [1, 3.0, 4.0, 4.5, 5]
Input: [-99, -99, -99, -99, -99]
All values in list are invalid. Could not compute.
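Note that when you have a run of consecutive invalid numbers, we take the first valid number to the right and average it with the value on the left. That new average is then used in the calculation for the next number, and so on. This performs a kind of interpolation, but strictly speaking it is not linear interpolation. Without complete requirements, this will have to do for now (within time and budget!).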
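For comparison, here is a rough sketch of what strict linear interpolation could look like. This is not the code from the answer above: the name `interpolate_missing` and the choice to fill leading or trailing gaps with the nearest valid value are my own assumptions, so adjust them to whatever the real requirements turn out to be.

```python
MISSING = -99  # sentinel for missing data, as in the question


def interpolate_missing(numbers, missing=MISSING):
    """Hypothetical helper: fill gaps by linear interpolation.

    Assumption: gaps at the start or end of the list have only one valid
    neighbour, so they are filled with a copy of that neighbour.
    """
    valid = [i for i, x in enumerate(numbers) if x != missing]
    if not valid:
        return None  # nothing to interpolate from

    result = list(numbers)

    # Leading and trailing gaps: copy the nearest valid value.
    for i in range(valid[0]):
        result[i] = numbers[valid[0]]
    for i in range(valid[-1] + 1, len(numbers)):
        result[i] = numbers[valid[-1]]

    # Interior gaps: walk each pair of consecutive valid positions.
    for lo, hi in zip(valid, valid[1:]):
        step = (numbers[hi] - numbers[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            result[i] = numbers[lo] + step * (i - lo)

    return result


print(interpolate_missing([1, -99, -99, -99, 5]))  # [1, 2.0, 3.0, 4.0, 5]
```

For the test case [1, -99, -99, -99, 5], this gives evenly spaced values, whereas the running-average approach above gives [1, 3.0, 4.0, 4.5, 5].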
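If your requirements change, you can adjust the code above until all the test cases give the results you need. I'm also sure there is a cleaner way to do this, but I'll leave that for you to figure out. Good luck!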
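You have mixed up the variables number and adjacent. By convention, enumerate(a) is described as returning index, the position in the array, and element, the item itself. With that naming, your code becomes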
def clean_missing_data():
    data_list = []
    for index, element in enumerate(a):
        if (element != -99):
            data_list.append(element)
        else:
            adjacent_left = a[index - 1]
            adjacent_right = a[index + 1]
            fill_in = (adjacent_left + adjacent_right) / 2
            data_list.append(fill_in)
    return data_list

a = [1,2,3,-99,5]
check_data = clean_missing_data()
print('original test case:', a)
print('After clearing, the test case became:', check_data)
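This gives [1, 2, 3, 4.0, 5], where 4.0 is of course equivalent to 4.
You do need to be aware that the code still has some problems. What if the first or last number is -99? What if two adjacent numbers are both -99? But this should at least work for the example you gave!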
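One possible way to guard against those cases is sketched below. In the version above, a -99 in first position makes a[index - 1] silently wrap around to the last element, and a -99 in last position raises an IndexError on a[index + 1]. The function name `clean_missing_data_safe`, the `missing` parameter, and the decision to leave a value as -99 when neither neighbour is valid are all assumptions on my part, not something the question specifies.

```python
def clean_missing_data_safe(values, missing=-99):
    """Hypothetical variant: average only the neighbours that exist and are valid."""
    cleaned = []
    for index, element in enumerate(values):
        if element != missing:
            cleaned.append(element)
            continue

        # Collect neighbours from the ORIGINAL list that are in range
        # and are not themselves missing.
        neighbours = []
        if index > 0 and values[index - 1] != missing:
            neighbours.append(values[index - 1])
        if index < len(values) - 1 and values[index + 1] != missing:
            neighbours.append(values[index + 1])

        if neighbours:
            cleaned.append(sum(neighbours) / len(neighbours))
        else:
            # Assumption: with no valid neighbour, keep the sentinel.
            cleaned.append(missing)
    return cleaned


print(clean_missing_data_safe([1, 2, 3, -99, 5]))   # [1, 2, 3, 4.0, 5]
print(clean_missing_data_safe([-99, 2, 3, 4, 5]))   # [2.0, 2, 3, 4, 5]
```

Passing the list in as an argument, rather than reading the global a, also makes it easier to try several test lists in a row.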