My task is to detect outliers using Z-scores and replace each outlier with the previous valid value.
signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']
import numpy as np
results = [ float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)
threshold = -1.5
outlier = []
new_list = []
for i in results:
    z = (i - mean) / std
    if z < threshold:
        outlier.append(i)
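The outliers in the dataset are [89.48, 88.8, 95.8].
The final list should have these values replaced with the previous value (only when the previous value's z-score does not itself satisfy the z < threshold condition).
Edit:
When I tried to scale this up to a whole file containing similar elements, it raised an error. The file version: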
with open("File.txt") as f:
    img_intensity_list = f.readlines()
    for count, value in enumerate(img_intensity_list):
        img_intensity_list[count] = value.split("[")[1].split("]")[0].split(", ")
    # print(img_intensity_list)
    for elem, val in enumerate(img_intensity_list):
        results = [float(elem) for elem in img_intensity_list]
        mean = np.mean(results)
        std = np.std(results)
        threshold = -1.5
        outlier = []
        # new_list = [0 for k in range(len(results))]
        for i, value in enumerate(results):
            z = (value - mean) / std
            if float(z) < threshold:
                outlier.append(value)
                results[i] = results[i-1]
            else:
                results[i] = value
Error: float() argument must be a string or a number, not 'list'
In your code, i is a value from the list. You use it as a value when computing z, and as an index when assigning the previous result's value.
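To make the mix-up concrete, here is a small sketch (the three-element list is a shortened stand-in for the real data, and error_name is a hypothetical variable used only for illustration). Iterating directly over the list yields float values, and a float cannot be used as a list index:

```python
results = [229.84, 89.48, 227.8]  # shortened stand-in for the real signal
error_name = None

for i in results:  # i is a float value here, not a position
    try:
        previous = results[i - 1]  # indexing a list with a float fails
    except TypeError as exc:
        error_name = type(exc).__name__
        print(error_name, "-", exc)
        break
```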
Use enumerate to get both the index and the value of each element in the list, like this:
for i, value in enumerate(results):
    z = (value - mean) / std
    if z < threshold:
        outlier.append(value)
        results[i] = results[i-1]
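The same idea can be wrapped in a reusable function. This is only a sketch under a few assumptions: the name replace_low_outliers, the short sample list, and the decision to leave a flagged first element unchanged (it has no predecessor) are all illustrative choices, not part of the original code.

```python
import numpy as np

def replace_low_outliers(values, threshold=-1.5):
    """Return (cleaned, outliers): entries with z < threshold are replaced
    by the previous, already cleaned, entry."""
    data = [float(v) for v in values]
    mean = np.mean(data)
    std = np.std(data)
    cleaned = list(data)
    outliers = []
    if std == 0:
        return cleaned, outliers  # all values identical, nothing to flag
    for i, value in enumerate(data):
        z = (value - mean) / std
        if z < threshold:
            outliers.append(value)
            if i > 0:  # assumption: a flagged first element is left as-is
                cleaned[i] = cleaned[i - 1]
    return cleaned, outliers

# Hypothetical sample with two consecutive low outliers.
sample = ['220.0', '221.0', '219.0', '90.0', '89.0',
          '222.0', '220.5', '218.0', '221.5', '219.5']
cleaned, outliers = replace_low_outliers(sample)
print(outliers)  # [90.0, 89.0]
print(cleaned)   # both outliers become 219.0
```

Because the replacement reads from cleaned rather than from the raw data, a run of consecutive outliers is filled with the last valid value before the run.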
If I understand your code correctly, this version should give you the expected result.
import numpy as np
signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']
# Converting the strings to floats
results = [ float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)
threshold = -1.5
outlier = []
new_list = [0 for k in range(len(results))]
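for i, value in enumerate(results):
    z = (value - mean) / std
    if z < threshold:
        # Outlier: record it and carry forward the previous (already cleaned) value.
        # Note: for i == 0 this reads new_list[-1], which is still the initial 0.
        outlier.append(value)
        new_list[i] = new_list[i-1]
    else:
        new_list[i] = value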
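Regarding the error in the edit: after the split, each element of img_intensity_list is itself a list of strings, so float() ends up being called on a whole row. A minimal sketch that handles one row at a time follows. It assumes each line of the file holds one bracketed, comma-separated list. The helper names parse_row and clean_row and the two sample lines are hypothetical, since the real file contents are not shown.

```python
import numpy as np

def parse_row(line):
    """Extract the bracketed list from one line and convert it to floats."""
    inner = line.split("[")[1].split("]")[0]
    # Strip whitespace and optional quotes around each number.
    return [float(tok.strip().strip("'\"")) for tok in inner.split(",")]

def clean_row(row, threshold=-1.5):
    """Replace entries with z < threshold by the previous cleaned entry."""
    mean, std = np.mean(row), np.std(row)
    cleaned = list(row)
    if std == 0:
        return cleaned  # constant row, nothing to flag
    for i, value in enumerate(row):
        if (value - mean) / std < threshold and i > 0:
            cleaned[i] = cleaned[i - 1]
    return cleaned

# Stand-in for f.readlines(); these lines are made up for illustration.
lines = [
    "row1: ['220.0', '221.0', '219.0', '90.0', '222.0', '220.5', '218.0', '221.5']\n",
    "row2: [210.0, 211.0, 95.0, 209.0, 212.0, 210.5, 208.0, 211.5]\n",
]
cleaned_rows = [clean_row(parse_row(line)) for line in lines]
for row in cleaned_rows:
    print(row)
```

Computing the mean and standard deviation per row keeps one row's outliers from skewing the z-scores of another. Whether that is what the data calls for is a judgment the question does not settle.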
You need to convert the float value to an integer value: for each "i" in the for loop, use the built-in function int().