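I'm trying to find the standard deviation (σ = √[∑(xᵢ − MEAN)² ÷ n]) of a single extracted column of a CSV file. The file contains about 45,000 instances and 17 attributes separated by ";". To compute the standard deviation, every iteration of the while loop needs the mean for the subtraction with Xi, so I think MEAN has to be found before the while loop runs. But I don't know how to do that, or whether there is any way to do it, and I'm stuck here. After that I replace the old Xi with the new Xi and write (generate) a new CSV file.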
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main {
    public static void main(String[] args) throws IOException {
        String filename = "ly.csv";
        File file = new File(filename);

        // try-with-resources closes the reader and the writer even if an exception is thrown
        try (Scanner inputStream = new Scanner(file);
             BufferedWriter writer = new BufferedWriter(new FileWriter("bank-full_updated.csv"))) {

            double MEAN = 0.0;              // unknown at this point: this is what I can't work out
            double standardDeviation = 0.0; // can't be computed without MEAN
            double temp1 = 0.0;             // running sum of squared deviations

            if (inputStream.hasNextLine()) {
                inputStream.nextLine(); // skip the header row
            }
            while (inputStream.hasNextLine()) {
                String data1 = inputStream.nextLine();
                String[] values = data1.split(";");
                double Xi = Double.parseDouble(values[1]);

                // now finding standard deviation
                temp1 += (Xi - MEAN) * (Xi - MEAN);
                // temp2 = temp1 / count;
                // standardDeviation = Math.sqrt(temp2);
                Xi = standardDeviation * Xi;

                // now replace the original value with the new Xi
                values[1] = String.valueOf(Xi);

                // rebuild the line from the values so it can be written to the new file
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < values.length; i++) {
                    sb.append(values[i]);
                    if (i < values.length - 1) {
                        sb.append(";");
                    }
                }
                System.out.println(sb);
                writer.write(sb.toString() + "\n");
            }
        } catch (FileNotFoundException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
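A minimal sketch of the two-pass route the question is reaching for, assuming the header row has already been dropped, the rows are non-empty, and the chosen column parses as a number on every row: the first loop computes MEAN, the second sums the squared deviations from it, and only then is each Xi rewritten. The class and method names (`TwoPassColumn`, `meanAndStdDev`, `rewriteColumn`) are hypothetical, and the σ·Xi replacement simply mirrors the question's `Xi = standard deviation * Xi` line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: two passes over CSV rows (header row already removed).
final class TwoPassColumn {

    // Pass 1 computes the mean; pass 2 computes the population standard deviation around it.
    // Returns {mean, sigma}. Assumes rows is non-empty.
    static double[] meanAndStdDev(List<String> rows, int col) {
        double sum = 0.0;
        for (String row : rows) {
            sum += Double.parseDouble(row.split(";", -1)[col]);
        }
        double mean = sum / rows.size();

        double sqDev = 0.0;
        for (String row : rows) {
            double d = Double.parseDouble(row.split(";", -1)[col]) - mean;
            sqDev += d * d;
        }
        return new double[] { mean, Math.sqrt(sqDev / rows.size()) };
    }

    // Replaces column `col` with sigma * Xi, mirroring the question's replacement step.
    static List<String> rewriteColumn(List<String> rows, int col) {
        double sigma = meanAndStdDev(rows, col)[1];
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] values = row.split(";", -1); // -1 keeps trailing empty fields
            values[col] = String.valueOf(sigma * Double.parseDouble(values[col]));
            out.add(String.join(";", values));
        }
        return out;
    }
}
```

With files, the lines could be loaded with `Files.readAllLines(Path.of("ly.csv"))`, passed in as `lines.subList(1, lines.size())` to skip the header, and the result written out with `Files.write` (re-adding the header line first if it is wanted). Keeping about 45,000 short lines in memory is usually fine; the alternative is to open and read the file twice.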
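The standard deviation can be computed in a single pass. Professor Donald Knuth has an algorithm for this that uses Kahan summation. Here is the paper: http://researcher.ibm.com/files/us-ytian/stability.pdf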
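A single-pass sketch, assuming the population standard deviation (divide by n, as in the question's formula). It uses Welford's recurrence, the single-pass update commonly cited from Knuth's TAOCP Vol. 2; it is not taken from the linked paper and does not use Kahan summation. `RunningStats` and its methods are hypothetical names.

```java
// Hypothetical single-pass accumulator using Welford's recurrence.
final class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;  // distance from the old mean
        mean += delta / n;        // update the mean
        m2 += delta * (x - mean); // uses both the old and the updated mean
    }

    long count() { return n; }

    double mean() { return mean; }

    // Population standard deviation: sqrt(sum (x - mean)^2 / n); 0 for no data.
    double populationStdDev() {
        return n == 0 ? 0.0 : Math.sqrt(m2 / n);
    }
}
```

In the question's loop this would be `stats.add(Xi)` for each row and `stats.populationStdDev()` after the loop. Replacing every Xi with a value that depends on σ still needs a second pass over the rows, because σ is only known once the last row has been read.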
Here is another approach, but it is subject to rounding errors:
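static double std_dev2(double[] a, int n) {
    if (n == 0)
        return 0.0;
    double sum = 0;
    double sq_sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += a[i];
        sq_sum += a[i] * a[i];
    }
    double mean = sum / n;
    // E[x^2] - mean^2 can come out slightly negative through rounding; clamp so sqrt doesn't return NaN
    double variance = Math.max(0.0, sq_sum / n - mean * mean);
    return Math.sqrt(variance);
}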
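The rounding problem is easiest to see when the values sit far from zero. The sketch below (class and method names are hypothetical) reuses the `E[x²] − mean²` formula from `std_dev2` and compares it with a two-pass computation on three values near 10⁹ whose population standard deviation is √(2/3) ≈ 0.8165. In `double` arithmetic, `sq_sum / n` and `mean * mean` round to the same number, so the subtraction cancels to exactly zero.

```java
// Hypothetical demo: one-pass E[x^2] - mean^2 versus two passes, on data with a large offset.
final class CancellationDemo {

    // Same formula as std_dev2: sqrt(E[x^2] - mean^2)
    static double naiveStdDev(double[] a) {
        double sum = 0.0, sqSum = 0.0;
        for (double x : a) {
            sum += x;
            sqSum += x * x;
        }
        double mean = sum / a.length;
        return Math.sqrt(sqSum / a.length - mean * mean);
    }

    // Two passes: mean first, then squared deviations from it
    static double twoPassStdDev(double[] a) {
        double sum = 0.0;
        for (double x : a) sum += x;
        double mean = sum / a.length;
        double sqDev = 0.0;
        for (double x : a) {
            double d = x - mean;
            sqDev += d * d;
        }
        return Math.sqrt(sqDev / a.length);
    }

    public static void main(String[] args) {
        double[] data = { 1e9 + 1, 1e9 + 2, 1e9 + 3 }; // population std dev is sqrt(2/3)
        System.out.println(naiveStdDev(data));   // prints 0.0: the spread is lost to rounding
        System.out.println(twoPassStdDev(data)); // about 0.8165
    }
}
```

Subtracting the mean before squaring (two passes, or Welford's update in a single pass) keeps the small deviations intact, which is why those methods are preferred over `E[x²] − mean²` for large or tightly clustered values.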