How to calculate the standard deviation of a text (.txt) file


Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes

The actual file has no blank lines; otherwise it raises an error. I want to calculate the standard deviation for each category.

I tried using statistics.stdev(), but that doesn't work. Can anyone help me? When you answer, could you explain it so that I can learn?
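Without the error message it is hard to say exactly why stdev() failed, but two common causes are: it only accepts numbers (csv gives you strings such as '3249' or '0,01'), and it needs at least two data points. A minimal sketch of both cases, assuming the comma is only ever a decimal separator (no thousands separators). The helper to_float is hypothetical and stands in for locale.atof so the example runs without the Dutch locale:

```python
import statistics

def to_float(text):
    # Hypothetical helper: turn a Dutch-style decimal such as "0,01" into 0.01.
    # Assumes there are no thousands separators in the value.
    return float(text.replace(',', '.'))

raw = ['3249', '3249', '3115']          # strings, as csv.DictReader returns them

try:
    statistics.stdev(raw)               # strings are rejected with a TypeError
except TypeError as exc:
    print('TypeError:', exc)

try:
    statistics.stdev([3249.0])          # a single value is not enough
except statistics.StatisticsError as exc:
    print('StatisticsError:', exc)

values = [to_float(v) for v in raw]     # convert first, then it works
print(round(statistics.stdev(values), 2))
```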

from csv import DictReader
from collections import defaultdict
from statistics import median
from locale import setlocale
from locale import LC_ALL
from locale import atof
setlocale(LC_ALL, 'Dutch_Netherlands.1252')
median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
print ("Mediaan : ")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)
for median_name in median_names:
    med = median(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, med))
from collections import defaultdict
import csv
import locale
import statistics
from pprint import pprint, pformat
locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')
avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}
seller_ratings = defaultdict(list)
num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])
        # Collect the seller rating per category for every row, not just the last one
        seller_ratings[row['Category']].append(locale.atof(row['sellerRating']))
for avg_name, total in averages.items():
    averages[avg_name] = total / num_values
print()
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                               grouping=True)
    print('  {:<13} {:>10}'.format(avg_name, rounded))
modes = {}
for category, values in seller_ratings.items():
    try:
        modes[category] = statistics.mode(values)
    except statistics.StatisticsError:
        modes[category] = None  # No unique mode.
print()
print('Modes:')
for category, mode in modes.items():
    if mode is None:
        print('  {:<20} {:>10}'.format(category, '-'))
    else:
        rounded = locale.format_string('%.2f', round(mode, 2), grouping=True)
        print('  {:<20} {:>10}'.format(category, rounded))

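Your previous question already covers how to get the mean, the median and similar statistics: https://stackoverflow.com/a/54021108/8181134
Using the same approach, but with the .std() function, you can get the standard deviation: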

import pandas as pd
df = pd.read_csv('bijlage.csv', delimiter=';', decimal=',')  # 'bijlage.txt' in your case
sellerRating_std = df['sellerRating'].std()
print('Seller rating standard deviation: {}'.format(sellerRating_std))

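First, a note on median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice': in Python the commas alone already create a tuple, so this line works as written.

Adding parentheses, as in median_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'), produces the same tuple but makes it clearer that you are assigning a sequence to iterate over later.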
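A quick interpreter check of what the bare and the parenthesised assignments actually produce (nothing here depends on the question's data):

```python
# A bare comma-separated list on the right-hand side already builds a tuple.
bare = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
parenthesised = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')

print(type(bare).__name__)        # tuple
print(bare == parenthesised)      # True

# The case where parentheses alone are not enough: a single element
# needs a trailing comma, otherwise it is just a string.
single = ('sellerRating',)
not_a_tuple = ('sellerRating')
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple str
```

In other words, it is the comma that makes the tuple; the parentheses only group it.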

From there, you can calculate the standard deviation the same way you calculated the median:

from csv import DictReader
from collections import defaultdict
from statistics import stdev
from locale import setlocale
from locale import LC_ALL
from locale import atof
setlocale(LC_ALL, 'Dutch_Netherlands.1252')
stddev_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')
print("Std dev:")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)
for name in stddev_names:
    # atof parses the Dutch decimal comma, e.g. "0,01" -> 0.01
    stddev_val = stdev(map(atof, data[name]))
    print('{:<13} {:>10}'.format(name, stddev_val))
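Note that statistics.stdev() returns the sample standard deviation (squared deviations divided by n - 1), while statistics.pstdev() divides by n and returns the population standard deviation; pandas' Series.std() also uses n - 1 by default (ddof=1). A small worked example comparing the two on the sellerRating values of the ten sample rows shown in the question:

```python
import statistics

# sellerRating column from the ten sample rows in the question
ratings = [3249] * 7 + [3115] * 3

sample_sd = statistics.stdev(ratings)       # divides the squared deviations by n - 1
population_sd = statistics.pstdev(ratings)  # divides the squared deviations by n

print(f'stdev  (sample):     {sample_sd:.2f}')
print(f'pstdev (population): {population_sd:.2f}')
```

Here the mean is 3208.8 and the sum of squared deviations is 37707.6, so the two results are the square roots of 37707.6 / 9 and 37707.6 / 10. Which one you want depends on whether the rows are a sample or the complete population.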

Your first approach (for the median) is the way to go if you want to use the statistics module:

setlocale(LC_ALL, 'Dutch_Netherlands.1252')
median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
print ("Mediaan : ")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)
for median_name in median_names:
    med = median(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, med))

That part stays the same; you only need to compute the stdev right after it, because you can reuse the same lists in the data dictionary:

from statistics import stdev
print("\nStd Dev (sample)")
for median_name in median_names:
    std = stdev(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, std))
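The loops above give one value per column, while the question asks for the standard deviation per category. A minimal sketch that groups the values by Category first, assuming the same semicolon-separated layout as bijlage.txt. To keep it self-contained it reads the sample rows from the question through io.StringIO, and it converts decimal commas with str.replace (via a hypothetical to_float helper) instead of the Windows-style 'Dutch_Netherlands.1252' locale. Categories with fewer than two rows get no value, because stdev() needs at least two data points:

```python
import csv
import io
import statistics
from collections import defaultdict

# Sample rows from the question; to read the real file, use
# open('bijlage.txt', newline='') instead of io.StringIO(SAMPLE).
SAMPLE = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

stddev_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')

def to_float(text):
    # Hypothetical helper. Assumption: "," is the decimal separator
    # and there are no thousands separators.
    return float(text.replace(',', '.'))

# per_category[category][column] -> list of floats
per_category = defaultdict(lambda: defaultdict(list))
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter=';'):
    for name in stddev_names:
        per_category[row['Category']][name].append(to_float(row[name]))

results = {}
for category, columns in per_category.items():
    results[category] = {}
    for name in stddev_names:
        values = columns[name]
        # stdev() raises StatisticsError for fewer than two values, so skip those
        results[category][name] = statistics.stdev(values) if len(values) >= 2 else None

for category, stats in results.items():
    print(category)
    for name, sd in stats.items():
        shown = '-' if sd is None else f'{sd:.2f}'
        print('  {:<13} {:>10}'.format(name, shown))
```

With the sample rows every value within a category is identical, so all results are 0.00; the real file should give more interesting numbers.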
