Why does Python's statistics.mean() behave differently when passed a numpy.ndarray versus a list?



Why does statistics.mean behave so strangely? When passed a numpy.ndarray, it outputs 4, which looks like the median rather than the mean:

statistics.mean(np.array([1,4,9])) 
4

When passed a list, it outputs the actual mean:

statistics.mean([1,4,9]) 
4.666666666666667

I'm using Python 3.7.

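No, the first case does not return the median. It returns the mean as a numpy.int64, because the input is an array of NumPy integers rather than built-in Python ints.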

If you pass non-built-in numeric objects to statistics.mean, the result is converted to the input data type. In your case, statistics.mean does something equivalent to:

np.int64(sum(np.array([1, 4, 9])) / len(np.array([1, 4, 9])))
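
The truncation can be reproduced without NumPy. The sketch below assumes the mean is held as an exact Fraction before being converted to an integer type; it is an illustration of the effect, not the library's own code.

```python
from fractions import Fraction

data = [1, 4, 9]

# The exact mean as a ratio: 14/3.
exact_mean = Fraction(sum(data), len(data))

# Converting to an integer type truncates toward zero.
as_int = int(exact_mean)

# Converting to float keeps the fractional part.
as_float = float(exact_mean)

print(exact_mean, as_int, as_float)
```

The integer conversion is where the .666... is lost; nothing is wrong with the sum or the division.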

I'm using Python 3.8. Here is the code for mean:

def mean(data):
    """Return the sample arithmetic mean of data.

    >>> mean([1, 2, 3, 4, 4])
    2.8

    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)

    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')

    If ``data`` is empty, StatisticsError will be raised.
    """
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total/n, T)

Here is the code for _sum:

def _sum(data, start=0):
    """_sum(data [, start]) -> (type, sum, count)

    Return a high-precision sum of the given numeric data as a fraction,
    together with the type to be converted to and the count of items.

    If optional argument ``start`` is given, it is added to the total.
    If ``data`` is empty, ``start`` (defaulting to 0) is returned.

    Examples
    --------

    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    (<class 'float'>, Fraction(11, 1), 5)

    Some sources of round-off error will be avoided:

    # Built-in sum returns zero.
    >>> _sum([1e50, 1, -1e50] * 1000)
    (<class 'float'>, Fraction(1000, 1), 3000)

    Fractions and Decimals are also supported:

    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    (<class 'fractions.Fraction'>, Fraction(63, 20), 4)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    (<class 'decimal.Decimal'>, Fraction(6963, 10000), 4)

    Mixed types are currently treated as an error, except that int is
    allowed.
    """
    count = 0
    n, d = _exact_ratio(start)
    partials = {d: n}
    partials_get = partials.get
    T = _coerce(int, type(start))
    for typ, values in groupby(data, type):
        T = _coerce(T, typ)  # or raise TypeError
        for n,d in map(_exact_ratio, values):
            count += 1
            partials[d] = partials_get(d, 0) + n
    if None in partials:
        # The sum will be a NAN or INF. We can ignore all the finite
        # partials, and just look at this special one.
        total = partials[None]
        assert not _isfinite(total)
    else:
        # Sum all the partial sums using builtin sum.
        # FIXME is this faster if we sum them in order of the denominator?
        total = sum(Fraction(n, d) for d, n in sorted(partials.items()))
    return (T, total, count)
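
_sum tracks the element type through groupby(data, type). A small sketch of just that grouping step may help; seen_types is a hypothetical helper, not part of the statistics module. A list of Python ints yields int, whereas iterating a NumPy integer array would yield the NumPy scalar type instead.

```python
from itertools import groupby

def seen_types(data):
    """Return the element types in the order groupby(data, type) meets them."""
    return [typ for typ, _values in groupby(data, type)]

print(seen_types([1, 4, 9]))      # a single group of int
print(seen_types([1, 2.5, 3.0]))  # an int group, then a float group
```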

Here is the code for _convert:


def _convert(value, T):
    """Convert value to given numeric type T."""
    if type(value) is T:
        # This covers the cases where T is Fraction, or where value is
        # a NAN or INF (Decimal or float).
        return value
    if issubclass(T, int) and value.denominator != 1:
        T = float
    try:
        # FIXME: what do we do if this overflows?
        return T(value)
    except TypeError:
        if issubclass(T, Decimal):
            return T(value.numerator)/T(value.denominator)
        else:
            raise

No, it is not the median. statistics.mean() expects a list; because you passed a numpy array of integers, you get the truncated value. Use np.mean(np.array([1,4,9])) to compute the mean of a numpy array.
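
If statistics.mean has to be used on array-like input, one possible workaround is to convert to built-in numbers first. This is a minimal sketch: plain_mean is a hypothetical helper, and it assumes the input either is a plain sequence or exposes tolist() the way numpy.ndarray does.

```python
import statistics

def plain_mean(data):
    """Mean computed over built-in Python numbers.

    Hypothetical helper: objects exposing tolist() (as numpy.ndarray does)
    are converted to a list of built-in numbers first.
    """
    if hasattr(data, "tolist"):
        data = data.tolist()
    return statistics.mean(data)

print(plain_mean([1, 4, 9]))
```

With NumPy installed, plain_mean(np.array([1, 4, 9])) would take the tolist() path; np.mean remains the more direct choice for arrays.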

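This comes down to how statistics.mean is defined. The function uses the helper _convert.

  • In the list case, it is called as _convert(Fraction(14, 3), int)

    Since int is a subclass of int, the code executed will be

    if issubclass(T, int) and value.denominator != 1:
        T = float
    try:
        return T(value)

  • In the numpy array case, it is called as _convert(Fraction(14, 3), np.int64), and the code executed will just be

    try:
        return T(value)

    because np.int64 is not a subclass of int.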
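
The two paths can be sketched without NumPy. Here convert is a simplified copy of the branch discussed above, and TruncatingInt is a hypothetical stand-in for an integer type that does not subclass int; neither is the library's actual code.

```python
from fractions import Fraction

class TruncatingInt:
    """Hypothetical stand-in for an integer type that does not subclass int."""
    def __init__(self, value):
        self.value = int(value)  # truncates toward zero, like an integer cast

def convert(value, T):
    """Simplified version of the issubclass branch in _convert."""
    if issubclass(T, int) and value.denominator != 1:
        T = float  # only taken for int and its subclasses
    return T(value)

as_builtin = convert(Fraction(14, 3), int)
as_stand_in = convert(Fraction(14, 3), TruncatingInt)
print(as_builtin, as_stand_in.value)
```

Only the built-in int case is promoted to float; any other integer-like type receives the Fraction directly and truncates it.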
