How to get a quantile/median in pydruid



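My goal is to query the median of the height column in a Druid datasource. I can already use other aggregations, such as count and count distinct. Here is my query so far: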

group = query.groupby(
    datasource=datasource,
    granularity='all',
    intervals='2020-01-01T00:00:00+00:00/2101-01-01T00:00:00+00:00',
    dimensions=[
        "category_a"
    ],
    filter=(Dimension("country") == country_id),
    aggregations={
        'count': longsum('count'),
        'count_distinct_city': aggregators.thetasketch('city'),
    }
)

There is a class Quantile in postaggregator.py, so I tried to use it.

class Quantile(Postaggregator):
    def __init__(self, name, probability):
        Postaggregator.__init__(self, None, None, name)
        self.post_aggregator = {
            "type": "quantile",
            "fieldName": name,
            "probability": probability,
        }

This is my attempt at getting the median:

post_aggregations={
    'median_value': postaggregator.Quantile(
        'height', 50
    )
}

The error I get is "Could not resolve type id 'quantile' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]":

Druid Error: {'error': 'Unknown exception', 'errorMessage': 'Could not resolve type id 'quantile' as a subtype of [simple type, class io.druid.query.aggregation.PostAggregator]: known type ids = [arithmetic, constant, doubleGreatest, doubleLeast, expression, fieldAccess, finalizingFieldAccess, hyperUniqueCardinality, javascript, longGreatest, longLeast, quantilesDoublesSketchToHistogram, quantilesDoublesSketchToQuantile, quantilesDoublesSketchToQuantiles, quantilesDoublesSketchToString, sketchEstimate, sketchSetOper, thetaSketchEstimate, thetaSketchSetOp] (for POJO property 'postAggregations')\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 856] (through reference chain: io.druid.query.groupby.GroupByQuery["postAggregations"]->java.util.ArrayList[0])', 'errorClass': 'com.fasterxml.jackson.databind.exc.InvalidTypeIdException', 'host': None}
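The useful part of this error is the "known type ids" list: it shows which post-aggregator types this Druid server accepts. Here "quantile" is missing, while "quantilesDoublesSketchToQuantile" is present. As a minimal sketch (the helper name known_type_ids is made up for illustration and is not part of pydruid), the list can be pulled out of the message like this:

```python
import re
from typing import List


def known_type_ids(error_message: str) -> List[str]:
    """Extract the ids listed after 'known type ids = [...]' in the error text."""
    match = re.search(r"known type ids = \[([^\]]*)\]", error_message)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


# Shortened copy of the error message from the question.
message = (
    "Could not resolve type id 'quantile' as a subtype of [simple type, class "
    "io.druid.query.aggregation.PostAggregator]: known type ids = [arithmetic, "
    "constant, quantilesDoublesSketchToQuantile, thetaSketchEstimate]"
)

ids = known_type_ids(message)
print("quantile" in ids)                          # is the rejected type known?
print("quantilesDoublesSketchToQuantile" in ids)  # is the sketch-based type known?
```

Any post-aggregator type sent to this server has to be one of the listed ids, otherwise the same InvalidTypeIdException is raised.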

I modified the pydruid code to make it work on our end. I added a new aggregator and a new post-aggregator under /pydruid/utils.

aggregator.py

def quantilesDoublesSketch(raw_column, k=128):
    return {"type": "quantilesDoublesSketch", "fieldName": raw_column, "k": k}

postaggregator.py

class QuantilesDoublesSketchToQuantile(Postaggregator):
    def __init__(self, name: str, field_name: str, fraction: float):
        self.post_aggregator = {
            "type": "quantilesDoublesSketchToQuantile",
            "name": name,
            "fraction": fraction,
            "field": {
                "fieldName": field_name,
                "name": field_name,
                "type": "fieldAccess",
            },
        }
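To show how the two pieces fit together, here is a rough sketch that builds the JSON fragments directly, without pydruid. The function names and the aggregator name "height_sketch" are made up for illustration. The sketch assumes the post-aggregator's fieldName must match the name given to the sketch aggregator, and that the fraction is between 0 and 1 (0.5 for the median, not 50):

```python
import json


def quantiles_doubles_sketch(name, raw_column, k=128):
    """Aggregator: build a quantiles sketch over raw_column, output as `name`."""
    return {
        "type": "quantilesDoublesSketch",
        "name": name,
        "fieldName": raw_column,
        "k": k,
    }


def sketch_to_quantile(name, sketch_name, fraction):
    """Post-aggregator: read one quantile from the sketch called sketch_name."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1, e.g. 0.5 for the median")
    return {
        "type": "quantilesDoublesSketchToQuantile",
        "name": name,
        "fraction": fraction,
        "field": {
            "type": "fieldAccess",
            "name": sketch_name,
            "fieldName": sketch_name,
        },
    }


# "height_sketch" is a hypothetical aggregator name chosen for this example.
query_fragment = {
    "aggregations": [quantiles_doubles_sketch("height_sketch", "height")],
    "postAggregations": [sketch_to_quantile("median_value", "height_sketch", 0.5)],
}

print(json.dumps(query_fragment, indent=2))
```

With the patched pydruid, the equivalent would presumably be to put quantilesDoublesSketch('height') under some key in aggregations and pass that same key as field_name to QuantilesDoublesSketchToQuantile in post_aggregations.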

This is my first time creating a PR! Hopefully they accept it and release it officially.

https://github.com/druid-io/pydruid/pull/287
