PostGIS nearest-neighbor search results out of order



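I have a Django/PostgreSQL application that shows which users are closest to a particular user. It uses the PostGIS 2.0 KNN (K Nearest Neighbors) <-> operator in the ORDER BY clause to list users, nearest first. In my initial dataset, I found two search results that are out of order (all distances are measured from Los Angeles, CA):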

Member, City, State, Distance (miles)
user1, North Las Vegas, NV, 239
user2, Phoenix, AZ, 365
user3, Provo, UT, 568
user4, Twin Falls, ID, 630
user5, Albuquerque, NM, 673
user6, Portland, OR, 828
user7, Bozeman, MT, 896
user8, Seattle, WA, 962
user9, Boulder, CO, 834       <- Out of order!
user10, Laramie, WY, 862      <- Out of order!
user11, Naperville, IL, 1756

Member is the username column from the User class in Django's contrib.auth.models. The UserAccount class, which holds the geometry information, is defined as follows:

from django.contrib.auth.models import User
from django.contrib.gis.db import models

class UserAccount(models.Model):
    user = models.OneToOneField(User, primary_key=True, unique=True)
    address_line_1 = models.CharField(max_length=30)
    address_line_2 = models.CharField(max_length=30, blank=True)
    city = models.CharField(max_length=30)
    region = models.CharField(max_length=30, blank=True)
    postal_code = models.CharField(max_length=10, blank=True)
    country = models.ForeignKey('Country')
    measurement_sys = models.CharField(max_length=5)  # US or Metric
    # User's home (default) and current longitude and latitude
    home_lon = models.FloatField(default=0.0)
    home_lat = models.FloatField(default=0.0)
    current_lon = models.FloatField(default=0.0)
    current_lat = models.FloatField(default=0.0)
    # GeoDjango-specific fields 
    home_point = models.PointField(srid=4326)
    current_point = models.PointField(srid=4326)
    objects = models.GeoManager()

Here is the query in the Django view:

def members(request, template):
    """View all members of the website."""
    uid = request.session['uid']   # PK from User table
    # Get the current user's lon/lat and measurement system
    try:
        ua = UserAccount.objects.get(user_id=uid)
        lon = ua.current_lon
        lat = ua.current_lat
        measurement_sys = ua.measurement_sys
    except UserAccount.DoesNotExist as e:
        return HttpResponseRedirect(reverse('unable-to-display-members'))
    # Define the proximity query.
    if measurement_sys == 'US':
        multiplier = 0.000621371  # Convert to miles
    else:
        multiplier = 0.001  # Convert to kilometers
    query = "SELECT \
                ua.user_id, \
                au.username, \
                ua.city, \
                ua.region, \
                ST_Distance( \
                    ua.current_point::geography, \
                    ST_GeographyFromText( \
                        'SRID=4326;POINT(" \
                            + str(lon) \
                            + " " \
                            + str(lat) + \
                        ")' \
                    ) \
                )*" + str(multiplier) + " AS distance \
            FROM \
                user_account ua \
                INNER JOIN \
                auth_user au \
                ON (ua.user_id = au.id) \
            WHERE ua.user_id != %s \
            ORDER BY \
                ua.current_point::geometry \
                <-> \
                'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry \
            LIMIT 250;"
    # Run the proximity query
    raw_queryset = UserAccount.objects.raw(query, [uid])
    # Paginate results
    user_list = [user for user in raw_queryset]
    list_size = len(list(user_list))
    paginator = Paginator(user_list, 10, 4)
    paginator._count = list_size
    page = request.GET.get('page')
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)
    except EmptyPage:
        users = paginator.page(paginator.num_pages)
    return render(request, template, {'users': users})

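Am I doing something wrong in the query? Does the KNN operator sometimes "hiccup" and return a few results out of order? I ask because when I removed the two out-of-order records from the table and then added extra records for users with addresses farther away (i.e., in IL, LA, MI, NC, PA, NY and ME), all the results came back in the correct order.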

By the way, my input is located here.

Thanks!

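Updated answer:

Since September 2011, PostGIS has provided two approximate solutions for kNN (nearest-neighbor) searches:
  • With the <-> operator, you get the nearest neighbors using the centers of the bounding boxes to compute the distance between objects.
  • With the <#> operator, you get the nearest neighbors using the bounding boxes themselves to compute the distance between objects.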
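To make the difference between the two operators concrete, here is a minimal sketch in plain Python. It is not PostGIS code: the two helper functions and the sample boxes are hypothetical and only mirror the description above (center-to-center distance versus box-to-box distance).

```python
import math

# A bounding box is (xmin, ymin, xmax, ymax). All boxes below are made-up examples.

def center_distance(a, b):
    """Distance between the centers of two boxes (the idea described for <->)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)

def box_distance(a, b):
    """Shortest gap between two boxes (the idea described for <#>); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)

query = (0, 0, 0, 0)        # a point has a degenerate (zero-size) box
long_line = (1, 0, 11, 0)   # box of a long object that starts close to the query
small_poly = (3, 0, 4, 0)   # box of a small object a bit farther away

for name, box in (("long_line", long_line), ("small_poly", small_poly)):
    print(name, center_distance(query, box), box_distance(query, box))
```

In this sketch the center-based measure ranks small_poly first, while the box-based measure ranks long_line first, so the two approaches can disagree with each other as well as with the true distance. For plain points the boxes are degenerate and both measures reduce to the planar distance between the coordinates.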

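Your problem is that both are approximations, so neither is perfect. If you want the best 250 results, use either operator to retrieve, say, the best 1000 candidates, then sort those candidates by ST_Distance and apply LIMIT 250 to get the best 250 out of those roughly 1000.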
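The over-fetch-then-re-sort idea can be sketched in plain Python. Everything here is an assumption made for illustration: the city coordinates are approximate public values rather than the asker's data, the planar distance in degrees stands in for the cheap index ordering, and a haversine great-circle distance stands in for ST_Distance on geography.

```python
import math

# Approximate (lat, lon) pairs; illustrative assumptions, not the asker's data.
ORIGIN = (34.05, -118.24)  # Los Angeles
CITIES = {
    "Bozeman": (45.68, -111.04),
    "Seattle": (47.61, -122.33),
    "Boulder": (40.01, -105.27),
    "Laramie": (41.31, -105.59),
}

def planar_degrees(a, b):
    """Cheap stand-in for the index ordering: straight-line distance in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def haversine_miles(a, b):
    """Stand-in for the accurate distance: great-circle distance on a sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))

def nearest(origin, places, k, overfetch):
    """Take `overfetch` candidates by the cheap metric, re-sort them accurately, keep k."""
    candidates = sorted(places, key=lambda n: planar_degrees(origin, places[n]))[:overfetch]
    return sorted(candidates, key=lambda n: haversine_miles(origin, places[n]))[:k]

print(sorted(CITIES, key=lambda n: planar_degrees(ORIGIN, CITIES[n])))
print(nearest(ORIGIN, CITIES, k=2, overfetch=2))  # too few candidates
print(nearest(ORIGIN, CITIES, k=2, overfetch=4))  # enough candidates
```

With these sample coordinates the cheap ordering puts Bozeman and Seattle ahead of Boulder and Laramie, although the latter two are closer by great-circle distance. Re-sorting only helps if the candidate pool is large enough to contain the true nearest rows, which is why the candidate limit should be comfortably larger than the final limit.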

An example query:

SELECT * FROM
    (SELECT *, ST_Distance(current_point::geography, 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geography) AS st_dist
    FROM user_account ua
    ORDER BY current_point::geometry <-> 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry
    LIMIT 1000) AS s
ORDER BY st_dist LIMIT 250;
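As a possible way to wire this into the Django view without concatenating lon and lat into the SQL string, the sketch below builds the query with placeholders. The function name build_nearest_query and its arguments are hypothetical. The table and column names come from the question, and ST_SetSRID(ST_MakePoint(...), 4326) is used here in place of the 'SRID=4326;POINT(...)' text literal so the coordinates can be passed as ordinary parameters.

```python
def build_nearest_query(lon, lat, exclude_user_id, candidates=1000, limit=250):
    """Return (sql, params) for an over-fetch-then-re-sort proximity query.

    The inner query takes `candidates` rows in <-> order; the outer query
    re-sorts them by ST_Distance on geography and keeps `limit` rows.
    """
    sql = """
        SELECT * FROM (
            SELECT ua.*,
                   ST_Distance(
                       ua.current_point::geography,
                       ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography
                   ) AS st_dist
            FROM user_account ua
            WHERE ua.user_id != %s
            ORDER BY ua.current_point::geometry
                     <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326)
            LIMIT %s
        ) AS s
        ORDER BY st_dist
        LIMIT %s;
    """
    # One parameter per %s placeholder, in order of appearance.
    params = [float(lon), float(lat), exclude_user_id,
              float(lon), float(lat), int(candidates), int(limit)]
    return sql, params


sql, params = build_nearest_query(-118.24, 34.05, exclude_user_id=1)
# In the view this would be used roughly as:
#     raw_queryset = UserAccount.objects.raw(sql, params)
```

st_dist is returned in meters, so the miles or kilometers multiplier from the question can still be applied in Python or added to the outer SELECT.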
