R - Count the number of objects within a region



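For many images, I have a table with the coordinates of the objects on each image. I want to count the number of objects that lie inside a box of a specified size around each object (similar to a number of neighbors). So far I have come up with a for loop that subsets the table for each index and counts the rows.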

raw.data <- structure(list(ImageNumber = c(67, 67, 67, 67, 67, 67, 67, 67, 
67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 
67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 
67), ObjectNumber = c(1, 2, 5, 6, 7, 10, 11, 13, 16, 34, 35, 
42, 44, 46, 54, 58, 67, 77, 82, 90, 94, 107, 153, 158, 169, 201, 
223, 254, 294, 315, 386, 493, 508, 553, 599, 606, 612, 625, 676, 
678, 697), Location_Center_X.nuc = c(46.3557910673732, 189.630407911001, 
238.322766570605, 253.236234458259, 134.482566248257, 45.7193336698637, 
136.949320148331, 292.452631578947, 238.591869918699, 147.364275668073, 
93.859943977591, 169.394435351882, 253.794247787611, 97.1797752808989, 
258.430194805195, 233.346428571429, 202.378378378378, 297.966403162055, 
229.343333333333, 298.730679156909, 243.604806408545, 256.607266435986, 
279.823886639676, 288.966666666667, 278.035714285714, 264.86592178771, 
161.519230769231, 280.364672364672, 299.832929782082, 271.572481572482, 
7.72075471698113, 5.81395348837209, 284.742857142857, 291.826747720365, 
5.4331983805668, 295.924778761062, 198.463709677419, 282.083094555874, 
248.316239316239, 281.019867549669, 19.6458333333333), Location_Center_Y.nuc = c(237.48145344436, 
56.1885043263288, 175.412103746398, 144.548845470693, 199.902370990237, 
122.95406360424, 23.9406674907293, 266.46015037594, 116.671544715447, 
122.617440225035, 20.5756302521008, 152.31914893617, 93.3495575221239, 
167.223314606742, 195.261363636364, 26.0714285714286, 123.351351351351, 
227.009881422925, 85.19, 41.9789227166276, 290.567423230975, 
34.9671280276817, 164.975708502024, 91.5090909090909, 39.7205882352941, 
222.66852886406, 238.157692307692, 73.1880341880342, 191.019370460048, 
128.415233415233, 107.4, 37.5488372093023, 210.244155844156, 
131.577507598784, 150.072874493927, 152.650442477876, 3.77016129032258, 
110.702005730659, 2.28205128205128, 3.02649006622517, 2.59027777777778
)), row.names = c(NA, -41L), class = c("tbl_df", "tbl", "data.frame"
))
radius = 80
raw.data$Density.80 = NA
for (i in 1:nrow(raw.data)) {
  # Box centre and image of the current object
  x = raw.data$Location_Center_X.nuc[i]
  y = raw.data$Location_Center_Y.nuc[i]
  imN = raw.data$ImageNumber[i]
  # All objects of the same image that fall inside the box
  sub_samp = raw.data[which(raw.data$Location_Center_X.nuc >= x - radius &
                            raw.data$Location_Center_X.nuc <= x + radius &
                            raw.data$Location_Center_Y.nuc >= y - radius &
                            raw.data$Location_Center_Y.nuc <= y + radius &
                            raw.data$ImageNumber == imN), ]
  # Subtract 1 so the object does not count itself
  raw.data$Density.80[i] = nrow(sub_samp) - 1
}

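The problem is that for large datasets (tens to hundreds of thousands of objects across hundreds to thousands of images) this takes several hours, so optimizing the box size would take forever.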

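I would like to write a function to speed this up. Here is my attempt, which returns a single number per image rather than one number per object. I am also struggling with how to apply such a function with purrr::map_*.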

count_neighbors <- function(.data, radius, ...){
  .data %>%
    group_by(ImageNumber) %>%
    filter(between(Location_Center_X.nuc, Location_Center_X.nuc - radius, Location_Center_X.nuc + radius) &
           between(Location_Center_Y.nuc, Location_Center_Y.nuc - radius, Location_Center_Y.nuc + radius)) %>%
    tally()
}
count_neighbors(raw.data, radius = 80)
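A likely reason the attempt above yields one number per image: the filter compares each coordinate with a window centred on that same coordinate, so (in dplyr versions where between() accepts vector bounds) every row passes, and tally() then counts all rows of each group. The base R sketch below reproduces that comparison on made-up toy coordinates; the names `x` and `keeps_row` are illustrative only.

```r
# Toy coordinates, made up for illustration (not from the question's data).
x <- c(10, 50, 200)
radius <- 80

# Each value is tested against a window centred on itself,
# so the condition is TRUE for every row and nothing is filtered out.
keeps_row <- x >= x - radius & x <= x + radius
keeps_row

# Counting the surviving rows therefore just returns the group size.
sum(keeps_row) == length(x)
```

What is needed instead is a comparison of each object against all other objects of the same image.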

You can write a function that counts the number of objects inside the box around a single object.

count_values <- function(x, y, xVal, yVal, radius) {
sum(xVal >= x-radius & xVal <= x+radius &
yVal >= y-radius & yVal <= y+radius) - 1
}
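As a quick check of what this function returns, here is a small sketch on made-up coordinates. The vectors `xs` and `ys` and the radius of 20 are assumptions for illustration; the function body is repeated so the snippet runs on its own.

```r
count_values <- function(x, y, xVal, yVal, radius) {
  sum(xVal >= x - radius & xVal <= x + radius &
      yVal >= y - radius & yVal <= y + radius) - 1
}

# Three toy objects: the first two are close together, the third is far away.
xs <- c(0, 10, 100)
ys <- c(0, 10, 100)

# Neighbours of the first object only.
first_count <- count_values(xs[1], ys[1], xs, ys, radius = 20)

# Neighbours of every object, one call per (x, y) pair.
all_counts <- mapply(count_values, xs, ys,
                     MoreArgs = list(xVal = xs, yVal = ys, radius = 20))
all_counts
```

The `- 1` removes the object itself, which always falls inside its own box.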

You can then apply this function to every object within each image.

library(dplyr)
library(purrr)
raw.data %>%
group_by(ImageNumber) %>%
mutate(result = map2_dbl(Location_Center_X.nuc, Location_Center_Y.nuc, 
~count_values(.x, .y, Location_Center_X.nuc, 
Location_Center_Y.nuc, 80))) -> raw.data
raw.data
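If the per-object calls are still too slow, one alternative (not part of the answer above) is to compare all pairs of objects within an image at once using base R's outer(). The sketch below assumes each image holds few enough objects that an n-by-n matrix fits in memory; the helper `box_neighbours` and the toy columns `X` and `Y` are hypothetical names.

```r
# Hypothetical helper: neighbour counts for one image via an all-pairs comparison.
box_neighbours <- function(x, y, radius) {
  in_x <- abs(outer(x, x, "-")) <= radius  # |x_i - x_j| <= radius
  in_y <- abs(outer(y, y, "-")) <= radius  # |y_i - y_j| <= radius
  rowSums(in_x & in_y) - 1                 # drop the object itself
}

# Toy data, made up for illustration: two images with three objects each.
toy <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 2),
  X = c(0, 10, 100, 0, 50, 60),
  Y = c(0, 10, 100, 0, 50, 60)
)

# Fill the result image by image so objects are only compared within an image.
toy$Density <- NA_real_
for (img in unique(toy$ImageNumber)) {
  idx <- which(toy$ImageNumber == img)
  toy$Density[idx] <- box_neighbours(toy$X[idx], toy$Y[idx], radius = 20)
}
toy
```

The loop now runs once per image rather than once per object, which is where most of the saving would come from.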

One solution is to use 1:nrow(df) as the main argument to purrr's map.

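library(dplyr)

# Note: this count includes the object itself and uses strict inequalities,
# unlike the loop in the question; subtract 1 to exclude the object.
get_image_counts <- function(df, distance){
  purrr::map(1:nrow(df), function(idx){
    # Box centre for the current row
    x <- df[idx,] %>% pull(Location_Center_X.nuc)
    y <- df[idx,] %>% pull(Location_Center_Y.nuc)

    # Rows that fall inside the box around (x, y)
    df %>% filter(Location_Center_X.nuc > x - distance & Location_Center_X.nuc < x + distance &
                  Location_Center_Y.nuc > y - distance & Location_Center_Y.nuc < y + distance) %>%
      nrow
  }) %>% unlist
}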
raw.data %>% tibble::add_column(neighbs = get_image_counts(raw.data, radius))

One advantage of this solution is that it extends easily to multiple images.

raw.data %>% group_split(ImageNumber) %>% purrr::map(function(df){

df %>% tibble::add_column(neighbs = get_image_counts(df, radius))

})

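This gives you a list of tibbles, each with a new column, neighbs, which holds the count of neighbouring objects within the image, which I think is what you are looking for. Without seeing the full data I cannot say whether this solves your problem. If it is too slow, you may want to use the furrr package, which provides parallel mapping functions.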
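A minimal sketch of the parallel variant, assuming the furrr and future packages are installed; it falls back to lapply() otherwise. The helper `count_in_box` is a hypothetical base R stand-in for get_image_counts() (strict inequalities, object itself included), and the toy data and worker count are made up for illustration.

```r
# Hypothetical base R stand-in for get_image_counts().
count_in_box <- function(df, distance) {
  vapply(seq_len(nrow(df)), function(i) {
    sum(abs(df$X - df$X[i]) < distance & abs(df$Y - df$Y[i]) < distance)
  }, integer(1))
}

# Toy data, made up for illustration: two images with three objects each.
toy <- data.frame(
  ImageNumber = c(1, 1, 1, 2, 2, 2),
  X = c(0, 10, 100, 0, 50, 60),
  Y = c(0, 10, 100, 0, 50, 60)
)
images <- split(toy, toy$ImageNumber)  # one data frame per image

if (requireNamespace("furrr", quietly = TRUE)) {
  future::plan(future::multisession, workers = 2)  # start two background R sessions
  res <- furrr::future_map(images, count_in_box, distance = 20)
  future::plan(future::sequential)                 # shut the workers down again
} else {
  res <- lapply(images, count_in_box, distance = 20)  # sequential fallback
}
res
```

Parallelising over images rather than over objects keeps each task large enough to offset the cost of sending data to the workers.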
