R: add 0.00001 to two columns when they have the same values as the row above



I have a data frame containing a latitude column and a longitude column, as shown below:

test <- data.frame("Latitude" = c(45.14565, 45.14565, 45.14565, 45.14565, 33.2222, 
31.22122, 31.22122), "Longitude" = c(-105.6666, -105.6666, -105.6666, -104.3333, 
-104.3333, -105.77777, -105.77777))

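I want every value expressed to 5 decimal places. Then, whenever a latitude/longitude pair is identical to the pair in the row above, I want to add 0.00001 to both the latitude and the longitude. So my data would become: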

test_updated <- data.frame("Latitude" = c(45.14565, 45.14566, 45.14567, 45.14565, 
33.22220, 31.22122, 31.22123), "Longitude" = c(-105.66660, -105.66661, -105.66662, 
-104.33330, -104.33330, -105.77777, -105.77778))

Here is an approach that updates the Latitude and Longitude columns of test in place, following the OP's rule as stated:

options(digits = 8) # required to print all significant digits of Longitude
library(data.table)
setDT(test)[, `:=`(Latitude  = Latitude  + (seq(.N) - 1) * 0.00001,
Longitude = Longitude + (seq(.N) - 1) * 0.00001), 
by = .(Latitude, Longitude)]
test
   Latitude  Longitude
1: 45.14565 -105.66660
2: 45.14566 -105.66659
3: 45.14567 -105.66658
4: 45.14565 -104.33330
5: 33.22220 -104.33330
6: 31.22122 -105.77777
7: 31.22123 -105.77776

Compare this with the OP's expected result:

test_updated
  Latitude  Longitude
1 45.14565 -105.66660
2 45.14566 -105.66661
3 45.14567 -105.66662
4 45.14565 -104.33330
5 33.22220 -104.33330
6 31.22122 -105.77777
7 31.22123 -105.77778

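The difference arises because the OP asks for 0.00001 to be added to both the latitude and the longitude values, whereas in the OP's expected result 0.00001 has been subtracted from the longitude values.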
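The mismatch can be seen on a single value. The following is a minimal base R sketch using the first longitude from the question; the variable names are illustrative only. Adding 0.00001 to a negative number moves it toward zero, while the expected output moves it away from zero.

```r
lon <- -105.6666               # first Longitude value from the question

added      <- lon + 0.00001    # what "add 0.00001" literally does
subtracted <- lon - 0.00001    # what the OP's expected output shows

# Print both with five decimal places
sprintf("%.5f", added)         # "-105.66659"
sprintf("%.5f", subtracted)    # "-105.66661"
```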

Edit

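To reproduce the expected result, the sign of the values has to be taken into account. Unfortunately, base R's sign() function returns 0 for sign(0), which would suppress the offset for a value of exactly zero. So we use fifelse(x < 0, -1, 1) instead.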
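A small sketch of the difference between the two sign helpers. It uses base R's ifelse() so that it runs without any packages; data.table's fifelse() is used in the answer for the same purpose. The vector x is made up for illustration.

```r
x <- c(-105.6666, 0, 45.14565)   # illustrative values, including an exact zero

s <- sign(x)                     # -1  0  1 : a zero would receive no offset
m <- ifelse(x < 0, -1, 1)        # -1  1  1 : zero is treated as positive

s
m
```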

In addition, we can borrow Henrik's brilliant idea of using the rowid() function to avoid grouping.

options(digits = 8) # required to print all significant digits of Longitude
library(data.table)
cols <- c("Latitude", "Longitude")
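# Run on the original test data. \(x) is the lambda shorthand available from R 4.1;
# rowidv() numbers repeated (Latitude, Longitude) pairs 1, 2, 3, ...
setDT(test)[, (cols) := lapply(.SD, \(x) x + fifelse(x < 0, -1, 1) * 
(rowidv(.SD, cols) - 1) * 0.00001), .SDcols = cols]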
test
   Latitude  Longitude
1: 45.14565 -105.66660
2: 45.14566 -105.66661
3: 45.14567 -105.66662
4: 45.14565 -104.33330
5: 33.22220 -104.33330
6: 31.22122 -105.77777
7: 31.22123 -105.77778
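To see what rowidv() contributes in the code above, here is a short sketch on the question's data, assuming the data.table package is installed. Note that rowidv() counts every occurrence of a pair, not only consecutive ones; in the question's data the duplicates happen to be adjacent, so the two readings coincide.

```r
library(data.table)

# Same data as in the question
test <- data.frame(
  Latitude  = c(45.14565, 45.14565, 45.14565, 45.14565, 33.2222, 31.22122, 31.22122),
  Longitude = c(-105.6666, -105.6666, -105.6666, -104.3333, -104.3333, -105.77777, -105.77777)
)

# Running count of each (Latitude, Longitude) pair:
# 1 for the first occurrence, 2 for the second, and so on
id <- rowidv(test, cols = c("Latitude", "Longitude"))
id
# 1 2 3 1 1 1 2

# Multiplier for the 0.00001 step: zero for the first occurrence of a pair
id - 1
```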

As usual, there is no need for a loop:

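library(dplyr)
test_updated <- test %>% 
  mutate(
    # TRUE where the whole (Latitude, Longitude) pair equals the pair in the row above;
    # lag() yields NA for the first row, which coalesce() turns into FALSE
    same_as_above = coalesce(Latitude == lag(Latitude) & Longitude == lag(Longitude), FALSE),
    Latitude  = if_else(same_as_above, Latitude + 0.00001, Latitude),
    Longitude = if_else(same_as_above, Longitude + 0.00001, Longitude)
  ) %>% 
  select(-same_as_above)
format(round(test_updated, 5), nsmall = 5)
  Latitude  Longitude
1 45.14565 -105.66660
2 45.14566 -105.66659
3 45.14566 -105.66659
4 45.14565 -104.33330
5 33.22220 -104.33330
6 31.22122 -105.77777
7 31.22123 -105.77776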

Not sure I understood correctly, but maybe something like this?

n <- nrow(test)
# Pre-allocate the result with the same shape as test
test_updated <- data.frame(Latitude = double(n),
                           Longitude = double(n))
add <- 0.00001
test_updated[1, ] <- test[1, ]
for (i in 2:n) {
  # Compare the current pair with the pair in the row above
  if (test$Latitude[i - 1] == test$Latitude[i] && test$Longitude[i - 1] == test$Longitude[i]) {
    test_updated$Latitude[i]  <- test$Latitude[i] + add
    test_updated$Longitude[i] <- test$Longitude[i] + add
  } else {
    test_updated[i, ] <- test[i, ]
  }
}