Mandar kindly wrote this code for me in this R Q&A, "How to evaluate two vectors based on other vectors":
names(df) <- c("a","b","c","d")
df_backup <- df
df$newcol <- NA
used <- c()
for (i in seq(1,length(df$a),1)){
  print("######## Separator ########")
  print(paste("searching right match that fits criteria for ",df$a[i],"in column 'a'",sep=""))
  valuea <- df[i,1]
  orderx <- order(abs(df$b-valuea))
  index=1
  while (is.na(df$newcol[i])) {
    j=orderx[index]
    if (df$b[j] %in% used){
      print(paste("passing ",df$b[j], "as it has already been used",sep=""))
      index=index+1
      next
    } else {
      indexb <- j
      valueb <- df$b[indexb]
      print(paste("trying ",valueb,sep=""))
      if (df$c[i] != df$d[indexb]) {
        df$newcol[i] <- df$b[indexb]
        print(paste("using ",valueb,sep=""))
        used <- c(used,df$b[indexb])
      } else {
        df$newcol[i] <- NA
        print(paste("cant use ",valueb,"as the column c (related to index in a) and d (related to index in b) values are matching",sep=""))
      }
      index=index+1
    }
  }
}
This is my data:
a b c d
12.9722051 297.9117268 1 1
69.64816997 298.1908749 2 2
318.8794557 169.0386352 3 3
326.1762208 169.3201391 4 4
137.5400592 336.6595313 5 5
358.0600171 94.70890334 6 6
258.9282428 94.77530919 7 7
98.57513917 290.1983195 8 8
98.46303072 290.4078981 9 9
17.2276417 344.383796 10 10
316.6442074 148.786547 11 11
310.7370168 153.3287735 12 12
237.3270752 107.8397117 13 13
250.6538555 108.0570571 14 14
337.0954288 180.6311769 15 15
137.0336521 1.0294907 16 16
301.2277242 185.2062845 17 17
332.935301 185.9792236 18 18
340.841266 220.4043846 19 19
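The values in columns a and b are compass bearings. Currently, the code takes a value in column a, compares it with every value in column b, and picks the closest one. What I now realize I need is for it to pick the closest value in column b not just by absolute difference, but taking into account that these are compass bearings. For example, for the value 358.0600171 in column a, the current code returns 344.383796 from column b, which is about 14 degrees away from 358.0600171; however, the closest bearing in column b is actually 1.0294907, which is only about 3 degrees away from 358.0600171. I would like to incorporate a function into the current code that handles this compass-bearing issue, while still doing all the other evaluation, filtering, and column creation I need.
There is a similar question here (Finding the closest difference between 2 degrees of a compass - Javascript), but I need help understanding whether that function works in R and how to incorporate it into the existing code.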
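We can find the nearest compass bearing as follows:
nearest = function(i,df){
  # absolute difference between row i of the first column and every value in the second
  diff = abs(df[i, 1] - df[, 2])
  # wrap around the compass: the true separation is never more than 180 degrees
  diff = pmin(diff, 360-diff)
  # row index of the closest bearing
  which.min(diff)
}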
df$nearest_b = sapply(1:NROW(df), nearest, df[1:2])
df$nearest_a = sapply(1:NROW(df), nearest, df[2:1])
# a b nearest_b nearest_a
# 1 12.97221 297.911727 16 17
# 2 69.64817 298.190875 6 17
# 3 318.87946 169.038635 5 5
# 4 326.17622 169.320139 5 5
# 5 137.54006 336.659531 11 15
# 6 358.06002 94.708903 16 9
# 7 258.92824 94.775309 8 9
# 8 98.57514 290.198320 7 17
# 9 98.46303 290.407898 7 17
# 10 17.22764 344.383796 16 19
# 11 316.64421 148.786547 2 5
# 12 310.73702 153.328774 2 5
# 13 237.32708 107.839712 19 8
# 14 250.65386 108.057057 19 8
# 15 337.09543 180.631177 5 5
# 16 137.03365 1.029491 11 6
# 17 301.22772 185.206285 2 5
# 18 332.93530 185.979224 5 5
# 19 340.84127 220.404385 10 13
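As a possible alternative to the row-by-row `sapply` call above, the same wrap-around difference can be computed for every pair at once with `outer`. This is only a sketch: `circ_diff` is a hypothetical helper name, and the three-value vectors are a small subset picked from the question's data for illustration.

```r
# Hypothetical helper (not from the original answer): circular difference
# between bearings in degrees, always in the range 0..180.
circ_diff <- function(x, y) {
  d <- abs(x - y) %% 360
  pmin(d, 360 - d)
}

# Small illustrative subset of the question's columns a and b.
a <- c(358.0600171, 12.9722051, 137.5400592)
b <- c(344.383796, 1.0294907, 169.0386352)

# Matrix of circular differences: rows index a, columns index b.
dmat <- outer(a, b, circ_diff)

# For each value of a, the position in b of the nearest bearing.
nearest_b <- apply(dmat, 1, which.min)
nearest_b
```

For 358.0600171 this picks 1.0294907 (position 2) rather than 344.383796, which is the behaviour the question asks for. The full matrix costs memory proportional to `length(a) * length(b)`, so it suits small tables like this one.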
Data
df = read.table(text =
"a b c d
12.9722051 297.9117268 1 1
69.64816997 298.1908749 2 2
318.8794557 169.0386352 3 3
326.1762208 169.3201391 4 4
137.5400592 336.6595313 5 5
358.0600171 94.70890334 6 6
258.9282428 94.77530919 7 7
98.57513917 290.1983195 8 8
98.46303072 290.4078981 9 9
17.2276417 344.383796 10 10
316.6442074 148.786547 11 11
310.7370168 153.3287735 12 12
237.3270752 107.8397117 13 13
250.6538555 108.0570571 14 14
337.0954288 180.6311769 15 15
137.0336521 1.0294907 16 16
301.2277242 185.2062845 17 17
332.935301 185.9792236 18 18
340.841266 220.4043846 19 19",
header = T)[,1:2]
The geosphere package has some functions that deal with distances between points on a sphere.
ftp://cran.r-project.org/pub/R/web/packages/geosphere/geosphere.pdf
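I'm not sure this will be what you want; you may need to work out how the compass bearings in your data translate into the inputs those functions expect.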
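I notice the problem you're running into is that the absolute distance is treated as a plain numeric difference, without accounting for the fact that bearings reset to 0 at 360. You can account for this by writing a function that, when the difference exceeds 180, sums (the difference between one coordinate and 360) and (the other coordinate).
For example, where c1 is the input coordinate and c2 is the coordinate you are comparing it with:
if (c1 - c2 > 180) { (360 - c1) + c2 }
I don't fully understand what you're trying to do, so I'm not sure this is right, but hopefully it helps.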
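The idea above only covers the case where c1 is the larger bearing. A minimal sketch that handles both directions might look like the following; `bearing_diff` is a hypothetical name, and it assumes both inputs are single bearings in the range 0 to 360.

```r
# Hypothetical scalar helper: c1 is the input bearing, c2 the bearing it is
# compared with. Both are assumed to be single values in [0, 360).
bearing_diff <- function(c1, c2) {
  d <- abs(c1 - c2)
  # If the direct difference exceeds 180, going the other way round is shorter.
  if (d > 180) 360 - d else d
}

bearing_diff(358.0600171, 344.383796)  # roughly 13.68 degrees
bearing_diff(358.0600171, 1.0294907)   # roughly 2.97 degrees
```

Because `if` only inspects a single value, this version is not vectorised; for whole columns, `pmin(d, 360 - d)` does the same thing element-wise.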
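Check this code: I only changed how orderx is defined, using ifelse. Now, if the absolute difference between the angles is > 180, it uses 360 minus that difference (the lower value); otherwise, if the difference is already < 180, it uses that smaller difference itself.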
setwd("~/Desktop/")
# tab-separated input file
df <- read.table("trial.txt",header=T,sep="\t")
names(df) <- c("a","B","C","D")
df_backup <- df
df$newcol <- NA
used <- c()
for (i in seq(1,length(df$a),1)){
  print("######## Separator ########")
  print(paste("searching right match that fits criteria for ",df$a[i],"in column 'a'",sep=""))
  valueA <- df[i,1]
  # orderx <- order(abs(df$B-valueA))
  # rank candidates by circular difference: use 360 - |diff| when |diff| > 180
  orderx <- order(ifelse(360-abs(df$B-valueA) > 180, abs(df$B-valueA) ,360-abs(df$B-valueA)))
  index=1
  while (is.na(df$newcol[i])) {
    j=orderx[index]
    if (df$B[j] %in% used){
      print(paste("passing ",df$B[j], "as it has already been used",sep=""))
      index=index+1
      next
    } else {
      indexB <- j
      valueB <- df$B[indexB]
      print(paste("trying ",valueB,sep=""))
      if (df$C[i] != df$D[indexB]) {
        df$newcol[i] <- df$B[indexB]
        print(paste("using ",valueB,sep=""))
        used <- c(used,df$B[indexB])
      } else {
        df$newcol[i] <- NA
        print(paste("cant use ",valueB,"as the column C (related to index in A) and D (related to index in B) values are matching",sep=""))
      }
      index=index+1
    }
  }
}
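If the progress messages are not needed, the loop above could be folded into a function along these lines. This is only a sketch under a few assumptions: `match_bearings` and its argument names are hypothetical, used values of b are tracked by row position rather than by value (so duplicate bearings are treated as separate candidates, unlike the `%in% used` check above), and a row of a that runs out of candidates is left as NA instead of looping further.

```r
# Hypothetical wrapper around the greedy matching loop (names are illustrative).
# a, b: bearings in degrees; c_col, d_col: the ids compared for the exclusion rule.
match_bearings <- function(a, b, c_col, d_col) {
  circ_diff <- function(x, y) {
    d <- abs(x - y) %% 360
    pmin(d, 360 - d)
  }
  newcol <- rep(NA_real_, length(a))
  used <- rep(FALSE, length(b))  # assumption: track used rows of b by position
  for (i in seq_along(a)) {
    # Try candidates in b from nearest to farthest on the compass.
    for (j in order(circ_diff(b, a[i]))) {
      if (!used[j] && c_col[i] != d_col[j]) {
        newcol[i] <- b[j]
        used[j] <- TRUE
        break
      }
    }
    # If no candidate qualified, newcol[i] stays NA.
  }
  newcol
}

# Tiny made-up example, not taken from the question's data.
res <- match_bearings(a = c(358, 10, 200), b = c(344, 1, 190),
                      c_col = 1:3, d_col = 1:3)
res
```

In the made-up example the third value ends up NA: its nearest candidate, 190, has a matching id, and the other two values of b have already been taken.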