R: analyzing the value after a 1 in a binary dataset, row by row



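I want to calculate whether an individual survived from one year to the next. 0 means it died, 1 means it survived. The dataset covers several years (2007 to 2020), and the calculation should start from 2008. I only want R to use part of the data I have.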

My dataset looks like this:

The first 17 rows of my dataset

ID  2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
1   0    1    0    0    0    0    0    0    0    0    0    0    0    0
3   0    1    1    1    0    0    0    0    0    0    0    0    0    0
4   0    1    1    1    0    0    0    0    0    0    0    0    0    0
9   0    1    0    0    0    0    0    0    0    0    0    0    0    0
24  0    0    1    1    1    1    1    1    1    1    1    1    1    0
...

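I have 1121 entries in total, with 16 columns.

I want R to start in the first row at 2008 and check whether there is a 1. If there is a 1, I want R to look at the next column (2009) and see whether it also holds a 1 (which should give me 1 as output) or a 0 (which should give me 0 as output). If there is no 1, I want R to check the next column until it finds a year with a 1, and then check the following column as described above. Once it has found a 1 and done the check, it should ignore the remaining columns, move on to the next row and repeat the process. The output should be saved in a new column.

I tried loops and if-else statements, as well as ifelse, if, ...

The closest I got to my goal was with the following code: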

for(x in foal_fates_2) {
if (foal_fates_2$`2008`=="1" && foal_fates_2$`2009` =="1") {
print("1")
} else if (foal_fates_2$`2008`== "1" && foal_fates_2$`2009` =="0") {
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010` == "1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010`== "0") {
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" && 
foal_fates_2$`2011`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" && 
foal_fates_2$`2011`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
foal_fates_2$`2014`== "1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
foal_fates_2$`2014`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
foal_fates_2$`2017`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
foal_fates_2$`2017`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
foal_fates_2$`2020`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
foal_fates_2$`2020`=="0"){
print("0")
} 

}

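With this code R at least did something, and the result had the correct number of entries, but the output is incorrect. R gives me 0s and 1s, but not in the right places. For example, for the first five rows R gave me the results "0" "0" "0" "0" "1" "0", but it should be "0" "1" "1" "1" "0". At least if I understand it correctly. I am new to R, so maybe loops and the like are not the right tools for what I want to do. So the question is how I can reach my goal. I would appreciate any help.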

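I would write a function and apply it to each row. Something like the following (it could certainly be more elaborate, but it should do the job):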

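numberAfterFirstOne <- function(myRow) {
  x <- which(myRow == 1)[1]              # 1: index of the first 1 (NA if the row has none)
  if (!is.na(x) && x < length(myRow)) {  # 2: there is a value after the first 1
    return(myRow[x + 1])
  } else {
    return(NA)                           # 3: no 1 found, or the 1 is in the last column
  }
}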

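Explanation:

  1. Find which indices are equal to one and pick only the first; if none is 1, x will be NA
  2. If there is a value after the first one, return it
  3. Otherwise return NA (this could also be 0 or any other "key value" you prefer)

For testing, here is an example dataset: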

n <- 5
m <- 16
set.seed(1562) # for reproducibility
dataset <- as.data.frame(matrix(ncol = m, nrow = n, data = round(runif(m * n, 0, 0.7))))
dataset <- rbind(dataset, rep(0, 16)) # add a row without any 1

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1  1  0  0  1  0  0  0  1  0   1   0   1   0   0   1   0
2  1  1  0  0  0  1  1  0  0   1   1   0   0   0   1   0
3  0  0  0  0  0  1  0  0  0   0   0   1   0   0   0   1
4  1  0  0  0  0  0  0  0  0   1   0   0   1   0   1   0
5  0  1  1  0  0  1  0  1  0   1   0   1   0   0   1   0
6  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0

Then apply the function numberAfterFirstOne to each row (apply is similar to a for loop, but more convenient to write and read).

apply(dataset, 1, numberAfterFirstOne)
[1]  0  1  0  0  1 NA

This is equivalent to the clunkier construct with a for loop:

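result <- numeric(nrow(dataset))  # preallocate the output vector
for (i in seq_len(nrow(dataset))) {
  # unlist() turns the one-row data frame into a plain vector
  result[i] <- numberAfterFirstOne(unlist(dataset[i, ]))
}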

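You can now adapt the function to return whatever you want. At the moment it may return 0, 1 or NA; maybe you only want 1 and 0, or 1 and NA. The bounds check in the if statement is not strictly required, because myRow[x + 1] already returns NA when the index is out of range (or when x itself is NA), which would make the function even simpler.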
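As a minimal sketch of that simplified version, the code below drops the explicit check and applies the function only to the 2008 to 2020 columns, storing the result in a new column as the question asks. The data frame is a hand-typed reconstruction of the five rows printed in the question, and the names valueAfterFirstOne, year_cols and survived are assumptions made for illustration.

```r
# Hypothetical reconstruction of the five rows printed in the question
m <- rbind(
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
)
colnames(m) <- as.character(2007:2020)
foal_fates_2 <- data.frame(ID = c(1, 3, 4, 9, 24), m, check.names = FALSE)

# Simplified helper (the name is an assumption): no explicit bounds check,
# because indexing a vector with NA or past its end yields NA
valueAfterFirstOne <- function(myRow) {
  x <- which(myRow == 1)[1]  # position of the first 1, NA if there is none
  unname(myRow[x + 1])       # value in the following column, or NA
}

year_cols <- as.character(2008:2020)  # start in 2008: skip ID and 2007
foal_fates_2$survived <- apply(foal_fates_2[, year_cols], 1, valueAfterFirstOne)
print(foal_fates_2[, c("ID", "survived")])
```

Selecting the columns by name rather than by position keeps the ID and 2007 columns out of the search even if the column order changes. Rows without any 1 from 2008 onwards, or whose first 1 falls in 2020, get NA in the new column.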

You can also modify the code so that the year is returned as well:

colnames(dataset) <- seq(2007, length.out = ncol(dataset)) # one year label per column of the example dataset
numberAfterFirstOne <- function(myRow) {
  x <- which(myRow == 1)[1]
  return(c(x, myRow[x + 1])) # return the column index + the value
}
result <- apply(dataset, 1, numberAfterFirstOne) # save the result (one column per row of dataset)
result[1, ] <- names(dataset)[result[1, ]] # replace the column index with the name of the dataset column
result
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,] "2007" "2007" "2012" "2007" "2008" NA  
[2,] "0"    "1"    "0"    "0"    "1"    NA  
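Writing the year names into the matrix turns every entry into a character string, including the 0/1 values. The sketch below shows one possible way to keep the value numeric by collecting the two pieces in a data frame instead. The small matrix and the names firstOneSummary, first, after, year and survived are made up for illustration and are not part of the answer above.

```r
# Small made-up example: four rows, years 2008 to 2011
mat <- rbind(
  c(1, 0, 0, 0),
  c(0, 1, 1, 0),
  c(0, 0, 0, 0),
  c(0, 0, 0, 1)
)
colnames(mat) <- as.character(2008:2011)

# Return the position of the first 1 and the value that follows it
firstOneSummary <- function(myRow) {
  x <- unname(which(myRow == 1))[1]           # NA if the row has no 1
  c(first = x, after = unname(myRow[x + 1]))  # NA if nothing follows
}

res <- t(apply(mat, 1, firstOneSummary))  # one row per input row
out <- data.frame(
  year     = colnames(mat)[res[, "first"]],  # year of the first 1
  survived = res[, "after"],                 # value in the following year
  stringsAsFactors = FALSE
)
print(out)
```

Here year stays a character column and survived stays numeric, so the result can be summarised directly, for example with mean(out$survived, na.rm = TRUE).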
