我有多个县的人口数据,并希望最大程度地减少每个县的重复拟合逻辑增长曲线。
county year pop
lake 1970 69305
lake 1980 104870
lake 1990 152104
lake 2000 210528
lake 2010 297052
marion 1970 69030
marion 1980 122488
marion 1990 194833
marion 2000 258916
marion 2010 331298
seminole 1970 83692
seminole 1980 179752
seminole 1990 287529
seminole 2000 365196
seminole 2010 422718
目前我正在为每个县组成:
lake<-countypop[1:5,2:3]
colnames(lake)<-c("year", "pop")
marion<-countypop[6:10,2:3]
colnames(marion)<-c("year", "pop")
seminole<-countypop[11:15,2:3]
然后使用sslogis并绘制每个县的曲线,例如:
lake.model <- nls(pop ~ SSlogis(year, phi1, phi2, phi3, data = lake)))
alpha <- coef(lake.model)
plot(pop ~ year, data = lake, main = "Logistic Growth Model of Lake County",
xlab = "Year", ylab = "Population", xlim = c(1920, 2030),ylim=c(0,1000000))
curve(alpha[1]/(1 + exp(-(x - alpha[2])/alpha[3])), add = T, col = "blue")
我有大约60个县,我知道必须有一种更干净的方法来做到这一点。如何使用应用功能,循环或其他内容来消除代码中的重复?
尝试以下:
pdf("countypop.pdf")
models <- by(countypop, countypop$county, function(x) {
fm <- nls(pop ~ SSlogis(year, phi1, phi2, phi3), data = x)
plot(pop ~ year, x, main = county[1])
lines(fitted(fm) ~ year, x)
fm
})
dev.off()
注意:我们将其用作输入:
countypop <-
structure(list(county = c("lake", "lake", "lake", "lake", "lake",
"marion", "marion", "marion", "marion", "marion", "seminole",
"seminole", "seminole", "seminole", "seminole"), year = c(1970L,
1980L, 1990L, 2000L, 2010L, 1970L, 1980L, 1990L, 2000L, 2010L,
1970L, 1980L, 1990L, 2000L, 2010L), pop = c(69305L, 104870L,
152104L, 210528L, 297052L, 69030L, 122488L, 194833L, 258916L,
331298L, 83692L, 179752L, 287529L, 365196L, 422718L)), .Names = c("county",
"year", "pop"), class = "data.frame", row.names = c(NA, -15L))