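I am trying to model minimum temperature correctly using maximum temperature, precipitation, and month. I know there are many questions about using factors in linear models, but honestly none of them seems to answer mine. The way R handles and uses dummy variables confuses me. Here is a small sample of my data, as code: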
data <- structure(list(month = c(5, 6, 9, 8, 9, 9, 10, 10, 1, 3, 6, 4,
11, 1, 3, 12, 8, 5, 12, 3, 10, 12, 9, 1, 1, 10, 12, 4, 7, 7,
11, 8, 10, 3, 7, 1, 3, 9, 10, 11, 5, 1, 7, 10, 9, 11, 7, 4, 6,
12, 10, 11, 11, 7, 5, 7, 5, 1, 6, 6, 5, 1, 1, 5, 5, 11, 12, 6,
10, 6, 2, 6, 4, 11, 9, 6, 11, 3, 8, 12, 6, 2, 6, 3, 10, 9, 4,
4, 5, 11, 11, 11, 1, 8, 4, 4, 10, 12, 9, 8), tmax = c(54, 84,
74, 82, 63, 87, 68, 59, -4, 17, 69, 42, 46, 29, 38, 42, 95, 67,
22, 48, 50, 34, 74, 40, 1, 71, 49, 32, 89, 74, 56, 92, 69, 23,
86, 49, 47, 84, 48, 73, 62, 8, 83, 60, 69, 17, 90, 69, 77, 37,
55, 43, 38, 93, 52, 84, 73, 35, 75, 83, 53, 33, 33, 81, 68, 55,
31, 98, 72, 80, 13, 85, 71, 48, 68, 85, 53, 48, 92, 4, 61, 34,
89, 62, 50, 62, 73, 63, 63, 33, 31, 57, 7, 72, 45, 64, 63, 31,
65, 85), tmin = c(0.04, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.01, 0, 0, 0, 0.14, 0.18, NA, 0.13, 0, 0.15, NA, 0.02, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0.38, 0, 0, 0, 0.01, 0, 0.42,
NA, 0, NA, 0, NA, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0.84, 0.03, 0,
0, 0, 0, 0, 0, 0, 0.01, 0, NA, 0.26, 0, 0, 0, 0.32, 0, 0, 0,
0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0, 0, NA,
0.02, 0), precip = c(0.04, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0.01, 0, 0, 0, 0.14, 0.18, NA, 0.13, 0, 0.15, NA, 0.02, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0.38, 0, 0, 0, 0.01, 0,
0.42, NA, 0, NA, 0, NA, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0.84, 0.03,
0, 0, 0, 0, 0, 0, 0, 0.01, 0, NA, 0.26, 0, 0, 0, 0.32, 0, 0,
0, 0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0, 0, 0,
NA, 0.02, 0)), row.names = c(11604L, 32822L, 32919L, 35089L,
40958L, 3690L, 34052L, 19787L, 26818L, 14839L, 21143L, 32761L,
14364L, 14043L, 30552L, 30077L, 5846L, 2486L, 25352L, 13369L,
21268L, 6355L, 16844L, 26847L, 35593L, 20523L, 10359L, 9379L,
6200L, 26647L, 23129L, 19388L, 38057L, 12637L, 42724L, 15875L,
1314L, 7352L, 34397L, 12146L, 27310L, 20622L, 8026L, 12121L,
26709L, 7409L, 1091L, 11587L, 23699L, 31917L, 14328L, 19458L,
10322L, 351L, 43747L, 23350L, 31329L, 8939L, 42693L, 34279L,
18541L, 25011L, 37791L, 17834L, 2845L, 12519L, 19848L, 3978L,
5907L, 28075L, 15177L, 3616L, 32037L, 9955L, 1498L, 17858L, 10700L,
27624L, 4768L, 24624L, 20036L, 5683L, 43408L, 37485L, 21255L,
15747L, 15234L, 7933L, 27690L, 24227L, 17286L, 30781L, 2358L,
9885L, 28380L, 35327L, 8851L, 14743L, 37314L, 8057L), class = "data.frame")
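With the following code, January is missing from the output (the output below uses the whole data set of 42,000 rows). Does that mean the intercept represents January?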
tmin_model <- lm(data$tmin ~ data$tmax + data$precip + as.factor(data$month))
Call:
lm(formula = data$tmin ~ data$tmax + data$precip + as.factor(data$month))
Residuals:
Min 1Q Median 3Q Max
-41.663 -4.827 0.182 5.110 22.489
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.524700 0.148019 -91.371 < 2e-16 ***
data$tmax 0.674834 0.003098 217.837 < 2e-16 ***
data$precip 6.671204 0.164683 40.509 < 2e-16 ***
as.factor(data$month)2 1.090986 0.187072 5.832 5.52e-09 ***
as.factor(data$month)3 5.868886 0.189904 30.904 < 2e-16 ***
as.factor(data$month)4 7.325417 0.209629 34.945 < 2e-16 ***
as.factor(data$month)5 10.453276 0.230197 45.410 < 2e-16 ***
as.factor(data$month)6 14.364899 0.250073 57.443 < 2e-16 ***
as.factor(data$month)7 15.382325 0.260707 59.002 < 2e-16 ***
as.factor(data$month)8 14.269489 0.256420 55.649 < 2e-16 ***
as.factor(data$month)9 10.729316 0.238739 44.942 < 2e-16 ***
as.factor(data$month)10 7.209093 0.214178 33.659 < 2e-16 ***
as.factor(data$month)11 5.950449 0.192669 30.884 < 2e-16 ***
as.factor(data$month)12 2.752499 0.183948 14.963 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.286 on 39784 degrees of freedom
(4411 observations deleted due to missingness)
Multiple R-squared: 0.8929, Adjusted R-squared: 0.8929
F-statistic: 2.553e+04 on 13 and 39784 DF, p-value: < 2.2e-16
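Do I need to create a "dummy" variable for each month to do this correctly? Also, how do I run a "predict" on just a few data points? When all I want is the model's prediction for a few points, I always get the full 42,000 rows back. For example, for a single point in January, why does the code below return 42,000 rows?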
predict.lm(tmin_model, newdata = data.frame(tmax = rnorm(1, 20, 13), month = 1, precip = 0, tmin = NA))
Thanks.
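Build the model with bare column names and a `data` argument, after converting `month` to a factor. Writing the formula as `data$tmin ~ data$tmax + ...` makes `predict` look for variables named `data$tmax` and so on, so it ignores the columns in `newdata` and falls back to the original data: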
data$month <- factor(data$month)
tmin_model <- lm(tmin ~tmax + precip + month, data = data)
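On the dummy-variable question: with R's default treatment contrasts, `lm` expands a factor into indicator columns on its own and drops the first level, so no manual dummy variables are needed. January is absorbed into the intercept, which is then the fitted value for January at `tmax = 0` and `precip = 0`, and each month coefficient is an offset from January. A minimal sketch on simulated data follows; the `toy` data frame and the numbers used to generate it are made up for illustration and are not the question's data.

```r
# Simulated data: `toy` and its generating coefficients are illustrative assumptions.
set.seed(1)
toy <- data.frame(
  month = factor(rep(1:12, each = 10)),
  tmax  = rnorm(120, mean = 60, sd = 20)
)
toy$tmin <- 0.7 * toy$tmax - 15 + as.integer(toy$month) + rnorm(120)

fit <- lm(tmin ~ tmax + month, data = toy)

# There is no "month1" term: January is the reference level inside the intercept.
coef_names <- names(coef(fit))
print(coef_names)

# The design matrix shows the 11 indicator columns R built from the factor.
mm <- model.matrix(fit)
print(colnames(mm))

# To measure offsets from a different month, change the reference level.
toy$month_jul <- relevel(toy$month, ref = "7")
fit_jul <- lm(tmin ~ tmax + month_jul, data = toy)
print(names(coef(fit_jul)))
```

After `relevel`, July is the omitted level instead, and the other month coefficients are read as differences from July. The fitted values are the same either way; only the parameterization changes.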
This returns only one row:
predict.lm(tmin_model, newdata =
data.frame(tmax = rnorm(1, 20, 13), month = factor(1), precip = 0, tmin = NA))
1
-7.905385e-18
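The same approach extends to several points at once: `newdata` needs one row per point, with columns named exactly like the predictors in the formula. A rough sketch on simulated data is below. `toy` and `new_points` are hypothetical names, and the generated values are assumptions rather than the question's data. The last lines reproduce the original problem for contrast.

```r
# Simulated data: `toy` and its generating coefficients are illustrative assumptions.
set.seed(2)
toy <- data.frame(
  month  = factor(rep(1:12, each = 10)),
  tmax   = rnorm(120, mean = 60, sd = 20),
  precip = rexp(120, rate = 5)
)
toy$tmin <- 0.7 * toy$tmax + 5 * toy$precip - 15 + rnorm(120)

fit <- lm(tmin ~ tmax + precip + month, data = toy)

# Three new points; reusing the model's factor levels avoids level mismatches.
new_points <- data.frame(
  tmax   = c(20, 55, 85),
  precip = c(0, 0.1, 0),
  month  = factor(c(1, 4, 7), levels = levels(toy$month))
)

pred <- predict(fit, newdata = new_points)
print(pred)  # one prediction per row of new_points

# Contrast: a formula written with `toy$...` ignores the columns in newdata,
# so predict() falls back to the original 120 rows (and warns about it).
fit_bad  <- lm(toy$tmin ~ toy$tmax + toy$precip + toy$month)
pred_bad <- suppressWarnings(predict(fit_bad, newdata = new_points))
print(length(pred_bad))
```

The response column (`tmin`) does not have to appear in `newdata`; only the predictors are used.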