嗨,我在R中处理日期-时间对象时遇到了一些问题。我有一个列实际上只是一个时间列,但当它被购买到R中时,它被读取为字符向量,但也带有随机日期。我处理这个问题的最初想法是首先使用lubridate::mdy_hms()
将时间列转换为日期时间对象,然后使用strftime()
或srtptime()
仅提取时间,但我的理解是strptime()
仅适用于字符向量,而strftime()
适用于PosiXct值。我使用mdy_hms()
函数将字段转换为PosiXct对象,然后尝试使用strftime()
仅提取时间,但实际上并不起作用。我得到这个错误:
Error in as.POSIXlt.default(x, tz = tz) :
do not know how to convert 'x' to class “POSIXlt”
我又尝试了一次,但使用了hms
包,但它无法识别我的时间向量中的全部值,并将所有值转换为NA
值。这就是为什么我想首先将该字符向量中的所有值转换为日期-时间值;扔掉";日期值,因为它不正确。这是我试图运行的代码:
library(tidyverse)
library(lubridate)
library(hms)
OM <- read_csv('OM_sightings-1948-2019.csv', na = c("", "NA", "<Null>")) %>%
#Rename the ID column to OM_ID --> indicates that this entry came from OM database
rename(OM_ID = ID, Time = Time1, OM_Source = Source) %>%
# Drop the Time2, column
select(-Time2) %>%
# Convert 0.0 to NA in both the ActLat and ActLong columns
mutate_at(vars(ActLat, ActLong), na_if, y = 0)
#Fix the date and time objects in the db
OM_time <- OM %>%
mutate(Time = as_hms(Time),
SightDate = as.Date(mdy_hms(SightDate), tz = "US/Pacific"),
SightDateTime = mdy_hms(paste(SightDate, Time1), tz = "US/Pacific"))
它运行良好,我可以看到我的";突变的";df,然而,我收到了这些警告消息,我理解(大部分(,但不确定如何进行故障排除:
Warning messages:
1: Problem with `mutate()` input `Time`.
ℹ Lossy cast from <character> to <hms> at position(s) 58, 60, 61, 62, 63, ... (and 102131 more)
ℹ Input `Time` is `as_hms(Time)`.
2: Lossy cast from <character> to <hms> at position(s) 58, 60, 61, 62, 63, ... (and 102131 more)
3: Problem with `mutate()` input `SightDateTime`.
ℹ All formats failed to parse. No formats found.
ℹ Input `SightDateTime` is `mdy_hm(SightDate, tz = "US/Pacific") + Time`.
4: All formats failed to parse. No formats found.
这是我的数据的样本子集:
structure(list(OM_ID = c(94079, 75473, 95592, 50725, 24689, 73538,
10246, 107438, 10129, 74301, 107371, 63757, 43427, 93087, 16374,
28869, 38644, 42348, 89933, 83809, 53855, 96622, 52702, 28263,
991), SightDate = c("4/22/2015 0:00:00", "7/15/2011 0:00:00",
"6/30/2015 0:00:00", "6/26/2007 0:00:00", "8/12/2000 0:00:00",
"6/11/2011 0:00:00", "6/28/1990 0:00:00", "12/7/2018 0:00:00",
"6/20/1990 0:00:00", "6/26/2011 0:00:00", "12/5/2018 0:00:00",
"9/1/2009 0:00:00", "8/27/2005 0:00:00", "11/14/2014 0:00:00",
"6/11/1997 0:00:00", "9/10/2001 0:00:00", "9/8/2004 0:00:00",
"7/18/2005 0:00:00", "6/25/2014 0:00:00", "8/6/2012 0:00:00",
"5/16/2008 0:00:00", "7/25/2015 0:00:00", "9/10/2007 0:00:00",
"8/16/2001 0:00:00", "1/6/1977 0:00:00"), Time = c("12/30/1899 14:00:00",
"12/30/1899 15:00:00", "12/30/1899 19:21:00", "12/30/1899 9:30:00",
"12/30/1899 9:30:00", "12/30/1899 12:00:00", "12/30/1899 18:30:00",
"12/30/1899 13:00:00", "12/30/1899 18:00:00", "12/30/1899 11:52:00",
"12/30/1899 9:15:00", "12/30/1899 15:33:00", "12/30/1899 9:00:00",
"12/30/1899 13:48:00", "12/30/1899 15:00:00", "12/30/1899 5:45:00",
NA, "12/30/1899 16:15:00", "12/30/1899 12:30:00", NA, "12/30/1899 12:00:00",
"12/30/1899 13:00:00", "12/30/1899 12:30:00", "12/30/1899 8:45:00",
"12/30/1899 14:15:00"), Month = c(4, 7, 6, 6, 8, 6, 6, 12, 6,
6, 12, 9, 8, 11, 6, 9, 9, 7, 6, 8, 5, 7, 9, 8, 1), Day = c(22,
15, 30, 26, 12, 11, 28, 7, 20, 26, 5, 1, 27, 14, 11, 10, 8, 18,
25, 6, 16, 25, 10, 16, 6), Year = c(2015, 2011, 2015, 2007, 2000,
2011, 1990, 2018, 1990, 2011, 2018, 2009, 2005, 2014, 1997, 2001,
2004, 2005, 2014, 2012, 2008, 2015, 2007, 2001, 1977), Pod = c("Orcas",
"JpLp", "JK", "Orcas", "L", "J", "Orcas", "J", "J", "JK", "J",
"L12s", "Orcas", "J", "Orcas", "Orcas", "JKL", "J", "J", "J",
"J", "JKL", "JL", "JL", "Orcas"), LikelyPod = c("Ts", "JKLp",
"JpKp", NA, NA, "JL53", NA, NA, NA, "JpKp", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "SRs"), Direction = c(NA,
"mill", NA, NA, "E", "SE", "N", "N", "W", NA, "N", "NW", "N",
NA, "N", NA, "N", "N", "N", "N", "N", "SW", NA, "N", "N"), FishArea = c("17C",
"7", "7", "7", "19C", "7", "7", "9", "18C", "18C", "11", "7",
"7", "9", "7", "7", "7", "18C", "7", "7", "18C", "18", "29",
"7", "10"), Quadrant = c(89, 184, 181, 184, 257, 181, 185, 397,
151, 152, 420, 185, 181, 387, 169, 181, 181, 162, 176, 170, 163,
151, 80, 176, 413), Lat = c(48.96, 48.46, 48.5, 48.46, 48.31,
48.5, 48.44, 47.9, 48.76, 48.74, 47.33, 48.44, 48.5, 48.12, 48.62,
48.5, 48.5, 48.74, 48.56, 48.65, 48.71, 48.76, 49.01, 48.56,
47.55), Long = c(-123.73, -123.1, -123.17, -123.1, -123.36, -123.17,
-123.03, -122.46, -123.02, -123.08, -122.44, -123.03, -123.17,
-122.71, -123.17, -123.17, -123.17, -123.3, -123.21, -123.24,
-123.26, -123.02, -123.16, -123.21, -122.41), UTMx = c(446800,
492000, 487000, 492000, 473400, 487000, 497400, 539100, 497800,
493500, 540300, 497400, 487000, 520500, 486900, 487000, 487000,
477600, 484200, 482300, 480700, 497800, 488100, 484200, 542200
), UTMy = c(5423900, 5367800, 5372600, 5367800, 5351700, 5372600,
5365800, 5305200, 5401200, 5399200, 5242800, 5365800, 5372600,
5329700, 5386000, 5372600, 5372600, 5399000, 5378600, 5389300,
5395300, 5401200, 5428700, 5378600, 5266600), OM_Source = c("TWM-SA-Pub",
"TWM-SW", "TWM-HYD-Rel", "TWM-Pager", "TWM-Pager", "TWM-SW",
"TWM-SA-Rel", "TWM-SA-Rel", "TWM-SA-Rel", "SPOT", "TWM-SA-Pub",
"SPOT", "TWM-Pager", "TWM-HYD-Rel", "TWM-Pager", "TWM-SA-Pub",
"TWM-SA-Rel", "TWM-Pager", "TWM-SW", "BCCSN", "TWM-SW", "Soundwatch",
"BCCSN", "TWM-Pager", "TWM-SA-Rel"), ActLat = c(NA, 48.452, NA,
NA, NA, 48.488, NA, NA, NA, 48.7667, NA, 48.4585, NA, NA, NA,
NA, NA, NA, 48.5385, 48.682, 48.738, 48.7876, 49.0108, NA, NA
), ActLong = c(NA, -123.0777, NA, NA, NA, -123.1233, NA, NA,
NA, -123.0776, NA, -123.065, NA, NA, NA, NA, NA, NA, -123.1725,
-123.251, -123.253, -123.0389, -123.1659, NA, NA)), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))
我感谢任何帮助!我是一个新手/自学成才的人,我很难理解处理时间数据的最佳方法。
日期和时间在R中可能非常棘手,因为它有很多不同的表示方式。lubridate
没有一个只是时间的类,但相关的包hms
添加了一个时间类,可以让它变得更容易。
我经常发现同时有Date和Date/Time列很有帮助,这样你就可以使用最有意义的部分
对于示例数据,使用lubridate::mdy_hms(SighDate(将其转换为午夜的日期。hms::as_hms(Time)
对我很有用。然后将两者添加到date_time列中创建,如果需要分析,则将date转换为日期类。下面我为时间创建了一个新的列,以跟踪它在做什么,但倾向于替换时间列。
library(lubridate)
library(hms)
library(tidyverse)
# OM = your data provided above
OM2 <- OM %>%
mutate(SightDate = mdy_hms(SightDate, tz = "America/Los_Angeles"),
Time2 = as_hms(mdy_hms(Time)),
Date_time = SightDate+Time2,
SightDate = as.Date(SightDate)
)
OM2 %>%
select(OM_ID, SightDate, Time, Time2, Date_time, everything())
# # A tibble: 25 x 20
# OM_ID SightDate Time Time2 Date_time Month Day Year Pod LikelyPod Direction FishArea Quadrant Lat
# <dbl> <date> <chr> <tim> <dttm> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 94079 2015-04-22 12/3~ 14:00 2015-04-22 14:00:00 4 22 2015 Orcas Ts NA 17C 89 49.0
# 2 75473 2011-07-15 12/3~ 15:00 2011-07-15 15:00:00 7 15 2011 JpLp JKLp mill 7 184 48.5
# 3 95592 2015-06-30 12/3~ 19:21 2015-06-30 19:21:00 6 30 2015 JK JpKp NA 7 181 48.5
# 4 50725 2007-06-26 12/3~ 09:30 2007-06-26 09:30:00 6 26 2007 Orcas NA NA 7 184 48.5
# 5 24689 2000-08-12 12/3~ 09:30 2000-08-12 09:30:00 8 12 2000 L NA E 19C 257 48.3
# 6 73538 2011-06-11 12/3~ 12:00 2011-06-11 12:00:00 6 11 2011 J JL53 SE 7 181 48.5
# 7 10246 1990-06-28 12/3~ 18:30 1990-06-28 18:30:00 6 28 1990 Orcas NA N 7 185 48.4
# 8 107438 2018-12-07 12/3~ 13:00 2018-12-07 13:00:00 12 7 2018 J NA N 9 397 47.9
# 9 10129 1990-06-20 12/3~ 18:00 1990-06-20 18:00:00 6 20 1990 J NA W 18C 151 48.8
# 10 74301 2011-06-26 12/3~ 11:52 2011-06-26 11:52:00 6 26 2011 JK JpKp NA 18C 152 48.7
# # ... with 15 more rows, and 6 more variables: Long <dbl>, UTMx <dbl>, UTMy <dbl>, OM_Source <chr>, ActLat <dbl>,
# # ActLong <dbl>
我很确定Access将日期和时间表示为数值,表示自某个来源以来的时间量。看起来使用的原点是1899年12月30日,它显示在您的"时间"列中。当您从Access导出到csv时,也可以查看您的设置。我更熟悉excel,但我认为他们设置得很相似,这样你就可以在导入之前操作格式。