我有以下数据:
# A tibble: 7,971 x 10
symbol date open high low close volume adjusted start_date end_date
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <date> <date>
1 AAPL 2009-01-02 12.3 13.0 12.2 13.0 186503800 11.4 2009-07-31 2010-06-30
2 AAPL 2009-01-05 13.3 13.7 13.2 13.5 295402100 11.8 2009-07-31 2010-06-30
3 AAPL 2009-01-06 13.7 13.9 13.2 13.3 322327600 11.6 2009-07-31 2010-06-30
4 AAPL 2009-01-07 13.1 13.2 12.9 13.0 188262200 11.4 2009-07-31 2010-06-30
5 AAPL 2009-01-08 12.9 13.3 12.9 13.2 168375200 11.6 2009-07-31 2010-06-30
6 AAPL 2009-01-09 13.3 13.3 12.9 12.9 136711400 11.3 2009-07-31 2010-06-30
7 AAPL 2009-01-12 12.9 13.0 12.5 12.7 154429100 11.1 2009-07-31 2010-06-30
8 AAPL 2009-01-13 12.6 12.8 12.3 12.5 199599400 11.0 2009-07-31 2010-06-30
9 AAPL 2009-01-14 12.3 12.5 12.1 12.2 255416000 10.7 2009-07-31 2010-06-30
10 AAPL 2009-01-15 11.5 12.0 11.4 11.9 457908500 10.4 2009-07-31 2010-06-30
我正在尝试按符号、开始日期和结束日期进行分组,并取开始日期的第一个观察和结束日期的最后一个观察之间的差异。我似乎无法让它工作。
即取开始日期的"关闭"和结束日期的"关闭"之差。
任何帮助都会很棒,谢谢!
syms <- c("AAPL", "MSFT", "GOOG")
library(tidyquant)
data <- tq_get(syms)
data <- data %>%
mutate( start_date = paste(year(date %m+% months(6)), "07", "31", sep = "-"), # note this is the start_date for when we calculate the returns - we will have bought this portfolio on the 1st July but we get returns on the 31st
end_date = paste(year(date %m+% months(18)), "06", "30", sep = "-"),
start_date = as.Date(start_date),
end_date = as.Date(end_date))
我的尝试...
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = diff(close))
编辑:
我正在尝试按symbol
分组,然后采取start_date
- end_date
.因此,首先,我应该按符号分组并将date
列过滤到start_date
和end_date
值之间。即我只对start_date
天和end_date
天的"收盘价"感兴趣(这是固定的(。然后,只需取start_date
和end_date
的收盘价之间的差额。所以大多数股价数据在这里都是无用的,我只对start_date
的收盘价感兴趣,然后end_date
取这两个值之间的差值。
我认为您要查找的是减去first
并为每个组last
close
值
library(dplyr)
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = first(close) - last(close))
# symbol start_date end_date diff
# <chr> <date> <date> <dbl>
# 1 AAPL 2009-07-31 2010-06-30 -7.38
# 2 AAPL 2010-07-31 2011-06-30 -15.5
# 3 AAPL 2011-07-31 2012-06-30 -12.5
# 4 AAPL 2012-07-31 2013-06-30 -34.4
# 5 AAPL 2013-07-31 2014-06-30 28.0
# 6 AAPL 2014-07-31 2015-06-30 -34.5
# 7 AAPL 2015-07-31 2016-06-30 -31.9
# 8 AAPL 2016-07-31 2017-06-30 31
# 9 AAPL 2017-07-31 2018-06-30 -48.1
#10 AAPL 2018-07-31 2019-06-30 -41.6
# … with 26 more rows
另一种写法可能是
data %>%
group_by(symbol, start_date, end_date) %>%
summarise(diff = close[1L] - close[n()])
或者也可以使用基本 R aggregate
aggregate(close~symbol +start_date + end_date,data,function(x) x[1L] - x[length(x)])
你可以采用这种方法...
# create df of unique symbol, start, and end date combos
df1 <- df %>% distinct(symbol,start_date,end_date)
# join original data that match the desired start/end dates
df1 <- df %>% select(start_close=close,symbol,start_date=date) %>% left_join(df1,.)
df1 <- df %>% select(end_close=close,symbol,end_date=date) %>% left_join(df1,.)
# find difference in close values
df1 %>% mutate(diff=end_close - start_close)
# A tibble: 36 x 6
symbol start_date end_date start_close end_close diff
<chr> <date> <date> <dbl> <dbl> <dbl>
1 AAPL 2009-07-31 2010-06-30 23.3 35.9 12.6
2 AAPL 2010-07-31 2011-06-30 NA 48.0 NA
3 AAPL 2011-07-31 2012-06-30 NA NA NA
4 AAPL 2012-07-31 2013-06-30 87.3 NA NA
5 AAPL 2013-07-31 2014-06-30 64.6 92.9 28.3
6 AAPL 2014-07-31 2015-06-30 95.6 125. 29.8
7 AAPL 2015-07-31 2016-06-30 121. 95.6 -25.7
8 AAPL 2016-07-31 2017-06-30 NA 144. NA
9 AAPL 2017-07-31 2018-06-30 149. NA NA
10 AAPL 2018-07-31 2019-06-30 190. NA NA
# ... with 26 more rows
有NA
,因为并非每个开始/结束日期都在原始日期列中。