Below is Python code that reads a CSV from a URL and isolates the "ticker symbols", then converts them to a list. I am brand new to R, and I am hoping there is a simple, quick way to translate this Python code into R before I get too deep into figuring it out on my own.
import requests
import pandas as pd

# Read contents of csv link into string variable
cboe_csv_link = 'https://www.cboe.com/available_weeklys/get_csv_download/'
output = requests.get(cboe_csv_link).text
# Find number of rows before string
find_str = "Available Weeklys - Exchange Traded Products (ETFs and ETNs)"
# Find index of search string in output
idx = output.find(find_str)
# Count number of newlines until search string is encountered
skiprows_val = output[:idx+len(find_str)].count("\n")
# Filter out rows and columns to isolate ticker symbols
cboe_csv = pd.read_csv(cboe_csv_link, skiprows=skiprows_val, usecols=[0], header=None)
tickers_df = cboe_csv[(cboe_csv[0] != 'Available Weeklys - Exchange Traded Products (ETFs and ETNs)')
& (cboe_csv[0] != 'Available Weeklys - Equity')]
# Convert dataframe column to list
tickers = tickers_df[0].tolist()
Here is one possible way to solve your problem:
library(magrittr)
tickers = readLines("https://www.cboe.com/available_weeklys/get_csv_download/") %>%
  gsub(pattern='"', replacement="") %>%
  subset(nzchar(.) & !grepl("Available Weekly|\\d+/\\d+/\\d+", .)) %>%
  sub(pattern="([A-Z]+).+", replacement="\\1")
# [1] "AMLP" "ARKF" "ARKG" "ARKK" "ASHR" "BRZU" "DIA" "DUST" "EEM"
# [10] "EFA" "EMB" "ERX" "EWH" "EWJ" "EWU" "EWW" "EWY" "EWZ"
# [19] "FAS" "FAZ" "FEZ" "FXE" "FXI" "FXY" "GDX" "GDXJ" "GLD"
# [28] "HYG" "IAU" "IBB" "ICLN" "IEF" "INDA" "ITB" "IVV" "IWF"
# [37] "IWM" "IYR" "JDST" "JETS" "JNK" "JNUG" "KRE" "KWEB" "LABD"
# ...
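The result is already a character vector of tickers, which is the usual R analogue of a Python list. If you want an actual R list (mirroring Python's tolist()), a one-line follow-up, assuming the tickers vector built above:
tickers_list <- as.list(tickers)  # one list element per ticker symbol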
Not a literal translation of your Python code, but hopefully a fair interpretation:
# Read the raw lines and locate the section header rows
cboe_csv_link <- "https://www.cboe.com/available_weeklys/get_csv_download/"
rr <- readLines(cboe_csv_link)
ss <- c(grep("Available Weeklys", rr), length(rr))
# Read each section (between consecutive headers) into its own data frame
l <- list()
for (i in 1:(length(ss)-1)) {
  l[[i]] <- read.csv(text=rr[(ss[i]+1):(ss[i+1]-1)], header=FALSE)
}
names(l) <- rr[head(ss, -1)]
lapply(l, head)
# $`Available Weeklys - Exchange Traded Products (ETFs and ETNs)`
# V1 V2
# 1 AMLP ALPS ETF TR ALERIAN MLP
# 2 ARKF ARK ETF TR FINTECH INNOVA
# 3 ARKG ARK ETF TR GENOMIC REV ETF
# 4 ARKK ARK ETF TR INNOVATION ETF
# 5 ASHR DBX ETF TR XTRACK HRVST CSI
# 6 BRZU DIREXION SHS ETF TR BRZ BL 2X SHS
#
# $`Available Weeklys - Equity`
# V1 V2
# 1 AA ALCOA CORP COM
# 2 AAL AMERICAN AIRLS GROUP INC COM
# 3 AAOI APPLIED OPTOELECTRONICS INC COM
# 4 AAPL APPLE INC COM
# 5 ABBV ABBVIE INC COM
# 6 ABC AMERISOURCEBERGEN CORP COM
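If what you ultimately want is the flat collection of ticker symbols (as in your Python tickers list), here is a minimal follow-up sketch, assuming the list l built above and the V1 column shown in the output:
# Pull the first (ticker) column out of every section and combine into one vector
tickers <- unlist(lapply(l, function(df) as.character(df$V1)), use.names=FALSE)
head(tickers)  # should start with the ETF tickers shown above: "AMLP" "ARKF" ...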
Here is a slightly different approach. First, we download the data into a file, say weeklysmf.csv:
> url <- "https://www.cboe.com/available_weeklys/get_csv_download/"
> download.file(url, "weeklysmf.csv", quiet=TRUE)
>
Then we use the fact that all of the rows you are interested in have exactly two comma-separated fields. The awk call below filters for lines that have exactly two fields, using , as the field separator:
$ awk -F, 'NF==2 {print $0}' weeklysmf.csv |head
"AMLP","ALPS ETF TR ALERIAN MLP"
"ARKF","ARK ETF TR FINTECH INNOVA"
"ARKG","ARK ETF TR GENOMIC REV ETF"
"ARKK","ARK ETF TR INNOVATION ETF"
"ASHR","DBX ETF TR XTRACK HRVST CSI"
"BRZU","DIREXION SHS ETF TR BRZ BL 2X SHS"
"DIA","SPDR DOW JONES INDL AVERAGE ET UT SER 1"
"DUST","DIREXION SHS ETF TR DAILY GOLD MINER"
"EEM","ISHARES TR MSCI EMG MKT ETF"
"EFA","ISHARES TR MSCI EAFE ETF"
$
We can use that with any of the many CSV readers in R that can read from a command (thanks to R's connection interface, where pipe() is an option alongside file() and url()). I prefer data.table, so here this becomes:
> dat <- data.table::fread(cmd="awk -F, 'NF==2 {print $0}' weeklysmf.csv")
> dat
AMLP ALPS ETF TR ALERIAN MLP
1: ARKF ARK ETF TR FINTECH INNOVA
2: ARKG ARK ETF TR GENOMIC REV ETF
3: ARKK ARK ETF TR INNOVATION ETF
4: ASHR DBX ETF TR XTRACK HRVST CSI
5: BRZU DIREXION SHS ETF TR BRZ BL 2X SHS
---
611: YY JOYY INC ADS REPSTG COM A
612: Z ZILLOW GROUP INC CL C CAP STK
613: ZM ZOOM VIDEO COMMUNICATIONS INC CL A
614: ZNGA ZYNGA INC CL A
615: ZS ZSCALER INC COM
>
(If you prefer, fread can also return a data.frame; there is an option for that.)
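For completeness, a minimal sketch of the two variations mentioned above; the awk command is the same one used earlier, and data.table=FALSE is the fread option that returns a plain data.frame:
# Base-R alternative: read the awk output through a pipe() connection
dat_df <- read.csv(pipe("awk -F, 'NF==2 {print $0}' weeklysmf.csv"), header=FALSE)
# Or ask fread for a data.frame directly
dat_df2 <- data.table::fread(cmd="awk -F, 'NF==2 {print $0}' weeklysmf.csv", data.table=FALSE)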