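I'm at the last stage of cleaning up data before analysis, and while removing whitespace from a data table I've hit a problem I can't really understand. See the full code below for a description of each step.
I started from this page (How to remove all whitespace from a string?) and also tried answers on other pages about the atomic-vector error/warning, but without luck.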
At step 6 I got the following warning:

```
In stri_replace_all_fixed(allData, " ", "") :
  argument is not an atomic vector; coercing
```

At step 7, the following warnings:

```
> #Change sold and taxed columns from character to numeric
> allData$SoldAmount <- as.numeric(allData$SoldAmount)
Warning message:
NAs introduced by coercion
> allData$Tax <- as.numeric(allData$Tax)
Warning message:
NAs introduced by coercion
```
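Steps 6 and 7 both appear to run, but both columns end up full of NA (see image).

[Image: result after removing whitespace]

The full code is below. I'd like to know how to get steps 6 and 7 to give me columns without spaces, converted to numbers.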
```r
#Step 1: Load needed libraries
library(tidyverse)
library(rvest)
library(jsonlite)
library(stringi)
#Step 2: Access the URL
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/"
#Step 3: Read the URL's data as JSON
data <- jsonlite::fromJSON(url, flatten = TRUE)
#Step 4: Get the total number of items in the API
totalItems <- data$TotalNumberOfItems
#Step 5: Collect all data from the API
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems,'/') %>%
  jsonlite::fromJSON(., flatten = TRUE) %>%
  .[1] %>%
  as.data.frame() %>%
  rename_with(~str_replace(., "ListItems.", ""), everything())
#Step 6: remove columns not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
#Step 6: remove whitespace in all columns
stri_replace_all_fixed(allData, " ", "")
#Step 7: Change sold and taxed columns from character to numeric
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)
```
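You call `stri_replace_all_fixed(allData, " ", "")` but throw its output away instead of saving it anywhere. (Because `allData` is a data frame rather than an atomic vector, stringi also has to coerce it, which is where the step 6 warning comes from.) As a result, `Tax` and `SoldAmount` still hold strings such as `" 2 400 000"`, and `as.numeric()` turns those into `NA`. Apply the replacement to each column and assign the result back.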
```r
#Step 6: remove whitespace in all columns
allData[] <- lapply(allData, gsub, pattern = " ", replacement = "")
#Step 7: Change sold and taxed columns from character to numeric
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013
```
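The same failure and fix can be reproduced without calling the API. This is a minimal sketch on a made-up two-row data frame whose `Tax`/`SoldAmount` strings are formatted like the ones the API returns (leading space, space as thousands separator). It uses base `gsub()` as a stand-in for `stri_replace_all_fixed()` so that it needs no packages; `df`, `naive`, `unchanged` and `num_cols` are illustrative names only.

```r
# Made-up data, formatted like the API's Tax/SoldAmount strings.
df <- data.frame(
  County     = c("Akershus", "Sogn- og fjordane"),
  Tax        = c(" 2 400 000", " 61 950"),
  SoldAmount = c(" 2 550 000", " 61 950"),
  stringsAsFactors = FALSE
)

# 1. Like the original step 7: as.numeric() cannot parse the embedded spaces,
#    so every value becomes NA ("NAs introduced by coercion").
naive <- suppressWarnings(as.numeric(df$Tax))
print(naive)  # [1] NA NA

# 2. Like the original step 6: the cleaned strings are returned (and printed
#    at the console) but never assigned, so the column itself is unchanged.
gsub(" ", "", df$Tax)
unchanged <- df$Tax
print(unchanged)  # [1] " 2 400 000" " 61 950"

# 3. The fix: assign the cleaned strings back, then convert.
num_cols <- c("Tax", "SoldAmount")
df[num_cols] <- lapply(df[num_cols], gsub, pattern = " ", replacement = "")
df[num_cols] <- lapply(df[num_cols], as.numeric)
str(df)
```

Only `num_cols` is touched here, so the space in `"Sogn- og fjordane"` survives.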
Alternatively, do it in a single step, only on the columns you need:
```r
# allData <- paste0(...) %>% ...
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013
```
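Restricting the replacement to just these two columns matters, because many values in the other columns contain spaces, and I don't know whether you meant to squash those: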
```r
str(sapply(allData, function(z) unique(grep(" ", z, value = TRUE)), simplify = FALSE))
# List of 6
#  $ County      : chr [1:2] "Møre og Romsdal" "Sogn- og fjordane"
#  $ Municipality: chr [1:4] "EVJE OG HORNNES" "VESTRE TOTEN" "ØSTRE TOTEN" "NORDRE LAND"
#  $ Tax         : chr [1:414] " 2 400 000" " 2 225 000" " 7 600 000" " 3 000 000" ...
#  $ SoldAmount  : chr [1:538] " 2 550 000" " 2 100 000" " 18 000 000" " 3 815 000" ...
#  $ Type        : chr "Annen kategori"
#  $ Date        : chr(0) 
```
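If some amounts still come out as `NA` after removing `" "`, one possible cause (an assumption, not something visible in the output above) is that the thousands separator is a non-breaking space (U+00A0) rather than an ordinary space. Below is a minimal sketch of a hypothetical `to_number()` helper, not part of the original answer, that strips both kinds of whitespace and names the values that still fail, instead of leaving only the generic "NAs introduced by coercion" warning.

```r
# Hypothetical helper: strip whitespace, convert to numeric, and report
# any non-empty value that still cannot be parsed.
to_number <- function(x) {
  # Non-breaking spaces (U+00A0), assumed possible as a thousands separator
  cleaned <- gsub("\u00a0", "", x, fixed = TRUE)
  # Ordinary spaces, tabs and newlines
  cleaned <- gsub("[[:space:]]", "", cleaned)
  out <- suppressWarnings(as.numeric(cleaned))
  # Values that were non-empty text but still became NA
  failed <- !is.na(cleaned) & nzchar(cleaned) & is.na(out)
  if (any(failed)) {
    warning("could not convert: ", paste(unique(x[failed]), collapse = ", "))
  }
  out
}

# Usage on the question's data (assumes allData from the code above):
# allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], to_number)
```

Missing and empty values pass through as `NA` without a warning. Only text that still does not parse after cleaning is reported.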