R中的OVER()函数是否使用sqldf而不使用RPostgreSQL



我正在处理一些敏感数据,所以我担心使用RPostgreSQL。我已经将所有必要的数据加载到R中的数据帧中。我正试图使用R中的sqldf()函数对数据运行查询。这些查询是几年前为Oracle SQL Developer编写的,因此我们试图避免完全重写脚本。能够重用预先编写的SQL脚本将为我们节省大量时间。当我们调用over()SQL函数时,脚本似乎会出错。我知道基本sqldf不支持over()函数。我已经读到over()函数与RPostgreSQL包配合使用,但这是否需要我将数据帧发送到外部数据库?根据我对RpostgreSQL的理解,您需要连接到PostgreSQL并创建一个新的数据库。我们无法将此数据发送到外部数据存储系统。有没有其他方法可以使用over()功能,同时将数据帧保留在我的电脑本地?

select program, importance_level, count( distinct subject_id )                      
from                        
(                       
select r.subject_id,                        
case                        
when  rc_level is not null  and  rc_level <> 'NA'                   
then  'bad_guy'
when  (rc_level is null  or  rc_level = 'NA') and
(substr( r.base_category, 2, 2 )  in  ( '5R', '8Q', '8P' )     
or r.process_name in ('On The Way'))    
then 'run_away'                     
when  (rc_level is null  or  rc_level = 'NA') and r.process_name =
'Fancy Order'                      
then 'repeater'                     
when  (rc_level is null  or  rc_level = 'NA') and
(a.current_program_code  in  ( 'BOP', 'IAS', 'LIS', 'SIS' )                        
or  method_code  in  ( 'SIP', 'POB' )                        
or  substr( r.base_category, 2, 2 )  in  ( '9F', '7G' ))                     
then  'NEWBIE'
else 'Other'                      
end                     
as  importance_level,       
case                        
when a.current_program_code in ('123', 'ABC', 'DEF', 'HIJ', 'KLM', 'NOP', 'QRS' ) then 'YAW'                        
when a.current_program_code in ( 'RE', 'FDS', 'QWE', 'WER', 'ERT','RTY','TYU' ) then 'PO'                       
when a.current_program_code in ( 'LEP' ) then 'MOM'                     
else a.current_program_code                     
end                     
as program                      
from FY16DATA r left join (select distinct * from (select subject_id, first_value(current_program_code) over (partition by subject_id order by start_date desc) as current_program_code, first_value(process_name) over (partition by subject_id order by start_date desc) as process_name, first_value(method_code) over (partition by subject_id order by start_date desc) as method_code, max(load_fy) over (partition by subject_id) as load_fy from FY16NAME)) a on r.subject_id = a.subject_id                        
where r.load_fy = '2016' and r.thing_status <> 'Over'  and r.thing_status in ('Head','Hair','Face')                     
)                       
group by program, importance_level;

您认为RPostgreSQL包用于连接到外部数据库是正确的,与用于在R数据帧上运行SQL的sqldf有些不同。sqldf依赖于其他包来处理数据库连接。

"sqldf不支持over功能">是错误的。sqldf默认驱动程序sqlite是一个没有over()的SQL变体。但是,您可以在本地postgreSQL安装中使用sqldf(sqldf可以在后台使用RPostgreSQL)。请参阅sqldf常见问题解答如何在PostgreSQL中使用sqldf?,我将在下面发布大部分内容。您会注意到SQL查询使用over()

安装1。PostgreSQL,2。CCD_ 19 R包3。CCD_ 20本身。CCD_ 21和CCD_。

确保您创建了一个空数据库,例如"test"。PostgreSQL附带的createdb程序可以用于此目的。例如,从控制台/shell创建一个名为test的数据库,如下所示:

createdb --help
createdb --username=postgres test

这里是一个使用RPostgreSQL的示例,之后我们展示了一个使用RpgSQL的示例。下面显示的选项语句可以直接输入,也可以放在.Rprofile中。此处显示的值实际上是默认值:

options(sqldf.RPostgreSQL.user = "postgres", 
sqldf.RPostgreSQL.password = "postgres",
sqldf.RPostgreSQL.dbname = "test",
sqldf.RPostgreSQL.host = "localhost", 
sqldf.RPostgreSQL.port = 5432)
Lines <- "Group_A Group_B Group_C Value 
A1 B1 C1 10 
A1 B1 C2 20 
A1 B1 C3 30 
A1 B2 C1 40 
A1 B2 C2 10 
A1 B2 C3 5 
A1 B2 C4 30 
A2 B1 C1 40 
A2 B1 C2 5 
A2 B1 C3 2 
A2 B2 C1 26 
A2 B2 C2 1 
A2 B3 C1 23 
A2 B3 C2 15 
A2 B3 C3 12 
A3 B3 C4 23 
A3 B3 C5 23"
DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
library(RPostgreSQL)
library(sqldf)
# upper case is folded to lower case by default so surround DF with double quotes
sqldf('select count(*) from "DF" ')
sqldf('select *, rank() over  (partition by "Group_A", "Group_B" order by "Value") 
from "DF" 
order by "Group_A", "Group_B", "Group_C" ')

最新更新