R - Updating a column for a subset of rows in an h2o.frame



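I have put together a simple example of what I want to do, which I intend to make more flexible later.

I would like to subset the rows of an h2o.frame, run some calculation on those rows, and then assign the result back to the same rows. In this example, I compute "mpg" relative to the mean "mpg" within a "cyl" group.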

library(h2o)
packageVersion("h2o")
[1] ‘3.32.1.3’
version
_                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          4                           
minor          1.1                         
year           2021                        
month          08                          
day            10                          
svn rev        80725                       
language       R                           
version.string R version 4.1.1 (2021-08-10)
nickname       Kick Things  
h2o.init()
mtcars <- as.h2o(mtcars)
mtcars$mpg_rel <- NA
mtcars
mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg_rel
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4     NaN
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     NaN
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     NaN
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1     NaN
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2     NaN
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1     NaN
[32 rows x 12 columns]
mtcars[mtcars[["cyl"]] == 4, "mpg"] / h2o.mean(mtcars[mtcars[["cyl"]] == 4, "mpg"])
mpg
1 0.8550972
2 0.9151040
3 0.8550972
4 1.2151381
5 1.1401296
6 1.2713945
[11 rows x 1 column] 
# however, the assignment throws an error and corrupts `mtcars`
mtcars[mtcars[["cyl"]] == 4, "mpg_rel"] <- mtcars[mtcars[["cyl"]] == 4, "mpg"] / h2o.mean(mtcars[mtcars[["cyl"]] == 4, "mpg"])
ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/99/Rapids)
water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: unimplemented"                                                 
[2] "    water.H2O.unimpl(H2O.java:1310)"                                                                         
[3] "    water.rapids.ast.prims.assign.AstRectangleAssign.apply(AstRectangleAssign.java:93)"                      
[4] "    water.rapids.ast.prims.assign.AstRectangleAssign.apply(AstRectangleAssign.java:30)"                      
[5] "    water.rapids.ast.AstExec.exec(AstExec.java:63)"                                                          
[6] "    water.rapids.ast.prims.assign.AstTmpAssign.apply(AstTmpAssign.java:48)"                                  
[7] "    water.rapids.ast.prims.assign.AstTmpAssign.apply(AstTmpAssign.java:17)"                                  
[8] "    water.rapids.ast.AstExec.exec(AstExec.java:63)"                                                          
[9] "    water.rapids.Session.exec(Session.java:85)"                                                              
[10] "    water.rapids.Rapids.exec(Rapids.java:94)"                                                                
[11] "    water.api.RapidsHandler.exec(RapidsHandler.java:38)"                                                     
[12] "    java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                          
[13] "    java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"        
[14] "    java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[15] "    java.base/java.lang.reflect.Method.invoke(Method.java:566)"                                              
[16] "    water.api.Handler.handle(Handler.java:60)"                                                               
[17] "    water.api.RequestServer.serve(RequestServer.java:470)"                                                   
[18] "    water.api.RequestServer.doGeneric(RequestServer.java:301)"                                               
[19] "    water.api.RequestServer.doPost(RequestServer.java:227)"                                                  
[20] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"                                            
[21] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"                                            
[22] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)"                                  
[23] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)"                              
[24] "    org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)"                       
[25] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)"                      
[26] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)"                        
[27] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)"                               
[28] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)"                        
[29] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)"                       
[30] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)"                           
[31] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                   
[32] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                         
[33] "    water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)"            
[34] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                   
[35] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                         
[36] "    org.eclipse.jetty.server.Server.handle(Server.java:531)"                                                 
[37] "    org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)"                                       
[38] "    org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)"                             
[39] "    org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)"             
[40] "    org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)"                                       
[41] "    org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)"                                    
[42] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)"                  
[43] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)"                
[44] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)"               
[45] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)"                      
[46] "    org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)"
[47] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)"                        
[48] "    org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)"                         
[49] "    java.base/java.lang.Thread.run(Thread.java:834)"                                                         
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

ERROR MESSAGE:
unimplemented

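One workaround I found is to subset with a vector of row numbers. However, the as.vector conversion is very inefficient on large data. It would be great if there were a way to do this that looks like the logical-mask assignment above.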

which_rows <- as.vector(h2o.which(mtcars[["cyl"]] == 4))
mtcars[which_rows, "mpg_rel"] <- mtcars[which_rows, "mpg"] / h2o.mean(mtcars[which_rows, "mpg"])
mtcars
mpg cyl disp  hp drat    wt  qsec vs am gear carb   mpg_rel
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4       NaN
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4       NaN
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 0.8550972
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1       NaN
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2       NaN
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       NaN
[32 rows x 12 columns] 

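I'm not sure this is the best approach, but the following should be more efficient than going through as.vector (one pass over all rows to get mpg_mean and one pass for h2o.ifelse).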

mpg_mean <-  h2o.mean(mtcars[mtcars[["cyl"]] == 4, "mpg"])
# rows outside the group keep their current mpg_rel value instead of receiving the raw mpg
mtcars[["mpg_rel"]] <- h2o.ifelse(mtcars[["cyl"]] == 4, mtcars[["mpg"]] / mpg_mean, mtcars[["mpg_rel"]])
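
Since the question mentions wanting something more flexible, here is a rough sketch of how the same h2o.ifelse idea might be repeated for every "cyl" group. It is not tested against 3.32.1.3 specifically. The names `cars`, `groups`, `is_g` and `g_mean` are made up for illustration, and the sketch assumes the grouping column has only a handful of distinct values, so pulling them into R with as.vector is cheap.

```r
library(h2o)
h2o.init()

# Work on a fresh copy so the local mtcars data.frame is left untouched.
cars <- as.h2o(mtcars)
cars$mpg_rel <- NA

# Only the distinct group values are brought into R, not row indices.
groups <- as.vector(h2o.unique(cars[["cyl"]]))

for (g in groups) {
  is_g   <- cars[["cyl"]] == g               # logical mask, stays in H2O
  g_mean <- h2o.mean(cars[is_g, "mpg"])      # scalar mean of mpg for this group
  # Fill rows of this group; all other rows keep their current mpg_rel.
  cars[["mpg_rel"]] <- h2o.ifelse(is_g,
                                  cars[["mpg"]] / g_mean,
                                  cars[["mpg_rel"]])
}
```

Each iteration makes two passes over the frame, so this probably only makes sense when the number of groups is small. For many groups, computing the per-group means with h2o.group_by and joining them back with h2o.merge may be the better route, though I have not compared the two.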
