我想将数据从每个分区保存到mySQL数据库。为此,我创建了实现VoidFunction<>
的类:
public class DatabaseSaveFunction implements VoidFunction<Iterator<String>> {
/**
*
*/
private static final long serialVersionUID = -7039277486852158360L;
public void call(Iterator<String> it) {
Connection connect = null;
PreparedStatement preparedStatement = null;
try {
Class.forName("com.mysql.jdbc.Driver");
connect = DriverManager.getConnection("jdbc:mysql://"
+ "xxx.us-west-2.rds.amazonaws.com" + "/"
+ "xxx", "xxx", "xxx");
preparedStatement = connect
.prepareStatement("insert into testdatabase.test values (default, ?)");
while (it.hasNext()) {
String outputElement = it.next();
preparedStatement.setString(1, "" + outputElement.length());
preparedStatement.executeUpdate();
}
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
} finally {
try {
connect.close();
preparedStatement.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
和我正在调用的主要方法课:
output.foreachPartition(new DatabaseSaveFunction());
我正在遵循以下错误:
15/05/06 15:34:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, ip-172-31-36-44.us-west-2.compute.internal): java.lang.ClassNotFoundException: DatabaseSaveFunction
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
工人日志:
15/05/06 15:34:00 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 5)
java.lang.ClassNotFoundException: DatabaseSaveFunction
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
有人可以告诉我我在做什么错吗?我会非常感谢。
将外部类导出到jar,并添加像sc.addjar("/path/path/x.jar"),其中sc在您的主机中为javasparkcontext。然后,您将不会收到此错误。错误是因为您Spark程序无法找到该类。此外,在Spark 1.3及更大的情况下,您可以简单地使用JDBC的地图选项,然后使用LOAD(" JDBC",选项)创建数据框架并从任何RDBMS加载数据。它真的很方便。我不确定此方法是否可以将任何RDBMS连接到Spark中。请告诉我您是否还有其他问题。