我试图运行下面的代码,它利用图形框架,我现在得到一个错误,据我所知,经过几个小时的谷歌搜索,我无法解决。似乎不能加载类,但我真的不知道我还应该做什么。
有人能再看看下面的代码和错误吗?我遵循了这里的说明,如果你想快速尝试一下,你可以在这里找到我的数据集。
"""
Program: RUNNING GRAPH ANALYTICS WITH SPARK GRAPH-FRAMES:
Author: Dr. C. Hadjinikolis
Date: 14/09/2016
Description: This is the application's core module from where everything is executed.
The module is responsible for:
1. Loading Spark
2. Loading GraphFrames
3. Running analytics by leveraging other modules in the package.
"""
# IMPORT OTHER LIBS -------------------------------------------------------------------------------#
import os
import sys
import pandas as pd
# IMPORT SPARK ------------------------------------------------------------------------------------#
# Path to Spark source folder
USER_FILE_PATH = "/Users/christoshadjinikolis"
SPARK_PATH = "/PycharmProjects/GenesAssociation"
SPARK_FILE = "/spark-2.0.0-bin-hadoop2.7"
SPARK_HOME = USER_FILE_PATH + SPARK_PATH + SPARK_FILE
os.environ['SPARK_HOME'] = SPARK_HOME
# Append pySpark to Python Path
sys.path.append(SPARK_HOME + "/python")
sys.path.append(SPARK_HOME + "/python" + "/lib/py4j-0.10.1-src.zip")
try:
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from graphframes import *
except ImportError as ex:
print "Can not import Spark Modules", ex
sys.exit(1)
# GLOBAL VARIABLES --------------------------------------------------------------------------------#
# Configure spark properties
CONF = (SparkConf()
.setMaster("local")
.setAppName("My app")
.set("spark.executor.memory", "10g")
.set("spark.executor.instances", "4"))
# Instantiate SparkContext object
SC = SparkContext(conf=CONF)
# Instantiate SQL_SparkContext object
SQL_CONTEXT = SQLContext(SC)
# MAIN CODE ---------------------------------------------------------------------------------------#
if __name__ == "__main__":
# Main Path to CSV files
DATA_PATH = '/PycharmProjects/GenesAssociation/data/'
FILE_NAME = 'gene_gene_associations_50k.csv'
# LOAD DATA CSV USING PANDAS -----------------------------------------------------------------#
print "STEP 1: Loading Gene Nodes -------------------------------------------------------------"
# Read csv file and load as df
GENES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME,
usecols=['OFFICIAL_SYMBOL_A'],
low_memory=True,
iterator=True,
chunksize=1000)
# Concatenate chunks into list & convert to dataFrame
GENES_DF = pd.DataFrame(pd.concat(list(GENES), ignore_index=True))
# Remove duplicates
GENES_DF_CLEAN = GENES_DF.drop_duplicates(keep='first')
# Name Columns
GENES_DF_CLEAN.columns = ['id']
# Output dataFrame
print GENES_DF_CLEAN
# Create vertices
VERTICES = SQL_CONTEXT.createDataFrame(GENES_DF_CLEAN)
# Show some vertices
print VERTICES.take(5)
print "STEP 2: Loading Gene Edges -------------------------------------------------------------"
# Read csv file and load as df
EDGES = pd.read_csv(USER_FILE_PATH + DATA_PATH + FILE_NAME,
usecols=['OFFICIAL_SYMBOL_A', 'OFFICIAL_SYMBOL_B', 'EXPERIMENTAL_SYSTEM'],
low_memory=True,
iterator=True,
chunksize=1000)
# Concatenate chunks into list & convert to dataFrame
EDGES_DF = pd.DataFrame(pd.concat(list(EDGES), ignore_index=True))
# Name Columns
EDGES_DF.columns = ["src", "dst", "rel_type"]
# Output dataFrame
print EDGES_DF
# Create vertices
EDGES = SQL_CONTEXT.createDataFrame(EDGES_DF)
# Show some edges
print EDGES.take(5)
print "STEP 3: Generating the Graph -----------------------------------------------------------"
GENES_GRAPH = GraphFrame(VERTICES, EDGES)
print "STEP 4: Running Various Basic Analytics ------------------------------------------------"
print "Vertex in-Degree -----------------------------------------------------------------------"
GENES_GRAPH.inDegrees.sort('inDegree', ascending=False).show()
print "Vertex out-Degree ----------------------------------------------------------------------"
GENES_GRAPH.outDegrees.sort('outDegree', ascending=False).show()
print "Vertex degree --------------------------------------------------------------------------"
GENES_GRAPH.degrees.sort('degree', ascending=False).show()
print "Triangle Count -------------------------------------------------------------------------"
RESULTS = GENES_GRAPH.triangleCount()
RESULTS.select("id", "count").show()
print "Label Propagation ----------------------------------------------------------------------"
GENES_GRAPH.labelPropagation(maxIter=10).show() # Convergence is not guaranteed
print "PageRank -------------------------------------------------------------------------------"
GENES_GRAPH.pageRank(resetProbability=0.15, tol=0.01)
.vertices.sort('pagerank', ascending=False).show()
print "STEP 5: Find Shortest Paths w.r.t. Landmarks -------------------------------------------"
# Shortest paths
SHORTEST_PATH = GENES_GRAPH.shortestPaths(landmarks=["ARF3", "MAP2K4"])
SHORTEST_PATH.select("id", "distances").show()
print "STEP 6: Save Vertices and Edges --------------------------------------------------------"
# Save vertices and edges as Parquet to some location.
# Note: You can't overwrite existing vertices and edges directories.
GENES_GRAPH.vertices.write.parquet("vertices")
GENES_GRAPH.edges.write.parquet("edges")
print "STEP 7: Load "
# Load the vertices and edges back.
SAME_VERTICES = GENES_GRAPH.read.parquet("vertices")
SAME_EDGES = GENES_GRAPH.read.parquet("edges")
# Create an identical GraphFrame.
SAME_GENES_GRAPH = GF.GraphFrame(SAME_VERTICES, SAME_EDGES)
# END OF FILE -------------------------------------------------------------------------------------#
输出:
Ivy Default Cache set to: /Users/username/.ivy2/cache
The jars for the packages stored in: /Users/username/.ivy2/jars
:: loading settings :: url = jar:file:/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found graphframes#graphframes;0.2.0-spark2.0-s_2.11 in list
found com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 in list
found com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 in list
found org.scala-lang#scala-reflect;2.11.0 in list
[2.11.0] org.scala-lang#scala-reflect;2.11.0
found org.slf4j#slf4j-api;1.7.7 in list
:: resolution report :: resolve 391ms :: artifacts dl 14ms
:: modules in use:
com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 from list in [default]
com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 from list in [default]
graphframes#graphframes;0.2.0-spark2.0-s_2.11 from list in [default]
org.scala-lang#scala-reflect;2.11.0 from list in [default]
org.slf4j#slf4j-api;1.7.7 from list in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 5 | 0 | 0 | 0 || 5 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 5 already retrieved (0kB/11ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/09/20 11:00:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
OK1
Traceback (most recent call last):
File "/Users/username/PycharmProjects/GenesAssociation/main.py", line 32, in <module>
g = GraphFrame(v, e)
File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 62, in __init__
File "/Users/tjhunter/work/graphframes/python/graphframes/graphframe.py", line 34, in _java_api
File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
File "/Users/christoshadjinikolis/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Users/username/PycharmProjects/GenesAssociation/spark-2.0.0-bin-hadoop2.7/python/lib"
"/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o53.newInstance.
: java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.graphframes.GraphFrame$.<init>(GraphFrame.scala:677)
at org.graphframes.GraphFrame$.<clinit>(GraphFrame.scala)
at org.graphframes.GraphFramePythonAPI.<init>(GraphFramePythonAPI.scala:11)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.typesafe.scalalogging.slf4j.LazyLogging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 43 more
Process finished with exit code 1
我在spark/scala中遇到了同样的问题,我通过将以下jar添加到类路径中来解决它:
spark-shell——jar scala-logging_2.12-3.5.0.jar
你可以在这里找到jar:https://mvnrepository.com/artifact/com.typesafe.scala-logging/scala-logging_2.12/3.5.0
来源:https://github.com/graphframes/graphframes/issues/113