I have a long Spark script, similar to the one below, with many statements like these:
df = spark.sql("""
select * from abc
""")
df.persist()
df2 = spark.sql("""
select * from def
""")
df2.persist()
df3 = spark.sql("""
select * from mno""")
I want to find all the DataFrames that have been persisted and store them in a list.

Expected output:

l1 = [df, df2]

How can we do this?
Try this:
from pyspark.sql import DataFrame
df = spark.sql("""
select * from abc
""")
df.persist()
df2 = spark.sql("""
select * from def
""")
df2.persist()
df3 = spark.sql("""
select * from mno
""")
dfNameList = []
for k, v in globals().items():
    if isinstance(v, DataFrame):
        # k is the variable name of the DF, v is the DF itself.
        if v.storageLevel.useMemory:
            dfNameList.append(k)

print(dfNameList)
Output:
['df', 'df2']
The idea:

- Loop through globals().items() to find all DataFrame instances;
- check storageLevel.useMemory to determine whether each DataFrame is persisted in memory;
- collect the names of the matching DataFrames and print them.

If you want the DataFrames themselves in the list instead of their names, just append v rather than k.
The output then looks like:

[DataFrame[fieldOne: typeOne, fieldTwo: typeTwo, ...], DataFrame[fieldOne: typeOne, fieldTwo: typeTwo, ...]]