Spark LocalFS#

class onetl.connection.file_df_connection.spark_local_fs.SparkLocalFS(*, spark: SparkSession)#

Spark connection to the local filesystem.

Based on Spark Generic File Data Source.

Warning

To use the SparkLocalFS connector, PySpark must be installed (or injected into sys.path) BEFORE creating the connector instance.

You can install PySpark as follows:

pip install onetl[spark]  # latest PySpark version

# or
pip install onetl pyspark==3.5.0  # install a specific PySpark version

See the Spark installation instructions for more details.

Warning

Currently supports only Spark sessions created with the option spark.master: local.

Note

Supports only reading files as Spark DataFrame and writing DataFrame to files.

Does NOT support file operations, like create, delete, rename, etc.

Parameters:
spark : pyspark.sql.SparkSession

Spark session

Examples

from onetl.connection import SparkLocalFS
from pyspark.sql import SparkSession

# create Spark session
spark = SparkSession.builder.master("local").appName("spark-app-name").getOrCreate()

# create connection
local_fs = SparkLocalFS(spark=spark).check()
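
As noted above, this connection only reads files into a Spark DataFrame and writes a DataFrame back to files. Below is a minimal usage sketch, assuming onETL's FileDFReader / FileDFWriter classes and the CSV file format; the paths are placeholders:

from onetl.file import FileDFReader, FileDFWriter
from onetl.file.format import CSV

# read CSV files from a local directory into a Spark DataFrame
reader = FileDFReader(
    connection=local_fs,
    format=CSV(delimiter=",", header=True),
    source_path="/source/path",
)
df = reader.run()

# write the DataFrame back to another local directory as CSV
writer = FileDFWriter(
    connection=local_fs,
    format=CSV(),
    target_path="/target/path",
)
writer.run(df)
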
check()#

Check source availability.

If the source is not available, an exception is raised.

Returns:
Connection itself
Raises:
RuntimeError

If the connection is not available

Examples

connection.check()
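
A minimal sketch of handling the failure case, assuming the local_fs connection created above:

# check() returns the connection itself on success
# and raises RuntimeError if the source is not available
try:
    local_fs.check()
except RuntimeError as error:
    print(f"Local filesystem is not available: {error}")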