Spark LocalFS

Bases: SparkFileDFConnection

Spark connection to local filesystem. Supports hooks.

Based on Spark Generic File Data Source.

Warning

To use the SparkLocalFS connector you should have PySpark installed (or injected into sys.path) BEFORE creating the connector instance.

See the Spark installation instructions for more details.

Warning

Currently supports only Spark sessions created with option spark.master: local.
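
The restriction above can be checked up front before creating the connection. The following is a minimal sketch: `is_local_master` is a hypothetical helper, not part of onETL, and it assumes that the local master URL forms documented by Spark (`local`, `local[K]`, `local[*]`, `local[K,F]`, `local[*,F]`) are the ones the connector accepts. How the connector itself validates the option is not described here.

```python
import re

# Hypothetical helper, not part of onETL. The pattern covers the local master
# URL forms documented by Spark: local, local[K], local[*], local[K,F], local[*,F].
_LOCAL_MASTER = re.compile(r"local(\[(\d+|\*)(,\s*\d+)?\])?")


def is_local_master(master: str) -> bool:
    """Return True if the given spark.master value runs Spark in local mode."""
    return _LOCAL_MASTER.fullmatch(master.strip()) is not None
```

With an existing session, the value to test can be obtained with `spark.conf.get("spark.master")`.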

Note

Supports only reading files as Spark DataFrame and writing DataFrame to files.

Does NOT support file operations, like create, delete, rename, etc.
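
As an illustration of the read and write path, here is a minimal sketch that combines the connection with onETL's `FileDFReader` and `FileDFWriter`. The function name `copy_csv`, the CSV format choice, and both paths are assumptions made for the example; no schema is passed, so Spark falls back to its default column names and types.

```python
def copy_csv(spark, source_path: str, target_path: str):
    """Read CSV files from source_path into a DataFrame and write it to target_path."""
    # Imported inside the function so this module can be loaded
    # before PySpark and onETL are importable.
    from onetl.connection import SparkLocalFS
    from onetl.file import FileDFReader, FileDFWriter
    from onetl.file.format import CSV

    local_fs = SparkLocalFS(spark=spark).check()
    csv = CSV(delimiter=",")

    # Read every CSV file under source_path as a single DataFrame
    reader = FileDFReader(connection=local_fs, format=csv, source_path=source_path)
    df = reader.run()

    # Write the DataFrame back as CSV files under target_path
    writer = FileDFWriter(connection=local_fs, format=csv, target_path=target_path)
    writer.run(df)
    return df
```

Operations such as deleting or renaming the resulting files are outside the scope of this connection and have to be done by other means.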

Added in 0.9.0

Parameters:

  • spark (SparkSession) –

    Spark session

Examples:

from onetl.connection import SparkLocalFS
from pyspark.sql import SparkSession

# create Spark session
spark = SparkSession.builder.master("local").appName("spark-app-name").getOrCreate()

# create connection
local_fs = SparkLocalFS(spark=spark).check()

check()

Check source availability. Supports hooks.

If the source is not available, an exception is raised.

Returns:

  • Self

    Connection itself

Raises:

  • RuntimeError

    If the connection is not available

Examples:

connection.check()
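
Because `check()` raises `RuntimeError` on failure, callers that prefer a boolean result can wrap it. This is a sketch only: `is_available` is a hypothetical helper, not part of onETL, and it works with any object that exposes a `check()` method.

```python
import logging

log = logging.getLogger(__name__)


def is_available(connection) -> bool:
    """Return True if connection.check() succeeds, False if it raises RuntimeError."""
    try:
        connection.check()
    except RuntimeError:
        # Log the traceback and report the connection as unavailable
        log.exception("Connection %r is not available", connection)
        return False
    return True
```

Other exception types are deliberately not caught, so unexpected failures still surface to the caller.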