Excel

class onetl.file.format.excel.Excel(*, header: bool = False, **kwargs)

Excel file format.

Based on the Spark Excel file format.

Supports files with the .xlsx (read/write) and .xls (read only) extensions.
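The read/write split by extension can be expressed as a small lookup table. This is a hypothetical helper (`EXCEL_EXTENSION_MODES` and `supports` are not part of onETL), shown only to make the rule above concrete:

```python
from pathlib import PurePath

# Hypothetical table mirroring the rule above:
# .xlsx files can be read and written, .xls files can only be read.
EXCEL_EXTENSION_MODES = {
    ".xlsx": {"read", "write"},
    ".xls": {"read"},
}


def supports(path: str, mode: str) -> bool:
    """Return True if a file with this extension supports `mode` ("read" or "write")."""
    suffix = PurePath(path).suffix.lower()
    return mode in EXCEL_EXTENSION_MODES.get(suffix, set())


print(supports("/data/report.xlsx", "write"))  # True
print(supports("/data/legacy.xls", "write"))   # False
```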

New in version 0.9.4.

Version compatibility
  • Spark versions: 3.2.x - 3.5.x

    Warning

    Not all combinations of Spark version and package version are supported. See the Maven index and the official documentation.

  • Scala versions: 2.12 - 2.13

  • Java versions: 8 - 20

See the Spark Excel documentation linked above.
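As a rough illustration, the documented ranges can be checked before creating a Spark session. This is a minimal sketch, assuming Spark and Scala versions are given as strings like "3.5.0" and "2.12" and Java as its major number; `check_compatibility` is hypothetical, not an onETL function, and passing it does not guarantee that a matching package version exists (see the warning above).

```python
from __future__ import annotations

# Documented ranges (inclusive); Spark and Scala are compared on major.minor only.
SPARK_RANGE = ((3, 2), (3, 5))
SCALA_RANGE = ((2, 12), (2, 13))
JAVA_RANGE = (8, 20)


def _major_minor(version: str) -> tuple[int, int]:
    """Turn "3.5.0" or "2.12" into (3, 5) or (2, 12)."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def check_compatibility(spark_version: str, scala_version: str, java_major: int) -> list[str]:
    """Return a list of problems; an empty list means every version is in range."""
    problems = []
    if not SPARK_RANGE[0] <= _major_minor(spark_version) <= SPARK_RANGE[1]:
        problems.append(f"Spark {spark_version} is outside 3.2.x - 3.5.x")
    if not SCALA_RANGE[0] <= _major_minor(scala_version) <= SCALA_RANGE[1]:
        problems.append(f"Scala {scala_version} is outside 2.12 - 2.13")
    if not JAVA_RANGE[0] <= java_major <= JAVA_RANGE[1]:
        problems.append(f"Java {java_major} is outside 8 - 20")
    return problems


print(check_compatibility("3.5.0", "2.12", 17))  # []
print(check_compatibility("3.1.3", "2.12", 21))  # reports Spark 3.1.3 and Java 21
```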

Note

You can pass any option to the constructor, even if it is not mentioned in this documentation. Option names should be in camelCase!

The set of supported options depends on the Spark version. See the link above.
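If option names are kept in snake_case elsewhere in your code, they need converting before being passed on. A minimal sketch with hypothetical helpers (`to_camel_case` and `camel_case_options` are not part of onETL); `dataAddress` is used only as an example of a spark-excel option name:

```python
def to_camel_case(name: str) -> str:
    """Convert "infer_schema" to "inferSchema"; names without "_" are returned unchanged."""
    first, *rest = name.split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def camel_case_options(**options) -> dict:
    """Rename every keyword argument to camelCase, keeping its value."""
    return {to_camel_case(key): value for key, value in options.items()}


options = camel_case_options(infer_schema=True, data_address="'Sheet1'!A1")
print(options)  # {'inferSchema': True, 'dataAddress': "'Sheet1'!A1"}
```

The resulting dictionary could then be unpacked into the constructor, for example `Excel(header=True, **options)`.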

Examples

Describe how to read from or write to an Excel file with specific options:

from onetl.file.format import Excel
from pyspark.sql import SparkSession

# Create Spark session with Excel package loaded
maven_packages = Excel.get_packages(spark_version="3.5.0")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

excel = Excel(
    header=True,
    inferSchema=True,
)

classmethod get_packages(spark_version: str | Version, scala_version: str | Version | None = None, package_version: str | Version | None = None) → list[str]

Get package names to be downloaded by Spark.

Warning

Not all combinations of Spark version and package version are supported. See the Maven index and the official documentation.

Parameters:
spark_version : str

Spark version in the format major.minor.patch.

scala_version : str, optional

Scala version in the format major.minor.

If None, spark_version is used to determine the Scala version.

package_version : str, optional

Package version in the format major.minor.patch. Default is 0.20.3.

Warning

Versions 0.14 and below are not supported.

Note

Custom package versions are not guaranteed to be supported. Tests are performed only for the default version.
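To make the parameter rules concrete, here is a sketch of how the three versions could be combined into a Maven coordinate. It assumes the `com.crealytics:spark-excel_<scala>:<spark>_<package>` coordinate layout used by the spark-excel project and a Scala 2.12 default for Spark 3.x builds; `excel_packages_sketch` is hypothetical, and the real `Excel.get_packages` may validate or format its result differently.

```python
from __future__ import annotations


def _parse(version: str) -> tuple[int, ...]:
    """Turn "3.5.0" into (3, 5, 0) for simple numeric comparisons."""
    return tuple(int(part) for part in version.split("."))


def excel_packages_sketch(
    spark_version: str,
    scala_version: str | None = None,
    package_version: str = "0.20.3",
) -> list[str]:
    spark = _parse(spark_version)
    if len(spark) != 3:
        raise ValueError("spark_version must be in format major.minor.patch")

    if scala_version is None:
        # Assumption: Spark 3.x builds default to Scala 2.12.
        if spark[0] != 3:
            raise ValueError(f"No default Scala version known for Spark {spark_version}")
        scala_version = "2.12"
    scala = ".".join(scala_version.split(".")[:2])  # keep major.minor only

    # Mirrors the warning above: versions 0.14 and below are rejected.
    if _parse(package_version) < (0, 15):
        raise ValueError(f"Package version {package_version} is not supported")

    # Assumed coordinate layout: com.crealytics:spark-excel_<scala>:<spark>_<package>
    return [f"com.crealytics:spark-excel_{scala}:{spark_version}_{package_version}"]


print(excel_packages_sketch("3.5.0"))
print(excel_packages_sketch("3.5.0", scala_version="2.13"))
```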

Examples

from onetl.file.format import Excel

Excel.get_packages(spark_version="3.5.0")
Excel.get_packages(spark_version="3.5.0", scala_version="2.13")
Excel.get_packages(
    spark_version="3.5.0",
    scala_version="2.13",
    package_version="0.20.3",
)