Excel
class onetl.file.format.excel.Excel(*, header: bool = False, **kwargs)
Based on Spark Excel file format.
Supports reading/writing files with .xlsx (read/write) and .xls (read only) extensions.
New in version 0.9.4.
Version compatibility
Spark versions: 3.2.x - 3.5.x.
Warning
Not all combinations of Spark version and package version are supported. See Maven index and official documentation.
Scala versions: 2.12 - 2.13
Java versions: 8 - 20
See the official documentation referenced above.
Note
You can pass any option to the constructor, even if it is not mentioned in this documentation. Option names should be in camelCase!
The set of supported options depends on the Spark version. See the link above.
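As an illustration of the camelCase requirement, here is a minimal sketch that converts Python-style option names before they are passed as keyword arguments. The helper `to_camel_case` is hypothetical and not part of onETL. The option names `dataAddress` and `treatEmptyValuesAsNulls` are taken from the Spark Excel package; whether they are accepted depends on the package and Spark version in use.

```python
def to_camel_case(option_name: str) -> str:
    """Convert a snake_case name to camelCase (hypothetical helper, not part of onETL)."""
    head, *tail = option_name.split("_")
    return head + "".join(word.capitalize() for word in tail)


# Options written in Python style, for illustration only.
python_style_options = {
    "header": True,
    "data_address": "'Sheet1'!A1",
    "treat_empty_values_as_nulls": True,
}

# The same options with camelCase names, the form the constructor expects.
excel_options = {to_camel_case(name): value for name, value in python_style_options.items()}

print(sorted(excel_options))
# ['dataAddress', 'header', 'treatEmptyValuesAsNulls']

# With onETL installed, these could then be passed as: Excel(**excel_options)
```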
Examples
Describe how to read from or write to an Excel file with specific options:
from onetl.file.format import Excel
from pyspark.sql import SparkSession

# Create Spark session with Excel package loaded
maven_packages = Excel.get_packages(spark_version="3.5.0")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

excel = Excel(
    header=True,
    inferSchema=True,
)
classmethod get_packages(spark_version: str | Version, scala_version: str | Version | None = None, package_version: str | Version | None = None) -> list[str]
Get package names to be downloaded by Spark.
Warning
Not all combinations of Spark version and package version are supported. See Maven index and official documentation.
Parameters:
- spark_version : str
  Spark version in format major.minor.patch.
- scala_version : str, optional
  Scala version in format major.minor.
  If None, spark_version is used to determine Scala version.
- package_version : str, optional
  Package version in format major.minor.patch. Default is 0.20.3.
  Warning
  Versions 0.14 and below are not supported.
  Note
  It is not guaranteed that custom package versions are supported. Tests are performed only for the default version.
Examples
from onetl.file.format import Excel

Excel.get_packages(spark_version="3.5.0")
Excel.get_packages(spark_version="3.5.0", scala_version="2.13")
Excel.get_packages(
    spark_version="3.5.0",
    scala_version="2.13",
    package_version="0.20.3",
)
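To make the relationship between the three parameters concrete, the sketch below shows how they might combine into a Maven coordinate. This is not onETL's implementation. The function `sketch_excel_package` is hypothetical, the coordinate layout `com.crealytics:spark-excel_<scala>:<spark>_<package>` is an assumption based on how the Spark Excel package is published on Maven, and the Scala 2.12 fallback is an assumption about default Spark 3.x builds.

```python
from __future__ import annotations


def sketch_excel_package(
    spark_version: str,
    scala_version: str | None = None,
    package_version: str = "0.20.3",
) -> list[str]:
    """Illustrative only: build a Maven coordinate from the three versions."""
    if len(spark_version.split(".")) != 3:
        raise ValueError("spark_version must be in format major.minor.patch")

    if scala_version is None:
        # Assumption: Spark 3.x distributions are built with Scala 2.12 by default.
        scala_version = "2.12"

    major_minor = tuple(int(part) for part in package_version.split(".")[:2])
    if major_minor <= (0, 14):
        # Mirrors the warning above: versions 0.14 and below are not supported.
        raise ValueError(f"Package version {package_version} is not supported")

    # Assumed coordinate layout: group:artifact_scala:spark_package
    return [f"com.crealytics:spark-excel_{scala_version}:{spark_version}_{package_version}"]


print(sketch_excel_package("3.5.0"))
# ['com.crealytics:spark-excel_2.12:3.5.0_0.20.3']
print(sketch_excel_package("3.5.0", scala_version="2.13"))
# ['com.crealytics:spark-excel_2.13:3.5.0_0.20.3']
```

The returned list has the same shape as the value joined with "," for spark.jars.packages in the session example earlier on this page. In real code, prefer Excel.get_packages, which is the tested entry point.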