XML#
- class onetl.file.format.xml.XML(*, rowTag: str, **kwargs)#
Based on Databricks Spark XML file format.
Supports reading/writing files with the .xml extension.
Warning
Due to a bug, written files currently do not have the .xml extension.
New in version 0.9.5.
Version compatibility
Spark versions: 3.2.x - 3.5.x.
Scala versions: 2.12 - 2.13
Java versions: 8 - 20
See documentation from link above.
Note
You can pass any option to the constructor, even if it is not mentioned in this documentation. Option names should be in camelCase!
The set of supported options depends on the Spark version. See link above.
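Since Spark expects option names in camelCase, a small helper can convert Pythonic snake_case names before passing them along. This is a hypothetical sketch — `to_camel` is not part of onetl:

```python
def to_camel(name: str) -> str:
    """Convert a snake_case option name to the camelCase form Spark expects."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

# e.g. "timestamp_format" becomes "timestampFormat"
print(to_camel("timestamp_format"))
```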
Warning
By default, reading is done using mode=PERMISSIVE, which replaces columns with a wrong data type or format with null values. Be careful while parsing values like timestamps; they should match the timestampFormat option. Using mode=FAILFAST will throw an exception instead of producing null values.
Examples
Read from / write to an XML file with specific options:
from onetl.file.format import XML
from pyspark.sql import SparkSession

# Create Spark session with XML package loaded
maven_packages = XML.get_packages(spark_version="3.5.0")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

xml = XML(row_tag="item")
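As the warning above notes, mode=FAILFAST makes malformed records raise an exception instead of silently becoming null. A sketch of the relevant options (names are standard Spark XML options in camelCase; whether each is honored depends on your Spark and package version):

```python
# Sketch: stricter read options. These could be passed as extra keyword
# arguments to the constructor, e.g.
#   XML(row_tag="item", mode="FAILFAST", timestampFormat="yyyy-MM-dd HH:mm:ss")
strict_options = {
    "mode": "FAILFAST",                        # raise on malformed records instead of null-filling
    "timestampFormat": "yyyy-MM-dd HH:mm:ss",  # must match the timestamps in the files
}
```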
- classmethod get_packages(spark_version: str | Version, scala_version: str | Version | None = None, package_version: str | Version | None = None) → list[str]#
Get package names to be downloaded by Spark.
- Parameters:
  - spark_version : str
    Spark version in format major.minor.patch.
  - scala_version : str, optional
    Scala version in format major.minor.
    If None, spark_version is used to determine the Scala version.
  - package_version : str, optional
    Package version in format major.minor.patch. Default is 0.17.0.
    See Maven index for the list of available versions.
Warning
Versions 0.13 and below are not supported.
Note
It is not guaranteed that custom package versions are supported. Tests are performed only for the default version.
Examples
from onetl.file.format import XML

XML.get_packages(spark_version="3.5.0")
XML.get_packages(spark_version="3.5.0", scala_version="2.12")
XML.get_packages(
    spark_version="3.5.0",
    scala_version="2.12",
    package_version="0.17.0",
)
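Since this format is based on Databricks spark-xml, the calls above presumably resolve to Maven coordinates of the form com.databricks:spark-xml_&lt;scala&gt;:&lt;version&gt;. A minimal sketch of that mapping — an assumption about the package naming, not onetl's actual implementation:

```python
def xml_maven_package(
    scala_version: str = "2.12",
    package_version: str = "0.17.0",
) -> list[str]:
    """Hypothetical mirror of get_packages: build the spark-xml Maven coordinate."""
    return [f"com.databricks:spark-xml_{scala_version}:{package_version}"]

print(xml_maven_package())  # ['com.databricks:spark-xml_2.12:0.17.0']
```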