XML#

class onetl.file.format.xml.XML(*, rowTag: str, **kwargs)#

XML file format.

Based on Databricks Spark XML file format.

Supports reading/writing files with .xml extension.

Warning

Due to a bug, written files currently do not have the .xml extension.

New in version 0.9.5.

Version compatibility
  • Spark versions: 3.2.x - 3.5.x.

  • Scala versions: 2.12 - 2.13

  • Java versions: 8 - 20

See the documentation at the link above.

Note

You can pass any option to the constructor, even if it is not mentioned in this documentation. Option names should be in camelCase!

The set of supported options depends on the Spark version. See the link above.
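
For example, spark-xml options such as attributePrefix or nullValue can be passed as keyword arguments (a sketch; these particular options and values are illustrative):

from onetl.file.format import XML

# any spark-xml option can be passed as a camelCase keyword argument;
# these particular options and values are illustrative
xml = XML(
    rowTag="item",
    attributePrefix="_",
    nullValue="",
)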

Warning

By default, reading is done using mode=PERMISSIVE, which replaces values of the wrong data type or format with null. Be careful while parsing values like timestamps: they should match the timestampFormat option. Using mode=FAILFAST will throw an exception instead of producing null values.
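
For instance, to fail on malformed records instead of silently producing nulls, and to pin the expected timestamp format, both options can be passed to the constructor (a sketch; the option values are illustrative):

from onetl.file.format import XML

# mode and timestampFormat are passed through to spark-xml.
# FAILFAST raises an exception on malformed rows instead of
# replacing unparsable values with null (illustrative values).
xml = XML(
    rowTag="item",
    mode="FAILFAST",
    timestampFormat="yyyy-MM-dd HH:mm:ss",
)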

Examples

Describe options for reading from / writing to an XML file:

from onetl.file.format import XML
from pyspark.sql import SparkSession

# Create Spark session with XML package loaded
maven_packages = XML.get_packages(spark_version="3.5.0")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

xml = XML(rowTag="item")
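
The resulting xml object is then passed as the format to a file reader. A minimal sketch using onetl's FileDFReader with a SparkLocalFS connection (the connection type and source path here are assumptions for illustration):

from onetl.connection import SparkLocalFS
from onetl.file import FileDFReader

# a minimal sketch: read XML files from a local directory.
# SparkLocalFS and the source path are illustrative assumptions.
local_fs = SparkLocalFS(spark=spark)

reader = FileDFReader(
    connection=local_fs,
    source_path="/data/xml",
    format=xml,
)
df = reader.run()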
classmethod get_packages(spark_version: str | Version, scala_version: str | Version | None = None, package_version: str | Version | None = None) → list[str]#

Get package names to be downloaded by Spark.

Parameters:

spark_version : str
    Spark version in format major.minor.patch.

scala_version : str, optional
    Scala version in format major.minor.

    If None, spark_version is used to determine the Scala version.

package_version : str, optional
    Package version in format major.minor.patch. Default is 0.17.0.

    See the Maven index for the list of available versions.

Warning

Versions 0.13 and below are not supported.

Note

It is not guaranteed that custom package versions are supported. Tests are performed only for the default version.

Examples

from onetl.file.format import XML

XML.get_packages(spark_version="3.5.0")
XML.get_packages(spark_version="3.5.0", scala_version="2.12")
XML.get_packages(
    spark_version="3.5.0",
    scala_version="2.12",
    package_version="0.17.0",
)
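
Each call returns a list of Maven coordinates to be downloaded by Spark, for example (the exact coordinate depends on the resolved Scala and package versions):

# e.g. ['com.databricks:spark-xml_2.12:0.17.0']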