Avro
- class onetl.file.format.avro.Avro(*, avroSchema: Dict | None = None, avroSchemaUrl: str | None = None, **kwargs)
Based on the Spark Avro file format.
Supports reading/writing files with the .avro extension.

Version compatibility
- Spark versions: 2.4.x - 3.5.x
- Java versions: 8 - 20
- Scala versions: 2.11 - 2.13

See documentation from link above.
Note
You can pass any option to the constructor, even if it is not mentioned in this documentation. Option names should be in camelCase!
The set of supported options depends on the Spark version. See link above.
Examples
Describe how to read from / write to an Avro file with specific options:

from onetl.file.format import Avro
from pyspark.sql import SparkSession

# Create Spark session with Avro package loaded
maven_packages = Avro.get_packages(spark_version="3.5.0")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .getOrCreate()
)

# Describe file format
schema = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string"}],
}
avro = Avro(schema_dict=schema, compression="snappy")
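The schema dict passed above follows the standard Avro record-schema layout, which is plain JSON. A minimal stdlib-only sketch (no Spark or onetl required) showing that the schema can be serialized, stored, and inspected like any JSON document:

```python
import json

# The same Avro record schema used in the example above.
schema = {
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string"}],
}

# Avro schemas are JSON documents, so they can be kept in version
# control or config files and loaded back before building the format.
schema_json = json.dumps(schema)
parsed = json.loads(schema_json)

field_names = [f["name"] for f in parsed["fields"]]
print(parsed["type"], parsed["name"], field_names)
# -> record Person ['name']
```

A schema loaded this way can be passed to the constructor exactly like the inline dict in the example.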
- classmethod get_packages(spark_version: str | Version, scala_version: str | Version | None = None) -> list[str]
Get package names to be downloaded by Spark.
See Maven package index for all available packages.
- Parameters:
  - spark_version : str
    Spark version in format major.minor.patch.
  - scala_version : str, optional
    Scala version in format major.minor.
    If None, spark_version is used to determine the Scala version.
Examples

from onetl.file.format import Avro

Avro.get_packages(spark_version="3.2.4")
Avro.get_packages(spark_version="3.2.4", scala_version="2.13")
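get_packages returns Maven coordinates for Spark's spark-avro artifact. A hedged, stdlib-only sketch of how such a coordinate could be assembled; the function name and the default Scala-version mapping here are assumptions for illustration, not onetl's actual implementation:

```python
from __future__ import annotations


def avro_package(spark_version: str, scala_version: str | None = None) -> list[str]:
    """Hypothetical re-implementation for illustration only; the real
    get_packages may validate versions and handle more cases."""
    major, _minor, _patch = spark_version.split(".")
    if scala_version is None:
        # Assumption: Spark 2.x builds default to Scala 2.11, Spark 3.x to 2.12.
        scala_version = "2.11" if major == "2" else "2.12"
    # spark-avro artifacts are published as org.apache.spark:spark-avro_<scala>:<spark>
    return [f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"]


print(avro_package("3.2.4"))
# -> ['org.apache.spark:spark-avro_2.12:3.2.4']
print(avro_package("3.2.4", scala_version="2.13"))
# -> ['org.apache.spark:spark-avro_2.13:3.2.4']
```

The resulting coordinates are what gets joined into spark.jars.packages in the session-creation example above.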