Greenplum connection#

class onetl.connection.db_connection.greenplum.connection.Greenplum(*, spark: SparkSession, user: str, password: SecretStr, host: Host, database: str, port: int = 5432, extra: GreenplumExtra = GreenplumExtra(tcpKeepAlive='true'))#

Greenplum connection. support_hooks

Based on package io.pivotal:greenplum-spark:2.2.0 (VMware Greenplum connector for Spark).

Warning

Before using this connector please take into account Prerequisites

Parameters:
hoststr

Host of Greenplum master. For example: test.greenplum.domain.com or 193.168.1.17

portint, default: 5432

Port of Greenplum master

userstr

User, which have proper access to the database. For example: some_user

passwordstr

Password for database connection

databasestr

Database in RDBMS, NOT schema.

See this page for more details

sparkpyspark.sql.SparkSession

Spark session.

extradict, default: None

Specifies one or more extra parameters by which clients can connect to the instance.

For example: {"tcpKeepAlive": "true", "server.port": "50000-65535"}

Supported options are:

Examples

Greenplum connection initialization

from onetl.connection import Greenplum
from pyspark.sql import SparkSession

# Create Spark session with Greenplum connector loaded
maven_packages = Greenplum.get_packages(spark_version="3.2")
spark = (
    SparkSession.builder.appName("spark-app-name")
    .config("spark.jars.packages", ",".join(maven_packages))
    .config("spark.executor.allowSparkContext", "true")
    # IMPORTANT!!!
    # Set number of executors according to "Prerequisites" -> "Number of executors"
    .config("spark.dynamicAllocation.maxExecutors", 10)
    .config("spark.executor.cores", 1)
    .getOrCreate()
)

# IMPORTANT!!!
# Set port range of executors according to "Prerequisites" -> "Network ports"
extra = {
    "server.port": "41000-42000",
}

# Create connection
greenplum = Greenplum(
    host="master.host.or.ip",
    user="user",
    password="*****",
    database="target_database",
    extra=extra,
    spark=spark,
)
check()#

Check source availability. support_hooks

If not, an exception will be raised.

Returns:
Connection itself
Raises:
RuntimeError

If the connection is not available

Examples

connection.check()
classmethod get_packages(*, scala_version: str | Version | None = None, spark_version: str | Version | None = None, package_version: str | Version | None = None) list[str]#

Get package names to be downloaded by Spark. support_hooks

Warning

You should pass either scala_version or spark_version.

Parameters:
scala_versionstr, optional

Scala version in format major.minor.

If None, spark_version is used to determine Scala version.

spark_versionstr, optional

Spark version in format major.minor.

Used only if scala_version=None.

package_versionstr, optional, default 2.2.0

Package version in format major.minor.patch

Examples

from onetl.connection import Greenplum

Greenplum.get_packages(scala_version="2.12")
Greenplum.get_packages(spark_version="3.2", package_version="2.3.0")