Greenplum connection#
- class onetl.connection.db_connection.greenplum.connection.Greenplum(*, spark: SparkSession, user: str, password: SecretStr, host: Host, database: str, port: int = 5432, extra: GreenplumExtra = GreenplumExtra(tcpKeepAlive='true'))#
-
Based on package
io.pivotal:greenplum-spark:2.2.0
(VMware Greenplum connector for Spark).Warning
Before using this connector please take into account Prerequisites
- Parameters:
- hoststr
Host of Greenplum master. For example:
test.greenplum.domain.com
or193.168.1.17
- portint, default:
5432
Port of Greenplum master
- userstr
User, which have proper access to the database. For example:
some_user
- passwordstr
Password for database connection
- databasestr
Database in RDBMS, NOT schema.
See this page for more details
- spark
pyspark.sql.SparkSession
Spark session.
- extradict, default:
None
Specifies one or more extra parameters by which clients can connect to the instance.
For example:
{"tcpKeepAlive": "true", "server.port": "50000-65535"}
- Supported options are:
Properties from Greenplum connector for Spark documentation page, but only starting with
server.
orpool.
Examples
Greenplum connection initialization
from onetl.connection import Greenplum from pyspark.sql import SparkSession # Create Spark session with Greenplum connector loaded maven_packages = Greenplum.get_packages(spark_version="3.2") spark = ( SparkSession.builder.appName("spark-app-name") .config("spark.jars.packages", ",".join(maven_packages)) .config("spark.executor.allowSparkContext", "true") # IMPORTANT!!! # Set number of executors according to "Prerequisites" -> "Number of executors" .config("spark.dynamicAllocation.maxExecutors", 10) .config("spark.executor.cores", 1) .getOrCreate() ) # IMPORTANT!!! # Set port range of executors according to "Prerequisites" -> "Network ports" extra = { "server.port": "41000-42000", } # Create connection greenplum = Greenplum( host="master.host.or.ip", user="user", password="*****", database="target_database", extra=extra, spark=spark, )
- check()#
-
If not, an exception will be raised.
- Returns:
- Connection itself
- Raises:
- RuntimeError
If the connection is not available
Examples
connection.check()
- classmethod get_packages(*, scala_version: str | Version | None = None, spark_version: str | Version | None = None, package_version: str | Version | None = None) list[str] #
Get package names to be downloaded by Spark.
Warning
You should pass either
scala_version
orspark_version
.- Parameters:
- scala_versionstr, optional
Scala version in format
major.minor
.If
None
,spark_version
is used to determine Scala version.- spark_versionstr, optional
Spark version in format
major.minor
.Used only if
scala_version=None
.- package_versionstr, optional, default
2.2.0
Package version in format
major.minor.patch
Examples
from onetl.connection import Greenplum Greenplum.get_packages(scala_version="2.12") Greenplum.get_packages(spark_version="3.2", package_version="2.3.0")