Prerequisites#

Version Compatibility#

  • Oracle Server versions: 23, 21, 19, 18, 12.2 and __probably__ 11.2 (tested, but it’s not mentioned in official docs).

  • Spark versions: 2.3.x - 3.5.x

  • Java versions: 8 - 20

See official documentation.

Installing PySpark#

To use Oracle connector you should have PySpark installed (or injected to sys.path) BEFORE creating the connector instance.

See Spark installation instruction for more details.

Connecting to Oracle#

Connection port#

Connection is usually performed to port 1521. Port may differ for different Oracle instances. Please ask your Oracle administrator to provide required information.

Connection host#

It is possible to connect to Oracle by using either DNS name of host or it’s IP address.

If you’re using Oracle cluster, it is currently possible to connect only to one specific node. Connecting to multiple nodes to perform load balancing, as well as automatic failover to new master/replica are not supported.

Connect as proxy user#

It is possible to connect to database as another user without knowing this user password.

This can be enabled by granting user a special CONNECT THROUGH permission:

ALTER USER schema_owner GRANT CONNECT THROUGH proxy_user;

Then you can connect to Oracle using credentials of proxy_user but specify that you need permissions of schema_owner:

oracle = Oracle(
    ...,
    user="proxy_user[schema_owner]",
    password="proxy_user password",
)

See official documentation.

Required grants#

Ask your Oracle cluster administrator to set following grants for a user, used for creating a connection:

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow creating tables in user schema
GRANT CREATE TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON username.mytable TO username;
More details can be found in official documentation: