Prerequisites

Version Compatibility

SQL Server versions: 2014 - 2022
Spark versions: 2.3.x - 3.5.x
Java versions: 8 - 20

See the official documentation and the official compatibility matrix.
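The version ranges above can be checked before starting a session. The sketch below is a minimal illustration. The helper names are hypothetical and are not part of onETL or PySpark. It assumes version strings in the usual forms, such as `pyspark.__version__` or `spark.version` for Spark (e.g. "3.5.1") and `java -version` output for Java (e.g. "1.8.0_392" or "17.0.9").

```python
# Hypothetical helpers (not onETL/PySpark API): compare version strings
# against the ranges listed above (Spark 2.3.x - 3.5.x, Java 8 - 20).

SUPPORTED_SPARK = ((2, 3), (3, 5))  # inclusive (major, minor) bounds
SUPPORTED_JAVA = (8, 20)            # inclusive major-version bounds


def spark_version_supported(version: str) -> bool:
    """Return True if a Spark version like '3.5.1' falls within 2.3.x - 3.5.x."""
    major, minor = (int(part) for part in version.split(".")[:2])
    low, high = SUPPORTED_SPARK
    return low <= (major, minor) <= high


def java_version_supported(version: str) -> bool:
    """Return True if a Java version like '1.8.0_392' or '17.0.9' is within 8 - 20."""
    parts = version.split(".")
    # Java 8 and older report themselves as '1.<major>...'
    major = int(parts[1]) if parts[0] == "1" else int(parts[0])
    low, high = SUPPORTED_JAVA
    return low <= major <= high
```

For example, `spark_version_supported("3.5.1")` is true, while `java_version_supported("21.0.1")` is false because Java 21 is outside the listed range.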

Installing PySpark

To use the MSSQL connector, you should have PySpark installed (or injected into sys.path) BEFORE creating the connector instance.

See the Spark installation instructions for more details.
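The "installed or injected into sys.path" requirement can be checked up front. The sketch below is a minimal illustration, and `ensure_pyspark` is a hypothetical name. If PySpark is not pip-installed, the sketch assumes an unpacked Spark distribution at `spark_home` (or `$SPARK_HOME`). It also assumes that distribution keeps its Python sources in `python/` and its bundled py4j archive at `python/lib/py4j-*.zip`.

```python
# Sketch: make sure `pyspark` is importable BEFORE creating the connector.
# Assumption: a Spark distribution layout of <spark_home>/python and
# <spark_home>/python/lib/py4j-*.zip when PySpark is not pip-installed.
import glob
import importlib
import importlib.util
import os
import sys
from typing import Optional


def ensure_pyspark(spark_home: Optional[str] = None) -> bool:
    """Return True if `pyspark` can be imported, injecting sys.path entries if needed."""
    if importlib.util.find_spec("pyspark") is not None:
        return True  # already installed, e.g. via `pip install pyspark`
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return False
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip"))
    for path in [python_dir, *py4j_zips]:
        if path not in sys.path:
            sys.path.insert(0, path)
    importlib.invalidate_caches()  # let the import system notice the new entries
    return importlib.util.find_spec("pyspark") is not None
```

Call it once at startup, before `from pyspark.sql import SparkSession` and before the connector is constructed.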

Connecting to MSSQL

Connection port

Connections are usually made to port 1433, the SQL Server default. The port may differ between MSSQL instances, so ask your MSSQL administrator for the correct value.
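Because the port can differ between instances, it can help to confirm that the value from your administrator is actually reachable. The sketch below does this with a plain TCP connection from the Python standard library. `port_is_reachable` is a hypothetical helper. A successful connection only proves that something is listening on the port, not that it is SQL Server.

```python
# Sketch: check that host:port accepts TCP connections before configuring the connector.
# 1433 is the SQL Server default; prefer the value your administrator gives you.
import socket

DEFAULT_MSSQL_PORT = 1433


def port_is_reachable(host: str, port: int = DEFAULT_MSSQL_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses
        return False
```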

Connection host

You can connect to MSSQL using either the DNS name of the host or its IP address.

If you’re using an MSSQL cluster, you can currently connect to only one specific node. Connecting to multiple nodes for load balancing, as well as automatic failover to a new master/replica, is not supported.
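A Spark JDBC connection to SQL Server is addressed by a Microsoft JDBC driver URL of the form `jdbc:sqlserver://host:port;databaseName=...`. The sketch below shows how both host rules above apply to such a URL: a DNS name or an IP address works equally well, but only a single host is accepted. The helpers are hypothetical illustrations, not the connector's actual API. IPv6 literals are deliberately left out of scope.

```python
# Hypothetical helpers, not part of onETL: they only illustrate the host rules above.
import ipaddress
from typing import Optional


def host_kind(host: str) -> str:
    """Return 'ip' for an IP address literal, 'dns' for anything else."""
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        return "dns"


def mssql_jdbc_url(host: str, port: int = 1433, database: Optional[str] = None) -> str:
    """Build a Microsoft JDBC driver URL for a single SQL Server host."""
    if any(sep in host for sep in (",", ";", " ")):
        # Several nodes in one string would imply load balancing, which is not supported
        raise ValueError(f"expected a single host, got {host!r}")
    if host_kind(host) == "ip" and ipaddress.ip_address(host).version == 6:
        raise ValueError("IPv6 literals are left out of this sketch")
    url = f"jdbc:sqlserver://{host}:{port}"
    if database:
        url += f";databaseName={database}"
    return url
```

For example, `mssql_jdbc_url("192.168.1.10", 1433, "mydb")` and `mssql_jdbc_url("mssql.example.com")` both produce a valid single-host URL, while `"node1,node2"` is rejected.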

Required grants

Ask your MSSQL cluster administrator to set the following grants for the user that will be used to create a connection.

Read + Write (schema is owned by user):

-- allow creating tables for user
GRANT CREATE TABLE TO username;

-- allow read & write access to a specific table
GRANT SELECT, INSERT ON username.mytable TO username;

-- only if if_exists="replace_entire_table" is used:
-- allow dropping/truncating this specific table
GRANT ALTER ON username.mytable TO username;

Read + Write (schema is not owned by user):

-- allow creating tables for user
GRANT CREATE TABLE TO username;

-- allow managing tables in a specific schema, and inserting data into tables
GRANT ALTER, SELECT, INSERT ON SCHEMA::someschema TO username;

Read only:

-- allow read access to a specific table
GRANT SELECT ON someschema.mytable TO username;
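Once the grants are applied, you can compare the user's effective table permissions against what each case above needs. SQL Server's built-in `fn_my_permissions` function lists the current user's effective permissions on a securable. The sketch below builds that query and diffs its result against a required set. The scenario keys are hypothetical labels for the cases above. Only table-level permissions are covered. `CREATE TABLE` is a database-level permission and would be checked separately, e.g. with `fn_my_permissions(NULL, 'DATABASE')`.

```python
# Sketch: report which table-level permissions from the grant lists above are missing.
# Scenario keys are hypothetical labels; CREATE TABLE (database-level) is not covered.
from typing import Set

REQUIRED = {
    "read": {"SELECT"},
    "write": {"SELECT", "INSERT"},
    "write_replace_entire_table": {"SELECT", "INSERT", "ALTER"},
}


def permissions_query(table: str) -> str:
    """SQL listing the current user's effective permissions on `table`.

    `table` is interpolated as-is, so only pass trusted names like 'schema.table'.
    """
    return f"SELECT permission_name FROM fn_my_permissions('{table}', 'OBJECT')"


def missing_permissions(granted: Set[str], scenario: str) -> Set[str]:
    """Return required permissions for `scenario` that are absent from `granted`."""
    return REQUIRED[scenario] - {p.upper() for p in granted}
```

Run the query with any SQL client as the connection user, collect the `permission_name` values into a set, and pass that set to `missing_permissions`. An empty result means the table-level grants for that case are in place.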

More details can be found in the official documentation: