Prerequisites#
Version Compatibility#
* MongoDB server versions: 4.0 or higher
* Spark versions: 3.2.x - 3.4.x
* Scala versions: 2.12 - 2.13
* Java versions: 8 - 20
Installing PySpark#
To use the MongoDB connector, you should have PySpark installed (or injected into sys.path)
BEFORE creating the connector instance.
See the Spark installation instructions for more details.
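A quick sanity check of this requirement can be done at runtime. The helper below is a sketch of our own (not part of the connector API); it only verifies that PySpark is importable before you construct the connection:

```python
import importlib.util


def ensure_pyspark_available() -> bool:
    """Return True if PySpark can be imported, False otherwise."""
    return importlib.util.find_spec("pyspark") is not None


# Call this before constructing MongoDB(...);
# install PySpark first (e.g. `pip install pyspark`) if it returns False.
```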
Connecting to MongoDB#
Connection host#
You can connect to a MongoDB host using either its DNS name or its IP address.
It is also possible to connect to a MongoDB sharded cluster:
from onetl.connection import MongoDB

mongo = MongoDB(
    host="master.host.or.ip",
    user="user",
    password="*****",
    database="target_database",
    spark=spark,
    extra={
        # read data from secondary cluster node, switch to primary if not available
        "readPreference": "secondaryPreferred",
    },
)
Supported readPreference values are described in the official MongoDB documentation.
Connection port#
Connections are usually made to port 27017, the MongoDB default.
The port may differ between MongoDB instances.
Please ask your MongoDB administrator to provide the required information.
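Before creating the connection, it can be useful to confirm the host and port are reachable at all. The helper below is our own illustration (not part of the connector) and uses only the standard library:

```python
import socket


def port_open(host: str, port: int = 27017, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: port_open("master.host.or.ip") checks the default MongoDB port 27017.
```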
Required grants#
Ask your MongoDB cluster administrator to grant the following roles to the user used for creating the connection:
// allow writing data to specific database
db.grantRolesToUser("username", [{db: "somedb", role: "readWrite"}])
// allow reading data from specific database
db.grantRolesToUser("username", [{db: "somedb", role: "read"}])