Kerberos support#

Most Hadoop instances are set up with Kerberos support, so some connections require additional setup to work properly.

  • HDFS uses requests-kerberos and GSSAPI for authentication. It also uses the kinit executable to generate a Kerberos ticket.

  • Hive and SparkHDFS require a Kerberos ticket to exist before creating a Spark session.
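
A ticket is usually obtained with kinit before starting the application. A minimal sketch, where the keytab path and principal name are placeholders you should replace with values for your environment:

```shell
# Obtain a Kerberos ticket using a keytab (path and principal are
# examples; ask your Kerberos administrator for the real values)
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

# Check that the ticket cache now contains a valid ticket
klist
```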

So you need to install OS packages providing:

  • krb5 libraries

  • krb5 development headers

  • gcc or another C compiler

The exact installation instructions depend on your OS; here are some examples:

dnf install krb5-devel gcc  # CentOS, OracleLinux
apt install libkrb5-dev gcc  # Debian-based

You should also install the kerberos extra to get the required Python packages:

pip install onetl[kerberos]
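
As a quick sanity check (assuming the OS packages and the extra above installed successfully), you can verify that the Python bindings built correctly against the system krb5 libraries. The module names below are the ones shipped by the gssapi and requests-kerberos packages:

```shell
# Both modules are compiled C extensions; a clean exit (no
# ImportError) means the extra was built and installed correctly
python -c "import gssapi, requests_kerberos"
```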