Reading from Hive

There are two ways to read data from Hive in a distributed manner:

Hive.sql(query: str) → DataFrame

Lazily execute a SELECT statement and return a DataFrame.

Same as spark.sql(query).

Parameters:
query : str

SQL query to be executed, like:

  • SELECT ... FROM ...

  • WITH ... AS (...) SELECT ... FROM ...

  • SHOW ... queries are also supported, like SHOW TABLES
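As a rough illustration of the statement kinds listed above, the sketch below classifies a query by its leading keyword before it is handed to Hive.sql. The helpers leading_keyword and looks_readable are hypothetical and not part of onETL; whether onETL itself performs any such check is not stated here. The check is deliberately naive: it does not handle leading SQL comments, and the table and column names are placeholders.

```python
import re

# Hypothetical helper names, NOT part of onETL: a naive pre-check that mirrors
# the statement kinds listed in the parameter description.
SUPPORTED_KEYWORDS = ("SELECT", "WITH", "SHOW")


def leading_keyword(query: str) -> str:
    """Return the first word of the query, upper-cased ("" if there is none)."""
    match = re.match(r"\s*([A-Za-z]+)", query)
    return match.group(1).upper() if match else ""


def looks_readable(query: str) -> bool:
    """True if the query starts with SELECT, WITH or SHOW (case-insensitive)."""
    return leading_keyword(query) in SUPPORTED_KEYWORDS


# Placeholder queries; "mytable" and its columns are assumptions for illustration.
queries = [
    "SELECT id, name FROM mytable WHERE id > 100",
    "WITH recent AS (SELECT * FROM mytable WHERE id > 100) SELECT count(*) FROM recent",
    "SHOW TABLES",
    "DROP TABLE mytable",
]

for q in queries:
    print(leading_keyword(q), looks_readable(q))
```

A query that passes this check could then be passed as connection.sql(query); the last query in the list is rejected because it is not a read statement.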

Returns:
df : pyspark.sql.dataframe.DataFrame

Spark dataframe

Examples

Read data from Hive table:

from onetl.connection import Hive

# `spark` is an existing SparkSession with Hive support enabled
connection = Hive(cluster="rnd-dwh", spark=spark)

df = connection.sql("SELECT * FROM mytable")