Reading from Hive
There are two ways to read data from Hive in a distributed manner:
- Using DBReader with different Read Strategies
- Using Hive.sql
Hive.sql(query: str) -> DataFrame
Lazily execute a SELECT statement and return a DataFrame. Same as spark.sql(query).
- Parameters:
- query : str
  SQL query to be executed, like:
  - SELECT ... FROM ...
  - WITH ... AS (...) SELECT ... FROM ...
  - SHOW ... queries are also supported, like SHOW TABLES
- Returns:
- df : pyspark.sql.dataframe.DataFrame
  Spark dataframe
Examples
Read data from Hive table:
from onetl.connection import Hive

# spark is an existing SparkSession
connection = Hive(cluster="rnd-dwh", spark=spark)
df = connection.sql("SELECT * FROM mytable")
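
The statement kinds listed under Parameters (SELECT, WITH, SHOW) can be checked before a query string is passed to Hive.sql. The following is a minimal sketch, not part of the library: `statement_kind`, `is_documented_kind` and `DOCUMENTED_KINDS` are hypothetical names introduced here for illustration, and the keyword list only mirrors the kinds named above, so it may not be exhaustive.

```python
import re

# Leading keywords of the statement kinds named in the Parameters description.
# Assumption: this tuple is illustrative only and is not defined by the library.
DOCUMENTED_KINDS = ("SELECT", "WITH", "SHOW")


def statement_kind(query: str) -> str:
    """Return the upper-cased leading keyword of a SQL statement (hypothetical helper)."""
    match = re.match(r"\s*([A-Za-z]+)", query)
    if match is None:
        raise ValueError("query does not start with a SQL keyword")
    return match.group(1).upper()


def is_documented_kind(query: str) -> bool:
    """Return True if the query starts with SELECT, WITH or SHOW."""
    return statement_kind(query) in DOCUMENTED_KINDS
```

A caller could run `is_documented_kind(query)` before `connection.sql(query)` to catch an unexpected statement early. The sketch looks only at the first keyword, so a query that begins with a SQL comment raises `ValueError` rather than being classified.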