The Apache Hive ™ data warehouse software facilitates querying and managing large data sets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
So, today we will look at the installation and configuration of Hive, and explore the advantages of running SQL-like queries on the Hadoop platform.
To install Hive, make sure the Hadoop instances are running on your cluster. If not, get that done first!!
Download Hive from the Apache Hive downloads page.
First, extract the downloaded hive-&lt;version&gt;.tar.gz archive.
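The extraction step looks roughly like this (the exact file name depends on the release you downloaded — the version placeholder below is just that, a placeholder):

```shell
# Unpack the release tarball; this creates a hive-<version> directory
# next to the archive.
tar -xzvf hive-<version>.tar.gz
```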
Now, go to the Hive directory:
Now run the following commands one by one:
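The exact commands aren't listed here, but a typical sequence at this step tells Hive where to find your Hadoop installation (the /usr/local/hadoop path below is an assumption — adjust it to your setup):

```shell
# Hive locates HDFS and the job runner through HADOOP_HOME,
# so export it before going any further:
export HADOOP_HOME=/usr/local/hadoop
```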
Now, create the /tmp and /user/hive/warehouse directories in HDFS.
For that, go to the Hadoop directory:
And, run the following commands:
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
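One note on the mkdir commands above: if the parent directory /user/hive does not exist yet, Hadoop 2.x and later will refuse to create the warehouse directory in one step unless you pass -p:

```shell
# -p creates missing parent directories (like the Unix mkdir -p),
# so /user and /user/hive are created along with the warehouse dir.
bin/hadoop fs -mkdir -p /user/hive/warehouse
```

The g+w permission matters because Hive writes table data under the warehouse directory, so the group running Hive needs write access there.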
Now, set the Hive home (HIVE_HOME) environment variable:
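Setting the Hive home usually means exporting HIVE_HOME and putting its bin directory on the PATH (the /usr/local/hive path below is an assumption — point it at wherever you extracted the release):

```shell
# Assumption: Hive was extracted to /usr/local/hive -- adjust to your path.
export HIVE_HOME=/usr/local/hive
# Put the Hive launch scripts on the PATH so `hive` works from anywhere:
export PATH=$HIVE_HOME/bin:$PATH
```

Add these lines to your shell profile (e.g. ~/.bashrc) if you want them to survive a logout.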
Congratulations, you are done with the configuration!
To start Hive, go to the Hive home directory:
And run the command:
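The command itself isn't shown above; launching Hive from its home directory is typically just the CLI script (this assumes the HADOOP_HOME and HIVE_HOME setup from the earlier steps, plus a running Hadoop cluster):

```shell
# Launch the Hive command-line interface from $HIVE_HOME:
bin/hive
# Inside the CLI you can run a quick smoke test, e.g.:
# hive> SHOW TABLES;
```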
Hive will be started. If this article helped, do share your feedback below.