1. Hive Installation Addresses
1. Official Hive website
http://hive.apache.org/
2. Documentation
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Downloads
http://archive.apache.org/dist/hive/
2. Hive Installation and Deployment
2.1 Installing and Configuring Hive
1. Upload apache-hive-1.2.1-bin.tar.gz to the /opt/software directory on the Linux machine
[test@hadoop151 opt]$ cd software/
[test@hadoop151 software]$ ll
total 464860
-rw-rw-r-- 1 test test 92834839 Feb 14 09:48 apache-hive-1.2.1-bin.tar.gz
-rw-rw-r--. 1 test test 197657687 Jan 25 10:41 hadoop-2.7.2.tar.gz
-rw-rw-r--. 1 test test 185515842 Jan 25 09:33 jdk-8u144-linux-x64.tar.gz
2. Extract apache-hive-1.2.1-bin.tar.gz into the /opt/module/ directory
[test@hadoop151 software]$ tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/module/
3. Rename the extracted apache-hive-1.2.1-bin directory to hive
[test@hadoop151 module]$ mv apache-hive-1.2.1-bin/ hive
[test@hadoop151 module]$ ll
total 16
drwxr-xr-x. 15 test test 4096 Feb 3 21:53 hadoop-2.7.2
drwxrwxr-x 8 test test 4096 Feb 14 09:49 hive
drwxrwxr-x 2 test test 4096 Feb 4 19:16 input
drwxr-xr-x. 8 test test 4096 Jul 22 2017 jdk1.8.0_144
4. Rename hive-env.sh.template in the /opt/module/hive/conf directory to hive-env.sh
[test@hadoop151 conf]$ mv hive-env.sh.template hive-env.sh
5. Configure the hive-env.sh file (a one-line way to append both settings is sketched after this step)
(1) Set the HADOOP_HOME path:
export HADOOP_HOME=/opt/module/hadoop-2.7.2
(2) Set the HIVE_CONF_DIR path:
export HIVE_CONF_DIR=/opt/module/hive/conf
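Both settings can be appended in one step; this is just a convenience (paths as used above), since editing the file directly works equally well:
[test@hadoop151 conf]$ echo 'export HADOOP_HOME=/opt/module/hadoop-2.7.2' >> hive-env.sh
[test@hadoop151 conf]$ echo 'export HIVE_CONF_DIR=/opt/module/hive/conf' >> hive-env.sh
bin/hive sources hive-env.sh at startup, so no further action is needed.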
2.2 Hadoop Cluster Configuration
1. HDFS and YARN must be running (an optional jps check follows the startup output)
[test@hadoop151 conf]$ start-dfs.sh
Starting namenodes on [hadoop151]
hadoop151: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-namenode-hadoop151.out
hadoop151: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop151.out
hadoop152: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop152.out
hadoop153: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop153.out
Starting secondary namenodes [hadoop153]
hadoop153: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-secondarynamenode-hadoop153.out
[test@hadoop152 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-resourcemanager-hadoop152.out
hadoop153: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop153.out
hadoop151: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop151.out
hadoop152: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop152.out
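An optional sanity check (not part of the original steps): run jps on each node. With the layout above, hadoop151 should show NameNode, DataNode, and NodeManager; hadoop152 should show ResourceManager, DataNode, and NodeManager; hadoop153 should show SecondaryNameNode, DataNode, and NodeManager.
[test@hadoop151 ~]$ jps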
2. Create the /tmp and /user/hive/warehouse directories on HDFS and make them group-writable (optional; the system creates them automatically). A quick verification follows the commands.
[test@hadoop153 ~]$ hadoop fs -mkdir /tmp
[test@hadoop153 ~]$ hadoop fs -mkdir -p /user/hive/warehouse
[test@hadoop153 ~]$ hadoop fs -chmod g+w /tmp
[test@hadoop153 ~]$ hadoop fs -chmod g+w /user/hive/warehouse
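To verify the result (an optional check), list the directories and confirm a w in the group permission triplet:
[test@hadoop153 ~]$ hadoop fs -ls / /user/hive
Both /tmp and /user/hive/warehouse should show permissions like drwxrwxr-x.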
2.3 Basic Hive Operations
1. Start Hive
[test@hadoop151 bin]$ hive
Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
2. List the databases
hive> show databases;
OK
default
Time taken: 1.355 seconds, Fetched: 1 row(s)
3. Switch to the default database
hive> use default;
OK
Time taken: 0.075 seconds
4. Show the tables in the default database
hive> show tables;
OK
Time taken: 0.063 seconds
5. Create a table (a note on field delimiters follows the output)
hive> create table student(id int, name string);
OK
Time taken: 0.584 seconds
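Note: with this DDL, Hive separates fields in the data files with the non-printing \001 (Ctrl-A) character. If you would rather have human-readable, tab-separated files, a common variant (not used in this walkthrough) is:
hive> create table student2(id int, name string) row format delimited fields terminated by '\t';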
6. List the tables again to confirm it was created
hive> show tables;
OK
student
Time taken: 0.022 seconds, Fetched: 1 row(s)
7. Inspect the table's structure
hive> desc student;
OK
id int
name string
Time taken: 0.206 seconds, Fetched: 2 row(s)
8. Insert data into the table
hive> insert into student values(1000,"ss");
Query ID = test_20200214104152_6f97830a-f3f8-4c26-9744-c77441cefd05
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581645695424_0001, Tracking URL = http://hadoop152:8088/proxy/application_1581645695424_0001/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581645695424_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-14 10:42:18,853 Stage-1 map = 0%, reduce = 0%
2020-02-14 10:42:38,866 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.2 sec
MapReduce Total cumulative CPU time: 4 seconds 200 msec
Ended Job = job_1581645695424_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/student/.hive-staging_hive_2020-02-14_10-41-52_166_5255955701507975136-1/-ext-10000
Loading data to table default.student
Table default.student stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.2 sec HDFS Read: 3557 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 200 msec
OK
Time taken: 49.347 seconds
9. Query the data in the table (the underlying HDFS file can be inspected too, as shown after the output)
hive> select * from student;
OK
1000 ss
Time taken: 0.098 seconds, Fetched: 1 row(s)
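Behind the scenes, the row lives as a plain file in the table's warehouse directory on HDFS and can be read directly (the exact file name may vary; 000000_0 is typical, with fields joined by the \001 delimiter):
[test@hadoop151 ~]$ hadoop fs -cat /user/hive/warehouse/student/*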
10. Exit Hive
hive> quit;
3. Using MySQL to Store the Metastore
3.1 Why use MySQL to store the metastore?
If a Hive session is already open, starting Hive from a second client fails with an exception (the root cause is a java.sql.SQLException):
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The reason: the metastore is stored by default in the embedded Derby database, which accepts only one connection at a time, so using MySQL to store the metastore is recommended.
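For orientation, switching the metastore to MySQL ultimately comes down to four JDBC properties in /opt/module/hive/conf/hive-site.xml. The sketch below is a minimal example, assuming MySQL runs on hadoop151 with a database named metastore, user root, and a placeholder password; the actual installation and configuration steps follow:
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop151:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
    </property>
</configuration>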
3.2 Installing MySQL
1. Check whether MySQL is already installed; if it is, uninstall it
(1) Check:
[test@hadoop151 ~]$ rpm -qa | grep mysql
mysql-libs-5.1.73-7.e