1. Hive Installation Addresses
1. Official Hive website
http://hive.apache.org/
2. Documentation
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Downloads
http://archive.apache.org/dist/hive/
2. Hive Installation and Deployment
2.1 Installing and Configuring Hive
1. Upload apache-hive-1.2.1-bin.tar.gz to the /opt/software directory on the Linux machine
[test@hadoop151 opt]$ cd software/
[test@hadoop151 software]$ ll
total 464860
-rw-rw-r-- 1 test test 92834839 Feb 14 09:48 apache-hive-1.2.1-bin.tar.gz
-rw-rw-r--. 1 test test 197657687 Jan 25 10:41 hadoop-2.7.2.tar.gz
-rw-rw-r--. 1 test test 185515842 Jan 25 09:33 jdk-8u144-linux-x64.tar.gz
2. Extract apache-hive-1.2.1-bin.tar.gz into the /opt/module/ directory
[test@hadoop151 software]$ tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/module/
3. Rename the extracted apache-hive-1.2.1-bin directory to hive
[test@hadoop151 module]$ mv apache-hive-1.2.1-bin/ hive
[test@hadoop151 module]$ ll
total 16
drwxr-xr-x. 15 test test 4096 Feb 3 21:53 hadoop-2.7.2
drwxrwxr-x 8 test test 4096 Feb 14 09:49 hive
drwxrwxr-x 2 test test 4096 Feb 4 19:16 input
drwxr-xr-x. 8 test test 4096 Jul 22 2017 jdk1.8.0_144
4. Rename hive-env.sh.template in the /opt/module/hive/conf directory to hive-env.sh
[test@hadoop151 conf]$ mv hive-env.sh.template hive-env.sh
5. Configure the hive-env.sh file (a one-line way to append both settings is sketched after this step)
(1) Set the HADOOP_HOME path:
export HADOOP_HOME=/opt/module/hadoop-2.7.2
(2) Set the HIVE_CONF_DIR path:
export HIVE_CONF_DIR=/opt/module/hive/conf
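Both settings can be appended in one step; this is just a convenience (paths as used above), since editing the file directly works equally well:
[test@hadoop151 conf]$ echo 'export HADOOP_HOME=/opt/module/hadoop-2.7.2' >> hive-env.sh
[test@hadoop151 conf]$ echo 'export HIVE_CONF_DIR=/opt/module/hive/conf' >> hive-env.sh
bin/hive sources hive-env.sh at startup, so no further action is needed.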
2.2 Hadoop Cluster Configuration
1. HDFS and YARN must be running (an optional jps check follows the startup output)
[test@hadoop151 conf]$ start-dfs.sh
Starting namenodes on [hadoop151]
hadoop151: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-namenode-hadoop151.out
hadoop151: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop151.out
hadoop152: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop152.out
hadoop153: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-datanode-hadoop153.out
Starting secondary namenodes [hadoop153]
hadoop153: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-test-secondarynamenode-hadoop153.out
[test@hadoop152 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-resourcemanager-hadoop152.out
hadoop153: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop153.out
hadoop151: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop151.out
hadoop152: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-test-nodemanager-hadoop152.out
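An optional sanity check (not part of the original steps): run jps on each node. With the layout above, hadoop151 should show NameNode, DataNode, and NodeManager; hadoop152 should show ResourceManager, DataNode, and NodeManager; hadoop153 should show SecondaryNameNode, DataNode, and NodeManager.
[test@hadoop151 ~]$ jps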
2. Create the /tmp and /user/hive/warehouse directories on HDFS and make them group-writable (optional; the system creates them automatically). A quick verification follows the commands.
[test@hadoop153 ~]$ hadoop fs -mkdir /tmp
[test@hadoop153 ~]$ hadoop fs -mkdir -p /user/hive/warehouse
[test@hadoop153 ~]$ hadoop fs -chmod g+w /tmp
[test@hadoop153 ~]$ hadoop fs -chmod g+w /user/hive/warehouse
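To verify the result (an optional check), list the directories and confirm a w in the group permission triplet:
[test@hadoop153 ~]$ hadoop fs -ls / /user/hive
Both /tmp and /user/hive/warehouse should show permissions like drwxrwxr-x.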
2.3 Basic Hive Operations
1. Start Hive
[test@hadoop151 bin]$ hive
Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
2. List the databases
hive> show databases;
OK
default
Time taken: 1.355 seconds, Fetched: 1 row(s)
3. Switch to the default database
hive> use default;
OK
Time taken: 0.075 seconds
4. Show the tables in the default database
hive> show tables;
OK
Time taken: 0.063 seconds
5. Create a table (a note on field delimiters follows the output)
hive> create table student(id int, name string);
OK
Time taken: 0.584 seconds
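Note: with this DDL, Hive separates fields in the data files with the non-printing \001 (Ctrl-A) character. If you would rather have human-readable, tab-separated files, a common variant (not used in this walkthrough) is:
hive> create table student2(id int, name string) row format delimited fields terminated by '\t';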
6. List the tables again to confirm it was created
hive> show tables;
OK
student
Time taken: 0.022 seconds, Fetched: 1 row(s)
7. Inspect the table's structure
hive> desc student;
OK
id int
name string
Time taken: 0.206 seconds, Fetched: 2 row(s)
8. Insert data into the table
hive> insert into student values(1000,"ss");
Query ID = test_20200214104152_6f97830a-f3f8-4c26-9744-c77441cefd05
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581645695424_0001, Tracking URL = http://hadoop152:8088/proxy/application_1581645695424_0001/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581645695424_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-14 10:42:18,853 Stage-1 map = 0%, reduce = 0%
2020-02-14 10:42:38,866 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.2 sec
MapReduce Total cumulative CPU time: 4 seconds 200 msec
Ended Job = job_1581645695424_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/student/.hive-staging_hive_2020-02-14_10-41-52_166_5255955701507975136-1/-ext-10000
Loading data to table default.student
Table default.student stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.2 sec HDFS Read: 3557 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 200 msec
OK
Time taken: 49.347 seconds
9. Query the data in the table (the underlying HDFS file can be inspected too, as shown after the output)
hive> select * from student;
OK
1000 ss
Time taken: 0.098 seconds, Fetched: 1 row(s)
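Behind the scenes, the row lives as a plain file in the table's warehouse directory on HDFS and can be read directly (the exact file name may vary; 000000_0 is typical, with fields joined by the \001 delimiter):
[test@hadoop151 ~]$ hadoop fs -cat /user/hive/warehouse/student/*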
10. Exit Hive
hive> quit;
3. Using MySQL to Store the Metastore
3.1 Why use MySQL to store the metastore?
If a Hive session is already open, starting Hive from a second client fails with an exception (the root cause is a java.sql.SQLException):
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The reason: the metastore is stored by default in the embedded Derby database, which accepts only one connection at a time, so using MySQL to store the metastore is recommended.
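For orientation, switching the metastore to MySQL ultimately comes down to four JDBC properties in /opt/module/hive/conf/hive-site.xml. The sketch below is a minimal example, assuming MySQL runs on hadoop151 with a database named metastore, user root, and a placeholder password; the actual installation and configuration steps follow:
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop151:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
    </property>
</configuration>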
3.2 Installing MySQL
1. Check whether MySQL is already installed; if it is, uninstall it
(1) Check:
[test@hadoop151 ~]$ rpm -qa | grep mysql
mysql-libs-5.1.73-7.e