mybatis study notes

How to handle comparison operators in MyBatis XML

Method 1:
Replace > and < with XML escape entities. Example mapper snippet:

<if test="createDateStart != null ">
AND create_date &gt;= #{createDateStart,jdbcType=DATE}
</if>
<if test="createDateEnd != null ">
AND create_date &lt;= #{createDateEnd,jdbcType=DATE}
</if>

XML escape entities

Entity    Character   Meaning
&lt;      <           less-than sign
&gt;      >           greater-than sign
&amp;     &           ampersand
&apos;    '           single quote (apostrophe)
&quot;    "           double quote

Method 2:
XML treats characters such as < and & specially, so they cannot appear literally in element content; wrapping them in a <![CDATA[ ]]> section tells the parser to leave that text unparsed. Example mapper snippet:
<if test="createDateStart != null ">
AND create_date <![CDATA[ >= ]]> #{createDateStart,jdbcType=DATE}
</if>
<if test="createDateEnd != null ">
AND create_date <![CDATA[ <= ]]> #{createDateEnd,jdbcType=DATE}
</if>

canal study notes

This post covers canal, Alibaba's open-source MySQL data synchronization tool; see its wiki for the full documentation. Its main purpose is to parse MySQL's incremental binlog and provide incremental data subscription and consumption.

Use cases built on log-based incremental subscription and consumption include:

  • database mirroring
  • real-time database backup
  • index building and real-time maintenance (split heterogeneous indexes, inverted indexes, etc.)
  • business cache refreshing
  • incremental data processing with business logic

canal currently supports source MySQL versions 5.1.x, 5.5.x, 5.6.x, 5.7.x and 8.0.x.

How it works

MySQL master-slave replication

  • The MySQL master writes data changes to its binary log (the records are called binary log events and can be viewed with show binlog events)
  • The MySQL slave copies the master's binary log events into its relay log
  • The MySQL slave replays the events in the relay log, applying the changes to its own data

How canal works

  • canal implements the MySQL slave protocol, pretends to be a MySQL slave, and sends a dump request to the MySQL master
  • The MySQL master receives the dump request and starts pushing the binary log to the slave (i.e. canal)
  • canal parses the binary log objects (originally a byte stream)

Configuring the MySQL server

Add the following to /etc/mysql/conf.d/mysql.cnf:

[mysqld]
log_bin=mysql-bin # enable binlog; mysql-bin is the log file name prefix
binlog_format=ROW # use ROW mode
server_id=1 # defaults to 0, which prevents replication to slaves; valid range is 1 to (2^32)-1. Do not reuse canal's slaveId

After configuring, restart the MySQL server and verify the settings:

SHOW VARIABLES LIKE 'server_id';
SHOW VARIABLES LIKE 'log_%';
SHOW VARIABLES LIKE 'binlog_format';

Create the account canal uses to connect to MySQL and grant it the privileges of a MySQL slave; if the account already exists, just run the grants:

CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;

Installing canal

Download the deployer from the releases page:
$ wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz

$ mkdir ~/canal
$ tar xf canal.deployer-1.1.4.tar.gz -C ~/canal

$ cd ~/canal
$ ls
bin conf lib logs

Configure the canal server
Set the port, username and password:
$ vi ~/canal/conf/canal.properties

canal.port = 11111
canal.user = canal
canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458 # this is the encrypted password; the plaintext is canal
canal.destinations = example # by default a single instance named example is enabled; separate multiple instances with commas (,)

Configure the canal instance
$ vi ~/canal/conf/example/instance.properties

canal.instance.mysql.slaveId=1234 # mysql serverId , v1.0.26+ will autoGen
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.filter.regex=.*\\..*

Start canal
$ cd ~/canal
$ sh bin/startup.sh

Check the server log
$ more logs/canal/canal.log

2021-01-22 20:12:11.735 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
2021-01-22 20:12:11.797 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
2021-01-22 20:12:11.813 [main] INFO com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2021-01-22 20:12:11.869 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[172.18.0.1(172.18.0.1):11111]
2021-01-22 20:12:13.779 [main] INFO com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......

Check the instance log
$ more logs/example/example.log

2021-01-22 20:12:12.428 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [canal.properties]
2021-01-22 20:12:12.436 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [example/instance.properties]
2021-01-22 20:12:12.722 [main] WARN o.s.beans.GenericTypeAwarePropertyDescriptor - Invalid JavaBean property 'connectionCharset' being accessed! Ambiguous write methods found next to actually used [public void com.alibaba.otter.canal.parse.inbound.mysql.AbstractMysqlEventParser.setConnectionCharset(java.nio.charset.Charset)]: [public void com.alibaba.otter.canal.parse.inbound.mysql.AbstractMysqlEventParser.setConnectionCharset(java.lang.String)]
2021-01-22 20:12:12.803 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [canal.properties]
2021-01-22 20:12:12.803 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [example/instance.properties]
2021-01-22 20:12:13.524 [main] INFO c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example
2021-01-22 20:12:13.536 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
2021-01-22 20:12:13.536 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :
2021-01-22 20:12:13.570 [main] INFO c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....
2021-01-22 20:12:13.801 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2021-01-22 20:12:13.801 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
2021-01-22 20:12:15.240 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.000003,position=4,serverId=1,gtid=<null>,timestamp=1591628633000] cost : 1408ms , the next step is binlog dump

Shutdown
$ cd ~/canal
$ sh bin/stop.sh

Java client example

Add the dependency to the Java project's pom.xml:
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.client</artifactId>
<version>1.1.4</version>
</dependency>

The Java code is as follows:
package com.hewentian.canal;

import java.net.InetSocketAddress;
import java.util.Date;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.protocol.Message;
import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EntryType;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;

public class SimpleCanalClientExample {

public static void main(String args[]) {
// create a connection
CanalConnector connector = CanalConnectors.newSingleConnector(
new InetSocketAddress("192.168.56.113", 11111),
"example", "canal", "canal");

int batchSize = 1000;
int emptyCount = 0;

try {
connector.connect();

// subscribe to the databases/tables to monitor
// connector.subscribe("test.t_user");
connector.subscribe(".*\\..*");
connector.rollback();
int totalEmptyCount = 100;

while (emptyCount < totalEmptyCount) {
Message message = connector.getWithoutAck(batchSize); // fetch up to batchSize entries without acking
long batchId = message.getId();
int size = message.getEntries().size();
System.out.println("batchId: " + batchId);

if (batchId == -1 || size == 0) {
emptyCount++;
System.out.println("empty count: " + emptyCount);
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
}
} else {
emptyCount = 0;
// System.out.printf("message[batchId=%s,size=%s] \n", batchId, size);
printEntry(message.getEntries());
}

connector.ack(batchId); // acknowledge the batch
// connector.rollback(batchId); // on failure, roll back the batch
}

System.out.println("empty too many times, exit");
} finally {
connector.disconnect();
}
}

private static void printEntry(List<Entry> entrys) {
for (Entry entry : entrys) {
if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN || entry.getEntryType() == EntryType.TRANSACTIONEND) {
continue;
}

RowChange rowChage;
try {
rowChage = RowChange.parseFrom(entry.getStoreValue());
} catch (Exception e) {
throw new RuntimeException("ERROR ## parser of eromanga-event has an error, data:" + entry.toString(), e);
}

EventType eventType = rowChage.getEventType();
long delayTime = new Date().getTime() - entry.getHeader().getExecuteTime();
System.out.println(String.format("================ binlog[%s:%s], name[%s,%s], eventType: %s, delayTime: %s",
entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
entry.getHeader().getSchemaName(), entry.getHeader().getTableName(),
eventType, delayTime));

// DDL event: print the SQL
if (eventType == EventType.QUERY || rowChage.getIsDdl()) {
System.out.println("sql -----> " + rowChage.getSql());
}

// DML event: print the column data
for (RowData rowData : rowChage.getRowDatasList()) {
if (eventType == EventType.DELETE) {
printColumn(rowData.getBeforeColumnsList());
} else if (eventType == EventType.INSERT) {
printColumn(rowData.getAfterColumnsList());
} else {
System.out.println("---------- before");
printColumn(rowData.getBeforeColumnsList());
System.out.println("---------- after");
printColumn(rowData.getAfterColumnsList());
}
}
}
}

private static void printColumn(List<Column> columns) {
for (Column column : columns) {
System.out.println(column.getName() + " : " + column.getValue() + ", update = " + column.getUpdated());
}
}

}

After running the Java code above, the console prints messages like:
empty count: 1
empty count: 2
empty count: 3
empty count: 4

This means there are currently no data changes in the database.

Trigger some database changes:
use test;

CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'ID',
`name` varchar(20) DEFAULT NULL COMMENT '用户名',
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC COMMENT='用户表';

INSERT INTO t_user(id,name) VALUE(1,'Scott');
UPDATE t_user SET name = 'Tiger' WHERE id = 1;
DELETE FROM t_user WHERE id = 1;

The console then shows:
empty count: 1
empty count: 2
empty count: 3
empty count: 4
empty count: 5
empty count: 6
================ binlog[mysql-bin.000003:2626], name[test,t_user], eventType: INSERT
id : 1, update = true
name : Scott, update = true
empty count: 1
empty count: 2
empty count: 3
empty count: 4
empty count: 5
empty count: 6
empty count: 7
empty count: 8
================ binlog[mysql-bin.000003:2892], name[test,t_user], eventType: UPDATE
---------- before
id : 1, update = false
name : Scott, update = false
---------- after
id : 1, update = false
name : Tiger, update = true
empty count: 1
empty count: 2
empty count: 3
empty count: 4
empty count: 5
empty count: 6
================ binlog[mysql-bin.000003:3170], name[test,t_user], eventType: DELETE
id : 1, update = false
name : Tiger, update = false
empty count: 1
empty count: 2
empty count: 3
empty count: 4
empty count: 5
empty count: 6
empty count: 7
empty count: 8

The full code is available here.

springboot study notes

Solving cross-origin (CORS) issues with a separated Spring Boot front end and back end

Add CORS handling to the Java back end's startup class, as follows:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import org.springframework.web.cors.CorsConfiguration;
import org.springframework.web.cors.UrlBasedCorsConfigurationSource;
import org.springframework.web.filter.CorsFilter;

@SpringBootApplication
public class Application {

public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}

@Bean
public CorsFilter corsFilter() {
// configure CORS
UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
CorsConfiguration corsConfiguration = new CorsConfiguration();
corsConfiguration.addAllowedOrigin("*");
corsConfiguration.addAllowedHeader("*");
corsConfiguration.addAllowedMethod("*");
source.registerCorsConfiguration("/**", corsConfiguration);
return new CorsFilter(source);
}
}

However, if the front end sends its requests with withCredentials, like this:
$.ajax({
type : "POST",
contentType: "application/json;charset=UTF-8",
url : "http://www.a.com/admin/user/save",
async: true,
xhrFields: {withCredentials: true},
data : JSON.stringify(list),
success : function(result) {
console.log(result);
},
error : function(e) {
console.log(e.status);
console.log(e.responseText);
}
});

and the front-end domain is http://www.b.com, then the back end needs the following changes (two lines changed):
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import org.springframework.web.cors.CorsConfiguration;
import org.springframework.web.cors.UrlBasedCorsConfigurationSource;
import org.springframework.web.filter.CorsFilter;

@SpringBootApplication
public class Application {

public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}

@Bean
public CorsFilter corsFilter() {
// configure CORS
UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
CorsConfiguration corsConfiguration = new CorsConfiguration();
corsConfiguration.addAllowedOrigin("http://www.b.com"); // the origin must be specified explicitly here
corsConfiguration.addAllowedHeader("*");
corsConfiguration.addAllowedMethod("*");
source.registerCorsConfiguration("/**", corsConfiguration);
corsConfiguration.setAllowCredentials(true); // add this line
return new CorsFilter(source);
}
}

For requests that carry credentials, the server must not set Access-Control-Allow-Origin to *. Because such requests carry cookies, they fail when Access-Control-Allow-Origin is *; setting it to http://www.b.com lets them succeed. In other words, when Access-Control-Allow-Credentials is true, Access-Control-Allow-Origin cannot be *.

Specifying what to load when Spring Boot starts
nohup java -Dloader.path=/home/root/userinfo/libs \
-Dlogging.path=/home/root/logs/userinfo \
-Dspring.config.location=/home/root/userinfo/config/application.yml \
-jar /home/root/userinfo/userinfo-0.0.1-SNAPSHOT.jar \
> /dev/null 2>&1 &

Notes:

  1. loader.path points to a directory of external JARs to load;
  2. logging.path sets the directory where log files are written;
  3. spring.config.location sets the configuration file to load.

Log management

To write log files in addition to console output, set the logging.file or logging.path property in application.properties.
logging.file: sets the file, with an absolute or relative path, e.g. logging.file=my.log
logging.path: sets a directory; a spring.log file is created there and logs are written to it, e.g. logging.path=/var/log

If only logging.file is set, an xxx.log file is created in the project's working directory.
If only logging.path is set, a spring.log file is created under /var/log.

Note: the two cannot be used together; if both are set, only logging.file takes effect.

By default a log file is rolled over once it reaches 10MB, and the levels written are ERROR, WARN and INFO.

Setting property values from the command line

Anyone who has used Spring Boot for a while will know this command:

java -jar xxx.jar --server.port=8888 --spring.profiles.active=dev

The --server.port property sets the port of the xxx.jar application to 8888.

On the command line, the double dash -- marks a property whose value overrides the one in application.properties.

Changing property values from the command line is convenient, but being able to alter runtime parameters that way can be a security concern. Spring Boot therefore lets you shield the application from command-line properties; one call is enough:

SpringApplication.setAddCommandLineProperties(false);
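
setAddCommandLineProperties is an instance method of SpringApplication, so in practice it is called on the application object before run. A minimal sketch, reusing the Application class name from the CORS examples above:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(Application.class);
        // ignore --xxx=yyy command-line arguments so they cannot override application.properties
        app.setAddCommandLineProperties(false);
        app.run(args);
    }
}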

spark study notes

Standalone mode

To make Spark easier to learn, we install it in standalone mode and build a small cluster on the three machines below. slave3 will be the Spark master node, and slave1 and slave2 will be worker nodes.

slave1:
    ip: 192.168.56.111
    hostname: hadoop-host-slave-1
slave2:
    ip: 192.168.56.112
    hostname: hadoop-host-slave-2
slave3:
    ip: 192.168.56.113
    hostname: hadoop-host-slave-3

The installation follows the official documentation:
http://spark.apache.org/docs/latest/spark-standalone.html

The current stable Spark release is 2.4.4 and can be downloaded from:
http://spark.apache.org/downloads.html
https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

Start Spark on slave3 as the master
$ tar xf spark-2.4.4-bin-hadoop2.7.tgz
$ cd spark-2.4.4-bin-hadoop2.7/
$ ./sbin/start-master.sh

After starting, the startup log contains output like the following:

20/01/14 16:54:24 INFO Master: Starting Spark master at spark://hadoop-host-slave-3:7077
20/01/14 16:54:24 INFO Master: Running Spark version 2.4.4
20/01/14 16:54:25 INFO Utils: Successfully started service 'MasterUI' on port 8080.
20/01/14 16:54:25 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://hadoop-host-slave-3:8080
20/01/14 16:54:25 INFO Master: I have been elected leader! New state: ALIVE

Next, start Spark on slave1 and slave2 as worker nodes
$ tar xf spark-2.4.4-bin-hadoop2.7.tgz
$ cd spark-2.4.4-bin-hadoop2.7/
$ ./sbin/start-slave.sh spark://hadoop-host-slave-3:7077

The state of the whole cluster can be seen in a browser at:
http://hadoop-host-slave-3:8080/

You can connect to the cluster with the interactive shell:

./bin/spark-shell --master spark://IP:PORT

You can also package a program as a JAR and submit it to the cluster from the master node, as below (a sketch of such a job follows the command):

./bin/spark-submit --class com.hewentian.spark.SparkPi --master spark://hadoop-host-slave-3:7077 /home/hadoop/spark-1.0-SNAPSHOT.jar
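
The SparkPi class submitted above is not listed in these notes; the following is a minimal sketch of what such a job could look like, written against the Spark Java API (the package and class name are taken from the command above, the estimation logic itself is an assumption):

package com.hewentian.spark;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkPi {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkPi");
        JavaSparkContext sc = new JavaSparkContext(conf);

        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;

        List<Integer> seeds = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            seeds.add(i);
        }

        // throw n random darts at the unit square and count how many land inside the unit circle
        long inside = sc.parallelize(seeds, slices).filter(i -> {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return x * x + y * y <= 1;
        }).count();

        // the inside/total ratio approximates pi / 4
        System.out.println("Pi is roughly " + 4.0 * inside / n);
        sc.stop();
    }
}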

With a little configuration, the Spark processes on every node can be started from a single machine. On each of the three nodes:
$ cd spark-2.4.4-bin-hadoop2.7/conf
$ cp slaves.template slaves
$ vi slaves

Remove the original localhost entry and add the following two lines
hadoop-host-slave-1
hadoop-host-slave-2

Then, on slave3, the master and all workers can be started with one command:
$ cd spark-2.4.4-bin-hadoop2.7/
$ ./sbin/start-all.sh

Reading data from MySQL
$ cd /home/hadoop/spark-2.4.4-bin-hadoop2.7/
$ ./bin/spark-shell --jars /home/hadoop/spark-2.4.4-bin-hadoop2.7/jars/mysql-connector-java-5.1.25.jar

scala> val sqlContext = spark.sqlContext
scala> val df = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://mysql.hewentian.com:3306/bfg_db?useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull").option("driver", "com.mysql.jdbc.Driver").option("user", "bfg_db").option("password", "iE1zNB?A91*YbQ9hK").option("dbtable", "student").load()

YARN mode

Spark can also submit jobs to YARN; in this mode there is no need to start the master or worker nodes.

The worker machines still need to be listed in the slaves file, just as in standalone mode.

All that is required is to point Spark at the Hadoop configuration directory. On every Spark node:
$ cd spark-2.4.4-bin-hadoop2.7/conf
$ cp spark-env.sh.template spark-env.sh
$ vi spark-env.sh

Only this line needs to be configured
HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.3/etc/hadoop

Make sure HDFS and YARN are already running.

Connect interactively to Spark in YARN mode:
./bin/spark-shell --master yarn --deploy-mode client

You can also package a program as a JAR and submit it to the cluster from the master node:
./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

For example:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
examples/jars/spark-examples*.jar \
10

To make this YARN setup highly available with ZooKeeper, for example with hadoop-host-slave-2 as a standby master, add the following configuration on every Spark node:
$ cd spark-2.4.4-bin-hadoop2.7/conf
$ vi spark-env.sh

Configure the ZooKeeper addresses
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop-host-master:2181,hadoop-host-slave-1:2181,hadoop-host-slave-2:2181 -Dspark.deploy.zookeeper.dir=/spark"

Startup order:

  1. Start the ZooKeeper cluster;
  2. Start HDFS and YARN;
  3. On the primary node hadoop-host-slave-3, run ./sbin/start-all.sh to start all services;
  4. On hadoop-host-slave-2, run ./sbin/start-master.sh to start it as the standby master.

Check the cluster status in a browser:
http://hadoop-host-slave-2:8080/
http://hadoop-host-slave-3:8080/

To be continued...

scala study notes

Installing the SDK on Linux

Scala depends on Java, so the JDK must be installed before Scala; see the earlier notes on installing the JDK.

Step 1: download Scala. Download links are listed at https://www.scala-lang.org/download/; the version used here is scala-2.11.12, available at:
https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz

Step 2: unpack and install
$ cd /home/hewentian/Downloads
$ tar xf scala-2.11.12.tgz

Unpacking produces a scala-2.11.12 directory, which we then move under /usr/local. This needs root privileges (on Ubuntu, su root and enter the root password); once that is done we can move the SDK to where we want it.
$ su root
Password:
$ mv scala-2.11.12 /usr/local/

Step 3: set the environment variables. If PATH already exists, append the SDK's bin directory to it, separated by a colon:
vi /etc/profile 

export SCALA_HOME=/usr/local/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin

Save the file, then run:
$ source /etc/profile

Step 4: verify the installation with the following commands:
$ scalac -version
Scala compiler version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
$
$ scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Output like the above confirms that the installation succeeded.

Installing the Scala plugin in IDEA

Start IDEA and, from the welcome screen, go to Configure -> Settings -> Plugins, search for Scala in the search box on the right, and locate the Scala plugin in the results.

Click Install. After the installation finishes, restart IDEA. Back on the welcome screen, click Create New Project; if a Scala option appears, the plugin was installed successfully.

Next configure the SDK in IDEA. Click Cancel to return to the welcome screen, then go to Configure -> Project Defaults -> Project Structure, select Global Libraries in the dialog, click the + button on the right, choose Scala SDK from the drop-down list, and configure it.

Click Apply; the Scala development environment in IDEA is now set up.

Writing Scala code in an existing Java project in IDEA

  1. Create a scala directory under src/main/;
  2. Mark src/main/scala as a sources directory: File -> Project Structure -> Modules -> select the project on the right -> open the Sources tab -> select the src/main/scala directory -> click Mark as: Sources -> click OK in the bottom-right corner;
  3. You may also need to remove the SDK added earlier (Global Libraries) and add it again.

docker study notes

Today I start learning Docker, since it is becoming more and more popular. First, the installation. My system is Ubuntu Linux, so the installation follows:
https://docs.docker.com/install/linux/docker-ce/ubuntu/

The installation steps are below; on a fresh system you first need to do the pre-installation preparation.

Pre-installation steps

Step 1: remove any old Docker versions that came with the system (if Docker was pre-installed):
$ sudo apt-get remove docker docker-engine docker.io containerd runc

Step 2: update the apt package index:
$ sudo apt-get update

Step 3: install the packages that allow apt to use repositories over HTTPS:
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common

Step 4: add Docker's official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Verify that the key's fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88 by searching for its last 8 characters:
$ sudo apt-key fingerprint 0EBFCD88
pub rsa4096 2017-02-22 [SCEA]
9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
uid [ unknown] Docker Release (CE deb) <docker@docker.com>
sub rsa4096 2017-02-22 [S]

Step 5: add the Docker download location to the apt sources:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

Installing Docker

Step 1: update the apt package index:
$ sudo apt-get update

Step 2: install the latest Docker Community Edition:
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

After installation the Docker daemon starts automatically by default; otherwise start it manually:
$ sudo service docker start

The available service commands are:

service docker {start|stop|restart|status}

Step 3: verify that the installation works by running a test image:
$ sudo docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:6540fc08ee6e6b7b63468dc3317e3303aae178cb8a45ed3123180328bcc1d20f
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

Installation is now complete.

Uninstalling

To uninstall Docker itself and delete its data files:
$ sudo apt-get purge docker-ce
$ sudo rm -rf /var/lib/docker

Some basic docker commands

A simple example:
$ sudo docker run ubuntu:18.04 /bin/echo "Hello world"

Or interactively:
$ sudo docker run -i -t ubuntu:18.04 /bin/bash

Start a container in detached (background) mode:
$ sudo docker run -d ubuntu:18.04 /bin/sh -c "while true; do echo hello world; sleep 1; done"

d7c1549c2a495499270eb31819ce5e9ea9748ab8126f025f33b06612491fd447

The long string printed is the container ID.

View the container's standard output:

sudo docker logs -f {CONTAINER ID}|{NAMES}

For example:
$ sudo docker logs d7c1549c2a495499270eb31819ce5e9ea9748ab8126f025f33b06612491fd447
hello world
hello world
hello world
hello world
hello world

List running containers
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d7c1549c2a49 ubuntu:18.04 "/bin/sh -c 'while t…" 4 minutes ago Up 4 minutes friendly_minsky

Stop a container, by container ID or name:
$ sudo docker stop {CONTAINER ID}|{NAMES}

Docker's built-in help

Typing docker on its own prints the usage help:
$ docker

Usage: docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
--config string Location of client config files (default "/home/hewentian/.docker")
-c, --context string Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
-D, --debug Enable debug mode
-H, --host list Daemon socket(s) to connect to
-l, --log-level string Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
--tls Use TLS; implied by --tlsverify
--tlscacert string Trust certs signed only by this CA (default "/home/hewentian/.docker/ca.pem")
--tlscert string Path to TLS certificate file (default "/home/hewentian/.docker/cert.pem")
--tlskey string Path to TLS key file (default "/home/hewentian/.docker/key.pem")
--tlsverify Use TLS and verify the remote
-v, --version Print version information and quit

Management Commands:
builder Manage builds
config Manage Docker configs
container Manage containers
context Manage contexts
engine Manage the docker engine
image Manage images
network Manage networks
node Manage Swarm nodes
plugin Manage plugins
secret Manage Docker secrets
service Manage services
stack Manage Docker stacks
swarm Manage Swarm
system Manage Docker
trust Manage trust on Docker images
volume Manage volumes

Commands:
attach Attach local standard input, output, and error streams to a running container
build Build an image from a Dockerfile
commit Create a new image from a container's changes
cp Copy files/folders between a container and the local filesystem
create Create a new container
deploy Deploy a new stack or update an existing stack
diff Inspect changes to files or directories on a container's filesystem
events Get real time events from the server
exec Run a command in a running container
export Export a container's filesystem as a tar archive
history Show the history of an image
images List images
import Import the contents from a tarball to create a filesystem image
info Display system-wide information
inspect Return low-level information on Docker objects
kill Kill one or more running containers
load Load an image from a tar archive or STDIN
login Log in to a Docker registry
logout Log out from a Docker registry
logs Fetch the logs of a container
pause Pause all processes within one or more containers
port List port mappings or a specific mapping for the container
ps List containers
pull Pull an image or a repository from a registry
push Push an image or a repository to a registry
rename Rename a container
restart Restart one or more containers
rm Remove one or more containers
rmi Remove one or more images
run Run a command in a new container
save Save one or more images to a tar archive (streamed to STDOUT by default)
search Search the Docker Hub for images
start Start one or more stopped containers
stats Display a live stream of container(s) resource usage statistics
stop Stop one or more running containers
tag Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
top Display the running processes of a container
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
version Show the Docker version information
wait Block until one or more containers stop, then print their exit codes

Run 'docker COMMAND --help' for more information on a command.

Searching for images

By default images are searched on https://hub.docker.com/ ; for example, searching for httpd:
$ sudo docker search httpd
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
httpd The Apache HTTP Server Project 2567 [OK]
centos/httpd 23 [OK]
centos/httpd-24-centos7 Platform for running Apache httpd 2.4 or bui… 22
armhf/httpd The Apache HTTP Server Project 8
polinux/httpd-php Apache with PHP in Docker (Supervisor, CentO… 3 [OK]

Pulling and running a container

All available images can be browsed at https://hub.docker.com/ . Once you have found the image you need, pull it, for example training/webapp:
$ sudo docker pull training/webapp
$ sudo docker run -d -P training/webapp python app.py

a2d42ce3df7d0dc34b93095fe3cd526de22f75f2f49a7762e395c32cecae82e5

We can also map a different host port with the -p option (format: host port:container port):
$ sudo docker run -d -p 5001:5000 training/webapp python app.py

2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9

$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2f8c4e68d8fb training/webapp "python app.py" 27 seconds ago Up 26 seconds 0.0.0.0:5001->5000/tcp infallible_greider
a2d42ce3df7d training/webapp "python app.py" 41 seconds ago Up 40 seconds 0.0.0.0:32769->5000/tcp epic_pasteur

The application can now be reached from a browser on the host in either of these two ways:
http://localhost:32769/
http://localhost:5001/

Check the processes running inside a container
$ sudo docker top infallible_greider
UID PID PPID C STIME TTY TIME CMD
root 28721 28694 0 14:49 ? 00:00:00 python app.py

Inspect a container's configuration and state
$ sudo docker inspect infallible_greider
[
{
"Id": "2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9",
"Created": "2019-07-29T06:49:41.334728611Z",
"Path": "python",
"Args": [
"app.py"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 28721,
"ExitCode": 0,
"Error": "",
"StartedAt": "2019-07-29T06:49:42.075479437Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:6fae60ef344644649a39240b94d73b8ba9c67f898ede85cf8e947a887b3e6557",
"ResolvConfPath": "/var/lib/docker/containers/2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9/hostname",
"HostsPath": "/var/lib/docker/containers/2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9/hosts",
"LogPath": "/var/lib/docker/containers/2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9/2f8c4e68d8fbb3130fb51197218b9024bed5de1c4614cd7cca198a68807b57a9-json.log",
"Name": "/infallible_greider",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
...

Containers can be stopped, restarted and removed
$ sudo docker start infallible_greider
$ sudo docker stop infallible_greider
$ sudo docker rm infallible_greider

List all images on this host
$ sudo docker images

REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 18.04 3556258649b2 5 days ago 64.2MB
hello-world latest fce289e99eb9 6 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

List all containers that exist on this host
$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ee4b72860ab3 ubuntu:18.04 "/bin/sh -c 'while t…" 8 minutes ago Up 8 minutes brave_edison
3063341debf2 ubuntu:18.04 "/bin/bash" 9 minutes ago Exited (0) 9 minutes ago friendly_poitras
6623f20692a4 ubuntu:18.04 "/bin/echo 'Hello wo…" 9 minutes ago Exited (0) 9 minutes ago intelligent_roentgen
a2d42ce3df7d training/webapp "python app.py" 2 hours ago Exited (137) 48 minutes ago epic_pasteur
1058cf5fafad training/webapp "python app.py" 2 hours ago Exited (137) 2 hours ago goofy_lewin
2c113cc40c51 training/webapp "python app.py" 2 hours ago Exited (137) 2 hours ago confident_gagarin
c9102f1d9541 training/webapp "python app.py" 3 hours ago Exited (137) 2 hours ago crazy_borg
a4e84d331fbd hello-world "/hello" 6 hours ago Exited (0) 6 hours ago epic_fermi
b03a6f03bd39 hello-world "/hello" 2 days ago Exited (0) 2 days ago competent_cartwright
b7c3f4699549 hello-world "/hello" 2 days ago Exited (0) 2 days ago crazy_brattain
bb7eb9e197b4 hello-world "/hello" 2 days ago Exited (0) 2 days ago pensive_lalande

Modifying an image

We use the existing ubuntu image as the base and create a new image from it:
$ sudo docker run -t -i ubuntu:18.04 /bin/bash
root@d23dc5d88f11:/# apt-get update
root@d23dc5d88f11:/# exit

Show the most recently created container
$ sudo docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d23dc5d88f11 ubuntu:18.04 "/bin/bash" 2 minutes ago Exited (0) About a minute ago amazing_vaughan

Container d23dc5d88f11 is the one we just created; commit it:
$ sudo docker commit -m="exec apt-get update" -a="hewentian" d23dc5d88f11 hewentian/ubuntu:v2

sha256:2bdf86d10fbc18204e04fe5a30dee06dfeb30683247c41e85e8cfe6d66d5d9d6

Check the image we just created
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2 2bdf86d10fbc 6 seconds ago 91MB
ubuntu 18.04 3556258649b2 5 days ago 64.2MB
hello-world latest fce289e99eb9 6 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

Now we can create containers from the new image
$ sudo docker run -it hewentian/ubuntu:v2 /bin/bash
root@10adcace776b:/# cat /proc/version
Linux version 4.15.0-47-generic (buildd@lgw01-amd64-001) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019
root@10adcace776b:/# whoami
root
root@10adcace776b:/# exit
exit

Creating an image

To create an image from scratch, we need a Dockerfile, for example:
$ cat /home/hewentian/Documents/docker/ubuntu/Dockerfile

FROM ubuntu:18.04
MAINTAINER hewentian "wentian.he@qq.com"

ENV AUTHOR="hewentian"
WORKDIR /tmp/
RUN /usr/bin/touch he.txt
RUN /bin/echo "The author is $AUTHOR, created at " >> /tmp/he.txt
RUN /bin/date >> /tmp/he.txt

Build the image
$ sudo docker build -t hewentian/ubuntu:v2.1 -f /home/hewentian/Documents/docker/ubuntu/Dockerfile .

Sending build context to Docker daemon 2.56kB
Step 1/7 : FROM ubuntu:18.04
---> 3556258649b2
Step 2/7 : MAINTAINER hewentian "wentian.he@qq.com"
---> Running in 9684fd7dab36
Removing intermediate container 9684fd7dab36
---> 87f25ba61a99
Step 3/7 : ENV AUTHOR="hewentian"
---> Running in 22e933129053
Removing intermediate container 22e933129053
---> 23c5a574b01c
Step 4/7 : WORKDIR /tmp/
---> Running in a4341dbc2164
Removing intermediate container a4341dbc2164
---> 94663075f2b0
Step 5/7 : RUN /usr/bin/touch he.txt
---> Running in e54479ffd964
Removing intermediate container e54479ffd964
---> a196207c63e9
Step 6/7 : RUN /bin/echo "The author is $AUTHOR, created at " >> /tmp/he.txt
---> Running in 89d010bd1b78
Removing intermediate container 89d010bd1b78
---> 11aa9b6d3605
Step 7/7 : RUN /bin/date >> /tmp/he.txt
---> Running in 66627425d24c
Removing intermediate container 66627425d24c
---> c6cd98aa1461
Successfully built c6cd98aa1461
Successfully tagged hewentian/ubuntu:v2.1

Check the resulting image
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 c6cd98aa1461 43 seconds ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 18 hours ago 91MB
ubuntu 18.04 3556258649b2 6 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

Create a container from the newly built image
$ sudo docker run -it hewentian/ubuntu:v2.1 /bin/bash

root@335d56425694:/tmp# ls /tmp/
he.txt
root@335d56425694:/tmp# more /tmp/he.txt
The author is hewentian, created at
Tue Jul 30 03:10:50 UTC 2019
root@335d56425694:/tmp# exit
exit

As expected, the image contains the file we created.

Deploying a JAR application in a container

Suppose we have a Spring Boot web project with a /hello endpoint, already built as a JAR, and we want to run it in a container. The directory layout is as follows (a sketch of such an endpoint follows the listing below):
$ pwd
/home/hewentian/Documents/docker/showIp

$ ls
Dockerfile showIp-1.0.0.jar
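
The showIp project itself is not included in these notes; as a minimal sketch, a Spring Boot controller backing the /hello endpoint mentioned above could look like this (package, class name and response text are assumptions):

package com.hewentian.showip;

import javax.servlet.http.HttpServletRequest;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    // answer GET /hello with a greeting that echoes the caller's IP address
    @GetMapping("/hello")
    public String hello(HttpServletRequest request) {
        return "hello, your ip is " + request.getRemoteAddr();
    }
}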

To build an image we need a Dockerfile, as follows:
$ cat /home/hewentian/Documents/docker/showIp/Dockerfile

FROM java:8
MAINTAINER hewentian "wentian.he@qq.com"

ADD showIp-1.0.0.jar showIp.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","showIp.jar"]

Build the image
$ cd /home/hewentian/Documents/docker/showIp/
$ sudo docker build -t hewentian/show-ip:v1.0.0 -f /home/hewentian/Documents/docker/showIp/Dockerfile .

Sending build context to Docker daemon 13.4MB
Step 1/5 : FROM java:8
---> d23bdf5b1b1b
Step 2/5 : MAINTAINER hewentian "wentian.he@qq.com"
---> Using cache
---> 8ef66d4bf19b
Step 3/5 : ADD showIp-1.0.0.jar showIp.jar
---> 74faf7fe0fdf
Step 4/5 : EXPOSE 8080
---> Running in 605f90040e44
Removing intermediate container 605f90040e44
---> 2de558b34abc
Step 5/5 : ENTRYPOINT ["java","-jar","showIp.jar"]
---> Running in e8436a5ea5e0
Removing intermediate container e8436a5ea5e0
---> 5f89e1fe5e7e
Successfully built 5f89e1fe5e7e
Successfully tagged hewentian/show-ip:v1.0.0

Run the newly built image:
$ sudo docker run -p 8081:8080 -d hewentian/show-ip:v1.0.0

Once it is running, you can open:
http://localhost:8081/hello

Installing nginx with docker

First pull the image
$ sudo docker search nginx
$ sudo docker pull nginx

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 c6cd98aa1461 4 hours ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 22 hours ago 91MB
nginx latest e445ab08b2be 6 days ago 126MB
ubuntu 18.04 3556258649b2 6 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

Start a container with nginx's default configuration:
$ sudo docker run --name nginx-test -p 8081:80 -d nginx

838ebabcc937cf9a8e13946f92d104b2eb153cc61f21cb47e7661d6bbe205253

If it started successfully, it can be opened in a browser:
http://localhost:8081/

Next, deploy our own nginx. First create the nginx directories on the host:
$ cd /home/hewentian/Documents/docker
$ mkdir -p nginx/www nginx/logs nginx/conf

Copy the configuration file from the nginx container started above to the host:
$ sudo docker cp 838ebabcc937:/etc/nginx/nginx.conf /home/hewentian/Documents/docker/nginx/conf

docker cp copies data between the local host and a container.

Create the nginx welcome page /home/hewentian/Documents/docker/nginx/www/index.html:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to docker nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Start nginx:
$ sudo docker run --name nginx-test2 -p 8082:80 -d -v /home/hewentian/Documents/docker/nginx/www:/usr/share/nginx/html -v /home/hewentian/Documents/docker/nginx/conf/nginx.conf:/etc/nginx/nginx.conf -v /home/hewentian/Documents/docker/nginx/logs:/var/log/nginx nginx

Parameter notes:

-v /home/hewentian/Documents/docker/nginx/www:/usr/share/nginx/html mounts the directory created on the host onto /usr/share/nginx/html inside the container (the other two -v options do the same for the configuration file and the logs).

If it started successfully, it can be opened in a browser:
http://localhost:8082/

Installing redis with docker

First create the data directory and the configuration file:
$ cd /root/db/redis
$ mkdir data conf
$ cd conf
$ vi redis.conf

requirepass abc12345

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes

Install and run
$ sudo docker pull redis
$
$ sudo docker run \
--name redis \
-p 6379:6379 \
-v /root/db/redis/conf/redis.conf:/etc/redis/redis.conf \
-v /root/db/redis/data:/data \
-d redis redis-server /etc/redis/redis.conf

Installing mysql with docker

First create the data directories and the configuration file:
$ cd /root/db/mysql
$ mkdir data conf logs
$ cd conf
$ vi mysql.cnf

[client]
port=3306
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
port=3306
character-set-server=utf8
max_connections=100

Pull the image
$ sudo docker search mysql
$ sudo docker pull mysql:5.6.42

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 c6cd98aa1461 4 hours ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 22 hours ago 91MB
nginx latest e445ab08b2be 6 days ago 126MB
ubuntu 18.04 3556258649b2 6 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB
mysql 5.6.42 27e29668a08a 12 months ago 256MB

Run the container

  1. Simple setup
    $ sudo docker run \
    -itd --name mysql-hwt \
    -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=123456 \
    mysql:5.6.42
  2. Full setup

    $ sudo docker run \
    -itd --name mysql-hwt \
    -p 3306:3306 \
    -v /root/db/mysql/conf/mysql.cnf:/etc/mysql/conf.d/mysql.cnf \
    -v /root/db/mysql/data:/var/lib/mysql \
    -v /root/db/mysql/logs:/logs \
    -e MYSQL_ROOT_PASSWORD=123456 \
    mysql:5.6.42


    $ sudo docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    524097ed5349 mysql:5.6.42 "docker-entrypoint.s…" 3 minutes ago Up 3 minutes 0.0.0.0:3306->3306/tcp mysql-hwt

Enter MySQL, change the root user's password, and disable remote root login:
$ sudo docker exec -it mysql-hwt mysql -uroot -p123456

mysql> GRANT ALL ON *.* TO 'root'@'localhost' IDENTIFIED BY 'root';
mysql> DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
mysql> FLUSH PRIVILEGES;

To change MySQL's default character encoding to UTF-8, open /etc/mysql/conf.d/mysql.cnf inside the container and add the following (the last three lines are the existing content).
[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci

log_bin=mysql-bin
binlog_format=ROW
server_id=1

Restart MySQL, then check the encoding.

$ sudo docker restart mysql-hwt


mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)

Then create a regular user for day-to-day MySQL access; see the earlier mysql study notes.

Installing mongo with docker

First create the data directories and the configuration file:
$ cd /root/db/mongodb
$ mkdir configdb data logs
$ cd configdb
$ vi mongod.conf

systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log

Install and run
$ sudo docker pull mongo
$
$ sudo docker run \
-itd --name mongo \
-p 27017:27017 \
-v /root/db/mongodb/data:/data/db \
-v /root/db/mongodb/configdb:/data/configdb \
-v /root/db/mongodb/logs:/data/logs \
mongo:latest --auth \
-f /data/configdb/mongod.conf \
--bind_ip_all

  1. Enter mongo and add an administrator. Create the first user, admin, which needs user-management privileges; its role is root.

    $ sudo docker exec -it mongo mongo
    MongoDB shell version v4.2.7
    connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
    Implicit session: session { "id" : UUID("81ad33aa-6318-4854-98df-9910a8927698") }
    MongoDB server version: 4.2.7
    Welcome to the MongoDB shell.
    For interactive help, type "help".
    For more comprehensive documentation, see
    http://docs.mongodb.org/
    Questions? Try the support group
    http://groups.google.com/group/mongodb-user
    > use admin
    switched to db admin
    >
    >
    > db.createUser({user:"admin",pwd:"12345",roles:["root"]})
    Successfully added user: { "user" : "admin", "roles" : [ "root" ] }
    >
    >
    > show collections
    Warning: unable to run listCollections, attempting to approximate collection names by parsing connectionStatus
    >
    >
    > db.auth("admin","12345")
    1
    >
    > show collections
    system.users
    system.version
    >
  2. Add a database user.
    To add a user to a database, switch to that database first; here we simply give the user the dbOwner role.

    > use bfg
    switched to db bfg
    >
    >
    > db.createUser({user: "bfg", pwd: "bfg100", roles: [{ role: "dbOwner", db: "bfg" }]})
    Successfully added user: {
    "user" : "bfg",
    "roles" : [
    {
    "role" : "dbOwner",
    "db" : "bfg"
    }
    ]
    }

Installing rabbitmq with docker

  1. Simple setup
    $ sudo docker pull rabbitmq:3.8.9-management
    $
    $ sudo docker run \
    -itd --name rabbitmq-hwt \
    -p 5672:5672 \
    -p 15672:15672 \
    -e RABBITMQ_DEFAULT_USER=admin \
    -e RABBITMQ_DEFAULT_PASS=admin \
    rabbitmq:3.8.9-management

Open the web UI:
http://192.168.56.113:15672/

Installing zookeeper with docker

  1. Simple setup
    First create the data directories:
    $ cd /root/db/zookeeper
    $ mkdir data datalog logs

Install and run
$ sudo docker pull zookeeper:3.6.2
$
$ sudo docker run \
-itd --name zookeeper-hwt \
-p 2181:2181 \
-e ZOO_TICK_TIME=2000 \
-e ZOO_INIT_LIMIT=10 \
-e ZOO_SYNC_LIMIT=5 \
-v /root/db/zookeeper/data:/data \
-v /root/db/zookeeper/datalog:/datalog \
-v /root/db/zookeeper/logs:/logs \
zookeeper:3.6.2

Log in with the client
$ sudo docker exec -it zookeeper-hwt zkCli.sh
Connecting to localhost:2181

Installing kafka with docker

  1. Simple setup
    First create the data directory:
    $ cd /root/db/kafka
    $

Install and run
$ sudo docker pull wurstmeister/kafka:2.13-2.7.0
$
$ sudo docker run \
-itd --name kafka-hwt \
-p 9092:9092 \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ZOOKEEPER_CONNECT=192.168.56.113:2181/kafka \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.56.113:9092 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
-v /root/db/kafka:/kafka \
wurstmeister/kafka:2.13-2.7.0

Verify that Kafka works
$ sudo docker exec -it kafka-hwt bash
$ cd /opt/kafka/bin/
$
$ # send a message
$ ./kafka-console-producer.sh --broker-list localhost:9092 --topic redsuns
> hello word
$
$ # receive messages
$ ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic redsuns --from-beginning

Note: because KAFKA_ZOOKEEPER_CONNECT above puts Kafka's metadata under the /kafka path in ZooKeeper, remember to set the chroot path to /kafka when connecting with Kafka Tool, otherwise the connection will fail.

Installing zipkin with docker

Pull the image
$ sudo docker pull openzipkin/zipkin

  1. Simple setup (data kept in memory)
    $ sudo docker run \
    -itd --name zipkin-hwt \
    -p 9411:9411 \
    openzipkin/zipkin

Open the web UI:
http://192.168.56.113:9411/zipkin/

  2. Full setup (data stored in MySQL, reading spans from the message queue)
    $ sudo docker run \
    -itd --name zipkin-hwt \
    -p 9411:9411 \
    -e STORAGE_TYPE=mysql \
    -e MYSQL_HOST=192.168.56.113 \
    -e MYSQL_TCP_PORT=3306 \
    -e MYSQL_DB=zipkin \
    -e MYSQL_USER=zipkin \
    -e MYSQL_PASS=HFWM8DBv6nfPXKg2 \
    -e RABBIT_ADDRESSES=192.168.56.113:5672 \
    -e RABBIT_USER=admin \
    -e RABBIT_PASSWORD=admin \
    openzipkin/zipkin

Entering a running container

If a container was not started in interactive mode and you want to get into it later, you can use the following:

First start a previously stopped container
$ sudo docker start 3063341debf2
3063341debf2

Run a script inside the container directly
$ sudo docker exec -it 3063341debf2 /bin/bash /a.sh
Wed Jul 31 01:40:20 UTC 2019

Enter the container in interactive mode
$ sudo docker exec -it 3063341debf2 /bin/bash
root@3063341debf2:/# ls
a.sh bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
root@3063341debf2:/# sh a.sh
Wed Jul 31 01:44:34 UTC 2019
root@3063341debf2:/# exit
exit

After exiting, the container keeps running
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3063341debf2 ubuntu:18.04 "/bin/bash" 41 hours ago Up 18 seconds friendly_poitras

The attach command also enters a container, but the container stops when you exit:
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3063341debf2 ubuntu:18.04 "/bin/bash" 41 hours ago Up 18 seconds friendly_poitras

$ sudo docker attach --sig-proxy=false 3063341debf2
root@3063341debf2:/# ls
a.sh bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
root@3063341debf2:/# sh a.sh
Wed Jul 31 02:02:50 UTC 2019
root@3063341debf2:/# exit
exit

$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Exporting and importing images

Export syntax:
$ sudo docker save --help

Usage: docker save [OPTIONS] IMAGE [IMAGE...]

Save one or more images to a tar archive (streamed to STDOUT by default)

Options:
-o, --output string Write to a file, instead of STDOUT

Import syntax:
$ sudo docker load --help

Usage: docker load [OPTIONS]

Load an image from a tar archive or STDIN

Options:
-i, --input string Read from tar archive file, instead of STDIN
-q, --quiet Suppress the load output

Example: export an image, delete it, then re-import it from the exported file:
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> c6cd98aa1461 27 hours ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 45 hours ago 91MB
nginx latest e445ab08b2be 7 days ago 126MB
ubuntu 18.04 3556258649b2 7 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

$ sudo docker save -o hu.tar hewentian/ubuntu:v2

$ ls
hu.tar

$ sudo docker rmi 2bdf86d10fbc
Untagged: hewentian/ubuntu:v2
Deleted: sha256:2bdf86d10fbc18204e04fe5a30dee06dfeb30683247c41e85e8cfe6d66d5d9d6
Deleted: sha256:c4a9226f13fa8f48ef07e27e0954c43f38275b6aa1d24e361ed016dfff056069

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> c6cd98aa1461 27 hours ago 64.2MB
nginx latest e445ab08b2be 7 days ago 126MB
ubuntu 18.04 3556258649b2 7 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

$ sudo docker load -i hu.tar
ed4797628ae8: Loading layer [==================================================>] 26.85MB/26.85MB
Loaded image: hewentian/ubuntu:v2

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> c6cd98aa1461 27 hours ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 45 hours ago 91MB
nginx latest e445ab08b2be 7 days ago 126MB
ubuntu 18.04 3556258649b2 7 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

Or export and import using compression:

$ sudo docker save mysql:5.6.42 | gzip > mysql-5.6.42.tar.gz
$ sudo docker load < mysql-5.6.42.tar.gz

Note: when exporting, use REPOSITORY:TAG rather than the IMAGE ID; otherwise the re-imported image has no REPOSITORY:TAG and shows as none.

Creating an image with import

A new image can also be created from an exported tar file; the syntax is:
$ sudo docker import --help

Usage: docker import [OPTIONS] file|URL|- [REPOSITORY[:TAG]]

Import the contents from a tarball to create a filesystem image

Options:
-c, --change list Apply Dockerfile instruction to the created image
-m, --message string Set commit message for imported image

Example:
$ sudo docker import hu.tar hewentian/ubuntu:v2.1
sha256:c389673d68c576b08ad8e3c2337de4ee3b4ed7e622fa986771323797edd2d595

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 c389673d68c5 5 seconds ago 66.6MB
hewentian/ubuntu v2 2bdf86d10fbc 45 hours ago 91MB
nginx latest e445ab08b2be 7 days ago 126MB
ubuntu 18.04 3556258649b2 7 days ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

I tried creating a container from an image brought in with import, but it failed; I will come back to this later.

Logging in to and out of an image registry

By default you log in to and out of the official registry https://hub.docker.com/ , but you can also log in to a private registry.

Login syntax:
$ sudo docker login --help

Usage: docker login [OPTIONS] [SERVER]

Log in to a Docker registry.
If no server is specified, the default is defined by the daemon.

Options:
-p, --password string Password
--password-stdin Take the password from stdin
-u, --username string Username

Logout syntax:
$ sudo docker logout --help

Usage: docker logout [SERVER]

Log out from a Docker registry.
If no server is specified, the default is defined by the daemon.

Login/logout example:
$ sudo docker login -u hewentian
Password:
WARNING! Your password will be stored unencrypted in /home/hewentian/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded


$ sudo docker logout
Removing login credentials for https://index.docker.io/v1/

Pushing a local image to a registry

By default images are pushed to the official registry docker.io; pushing to a private registry is covered later.
$ sudo docker push hewentian/ubuntu:v2.1
The push refers to repository [docker.io/hewentian/ubuntu]
8c29bfccf50c: Pushed
v2.1: digest: sha256:992cc4e008449d8285387fe80aff3c9b0574360fc3ad21b04bccc5b6a4229923 size: 528

Installing harbor

We install Harbor on the machine 192.168.56.113, following:
https://github.com/goharbor/harbor

Harbor requires docker 17.06.0-ce+ and docker-compose 1.18.0+; installing docker-ce is covered above.

Install docker-compose, see: https://docs.docker.com/compose/install/#install-compose
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose

$ docker-compose -version
docker-compose version 1.24.1, build 4667896b

Before installing Harbor, Docker must be running; see: https://github.com/goharbor/harbor/blob/master/docs/installation_guide.md
Download the offline installer; the latest release is at https://github.com/goharbor/harbor/releases
$ cd /home/hadoop
$ wget https://storage.googleapis.com/harbor-releases/release-1.8.0/harbor-offline-installer-v1.8.2-rc1.tgz
$ tar xf harbor-offline-installer-v1.8.2-rc1.tgz
$ cd harbor
$ ls
harbor.v1.8.2.tar.gz harbor.yml install.sh LICENSE prepare

Pre-installation configuration: in harbor.yml, hostname can be set to an IP address or a domain name; it is the address clients use to log in.
First check this machine's hostname and IP address:
$ hostname
hadoop-host-slave-3

$ ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.113 netmask 255.255.255.0 broadcast 192.168.56.255
inet6 fe80::90f2:2a79:288c:984e prefixlen 64 scopeid 0x20<link>
ether 08:00:27:9f:8e:7e txqueuelen 1000 (Ethernet)
RX packets 4726 bytes 448939 (448.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 7637 bytes 9739346 (9.7 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The hostname is hadoop-host-slave-3 and the IP address is 192.168.56.113. Here we set hostname to hadoop-host-slave-3:
$ cd /home/hadoop/harbor
$ vi harbor.yml

hostname: hadoop-host-slave-3

Start the installation by running the install script:
$ cd /home/hadoop/harbor
$ sudo ./install.sh

If all goes well, you will see installation logs like the following:
[sudo] password for hadoop: 

[Step 0]: checking installation environment ...

Note: docker version: 19.03.1

Note: docker-compose version: 1.24.1

[Step 1]: loading Harbor images ...
39b2d676308e: Loading layer [==================================================>] 33.47MB/33.47MB
f3583ea30104: Loading layer [==================================================>] 3.552MB/3.552MB
8290f582ffa5: Loading layer [==================================================>] 6.59MB/6.59MB
19913bc5e52b: Loading layer [==================================================>] 161.3kB/161.3kB
ae8b73743d1b: Loading layer [==================================================>] 215kB/215kB
5c811d1fe61a: Loading layer [==================================================>] 3.584kB/3.584kB
Loaded image: goharbor/harbor-portal:v1.8.2
f27812f7a2da: Loading layer [==================================================>] 8.971MB/8.971MB
c74d2b18a2d1: Loading layer [==================================================>] 38.82MB/38.82MB
c416e128ff4c: Loading layer [==================================================>] 38.82MB/38.82MB
Loaded image: goharbor/harbor-jobservice:v1.8.2
e97909585a09: Loading layer [==================================================>] 8.972MB/8.972MB
23b18d08698d: Loading layer [==================================================>] 3.072kB/3.072kB
9c1d8c03df3e: Loading layer [==================================================>] 20.1MB/20.1MB
9666a22cf141: Loading layer [==================================================>] 3.072kB/3.072kB
95783fa51b82: Loading layer [==================================================>] 7.465MB/7.465MB
285e05bca91e: Loading layer [==================================================>] 27.56MB/27.56MB
Loaded image: goharbor/harbor-registryctl:v1.8.2
6543a3ba9bd9: Loading layer [==================================================>] 338MB/338MB
43f486f0ed18: Loading layer [==================================================>] 107kB/107kB
Loaded image: goharbor/harbor-migrator:v1.8.2
6710d86773e1: Loading layer [==================================================>] 50.51MB/50.51MB
dba91d68db46: Loading layer [==================================================>] 3.584kB/3.584kB
4b6a61fc3477: Loading layer [==================================================>] 3.072kB/3.072kB
efd64eeb5c31: Loading layer [==================================================>] 2.56kB/2.56kB
25d50c6108dd: Loading layer [==================================================>] 3.072kB/3.072kB
6c22404ddaf0: Loading layer [==================================================>] 3.584kB/3.584kB
135fef0d64a7: Loading layer [==================================================>] 12.29kB/12.29kB
Loaded image: goharbor/harbor-log:v1.8.2
f080cac48a5f: Loading layer [==================================================>] 3.552MB/3.552MB
Loaded image: goharbor/nginx-photon:v1.8.2
9562b05e7bd1: Loading layer [==================================================>] 8.971MB/8.971MB
2ff1ba9952dc: Loading layer [==================================================>] 5.143MB/5.143MB
463651a0baca: Loading layer [==================================================>] 15.13MB/15.13MB
feceecff30a6: Loading layer [==================================================>] 26.47MB/26.47MB
a2d1a1b1eaaa: Loading layer [==================================================>] 22.02kB/22.02kB
2c8463eca215: Loading layer [==================================================>] 3.072kB/3.072kB
7e91f466c852: Loading layer [==================================================>] 46.74MB/46.74MB
Loaded image: goharbor/notary-server-photon:v0.6.1-v1.8.2
628aac791456: Loading layer [==================================================>] 113MB/113MB
32e13bd19d15: Loading layer [==================================================>] 10.94MB/10.94MB
17d6a3366a31: Loading layer [==================================================>] 2.048kB/2.048kB
9c3d274d3072: Loading layer [==================================================>] 48.13kB/48.13kB
a3e8bc524efe: Loading layer [==================================================>] 3.072kB/3.072kB
6edf120ab0a5: Loading layer [==================================================>] 10.99MB/10.99MB
Loaded image: goharbor/clair-photon:v2.0.8-v1.8.2
fa7f8bd666e1: Loading layer [==================================================>] 8.972MB/8.972MB
d23a3ac1da5c: Loading layer [==================================================>] 3.072kB/3.072kB
25ece37b9b62: Loading layer [==================================================>] 2.56kB/2.56kB
ceff80c4799d: Loading layer [==================================================>] 20.1MB/20.1MB
4ddaf99a2326: Loading layer [==================================================>] 20.1MB/20.1MB
Loaded image: goharbor/registry-photon:v2.7.1-patch-2819-v1.8.2
86ef8960f9fa: Loading layer [==================================================>] 13.72MB/13.72MB
4be07cab0847: Loading layer [==================================================>] 26.47MB/26.47MB
b3f2bb8db417: Loading layer [==================================================>] 22.02kB/22.02kB
4c68837d983b: Loading layer [==================================================>] 3.072kB/3.072kB
f2526a5c0965: Loading layer [==================================================>] 45.33MB/45.33MB
Loaded image: goharbor/notary-signer-photon:v0.6.1-v1.8.2
9c6a2b28994d: Loading layer [==================================================>] 2.56kB/2.56kB
49bb4e719955: Loading layer [==================================================>] 1.536kB/1.536kB
47d1a63f5482: Loading layer [==================================================>] 69.81MB/69.81MB
db449d60801c: Loading layer [==================================================>] 39.75MB/39.75MB
f01c7fa07db7: Loading layer [==================================================>] 144.4kB/144.4kB
5ff7a32e9f2c: Loading layer [==================================================>] 3.005MB/3.005MB
Loaded image: goharbor/prepare:v1.8.2
6602e119ecab: Loading layer [==================================================>] 8.971MB/8.971MB
6b45eae45c58: Loading layer [==================================================>] 46.86MB/46.86MB
e3d9614f88b3: Loading layer [==================================================>] 5.632kB/5.632kB
f0b457c2a1b1: Loading layer [==================================================>] 28.67kB/28.67kB
f4e712369f36: Loading layer [==================================================>] 46.86MB/46.86MB
Loaded image: goharbor/harbor-core:v1.8.2
c39fa71cb1b3: Loading layer [==================================================>] 63.4MB/63.4MB
245ad05b59aa: Loading layer [==================================================>] 50.88MB/50.88MB
6fc4b5ec5705: Loading layer [==================================================>] 6.656kB/6.656kB
8a003956ed73: Loading layer [==================================================>] 2.048kB/2.048kB
0b4d3b06d5d5: Loading layer [==================================================>] 7.68kB/7.68kB
c045e2109691: Loading layer [==================================================>] 2.56kB/2.56kB
eef5f9c09eb0: Loading layer [==================================================>] 2.56kB/2.56kB
75776554d401: Loading layer [==================================================>] 2.56kB/2.56kB
Loaded image: goharbor/harbor-db:v1.8.2
0130cb61aaba: Loading layer [==================================================>] 74.58MB/74.58MB
9f0973beb46c: Loading layer [==================================================>] 3.072kB/3.072kB
74bd291b6f8b: Loading layer [==================================================>] 59.9kB/59.9kB
3b11caba8d3e: Loading layer [==================================================>] 61.95kB/61.95kB
Loaded image: goharbor/redis-photon:v1.8.2
5b00b48e6ec3: Loading layer [==================================================>] 8.976MB/8.976MB
7f5008b71ec6: Loading layer [==================================================>] 44.39MB/44.39MB
02f96d3b6e35: Loading layer [==================================================>] 2.048kB/2.048kB
da8354357ee3: Loading layer [==================================================>] 3.072kB/3.072kB
1819913851a3: Loading layer [==================================================>] 44.4MB/44.4MB
Loaded image: goharbor/chartmuseum-photon:v0.9.0-v1.8.2


[Step 2]: preparing environment ...
prepare base dir is set to /home/hadoop/harbor
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
Generated and saved secret to file: /secret/keys/secretkey
Generated certificate, key file: /secret/core/private_key.pem, cert file: /secret/registry/root.crt
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir



[Step 3]: starting Harbor ...
Creating network "harbor_harbor" with the default driver
Creating harbor-log ... done
Creating redis ... done
Creating registryctl ... done
Creating registry ... done
Creating harbor-db ... done
Creating harbor-core ... done
Creating harbor-portal ... done
Creating harbor-jobservice ... done
Creating nginx ... done

✔ ----Harbor has been installed and started successfully.----

Now you should be able to visit the admin portal at http://hadoop-host-slave-3.
For more details, please visit https://github.com/goharbor/harbor .

按照上面的提示,我们在浏览器中访问
http://hadoop-host-slave-3
http://harbor.hewentian.com

在访问之前,我们需要在本地机器中配置一下hosts,添加如下两行

1
2
3
4
$ more /etc/hosts

192.168.56.113 hadoop-host-slave-3
192.168.56.113 harbor.hewentian.com

为方便docker将镜像上传到私有harbor,这里多配置一个域名。所以访问上面所列的两个站点,结果是一样的。后面的操作,我们使用harbor.hewentian.com这个域名。

输入管理员的初始用户名/密码:admin/Harbor12345,登录之后可以在页面上修改密码。登录成功后的界面如下图:

我们创建一个用户,用户名/密码:hewentian/Harbor12345,用于上传下载镜像:

然后我们退出管理员帐号,用新创建的用户登录:

创建一个project,名为hp,可见性为public:

docker登录到harbor

1
2
3
$ sudo docker login harbor.hewentian.com -u hewentian
Password:
Error response from daemon: Get https://harbor.hewentian.com/v2/: dial tcp 192.168.56.113:443: connect: connection refused

有可能会报上面的错误,原因是docker与registry交互默认使用的是HTTPS,但是我们搭建的harbor默认使用的是HTTP服务。解决方法:
在要登录到harbor的机器(本地机器)作如下配置

1
2
3
4
5
$ sudo vi /etc/docker/daemon.json

{
"insecure-registries": ["harbor.hewentian.com"]
}

文件/etc/docker/daemon.json原先可能并不存在,它所有可能的配置,可以参考这里:
https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file

重启(本地机器)的docker,并重新尝试登录到harbor

1
2
3
4
5
6
7
8
9
$ sudo service docker restart

$ sudo docker login harbor.hewentian.com -u hewentian
Password:
WARNING! Your password will be stored unencrypted in /home/hewentian/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

将本地镜像上传到私有镜像仓库harbor

上传到私库的命令,和上传到官方仓库的命令差不多,命令如下:

docker push reg.yourdomain.com/myproject/myrepo:mytag

先查看本地的所有镜像

1
2
3
4
5
6
7
8
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 3712fd008024 8 days ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 10 days ago 91MB
nginx latest e445ab08b2be 2 weeks ago 126MB
ubuntu 18.04 3556258649b2 2 weeks ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

例如我们要将ubuntu:18.04上传到harbor

1
2
3
4
$ sudo docker push harbor.hewentian.com/hp/ubuntu:18.04
[sudo] password for hewentian:
The push refers to repository [harbor.hewentian.com/hp/ubuntu]
An image does not exist locally with the tag: harbor.hewentian.com/hp/ubuntu

可以看到,不能直接上传,要先为待上传的镜像打tag

1
2
3
4
5
6
7
8
9
10
11
$ sudo docker tag ubuntu:18.04 harbor.hewentian.com/hp/ubuntu:18.04

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 3712fd008024 8 days ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 10 days ago 91MB
nginx latest e445ab08b2be 2 weeks ago 126MB
harbor.hewentian.com/hp/ubuntu 18.04 3556258649b2 2 weeks ago 64.2MB
ubuntu 18.04 3556258649b2 2 weeks ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

从上面可以看到,打tag后的镜像只是原镜像的一个引用,它们的IMAGE ID是一样的。

1
2
3
4
5
6
7
$ sudo docker push harbor.hewentian.com/hp/ubuntu:18.04
The push refers to repository [harbor.hewentian.com/hp/ubuntu]
b079b3fa8d1b: Pushed
a31dbd3063d7: Pushed
c56e09e1bd18: Pushed
543791078bdb: Pushed
18.04: digest: sha256:d91842ef309155b85a9e5c59566719308fab816b40d376809c39cf1cf4de3c6a size: 1152

上传成功。同样,在harbor的web管理界面上也可以看到该镜像已经上传成功。

从harbor下载镜像

先将本地的镜像删掉

1
2
3
4
5
6
7
8
9
10
11
12
13
$ sudo docker rmi harbor.hewentian.com/hp/ubuntu:18.04
Untagged: harbor.hewentian.com/hp/ubuntu:18.04
Untagged: harbor.hewentian.com/hp/ubuntu@sha256:d91842ef309155b85a9e5c59566719308fab816b40d376809c39cf1cf4de3c6a

$ sudo docker rmi ubuntu:18.04

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 3712fd008024 8 days ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 10 days ago 91MB
nginx latest e445ab08b2be 2 weeks ago 126MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

然后从harbor下载

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ sudo docker pull harbor.hewentian.com/hp/ubuntu:18.04
18.04: Pulling from hp/ubuntu
7413c47ba209: Pull complete
0fe7e7cbb2e8: Pull complete
1d425c982345: Pull complete
344da5c95cec: Pull complete
Digest: sha256:d91842ef309155b85a9e5c59566719308fab816b40d376809c39cf1cf4de3c6a
Status: Downloaded newer image for harbor.hewentian.com/hp/ubuntu:18.04
harbor.hewentian.com/hp/ubuntu:18.04

$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hewentian/ubuntu v2.1 3712fd008024 8 days ago 64.2MB
hewentian/ubuntu v2 2bdf86d10fbc 10 days ago 91MB
nginx latest e445ab08b2be 2 weeks ago 126MB
harbor.hewentian.com/hp/ubuntu 18.04 3556258649b2 2 weeks ago 64.2MB
hello-world latest fce289e99eb9 7 months ago 1.84kB
training/webapp latest 6fae60ef3446 4 years ago 349MB

查看harbor进程状态

harbor的日志默认存放在/var/log/harbor,如果下列服务中有哪个不是Up状态,可以查看相关日志进行排查

1
2
3
4
5
6
7
8
9
10
11
12
13
$ sudo docker-compose ps
[sudo] password for hadoop:
Name Command State Ports
---------------------------------------------------------------------------------------------
harbor-core /harbor/start.sh Up (healthy)
harbor-db /entrypoint.sh postgres Up (healthy) 5432/tcp
harbor-jobservice /harbor/start.sh Up
harbor-log /bin/sh -c /usr/local/bin/ ... Up (healthy) 127.0.0.1:1514->10514/tcp
harbor-portal nginx -g daemon off; Up (healthy) 80/tcp
nginx nginx -g daemon off; Up (healthy) 0.0.0.0:80->80/tcp
redis docker-entrypoint.sh redis ... Up 6379/tcp
registry /entrypoint.sh /etc/regist ... Up (healthy) 5000/tcp
registryctl /harbor/start.sh Up (healthy)

harbor生命周期管理

可以使用docker-compose命令来启动、停止harbor
停止harbor:

1
2
3
4
5
6
7
8
9
10
11
$ sudo docker-compose stop

Stopping nginx ... done
Stopping harbor-jobservice ... done
Stopping harbor-portal ... done
Stopping harbor-core ... done
Stopping registry ... done
Stopping harbor-db ... done
Stopping registryctl ... done
Stopping redis ... done
Stopping harbor-log ... done

启动harbor:

1
2
3
4
5
6
7
8
9
10
11
$ sudo docker-compose start

Starting log ... done
Starting registry ... done
Starting registryctl ... done
Starting postgresql ... done
Starting core ... done
Starting portal ... done
Starting redis ... done
Starting jobservice ... done
Starting proxy ... done

参考文献:
https://docs.docker.com/
https://blog.docker.com/
https://github.com/goharbor/harbor

将宿主机的文件复制到容器里面

sudo docker cp /data.txt containerID:/

注意:一定要指定复制到的目录位置

未完待续……

nginx 学习笔记

今天一看,天哪,原来已经有4个月没有更新博客了,我在想,这4个月我都干嘛去了,内心立马慌起来了(你知道的,程序员是不能停止学习的)。但是细想,虽然没更新博客,但还是看了几本书:《从0到1》、《毛泽东选集》卷一、《MongoDB in Action》、《SpringBoot in Action》、《白帽子讲Web安全》,内心立马淡定了不少。好了,题外话不多说了,马上进入主题。

nginx是一个HTTP和反向代理服务器、邮件代理服务器和通用的TCP/UDP代理服务器,最初由俄罗斯程序员Igor Sysoev所开发。

详细介绍可以参见:
http://nginx.org/en/
http://nginx.org/en/docs/

本文将说下nginx的简单安装和使用,因为项目中将会用到。我在虚拟机VirtualBox中的机器上安装,系统为Ubuntu Linux,机器节点相关配置如下(之前安装过Hadoop集群的机器):

master:
    ip: 192.168.56.110
    hostname: hadoop-host-master

首先,我们要将nginx的安装包下载回来,截至本文撰写时,它的最新稳定版本为1.16.0,可以在它的官网下载。我先在我的物理机器上下载回来。

1
2
3
4
5
$ cd /home/hewentian/Downloads/
$ wget http://nginx.org/download/nginx-1.16.0.tar.gz
$ wget http://nginx.org/download/nginx-1.16.0.tar.gz.asc

验证下载文件的完整性,这里略

在物理机上将它传到要安装的机器hadoop-host-master

1
$ scp nginx-1.16.0.tar.gz hadoop@hadoop-host-master:~/

接下来,我们进入hadoop-host-master中操作:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
$ ssh hadoop@hadoop-host-master

$ tar xf nginx-1.16.0.tar.gz
$ cd nginx-1.16.0/
$ ls
auto CHANGES CHANGES.ru conf configure contrib html LICENSE man README src


$ sudo ./configure --with-http_stub_status_module --with-http_ssl_module --with-http_realip_module
checking for OS
+ Linux 4.15.0-39-generic x86_64
...

中间省略部分


Configuration summary
+ using system PCRE library
+ OpenSSL library is not used
+ using system zlib library

nginx path prefix: "/usr/local/nginx"
nginx binary file: "/usr/local/nginx/sbin/nginx"
nginx modules path: "/usr/local/nginx/modules"
nginx configuration prefix: "/usr/local/nginx/conf"
nginx configuration file: "/usr/local/nginx/conf/nginx.conf"
nginx pid file: "/usr/local/nginx/logs/nginx.pid"
nginx error log file: "/usr/local/nginx/logs/error.log"
nginx http access log file: "/usr/local/nginx/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"
nginx http fastcgi temporary files: "fastcgi_temp"
nginx http uwsgi temporary files: "uwsgi_temp"
nginx http scgi temporary files: "scgi_temp"

./configure过程中会检查依赖的缺失情况,如果有缺失,则会在这里提示。我们根据提示安装即可。

一般来说,nginx编译会依赖:zlib、zlib-devel、openssl、openssl-devel、pcre、pcre-devel、gcc、g++

在CentOS上可以通过以下方式进行安装:

1
$ yum -y install zlib zlib-devel openssl openssl-devel pcre pcre-devel

而在Ubuntu上面则可以通过下面的方式进行安装:

1
2
3
4
5
$ sudo apt-get install build-essential # 这会同时安装 gcc、g++
$ sudo apt-get install zlib1g
$ sudo apt-get install openssl libssl-dev
$ sudo apt-get install libpcre3 libpcre3-dev
$ sudo apt-get install zlib1g-dev

接下来进行编译安装:

1
2
3
4
5
6
7
8
9
10
11
$ sudo make && sudo make install

objs/ngx_modules.o \
-ldl -lpthread -lcrypt -lpcre -lz \
-Wl,-E
sed -e "s|%%PREFIX%%|/usr/local/nginx|" \
-e "s|%%PID_PATH%%|/usr/local/nginx/logs/nginx.pid|" \
-e "s|%%CONF_PATH%%|/usr/local/nginx/conf/nginx.conf|" \
-e "s|%%ERROR_LOG_PATH%%|/usr/local/nginx/logs/error.log|" \
< man/nginx.8 > objs/nginx.8
make[1]: Leaving directory '/home/hadoop/nginx-1.16.0'

它默认会安装到/usr/local/nginx/目录下。

启动命令如下:

1
$ sudo /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf

注意:使用-c参数指定配置文件一定要使用绝对路径,否则可能会报错。

修改配置文件后,检查配置文件是否正确的命令如下:

1
2
3
4
$ sudo /usr/local/nginx/sbin/nginx -t -c /usr/local/nginx/conf/nginx.conf

nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

可以打开 http://192.168.56.110/ 来看看是否已启动。

另外,使用 http://hadoop-host-master/ 也是一样的。

当你看到nginx的默认欢迎页面后,nginx的安装到这里就成功了。

一些常用操作命令

进入nginx的安装目录,这里进入默认安装目录,然后查看帮助信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ cd /usr/local/nginx/sbin
$ ./nginx -h

nginx version: nginx/1.16.0
Usage: nginx [-?hvVtTq] [-s signal] [-c filename] [-p prefix] [-g directives]

Options:
-?,-h : this help
-v : show version and exit
-V : show version and configure options then exit
-t : test configuration and exit
-T : test configuration, dump it and exit
-q : suppress non-error messages during configuration testing
-s signal : send signal to a master process: stop, quit, reopen, reload
-p prefix : set prefix path (default: /usr/local/nginx/)
-c filename : set configuration file (default: conf/nginx.conf)
-g directives : set global directives out of configuration file

说明如下:

  1. 启动命令: ./nginx
  2. 关闭命令: ./nginx -s stop,快速停止nginx,可能并不保存相关信息;
  3. 退出命令: ./nginx -s quit,完整有序的停止nginx,会保存相关信息,建议使用此命令;
  4. 动态加载配置文件: ./nginx -s reload可以不关闭nginx的情况下更新配置文件;
  5. 重新打开日志文件:./nginx -s reopen
  6. 查看Nginx版本: ./nginx -v
  7. 检查配置文件是否正确: ./nginx -t或者检查指定配置文件./nginx -t -c /usr/local/nginx/conf/nginx.conf

接下来,我们作一些简单的配置示例。但是在开始之前,在执行访问的机器上面(在这里是我的物理机器),配置一下HOST,增加下面这5行:

1
2
3
4
5
6
7
$ sudo vi /etc/hosts

192.168.56.110 www.hewentian.com
192.168.56.110 admin.hewentian.com
192.168.56.110 img.hewentian.com
192.168.56.110 api.hewentian.com
192.168.56.110 so.hewentian.com

示例一:将某目录下的图片,让其他机器可以通过WEB访问

在安装了nginx的机器hadoop-host-master上有个目录/home/hadoop/Pictures,这个目录下放有图片(我的机器里有一张:my_computer.png)。现在想通过web,让其他机器的用户访问这个目录下的图片。

修改nginx.conf配置,添加如下代码即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ sudo vi /usr/local/nginx/conf/nginx.conf

server {
listen 80;
server_name img.hewentian.com; # 修改 1/3: WEB访问的位置
access_log logs/img.access.log main; # 修改 2/3: 日志存放位置。记得将此配置文件中的 log_format main 前面的注释打开

location / {
root /home/hadoop/Pictures/; # 修改 3/3: 图片存放位置
index index.html index.htm;
}

error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}

保存文件后,退出。
检查配置文件是否正确:

1
$ sudo /usr/local/nginx/sbin/nginx -t -c /usr/local/nginx/conf/nginx.conf

然后重启nginx(如果只是修改了配置,也可以直接执行 sudo /usr/local/nginx/sbin/nginx -s reload 重新加载);如果要完全重启,首先找出nginx进程号:

ps -ef | grep nginx

然后杀死nginx的主进程:

sudo kill -9 [nginx进程号]

重启nginx

sudo /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf

在物理机器上面打开浏览器,试着访问: http://img.hewentian.com/my_computer.png

查看nginx访问日志:

1
2
3
4
5
6
7
8
9
10
$ tail /usr/local/nginx/logs/img.access.log

192.168.56.1 - - [11/Jun/2019:22:46:33 +0800] "GET / HTTP/1.1" 403 555 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [11/Jun/2019:22:46:50 +0800] "GET /my_computer.png HTTP/1.1" 200 59332 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [11/Jun/2019:22:47:21 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:00:51:04 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:00:51:04 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:00:51:04 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:00:51:05 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:01:02:06 +0800] "GET /my_computer.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"

示例二:实现简单的负载均衡

有个应用,它有一个接口/hello,用于返回当前服务器的IP地址。我使用SpringBoot简单开发,代码只有几行:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package com.hewentian.web.controller;

import org.apache.log4j.Logger;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.net.InetAddress;
import java.net.UnknownHostException;

/**
* <p>
* <b>HelloController</b> 是 返回当前服务器IP地址的Controller
* </p>
*
* @author <a href="mailto:wentian.he@qq.com">hewentian</a>
* @date 2019-06-12 14:52:49
* @since JDK 1.8
*/
@RestController
public class HelloController {
private static Logger log = Logger.getLogger(HelloController.class);

@RequestMapping("/hello")
public String index() {
String address = "";

try {
InetAddress inetAddress = InetAddress.getLocalHost();
address = inetAddress.getHostAddress();
} catch (UnknownHostException e) {
log.error(e.getMessage(), e);
}

return address + " is serving for you: Hello World";
}
}

部署在下面的2台服务器上,都是监听8080端口。

slave1:
    ip: 192.168.56.111
    hostname: hadoop-host-slave-1
slave2:
    ip: 192.168.56.112
    hostname: hadoop-host-slave-2

现在想在浏览器中访问 http://192.168.56.110 或 http://api.hewentian.com 的时候,请求会被分发到上面这2台机器,实现负载均衡。

修改nginx.conf配置,添加如下代码即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
$ sudo vi /usr/local/nginx/conf/nginx.conf

upstream api_worker {
server 192.168.56.111:8080 weight=3000;
server 192.168.56.112:8080 weight=3000;
keepalive 2000;
}

server {
listen 80;
server_name 192.168.56.110 api.hewentian.com;
access_log logs/api.access.log main;

location / {
proxy_pass http://api_worker/;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}

重启nginx。正常情况下,流量会均衡地分到2台服务器,如下图所示。

如果两台机器中的某一台挂掉了,流量会自动分到另外一台,这样也实现了简单的高可用。
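
除了看访问日志,也可以写一小段代码连续请求/hello接口,观察返回的服务器IP是否在两台机器之间轮流出现。下面是一个最简单的示意代码(假设本机hosts已按上文配置,api.hewentian.com指向nginx;类名LoadBalanceCheck是这里虚构的):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// 连续请求 /hello 接口,观察负载均衡效果的简单示意
public class LoadBalanceCheck {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++) {
            URL url = new URL("http://api.hewentian.com/hello");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                // /hello 返回处理本次请求的服务器IP,正常情况下 111 和 112 会交替出现
                System.out.println(in.readLine());
            } finally {
                conn.disconnect();
            }
        }
    }
}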

查看nginx访问日志:

1
2
3
4
5
6
7
8
9
10
11
12
$ tail /usr/local/nginx/logs/api.access.log 

192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /favicon.ico HTTP/1.1" 200 946 "http://api.hewentian.com/hello" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /favicon.ico HTTP/1.1" 200 946 "http://api.hewentian.com/hello" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /favicon.ico HTTP/1.1" 200 946 "http://api.hewentian.com/hello" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /favicon.ico HTTP/1.1" 200 946 "http://api.hewentian.com/hello" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"
192.168.56.1 - - [12/Jun/2019:02:46:52 +0800] "GET /favicon.ico HTTP/1.1" 200 946 "http://api.hewentian.com/hello" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"

示例三:实现同一个域名下的两个服务

有两个服务:
第一个是WEB服务,指向另一台机器,其中/hello是一个接口:
http://www.hewentian.com/api/hello

另一个是用来看某个目录下的图片的:
http://www.hewentian.com/img/my_computer.png

修改nginx.conf配置,添加如下代码即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$ sudo vi /usr/local/nginx/conf/nginx.conf

upstream api_worker {
server 192.168.56.111:8080 weight=3000;
keepalive 2000;
}

server {
listen 80;
server_name www.hewentian.com;

location / {
root html;
index index.html;
}

location /api/ {
proxy_pass http://api_worker/;
#proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
#proxy_http_version 1.1;
#proxy_set_header Connection "";
access_log logs/api.access.log main;
}

location /img/ {
alias /home/hadoop/Pictures/;
access_log logs/img.access.log main;
}
}

重启nginx。访问上述地址,将得到如下图效果:

查看nginx访问日志:

1
2
3
4
5
6
7
$ tail -n1 /usr/local/nginx/logs/api.access.log /usr/local/nginx/logs/img.access.log 

==> /usr/local/nginx/logs/api.access.log <==
192.168.56.1 - - [12/Jun/2019:04:09:55 +0800] "GET /api/hello HTTP/1.1" 200 46 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"

==> /usr/local/nginx/logs/img.access.log <==
192.168.56.1 - - [12/Jun/2019:04:08:04 +0800] "GET /img/my_computer.png HTTP/1.1" 200 59332 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" "-"

示例四:静态HTML项目、WEB接口项目并存和Tomcat项目

有三个服务:

  1. 第一个是一个WEB服务,是指向另一台机器的,其中/hello是一个接口,另外此WEB项目包含静态HTML代码:
    http://www.hewentian.com/index.html 最后的 /index.html不能少
    http://www.hewentian.com/hello

  2. 第二个也是一个WEB服务,是指向另一台机器的,其中/hello是一个接口,另外此WEB项目包含静态HTML代码:
    http://admin.hewentian.com/index.html 最后的 /index.html不能少
    http://admin.hewentian.com/hello

  3. 第三个是Tomcat中的一个WEB项目:
    http://so.hewentian.com

修改nginx.conf配置,添加如下代码即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
$ sudo vi /usr/local/nginx/conf/nginx.conf

server {
listen 80;
server_name www.hewentian.com;
access_log logs/www.access.log main;

location ~* \.(html|js|css|png|jpg|gif|ico|woff|ttf|woff2)$ { # 转义点号,只匹配以这些扩展名结尾的请求
root /home/hadoop/www-hewentian;
index main.html index.html index.htm;
}

location / {
proxy_pass http://192.168.56.111:8080;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

server {
listen 80;
server_name admin.hewentian.com;
access_log logs/admin.access.log main;

location ~* \.(html|js|css|png|jpg|gif|ico|woff|ttf|woff2)$ { # 转义点号,只匹配以这些扩展名结尾的请求
root /home/hadoop/admin-hewentian;
index main.html index.html index.htm;
}

location / {
proxy_pass http://192.168.56.112:8080;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

server {
listen 80;
server_name so.hewentian.com;
access_log logs/so.access.log main;

location / {
proxy_pass http://192.168.56.111:8082/so/; # 注意:最后的/不能少,具体位置: /home/hadoop/apache-tomcat-8.0.47/webapps/so
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

示例五:同一个Tomcat下的两个项目

有三个服务:

  1. 第一个是一个WEB服务:
    http://www.hewentian.com/so/

  2. 第二个也是一个WEB服务,和第一个项目在同一个Tomcat中:
    http://www.hewentian.com/smswzl/

  3. 第三个是统计nginx状态的功能,需要在执行./configure时加上--with-http_stub_status_module模块,并重新编译安装nginx:
    http://www.hewentian.com/nginxstatus

修改nginx.conf配置,添加如下代码即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
$ sudo vi /usr/local/nginx/conf/nginx.conf

gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_http_version 1.0;
gzip_comp_level 2;
gzip_types text/plain application/x-javascript text/css application/xml;
gzip_vary on;

upstream tomcatServer {
server 192.168.56.111:8082;
}

server {
listen 80;
server_name www.hewentian.com;
charset utf-8;

root /home/hadoop/www-hewentian;
index main.html index.html index.htm;

access_log logs/www.access.log combined;

#expires
location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$ {
expires 30d;
}

location ~ .*\.(js|css)?$ {
expires 24h;
}

location /nginxstatus {
stub_status on;
access_log off;
}

location /so {
index index.html;
proxy_pass http://tomcatServer/so;
}

location /smswzl {
index index.html;
proxy_pass http://tomcatServer/smswzl;
}
}

示例六:服务器使用nginx做代理,通过HttpServletRequest获取请求用户的真实IP地址

在使用nginx做代理时,如果服务端直接通过request.getRemoteAddr()获取来源IP,得到的将是nginx所在机器的IP地址,而不是请求的真实IP地址。那么,如何获取请求的真实IP地址?

首先,在nginx执行`./configure`的时候一定要加上`--with-http_realip_module`模块,然后在nginx的配置中添加如下配置。

1
2
3
4
5
6
location /api/userinfo {
proxy_pass http://127.0.0.1:8081/api/userinfo;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

在nginx中将请求来源IP添加到代理请求头部,然后使用命令重新加载配置

nginx -s reload

服务端使用以下代码即可获取请求主机真实IP地址

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
private String getRealIP(HttpServletRequest request) {
for (String head : Arrays.asList("X-Forwarded-For", "X-Real-IP")) {
String ip = request.getHeader(head);

if (StringUtils.isBlank(ip)) {
continue;
}

log.info("{} : {}", head, ip);

int index = ip.indexOf(',');
if (index != -1) {
ip = ip.substring(0, index);
}

return ip.trim(); // 一般从 X-Forwarded-For 中即可获取并返回
}

return null;
}

hbase 集群的搭建

Hbase 介绍

Hbase的官方文档中有对Hbase的详细介绍,这里不再赘述。这里用一句话描述如下:

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Hbase 的安装

安装过程参考这里:
http://hbase.apache.org/book.html#quickstart_fully_distributed

Hbase依赖于HADOOP,我们在上一篇hadoop 集群的搭建HA的基础上安装Hbase。

节点分布如下:

master:
    ip: 192.168.56.110
    hostname: hadoop-host-master
slave1:
    ip: 192.168.56.111
    hostname: hadoop-host-slave-1
slave2:
    ip: 192.168.56.112
    hostname: hadoop-host-slave-2
slave3:
    ip: 192.168.56.113
    hostname: hadoop-host-slave-3

如下图(绿色代表在这些节点上面安装这些程序,与hadoop 集群的搭建HA安装中的图类似,这里多了后面两列):

安装NTP

可能还要在各节点服务器上面安装NTP服务,实现服务器节点间时间的一致。如果服务器节点间的时间不一致,可能会引发HBase的异常,这一点在HBase官网上有特别强调。在这里,设置我的笔记本电脑为NTP的服务端节点,即是我的电脑从国家授时中心同步时间,然后其它节点(master、slave1、slave2、slave3)作为客户端从我的笔记本同步时间。此篇的安装过程将省略这个步骤,在后续篇章中再介绍,本篇将手动将各节点的时间调成一致。

修改ulimit

Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user’s ulimit configuration, look at the first line of the HBase log for that instance.

修改ulimit,以增加linux系统能同时打开文件的数量

1
2
3
4
$ vi /etc/security/limits.conf

hadoop - nofile 32768
hadoop - nproc 32000

修改后需重启系统才能生效。

下载Hbase

首先下载Hbase,我们下载的时候,要选择适合我们HADOOP版本的Hbase,我们下载的稳定版为hbase-1.2.6-bin.tar.gz,将压缩包首先传到master节点的/home/hadoop/目录下,先在master节点配置好,然后同步到其他3个节点。

1
2
3
4
5
6
$ cd /home/hadoop/
$ tar xzvf hbase-1.2.6-bin.tar.gz
$
$ cd hbase-1.2.6/
$ ls
bin CHANGES.txt conf docs hbase-webapps LEGAL lib LICENSE.txt NOTICE.txt README.txt

配置hbase-env.sh,加上JDK绝对路径

  1. JDK的路径就是安装JDK的时候的路径;
  2. Hbase内置有zookeeper,但是为了方便管理,我们单独部署zookeeper,即使用HADOOP中用作ZKFC的zookeeper;
1
2
3
4
5
$ cd /home/hadoop/hbase-1.2.6/conf
$ vi hbase-env.sh

export JAVA_HOME=/usr/local/jdk1.8.0_102/
export HBASE_MANAGES_ZK=false

配置hbase-site.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
$ cd /home/hadoop/hbase-1.2.6/conf
$ vi hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-host-master:8020/hbase</value>
<description>Do not create the dir hbase, the system will create it automatically, and the value is dfs.namenode.rpc-address.hadoop-cluster-ha.nn1</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-host-master,hadoop-host-slave-1,hadoop-host-slave-2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper-3.4.6/data</value>
<description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
</property>
</configuration>

配置regionservers

在将要运行regionservers的节点加入此文件中

1
2
3
4
5
6
$ cd /home/hadoop/hbase-1.2.6/conf
$ vi regionservers

hadoop-host-slave-1
hadoop-host-slave-2
hadoop-host-slave-3

配置backup-masters

我们将hadoop-host-master作为Hbase集群的master,并配置HBase使用hadoop-host-slave-1作为backup master,在conf目录下创建一个文件backup-masters, 并在其中添加作为backup master的主机名

1
2
3
4
$ cd /home/hadoop/hbase-1.2.6/conf
$ vi backup-masters

hadoop-host-slave-1

复制hdfs-site.xml配置文件

复制$HADOOP_HOME/etc/hadoop/hdfs-site.xml到$HBASE_HOME/conf目录下,以保证HDFS与Hbase两边配置一致,这也是官网推荐的方式。例如,如果HDFS中配置的副本数量为5(默认为3),而没有将hadoop的hdfs-site.xml复制到$HBASE_HOME/conf目录下,则Hbase仍会按3份备份,两边配置不一致,可能导致异常。

1
2
$ cd /home/hadoop/hbase-1.2.6/conf/
$ cp /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml .

至此,配置完毕,将这些配置同步到其他三个节点,在hadoop-host-master上面执行:

1
2
3
4
$ cd /home/hadoop/
$ scp -r hbase-1.2.6 hadoop@hadoop-host-slave-1:/home/hadoop/
$ scp -r hbase-1.2.6 hadoop@hadoop-host-slave-2:/home/hadoop/
$ scp -r hbase-1.2.6 hadoop@hadoop-host-slave-3:/home/hadoop/

启动Hbase

可使用$HBASE_HOME/bin/start-hbase.sh指令启动整个集群,如果要使用该命令,则集群的节点间必须实现ssh的免密码登录,这样才能到不同的节点启动服务。

按我们前面的规划,hadoop-host-master将作为Hbase集群的master。实际上,在哪台机器上运行start-hbase.sh指令,哪台机器就会成为master。

1
2
$ cd /home/hadoop/hbase-1.2.6/bin
$ ./start-hbase.sh

当执行jps指令后,可以看到hadoop-host-master上面多了一个HMaster进程,在hadoop-host-slave-1中会同时存在HMaster和HRegionServer两个进程,而在其他两个节点则只存在HRegionServer进程。

另外,我们可以在其他任何机器通过以下命令启动一个master

1
2
3
4
5
$ cd /home/hadoop/hbase-1.2.6/bin
$ ./hbase-daemon.sh start master

或者启动作为backup master
$ ./hbase-daemon.sh start master --backup

可以在实体机的浏览器中输入:
http://hadoop-host-master:16010/
http://hadoop-host-slave-1:16010/

来查看是否启动成功,如无意外的话,你会看到如下结果页面。其中一个是Master,另一个是Back Master:

同样的它在HDFS中也自动创建了保存数据的目录:

至此,集群搭建成功。

Hbase初体验

首先我们通过SHELL的方式简单体验一下Hbase:

1
2
3
4
5
6
7
8
9
10
11
12
$ cd /home/hadoop/hbase-1.2.6/
$ ./bin/hbase shell
2019-01-23 19:16:36,118 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017

hbase(main):001:0> list
TABLE
0 row(s) in 0.6030 seconds

=> []

由上述结果可知,Hbase中现在没有一张表。我们尝试创建一张表t_student

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
hbase(main):002:0> create 't_student', 'cf1'
0 row(s) in 2.4700 seconds

=> Hbase::Table - t_student
hbase(main):003:0> list
TABLE
t_student
1 row(s) in 0.0250 seconds

=> ["t_student"]
hbase(main):004:0> desc 't_student'
Table t_student is ENABLED
t_student
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLO
CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2010 seconds

往表中插入2条数据:

1
2
3
4
5
6
7
8
9
10
11
hbase(main):005:0> put 't_student', '01', 'cf1:name', 'tim'
0 row(s) in 0.1840 seconds

hbase(main):006:0> put 't_student', '02', 'cf1:name', 'timho'
0 row(s) in 0.3630 seconds

hbase(main):007:0> scan 't_student'
ROW COLUMN+CELL
01 column=cf1:name, timestamp=1548242390794, value=tim
02 column=cf1:name, timestamp=1548246522887, value=timho
2 row(s) in 0.1240 seconds

插入数据之后,可能在HDFS中还不能立刻看到,因为数据还在内存中,但可以通过以下命令将数据立刻写到HDFS中:

1
2
hbase(main):008:0> flush 't_student'
0 row(s) in 0.7290 seconds

然后我们可以在HDFS、Hbase的管理界面分别看到表信息:

我们可以在HDFS中看到表对应的某个数据文件(HFile),如下:

我们可以通过Hbase中的命令来查看数据的真实内容:

1
2
3
4
5
6
$ cd /home/hadoop/hbase-1.2.6
$ ./bin/hbase hfile -p -f /hbase/data/default/t_student/b76cccf6c6a7926bf8f40b4eafc6991e/cf1/2ed0a233411447778982edce04e96fe3
2019-01-23 19:45:33,200 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 19:45:34,046 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
K: 01/cf1:name/1548242390794/Put/vlen=3/seqid=4 V: tim
Scanned kv count -> 1

查看集群状态和节点数量

1
2
hbase(main):009:0> status
1 active master, 1 backup masters, 3 servers, 0 dead, 1.0000 average load

根据条件查询数据

1
2
3
4
hbase(main):010:0> get 't_student', '01'
COLUMN CELL
cf1:name timestamp=1548242390794, value=tim
1 row(s) in 0.0590 seconds

表失效、表生效、删除表:

  1. 使用disable命令可将某张表失效,失效后该表将不能使用;
  2. 使用enable命令可使表重新生效,表生效后,即可对表进行操作;
  3. 使用drop命令可对表进行删除,但只有表在失效的情况下,才能进行删除。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
hbase(main):011:0> desc 't_student'
Table t_student is ENABLED
t_student
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLO
CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0480 seconds

hbase(main):012:0> disable 't_student'
0 row(s) in 2.4070 seconds

hbase(main):013:0> desc 't_student'
Table t_student is DISABLED
t_student
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLO
CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0320 seconds

hbase(main):014:0> enable 't_student'
0 row(s) in 1.3260 seconds

hbase(main):015:0> disable 't_student'
0 row(s) in 2.2550 seconds

hbase(main):016:0> drop 't_student'
0 row(s) in 1.3540 seconds

hbase(main):017:0> list
TABLE
0 row(s) in 0.0060 seconds

=> []

退出 hbase shell

1
hbase(main):018:0> quit

遇到的问题

  1. 有时候我们重启了hadoop集群后,发现hbase无法使用,有可能是我们在hbase-site.xml中配置的hadoop master节点已经不是active了,解决办法是在hadoop中手动将其设为active状态;

  2. 主从节点时间没有同步时,极有可能出现如下错误,同步时间后可以正常启动:

    master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 855041 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    
    [HBase] ERROR:org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    

java操作Hbase

java操作Hbase的例子见这里:HbaseUtil.java 和 HbaseDemo.java
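
这里顺便给出一个极简的示意(并非上面工具类的原文,类名HbaseQuickStart和写入的rowkey 03都是虚构的示例),基于HBase 1.2客户端API,沿用前文hbase-site.xml中的zookeeper配置以及shell示例中的表t_student和列族cf1,演示写入一行并按rowkey读取:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseQuickStart {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // 与 hbase-site.xml 中的 zookeeper 配置保持一致
        conf.set("hbase.zookeeper.quorum",
                "hadoop-host-master,hadoop-host-slave-1,hadoop-host-slave-2");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("t_student"))) {
            // 写入一行示例数据: rowkey=03, 列 cf1:name
            Put put = new Put(Bytes.toBytes("03"));
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("tom"));
            table.put(put);

            // 按 rowkey 读取刚写入的数据
            Result result = table.get(new Get(Bytes.toBytes("03")));
            byte[] name = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"));
            System.out.println("cf1:name = " + Bytes.toString(name));
        }
    }
}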

未完,待续……

hive 学习笔记

Hive 介绍

hive的官方文档中有对hive的详细介绍,这里不再赘述。我们用一句话描述如下:
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

hive 的安装

安装过程参考这里:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

hive依赖于HADOOP,我们在上一篇hadoop 集群的搭建HA的基础上安装hive。

首先下载hive,我们下载的时候,要选择适合我们HADOOP版本的hive,我们下载的稳定版为apache-hive-1.2.2-bin.tar.gz,我们将在HADOOP集群的namenode上面安装,即在master机器上面安装。将压缩包传到/home/hadoop/目录下。

1
2
$ cd /home/hadoop/
$ tar xzvf apache-hive-1.2.2-bin.tar.gz

解压后得到目录apache-hive-1.2.2-bin,我们看下压缩包中的内容:

1
2
3
4
5
6
7
$ cd /home/hadoop/apache-hive-1.2.2-bin
$ ls
bin conf examples hcatalog lib LICENSE NOTICE README.txt RELEASE_NOTES.txt scripts
$
$ ls conf/
beeline-log4j.properties.template hive-env.sh.template hive-log4j.properties.template
hive-default.xml.template hive-exec-log4j.properties.template ivysettings.xml

配置HADOOP_HOME:

1
2
3
4
5
6
$ cd /home/hadoop/apache-hive-1.2.2-bin/conf/
$ cp hive-default.xml.template hive-site.xml
$ cp hive-env.sh.template hive-env.sh
$ vi hive-env.sh

HADOOP_HOME=/home/hadoop/hadoop-2.7.3

到这里,hive就配置好可以运行了。不过,不妨再看看下面关于hive元数据存储位置的配置,因为生产环境一般都需要配置。

配置hive元数据的存储位置(可选配置)

hive默认将元数据存储在derby数据库中(hive安装包自带),当然我们也可以选择存储在其他数据库,如mysql中。下面演示一下:
首先在MYSQL数据库中创建一个数据库,用于存储hive的元数据,我们就将库名创建为hive:

1
2
3
4
mysql> CREATE DATABASE IF NOT EXISTS hive COLLATE = 'utf8_general_ci' CHARACTER SET = 'utf8';
mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> FLUSH PRIVILEGES;

然后配置hive使用mysql存储元数据:

1
2
$ cd /home/hadoop/apache-hive-1.2.2-bin/conf/
$ vi hive-site.xml

修改下面部分,假定我们的数据库地址、用户名和密码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mysql.hewentian.com:3306/hive</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>

最后,将mysql连接JDBC的jar包mysql-connector-java-5.1.42.jar放到apache-hive-1.2.2-bin/lib目录下

好了,以上这部分是可选配置部分。

启动hive

初次启动hive,需在HDFS中创建几个目录,用于存储hive的数据,我们在安装hive的master节点执行如下命令:

1
2
3
4
5
6
$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs dfs -mkdir /tmp
$ ./bin/hdfs dfs -mkdir -p /user/hive/warehouse
$
$ ./bin/hdfs dfs -chmod g+w /tmp
$ ./bin/hdfs dfs -chmod g+w /user/hive/warehouse

初始化元数据存储相关信息,hive默认使用内置的derby数据库存储元数据。这里使用mysql,如果要使用默认的,则将下面的mysql修改成derby即可。

1
2
3
4
5
6
7
8
9
10
$ cd /home/hadoop/apache-hive-1.2.2-bin/bin
$ ./schematool -dbType mysql -initSchema

Metastore connection URL: jdbc:mysql://mysql.hewentian.com:3306/hive
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed

正式启动hive

1
2
$ cd /home/hadoop/apache-hive-1.2.2-bin/bin
$ ./hive

启动的时候可能会报如下错误:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 12 more

解决方法如下,先建目录:

1
2
$ cd /home/hadoop/apache-hive-1.2.2-bin/
$ mkdir iotmp

hive-site.xml

  1. 包含${system:java.io.tmpdir}的配置项替换为上面的路径/home/hadoop/apache-hive-1.2.2-bin/iotmp,一共有4处;
  2. 包含${system:user.name}的配置项替换为hadoop
    修改项如下:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/hadoop/apache-hive-1.2.2-bin/iotmp/hadoop</value>
    <description>Local scratch space for Hive jobs</description>
    </property>
    <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/hadoop/apache-hive-1.2.2-bin/iotmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
    </property>
    <property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/apache-hive-1.2.2-bin/iotmp/hadoop</value>
    <description>Location of Hive run time structured log file</description>
    </property>
    <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/hadoop/apache-hive-1.2.2-bin/iotmp/hadoop/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
    </property>

重新启动hive:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ cd /home/hadoop/apache-hive-1.2.2-bin/bin
$ ./hive

Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.821 seconds, Fetched: 1 row(s)
hive>
> use default;
OK
Time taken: 0.043 seconds
hive>
> show tables;
OK
Time taken: 0.094 seconds
hive>

至此,hive安装成功。从上面可知,hive有一个默认的数据库default,并且里面一张表也没有。

hive初体验

创建数据库:

1
2
3
4
5
6
7
8
9
hive> CREATE DATABASE IF NOT EXISTS tim;
OK
Time taken: 0.323 seconds
hive>
> show databases;
OK
default
tim
Time taken: 0.025 seconds, Fetched: 2 row(s)

同样,我们可以在HDFS中查看到:

1
2
3
4
$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxrwxr-x - hadoop supergroup 0 2019-01-01 19:32 /user/hive/warehouse/tim.db

创建表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
hive> use tim;
OK
Time taken: 0.042 seconds
hive>
> CREATE TABLE IF NOT EXISTS t_user (
> id INT,
> name STRING COMMENT 'user name',
> age INT COMMENT 'user age',
> sex STRING COMMENT 'user sex',
> birthday DATE COMMENT 'user birthday',
> address STRING COMMENT 'user address'
> )
> COMMENT 'This is the use info table'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 0.521 seconds
hive>
> show tables;
OK
t_user
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive>

查看表结构

1
2
3
4
5
6
7
8
9
10
hive> desc t_user;
OK
id int
name string user name
age int user age
sex string user sex
birthday date user birthday
address string user address
Time taken: 0.074 seconds, Fetched: 6 row(s)
hive>

插入数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
hive> INSERT INTO TABLE t_user(id, name, age, sex, birthday, address) VALUES(1, 'Tim Ho', 23, 'M', '1989-05-01', 'Higher Education Mega Center South, Guangzhou city, Guangdong Province');
Query ID = hadoop_20190102160558_640a90a7-9122-4650-af78-acb436e2643b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1546186928725_0015, Tracking URL = http://hadoop-host-master:8088/proxy/application_1546186928725_0015/
Kill Command = /home/hadoop/hadoop-2.7.3/bin/hadoop job -kill job_1546186928725_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-01-02 16:06:08,341 Stage-1 map = 0%, reduce = 0%
2019-01-02 16:06:14,565 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
MapReduce Total cumulative CPU time: 1 seconds 390 msec
Ended Job = job_1546186928725_0015
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop-cluster-ha/user/hive/warehouse/tim.db/t_user/.hive-staging_hive_2019-01-02_16-05-58_785_7094384272339204067-1/-ext-10000
Loading data to table tim.t_user
Table tim.t_user stats: [numFiles=1, numRows=1, totalSize=96, rawDataSize=95]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.39 sec HDFS Read: 4763 HDFS Write: 162 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 390 msec
OK
Time taken: 17.079 seconds
hive>

执行插入操作时,它会产生一个mapReduce任务。

查询数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
hive> select * from t_user;
OK
1 Tim Ho 23 M 1989-05-01 Higher Education Mega Center South, Guangzhou city, Guangdong Province
Time taken: 0.196 seconds, Fetched: 1 row(s)
hive>
> select * from t_user where name='Tim Ho';
OK
1 Tim Ho 23 M 1989-05-01 Higher Education Mega Center South, Guangzhou city, Guangdong Province
Time taken: 0.258 seconds, Fetched: 1 row(s)
hive>
> select count(*) from t_user;
Query ID = hadoop_20190102161100_d60df721-539d-4e5b-a3db-a4951ac884b4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1546186928725_0016, Tracking URL = http://hadoop-host-master:8088/proxy/application_1546186928725_0016/
Kill Command = /home/hadoop/hadoop-2.7.3/bin/hadoop job -kill job_1546186928725_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-02 16:11:10,739 Stage-1 map = 0%, reduce = 0%
2019-01-02 16:11:16,997 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2019-01-02 16:11:23,280 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.37 sec
MapReduce Total cumulative CPU time: 2 seconds 370 msec
Ended Job = job_1546186928725_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.37 sec HDFS Read: 7285 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 370 msec
OK
1
Time taken: 24.444 seconds, Fetched: 1 row(s)
hive>

由上面可知,执行简单的查询操作不会启动mapReduce,但执行像COUNT这样的统计操作将会产生一个mapReduce。

从文件中导入数据

语法:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

我们可以按定义表结构时使用的字段分隔符(\t),将数据存放在文本文件里,然后使用LOAD命令来导入。例如我们将数据存放在/home/hadoop/user.txt中:

1
2
2	scott	25	M	1977-10-21	USA
3	tiger	21	F	1977-08-12	UK

然后在hive中执行LOAD命令:

1
2
3
4
5
6
7
8
9
10
11
12
13
hive> LOAD DATA LOCAL INPATH '/home/hadoop/user.txt' INTO TABLE t_user;
Loading data to table tim.t_user
Table tim.t_user stats: [numFiles=2, numRows=0, totalSize=151, rawDataSize=0]
OK
Time taken: 0.214 seconds
hive>
> select * from t_user;
OK
1 Tim Ho 23 M 1989-05-01 Higher Education Mega Center South, Guangzhou city, Guangdong Province
2 scott 25 M 1977-10-21 USA
3 tiger 21 F 1977-08-12 UK
Time taken: 0.085 seconds, Fetched: 3 row(s)
hive>

通过JAVA代码操作hive

HQL脚本通常有以下几种方式执行:

  1. hive -e “hql”;
  2. hive -f “hql.file”;
  3. hive jdbc code.

本节主要讲讲如何通过java来操作hive,首先启动HiveServer2,hiveserver2命令未来可用于替代hive命令

1
2
$ cd /home/hadoop/apache-hive-1.2.2-bin/bin
$ ./hiveserver2

启动后,你可能会发现,啥也没输出。这时我们在另一个SHELL窗口中启动beeline

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
$ cd /home/hadoop/apache-hive-1.2.2-bin/bin
$ ./beeline -u jdbc:hive2://hadoop-host-master:10000 -n hadoop -p hadoop

Connecting to jdbc:hive2://hadoop-host-master:10000
Connected to: Apache Hive (version 1.2.2)
Driver: Hive JDBC (version 1.2.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.2 by Apache Hive
0: jdbc:hive2://hadoop-host-master:10000>
0: jdbc:hive2://hadoop-host-master:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| tim |
+----------------+--+
2 rows selected (0.217 seconds)
0: jdbc:hive2://hadoop-host-master:10000> use tim;
No rows affected (0.08 seconds)
0: jdbc:hive2://hadoop-host-master:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| t_user |
+-----------+--+
1 row selected (0.071 seconds)
0: jdbc:hive2://hadoop-host-master:10000> select * from t_user;
+------------+--------------+-------------+-------------+------------------+-------------------------------------------------------------------------+--+
| t_user.id | t_user.name | t_user.age | t_user.sex | t_user.birthday | t_user.address |
+------------+--------------+-------------+-------------+------------------+-------------------------------------------------------------------------+--+
| 1 | Tim Ho | 23 | M | 1989-05-01 | Higher Education Mega Center South, Guangzhou city, Guangdong Province |
| 2 | scott | 25 | M | 1977-10-21 | USA |
| 3 | tiger | 21 | F | 1977-08-12 | UK |
+------------+--------------+-------------+-------------+------------------+-------------------------------------------------------------------------+--+
3 rows selected (0.219 seconds)
0: jdbc:hive2://hadoop-host-master:10000>

由上面可知,和在hive命令下的操作是一样的。上面的命令也可以没有-p hadoop这个参数,这个可以在hive-site.xml中配置。

java代码操作hive的例子在这里:HiveUtil.java 和 HiveDemo.java
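
作为补充,下面给出一段最简的Hive JDBC示意代码(并非上面例子的原文,类名HiveJdbcDemo是虚构的),使用hive-jdbc提供的org.apache.hive.jdbc.HiveDriver驱动,连接地址和用户名沿用上文beeline的参数,查询前面创建的tim.t_user表:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // hive-jdbc 提供的驱动类
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // 连接参数与上文 beeline 使用的一致
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hadoop-host-master:10000/tim", "hadoop", "hadoop");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, age FROM t_user")) {
            while (rs.next()) {
                // 逐行打印查询结果
                System.out.println(rs.getInt("id") + "\t" + rs.getString("name") + "\t" + rs.getInt("age"));
            }
        }
    }
}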

后台方式启动hive

For versions 1.2 and above, hive is deprecated and the hiveserver2 command should be used directly.

So the correct way to start hiveserver2 in background is now:

cd /home/hadoop/apache-hive-1.2.2-bin/bin
nohup ./hiveserver2 &

Or with output to a log file:

nohup ./hiveserver2 > hive.log &

未完待续……

hadoop 集群的搭建HA

本篇将说说hadoop集群HA的搭建,如果不想搭建HA,可以参考我之前的笔记:hadoop 集群的搭建,下面HA的搭建很多步骤与此文相同。

为了解决hadoop 1.0.0之前版本的单点故障问题,在hadoop 2.0.0中通过在同一个集群上运行两个NameNode主动/被动配置热备份,这样集群允许在一个NameNode出现故障时,请求转移到另外一个NameNode来保证集群的正常运行。两个NameNode有相同的职能。在任何时刻,只有一个是active状态的,另一个是standby状态的。当集群运行时,只有active状态的NameNode是正常工作的,standby状态的NameNode是处于待命状态的,时刻同步active状态NameNode的数据。一旦active状态的NameNode不能工作,通过手工或者自动切换,standby状态的NameNode就可以转变为active状态的,就可以继续工作了,这就是高可靠。

安装过程参考官方文档:
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

下面开始hadoop集群的搭建。我们将搭建如下图所示的集群,HADOOP集群中所有节点的配置文件可以完全一样。

对上图的节点分布,如下图(绿色代表在这些节点上面安装这些程序,一般运行namenode的节点都同时运行ZKFC):

在我的笔记本电脑中,安装虚拟机VirtualBox,在虚拟机中安装四台服务器:master、slave1、slave2、slave3来搭建hadoop集群HA。安装好VirtualBox后,启动它。依次点File -> Host Network Manager -> Create,来创建一个网络和虚拟机中的机器通讯,这个地址是:192.168.56.1,也就是我们外面实体机的地址(仅和虚拟机中的机器通讯使用)。如下图:

我们使用ubuntu 18.04来作为我们的服务器,先在虚拟机中安装好一台服务器master,将Jdk、hadoop在上面安装好,然后将master克隆出slave1、slave2、slave3。以master为namenode节点,slave1、slave2、slave3作为datanode节点。slave1同时也作为namenode节点。相关配置如下:

master:
    ip: 192.168.56.110
    hostname: hadoop-host-master
slave1:
    ip: 192.168.56.111
    hostname: hadoop-host-slave-1
slave2:
    ip: 192.168.56.112
    hostname: hadoop-host-slave-2
slave3:
    ip: 192.168.56.113
    hostname: hadoop-host-slave-3

Now let's install master.

While installing master in the VM we create a user for logging in; set both the user name and the password to hadoop (other names work too). The rest of the installation is omitted. After the installation, keep the default NAT networking, which can reach the Internet, and download jdk-8u102-linux-x64.tar.gz and hadoop-2.7.3.tar.gz from their official sites into the user's /home/hadoop/ directory, or copy them in from the physical host with scp. Then switch the network adapter to Host-only Adapter, as shown below.

After the network adapter is set, configure the IP address. On master go to [Settings] -> [Network] -> [Wired, switch it on] -> [IPv4] and set it as follows:

Managing the cluster

After the IP and other settings are done, shut master down, but do not power it off directly; when closing it choose Save the machine state. Then in VirtualBox select master -> the drop-down arrow next to Start -> Headless start, and log in to master directly from the physical host via ssh.

$ ssh hadoop@192.168.56.110

On the physical host we can add the following entry to /etc/hosts:

192.168.56.110    hadoop-host-master

Then we can log in like this:

$ ssh hadoop@hadoop-host-master

After the following setup on the physical host, we can log in without a password:

$ ssh-copy-id hadoop@hadoop-host-master

All of the operations below are performed on the virtual machines via SSH from the physical host.

Install ssh, openssh-server and rsync

If these are already installed on the system, skip the installation below.

$ sudo apt-get install ssh openssh-server rsync

If the command above cannot be executed, run the following first:

$ sudo apt-get update

For the JDK installation see my earlier note on installing the JDK; it is not repeated here. Install it under /usr/local/jdk1.8.0_102/ and remember this path, it is needed below. Next, install Hadoop.

$ cd /home/hadoop/
$ tar xf hadoop-2.7.3.tar.gz

After extraction you get the hadoop-2.7.3 directory, which contains the Hadoop programs and their configuration.

Create the directories for storing data

$ cd /home/hadoop/hadoop-2.7.3
$ mkdir -p hdfs/tmp
$ mkdir -p hdfs/name
$ mkdir -p hdfs/data
$ mkdir -p journal/data
$
$ chmod -R 777 hdfs/
$ chmod -R 777 journal/

A look at the configuration files

All of Hadoop's configuration files are located in the following directory:

$ cd /home/hadoop/hadoop-2.7.3/etc/hadoop
$ ls
capacity-scheduler.xml httpfs-env.sh mapred-env.sh
configuration.xsl httpfs-log4j.properties mapred-queues.xml.template
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env.cmd kms-acls.xml ssl-client.xml.example
hadoop-env.sh kms-env.sh ssl-server.xml.example
hadoop-metrics2.properties kms-log4j.properties yarn-env.cmd
hadoop-metrics.properties kms-site.xml yarn-env.sh
hadoop-policy.xml log4j.properties yarn-site.xml
hdfs-site.xml mapred-env.cmd

Configure hadoop-env.sh and add the absolute path of the JDK

The JDK path is the one used when installing the JDK above:

export JAVA_HOME=/usr/local/jdk1.8.0_102/

Configure core-site.xml and add the following to the file

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-cluster-ha</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop-2.7.3/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop-host-master:2181,hadoop-host-slave-1:2181,hadoop-host-slave-2:2181</value>
    </property>
</configuration>

Configure hdfs-site.xml and add the following to the file (a small config sanity-check sketch in Java follows the XML)

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster-ha</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.hadoop-cluster-ha</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hadoop-cluster-ha.nn1</name>
        <value>hadoop-host-master:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hadoop-cluster-ha.nn2</name>
        <value>hadoop-host-slave-1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.hadoop-cluster-ha.nn1</name>
        <value>hadoop-host-master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.hadoop-cluster-ha.nn2</name>
        <value>hadoop-host-slave-1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop-host-slave-1:8485;hadoop-host-slave-2:8485;hadoop-host-slave-3:8485/hadoop-cluster-ha</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.hadoop-cluster-ha</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/hadoop-2.7.3/journal/data</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop-2.7.3/hdfs/data</value>
    </property>
</configuration>
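
As a quick sanity check of the two files above, the sketch below loads them with Hadoop's Configuration class and prints a few of the HA keys. It is only an illustration, assuming hadoop-common 2.7.3 on the classpath and the file paths used in this post; the class name HaConfigCheck is made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HaConfigCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load the files edited above (paths as used on the master node)
        conf.addResource(new Path("/home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml"));
        conf.addResource(new Path("/home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml"));

        String ns = conf.get("dfs.nameservices");
        System.out.println("fs.defaultFS     = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.nameservices = " + ns);
        System.out.println("namenodes        = " + conf.get("dfs.ha.namenodes." + ns));
        System.out.println("nn1 rpc address  = " + conf.get("dfs.namenode.rpc-address." + ns + ".nn1"));
        System.out.println("zookeeper quorum = " + conf.get("ha.zookeeper.quorum"));
    }
}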

At this point the common environment to be installed on master is complete. In VirtualBox clone master into slave1, slave2 and slave3, and, following the IP configuration steps above, set slave1's IP to 192.168.56.111, slave2's IP to 192.168.56.112 and slave3's IP to 192.168.56.113.

Configure the host names

Set master's host name to hadoop-host-master; on the master node run:

$ sudo vi /etc/hostname

Change the content to:
hadoop-host-master

Set slave1's host name to hadoop-host-slave-1; on the slave1 node run:

$ sudo vi /etc/hostname

Change the content to:
hadoop-host-slave-1

Set slave2's host name to hadoop-host-slave-2; on the slave2 node run:

$ sudo vi /etc/hostname

Change the content to:
hadoop-host-slave-2

Set slave3's host name to hadoop-host-slave-3; on the slave3 node run:

$ sudo vi /etc/hostname

Change the content to:
hadoop-host-slave-3

Note: each node must have a unique host name; otherwise, among nodes with the same host name only one will be able to connect to the NameNode and the cluster will report errors. A reboot is required for the host name change to take effect.

Configure name resolution

On each of master, slave1, slave2 and slave3 run:

$ sudo vi /etc/hosts

Change the content to:
127.0.0.1 localhost
192.168.56.110 hadoop-host-master
192.168.56.111 hadoop-host-slave-1
192.168.56.112 hadoop-host-slave-2
192.168.56.113 hadoop-host-slave-3

Centralized cluster management

Configure passwordless SSH login; on each of master, slave1, slave2 and slave3 run:

$ ssh-keygen -t rsa -P ""

On master and slave1 (both act as NameNodes) run:

$ ssh-copy-id hadoop-host-master
$ ssh-copy-id hadoop-host-slave-1
$ ssh-copy-id hadoop-host-slave-2
$ ssh-copy-id hadoop-host-slave-3

When running each command, first type yes, then enter the login password of the target machine.

If the following commands work, passwordless login to the other machines has been configured successfully.

$ ssh hadoop-host-master
$ ssh hadoop-host-slave-1
$ ssh hadoop-host-slave-2
$ ssh hadoop-host-slave-3

On master and slave1 run:

$ cd /home/hadoop/hadoop-2.7.3/etc/hadoop/
$ vi slaves # add the following
$
hadoop-host-slave-1
hadoop-host-slave-2
hadoop-host-slave-3

When start-dfs.sh runs, it looks in the slaves file for the worker nodes.

Install zookeeper

We install the zookeeper cluster on master, slave1 and slave2; for the installation steps see my earlier note on the zookeeper cluster installation, not repeated here.

At this point the cluster configuration is complete; next we start the cluster.

Start the cluster

On the first startup, start the JournalNodes first, on each of the three JournalNode machines, because the NameNode format in the next step writes data to these nodes:

$ cd /home/hadoop/hadoop-2.7.3/
$ ./sbin/hadoop-daemon.sh start journalnode
$
$ jps # check whether it started successfully
4016 Jps
2556 JournalNode

Next, run the following on either NameNode; we run it on master:

$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs namenode -format # not needed when starting again later
$ ./sbin/hadoop-daemon.sh start namenode
$
$ jps # check whether it started successfully
4016 Jps
2556 NameNode

Then run the following on the other, not-yet-formatted NameNode, i.e. slave1:

$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs namenode -bootstrapStandby

Then stop all services; run the following on master:

$ cd /home/hadoop/hadoop-2.7.3/
$ ./sbin/stop-dfs.sh

Stopping namenodes on [hadoop-host-master hadoop-host-slave-1]
hadoop-host-slave-1: no namenode to stop
hadoop-host-master: stopping namenode
hadoop-host-slave-1: no datanode to stop
hadoop-host-slave-2: no datanode to stop
hadoop-host-slave-3: no datanode to stop
Stopping journal nodes [hadoop-host-slave-1 hadoop-host-slave-2 hadoop-host-slave-3]
hadoop-host-slave-2: stopping journalnode
hadoop-host-slave-1: stopping journalnode
hadoop-host-slave-3: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [hadoop-host-master hadoop-host-slave-1]
hadoop-host-slave-1: no zkfc to stop
hadoop-host-master: no zkfc to stop

Format ZKFC on one of the NameNodes; we do it on master (a small ZooKeeper check sketch follows the output):

$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs zkfc -formatZK
$

18/12/30 12:54:52 INFO ha.ActiveStandbyElector: Session connected.
18/12/30 12:54:52 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/hadoop-cluster-ha in ZK.
18/12/30 12:54:52 INFO zookeeper.ClientCnxn: EventThread shut down
18/12/30 12:54:52 INFO zookeeper.ZooKeeper: Session: 0x167fd5512250000 closed
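
If you want to confirm from code that the znode mentioned in the log really exists, a small sketch using the plain ZooKeeper client API could look like the following. It assumes the zookeeper jar is on the classpath; the class name ZkfcZnodeCheck is made up, and the connection string mirrors ha.zookeeper.quorum in core-site.xml.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkfcZnodeCheck {
    public static void main(String[] args) throws Exception {
        // Same ensemble as ha.zookeeper.quorum in core-site.xml
        ZooKeeper zk = new ZooKeeper(
                "hadoop-host-master:2181,hadoop-host-slave-1:2181,hadoop-host-slave-2:2181",
                30000,
                new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        // no-op watcher; we only issue a single synchronous call
                    }
                });
        try {
            Stat stat = zk.exists("/hadoop-ha/hadoop-cluster-ha", false);
            System.out.println(stat != null
                    ? "znode /hadoop-ha/hadoop-cluster-ha exists"
                    : "znode not found - did zkfc -formatZK run?");
        } finally {
            zk.close();
        }
    }
}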

When starting the cluster again, the steps above are not needed; just run the following command, which we execute on master:

$ cd /home/hadoop/hadoop-2.7.3/
$ ./sbin/start-dfs.sh
$

Starting namenodes on [hadoop-host-master hadoop-host-slave-1]
hadoop-host-slave-1: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-hadoop-host-slave-1.out
hadoop-host-master: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-hadoop-host-master.out
hadoop-host-slave-2: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-hadoop-host-slave-2.out
hadoop-host-slave-1: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-hadoop-host-slave-1.out
hadoop-host-slave-3: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-hadoop-host-slave-3.out
Starting journal nodes [hadoop-host-slave-1 hadoop-host-slave-2 hadoop-host-slave-3]
hadoop-host-slave-2: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-hadoop-host-slave-2.out
hadoop-host-slave-1: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-hadoop-host-slave-1.out
hadoop-host-slave-3: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-hadoop-host-slave-3.out
Starting ZK Failover Controllers on NN hosts [hadoop-host-master hadoop-host-slave-1]
hadoop-host-slave-1: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-hadoop-host-slave-1.out
hadoop-host-master: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-hadoop-host-master.out

It automatically starts the NameNodes, DataNodes, JournalNodes and ZKFCs. Watching the logs while it starts is a good habit.

In a browser on the physical host, open:
http://hadoop-host-master:50070/
http://hadoop-host-slave-1:50070/
to check whether startup succeeded. If nothing goes wrong you will see pages like the following, one of them active and the other standby:

On the active node's page, switch to the Datanodes tab and you can see the 3 DataNodes, as shown below:

Switch to Utilities -> Browse the file system, as shown below (this can only be viewed on the active node's page; the standby node has no READ permission on HDFS):

From the page above we can see that HDFS currently contains no files. Let's try putting one in; we upload our Hadoop tarball. On the active NameNode run the following (a Java equivalent is sketched after the commands):

$ cd /home/hadoop/hadoop-2.7.3/
$ ./bin/hdfs dfs -put /home/hadoop/Downloads/hadoop-2.7.3.tar.gz /

$ ./bin/hdfs dfs -ls /
Found 1 items
-rw-r--r-- 3 hadoop supergroup 214092195 2018-12-29 22:07 /hadoop-2.7.3.tar.gz
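
The same upload can also be done from Java through the HDFS FileSystem API. The sketch below is an illustration only: it assumes hadoop-common and hadoop-hdfs 2.7.3 on the classpath, the class name HdfsPutDemo is made up, and the client-side HA settings simply repeat what core-site.xml and hdfs-site.xml above define.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Minimal client-side HA settings, mirroring the cluster configuration above
        conf.set("fs.defaultFS", "hdfs://hadoop-cluster-ha");
        conf.set("dfs.nameservices", "hadoop-cluster-ha");
        conf.set("dfs.ha.namenodes.hadoop-cluster-ha", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.hadoop-cluster-ha.nn1", "hadoop-host-master:8020");
        conf.set("dfs.namenode.rpc-address.hadoop-cluster-ha.nn2", "hadoop-host-slave-1:8020");
        conf.set("dfs.client.failover.proxy.provider.hadoop-cluster-ha",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Equivalent of: hdfs dfs -put /home/hadoop/Downloads/hadoop-2.7.3.tar.gz /
            fs.copyFromLocalFile(new Path("/home/hadoop/Downloads/hadoop-2.7.3.tar.gz"),
                    new Path("/"));
            // Equivalent of: hdfs dfs -ls /
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath() + "\t" + status.getLen());
            }
        }
    }
}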

We check it in the web UI, as shown below:

Clicking the file in the list shows which nodes its data is actually stored on, as shown below:

Note: when running start-dfs.sh on the master node, the master node's user name must be the same as on all worker nodes, because the master logs in to the worker servers remotely under that user name. So in any production environment, the user that controls the same kind of cluster must be identical on every node.

Verify failover, i.e. verify that the two NameNodes can switch automatically

We kill the active NameNode; on the active NameNode node run:

$ jps
2593 QuorumPeerMain
31444 Jps
30613 NameNode
30965 DFSZKFailoverController

$ kill -9 30613

After killing it we find that the standby NameNode does not switch to active automatically. Looking at the log
/home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-hadoop-host-slave-1.log
we find the following:

Conclusion: the reason the two NameNodes cannot switch over automatically is that the OpenSSH version installed on the operating system does not match the version used internally by Hadoop (the bundled jsch library).

Solution: upgrade jsch-0.1.42.jar under the $HADOOP_HOME/share directory to jsch-0.1.54.jar, restart the cluster, and the problem is solved.

First download jsch-0.1.54.jar from the Maven central repository:

https://mvnrepository.com/artifact/com.jcraft/jsch/0.1.54

We only need to upgrade jsch-0.1.42.jar to jsch-0.1.54.jar on the two NameNodes:

$ cd /home/hadoop/hadoop-2.7.3/
$ find ./ -name "*jsch*"
$
./share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/jsch-0.1.42.jar
./share/hadoop/common/lib/jsch-0.1.42.jar
./share/hadoop/tools/lib/jsch-0.1.42.jar
./share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/jsch-0.1.42.jar

The search shows only 4 JARs that need upgrading; replacing these JARs on the two NameNode nodes is enough. Restart the cluster and verify failover again: the two NameNodes can now switch automatically. Done.

Start YARN

The steps for starting YARN are the same as in the (non-HA) hadoop cluster setup, so they are not repeated here.

Manual switching between active and standby

Sometimes we need to manually make a particular NameNode active. This can be done with the haadmin command, whose usage is shown below; a JMX-based probe sketch follows the output. (What I usually do instead is disconnect the currently active NameNode from the network so that the standby becomes active, and then reconnect the disconnected machine.)

$ cd /home/hadoop/hadoop-2.7.3
$ ./bin/hdfs haadmin --help
-help: Unknown command
Usage: haadmin
[-transitionToActive [--forceactive] <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]

Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

$ ./bin/hdfs haadmin -getServiceState nn1
standby
$ ./bin/hdfs haadmin -getServiceState nn2
active
$ ./bin/hdfs haadmin -transitionToActive --forcemanual nn1
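
Besides haadmin, the NameNode state can also be read programmatically. One option, a sketch under the assumption that the NameNode web UI on port 50070 exposes the usual NameNodeStatus JMX bean with a State field (as Hadoop 2.7.x does), is to scrape the /jmx endpoint; the class name NameNodeStateProbe is made up.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NameNodeStateProbe {
    public static void main(String[] args) throws Exception {
        // The two NameNode web UIs configured in hdfs-site.xml
        String[] hosts = {"hadoop-host-master", "hadoop-host-slave-1"};
        for (String host : hosts) {
            URL url = new URL("http://" + host
                    + ":50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
            }
            // Crude string check, good enough for a quick probe without a JSON parser
            String json = body.toString();
            String state = json.contains("\"active\"") ? "active"
                    : json.contains("\"standby\"") ? "standby" : "unknown";
            System.out.println(host + " -> " + state);
        }
    }
}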