greenplum sql

Introduction

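FTS (Fault Tolerance Service) is the Greenplum service that detects and recovers from segment failures. FTS runs as a child process of the master and learns the state of every primary-mirror pair by periodically polling each primary. The process exists only on the master, and its process name is ftsprobe process.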
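Note that FTS never connects to a mirror directly; it learns each mirror's state through the corresponding primary. The primary, in turn, determines whether its mirror is alive and in sync from the state of its WAL sender process.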
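As a rough illustration, the primary's side of this can be thought of as mapping its WAL sender state to the mirror status it reports back to FTS. This is a minimal sketch under stated assumptions: the enum, struct, and function names are hypothetical (PostgreSQL's real enum is WalSndState), and treating "streaming" as "in sync" is a simplification; the real logic may consider more than the sender's state.

```c
#include <stdbool.h>

/* Hypothetical, simplified WAL sender states (PostgreSQL's real enum is WalSndState). */
typedef enum {
    SKETCH_WALSND_NONE,      /* no WAL sender: the mirror is not connected */
    SKETCH_WALSND_STARTUP,
    SKETCH_WALSND_CATCHUP,   /* connected but still catching up on WAL */
    SKETCH_WALSND_STREAMING  /* caught up and streaming WAL */
} SketchWalSndState;

/* What the primary reports about its mirror in a probe reply (hypothetical). */
typedef struct {
    bool mirror_up;
    bool mirror_in_sync;
} SketchMirrorReport;

SketchMirrorReport report_mirror_state(SketchWalSndState state)
{
    SketchMirrorReport r;
    /* Assumption: any running WAL sender means the mirror is alive. */
    r.mirror_up = (state != SKETCH_WALSND_NONE);
    /* Assumption: only a streaming sender counts as synchronized. */
    r.mirror_in_sync = (state == SKETCH_WALSND_STREAMING);
    return r;
}
```

FTS itself only ever sees the report, which is why it needs no connection to the mirror.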
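FTS starts a probe round when any of the following three conditions holds:

The probe interval has elapsed (gp_fts_probe_interval).
A user manually runs select gp_request_fts_probe_scan().
A segment failure is detected during query execution.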
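The three triggers can be summarized as one decision function. This is a sketch only: fts_probe_trigger() and its parameters are hypothetical, probe_interval_sec stands in for gp_fts_probe_interval (assumed to be in seconds), and the real FtsLoop (not shown in this post) is assumed to sleep on its latch with a timeout rather than compare timestamps.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical labels for why a probe round starts. */
typedef enum {
    PROBE_NOT_NEEDED,
    PROBE_BY_INTERVAL,        /* gp_fts_probe_interval elapsed */
    PROBE_BY_USER_REQUEST,    /* select gp_request_fts_probe_scan() */
    PROBE_BY_QUERY_FAILURE    /* a query hit a failing segment */
} ProbeTrigger;

ProbeTrigger fts_probe_trigger(time_t now, time_t last_probe, int probe_interval_sec,
                               bool user_requested, bool query_saw_failure)
{
    /* Explicit requests are served immediately, without waiting for the timer. */
    if (user_requested)
        return PROBE_BY_USER_REQUEST;
    if (query_saw_failure)
        return PROBE_BY_QUERY_FAILURE;
    /* Otherwise probe only once a full interval has passed since the last round. */
    if (now - last_probe >= (time_t) probe_interval_sec)
        return PROBE_BY_INTERVAL;
    return PROBE_NOT_NEEDED;
}
```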
The polling process is shown in the figure below:

FTS process workflow

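The FTS process is named ftsprobe process. It is registered with the flags BGWORKER_SHMEM_ACCESS and BGWORKER_BACKEND_DATABASE_CONNECTION, which means it can access the backends' shared memory and connect to a database. When a background process may be started is given by a value of a start-time enum; in the entry below this is BgWorkerStart_DtxRecovering, which means the process can be started once the postmaster is up, without waiting for distributed transaction recovery to finish. The FTS process is started this way.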

static BackgroundWorker PMAuxProcList[MaxPMAuxProc] = {
	{"ftsprobe process",
	 BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION,
	 BgWorkerStart_DtxRecovering, /* no need to wait dtx recovery */
	 0, /* restart immediately if ftsprobe exits with non-zero code */
	 FtsProbeMain, {0}, {0}, 0, 0,
	 FtsProbeStartRule},
	/* ... entries for the other GP background processes omitted ... */
};
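The flags field is a bit mask, and the start time says how far postmaster startup must have progressed before a worker may launch. The sketch below assumes start times behave like ordered phases, so a worker may start once the current phase has reached its own. StartPhase, WorkerSpec, can_start_now() and wants_db_connection() are hypothetical; only the two flag values mirror PostgreSQL's BGWORKER_* constants.

```c
#include <stdbool.h>

/* Flag bits; the values mirror PostgreSQL's BGWORKER_* constants. */
#define SKETCH_SHMEM_ACCESS           0x0001
#define SKETCH_BACKEND_DB_CONNECTION  0x0002

/* Hypothetical startup phases, in the order the postmaster reaches them. */
typedef enum {
    PHASE_POSTMASTER_START,
    PHASE_DTX_RECOVERING,   /* distributed transaction recovery not yet finished */
    PHASE_DTX_RECOVERED
} StartPhase;

typedef struct {
    const char *name;
    int flags;
    StartPhase start_time;
} WorkerSpec;

/* A worker may start once startup has reached its requested phase. */
bool can_start_now(const WorkerSpec *w, StartPhase current)
{
    return current >= w->start_time;
}

/* A database connection only makes sense together with shared memory access. */
bool wants_db_connection(const WorkerSpec *w)
{
    return (w->flags & SKETCH_SHMEM_ACCESS) && (w->flags & SKETCH_BACKEND_DB_CONNECTION);
}
```

For the FTS entry, a spec such as { "ftsprobe process", SKETCH_SHMEM_ACCESS | SKETCH_BACKEND_DB_CONNECTION, PHASE_DTX_RECOVERING } would be allowed to start before distributed transaction recovery completes.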

Greenplum sets up a group of GP background processes through an array, PMAuxProcList. Each entry in that array represents one GP background process.
Two function pointers are important members of the BackgroundWorker structure. One points to the main entry function of the GP background process; the other points to the function that determines whether the process should be started. For FTS, these two functions are FtsProbeMain() and FtsProbeStartRule(), respectively. This is hard-coded in postmaster.c:
#define MaxPMAuxProc 6
static BackgroundWorker PMAuxProcList[MaxPMAuxProc];
The postmaster checks the following condition:
Gp_entry_postmaster && Gp_role == GP_ROLE_DISPATCH
The FTS probe process is started only when this condition is true.
During initialization, one BackgroundWorker entry for each GP background process is registered into the postmaster's private list, BackgroundWorkerList. At that point the condition above is evaluated to decide whether FTS should be registered at all. See load_auxiliary_libraries() for more detail.
Later, the postmaster tries to start the processes registered in BackgroundWorkerList, including the FTS probe process. If the first attempt to start a particular process fails, or a process goes down for some reason and needs to be brought up again, the postmaster restarts it from its main loop: on every iteration it checks the status of these processes and acts accordingly.
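The register-then-supervise pattern can be sketched with a table whose entries carry a main-function pointer and a start-rule pointer. Everything here (SkConfig, AuxProc, register_aux_procs(), restart_dead_procs(), the fake pid counter) is hypothetical; the real postmaster forks processes, whereas this sketch only simulates pids so that it runs on its own.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SK_ROLE_UTILITY, SK_ROLE_DISPATCH, SK_ROLE_EXECUTE } SkRole;

typedef struct {
    bool entry_postmaster;  /* stands in for Gp_entry_postmaster */
    SkRole role;            /* stands in for Gp_role */
} SkConfig;

typedef struct {
    const char *name;
    void (*main_fn)(void);                    /* process entry point */
    bool (*start_rule)(const SkConfig *cfg);  /* should this process run here? */
} AuxProc;

/* Start rule mirroring the condition checked for FTS. */
static bool fts_start_rule(const SkConfig *cfg)
{
    return cfg->entry_postmaster && cfg->role == SK_ROLE_DISPATCH;
}

static void fts_main(void) { /* the probe loop would run here */ }

static const AuxProc aux_procs[] = {
    { "ftsprobe process", fts_main, fts_start_rule },
};
#define N_AUX (sizeof aux_procs / sizeof aux_procs[0])

/* Registered workers and their simulated pids; 0 means "not running". */
static const AuxProc *registered[N_AUX];
static int running_pid[N_AUX];
static size_t n_registered;
static int next_fake_pid = 1000;

/* Initialization: register only the entries whose start rule passes. */
void register_aux_procs(const SkConfig *cfg)
{
    n_registered = 0;
    for (size_t i = 0; i < N_AUX; i++) {
        if (aux_procs[i].start_rule(cfg)) {
            registered[n_registered] = &aux_procs[i];
            running_pid[n_registered] = 0;
            n_registered++;
        }
    }
}

/* One main-loop iteration: (re)start every registered process that is down. */
size_t restart_dead_procs(void)
{
    size_t started = 0;
    for (size_t i = 0; i < n_registered; i++) {
        if (running_pid[i] == 0) {
            running_pid[i] = next_fake_pid++;  /* real code would fork and call main_fn */
            started++;
        }
    }
    return started;
}
```

On a segment configuration nothing is registered, so FTS never runs there; on the master, a process whose pid slot is cleared is picked up again on the next loop iteration.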

FTS shared memory initialization

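During shared memory initialization, FTS allocates a pid_t-sized slot in shared memory and initializes it to 0. Only the postmaster performs the allocation; when IsUnderPostmaster is true the function returns immediately.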

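static volatile pid_t *shmFtsProbePID;   /* pid of the FTS probe process, kept in shared memory */
void FtsProbeShmemInit(void) {
	/* Only the postmaster allocates the slot. */
	if (IsUnderPostmaster)
		return;
	shmFtsProbePID = (volatile pid_t*)ShmemAlloc(sizeof(*shmFtsProbePID));
	*shmFtsProbePID = 0;   /* no FTS process has published its pid yet */
}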
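Because the slot lives in shared memory, a pid written by the FTS process is visible to every other process attached to it, which is presumably how other backends find the prober when they need to signal it. The sketch reproduces the idea with plain POSIX calls: mmap() with MAP_SHARED | MAP_ANONYMOUS stands in for ShmemAlloc(), and a forked child stands in for the FTS process. alloc_pid_slot() and publish_pid_in_child() are hypothetical names.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a zeroed pid_t in memory that stays shared across fork(). */
volatile pid_t *alloc_pid_slot(void)
{
    void *p = mmap(NULL, sizeof(pid_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    volatile pid_t *slot = (volatile pid_t *) p;
    *slot = 0;   /* anonymous mappings start zeroed; explicit for clarity */
    return slot;
}

/* Fork a child that publishes its own pid in the slot, like FtsProbeMain does. */
pid_t publish_pid_in_child(volatile pid_t *slot)
{
    pid_t child = fork();
    if (child == 0) {
        *slot = getpid();
        _exit(0);
    }
    if (child > 0)
        waitpid(child, NULL, 0);   /* wait so the write is guaranteed to have happened */
    return child;
}
```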

FTS process main function: FtsProbeMain

For SIGHUP, the process installs the handler sigHupHandler, which sets got_SIGHUP to true and sets the process latch, signalling that postgresql.conf needs to be reread.

static volatile sig_atomic_t got_SIGHUP = false;
static void sigHupHandler(SIGNAL_ARGS) {
	got_SIGHUP = true;
	if (MyProc) SetLatch(&MyProc->procLatch);
}

For SIGINT, it installs sigIntHandler, which sets probe_requested to true and sets the latch, signalling that a new FTS probe should be triggered.

static volatile bool probe_requested = false;
static void sigIntHandler(SIGNAL_ARGS) {
	probe_requested = true;
	if (MyProc) SetLatch(&MyProc->procLatch);
}
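Both handlers follow the same pattern: do almost nothing inside the handler, just set a flag, and let the main loop act on it (SetLatch additionally wakes the loop if it is sleeping). Below is a portable sketch of that pattern with sigaction(); the latch is omitted and all names are hypothetical. Both flags are volatile sig_atomic_t, because the C standard only defines the behavior of a handler that writes objects of that type.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t got_sighup = 0;
static volatile sig_atomic_t probe_requested = 0;

/* Handlers only record that something happened. */
static void on_sighup(int signo) { (void) signo; got_sighup = 1; }
static void on_sigint(int signo) { (void) signo; probe_requested = 1; }

int install_fts_style_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sighup;
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = on_sigint;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return 0;
}

typedef struct { bool reload_config; bool run_probe; } LoopActions;

/* Called from the main loop: consume the flags and decide what to do. */
LoopActions consume_signal_flags(void)
{
    LoopActions a = { false, false };
    if (got_sighup) { got_sighup = 0; a.reload_config = true; }
    if (probe_requested) { probe_requested = 0; a.run_probe = true; }
    return a;
}
```

Keeping the real work (rereading the config, running a probe) in the loop avoids calling non-async-signal-safe functions from a handler.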

For the program-error signal SIGSEGV, it installs the handler CdbProgramErrorHandler:

void CdbProgramErrorHandler(SIGNAL_ARGS) {
    int			save_errno = errno;
    char       *pts = "process";
	if (!pthread_equal(main_tid, pthread_self())) {
#ifndef _WIN32
		write_stderr("\nUnexpected internal error: Master %d received signal %d in worker thread %lu (forwarding signal to main thread)\n\n", MyProcPid, postgres_signal_arg, (unsigned long)pthread_self());
#else
		write_stderr("\nUnexpected internal error: Master %d received signal %d in worker thread %lu (forwarding signal to main thread)\n\n", MyProcPid, postgres_signal_arg, (unsigned long)pthread_self().p);
#endif
		/* Only forward if the main thread isn't quick-dying. */
		if (!in_quickdie) pthread_kill(main_tid, postgres_signal_arg);

		/*
		 * Don't exit the thread when we reraise SEGV/BUS/ILL signals to the OS.
		 * This thread will die together with the main thread after the OS reraises
		 * the signal. This is to ensure that the dumped core file contains the call
		 * stack on this thread for later debugging.
		 */
		if (!(gp_reraise_signal &&(postgres_signal_arg == SIGSEGV || postgres_signal_arg == SIGILL || postgres_signal_arg == SIGBUS))) {
			pthread_exit(NULL);
		}
		return;
	}

    if (Gp_role == GP_ROLE_DISPATCH)
        pts = "Master process";
    else if (Gp_role == GP_ROLE_EXECUTE)
        pts = "Segment process";
    else
        pts = "Process";
    errno = save_errno;
    StandardHandlerForSigillSigsegvSigbus_OnMainThread(pts, PASS_SIGNAL_ARGS);
}
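The handler's branching reduces to a small decision table. On a worker thread it forwards the signal to the main thread unless the process is already quick-dying, and exits the worker thread unless a SEGV/ILL/BUS signal is going to be re-raised (so the core file keeps this thread's stack). On the main thread it handles the signal directly. The sketch captures only that decision; plan_error_signal() and its parameters are hypothetical (reraise_enabled plays the role of gp_reraise_signal), and no signals are actually sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>

typedef struct {
    bool forward_to_main;     /* pthread_kill(main_tid, signo) */
    bool exit_worker_thread;  /* pthread_exit(NULL) */
    bool handle_here;         /* StandardHandlerForSigillSigsegvSigbus_OnMainThread() */
} ErrorSignalPlan;

ErrorSignalPlan plan_error_signal(bool on_main_thread, bool in_quickdie,
                                  bool reraise_enabled, int signo)
{
    ErrorSignalPlan plan = { false, false, false };
    if (on_main_thread) {
        plan.handle_here = true;
        return plan;
    }
    /* Only forward if the main thread isn't quick-dying. */
    plan.forward_to_main = !in_quickdie;
    bool fatal_signal = (signo == SIGSEGV || signo == SIGILL || signo == SIGBUS);
    /* Keep the worker thread alive only when the OS will re-raise a fatal signal. */
    plan.exit_worker_thread = !(reraise_enabled && fatal_signal);
    return plan;
}
```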

The main flow of FtsProbeMain is: install the signal handlers, unblock signals, establish the database connection, and call FtsLoop.


bool am_ftsprobe = false;
void FtsProbeMain(Datum main_arg) {
	*shmFtsProbePID = MyProcPid;  // publish this process's pid in shared memory
	am_ftsprobe = true;           // mark this process as the FTS prober
	/* reread postgresql.conf if requested */
	pqsignal(SIGHUP, sigHupHandler); pqsignal(SIGINT, sigIntHandler);
	/* CDB: Catch program error signals. Save our main thread-id for comparison during signals. */
	main_tid = pthread_self();
#ifdef SIGSEGV
	pqsignal(SIGSEGV, CdbProgramErrorHandler);
#endif
	/* We're now ready to receive signals */
	BackgroundWorkerUnblockSignals();
	/* Connect to our database */
	BackgroundWorkerInitializeConnection(DB_FOR_COMMON_ACCESS, NULL); // DB_FOR_COMMON_ACCESS is defined as "postgres"
	/* main loop */
	FtsLoop();
	/* One iteration done, go away */
	proc_exit(0);
}
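The order in FtsProbeMain matters: PostgreSQL starts a background worker with signals blocked and expects it to install its handlers before unblocking them, so a signal that arrives in between stays pending instead of reaching a default handler. The sketch demonstrates this with sigprocmask(); block_all_signals(), install_sigint_handler() and unblock_all_signals() are hypothetical stand-ins for the effect of pqsignal() and BackgroundWorkerUnblockSignals(), not their implementations.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigint_seen = 0;

static void sketch_on_sigint(int signo)
{
    (void) signo;
    sigint_seen = 1;
}

/* Start out like a freshly launched background worker: all signals blocked. */
int block_all_signals(void)
{
    sigset_t all;
    sigfillset(&all);
    return sigprocmask(SIG_SETMASK, &all, NULL);
}

/* Similar in effect to pqsignal(SIGINT, handler). */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Similar in effect to BackgroundWorkerUnblockSignals(): pending signals fire now. */
int unblock_all_signals(void)
{
    sigset_t none;
    sigemptyset(&none);
    return sigprocmask(SIG_SETMASK, &none, NULL);
}
```

A SIGINT raised while blocked is held until the unblock call, at which point the already-installed handler runs.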

FTS fault detection

Each primary-mirror pair is generally in one of the following states:

Primary up, mirror up
Primary up, mirror down
Primary down, mirror up
Primary down, mirror down
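Greenplum currently uses a one-primary, one-mirror architecture, so it cannot recover on its own from the fourth case, where both the primary and the mirror have failed; that case requires manual intervention. The first case, where both are healthy, needs no action. The rest of this section therefore covers the second and third cases in detail.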
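The four cases map onto a small decision function. This is a sketch: FtsAction and fts_decide() are hypothetical, and treating a down primary with an out-of-sync mirror as needing manual intervention is an assumption, since only promotion of a synchronized mirror is described here.

```c
#include <stdbool.h>

typedef enum {
    FTS_NO_ACTION,          /* both up */
    FTS_TURN_OFF_SYNCREP,   /* mirror down: primary must stop waiting for it */
    FTS_PROMOTE_MIRROR,     /* primary down, mirror synchronized */
    FTS_NEEDS_MANUAL        /* FTS cannot fix this on its own */
} FtsAction;

FtsAction fts_decide(bool primary_up, bool mirror_up, bool mirror_in_sync)
{
    if (primary_up)
        return mirror_up ? FTS_NO_ACTION : FTS_TURN_OFF_SYNCREP;
    /* Primary is down: failover is only safe onto a synchronized mirror. */
    if (mirror_up && mirror_in_sync)
        return FTS_PROMOTE_MIRROR;
    /* Both down, or (assumption) the surviving mirror is not in sync. */
    return FTS_NEEDS_MANUAL;
}
```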

Fault 1: the primary goes down

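This is the most common failure and the main problem high availability exists to solve. Once FTS detects that a primary is down, and provided its mirror is in sync, FTS promotes that mirror to primary and updates the catalog.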
After the promotion, the catalog shows the following changes.

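The role column shows that the mirror has become the primary, while preferred_role is unchanged. The original primary is now marked as a mirror with status 'd' (down), and mode is set to 'n' (not in sync).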
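The catalog changes can be expressed as a transformation on the pair's two gp_segment_configuration rows. This sketch models only the four single-character columns discussed above; SegRow and apply_promotion() are hypothetical, the real update is a catalog write performed by FTS, and setting mode to 'n' on both rows is an assumption consistent with the description.

```c
#include <stdbool.h>

typedef struct {
    char role;            /* 'p' primary, 'm' mirror (current role) */
    char preferred_role;  /* role assigned at initialization; failover leaves it alone */
    char mode;            /* 's' synchronized, 'n' not in sync */
    char status;          /* 'u' up, 'd' down */
} SegRow;

/* Promote the mirror of a pair whose primary FTS found dead. */
bool apply_promotion(SegRow *old_primary, SegRow *old_mirror)
{
    /* Only a running, synchronized mirror may be promoted. */
    if (old_mirror->status != 'u' || old_mirror->mode != 's')
        return false;
    old_mirror->role = 'p';
    old_mirror->mode = 'n';    /* the new primary now runs without a synced mirror */
    old_primary->role = 'm';
    old_primary->status = 'd';
    old_primary->mode = 'n';
    /* preferred_role is deliberately left untouched on both rows. */
    return true;
}
```

Keeping preferred_role unchanged is what later lets an administrator tell that the pair is running with swapped roles.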

Fault 2: the mirror goes down

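If the mirror goes down, replication between the primary and the mirror can no longer be synchronous, so the primary hangs (commits wait for an acknowledgment that will never arrive) until FTS tells the primary to perform sync-off and turn off synchronous replication. Synchronous replication is turned off by setting synchronous_standby_names to an empty value.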
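Why the primary hangs, and why emptying synchronous_standby_names releases it, can be shown with a simplified rule: a commit waits for the mirror's acknowledgment exactly when synchronous replication is configured, and an empty synchronous_standby_names means it is not. commit_must_wait() is a hypothetical helper; real sync-rep logic also tracks WAL positions, which this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* A commit blocks while sync rep is on and the mirror has not acknowledged its WAL. */
bool commit_must_wait(const char *synchronous_standby_names, bool mirror_acked)
{
    bool sync_rep_on = synchronous_standby_names != NULL &&
                       synchronous_standby_names[0] != '\0';
    return sync_rep_on && !mirror_acked;
}
```

With the mirror down, mirror_acked stays false indefinitely, so commits keep waiting until sync-off empties the setting.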

In the catalog, mode has changed to not in sync ('n'), and the mirror's status is marked as down ('d').
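The catalog side of this failure is again a small transformation on the pair's rows, sketched here with hypothetical single-character columns: the primary keeps its role and stays up, mode becomes 'n', and the mirror's status becomes 'd'. SegRow and apply_mirror_down() are hypothetical, and marking mode on both rows is an assumption consistent with the description.

```c
typedef struct {
    char role;    /* 'p' primary, 'm' mirror */
    char mode;    /* 's' synchronized, 'n' not in sync */
    char status;  /* 'u' up, 'd' down */
} SegRow;

/* Record a dead mirror; the primary keeps serving without synchronous replication. */
void apply_mirror_down(SegRow *primary, SegRow *mirror)
{
    primary->mode = 'n';
    mirror->mode = 'n';
    mirror->status = 'd';
    /* Roles and the primary's status are unchanged. */
}
```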