
PostgreSQL Source Code Analysis: WAL Archiving

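PostgreSQL provides WAL archiving, whose main purpose is backup and recovery, in particular point-in-time recovery (PITR). Why archive WAL at all? Because a checkpoint removes or recycles WAL segments that are no longer needed, and once they are gone the database can no longer be restored to an arbitrary point in time. With archiving enabled, every WAL segment from the initial state of the database (in practice, from a base backup) up to the present can be preserved, which effectively serves as a backup. After a failure or an operator mistake, the database can then be restored to its state at a chosen moment.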

Enabling WAL archiving

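Setting archive_mode = on in the configuration file enables archiving. At startup PostgreSQL then creates the archiver process, which archives each finished segment by running the command configured in archive_command.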

# - Archiving -
archive_mode = on               # enables archiving; off, on, or always (change requires restart)
archive_command = 'cp %p /home/postgres/pgsql/archive/%f'               # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
archive_timeout = 1800          # force a logfile segment switch after this
                                # number of seconds; 0 disables
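
To make the %p and %f placeholders concrete, here is a minimal sketch of how a command template could be expanded before it is handed to the shell. The function name expand_archive_command is hypothetical and this is not PostgreSQL's own code; it only assumes the documented meaning of the placeholders (%p = path of the file to archive, %f = file name only) and treats %% as a literal percent sign.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's implementation): expand %p, %f
 * and %% in an archive_command template into `out`.
 * Returns 0 on success, -1 if the result does not fit in `outlen` bytes.
 */
int
expand_archive_command(const char *tmpl, const char *path, const char *fname,
                       char *out, size_t outlen)
{
    size_t used = 0;

    assert(tmpl != NULL && path != NULL && fname != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    for (const char *sp = tmpl; *sp != '\0'; sp++)
    {
        const char *piece = sp;     /* default: copy one literal character */
        size_t      len = 1;

        if (sp[0] == '%' && sp[1] == 'p')
        {
            piece = path;           /* %p: path of the file to archive */
            len = strlen(path);
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == 'f')
        {
            piece = fname;          /* %f: file name only */
            len = strlen(fname);
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == '%')
        {
            piece = "%";            /* %%: literal percent sign */
            sp++;
        }

        if (used + len >= outlen)   /* always keep room for the terminator */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the template from the configuration above, a path of pg_wal/000000010000000000000001 and the matching file name, the result is cp pg_wal/000000010000000000000001 /home/postgres/pgsql/archive/000000010000000000000001.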

Archiver process source code

Let's look at the archiver's source code, in src/backend/postmaster/pgarch.c:

PgArchiverMain(void)
--> pgarch_MainLoop();	// enter the archiver main loop
	--> pgarch_ArchiverCopyLoop(); 
		--> pgarch_readyXlog

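The key question for archiving is when it happens. The core point is that archiving is triggered by a WAL segment switch, so let's look at which situations cause a switch: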

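When a WAL segment (a WAL file) fills up and writing moves on to the next segment, the archiver can be notified to archive the finished one. Before sending that notification, the process that performed the switch creates a file under pg_wal/archive_status that carries the segment's name plus a .ready suffix.
If no segment switch has happened for a long time and archive_timeout expires, a switch is forced so that the current segment gets archived.
A switch can be triggered manually by calling pg_switch_wal().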
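
As an illustration of the naming involved, the sketch below builds a WAL segment file name and the path of its .ready status file. The helper names are hypothetical, and the segment size is assumed to be the default 16 MB; the name format (24 hex digits: timeline, then the high and low parts of the segment number) follows the file names seen in pg_wal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_SEGMENT_SIZE (16u * 1024 * 1024)    /* assumption: default 16 MB */

/* Number of segments in 4 GB of WAL; 256 with 16 MB segments. */
uint64_t
segments_per_xlogid(void)
{
    return UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;
}

/* Hypothetical helper: format the 24-hex-digit WAL file name. */
void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    uint64_t per_id = segments_per_xlogid();

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / per_id),
             (unsigned int) (segno % per_id));
}

/* Hypothetical helper: path of the status file that marks a segment ready. */
void
ready_file_path(char *buf, size_t len, const char *walname)
{
    assert(buf != NULL && len > 0 && walname != NULL);
    snprintf(buf, len, "pg_wal/archive_status/%s.ready", walname);
}
```

For timeline 1 and segment number 1 this gives 000000010000000000000001, and the status file is pg_wal/archive_status/000000010000000000000001.ready.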

Now let's look at the archiver's main loop. It waits for an archive notification and then copies the WAL files:

static void pgarch_MainLoop(void)
{
	pg_time_t	last_copy_time = 0;
	bool		time_to_stop;

	// Main loop: wait until we are notified that a segment is ready to archive
	do {
		ResetLatch(MyLatch);

		/* When we get SIGUSR2, we do one more archive cycle, then exit */
		time_to_stop = ready_to_stop;

		/* Check for barrier events and config update */
		HandlePgArchInterrupts();

		// ...

		/* Do what we're here for */
		pgarch_ArchiverCopyLoop();		// do the archiving: copy the ready WAL segments
		last_copy_time = time(NULL);

		/* Sleep until a signal is received, or until a poll is forced by
		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or until postmaster dies. */
		if (!time_to_stop)		/* Don't wait during last iteration */
		{
			pg_time_t	curtime = (pg_time_t) time(NULL);
			int			timeout;

			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
			if (timeout > 0) {
				int			rc;
				rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
							   timeout * 1000L, WAIT_EVENT_ARCHIVER_MAIN);
				if (rc & WL_POSTMASTER_DEATH)
					time_to_stop = true;
			}
		}
	} while (!time_to_stop);
}
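
The sleep calculation at the end of the loop can be isolated into a small sketch. The function name archiver_sleep_ms is hypothetical, and the 60-second value is an assumption standing in for PGARCH_AUTOWAKE_INTERVAL; the point is only that the archiver sleeps for whatever remains of the interval since the last copy cycle, and skips the wait once the interval has already elapsed.

```c
#include <assert.h>
#include <time.h>

#define AUTOWAKE_INTERVAL 60    /* seconds; assumed stand-in for PGARCH_AUTOWAKE_INTERVAL */

/*
 * Hypothetical helper: how long the loop would wait on its latch, in
 * milliseconds. Returns 0 when the interval has already passed, in which
 * case the loop goes straight into the next archive cycle.
 */
long
archiver_sleep_ms(time_t last_copy_time, time_t now)
{
    long timeout = AUTOWAKE_INTERVAL - (long) (now - last_copy_time);

    return (timeout > 0) ? timeout * 1000L : 0;
}
```

A latch set by another process ends the wait early, so this value is only an upper bound on how long a .ready file can go unnoticed.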

Archive trigger 1

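The most important question is when the notification that a segment can be archived is sent: it is sent when a WAL segment is switched. And when is a segment switched? A user can force a switch by calling pg_switch_wal; in normal operation, WAL records are inserted continuously and a switch happens once the current segment is full. Let's look at the code that handles this.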

For the full XLogWrite call path, see the article PostgreSQL Source Code Analysis: WAL (Part 2).

static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
	// ...
			/*
			 * If we just wrote the whole last page of a logfile segment,
			 * fsync the segment immediately.  This avoids having to go back
			 * and re-open prior segments when an fsync request comes along
			 * later. Doing it here ensures that one and only one backend will
			 * perform this fsync.
			 *
			 * This is also the right place to notify the Archiver that the
			 * segment is ready to copy to archival storage, and to update the
			 * timer for archive_timeout, and to signal for a checkpoint if
			 * too many logfile segments have been used since the last checkpoint. */
			if (finishing_seg)	// the segment is full
			{
				// fsync the segment so the archived copy is complete
				issue_xlog_fsync(openLogFile, openLogSegNo);

				// ask for walsenders to be woken so they can send WAL to standbys
				/* signal that we need to wakeup walsenders later */
				WalSndWakeupRequest();

				LogwrtResult.Flush = LogwrtResult.Write;	/* end of page */

				if (XLogArchivingActive())
					XLogArchiveNotifySeg(openLogSegNo);	// notify that this segment is ready to archive

				// record the time of this segment switch; used to evaluate archive_timeout
				XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
				XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;

				/*
				 * Request a checkpoint if we've consumed too much xlog since
				 * the last one.  For speed, we first check using the local
				 * copy of RedoRecPtr, which might be out of date; if it looks
				 * like a checkpoint is needed, forcibly update RedoRecPtr and
				 * recheck.
				 */
				if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
				{
					(void) GetRedoRecPtr();
					if (XLogCheckpointNeeded(openLogSegNo))
						RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
				}
			}
}
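
XLogWrite records lastSegSwitchTime and lastSegSwitchLSN so that the archive_timeout check has something to compare against. The sketch below shows one way such a check could work. The function name and parameters are hypothetical and the logic is a simplification, not a copy of PostgreSQL's code: a switch is forced only when archive_timeout is enabled, the timeout has elapsed since the last switch, and new WAL has been written since then.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: decide whether a segment switch should be forced.
 *   archive_timeout  - configured timeout in seconds; 0 disables the check
 *   last_switch_time - time of the last segment switch (lastSegSwitchTime)
 *   last_switch_lsn  - WAL position at that switch (lastSegSwitchLSN)
 *   last_written_lsn - position of the most recent WAL record (assumed input)
 */
bool
should_force_segment_switch(time_t now, time_t last_switch_time,
                            int archive_timeout,
                            uint64_t last_written_lsn,
                            uint64_t last_switch_lsn)
{
    if (archive_timeout <= 0)
        return false;               /* feature disabled */

    if ((long) (now - last_switch_time) < archive_timeout)
        return false;               /* timeout has not expired yet */

    /* Switching an idle system would only archive an empty segment. */
    return last_written_lsn > last_switch_lsn;
}
```

With archive_timeout = 1800 as configured earlier, a segment that has received WAL but has not been switched for 30 minutes would be closed and handed to the archiver even though it is not full.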