本文重点分析 Broker 接收到生产者发送消息请求后如何存储在 Broker 上,本文暂不关注事务消息机制。
本文前置篇:RocketMQ源码分析之Broker概述与同步消息发送原理与高可用设计及思考 。
RocketMQ 的存储核心类为 DefaultMessageStore,存储消息的入口方法为:putMessage。
在深入学习消息存储之前,我们先大概了解一下DefaultMessageStore的属性与构造方法。
1、消息存储分析
1.1 DefaultMessageStore 概要
其核心属性如下:
- messageStoreConfig
存储相关的配置,例如存储路径、commitLog文件大小,刷盘频次等等。 - CommitLog commitLog
comitLog 的核心处理类,消息存储在 commitlog 文件中。 - ConcurrentMap
<String/\* topic \*/, ConcurrentMap
<Integer/* queueId */, ConsumeQueue>>` consumeQueueTable
topic 的队列信息。 - FlushConsumeQueueService flushConsumeQueueService
ConsumeQueue 刷盘服务线程。 - CleanCommitLogService cleanCommitLogService
commitLog 过期文件删除线程。 - CleanConsumeQueueService cleanConsumeQueueService
consumeQueue 过期文件删除线程。、 - IndexService indexService
索引服务。 - AllocateMappedFileService allocateMappedFileService
MappedFile 分配线程,RocketMQ 使用内存映射处理 commitlog、consumeQueue文件。 - ReputMessageService reputMessageService
reput 转发线程(负责 Commitlog 转发到 Consumequeue、Index文件)。 - HAService haService
主从同步实现服务。 - ScheduleMessageService scheduleMessageService
定时任务调度器,执行定时任务。 - StoreStatsService storeStatsService
存储统计服务。 - TransientStorePool transientStorePool
ByteBuffer 池,后文会详细使用。 - RunningFlags runningFlags
存储服务状态。 - BrokerStatsManager brokerStatsManager
Broker 统计服务。 - MessageArrivingListener messageArrivingListener
消息达到监听器。 - StoreCheckpoint storeCheckpoint
刷盘检测点。 - LinkedList dispatcherList
转发 comitlog 日志,主要是从 commitlog 转发到 consumeQueue、index 文件。
上面这些属性,是整个消息存储的核心,也是我们需要重点关注与理解的(将会在本系列一一介绍到)。
接下来,先从 putMessage 为入口,一起探究整个消息存储全过程。
1.2 消息存储流程
1.2.1 DefaultMessageStore.putMessage
public PutMessageResult putMessage(MessageExtBrokerInner msg) {
if (this.shutdown) {
log.warn("message store has shutdown, so putMessage is forbidden");
return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
if (BrokerRole.SLAVE == this.messageStoreConfig.getBrokerRole()) {
long value = this.printTimes.getAndIncrement();
if ((value % 50000) == 0) {
log.warn("message store is slave mode, so putMessage is forbidden ");
}
return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
if (!this.runningFlags.isWriteable()) {
long value = this.printTimes.getAndIncrement();
if ((value % 50000) == 0) {
log.warn("message store is not writeable, so putMessage is forbidden " + this.runningFlags.getFlagBits());
}
return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
} else {
this.printTimes.set(0);
}
if (msg.getTopic().length() > Byte.MAX_VALUE) {
log.warn("putMessage message topic length too long " + msg.getTopic().length());
return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, null);
}
if (msg.getPropertiesString() != null && msg.getPropertiesString().length() > Short.MAX_VALUE) {
log.warn("putMessage message properties length too long " + msg.getPropertiesString().length());
return new PutMessageResult(PutMessageStatus.PROPERTIES_SIZE_EXCEEDED, null);
}
if (this.isOSPageCacheBusy()) { //@1
return new PutMessageResult(PutMessageStatus.OS_PAGECACHE_BUSY, null);
}
long beginTime = this.getSystemClock().now();
PutMessageResult result = this.commitLog.putMessage(msg); // @2
long eclipseTime = this.getSystemClock().now() - beginTime;
if (eclipseTime > 500) {
log.warn("putMessage not in lock eclipse time(ms)={}, bodyLength={}", eclipseTime, msg.getBody().length);
}
this.storeStatsService.setPutMessageEntireTimeMax(eclipseTime); //@3
if (null == result || !result.isOk()) { //@4
this.storeStatsService.getPutMessageFailedTimes().incrementAndGet();
}
return result;
代码@1:检测操作系统页写入是否繁忙。
@Override
public boolean isOSPageCacheBusy() {
long begin = this.getCommitLog().getBeginTimeInLock();
long diff = this.systemClock.now() - begin;
if (diff < 10000000 //
&& diff > this.messageStoreConfig.getOsPageCacheBusyTimeOutMills()) {
return true;
}
return false;
代码@2:将日志写入CommitLog 文件,具体实现类 CommitLog。
代码@3:记录相关统计信息。
代码@4:记录写commitlog 失败次数。
1.2.2 CommitLog.putMessage
public PutMessageResult putMessage(final MessageExtBrokerInner msg) {
// Set the storage time
msg.setStoreTimestamp(System.currentTimeMillis());
// Set the message body BODY CRC (consider the most appropriate setting
// on the client)
msg.setBodyCRC(UtilAll.crc32(msg.getBody()));
// Back to Results
AppendMessageResult result = null;
StoreStatsService storeStatsService = this.defaultMessageStore.getStoreStatsService();
String topic = msg.getTopic();
int queueId = msg.getQueueId();
final int tranType = MessageSysFlag.getTransactionValue(msg.getSysFlag()); // @1
if (tranType == MessageSysFlag.TRANSACTION_NOT_TYPE//
|| tranType == MessageSysFlag.TRANSACTION_COMMIT_TYPE) { // @2
// Delay Delivery
if (msg.getDelayTimeLevel() > 0) {
if (msg.getDelayTimeLevel() > this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel()) {
msg.setDelayTimeLevel(this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel());
}
topic = ScheduleMessageService.SCHEDULE_TOPIC;
queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());
// Backup real topic, queueId
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));
msg.setTopic(topic);
msg.setQueueId(queueId);
}
}
long eclipseTimeInLock = 0;
MappedFile unlockMappedFile = null;
MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile(); // @3
putMessageLock.lock(); //spin or ReentrantLock ,depending on store config //@4
try {
long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
this.beginTimeInLock = beginLockTimestamp;
// Here settings are stored timestamp, in order to ensure an orderly
// global
msg.setStoreTimestamp(beginLockTimestamp);
if (null == mappedFile || mappedFile.isFull()) {
mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
}
if (null == mappedFile) {
log.error("create maped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, null);
} // @5
result = mappedFile.appendMessage(msg, this.appendMessageCallback); // @6
switch (result.getStatus()) {
case PUT_OK:
break;
case END_OF_FILE:
unlockMappedFile = mappedFile;
// Create a new file, re-write the message
mappedFile = this.mappedFileQueue.getLastMappedFile(0);
if (null == mappedFile) {
// XXX: warn and notify me
log.error("create maped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, result);
}
result = mappedFile.appendMessage(msg, this.appendMessageCallback);
break;
case MESSAGE_SIZE_EXCEEDED:
case PROPERTIES_SIZE_EXCEEDED:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result);
case UNKNOWN_ERROR:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
default:
beginTimeInLock = 0;
return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
}
eclipseTimeInLock = this.defaultMessageStore.getSystemClock().now() - beginLockTimestamp;
beginTimeInLock = 0;
} finally {
putMessageLock.unlock();
}
if (eclipseTimeInLock > 500) {
log.warn("[NOTIFYME]putMessage in lock cost time(ms)={}, bodyLength={} AppendMessageResult={}", eclipseTimeInLock, msg.getBody().length, result);
}
if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
}
PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);
// Statistics
storeStatsService.getSinglePutMessageTopicTimesTotal(msg.getTopic()).incrementAndGet();
storeStatsService.getSinglePutMessageTopicSizeTotal(topic).addAndGet(result.getWroteBytes());
handleDiskFlush(result, putMessageResult, msg); // @7
handleHA(result, putMessageResult, msg); //@8
return putMessageResult;
先对ComitLog 写入消息做一个简单描述,然后需要详细探究每个步骤的实现。
代码@1:获取消息类型(事务消息,非事务消息,Commit消息。
代码@3:获取一个 MappedFile 对象,内存映射的具体实现。
代码@4,追加消息需要加锁,串行化处理。
代码@5:验证代码@3的 MappedFile 对象,获取一个可用的 MappedFile (如果没有,则创建一个)。
代码@6:通过MappedFile对象写入文件。
代码@7:根据刷盘策略刷盘。
代码@8:主从同步。
1.3 存储核心类分析
1.3.1 源码分析MappedFile
1、3.1.1 MappedFile 基础属性
public static final int OS_PAGE_SIZE = 1024 * 4; // 4K
private static final AtomicLong TOTAL_MAPPED_VIRTUAL_MEMORY = new AtomicLong(0);
private static final AtomicInteger TOTAL_MAPPED_FILES = new AtomicInteger(0);
protected final AtomicInteger wrotePosition = new AtomicInteger(0);
protected final AtomicInteger committedPosition = new AtomicInteger(0);
private final AtomicInteger flushedPosition = new AtomicInteger(0);
protected int fileSize;
protected FileChannel fileChannel;
/**
* Message will put to here first, and then reput to FileChannel if writeBuffer is not null.
*/
protected ByteBuffer writeBuffer = null;
protected TransientStorePool transientStorePool = null;
private String fileName;
private long fileFromOffset;
private File file;
private MappedByteBuffer mappedByteBuffer;
private volatile long storeTimestamp = 0;
- OS_PAGE_SIZE
OSpage大小,4K。
- TOTAL_MAPPED_VIRTUAL_MEMORY
类变量,所有 MappedFile 实例已使用字节总数。 - TOTAL_MAPPED_FILES
MappedFile 个数。 - wrotePosition
当前MappedFile对象当前写指针。 - committedPosition
当前提交的指针。 - flushedPosition
当前刷写到磁盘的指针。 - fileSize
文件总大小。 - fileChannel
文件通道。 - writeBuffer
如果开启了transientStorePoolEnable,消息会写入堆外内存,然后提交到 PageCache 并最终刷写到磁盘。 - TransientStorePool transientStorePool
ByteBuffer的缓冲池,堆外内存,transientStorePoolEnable 为 true 时生效。 - fileName
文件名称。 - fileFromOffset
文件序号,代表该文件代表的文件偏移量。 - File file
文件对象。 - MappedByteBuffer mappedByteBuffer
对应操作系统的 PageCache。 - storeTimestamp
最后一次存储时间戳。
1、3.1.2 初始化
private void init(final String fileName, final int fileSize) throws IOException {
this.fileName = fileName;
this.fileSize = fileSize;
this.file = new File(fileName);
this.fileFromOffset = Long.parseLong(this.file.getName());
boolean ok = false;
ensureDirOK(this.file.getParent());
try {
this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
TOTAL_MAPPED_VIRTUAL_MEMORY.addAndGet(fileSize);
TOTAL_MAPPED_FILES.incrementAndGet();
ok = true;
} catch (FileNotFoundException e) {
log.error("create file channel " + this.fileName + " Failed. ", e);
throw e;
} catch (IOException e) {
log.error("map file " + this.fileName + " Failed. ", e);
throw e;
} finally {
if (!ok && this.fileChannel != null) {
this.fileChannel.close();
}
}
初始化FileChannel、mappedByteBuffer 等。
1、3.1.3 appendMessagesInner 消息写入
public AppendMessageResult appendMessagesInner(final MessageExt messageExt, final AppendMessageCallback cb) {
assert messageExt != null;
assert cb != null;
int currentPos = this.wrotePosition.get(); // @1
if (currentPos < this.fileSize) {
ByteBuffer byteBuffer = writeBuffer != null ? writeBuffer.slice() : this.mappedByteBuffer.slice();
byteBuffer.position(currentPos);
AppendMessageResult result = null;
if (messageExt instanceof MessageExtBrokerInner) { // @2
result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBrokerInner) messageExt);
} else if (messageExt instanceof MessageExtBatch) {
result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBatch)messageExt);
} else {
return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
}
this.wrotePosition.addAndGet(result.getWroteBytes()); // @4
this.storeTimestamp = result.getStoreTimestamp();
return result;
}
log.error("MappedFile.appendMessage return null, wrotePosition: {} fileSize: {}", currentPos, this.fileSize);
return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
代码@1:获取当前写入位置。
代码@2:根据消息类型,是批量消息还是单个消息,进入相应的处理。
代码@3:消息写入实现。
接下看具体的消息写入逻辑,代码来源于 CommitLog$DefaultAppendMessageCallback。
public AppendMessageResult doAppend(final long fileFromOffset, final ByteBuffer byteBuffer, final int maxBlank,
final MessageExtBrokerInner msgInner) { //@1
// STORETIMESTAMP + STOREHOSTADDRESS + OFFSET <br>
// PHY OFFSET
long wroteOffset = fileFromOffset + byteBuffer.position();
this.resetByteBuffer(hostHolder, 8);
String msgId = MessageDecoder.createMessageId(this.msgIdMemory, msgInner.getStoreHostBytes(hostHolder), wroteOffset); //@2
// Record ConsumeQueue information //@3start
keyBuilder.setLength(0);
keyBuilder.append(msgInner.getTopic());
keyBuilder.append('-');
keyBuilder.append(msgInner.getQueueId());
String key = keyBuilder.toString();
Long queueOffset = CommitLog.this.topicQueueTable.get(key);
if (null == queueOffset) {
queueOffset = 0L;
CommitLog.this.topicQueueTable.put(key, queueOffset);
} //@3 end
// Transaction messages that require special handling //@4 start
final int tranType = MessageSysFlag.getTransactionValue(msgInner.getSysFlag());
switch (tranType) {
// Prepared and Rollback message is not consumed, will not enter the
// consumer queuec
case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
queueOffset = 0L;
break;
case MessageSysFlag.TRANSACTION_NOT_TYPE:
case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
default:
break;
} // @4 end
/**
* Serialize message
*/
final byte[] propertiesData =
msgInner.getPropertiesString() == null ? null : msgInner.getPropertiesString().getBytes(MessageDecoder.CHARSET_UTF8);
final int propertiesLength = propertiesData == null ? 0 : propertiesData.length;
if (propertiesLength > Short.MAX_VALUE) {
log.warn("putMessage message properties length too long. length={}", propertiesData.length);
return new AppendMessageResult(AppendMessageStatus.PROPERTIES_SIZE_EXCEEDED);
} //@5
final byte[] topicData = msgInner.getTopic().getBytes(MessageDecoder.CHARSET_UTF8);
final int topicLength = topicData.length;
final int bodyLength = msgInner.getBody() == null ? 0 : msgInner.getBody().length;
final int msgLen = calMsgLength(bodyLength, topicLength, propertiesLength); //@6
// Exceeds the maximum message
if (msgLen > this.maxMessageSize) { // @7
CommitLog.log.warn("message size exceeded, msg total size: " + msgLen + ", msg body size: " + bodyLength
+ ", maxMessageSize: " + this.maxMessageSize);
return new AppendMessageResult(AppendMessageStatus.MESSAGE_SIZE_EXCEEDED);
}
// Determines whether there is sufficient free space
if ((msgLen + END_FILE_MIN_BLANK_LENGTH) > maxBlank) { // @8
this.resetByteBuffer(this.msgStoreItemMemory, maxBlank);
// 1 TOTALSIZE
this.msgStoreItemMemory.putInt(maxBlank);
// 2 MAGICCODE
this.msgStoreItemMemory.putInt(CommitLog.BLANK_MAGIC_CODE);
// 3 The remaining space may be any value
//
// Here the length of the specially set maxBlank
final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
byteBuffer.put(this.msgStoreItemMemory.array(), 0, maxBlank);
return new AppendMessageResult(AppendMessageStatus.END_OF_FILE, wroteOffset, maxBlank, msgId, msgInner.getStoreTimestamp(),
queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
}
// Initialization of storage space @9
this.resetByteBuffer(msgStoreItemMemory, msgLen);
// 1 TOTALSIZE
this.msgStoreItemMemory.putInt(msgLen);
// 2 MAGICCODE
this.msgStoreItemMemory.putInt(CommitLog.MESSAGE_MAGIC_CODE);
// 3 BODYCRC
this.msgStoreItemMemory.putInt(msgInner.getBodyCRC());
// 4 QUEUEID
this.msgStoreItemMemory.putInt(msgInner.getQueueId());
// 5 FLAG
this.msgStoreItemMemory.putInt(msgInner.getFlag());
// 6 QUEUEOFFSET
this.msgStoreItemMemory.putLong(queueOffset);
// 7 PHYSICALOFFSET
this.msgStoreItemMemory.putLong(fileFromOffset + byteBuffer.position());
// 8 SYSFLAG
this.msgStoreItemMemory.putInt(msgInner.getSysFlag());
// 9 BORNTIMESTAMP
this.msgStoreItemMemory.putLong(msgInner.getBornTimestamp());
// 10 BORNHOST
this.resetByteBuffer(hostHolder, 8);
this.msgStoreItemMemory.put(msgInner.getBornHostBytes(hostHolder));
// 11 STORETIMESTAMP
this.msgStoreItemMemory.putLong(msgInner.getStoreTimestamp());
// 12 STOREHOSTADDRESS
this.resetByteBuffer(hostHolder, 8);
this.msgStoreItemMemory.put(msgInner.getStoreHostBytes(hostHolder));
//this.msgBatchMemory.put(msgInner.getStoreHostBytes());
// 13 RECONSUMETIMES
this.msgStoreItemMemory.putInt(msgInner.getReconsumeTimes());
// 14 Prepared Transaction Offset
this.msgStoreItemMemory.putLong(msgInner.getPreparedTransactionOffset());
// 15 BODY
this.msgStoreItemMemory.putInt(bodyLength);
if (bodyLength > 0)
this.msgStoreItemMemory.put(msgInner.getBody());
// 16 TOPIC
this.msgStoreItemMemory.put((byte) topicLength);
this.msgStoreItemMemory.put(topicData);
// 17 PROPERTIES
this.msgStoreItemMemory.putShort((short) propertiesLength);
if (propertiesLength > 0)
this.msgStoreItemMemory.put(propertiesData);
final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
// Write messages to the queue buffer
byteBuffer.put(this.msgStoreItemMemory.array(), 0, msgLen);
AppendMessageResult result = new AppendMessageResult(AppendMessageStatus.PUT_OK, wroteOffset, msgLen, msgId,
msgInner.getStoreTimestamp(), queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills); //@10
switch (tranType) {
case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
break;
case MessageSysFlag.TRANSACTION_NOT_TYPE:
case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
// The next update ConsumeQueue information
CommitLog.this.topicQueueTable.put(key, ++queueOffset);
break;
default:
break;
}
return result;
代码@1:参数详解
- fileFromOffset
该文件在整个文件序列中的偏移量。 - ByteBuffer byteBuffer
byteBuffer,NIO 字节容器。 - int maxBlank
最大可写字节数。 - MessageExtBrokerInner msgInner
消息内部封装实体。
代码@2:创建msgId,底层存储由16个字节表示,如下图:
代码@3:根据 topic-queryId 获取该队列的偏移地址(待写入的地址),如果没有,新增一个键值对,当前偏移量为 0。
代码@4:对事务消息需要单独特殊的处理(PREPARE,ROLLBACK类型的消息,不进入Consume队列)。
代码@5:消息的附加属性长度不能超过65536个字节。
代码@6:计算消息存储长度,消息存储格式:
代码@7:如果消息长度超过配置的消息总长度,则返回 MESSAGE_SIZE_EXCEEDED。
代码@8:如果该 MapperFile 中可剩余空间小于当前消息存储空间,返回END_OF_FILE。
代码@9:将消息写入MapperFile中(内存中)。
代码@10:重点讲解一下AppendMessageResult方法。
public AppendMessageResult(AppendMessageStatus status, long wroteOffset, int wroteBytes, String msgId,
long storeTimestamp, long logicsOffset, long pagecacheRT) {
this.status = status;
this.wroteOffset = wroteOffset;
this.wroteBytes = wroteBytes;
this.msgId = msgId;
this.storeTimestamp = storeTimestamp;
this.logicsOffset = logicsOffset;
this.pagecacheRT = pagecacheRT;
- AppendMessageStatus status
追加结果(成功,到达文件尾(文件剩余空间不足)、消息长度超过、消息属性长度超出、未知错误)。 - wroteOffset
消息的偏移量(相对于整个commitlog)。 - wroteBytes
消息待写入字节。 - msgId
消息ID。 - storeTimestamp
消息写入时间戳。 - logicsOffset
消息队列偏移量。 - pagecacheRT
消息写入时机戳(消息存储时间戳— 消息存储开始时间戳)。
然后返回 AppendMessageStatus,流程回到 【1.2.2 CommitLog.putMessage】的代码@6,如果返回结果是 OK 的话进入到代码@7,@8。
handleDiskFlush(result,*putMessageResult,*msg);***//*@7
我们先关注正常流程,然后再次梳理流程,特别关注异常流程,锁竞争等情况。
2、 消息刷盘
public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
// Synchronization flush
if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) { // @1
final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
if (messageExt.isWaitStoreMsgOK()) { // @2
GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request);
boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
if (!flushOK) {
log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
+ " client address: " + messageExt.getBornHostString());
putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
}
} else { //@3
service.wakeup();
}
}
// Asynchronous flush //@4
else {
if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
flushCommitLogService.wakeup();
} else {
commitLogService.wakeup();
}
}
刷写磁盘支持同步、异步刷写,下文将分别重点阐述该这两种机制。
代码@1:同步刷写,这里有两种配置,是否一定要收到存储MSG信息,才返回,默认为true。
代码@2:如果要等待存储结果。
代码@3:唤醒同步刷盘线程。
代码@4:异步刷盘机制。
2.1 GroupCommitRequest (同步刷盘)
先来分析一下 GroupCommitRequest 。
- nextOffSet
下一个点的偏移量。 - countDownLatch
闭锁。 - flushOk
是否刷写成功。
GroupCommitService 同步刷盘服务类,一个线程一直的处理同步刷写任务,每处理一个循环后等待 10 毫秒,一旦新任务到达,立即唤醒执行任务。
看一下相关的核心方法:GroupCommitService.run方法。
GroupCommitService 的父类 ServiceThread 的 waitForRunning 方法。
从上面的代码看,我们可以把 doCommit 方法当成业务方法,在 run 方法的循环被调用,每执行完一次 doCommit 等待10毫秒,这也是 waitForRunning 的核心逻辑,doCommit 中的任务是通过调用如下方法:
private void doCommit() {
synchronized (this.requestsRead) {
if (!this.requestsRead.isEmpty()) {
for (GroupCommitRequest req : this.requestsRead) {
// There may be a message in the next file, so a maximum of
// two times the flush
boolean flushOK = false;
for (int i = 0; i < 2 && !flushOK; i++) {
flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();
if (!flushOK) {
CommitLog.this.mappedFileQueue.flush(0); // @1
}
}
req.wakeupCustomer(flushOK); //@2
}
long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
if (storeTimestamp > 0) {
CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
}
this.requestsRead.clear();
} else {
// Because of individual messages is set to not sync flush, it
// will come to this process
CommitLog.this.mappedFileQueue.flush(0);
}
}
代码@1:执行刷盘操作。
代码@2:唤醒用户线程。
刷盘具体实现:MappedFileQueue。
public boolean flush(final int flushLeastPages) {
boolean result = true;
MappedFile mappedFile = this.findMappedFileByOffset(this.flushedWhere, false); // @1
if (mappedFile != null) {
long tmpTimeStamp = mappedFile.getStoreTimestamp();
int offset = mappedFile.flush(flushLeastPages); //@2
long where = mappedFile.getFileFromOffset() + offset;
result = where == this.flushedWhere;
this.flushedWhere = where; //@3
if (0 == flushLeastPages) {
this.storeTimestamp = tmpTimeStamp;
}
}
return result;
代码@1:根据上次刷新的位置,得到当前的 MappedFile 对象。
代码@2:执行 MappedFile 的 flush 方法。
代码@3:更新上次刷新的位置。
MappedFile.flush方法详解:
/**
* @param flushLeastPages
* @return The current flushed position
*/
public int flush(final int flushLeastPages) {
if (this.isAbleToFlush(flushLeastPages)) { // @1
if (this.hold()) {
int value = getReadPosition();
try {
//We only append data to fileChannel or mappedByteBuffer, never both.
if (writeBuffer != null || this.fileChannel.position() != 0) {
this.fileChannel.force(false);
} else {
this.mappedByteBuffer.force();
}
} catch (Throwable e) {
log.error("Error occurred when force data to disk.", e);
}
this.flushedPosition.set(value);
this.release();
} else {
log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
this.flushedPosition.set(getReadPosition());
}
}
return this.getFlushedPosition();
刷写的实现逻辑就是调用 FileChannel 或 MappedByteBuffer 的force 方法。
在继续探讨异步刷盘前,在简单回顾一下消息存储的过程:
2.2 异步刷盘
异步刷盘机制,实现原理很简单,就是按照配置的周期定时提交信息到 MappedFile,定时刷写到磁盘,我们重点关注如下几个配置项。
相关服务类(线程)CommitLog$FlushRealTimeService 、CommitLog$CommitRealTimeService。
- commitIntervalCommitLog
CommitRealTimeService 线程的循环间隔,默认200ms。 - commitCommitLogLeastPages
每次提交到文件中,至少需要多少个页(默认4页)。 - flushCommitLogLeastPages
每次刷写到磁盘(commitlog),至少需要多个页(默认4页)。 - flushIntervalCommitLog
异步刷新线程,每次处理完一批任务后的等待时间,默认为500ms。
刷盘的逻辑在同步时已经讲解,现在我们重点看一下提交操作,参考 MappedFileQueue.commit 方法。
MappedFileQueue#commit
public boolean commit(final int commitLeastPages) {
boolean result = true;
MappedFile mappedFile = this.findMappedFileByOffset(this.committedWhere, false);
if (mappedFile != null) {
int offset = mappedFile.commit(commitLeastPages);
long where = mappedFile.getFileFromOffset() + offset;
result = where == this.committedWhere;
this.committedWhere = where;
}
return result;
}
MappedFile.commit:
public int commit(final int commitLeastPages) {
if (writeBuffer == null) {
//no need to commit data to file channel, so just regard wrotePosition as committedPosition.
return this.wrotePosition.get();
}
if (this.isAbleToCommit(commitLeastPages)) { //@1
if (this.hold()) {
commit0(commitLeastPages); //@2
this.release();
} else {
log.warn("in commit, hold failed, commit offset = " + this.committedPosition.get());
}
}
// All dirty data has been committed to FileChannel.
if (writeBuffer != null && this.transientStorePool != null && this.fileSize == this.committedPosition.get()) {
this.transientStorePool.returnBuffer(writeBuffer);
this.writeBuffer = null;
}
return this.committedPosition.get();
代码@1,看是否可以提交(符合最小需要提交的页)
protected boolean isAbleToCommit(final int commitLeastPages) {
int flush = this.committedPosition.get(); // @1
int write = this.wrotePosition.get(); // @2
if (this.isFull()) { //@3
return true;
}
if (commitLeastPages > 0) { //@4
return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= commitLeastPages;
}
return write > flush;
代码@1:上次刷新偏移量。
代码@2:当前写入偏移量。
代码@3:如果文件已满,返回true。
代码@4:如果commitLeastPages大于0,则需要判断当前写入的偏移与上次刷新偏移量之间的间隔,如果超过commitLeastPages页数,则提交,否则本次不提交。
代码@5,如果没有新的数据写入,本次提交任务结束。
commit0 方法详解:
protected void commit0(final int commitLeastPages) {
int writePos = this.wrotePosition.get();
int lastCommittedPosition = this.committedPosition.get();
if (writePos - this.committedPosition.get() > 0) {
try {
ByteBuffer byteBuffer = writeBuffer.slice();
byteBuffer.position(lastCommittedPosition);
byteBuffer.limit(writePos);
this.fileChannel.position(lastCommittedPosition);
this.fileChannel.write(byteBuffer);
this.committedPosition.set(writePos);
} catch (Throwable e) {
log.error("Error occurred when commit data to FileChannel.", e);
}
}
该方法的实现原理很简单,就不一一解释了。
3、主从同步机制
handleHA(result, putMessageResult, msg);
HA机制本文暂时跳过,后面专题分析。在这里我们只要知道消息同步发送机制时,会首先将消息发送给从,至于数据主从一致性等,下篇重点思考与分析。
再次回顾 ComitLog 消息存储:
ComitLog 存储,主要是先写入到 MappedFile(MappedByteBuffer或FileChannel中:内存中),此过程多个线程串行处理,然后根据不同的磁盘刷写方法进行刷盘操作、主从同步操作,最终返回结果。
4、异常流程
写入MappedByteBuffer阶段相关错误:
1、 END_OF_FILE;
首先我们知道,一个 MappedFile 对象映射一个 commitLog 文件,一个 commitlog 文件被映射为一个MappedFile(MappedByteBuffer), 由于消息不是定长的,故一个文件最后的剩余空间或许放不下一个消息,为了区分,在每一个commitlog 文件的最后会写入8个字节,表示文件的结束。如果一个文件最后的剩余空间无法存放一个消息时,会抛出END_OF_FILE 错误,此时会重新获取下一个MappedFile,再重新存入,故 END_OF_FILE 在业务层面不会抛出。
2、 MESSAGE_SIZE_EXCEEDED、PROPERTIES_SIZE_EXCEEDED、UNKNOWN_ERROR;
其他错误直接返回,封装成PutMessageStatus,有错误,直接将错误码返回给消息发送方。
总结
本文着重理解了消息存储到 CommitLog 文件的过程,大体分为三个步骤:1)消息追加,也就是将消息追加到 CommitLog 文件对应的内存映射区(本过程是加锁的,非并发;2)刷盘阶段(并发)就是将内存区数据刷写到磁盘文件(支持同步、异步刷盘);3)主从同步处理(并发)。
后续文章计划:
1、 消息消费机制,ConsumeQueue文件;
2、 消息主从同步及思考;
3、 事务消息实现原理;
4、 定时消息;
备注:本文是《RocketMQ技术内幕》的前期素材,建议关注笔者的书籍:《RocketMQ技术内幕》。