47、JVM实战:hs_err_pid

30–hs_err_pid


1、介绍

1、 当jvm出现致命错误时,会生成一个错误文件hs_err_pid.log;
2、 hs_err_pid.log文件默认会生成到工作目录下;
3、 hs_err_pid.log包括了导致jvm崩溃的重要信息,可以通过分析该文件定位到导致崩溃的根源,从而改善以保证系统稳定;

1.1、设置 hs_err_pid.log 文件的保存位置

-XX:ErrorFile=./hs_err_pid<pid>.log

1.2、 hs_err_pid.log 包含的 几类关键信息

1、 日志头文件;
2、 导致崩溃的线程信息;
3、 所有线程信息;
4、 安全点和锁信息;
5、 堆信息;
6、 本地代码缓存;
7、 编译事件;
8、 gc相关记录;
9、 jvm内存映射;
10、 jvm启动参数;
11、 服务器信息;

2、日志头文件

2.1、介绍

1、 日志头文件包含概要信息,简述了导致崩溃的原因;
2、 导致崩溃的原因很多,常见的原因有jvm自身的bug,应用程序错误,jvm参数配置不当,服务器资源不足,jni调用错误等;

2.2、字段解析

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fc23dd5dae6, pid=1, tid=0x00007fc2009fdb20
#
# JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.15.0
# Distribution: Custom build (Wed Jan 29 10:43:50 UTC 2020)
# Problematic frame:
# J 77871 C2 com.hd.rcugrc.security.user.db.CachedUserInfoManagerImpl.getUser(Ljava/lang/String;Ljava/lang/String;)Lcom/hd/rcugrc/security/user/User; (104 bytes) @ 0x00007fc23dd5dae6 [0x00007fc23dd5da60+0x86]
#
# Core dump written. Default location: /opt/app/core or core.1
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   https://icedtea.classpath.org/bugzilla
#

2.2.1、内容

SIGSEGV (0xb) at pc=0x00007fc23dd5dae6, pid=1, tid=0x00007fc2009fdb20

SIGSEGV

1、 信号名称;
2、 除了"SIGSEGV(0xb)"以外,常见的描述有以下几种;

1、 “EXCEPTION_ACCESS_VIOLATION”:该描述表示jvm崩溃时正在执行jvm自身的代码,这往往是因为jvm的bug导致的崩溃;
2、 “EXCEPTION_STACK_OVERFLOW”:该描述表示这是个栈溢出导致的错误,这往往是应用程序中存在深层递归导致的;

0xb

信号码

pc=0x00007fc23dd5dae6

程序计数器的值

pid=1

进程ID

tid=0x00007fc2009fdb20

线程号

2.2.2、内容


# JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 compressed oops)

1、 JRE和JVM的版本信息;
2、 mixedmode:运行在mixed模式下;

2.2.3、内容

# Problematic frame:
# J 77871 C2 com.hd.rcugrc.security.user.db.CachedUserInfoManagerImpl.getUser(Ljava/lang/String;Ljava/lang/String;)Lcom/hd/rcugrc/security/user/User; (104 bytes) @ 0x00007fc23dd5dae6 [0x00007fc23dd5da60+0x86]

1、 这是问题帧的信息,这表示出现崩溃时jvm正在执行的代码,这里的"J"表示正在执行java代码,后面的表示执行的方法栈;
2、 "J"属于帧的一种类型;

帧的类型,包括如下

1、 C:本地C帧;
2、 j:解释的Java帧;
3、 V:虚拟机帧;
4、 v:虚拟机生成的存根栈帧;
5、 J:其他帧类型,包括编译后的Java帧;

3、导致 崩溃 的线程信息

Current thread (0x000055bf34361800):  JavaThread "http-nio-8888-exec-34" daemon [_thread_in_Java, id=535, stack(0x00007fc2008fd000,0x00007fc2009fdae0)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000000c

上面是导致崩溃 的线程信息和该线程栈信息

Current thread (0x000055bf34361800): JavaThread

1、 导致出错的线程是0x000055bf34361800(指针),线程类型是java线程;
2、 线程类型;

1、 JavaThread:表示执行的是java线程;
2、 VMThread:jvm的内部线程;
3、 CompilerThread:;

1.  用来调用JITing,实时编译装卸class。
2.  通常,jvm会启动多个线程来处理这部分工作,线程名称后面的数字也会累加,例如:CompilerThread1

4、 GCTaskThread:执行gc的线程;
5、 WatcherThread:;

1.  jvm周期性任务调度的线程,是一个单例对象。
2.  该线程在JVM内使用得比较频繁,比如:定期的内存监控、JVM运行状况监控,还有我们经常需要去执行一些jstat 这类命令查看gc的情况

6、 ConcurrentMarkSweepThread:;

1.  jvm在进行CMS GC的时候,会创建一个该线程去进行GC,该线程被创建的同时会创建一个SurrogateLockerThread(简称SLT)线程并且启动它,SLT启动之后,处于等待阶段。CMST开始GC时,会发一个消息给SLT让它去获取Java层Reference对象的全局锁:Lock
“http-nio-8888-exec-34”

表示线程名

daemon

1、 表示该线程为守护线程,再后面的"[_thread_in_Java"表示线程正在执行解释或者编译后的Java代码;
2、 daemon后面的值有以下几种;

1、 _thread_in_native:线程当前状态;
2、 _thread_uninitialized:线程还没有创建,它只在内存原因崩溃的时候才出现;
3、 _thread_new:线程已经被创建,但是还没有启动;
4、 _thread_in_native:线程正在执行本地代码,一般这种情况很可能是本地代码有问题;
5、 _thread_in_vm:线程正在执行虚拟机代码;
6、 _thread_in_Java:线程正在执行解释或者编译后的Java代码;
7、 _thread_blocked:线程处于阻塞状态;
8、 …_trans:以_trans结尾,线程正处于要切换到其它状态的中间状态;

id=535

表示线程ID

stack(0x00007fc2008fd000,0x00007fc2009fdae0)]

表示栈区间

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000000c

这部分是导致虚拟机终止的非预期的信号信息:其中si_signo和si_code是Linux下用来鉴别异常的,Windows下是一个ExceptionCode

4、所有线程信息

Java Threads: ( => current thread )
  0x000055bf37336800 JavaThread "ForkJoinPool.commonPool-worker-1" daemon [_thread_blocked, id=20558, stack(0x00007fc204bf8000,0x00007fc204cf8ae0)]
  0x000055bf3269b800 JavaThread "boundedElastic-171" daemon [_thread_blocked, id=20553, stack(0x00007fc1fde9a000,0x00007fc1fdf9aae0)]
  0x000055bf36df9000 JavaThread "pool-11-thread-286" [_thread_blocked, id=20544, stack(0x00007fc202e52000,0x00007fc202f52ae0)]
  0x000055bf35cdf000 JavaThread "boundedElastic-170" daemon [_thread_blocked, id=20114, stack(0x00007fc1fd595000,0x00007fc1fd695ae0)]
  0x000055bf36a11000 JavaThread "Timer-94" [_thread_blocked, id=20110, stack(0x00007fc1feca6000,0x00007fc1feda6ae0)]
  0x000055bf358bc800 JavaThread "Keep-Alive-Timer" daemon [_thread_blocked, id=19503, stack(0x00007fc204dfa000,0x00007fc204efaae0)]
  0x000055bf36342800 JavaThread "http-nio-8888-exec-174" daemon [_thread_blocked, id=15760, stack(0x00007fc1fad44000,0x00007fc1fae44ae0)]
  0x000055bf2eb5b800 JavaThread "http-nio-8888-exec-173" daemon [_thread_blocked, id=15759, stack(0x00007fc1faf46000,0x00007fc1fb046ae0)]
  0x000055bf3778f000 JavaThread "http-nio-8888-exec-172" daemon [_thread_blocked, id=15758, stack(0x00007fc1fb1d7000,0x00007fc1fb2d7ae0)]
  0x000055bf2e2fc000 JavaThread "http-nio-8888-exec-171" daemon [_thread_blocked, id=15757, stack(0x00007fc1fb2d8000,0x00007fc1fb3d8ae0)]
  ... ...(省略)
Other Threads:
  0x000055bf2635a800 VMThread [stack: 0x00007fc237f4b000,0x00007fc23804bae0] [id=14]
  0x000055bf27b82000 WatcherThread [stack: 0x00007fc232b9b000,0x00007fc232c9bae0] [id=33]

内容和第3点 内容的类似

5、安全点和锁信息

# 安全线信息为正常运行
VM state:not at safepoint (normal execution)

# 锁信息为未被线程持有
VM Mutex/Monitor currently owned by a thread: None
VM state:安全线信息

1、 notatasafepoint:正常运行状态;
2、 atsafepoint:所有线程都因为虚拟机等待状态而阻塞,等待一个虚拟机操作完成;
3、 synchronizing:一个特殊的虚拟机操作,要求虚拟机内的其它线程保持等待状态;

VM Mutex/Monitor:锁信息

1、 Mutex是虚拟机内部的锁;
2、 Monitor是synchronized锁或者其它关联到的Java对象;

6、堆信息

Heap:
 def new generation   total 1850624K, used 1161138K [0x0000000647800000, 0x00000006c5000000, 0x00000006c5000000)
  eden space 1645056K,  64% used [0x0000000647800000, 0x000000068842c580, 0x00000006abe80000)
  from space 205568K,  48% used [0x00000006b8740000, 0x00000006be9004b8, 0x00000006c5000000)
  to   space 205568K,   0% used [0x00000006abe80000, 0x00000006abe80000, 0x00000006b8740000)
 tenured generation   total 4112384K, used 1768079K [0x00000006c5000000, 0x00000007c0000000, 0x00000007c0000000)
   the space 4112384K,  42% used [0x00000006c5000000, 0x0000000730ea3f30, 0x0000000730ea4000, 0x00000007c0000000)
 Metaspace       used 384636K, capacity 428345K, committed 428504K, reserved 1431552K
  class space    used 39929K, capacity 46997K, committed 47064K, reserved 1048576K

Card table byte_map: [0x00007fc239314000,0x00007fc239ed9000] byte_map_base: 0x00007fc2360d8000

1、 堆信息包括:新生代、老生代、信息;
2、 "Cardtable"表示一种卡表,是jvm维护的一种数据结构,用于记录更改对象时的引用,以便gc时遍历更少的table和root;

7、本地代码缓存


CodeCache: size=245760Kb used=150371Kb max_used=181162Kb free=95388Kb
 bounds [0x00007fc23a299000, 0x00007fc2456b9000, 0x00007fc249299000]
 total_blobs=35255 nmethods=34107 adapters=1057
 compilation: enabled

1、 这是一块用于编译和保存本地代码的内存;
2、 注意是本地代码;

8、编译事件

Compilation events (10 events):
Event: 88196.068 Thread 0x000055bf27b58000 111971       3       java.util.Set::spliterator (6 bytes)
Event: 88196.069 Thread 0x000055bf27b58000 nmethod 111971 0x00007fc23ca56650 code [0x00007fc23ca567e0, 0x00007fc23ca56b78]
Event: 88196.071 Thread 0x000055bf27b53000 111972       4       sun.util.calendar.ZoneInfoFile::getZoneInfo (30 bytes)
Event: 88196.074 Thread 0x000055bf27b53000 nmethod 111972 0x00007fc23af41410 code [0x00007fc23af41580, 0x00007fc23af416c8]
Event: 88203.481 Thread 0x000055bf27b53000 111973       4       com.hd.bpm.engine.db.ReversePolishNotation::infixToRPN (242 bytes)
Event: 88203.688 Thread 0x000055bf27b53000 nmethod 111973 0x00007fc2416aae10 code [0x00007fc2416ab2e0, 0x00007fc2416af2b8]
Event: 88206.608 Thread 0x000055bf27b58000 111974       3       org.hibernate.action.internal.BulkOperationCleanupAction::getAfterTransactionCompletionProcess (9 bytes)
Event: 88206.608 Thread 0x000055bf27b58000 nmethod 111974 0x00007fc23c8b9790 code [0x00007fc23c8b9900, 0x00007fc23c8b9b70]
Event: 88206.614 Thread 0x000055bf27b58000 111975       3       org.hibernate.action.internal.BulkOperationCleanupAction$1::<init> (10 bytes)
Event: 88206.614 Thread 0x000055bf27b58000 nmethod 111975 0x00007fc23c2e6390 code [0x00007fc23c2e6500, 0x00007fc23c2e66d0]

1、 一共编译了10次;
2、 举例:可以看到com.hd.bpm.engine.db.ReversePolishNotation被编译了1次;

9、gc相关记录


GC Heap History (10 events):
Event: 87508.136 GC heap before
{Heap before GC invocations=498 (full 8):
 def new generation   total 1850624K, used 1750484K [0x0000000647800000, 0x00000006c5000000, 0x00000006c5000000)
  eden space 1645056K, 100% used [0x0000000647800000, 0x00000006abe80000, 0x00000006abe80000)
  from space 205568K,  51% used [0x00000006abe80000, 0x00000006b2575308, 0x00000006b8740000)
  to   space 205568K,   0% used [0x00000006b8740000, 0x00000006b8740000, 0x00000006c5000000)
 tenured generation   total 4112384K, used 1750262K [0x00000006c5000000, 0x00000007c0000000, 0x00000007c0000000)
   the space 4112384K,  42% used [0x00000006c5000000, 0x000000072fd3da70, 0x000000072fd3dc00, 0x00000007c0000000)
 Metaspace       used 384571K, capacity 428291K, committed 428504K, reserved 1431552K
  class space    used 39924K, capacity 46988K, committed 47064K, reserved 1048576K
Event: 87508.423 GC heap after
Heap after GC invocations=499 (full 8):
 def new generation   total 1850624K, used 97090K [0x0000000647800000, 0x00000006c5000000, 0x00000006c5000000)
  eden space 1645056K,   0% used [0x0000000647800000, 0x0000000647800000, 0x00000006abe80000)
  from space 205568K,  47% used [0x00000006b8740000, 0x00000006be610a10, 0x00000006c5000000)
  to   space 205568K,   0% used [0x00000006abe80000, 0x00000006abe80000, 0x00000006b8740000)
 tenured generation   total 4112384K, used 1755185K [0x00000006c5000000, 0x00000007c0000000, 0x00000007c0000000)
   the space 4112384K,  42% used [0x00000006c5000000, 0x000000073020c6b0, 0x000000073020c800, 0x00000007c0000000)
 Metaspace       used 384571K, capacity 428291K, committed 428504K, reserved 1431552K
  class space    used 39924K, capacity 46988K, committed 47064K, reserved 1048576K
}
... ...(省略)

1、 可以看到gc次数为10次(fullgc);
2、 这里描述了每次gc前后的内存信息,这里没有内存不足等问题;

10、jvm内存映射

1、 下面是jvm加载的库信息,这些信息是虚拟机崩溃时的虚拟内存列表区域;
2、 它可以告诉你崩溃原因时哪些类库正在被使用,位置在哪里,还有堆栈和守护页信息;

Dynamic libraries:
00400000-00401000 r-xp 00000000 08:02 39454583                           /home/service/jdk1.7.0_55/bin/java
00600000-00601000 rw-p 00000000 08:02 39454583                           /home/service/jdk1.7.0_55/bin/java
013cd000-013ee000 rw-p 00000000 00:00 0                                  [heap]
6f0000000-800000000 rw-p 00000000 00:00 0 
3056400000-3056416000 r-xp 00000000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
3056416000-3056615000 ---p 00016000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
3056615000-3056616000 rw-p 00015000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
353be00000-353be20000 r-xp 00000000 08:02 57409933                       /lib64/ld-2.12.so
353c01f000-353c020000 r--p 0001f000 08:02 57409933                       /lib64/ld-2.12.so
353c020000-353c021000 rw-p 00020000 08:02 57409933                       /lib64/ld-2.12.so
... ...(省略)

00400000-00401000

内存区域

r-xp

1、 权限;
2、 r/w/x/p/s分别表示读/写/执行/私有/共享;

00000000

文件内的偏移量

08:02

1、 文件位置的majorID和minorID;

39454583

索引节点号

/home/service/jdk1.7.0_55/bin/java

文件位置

11、jvm启动参数

# jvm参数
VM Arguments:
jvm_args: -Xms6024m -Xms6024m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintHeapAtGC -XX:+PrintTLAB -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=/var/log/xcoa/hs_err_pid_%p.log -Xloggc:/var/log/xcoa/gc-2022-12-02_20-06-51-100.100.17.7.log -XX:HeapDumpPath=/var/log/xcoa/oom-2022-12-02_20-06-51-100.100.17.7.dump -Dspring.profiles.active=redis_sentinel,workflow_save_mq,security_oauth,search_es_service -Dspring.cloud.consul.host=crld-consul01 -Dspring.cloud.consul.config.prefix=xcoa/dev -javaagent:/pt-agent/pinpoint-bootstrap.jar -Dpinpoint.agentId=100.100.17.7 -Dpinpoint.applicationName=crld-oa-server-prod -Duser.timezone=GMT+8 -Dlogging.file.name=/var/log/xcoa/hd-2022-12-02_20-06-51-100.100.17.7.log -Dlogging.file.path=/var/log/xcoa/ 
java_command: org.springframework.boot.loader.WarLauncher
java_class_path (initial): /opt/app:/pt-agent/pinpoint-bootstrap.jar
Launcher Type: SUN_STANDARD

# 系统的环境配置。
Environment Variables:
JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
CLASSPATH=
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/bin
LD_LIBRARY_PATH=/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server:/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64:/usr/lib/jvm/java-1.8-openjdk/jre/../lib/amd64

12、服务器信息

# 内存信息,主要关注下swap信息,看看有没有使用虚拟内存
/proc/meminfo:
MemTotal:       65916492 kB
MemFree:        14593468 kB
Buffers:          222452 kB
Cached:         28502452 kB
SwapTotal:             0 kB
SwapFree:              0 kB
... ...(省略)

# cpu信息
/proc/cpuinfo:
processor   : 0
vendor_id   : GenuineIntel
cpu family : 6
model       : 62
model name : Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
stepping    : 4
... ...(省略)

13、寄存器上下文


Registers:
RAX=0x00000006c5005228, RBX=0x00000000dcbec61f, RCX=0x000000067b88ff08, RDX=0x00000006c5005228
RSP=0x00007fc2009fa4a0, RBP=0x00000006c5005228, RSI=0x00000006ca8055c0, RDI=0x00000000974b19c3
R8 =0x0000000000000006, R9 =0x00000006c5005228, R10=0x000000000000000a, R11=0x0000000000000000
R12=0x0000000000000000, R13=0x000000067b88ff08, R14=0x0000000000000003, R15=0x000055bf34361800
RIP=0x00007fc23dd5dae6, EFLAGS=0x0000000000010202, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

14、栈顶程序计数器旁的操作码

Top of Stack: (sp=0x00007fc2009fa4a0)
0x00007fc2009fa4a0:   00000006ca8055c0 00007fc23cc5ddb4
0x00007fc2009fa4b0:   00000006883ed7d0 00000006883edb20
0x00007fc2009fa4c0:   00000006ca8056d0 00007fc23c755ab4
0x00007fc2009fa4d0:   00007fc2009fa540 00007fc23a2a0d80
0x00007fc2009fa4e0:   00000006ca8056d0 00007fc242aeacd4
0x00007fc2009fa4f0:   00007fc23a2a0d80 00000006883ed7d0
0x00007fc2009fa500:   00000000d9500ab8 00007fc242ffabf0
0x00007fc2009fa510:   00000006d1982fd8 00007fc2cf711fe1
0x00007fc2009fa520:   00007fc100000000 00000006883cb590
0x00007fc2009fa530:   0000000000000000 00007fc2009fa580
0x00007fc2009fa540:   00000000d1079670 00007fc23bdab304
0x00007fc2009fa550:   deaddeaf00000003 00000006883ed7d0
Instructions: (pc=0x00007fc23dd5dae6)
0x00007fc23dd5dac6:   45 85 d2 0f 8c 9a 0c 00 00 48 85 c9 0f 85 b9 0c
0x00007fc23dd5dad6:   00 00 49 bd 10 3b 00 c5 06 00 00 00 45 8b 5d 0c
0x00007fc23dd5dae6:   47 8b 54 dc 0c 45 03 d0 41 8b d2 83 c2 04 85 d2
0x00007fc23dd5daf6:   0f 8c 6d 0c 00 00 81 fa 00 00 08 00 0f 87 59 0c 

栈顶程序计数器旁的操作码,可以被反汇编成系统崩溃前执行的指令。

15、寄存器和内存映射信息

Register to memory mapping:

RAX=0x00000006c5005228 is an oop
java.lang.String 
 - klass: 'java/lang/String'
RBX=0x00000000dcbec61f is an unknown value
RCX=
[error occurred during error reporting (printing register info), id 0xb]

16、信号句柄

对于Linux 下的信号机制

Signal Handlers:
SIGSEGV: [libjvm.so+0x5fcb50], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGBUS: [libjvm.so+0x5fcb50], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGFPE: [libjvm.so+0x535b51], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGPIPE: SIG_IGN, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGXFSZ: [libjvm.so+0x535b51], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGILL: [libjvm.so+0x535b51], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGUSR2: [libjvm.so+0x535cf9], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
SIGHUP: [libjvm.so+0x537d11], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGINT: [libjvm.so+0x537d11], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGTERM: [libjvm.so+0x537d11], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGQUIT: [libjvm.so+0x537d11], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO