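Monitoring Crashes on Android
Viewed across the whole Android system, crashes fall into three groups: App/Framework crashes, Native crashes, and Kernel crashes.
- App-layer or Framework-layer crashes (that is, Java-level crashes) happen when an exception goes uncaught and the process exits abnormally.
- Native crashes occur in the native layer, which sits between the system framework and the Linux kernel. They usually happen when native code triggers a Linux signal that terminates the process.
- Kernel crashes are in most cases kernel panics, and the cause is often a driver or hardware fault.
This article covers Java crashes and Native crashes, and how monitoring and stack-trace collection differ between the two.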
Java Crash
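Monitoring Java crashes is comparatively easy. Thread defines a nested interface, Thread.UncaughtExceptionHandler, which is invoked when a thread is about to terminate because of an uncaught exception (an exception that a try/catch already handled never reaches it). When the app crashes, the runtime calls UncaughtExceptionHandler.uncaughtException, and both the crashing thread and the exception are available inside that method. Thread.setDefaultUncaughtExceptionHandler installs the default handler for all threads, so the handler can save the exception details locally or upload them to a server, which makes it much faster to locate the problem.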
import android.content.Context
import java.io.File
import java.io.FileWriter
import java.io.PrintWriter

class MyCrashHandle : Thread.UncaughtExceptionHandler {
    // The handler that was installed before ours; it may be null, so keep it nullable
    private var defaultUncaughtExceptionHandler: Thread.UncaughtExceptionHandler? = null
    private lateinit var mContext: Context

    fun init(context: Context) {
        mContext = context
        defaultUncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler()
        Thread.setDefaultUncaughtExceptionHandler(this)
    }

    // Exceptions that no try/catch handled, and that crash the app, arrive here.
    // t: the crashing thread; e: the exception carrying the stack trace
    override fun uncaughtException(t: Thread, e: Throwable) {
        val dir = File(mContext.externalCacheDir, "crash_info") // app-private directory
        if (!dir.exists()) {
            dir.mkdirs()
        }
        val time = System.currentTimeMillis()
        val file = File(dir, "${time}.txt")
        // use {} flushes and closes the writer even if writing fails
        PrintWriter(FileWriter(file)).use { pw ->
            pw.println("time: $time")       // crash time
            pw.println("thread: " + t.name) // crashing thread
            e.printStackTrace(pw)           // stack trace
        }
        // After collecting the information, hand over to the system's default
        // KillApplicationHandler, which shows the crash dialog and exits the process
        defaultUncaughtExceptionHandler?.uncaughtException(t, e)
    }
}
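After a crash, a timestamped .txt file appears in the crash_info directory.
How Java handles uncaught exceptions
Printing the system's default uncaught-exception handler shows that it is KillApplicationHandler, which is defined in RuntimeInit.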
RuntimeInit.java
@UnsupportedAppUsage
protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");

    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));

    /*
     * Install a time zone supplier that uses the Android persistent time zone system property.
     */
    RuntimeHooks.setTimeZoneIdSupplier(() -> SystemProperties.get("persist.sys.timezone"));

    /*
     * Sets handler for java.util.logging to use Android log facilities.
     * The odd "new instance-and-then-throw-away" is a mirror of how
     * the "java.util.logging.config.class" system property works. We
     * can't use the system property here since the logger has almost
     * certainly already been initialized.
     */
    LogManager.getLogManager().reset();
    new AndroidConfig();

    /*
     * Sets the default HTTP User-Agent used by HttpURLConnection.
     */
    String userAgent = getDefaultUserAgent();
    System.setProperty("http.agent", userAgent);

    /*
     * Wire socket tagging to traffic stats.
     */
    TrafficStats.attachSocketTagger();

    initialized = true;
}

private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    private final LoggingHandler mLoggingHandler;

    /**
     * Create a new KillApplicationHandler that follows the given LoggingHandler.
     * If {@link #uncaughtException(Thread, Throwable) uncaughtException} is called
     * on the created instance without {@code loggingHandler} having been triggered,
     * {@link LoggingHandler#uncaughtException(Thread, Throwable)
     * loggingHandler.uncaughtException} will be called first.
     *
     * @param loggingHandler the {@link LoggingHandler} expected to have run before
     *     this instance's {@link #uncaughtException(Thread, Throwable) uncaughtException}
     *     is being called.
     */
    public KillApplicationHandler(LoggingHandler loggingHandler) {
        this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            ensureLogging(t, e);

            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            // Try to end profiling. If a profiler is running at this point, and we kill the
            // process (below), the in-memory buffer will be lost. So try to stop, which will
            // flush the buffer. (This makes method trace profiling useful to debug crashes.)
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            // Bring up crash dialog, wait for it to be dismissed
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails! Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }

    private void ensureLogging(Thread t, Throwable e) {
        if (!mLoggingHandler.mTriggered) {
            try {
                mLoggingHandler.uncaughtException(t, e);
            } catch (Throwable loggingThrowable) {
                // Ignored.
            }
        }
    }
}
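Once the stack trace has been captured, combining it with the obfuscation mapping file makes it fairly quick to locate the problem.
Native Crash
The Linux signal mechanism
Signals are an important form of inter-process communication on Linux. They are used for ordinary communication and synchronization between processes, and they are also how the system reports exceptions and interrupts. When an application misbehaves, the Linux kernel generates an error signal and delivers it to the process. The process can deal with the signal in one of three ways:
- Ignore the signal
- Catch the signal and run the corresponding signal handler
- Perform the signal's default action (for example, terminating the process)
When a Linux application hits a serious error while running, it usually crashes. Linux reserves a class of crash signals for this, and for many of them the default action is to record the state at the time of the crash in a core file and then terminate the process. (A core dump is a native kernel mechanism: when a process receives certain fatal signals, the system saves the complete memory state of the process to disk as a binary file.)
Common crash signals: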
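Signal | Description |
---|---|
SIGSEGV (Segmentation Violation) | Segmentation fault: invalid memory reference (for example a null or dangling pointer) |
SIGBUS (Bus Error) | Bus error: access to an undefined portion of a memory object |
SIGFPE (Floating-Point Exception) | Erroneous arithmetic operation, such as division by zero |
SIGILL (Illegal Instruction) | Illegal instruction: executing garbage or a privileged instruction |
SIGSYS (Bad System Call) | Bad system call |
SIGXCPU (CPU Time Limit Exceeded) | CPU time limit exceeded |
SIGXFSZ (File Size Limit Exceeded) | File size limit exceeded |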
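By default, Android simply terminates the process when one of these crash signals arrives. However, the system lets a process register its own handler function for a specific signal (through signal or sigaction), which changes the default action for that signal. Native (NDK) crash monitoring can therefore build on this mechanism: catch the crash signal and run our own signal handler, much like the Java uncaught-exception handling above.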
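As a minimal sketch of the "catch the signal" option, the following registers a handler with sigaction. The names OnCrashSignal, InstallCrashHandler and LastCaughtSignal are made up for illustration and do not come from any library. The handler only records the signal number, whereas a real crash reporter would write a dump at that point.

```cpp
#include <signal.h>
#include <cstring>

// Written by the handler. sig_atomic_t is the type that is safe to
// assign from inside a signal handler.
static volatile sig_atomic_t g_last_signal = 0;

// Hypothetical handler: records which signal arrived, then returns.
static void OnCrashSignal(int signo, siginfo_t* /*info*/, void* /*ucontext*/) {
    g_last_signal = signo;
}

// Replaces the action for `signo` with OnCrashSignal.
// Returns true if sigaction accepted the registration.
bool InstallCrashHandler(int signo) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = OnCrashSignal;
    sa.sa_flags = SA_SIGINFO;  // use the three-argument handler form
    return sigaction(signo, &sa, nullptr) == 0;
}

// Returns the last signal number seen by the handler, or 0 if none.
int LastCaughtSignal() {
    return g_last_signal;
}
```

Calling InstallCrashHandler(SIGFPE) followed by raise(SIGFPE) leaves LastCaughtSignal() equal to SIGFPE. For a signal caused by a genuine fault, such as a write through a null pointer, returning from the handler re-executes the faulting instruction, so a production handler restores the previous action and re-raises the signal instead of simply returning.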
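Breakpad
Google Breakpad is a cross-platform set of tools for crash dumping and analysis. Its Linux implementation is built on the Linux signal-catching mechanism. Because it is written in C/C++, using it on Android requires the NDK.
Source: https://github.com/google/breakpad
Breakpad has three main parts:
- Client: compiled into the project and shipped with it. The released .so has its debug information stripped. When a crash happens on a user's phone, the client generates a minidump file.
- Symbol Dumper (the dump_syms tool): when building the .so, keep the not-stripped .so in addition to the stripped one. dump_syms extracts the symbol information from the not-stripped .so into a .sym file (a symbol file in Breakpad format).
- Minidump Processor (the minidump_stackwalk tool): the minidump_stackwalk command combines the .sym symbol file with the minidump file that holds the crash information, and produces a complete, human-readable C/C++ stack trace for the crash.
The minidump file:
minidump is a lightweight crash-dump format introduced by Microsoft and widely used for crash handling in Windows applications. Compared with a full core dump, a minidump records only the information needed for debugging, which keeps the file small. Each platform natively produces its own dump format with its own analysis tools, and using minidump gives all platforms one common dump format.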
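Adding Breakpad to a project
Download and unpack the Breakpad source, then open README.ANDROID. It describes two steps:
1. Include android/google_breakpad/Android.mk in the build.
2. Add the Breakpad dependency to your own Android.mk.
The annotated android/google_breakpad/Android.mk: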
LOCAL_PATH := $(call my-dir)/../..
# Define the client library module, as a simple static library that
# exports the right include path / linker flags to its users.
include $(CLEAR_VARS)
# The build output is libbreakpad_client.a
LOCAL_MODULE := breakpad_client
# File extension of the C++ source files
LOCAL_CPP_EXTENSION := .cc
# Force the build system to generate this module's object files in 32-bit ARM mode
LOCAL_ARM_MODE := arm
# Source files to compile
LOCAL_SRC_FILES := \
src/client/linux/crash_generation/crash_generation_client.cc \
src/client/linux/dump_writer_common/thread_info.cc \
src/client/linux/dump_writer_common/ucontext_reader.cc \
src/client/linux/handler/exception_handler.cc \
src/client/linux/handler/minidump_descriptor.cc \
src/client/linux/log/log.cc \
src/client/linux/microdump_writer/microdump_writer.cc \
src/client/linux/minidump_writer/linux_dumper.cc \
src/client/linux/minidump_writer/linux_ptrace_dumper.cc \
src/client/linux/minidump_writer/minidump_writer.cc \
src/client/linux/minidump_writer/pe_file.cc \
src/client/minidump_file_writer.cc \
src/common/convert_UTF.cc \
src/common/md5.cc \
src/common/string_conversion.cc \
src/common/linux/breakpad_getcontext.S \
src/common/linux/elfutils.cc \
src/common/linux/file_id.cc \
src/common/linux/guid_creator.cc \
src/common/linux/linux_libc_support.cc \
src/common/linux/memory_mapped_file.cc \
src/common/linux/safe_readlink.cc
# Header search paths
LOCAL_C_INCLUDES := $(LOCAL_PATH)/src/common/android/include \
$(LOCAL_PATH)/src \
$(LSS_PATH)
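# Export the include paths to modules that depend on this one
LOCAL_EXPORT_C_INCLUDES := $(LOCAL_C_INCLUDES)
# Link against the log library from the Android NDK
LOCAL_EXPORT_LDLIBS := -llog
# Build a static library (roughly the native counterpart of a Java jar)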
include $(BUILD_STATIC_LIBRARY)
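However, CMake is now the default build tool for the NDK, so we add Breakpad through CMake rather than the older Android.mk approach.
Using the Android.mk file as a reference, create a breakpad directory under the project's cpp directory (the directory that holds the C/C++ sources), and copy the src directory from the root of the downloaded Breakpad source into it (every Breakpad source file we reference lives under src):
Then create a CMakeLists.txt file in the breakpad directory: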
cmake_minimum_required(VERSION 3.22.1)
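# Header search paths (so .cc files can include headers without spelling out these prefixes)
include_directories(src src/common/android/include)
# Enable assembly support, because the source list contains a .S file (assembly source)
enable_language(ASM)
# Build libbreakpad.a from the listed sources; corresponds to LOCAL_MODULE plus LOCAL_SRC_FILES in Android.mk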
add_library( breakpad STATIC
src/client/linux/crash_generation/crash_generation_client.cc
src/client/linux/dump_writer_common/thread_info.cc
src/client/linux/dump_writer_common/ucontext_reader.cc
src/client/linux/handler/exception_handler.cc
src/client/linux/handler/minidump_descriptor.cc
src/client/linux/log/log.cc
src/client/linux/microdump_writer/microdump_writer.cc
src/client/linux/minidump_writer/linux_dumper.cc
src/client/linux/minidump_writer/linux_ptrace_dumper.cc
src/client/linux/minidump_writer/minidump_writer.cc
src/client/linux/minidump_writer/pe_file.cc
src/client/minidump_file_writer.cc
src/common/convert_UTF.cc
src/common/md5.cc
src/common/string_conversion.cc
src/common/linux/breakpad_getcontext.S
src/common/linux/elfutils.cc
src/common/linux/file_id.cc
src/common/linux/guid_creator.cc
src/common/linux/linux_libc_support.cc
src/common/linux/memory_mapped_file.cc
src/common/linux/safe_readlink.cc
)
# Link the NDK log library; corresponds to LOCAL_EXPORT_LDLIBS in Android.mk
target_link_libraries(breakpad log)
The cpp directory (alongside the breakpad directory) has its own CMakeLists.txt:
cmake_minimum_required(VERSION 3.22.1)
project("ndkcrash")
# ndkcrash.cpp includes client/linux/handler/minidump_descriptor.h and
# client/linux/handler/exception_handler.h from Breakpad, so add its include directories here too
include_directories(breakpad/src breakpad/src/common/android/include)
# Pull in breakpad's CMakeLists.txt, which builds libbreakpad.a (the implementation of the API, comparable to a Java jar)
add_subdirectory(breakpad)
# Produces lib${CMAKE_PROJECT_NAME}.so from ndkcrash.cpp; SHARED builds a .so, STATIC builds a .a
add_library(${CMAKE_PROJECT_NAME} SHARED
ndkcrash.cpp)
# Link the android and log libraries and breakpad into the target, so that ndkcrash.cpp can use their code
target_link_libraries(${CMAKE_PROJECT_NAME}
android
log
breakpad
)
The ndkcrash.cpp source file is implemented as follows:
#include <jni.h>
#include <string>
#include <android/log.h>
#include "client/linux/handler/minidump_descriptor.h"
#include "client/linux/handler/exception_handler.h"

extern "C" JNIEXPORT jstring JNICALL
Java_com_example_ndkcrash_NativeLib_stringFromJNI(
        JNIEnv* env,
        jobject /* this */) {
    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}

// Deliberately crashes by writing through a null pointer, to exercise the monitoring
extern "C"
JNIEXPORT void JNICALL
Java_com_example_ndkcrash_NativeCrashHandle_testNativeCrash(JNIEnv *env, jobject thiz) {
    __android_log_print(ANDROID_LOG_INFO, "native", "xxxxxxxxxx");
    int *p = NULL;
    *p = 10;
}

bool DumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                  void* context,
                  bool succeeded) {
    //printf("Dump path: %s\n", descriptor.path());
    __android_log_print(ANDROID_LOG_INFO, "native", "path:= %s", descriptor.path());
    // If the callback returns true, Breakpad treats the exception as fully handled
    // and prevents any other handler from being notified of it.
    // If the callback returns false, Breakpad treats the exception as unhandled
    // and lets other handlers process it.
    return false;
}

extern "C"
JNIEXPORT void JNICALL
Java_com_example_ndkcrash_NativeCrashReport_initNativeCrash(JNIEnv *env, jobject thiz,
                                                            jstring path_) {
    // Get the path passed in from Java
    const char *path = env->GetStringUTFChars(path_, nullptr);
    // Start Breakpad crash monitoring
    google_breakpad::MinidumpDescriptor descriptor(path);
    // The official demo does not use static. We add it so that the handler lives for the
    // whole run of the program. Without it, the destructor runs as soon as
    // initNativeCrash returns, and monitoring is turned off.
    static google_breakpad::ExceptionHandler eh(descriptor, nullptr, DumpCallback,
                                                nullptr, true, -1);
    env->ReleaseStringUTFChars(path_, path);
}
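The static keyword in initNativeCrash matters because of ordinary C++ object lifetime. The sketch below illustrates this with a hypothetical ScopedMonitor class that stands in for the exception handler: it counts as active between its constructor and destructor. None of these names come from Breakpad.

```cpp
// Number of monitors currently alive (a stand-in for "handlers installed").
static int g_active_monitors = 0;

// Hypothetical monitor: active from construction until destruction.
class ScopedMonitor {
public:
    ScopedMonitor() { ++g_active_monitors; }
    ~ScopedMonitor() { --g_active_monitors; }
    ScopedMonitor(const ScopedMonitor&) = delete;
    ScopedMonitor& operator=(const ScopedMonitor&) = delete;
};

// Local object: destroyed when the function returns,
// so monitoring has already stopped by the time the caller continues.
void InitWithLocal() {
    ScopedMonitor monitor;
}

// Function-local static: constructed on the first call only and
// destroyed at process exit, so monitoring stays on.
void InitWithStatic() {
    static ScopedMonitor monitor;
}

int ActiveMonitors() {
    return g_active_monitors;
}
```

After InitWithLocal() the count is back to 0, while after InitWithStatic() it stays at 1 no matter how often the function is called again. Holding the handler in a heap object that is never freed would be another way to get the same lifetime.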
The JNI function names encode the Java class, so create the matching class com.example.ndkcrash.NativeCrashReport:
package com.example.ndkcrash

import android.content.Context
import java.io.File

class NativeCrashReport {
    companion object {
        // Used to load the 'ndkcrash' library on application startup.
        init {
            // The build produces libndkcrash.so; load it so its JNI functions can be called
            System.loadLibrary("ndkcrash")
        }
    }

    fun init(context: Context) {
        // Directory where NDK crash dumps are stored
        val applicationContext = context.applicationContext
        val file = File(applicationContext.externalCacheDir, "native_crash")
        if (!file.exists()) {
            file.mkdirs()
        }
        initNativeCrash(file.absolutePath)
    }

    external fun initNativeCrash(path: String)
}
From this point on, an NDK crash produces a crash file in the chosen directory: /sdcard/Android/data/[packageName]/cache/native_crash
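Parsing the crash
The collected crash information is stored in a minidump file, the crash-reporting format described above. Uploading this file to a server completes the report, but the file itself is not human-readable. Turning it into a readable stack trace requires the minidump_stackwalk tool, which can be built by following the Breakpad documentation. The Android Studio installation directory also contains a minidump_stackwalk for the host platform under plugins\android-ndk\resources\lldb\bin.
Parsing the minidump file
1. Use dump_syms to extract the symbol information from the not-stripped .so
./dump_syms libnativelib.so > libnativelib.so.sym
2. Create the required directory layout, based on the libnativelib.so.sym generated in step 1
symbol
└── libnativelib.so
    └── 7101A9976989AC0174715FCA686CA5ED0
        └── libnativelib.so.sym
The commands are: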
head -n1 libnativelib.so.sym
MODULE Linux arm64 7101A9976989AC0174715FCA686CA5ED0 libnativelib.so
mkdir -p ./symbol/libnativelib.so/7101A9976989AC0174715FCA686CA5ED0
mv libnativelib.so.sym ./symbol/libnativelib.so/7101A9976989AC0174715FCA686CA5ED0/
3. Use the minidump_stackwalk command to combine the dmp file and the sym file into a readable crashinfo.txt
./minidump_stackwalk _ccrash.dmp ./symbol > crashinfo.txt
Approximate output:
Operating system: Android
0.0.0 Linux 5.4.86-qgki-ga5eec0eb1e4c #1 SMP PREEMPT Wed Apr 13 23:55:10 CST 2022 aarch64
CPU: arm64
8 CPUs
GPU: UNKNOWN
Crash reason: SIGSEGV /0x00000000
Crash address: 0x0
Process uptime: not available
Thread 0 (crashed)
0 libnativelib.so!Java_com_example_nativelib_NativeLib_mockNativeCrash [nativelib.cpp : 20]
......
1 libnativelib.so!Java_com_example_nativelib_NativeLib_mockNativeCrash [nativelib.cpp : 18]
......
2 libart.so + 0xd7644
......
......
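The directory layout in step 2 is derived from the first line of the .sym file, which has the form MODULE, operating system, architecture, module id, module name. As a small sketch of that derivation, the hypothetical helper below (SymbolPathForModuleLine is not a Breakpad function) builds the expected path from that line. It assumes the module name contains no spaces.

```cpp
#include <sstream>
#include <string>

// Builds "<root>/<module name>/<module id>/<module name>.sym" from the first
// line of a Breakpad symbol file, for example
// "MODULE Linux arm64 <id> libnativelib.so".
// Returns an empty string if the line is not a complete MODULE record.
std::string SymbolPathForModuleLine(const std::string& module_line,
                                    const std::string& root) {
    std::istringstream in(module_line);
    std::string tag, os, arch, id, name;
    if (!(in >> tag >> os >> arch >> id >> name) || tag != "MODULE") {
        return "";
    }
    return root + "/" + name + "/" + id + "/" + name + ".sym";
}
```

For the MODULE line shown above and a root of ./symbol, this yields ./symbol/libnativelib.so/7101A9976989AC0174715FCA686CA5ED0/libnativelib.so.sym, which matches the mkdir and mv commands.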
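There is a catch with dump_syms: Android is Linux-based, so the dump_syms built for its .so files has to run on Linux, yet most of the time no Linux environment is at hand to run it. There are two ways around this. One is to provide a symbol-extraction tool that works on every platform. The other is to run minidump_stackwalk on the dmp file without any symbol files at first, and then, when someone wants to inspect a particular crash, resolve the addresses with addr2line.
Run the tool without a symbol directory: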
minidump_stackwalk xxxx.dump > crash.txt
Open the generated crash.txt:
Operating system: Android
0.0.0 Linux 6.6.30-android15-7-gbb616d66d8a9-ab11968886 #1 SMP PREEMPT Thu Jun 13 23:09:10 UTC 2024 x86_64
CPU: amd64 // ABI type
family 6 model 6 stepping 3
4 CPUs
GPU: UNKNOWN
Crash reason: SIGSEGV // segmentation fault: invalid memory reference
Crash address: 0x0
Process uptime: not available
Thread 0 (crashed) // the thread that crashed
0 base.apk + 0x4ad46 // hexadecimal offset within the module; the register values follow
rax = 0x0000000000000000 rdx = 0x0000000000000002
rcx = 0x26ddd657b175b526 rbx = 0x00007f6778919380
rsi = 0x00007ffc5b6e1bd0 rdi = 0x00007ffc5b6e2fbc
rbp = 0x00007ffc5b6e3440 rsp = 0x00007ffc5b6e3420
r8 = 0x00007ffc5b787080 r9 = 0x00007ffc5b7870a0
r10 = 0x0000000000494e22 r11 = 0x0000000000000202
r12 = 0x00007ffc5b6e3620 r13 = 0x00007f6778919428
r14 = 0x00007ffc5b6e3950 r15 = 0x00007ffc5b6e3950
rip = 0x00007f6660394d46
Found by: given as instruction pointer in context
1 libart.so + 0x2b5ec
rbp = 0x00007ffc5b6e3460 rsp = 0x00007ffc5b6e3450
rip = 0x00007f666482b5ec
Found by: previous frame's frame pointer
2 libart.so + 0x2b59c
rsp = 0x00007ffc5b6e34a0 rip = 0x00007f666482b59c
Found by: stack scanning
3 boot.art] + 0x1f8e00
rsp = 0x00007ffc5b6e34b8 rip = 0x00000000701d0e00
Found by: stack scanning
4 libart.so + 0x12155
rsp = 0x00007ffc5b6e3530 rip = 0x00007f6664812155
Found by: stack scanning
5 libart.so + 0x274bf6
rsp = 0x00007ffc5b6e3590 rip = 0x00007f6664a74bf6
Found by: stack scanning
6 libart.so + 0x3f92f4
rsp = 0x00007ffc5b6e3620 rip = 0x00007f6664bf92f4
Found by: stack scanning
7 libart.so + 0x3f8bbe
rsp = 0x00007ffc5b6e3670 rip = 0x00007f6664bf8bbe
Found by: stack scanning
8 libart.so + 0x58f940
rsp = 0x00007ffc5b6e3680 rip = 0x00007f6664d8f940
Found by: stack scanning
9 libart.so + 0x33102
rsp = 0x00007ffc5b6e37a0 rip = 0x00007f6664833102
Found by: stack scanning
10 libart.so + 0x2a6cab
rsp = 0x00007ffc5b6e37e0 rip = 0x00007f6664aa6cab
Found by: stack scanning
11 libart.so + 0x2a64ed
rsp = 0x00007ffc5b6e3820 rip = 0x00007f6664aa64ed
Found by: stack scanning
12 libart.so + 0x2a6036
rsp = 0x00007ffc5b6e3860 rip = 0x00007f6664aa6036
Found by: stack scanning
13 libart.so + 0x560285
rsp = 0x00007ffc5b6e38e0 rip = 0x00007f6664d60285
Found by: stack scanning
14 core-oj.jar + 0x10e9ec
rsp = 0x00007ffc5b6e38f0 rip = 0x00007f666410e9ec
Found by: stack scanning
15 libart.so + 0x2d2e6
rsp = 0x00007ffc5b6e3930 rip = 0x00007f666482d2e6
Found by: stack scanning
16 libart.so + 0x3f1dea
rsp = 0x00007ffc5b6e3940 rip = 0x00007f6664bf1dea
Found by: stack scanning
17 libart.so + 0x3f85f8
rsp = 0x00007ffc5b6e39d0 rip = 0x00007f6664bf85f8
Found by: stack scanning
18 libart.so + 0x3f92da
rsp = 0x00007ffc5b6e3a10 rip = 0x00007f6664bf92da
Found by: stack scanning
19 libart.so + 0x3f8bbe
rsp = 0x00007ffc5b6e3a60 rip = 0x00007f6664bf8bbe
Found by: stack scanning
20 libart.so + 0x3f0670
rsp = 0x00007ffc5b6e3a90 rip = 0x00007f6664bf0670
Found by: stack scanning
21 libart.so + 0x33102
rsp = 0x00007ffc5b6e3b90 rip = 0x00007f6664833102
Found by: stack scanning
22 libart.so + 0x3489d
rsp = 0x00007ffc5b6e3ba0 rip = 0x00007f666483489d
Found by: stack scanning
23 libart.so + 0x31636
rsp = 0x00007ffc5b6e3bb0 rip = 0x00007f6664831636
Found by: stack scanning
24 boot.art] + 0x94500
rsp = 0x00007ffc5b6e3be8 rip = 0x000000007006c500
Found by: stack scanning
25 boot.art] + 0x1f8e00
rsp = 0x00007ffc5b6e3cc8 rip = 0x00000000701d0e00
Found by: stack scanning
26 libart.so + 0x2d2e6
rsp = 0x00007ffc5b6e3d20 rip = 0x00007f666482d2e6
Found by: stack scanning
27 libart.so + 0x3f1dea
rsp = 0x00007ffc5b6e3d30 rip = 0x00007f6664bf1dea
Found by: stack scanning
28 libart.so + 0x124b7
rsp = 0x00007ffc5b6e3d40 rip = 0x00007f66648124b7
Found by: stack scanning
29 libart.so + 0x3f85f8
rsp = 0x00007ffc5b6e3dc0 rip = 0x00007f6664bf85f8
Found by: stack scanning
30 boot-framework.art] + 0xac86f8
rsp = 0x00007ffc5b6e3dd8 rip = 0x0000000070e206f8
Found by: stack scanning
......
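Next, use the addr2line tool shipped with the Android NDK to convert the addresses into symbols. Use the addr2line from the toolchain directory that matches the ABI of your .so, and point it at a .so that still contains symbol information (the one produced by a debug build).
The emulator uses the x86 architecture, so the addr2line path is: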
D:\AndroidSDK\ndk\21.4.7075529\toolchains\x86-4.9\prebuilt\windows-x86_64\bin\i686-linux-android-addr2line.exe
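i686-linux-android-addr2line.exe -f -C -e <path to the .so file> <address from the stack trace to resolve>
The result points at the test code: it declares a pointer to int named p, initializes it to NULL, and then executes *p = 10, which tries to write to memory through a null pointer. Because p does not point to a valid memory address, the operating system refuses the access and the program crashes.
Using addr2line with the not-stripped .so resolves the crash address to a function and source location. The drawback is that it needs the debug version of the .so: shipping that file to users risks leaking code, and the file is large.
In real deployments, therefore, the symbol files are typically uploaded to the server ahead of time, before any crash happens. After a crash the client uploads the minidump file, and the server runs minidump_stackwalk to combine the dump file and the sym files into readable text.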
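To feed addr2line, the module name and the hexadecimal offset have to be pulled out of each unsymbolized frame line such as "0 base.apk + 0x4ad46". The sketch below shows one way to do that. Frame and ParseUnsymbolizedFrame are hypothetical names, and the accepted line shape is an assumption based only on the output shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// One unsymbolized frame from the stack-walk output.
struct Frame {
    int index = -1;            // frame number within the thread
    std::string module;        // for example "libart.so"
    std::uint64_t offset = 0;  // offset printed after the '+' sign
};

// Parses "<index> <module> + 0x<hex offset>".
// Returns false, leaving *out untouched, for any other line shape
// (register dumps, "Found by:" lines, already symbolized frames).
bool ParseUnsymbolizedFrame(const std::string& line, Frame* out) {
    std::istringstream in(line);
    Frame frame;
    std::string plus, hex;
    if (!(in >> frame.index >> frame.module >> plus >> hex) || plus != "+") {
        return false;
    }
    if (hex.size() < 3 || hex.compare(0, 2, "0x") != 0) {
        return false;
    }
    try {
        std::size_t used = 0;
        frame.offset = std::stoull(hex.substr(2), &used, 16);
        if (used != hex.size() - 2) {
            return false;  // trailing characters that are not hex digits
        }
    } catch (const std::exception&) {
        return false;  // no digits at all, or value out of range
    }
    *out = frame;
    return true;
}
```

For the first frame above this gives index 0, module base.apk and offset 0x4ad46. The offset can then be formatted back to hexadecimal and passed as the address argument of the addr2line command.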
How Breakpad registers its signal handlers
ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
                                   FilterCallback filter,
                                   MinidumpCallback callback,
                                   void* callback_context,
                                   bool install_handler,
                                   const int server_fd)
    : filter_(filter),
      callback_(callback),
      callback_context_(callback_context),
      minidump_descriptor_(descriptor),
      crash_handler_(nullptr) {
  if (server_fd >= 0)
    crash_generation_client_.reset(CrashGenerationClient::TryCreate(server_fd));

  if (!IsOutOfProcess() && !minidump_descriptor_.IsFD() &&
      !minidump_descriptor_.IsMicrodumpOnConsole())
    minidump_descriptor_.UpdatePath();

#if defined(__ANDROID__)
  if (minidump_descriptor_.IsMicrodumpOnConsole())
    logger::initializeCrashLogWriter();
#endif

  pthread_mutex_lock(&g_handler_stack_mutex_);

  // Pre-fault the crash context struct. This is to avoid failing due to OOM
  // if handling an exception when the process ran out of virtual memory.
  memset(&g_crash_context_, 0, sizeof(g_crash_context_));

  if (!g_handler_stack_)
    g_handler_stack_ = new std::vector<ExceptionHandler*>;
  if (install_handler) {
    InstallAlternateStackLocked();
    InstallHandlersLocked();
  }
  g_handler_stack_->push_back(this);
  pthread_mutex_unlock(&g_handler_stack_mutex_);
}

// Runs before crashing: normal context.
// static
bool ExceptionHandler::InstallHandlersLocked() {
  if (handlers_installed)
    return false;

  // Fail if unable to store all the old handlers.
  for (int i = 0; i < kNumHandledSignals; ++i) {
    // Fetch the original signal handlers and back them up in old_handlers[],
    // so that they can be restored later
    if (sigaction(kExceptionSignals[i], nullptr, &old_handlers[i]) == -1)
      return false;
  }

  // Build the new signal-handling struct
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);

  // Mask all exception signals when we're handling one of them.
  for (int i = 0; i < kNumHandledSignals; ++i)
    sigaddset(&sa.sa_mask, kExceptionSignals[i]);

  // Set the custom signal handler function and the flags
  sa.sa_sigaction = SignalHandler;
  sa.sa_flags = SA_ONSTACK | SA_SIGINFO;

  for (int i = 0; i < kNumHandledSignals; ++i) {
    if (sigaction(kExceptionSignals[i], &sa, nullptr) == -1) {
      // At this point it is impractical to back out changes, and so failure to
      // install a signal is intentionally ignored.
    }
  }
  handlers_installed = true;
  return true;
}
When one of the registered signals is delivered, SignalHandler is called. SignalHandler is the callback that was passed in when the signals were registered.
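The registration pattern above (back up the old actions, install a handler that runs on an alternate stack with the crash signals masked, and restore later) can be tried in isolation. The following is a simplified sketch of that pattern, not Breakpad's code: the sketch namespace, the signal list, the 64 KiB stack size and the handler that only records the signal number are all assumptions made for illustration.

```cpp
#include <signal.h>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace sketch {

// Signals managed by this sketch (Breakpad keeps its own list in kExceptionSignals).
const int kSignals[] = {SIGSEGV, SIGBUS, SIGFPE, SIGILL};
const int kNumSignals = sizeof(kSignals) / sizeof(kSignals[0]);

struct sigaction g_old_handlers[kNumSignals];  // backups of the previous actions
bool g_installed = false;
volatile sig_atomic_t g_seen = 0;              // last signal seen by Handler

void Handler(int signo, siginfo_t* /*info*/, void* /*ucontext*/) {
    g_seen = signo;
}

// Gives signal handlers their own stack, so that a handler can still run
// after the normal stack has overflowed. Required for SA_ONSTACK to have an effect.
bool InstallAlternateStack() {
    const std::size_t kStackSize = 64 * 1024;  // assumed size for this sketch
    stack_t ss;
    std::memset(&ss, 0, sizeof(ss));
    ss.ss_sp = std::malloc(kStackSize);        // intentionally kept for the process lifetime
    if (ss.ss_sp == nullptr) return false;
    ss.ss_size = kStackSize;
    return sigaltstack(&ss, nullptr) == 0;
}

// Backs up the current actions, then installs Handler for every signal in kSignals.
bool Install() {
    if (g_installed) return false;
    for (int i = 0; i < kNumSignals; ++i) {
        if (sigaction(kSignals[i], nullptr, &g_old_handlers[i]) == -1) return false;
    }
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    // Block the other managed signals while one of them is being handled.
    for (int i = 0; i < kNumSignals; ++i) sigaddset(&sa.sa_mask, kSignals[i]);
    sa.sa_sigaction = Handler;
    sa.sa_flags = SA_ONSTACK | SA_SIGINFO;
    for (int i = 0; i < kNumSignals; ++i) sigaction(kSignals[i], &sa, nullptr);
    g_installed = true;
    return true;
}

// Puts the backed-up actions in place again.
void Restore() {
    if (!g_installed) return;
    for (int i = 0; i < kNumSignals; ++i) {
        sigaction(kSignals[i], &g_old_handlers[i], nullptr);
    }
    g_installed = false;
}

}  // namespace sketch
```

A typical sequence is InstallAlternateStack(), then Install(), and Restore() once monitoring should stop. Keeping the old actions also lets a handler pass a signal on to whatever was registered before it, which is what allows several crash reporters to coexist in one process.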