Linux Signal Handling

1. Linux signal handling

2. Outline:

  • Signal basics and definitions
  • Sending signals
  • Receiving signals
  • SIGSTOP flow
  • SIGKILL flow
  • SIGSEGV flow

2.1. man pages

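3. Signal basics and definitions

3.1. Design goals of signals

  • Signals are designed to provide a basic asynchronous notification mechanism
  • Signals are also a form of IPC

Unlike other IPC mechanisms, signals do not need a dedicated thread that blocks while waiting for messages.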
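A minimal user-space sketch of this asynchronous behaviour, assuming a single-threaded process in which SIGUSR1 is not blocked. The names `on_usr1` and `notify_self` are made up for illustration. No thread waits for the notification: the handler interrupts the normal flow and only records that the signal arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to store from a handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Install the handler, signal ourselves, and return the flag (1 if notified). */
int notify_self(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_usr1 = 0;
    /* raise() returns only after the handler has run. */
    if (raise(SIGUSR1) != 0)
        return -1;
    return got_usr1;
}
```

Calling `notify_self()` from `main` should return 1 under the assumptions above.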

3.2. Linux signal definitions

include/uapi/asm-generic/signal.h

#define _NSIG       64
#define _NSIG_BPW   __BITS_PER_LONG
#define _NSIG_WORDS (_NSIG / _NSIG_BPW)

#define SIGHUP       1
#define SIGINT       2
#define SIGQUIT      3
#define SIGILL       4
#define SIGTRAP      5
#define SIGABRT      6
#define SIGIOT       6
#define SIGBUS       7
#define SIGFPE       8
#define SIGKILL      9
#define SIGUSR1     10
#define SIGSEGV     11
#define SIGUSR2     12
#define SIGPIPE     13
#define SIGALRM     14
#define SIGTERM     15
#define SIGSTKFLT   16
#define SIGCHLD     17
#define SIGCONT     18
#define SIGSTOP     19
#define SIGTSTP     20
#define SIGTTIN     21
#define SIGTTOU     22
#define SIGURG      23
#define SIGXCPU     24
#define SIGXFSZ     25
#define SIGVTALRM   26
#define SIGPROF     27
#define SIGWINCH    28
#define SIGIO       29
#define SIGPOLL     SIGIO
/*
#define SIGLOST     29
*/
#define SIGPWR      30
#define SIGSYS      31
#define SIGUNUSED   31

/* These should not be considered constants from userland.  */
#define SIGRTMIN    32
#ifndef SIGRTMAX
#define SIGRTMAX    _NSIG
#endif
  • Kernel data structures

    Internal signal data structures

    struct task_struct {
      /* Signal handlers: */
      struct signal_struct      *signal;    // shared by the thread group; holds the shared sigpending list
      struct sighand_struct     *sighand;
      sigset_t          blocked;
      sigset_t          real_blocked;
      /* Restored if set_restore_sigmask() was used: */
      sigset_t          saved_sigmask;
      struct sigpending     pending;        // this thread's private sigpending list
      unsigned long         sas_ss_sp;
      size_t                sas_ss_size;
      unsigned int          sas_ss_flags;
    };
    
    struct signal_struct {
        atomic_t        sigcnt;
        atomic_t        live;
        int         nr_threads;
        struct list_head    thread_head;
    
        wait_queue_head_t   wait_chldexit;  /* for wait4() */
      /* shared signal handling: */
        struct sigpending   shared_pending;
    
        /* thread group exit support */
        int         group_exit_code;
        struct task_struct  *group_exit_task;
        struct rlimit rlim[RLIM_NLIMITS];
    };
    
    struct sigpending {
        struct list_head list;
        sigset_t signal;
    };
    
    struct sigqueue {
        struct list_head list;
        int flags;
        siginfo_t info;
        struct user_struct *user;
    };
    
    struct sighand_struct {
        atomic_t        count;
        struct k_sigaction  action[_NSIG];
        spinlock_t      siglock;
        wait_queue_head_t   signalfd_wqh;
    };
    

    5b50a492e4b0be50eab7a65a.png

    sigset_t: a bitmap of signal state, one bit per signal
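The bitmap layout can be sketched in plain C. This is an illustrative stand-in, not the kernel's type: `struct sigbitmap`, `bitmap_add` and `bitmap_has` are hypothetical names, and the constants only mirror `_NSIG`, `_NSIG_BPW` and `_NSIG_WORDS` from the header above. Signal number n is stored in bit n - 1.

```c
#include <limits.h>

#define NSIG_TOTAL    64                                   /* mirrors _NSIG */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)   /* mirrors _NSIG_BPW */
#define NSIG_WORDS    (NSIG_TOTAL / BITS_PER_WORD)         /* mirrors _NSIG_WORDS */

/* Illustrative stand-in for a signal bitmap. */
struct sigbitmap {
    unsigned long sig[NSIG_WORDS];
};

/* Mark signo (1..64) as a member of the set. */
void bitmap_add(struct sigbitmap *set, int signo)
{
    unsigned int bit = (unsigned int)signo - 1;   /* signal 1 lives in bit 0 */

    set->sig[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Return 1 if signo (1..64) is a member of the set, 0 otherwise. */
int bitmap_has(const struct sigbitmap *set, int signo)
{
    unsigned int bit = (unsigned int)signo - 1;

    return (set->sig[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Adding signals 3, 10 and 13 to an empty set yields the word 0x1204, the same encoding used by the mask fields of /proc/<pid>/status.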

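4. Sending signals

  1. raise(3)
    Sends a signal to the calling thread
  2. kill(2)
    Sends a signal to a specific process, a process group, or all processes
  3. killpg(3)
    Sends a signal to a process group
  4. pthread_kill(3)
    Sends a signal to a specific thread
  5. tgkill(2)
    Sends a signal to a specific thread; typically used to implement pthread_kill
  6. sigqueue(3)
    Sends a signal to a specific process and can carry an int or a pointer as payload.
    sigqueue programming example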
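A small sigqueue sketch, assuming a single-threaded process with SIGUSR1 unblocked, so the signal is delivered before sigqueue() returns. The helper name `send_int_to_self` is invented for this example. The handler must be installed with SA_SIGINFO to see the payload in `si_value`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t received_value = -1;

/* SA_SIGINFO handler: si_value carries the payload attached by sigqueue(). */
static void on_usr1(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)ucontext;
    received_value = info->si_value.sival_int;
}

/* Send SIGUSR1 with an int payload to ourselves; return what the handler saw. */
int send_int_to_self(int payload)
{
    struct sigaction sa;
    union sigval value;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    value.sival_int = payload;
    if (sigqueue(getpid(), SIGUSR1, value) != 0)
        return -1;
    return received_value;
}
```

To pass a pointer instead, the sender would fill `value.sival_ptr`; that is only meaningful when sender and receiver share an address space.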

4.1. Signal sending flow in the kernel

linux-send-signal-flow.png

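Whichever path a signal is sent through, it ends up in __send_signal.

4.1.1. Allocating the sigqueue structure

Note: if the allocation fails, signals sent by the kernel to the process can still be delivered.

4.1.2. Choosing the target task

The complete_signal function:

  1. Prefers the main thread
  2. Otherwise searches all threads for one that can take the signal

4.1.3. Returning

The function returns after the signal has been added to the signal list and the corresponding bit in the bitmap has been set.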
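The split between the pending bitmap and the sigqueue list is visible from user space. The sketch below assumes Linux semantics and a single-threaded process; `count_queued` is a hypothetical helper. While a signal is blocked, a standard signal that is already pending is recorded only once, whereas each instance of a real-time signal gets its own queue entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/*
 * Block signo, send it to ourselves `times` times, then count how many
 * instances can be dequeued. Returns -1 on error.
 */
int count_queued(int signo, int times)
{
    sigset_t set, old;
    struct timespec no_wait = {0, 0};
    union sigval value;
    int delivered = 0;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    value.sival_int = 0;
    for (int i = 0; i < times; i++) {
        if (sigqueue(getpid(), signo, value) != 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
    }

    /* Drain the pending instances; a zero timeout makes this a poll. */
    while (sigtimedwait(&set, NULL, &no_wait) == signo)
        delivered++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return delivered;
}
```

Under these assumptions `count_queued(SIGUSR1, 3)` reports one instance and `count_queued(SIGRTMIN, 3)` reports three.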

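4.2. Receiving signals

  1. sigaction: installs a signal handler
  2. sigwait: waits for a signal synchronously
  3. sigsuspend: waits for a signal synchronously, one time only
  4. sigblock: blocks signals (legacy BSD interface; sigprocmask is the POSIX equivalent)
  5. siginterrupt: changes whether interrupted system calls are restarted; the default flag is false (0), which means they are restarted
  6. sigpause: deprecated, use sigsuspend instead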

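4.2.1. Ways a signal is handled

  • Kernel handler
    • If the process has not installed a handler, the kernel's default handler is used
    • For some signals (SIGSTOP, SIGKILL) a user process may neither install a handler nor block them
  • Process defined handler
    • If a handler has been installed, execution jumps to the process's own handler
  • Ignore
    • The process sets the signal to be ignored
  • Kernel handler (default actions)
    • Ignore
    • Terminate
    • Coredump
    • Stop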
    +--------------------+------------------+
    | POSIX signal       | default action   |
    +--------------------+------------------+
    | SIGHUP             | terminate        |
    | SIGINT             | terminate        |
    | SIGQUIT            | coredump         |
    | SIGILL             | coredump         |
    | SIGTRAP            | coredump         |
    | SIGABRT/SIGIOT     | coredump         |
    | SIGBUS             | coredump         |
    | SIGFPE             | coredump         |
    | SIGKILL            | terminate        |
    | SIGUSR1            | terminate        |
    | SIGSEGV            | coredump         |
    | SIGUSR2            | terminate        |
    | SIGPIPE            | terminate        |
    | SIGALRM            | terminate        |
    | SIGTERM            | terminate        |
    | SIGCHLD            | ignore           |
    | SIGCONT            | ignore           |
    | SIGSTOP            | stop             |
    | SIGTSTP            | stop             |
    | SIGTTIN            | stop             |
    | SIGTTOU            | stop             |
    | SIGURG             | ignore           |
    | SIGXCPU            | coredump         |
    | SIGXFSZ            | coredump         |
    | SIGVTALRM          | terminate        |
    | SIGPROF            | terminate        |
    | SIGPOLL/SIGIO      | terminate        |
    | SIGSYS/SIGUNUSED   | coredump         |
    | SIGSTKFLT          | terminate        |
    | SIGWINCH           | ignore           |
    | SIGPWR             | terminate        |
    | SIGRTMIN-SIGRTMAX  | terminate        |
    +--------------------+------------------+
    | non-POSIX signal   | default action   |
    +--------------------+------------------+
    | SIGEMT             | coredump         |
    +--------------------+------------------+
    

    Excerpt from: Raghu Bharadwaj. "Mastering Linux Kernel Development: A
    kernel developer's reference manual." iBooks.

    signal-send-process.png

  • Process defined handler

    2020-09-09_16-27-04_screenshot.png

    Excerpt from: Raghu Bharadwaj. "Mastering Linux Kernel Development: A kernel
    developer's reference manual." iBooks.
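The special status of SIGKILL and SIGSTOP can be checked from user space. In this sketch the helpers `try_ignore` and `try_block` are hypothetical names. On Linux, sigaction() on either signal fails with EINVAL, and a request to block them is silently dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Try to set signo to SIG_IGN; return 0 on success or the errno value. */
int try_ignore(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return errno;
    return 0;
}

/* Ask to block signo; return 1 if it really ended up in the mask, else 0. */
int try_block(int signo)
{
    sigset_t set, old, now;
    int blocked;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;
    /* With a NULL set argument sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_SETMASK, NULL, &now) != 0)
        return -1;
    blocked = sigismember(&now, signo);

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the original mask */
    return blocked;
}
```

Ordinary signals such as SIGUSR1 can be ignored and blocked, so both helpers succeed for them.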

4.2.2. do_signal_stop flow (SIGSTOP)

  • main: the JOBCTL_STOP_PENDING flag is set; group_stop_count is the number of threads
  • thread1: woken up with JOBCTL_STOP_DEQUEUED
  • thread2: woken up with JOBCTL_STOP_DEQUEUED
  • do_notify_parent_cldstop: the last thread to stop sends this signal (SIGCHLD to the parent)

.do-signal-stop.png

.do-signal-d-stop.png
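Seen from the parent, the end of the stop flow can be observed with waitpid. This is a user-space sketch only; `stop_child_and_report` is an invented name. The parent stops a child with SIGSTOP and uses WUNTRACED so that waitpid() also reports stopped children.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, stop it with SIGSTOP, and report what waitpid() sees.
 * Returns the stop signal number, or -1 on failure.
 */
int stop_child_and_report(void)
{
    int status = 0;
    int stop_sig = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();              /* child: idle until it is killed */
    }

    /* WUNTRACED makes waitpid() return for stopped children too. */
    if (kill(pid, SIGSTOP) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        stop_sig = WSTOPSIG(status);

    /* Clean up: SIGKILL terminates even a stopped process. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return stop_sig;
}
```

The function should return SIGSTOP once the whole child process has stopped.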

4.2.3. Kill flow (SIGKILL)

.linux-sigkill-process.png
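A user-space sketch of why SIGKILL is special, under the assumption that the child inherits the parent's signal mask across fork(). The function name is made up. The child blocks every blockable signal, so SIGTERM merely stays pending, while SIGKILL still terminates it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that terminated the child, or -1 on failure. */
int kill_child_that_blocks_everything(void)
{
    sigset_t all, old;
    int status = 0;
    pid_t pid;

    /* Block every signal before fork() so the child inherits the mask. */
    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &old) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();              /* child: blockable signals stay pending */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);   /* parent: restore its own mask */
    if (pid < 0)
        return -1;

    kill(pid, SIGTERM);           /* blocked in the child, has no effect */
    kill(pid, SIGKILL);           /* cannot be blocked, terminates the child */

    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

The reported termination signal should be SIGKILL, not SIGTERM.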

4.2.4. SEGV flow

  • Fault handling table
    static const struct fault_info fault_info[] = {
        { do_bad,       SIGKILL, SI_KERNEL, "ttbr address size fault"   },
        { do_bad,       SIGKILL, SI_KERNEL, "level 1 address size fault"    },
        { do_bad,       SIGKILL, SI_KERNEL, "level 2 address size fault"    },
        { do_bad,       SIGKILL, SI_KERNEL, "level 3 address size fault"    },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 0 translation fault" },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 1 translation fault" },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 2 translation fault" },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 3 translation fault" },
        { do_bad,       SIGKILL, SI_KERNEL, "unknown 8"         },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault" },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault" },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 3 access flag fault" },
        { do_bad,       SIGKILL, SI_KERNEL, "unknown 12"            },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 1 permission fault"  },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 2 permission fault"  },
        { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 3 permission fault"  },
        { do_sea,       SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
        { do_bad,       SIGKILL, SI_KERNEL, "unknown 17"            },
      ...
    };
    static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *regs)
    {
        /*
         * If we are in kernel mode at this point, we have no context to
         * handle this fault with.
         */
        if (user_mode(regs)) {
            const struct fault_info *inf = esr_to_fault_info(esr);
            struct siginfo si = {
                .si_signo   = inf->sig,
                .si_code    = inf->code,
                .si_addr    = (void __user *)addr,
            };
    
            __do_user_fault(&si, esr);
        } else {
            __do_kernel_fault(addr, esr, regs);
        }
    }
    
    static void __do_user_fault(struct siginfo *info, unsigned int esr)
    {
      ...
      arm64_force_sig_info(info, esr_to_fault_info(esr)->name, current);
    }
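    The si_code values from the table (SEGV_MAPERR, SEGV_ACCERR) reach user space in siginfo_t. The sketch below is a user-space illustration, not part of the kernel code above, and its helper names are hypothetical. It triggers two faults on purpose and records the code for each, escaping from the handler with siglongjmp.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS under strict -std= modes */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;

/* SA_SIGINFO handler: record why the fault happened, then escape. */
static void on_segv(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)ucontext;
    fault_code = info->si_code;      /* SEGV_MAPERR or SEGV_ACCERR */
    siglongjmp(fault_env, 1);        /* back to sigsetjmp(), no retry */
}

/* Write one byte to addr; return the SIGSEGV si_code, or 0 if no fault. */
static int segv_code_on_write(volatile char *addr)
{
    if (sigsetjmp(fault_env, 1) == 0) {
        *addr = 1;                   /* may fault */
        return 0;
    }
    return fault_code;               /* reached via siglongjmp() */
}

/*
 * codes[0]: write to a mapped, read-only page (permission fault).
 * codes[1]: write to the same address after munmap() (no mapping).
 * Returns 0 on success, -1 if the setup failed.
 */
int classify_faults(int codes[2])
{
    struct sigaction sa, old;
    long page = sysconf(_SC_PAGESIZE);
    char *p;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    codes[0] = segv_code_on_write(p);
    munmap(p, (size_t)page);
    codes[1] = segv_code_on_write(p);

    sigaction(SIGSEGV, &old, NULL);
    return 0;
}
```

    After a successful call, codes[0] should be SEGV_ACCERR and codes[1] should be SEGV_MAPERR.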
    
  • tombstoned process

    system/core/debuggerd/tombstoned/tombstoned.rc

    service tombstoned /system/bin/tombstoned
        user tombstoned
        group system
    
        # Don't start tombstoned until after the real /data is mounted.
        class late_start
    
        socket tombstoned_crash seqpacket 0666 system system
        socket tombstoned_intercept seqpacket 0666 system system
        socket tombstoned_java_trace seqpacket 0666 system system
        writepid /dev/cpuset/system-background/tasks
    
  • Signal handler

    For Android N:
    Android process crash handling flow
    For Android O:

    /*
     * This code is called after the linker has linked itself and
     * fixed it's own GOT. It is safe to make references to externs
     * and other non-local data at this point.
     */
    static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
      ProtectedDataGuard guard;
      ...
    #ifdef __ANDROID__
      debuggerd_callbacks_t callbacks = {
        .get_abort_message = []() {
          return g_abort_message;
        },
        .post_dump = &notify_gdb_of_libraries,
      };
      debuggerd_init(&callbacks);
    #endif
      g_linker_logger.ResetState();
      ...
    }
    
    // Handler that does crash dumping by forking and doing the processing in the child.
    // Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
    static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
      ...
      debugger_thread_info thread_info = {
        .crash_dump_started = false,
        .pseudothread_tid = -1,
        .crashing_tid = __gettid(),
        .signal_number = signal_number,
        .info = info
      };
    
      // Set PR_SET_DUMPABLE to 1, so that crash_dump can ptrace us.
      int orig_dumpable = prctl(PR_GET_DUMPABLE);
      if (prctl(PR_SET_DUMPABLE, 1) != 0) {
        fatal_errno("failed to set dumpable");
      }
    
      // Essentially pthread_create without CLONE_FILES (see debuggerd_dispatch_pseudothread).
      pid_t child_pid =
        clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
              CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
              &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
      if (child_pid == -1) {
        fatal_errno("failed to spawn debuggerd dispatch thread");
      }
      // Wait for the child to start...
      futex_wait(&thread_info.pseudothread_tid, -1);
    
      // and then wait for it to finish.
      futex_wait(&thread_info.pseudothread_tid, child_pid);
    }
    
    static int debuggerd_dispatch_pseudothread(void* arg) {
      debugger_thread_info* thread_info = static_cast<debugger_thread_info*>(arg);
    
      for (int i = 0; i < 1024; ++i) {
        close(i);
      }
    
      int devnull = TEMP_FAILURE_RETRY(open("/dev/null", O_RDWR));
    
      // devnull will be 0.
      TEMP_FAILURE_RETRY(dup2(devnull, STDOUT_FILENO));
      TEMP_FAILURE_RETRY(dup2(devnull, STDERR_FILENO));
    
      int pipefds[2];
      if (pipe(pipefds) != 0) {
        fatal_errno("failed to create pipe");
      }
    
      // Don't use fork(2) to avoid calling pthread_atfork handlers.
      int forkpid = clone(nullptr, nullptr, 0, nullptr);
      if (forkpid == -1) {
        async_safe_format_log(ANDROID_LOG_FATAL, "libc",
                              "failed to fork in debuggerd signal handler: %s", strerror(errno));
      } else if (forkpid == 0) {
        TEMP_FAILURE_RETRY(dup2(pipefds[1], STDOUT_FILENO));
        close(pipefds[0]);
        close(pipefds[1]);
    
        raise_caps();
    
        char main_tid[10];
        char pseudothread_tid[10];
        char debuggerd_dump_type[10];
        async_safe_format_buffer(main_tid, sizeof(main_tid), "%d", thread_info->crashing_tid);
        async_safe_format_buffer(pseudothread_tid, sizeof(pseudothread_tid), "%d",
                                 thread_info->pseudothread_tid);
        async_safe_format_buffer(debuggerd_dump_type, sizeof(debuggerd_dump_type), "%d",
                                 get_dump_type(thread_info));
    
        execl(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid, debuggerd_dump_type,
              nullptr);
    
        fatal_errno("exec failed");
      } else {
        close(pipefds[1]);
        char buf[4];
        ssize_t rc = TEMP_FAILURE_RETRY(read(pipefds[0], &buf, sizeof(buf)));
        if (rc == -1) {
          async_safe_format_log(ANDROID_LOG_FATAL, "libc", "read of IPC pipe failed: %s",
                                strerror(errno));
        } else if (rc == 0) {
          async_safe_format_log(ANDROID_LOG_FATAL, "libc", "crash_dump helper failed to exec");
        } else if (rc != 1) {
          async_safe_format_log(ANDROID_LOG_FATAL, "libc",
                                "read of IPC pipe returned unexpected value: %zd", rc);
        } else {
          if (buf[0] != '\1') {
            async_safe_format_log(ANDROID_LOG_FATAL, "libc", "crash_dump helper reported failure");
          } else {
            thread_info->crash_dump_started = true;
          }
        }
        close(pipefds[0]);
    
        // Don't leave a zombie child.
        int status;
        if (TEMP_FAILURE_RETRY(waitpid(forkpid, &status, 0)) == -1) {
          async_safe_format_log(ANDROID_LOG_FATAL, "libc", "failed to wait for crash_dump helper: %s",
                                strerror(errno));
        } else if (WIFSTOPPED(status) || WIFSIGNALED(status)) {
          async_safe_format_log(ANDROID_LOG_FATAL, "libc", "crash_dump helper crashed or stopped");
          thread_info->crash_dump_started = false;
        }
      }
    
      syscall(__NR_exit, 0);
      return 0;
    }
    

    clone arguments used by the signal handler:

    clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
          CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
          &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);

    Flags used by pthread_create, for comparison:

    // http://androidxref.com/9.0.0_r3/xref/bionic/libc/bionic/pthread_create.cpp#302
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM |
                CLONE_SETTLS | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

4.2.5. Debugging signals

  • Tracing events

    /d/tracing/events/signal/signal_generate
    /d/tracing/events/signal/signal_deliver

    remote_job_disp-24083 [000] d..2  5497.143322: signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
    ActivityManager-7834  [001] d..2  5497.171845: signal_generate: sig=9 errno=0 code=0 comm=id.printspooler pid=25155 grp=1 res=0
       FileObserver-25196 [000] d..3  5497.176538: signal_generate: sig=17 errno=0 code=262146 comm=main pid=7514 grp=1 res=0
               main-7514  [002] d..2  5497.176804: signal_deliver: sig=17 errno=0 code=262146 sa_handler=7f836cfef8 sa_flags=0
    ActivityManager-7834  [001] d..2  5497.222412: signal_generate: sig=9 errno=0 code=0 comm=rsonalassistant pid=24800 grp=1 res=0
    ActivityManager-7834  [001] d..2  5497.227639: signal_generate: sig=9 errno=0 code=0 comm=rsonalassistant pid=24800 grp=1 res=2
      Profile Saver-24878 [000] d..3  5497.229721: signal_generate: sig=17 errno=0 code=262146 comm=main pid=717 grp=1 res=0
               main-717   [001] d..2  5497.230300: signal_deliver: sig=17 errno=0 code=262146 sa_handler=f31702e1 sa_flags=4000000
    remote_job_disp-24083 [000] d..3  5497.285461: signal_generate: sig=17 errno=0 code=262146 comm=main pid=717 grp=1 res=0
               main-717   [001] d..2  5497.285844: signal_deliver: sig=17 errno=0 code=262146 sa_handler=f31702e1 sa_flags=4000000
            SysUiBg-8259  [000] d.h6  5497.365086: signal_generate: sig=32 errno=0 code=131070 comm=POSIX timer 344 pid=15551 grp=0 res=0
          Thread-24-25413 [001] d.h3  5497.751070: signal_generate: sig=32 errno=0 code=131070 comm=POSIX timer 0 pid=8158 grp=0 res=0
          Thread-24-25413 [001] d.h3  5497.868609: signal_generate: sig=32 errno=0 code=131070 comm=POSIX timer 344 pid=15551 grp=0 res=0
      Binder:7518_3-8391  [002] d.h2  5497.958303: signal_generate: sig=14 errno=0 code=128 comm=sensors.qcom pid=614 grp=1 res=0
              perfd-2666  [007] .n.1  5498.123490: tracing_mark_write: B|459|perf_lock_acq: send output handle 10233 to client(pid 7767, tid=8320)
    
  • Inspecting a process's signal masks and dispositions

    cat /proc/xxx/status

    mido:/ # cat /proc/7767/status
    Name:   system_server
    State:  S (sleeping)
    Tgid:   7767
    Pid:    7767
    PPid:   7514
    TracerPid:  0
    Uid:    1000    1000    1000    1000
    Gid:    1000    1000    1000    1000
    Ngid:   0
    FDSize: 1024
    Groups: 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1018 1021 1032 3001 3002 3003 3006 3007 3009 3010 9801
    VmPeak:  2826432 kB
    VmSize:  2702540 kB
    VmLck:    144456 kB
    VmPin:         0 kB
    VmHWM:    383340 kB
    VmRSS:    325492 kB
    VmData:   438872 kB
    VmStk:      8196 kB
    VmExe:        16 kB
    VmLib:    140536 kB
    VmPTE:      1824 kB
    VmSwap:    24688 kB
    Threads:    211
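    SigQ:   6/10397                         // queued signals / limit
    SigPnd: 0000000000000000              // pending signals private to this thread
    ShdPnd: 0000000000000000              // pending signals shared by the thread group
    SigBlk: 0000000000001204              // blocked signals; here 3) SIGQUIT, 10) SIGUSR1 and 13) SIGPIPE are blocked so that user space can wait for them synchronously (e.g. with sigwait)
    SigIgn: 0000000000000001              // ignored signals
    SigCgt: 20000002000084f8              // signals caught via sigaction; fault signals such as SIGABRT, SIGBUS and SIGSEGV are caught here in order to write a tombstone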
    CapInh: 0000000000000000
    CapPrm: 0000001007897c20
    CapEff: 0000001007897c20
    CapBnd: 0000000000000000
    Seccomp:    0
    Cpus_allowed:   d7
    Cpus_allowed_list:  0-2,4,6-7
    Mems_allowed:   1
    Mems_allowed_list:  0
    voluntary_ctxt_switches:    69902
    nonvoluntary_ctxt_switches: 3480
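    The hexadecimal masks in the status output can be decoded mechanically. Here is a small sketch; `decode_sigmask` is a hypothetical helper, and it assumes the mask fits in 64 bits as in the output above. Bit n of the mask stands for signal n + 1.

```c
#include <stdlib.h>

/*
 * Decode a hexadecimal mask as printed in /proc/<pid>/status (SigBlk,
 * SigCgt, ...) into signal numbers. Writes at most max numbers into out
 * and returns how many signals are set in the mask.
 */
int decode_sigmask(const char *hex, int *out, int max)
{
    unsigned long long mask = strtoull(hex, NULL, 16);
    int count = 0;

    for (int bit = 0; bit < 64; bit++) {
        if (mask & (1ULL << bit)) {
            if (count < max)
                out[count] = bit + 1;   /* bit 0 is signal 1 */
            count++;
        }
    }
    return count;
}
```

    For the SigBlk value 0000000000001204 this yields signals 3, 10 and 13, which matches the annotation on that line.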
    