How the OOM Killer Works
Date: 2015-08-19
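Recently, the Redis process on one of our production machines kept getting killed automatically by the system. syslog (/var/log/syslog) recorded the following at the time: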
    Aug 19 11:16:31 jsldg kernel: [4836881.950256] Out of memory: Kill process 11063 (redis-cli) score 538 or sacrifice child
    Aug 19 11:16:31 jsldg kernel: [4836881.950338] Killed process 11063 (redis-cli) total-vm:4649480kB, anon-rss:3098176kB, file-rss:4kB
This log shows that a Redis process (redis-cli, PID 11063) was killed by the OOM (Out of Memory) Killer.
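As a rough illustration of how to read the first log line, the sketch below extracts the PID, command name, and score from an "Out of memory: Kill process ..." message with sscanf. The names struct oom_event and parse_oom_line are made up for this example, and the format string assumes exactly the message layout shown in the log above; other kernel versions may word this line differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of interest; not a kernel type. */
struct oom_event {
    int pid;
    char comm[64];
    int score;
};

/*
 * Parse a line of the form
 *   "... Out of memory: Kill process <pid> (<comm>) score <score> ..."
 * Returns 1 on success, 0 if the line does not match this layout.
 */
int parse_oom_line(const char *line, struct oom_event *ev)
{
    assert(line != NULL && ev != NULL);

    /* Skip the syslog prefix (timestamp, host, kernel uptime). */
    const char *start = strstr(line, "Out of memory: Kill process ");
    if (start == NULL)
        return 0;

    /* %63[^)] reads the command name up to the closing parenthesis. */
    int matched = sscanf(start,
                         "Out of memory: Kill process %d (%63[^)]) score %d",
                         &ev->pid, ev->comm, &ev->score);
    return matched == 3;
}
```

Passing the first log line to parse_oom_line fills in pid 11063, comm "redis-cli", and score 538. The second line ("Killed process ...") does not match this layout, so the function returns 0 for it.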
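When the system runs short of memory, Linux calls the out_of_memory function, which selects a process that is using too much memory and kills it. To make that choice, out_of_memory calls select_bad_process, which picks the process with the highest score.
The code for the OOM Killer lives in mm/oom_kill.c: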
    p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
    /* Found nothing?!?! Either we hang forever, or we panic. */
    if (!p) {
        dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
        panic("Out of memory and no killable processes...\n");
    }
    if (p != (void *)-1UL) {
        oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
                         nodemask, "Out of memory");
        killed = 1;
    }
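Once the process with the highest score has been found, the function calls oom_kill_process to terminate it.
select_bad_process is responsible for choosing the process to kill: it iterates over every process and its threads, then picks the one with the highest score. It is implemented as follows: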
    static struct task_struct *select_bad_process(unsigned int *ppoints,
            unsigned long totalpages, const nodemask_t *nodemask,
            bool force_kill)
    {
        struct task_struct *g, *p;
        struct task_struct *chosen = NULL;
        unsigned long chosen_points = 0;

        rcu_read_lock();
        for_each_process_thread(g, p) {
            unsigned int points;

            switch (oom_scan_process_thread(p, totalpages, nodemask,
                                            force_kill)) {
            case OOM_SCAN_SELECT:
                chosen = p;
                chosen_points = ULONG_MAX;
                /* fall through */
            case OOM_SCAN_CONTINUE:
                continue;
            case OOM_SCAN_ABORT:
                rcu_read_unlock();
                return (struct task_struct *)(-1UL);
            case OOM_SCAN_OK:
                break;
            };
            points = oom_badness(p, NULL, nodemask, totalpages);
            if (!points || points < chosen_points)
                continue;
            /* Prefer thread group leaders for display purposes */
            if (points == chosen_points && thread_group_leader(chosen))
                continue;

            chosen = p;
            chosen_points = points;
        }
        if (chosen)
            get_task_struct(chosen);
        rcu_read_unlock();

        *ppoints = chosen_points * 1000 / totalpages;
        return chosen;
    }
It calls oom_badness to score each process, and the resulting points value decides which process gets killed.
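The selection loop can be approximated in user space to make the logic easier to follow. The following is a minimal sketch, not kernel code: struct candidate and pick_victim are hypothetical names, the points field stands in for whatever oom_badness would have returned, and ties are resolved by keeping the earlier candidate, which simplifies the kernel's thread-group-leader rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task and its precomputed badness points. */
struct candidate {
    int pid;
    unsigned long points;   /* 0 means "never pick this one" */
};

/*
 * Return the index of the candidate with the most points, or -1 if no
 * candidate is eligible. On success, *score receives
 * points * 1000 / totalpages, the same normalization the excerpt
 * applies to *ppoints.
 */
int pick_victim(const struct candidate *c, size_t n,
                unsigned long totalpages, unsigned long *score)
{
    assert(totalpages > 0);

    int chosen = -1;
    unsigned long chosen_points = 0;

    for (size_t i = 0; i < n; i++) {
        /* Skip unkillable entries and anything below the current best. */
        if (c[i].points == 0 || c[i].points < chosen_points)
            continue;
        /* Simplified tie rule (assumption): keep the earlier candidate. */
        if (chosen >= 0 && c[i].points == chosen_points)
            continue;
        chosen = (int)i;
        chosen_points = c[i].points;
    }

    if (chosen >= 0 && score != NULL)
        *score = chosen_points * 1000 / totalpages;
    return chosen;
}
```

With made-up inputs, candidates holding 2000, 0, 5380, and 5380 points and a totalpages value of 10000 yield index 2 and a normalized score of 538. These numbers are chosen only to illustrate the arithmetic; they are not derived from the machine in the log.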