We recently hit several OOMs. From the logs, each one was triggered while inactive_file was large (2 million+ pages, roughly 10 GB). Why can't that much page cache be reclaimed?
Reading the source (4.19.195), in shrink_page_list() we find:
	if (PageDirty(page)) {	/* the page is dirty */
		/*
		 * Only kswapd can writeback filesystem pages
		 * to avoid risk of stack overflow. But avoid
		 * injecting inefficient single-page IO into
		 * flusher writeback as much as possible: only
		 * write pages when we've encountered many
		 * dirty pages, and when we've already scanned
		 * the rest of the LRU for clean pages and see
		 * the same dirty pages again (PageReclaim).
		 */
		if (page_is_file_cache(page) &&
		    (!current_is_kswapd() || !PageReclaim(page) ||
		     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
			/*
			 * Immediately reclaim when written back.
			 * Similar in principal to deactivate_page()
			 * except we already have the page isolated
			 * and know it's dirty
			 */
			inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
			SetPageReclaim(page);	/* mark for reclaim on next pass */
			goto activate_locked;
		}
So non-kswapd tasks never write back file pages during reclaim. Even kswapd only does so when the node has accumulated enough dirty pages (PGDAT_DIRTY is set) and the page already carries the PageReclaim flag (set on the first scan pass). Otherwise the page is not reclaimed: it is marked PageReclaim and moved back onto the active LRU list.
In addition, shrink_page_list() calls page_check_references() to check how many PTEs have referenced the page:
static enum page_references page_check_references(struct page *page,
						  struct scan_control *sc)
{
	int referenced_ptes, referenced_page;
	unsigned long vm_flags;

	/* count the PTEs through which this page was referenced */
	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
					  &vm_flags);
	referenced_page = TestClearPageReferenced(page);
	/* ... */
	if (referenced_ptes) {	/* the page was referenced via a PTE */
		/*
		 * All mapped pages start out with page table
		 * references from the instantiating fault, so we need
		 * to look twice if a mapped file page is used more
		 * than once.
		 *
		 * Mark it and spare it for another trip around the
		 * inactive list. Another page table reference will
		 * lead to its activation.
		 *
		 * Note: the mark is set for activated pages as well
		 * so that recently deactivated but used pages are
		 * quickly recovered.
		 */
		SetPageReferenced(page);

		if (referenced_page || referenced_ptes > 1)
			return PAGEREF_ACTIVATE;
As the code shows, if two or more processes share a mapping of this page cache page and have accessed it through those mappings, the condition referenced_ptes > 1 holds, so this kind of page is not reclaimed either: it gets activated instead.
All in all, even for page cache, quite a few conditions have to be met before a page can actually be reclaimed.