We recently hit several OOMs. From the logs, each one was triggered while inactive_file was large (2 million+ pages, roughly 10 GB). Why can't that much page cache be reclaimed?
Reading the source (4.19.195), in shrink_page_list() we find:
	if (PageDirty(page)) {	/* the page is dirty */
		/*
		 * Only kswapd can writeback filesystem pages
		 * to avoid risk of stack overflow. But avoid
		 * injecting inefficient single-page IO into
		 * flusher writeback as much as possible: only
		 * write pages when we've encountered many
		 * dirty pages, and when we've already scanned
		 * the rest of the LRU for clean pages and see
		 * the same dirty pages again (PageReclaim).
		 */
		if (page_is_file_cache(page) &&
		    (!current_is_kswapd() || !PageReclaim(page) ||
		     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
			/*
			 * Immediately reclaim when written back.
			 * Similar in principal to deactivate_page()
			 * except we already have the page isolated
			 * and know it's dirty
			 */
			inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
			SetPageReclaim(page);	/* mark for reclaim on next pass */
			goto activate_locked;
		}
So non-kswapd tasks never write back file pages during reclaim. Even kswapd only does so when the node has accumulated enough dirty pages (PGDAT_DIRTY is set) and the page already carries the PageReclaim flag (set on the first scan pass). Otherwise the page is not reclaimed: it is marked PageReclaim and moved back onto the active LRU list.
In addition, shrink_page_list() calls page_check_references() to check how many PTEs have referenced the page:
static enum page_references page_check_references(struct page *page,
						  struct scan_control *sc)
{
	int referenced_ptes, referenced_page;
	unsigned long vm_flags;

	/* count the PTEs through which this page was referenced */
	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
					  &vm_flags);
	referenced_page = TestClearPageReferenced(page);
	/* ... */
	if (referenced_ptes) {	/* the page was referenced via a PTE */
		/*
		 * All mapped pages start out with page table
		 * references from the instantiating fault, so we need
		 * to look twice if a mapped file page is used more
		 * than once.
		 *
		 * Mark it and spare it for another trip around the
		 * inactive list. Another page table reference will
		 * lead to its activation.
		 *
		 * Note: the mark is set for activated pages as well
		 * so that recently deactivated but used pages are
		 * quickly recovered.
		 */
		SetPageReferenced(page);

		if (referenced_page || referenced_ptes > 1)
			return PAGEREF_ACTIVATE;
As the code shows, if two or more processes share a mapping of this page cache page and have accessed it through those mappings, the condition referenced_ptes > 1 holds, so this kind of page is not reclaimed either: it gets activated instead.
All in all, even for page cache, quite a few conditions have to be met before a page can actually be reclaimed.