July 18, 2011

The Kernel Samepage Merging Process

KSM, simply put is a service daemon which scans the page addresses to find duplicate pages, merges them and therefore reduces the memory density. The code used in this post as example can be found under /mm/ksm.c in the kernel source.

Before continuing, it is important to keep in mind that:

  • KSM uses a red-black tree for the stable and unstable trees - efficiency is $$O(\log\ n)$$ per tree since the height can never be more than $$(2log\ (n+1))$$ with n being the number of nodes.
  • KSM only scans anonymous pages, file backed pages such as HugePages are not scanned and cannot be merged by KSM. This is different to Transparent Huge pages where as in RedHat 6.1, KSM will break up THP into small pages if shareable 4K pages are found and only if the system is running out of memory.
  • Merged pages are read-only as they are CoW protected.
  • Userspace application can register candidate regions for merging through the madwise() system call. We will not tackle the KSM API details in this post.
  • Because of the CoW nature, a merged page write action by an application will raise a page fault, which in return triggers the break_cow() routine, which issue a copy of the merged page to the writing application.
    static void break_cow(struct rmap_item *rmap_item)
             struct mm_struct *mm = rmap_item->mm;
             unsigned long addr = rmap_item->address;
             struct vm_area_struct *vma;
      * It is not an accident that whenever we want to break COW
      * to undo, we also need to drop a reference to the anon_vma.
             if (ksm_test_exit(mm))
                     goto out;
             vma = find_vma(mm, addr);
             if (!vma || vma->vm_start > addr)
                     goto out;
             if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
                     goto out;
             break_ksm(vma, addr);

With this in mind, here is a summarized view on how KSM works per steps

KSM scan pages and elects whether a page could be considered to be merged… these pages are referred as “candidate” pages. To quickly state it, a candidate page which does not exists in the stable tree is added as a node to the unstable tree, but we will get to this later in the post. To determine if a page has changed or not, KSM relies on a 32bit checksum,¬†which is then added to the page content and evaluated on the next scan.

    static u32 calc_checksum(struct page *page)
             u32 checksum;
             void *addr = kmap_atomic(page, KM_USER0);
             checksum = jhash2(addr, PAGE_SIZE / 4, 17);
             kunmap_atomic(addr, KM_USER0);
             return checksum;

In other words, KSM finds page X, creates a checksum, stores it in page X - on the next scan, if the checksum of page X did not change, then it is considered as a candidate page.

For each candidate page, KSM starts a memcmp_pages() operation to the stable tree which contains the merged pages.

    static int memcmp_pages(struct page *page1, struct page *page2)
             char *addr1, *addr2;
             int ret;
             addr1 = kmap_atomic(page1, KM_USER0);
             addr2 = kmap_atomic(page2, KM_USER1);
             ret = memcmp(addr1, addr2, PAGE_SIZE);
             kunmap_atomic(addr2, KM_USER1);
             kunmap_atomic(addr1, KM_USER0);
             return ret;

This unique process is as follow:

    ret = memcmp_pages(page, tree_page);
    if (ret < 0) {
    node = node->rb_left;
    } else if (ret > 0) {
    node = node->rb_right;
    } else
    return tree_page;

Understanding the following requires an understanding of how a binary tree works in general, more specifically how a red-black tree works.

The stable tree is walked left if the candidate page is less than the page in the stable tree, right if the candidate page is superior to the stable page and the page is simply merge and the candidate page freed if both pages are identical.

The stable tree search function is referenced at http://lxr.free-electrons.com/source/mm/ksm.c#L985

Now if the candidate page was not found in the stable tree, its checksum is re-computed to determine whether the data has changed since or not. If it has changed it is then ignored; if not, the searching process continues in the unstable tree as with the search in the stable tree. The recursion __unstable_tree_search_insert() __can be seen at http://lxr.free-electrons.com/source/mm/ksm.c#L1078.

While searching the unstable tree, KSM will create a new node in this binary tree if the candidate page is unique

    rmap_item->address |= UNSTABLE_FLAG;
    rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
    rb_link_node(&rmap_item->node, parent, new);
    rb_insert_color(&rmap_item->node, &root_unstable_tree);

and if not unique such as the unstable tree contains similar candidate pages, it will be merged to the existing similar node and moved to the stable tree.

Once the KSM scan is done, the unstable tree is destroyed and recreated on the next iteration

I hope that was informative.