KLEE其实已经久仰大名了,在读The PhD Grind的时候就看的胆战心惊.. 这学期的CS510 Software Engineering课上终于可以一展真容了!


KLEE

KLEE的paper在这里OSDI’08, 这是最初始一篇

KLEE的官网在这里

KLEE的作者之一在哪里说到过(忘了原链接在哪儿了.. 相信我,有的)

At its core, KLEE is just an interpreter for LLVM bitcode.

我现在的理解是,KLEE是一个symbolic execution的工具,在source code编译而成的LLVM bytecode文件基础上,symbolic地跑一遍,跑各种path从而生成各种test case, 其间还可以顺便检查一些常见的错误出来。

Gcov & KLEE之间不得不说的关系

KLEE的效果评估就需要用到Gcov了。首先需要介绍一个Code Coverage的概念,wiki上说,Code Coverage指的是衡量test suite测试了整个program代码的标准,e.g. 通过多少个sub-routine被调用过,多少行被调用过这样子

Gcov就是这么一个在gcc toolchain下的已经成熟的code coverage tool. wiki说明如下:

Gcov is a source code coverage analysis and statement-by-statement profiling tool. Gcov generates exact counts of the number of times each statement in a program is executed and annotates source code to add instrumentation.

据说,gcov生成的log文件,记录了source file内每一行执行了几次,而gcc toolchain下的另一个工具gprof给出的是执行时间信息,两个可以结合在一起用来profiling & tuning.

Gcov可以用来评估KLEE的coverage. 具体过程如下:

  1. llvm-gcc/clang, 将source code编译成LLVM bytecode
  2. KLEE在LLVM bytecode上生成一堆test cases
  3. gcc, 将source code加上 - -coverage参数生成二进制文件(从而才能用于gcov)
  4. klee-replay命令可以将上述GCC生成的binary file在所有生成的test cases上跑一遍。其间会生成.gcda文件记录coverage信息
  5. gcov通过这些.gcda文件计算类似下边这些数据的信息

    Lines executed:77.78% of 45
    Branches executed:75.00% of 16
    Taken at least once:75.00% of 16
    Calls executed:64.71% of 17

这样,就算是评估了KLEE自动生成的test cases的coverage.

KLEE Paper Notes

以下是上边那篇OSDI’08 paper的笔记,没有特别细致地看,若有错误在所难免,欢迎指正 :)

Reviews from citations

有很多引用这篇paper的papers, 但是它们的着眼点基本在于KLEE中用到的某一个技术,比如说怎么search执行下一步

KLEE [17] and EXE [19] can implement a DFS search with a configurable maximum depth for cyclic paths to prevent infinite loops.

// 注:这个EXE好像是KLEE同一批人做的

再比如说没有处理longjmp什么的:

A significant amount of previous work in forward symbolic execution does not directly address the symbolic jump problem [9] [17] ….

这些我就不摘录了,值得摘录的有下边这段,作者之一在他的另外一篇paper中的介绍,主要讲的是相对于EXE的改进:

KLEE [11] is a redesign of EXE, built on top of the LLVM [39] compiler infrastructure. Like EXE, it performs mixed concrete/symbolic execution, models memory with bit-level accuracy, employs a variety of constraint solving optimizations, and uses search heuristics to get high code coverage.

One of the key improvements of KLEE over EXE is its ability to store a much larger number of concurrent states, by exploiting sharing among states at the object-, rather than at the page-level as in EXE. Another important improvement is its ability to handle interactions with the outside environment — e.g., with data read from the file system or over the network — by providing models designed to explore all possible legal interactions with the outside world.

My Notes

关于概括KLEE, 它自己这篇paper就说的不错:

We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs.

然后说也可以用作一个bug-finding tool! 话说我看了The PhD Grind之后一直以为它只是一个自动找bug的工具..

文中列举了symbolic execution的主要两种问题:

  1. the exponential number of paths through code and
  2. the challenges in handling code that interacts with its surrounding environment, such as the operating system, the network, or the user.

从文中来看似乎KLEE主要有改进的是第二点,它把这种environment的影响也变成input那样的symbolic的东西,然后进一步执行,即为 - -sym-args & - -sym-files参数

文中讲到usage的时候也说了大致要传怎样的参数进去,像max-time, argc/argv这样的都是需要/可以指定的。此外,paper里讲的- -sym-args用法和现在官网上写的用法不一样呃….

  • [paper版本]:- -sym-args 1 2 3 指的是3个参数,第一个大小是1,第二个大小是2,第三个大小是3
  • [官网版本]: - -sym-args 1 2 3 指的是1到2个参数,每一个的大小是3


KLEE在每次遇到branch predicate的时候,都会调用”constraint solver STP”, 这个就是一个之前上课刚说过的bit-vector logic decision procedure!!

KLEE makes no approximations: its constraints have perfect accuracy down to the level of a single bit. If KLEE reaches an assert and its constraint solver states the false branch of the assert cannot execute given the current path constraints, then it has proved that no value exists on the current path that could violate the assertion, modulo bugs in KLEE or non-determinism in the code.

这个STP的调用开销是大头,所以他们竭尽全力地去优化那些bit-vector logical formula.

Almost always, the cost of constraint solving dominates everything else — unsurprising, given that KLEE generates complicated queries for an NP-complete logic.

至于symbolic execution的描述,也是有存一份state, 从reg到memory好像都有,然后到岔路口的时候fork一下变成几份state分别执行下去。所以会占用很多空间,于是他们有用到copy-on-write.

关于自动找bug, 他们有那种dangerous operation的区分(比如*p, assertion), 然后在symbolic执行到那里的时候会特别注意check有没有什么可能造成bug的可能性

另外他们在state scheduling里提到,他们其中的一种scheduling策略是random path selection, 就是俩branch选谁先走概率一样,这样能够防止fork bombing! 这个我还真没想到诶!

至于他们的metric - coverage, 是用line coverage来计算的,说这个比较没有争议,但是没法展现KLEE’s thoroughness (all possible paths). 如果用一个path-based metric应该会有更漂亮的数据的说