Here is my note for presenting this new paper Learning Syntactic Program Transformations from Examples (08/31/2016 version) in our weekly paper reading group on 09/09/2016. See our reading group webpage in 16fall.

## Why I picked this paper

This paper was pushed to my email via Google Scholar Alert in 09/05. I found it to be a Sumit Gulwani’s new paper, and it seems pretty relevant to my yet another long term research topic – program transformation / optimization. So I picked it for our reading group this week. Although after some detailed reading, I didn’t find it completely well-explained in itself at the moment.. But after all it is just my understanding, please correct me if I am wrong. :)

## Demonstration Examples

So, in this paper their goal is to synthesize program transformations. By program transformation they are referring to those such as:

• TA in MOOC (Massive Open Online Courses, e.g. Coursera) can automatically fix incorrect submissions of online students. Otherwise it will be a great burden for a TA to manually handle that many students’ submissions.

• Refactoring in IDE, such as renaming, extracting code into a function, etc. More generally, this is “repetitive codebase editing”.

The Fig. 1 and Fig. 2 in paper are some examples of such transformations. Let’s just look at Fig. 2 because it is also used in subsequent demonstrations:

The snippet above basically replaces the type/kind check from “compare using the return value of CSharpKind() method call” to “delegate to a field method IsKind()“. This kind of changes is fairly common when refactoring.

Their synthesis result for these code edits will specify the transformation as follows. This is how they specify a transformation. Their DSL is operating on Abstract Syntax Tree (AST).

1. location expression: (where the transformation takes place)

It basically instructs to find some == expression with the left hand side being some expression calling its CSharpKind field method, and its right hand side being any expression.

2. operation: (what the transformation transforms)

It basically replaces the == expression (sub-tree) with a new l1.IsKind(l2) node. Here it is again matching the same x as just before, but could be matching with different patterns, apparently. l1 is referring to the te – any caller object, which is Absolute("1") child in π1. l2 is referring to the te – any right hand side expression, which is Absolute("2") child in π2.

I believe this is the interpretation of the path grammar in their DSL, and they claim that:

This allows for a rich variety of possible pattern-matching expressions, constraining the ancestors or the descendants of the desired locations in the input AST.

Other than this, I think their DSL (defined in Fig. 4) is pretty straightforward.

• Abstract token represents a kind of AST nodes, while Concrete token represents those with some specific value.

• Their allowed operations include Insert / Delete / Update / InsertBefore.

## Technical Details

So, how does their synthesis algorithm actually work? Well.. they implement it based on their (two authors’) previous OOPSLA’15 work – FlashMeta framework. This is why they say:

… leverages state-of-the-art programming-by-example methodology …

To me, it’s like one instantiation of that framework with respect to program transformation. From my limited knowledge, FlashMeta framework seemed mainly applicable to string / data manipulation.

So to use this FlashMeta framework, they will need to specify:

• the witness function
• ranking heuristics

Witness function is some concept used in FlashMeta framework. It deduces a smaller necessary specification from a larger given specification. In this way the framework can sort of divide and conquer the whole big synthesis problem. This whole procedure is also called backpropagation.

Therefore in my comprehension, it actually embeds the domain specific knowledge from user! This domain specific knowledge is, of course, also embedded in the user-defined DSL. Grammars of DSL are already well-structured (inductively defined), except the top level grammar – transformation. There is one more gap from input/output examples to transformation rules, that’s where they relatively explained more:

The key part of this process is the witness function for the top-level Transformation operator.

This may actually be viewed as pre-processing as well, IMO.

With above in mind, here is what they mentions to do:

1. Given a input/output example, which may contain a whole program with just a few lines of code edits. They first calculate the tree edit distance between input and output programs, to separate out individual code edits.

I searched about this unfamiliar tree edit distance, and found one introduction website. It returns a sequence of node edit operations to minimize the corresponding assigned cost. Therefore, the concrete transformation sequence is now in your hand..

2. But at present the generated code edits are still scattered, they conjecture that components with similar edit distances consitute examples of the same rewrite rule.

3. Here comes the part I am still a bit confused:

Then, in lines 7-11, for each similar component, we extract the topmost operation to create an example for the corresponding rewrite rule. This example contains the subtree where the operation was applied in the input AST and the resulting subtree in the output AST.

I don’t quite follow their pseudo-code, either.. They wrote:

Line 8: … create a single concrete operation based on ops

In my guess, they may only need to specify the operation, and can leave the rest unspecified at the moment. Underlying FlashMeta framework will automatically synthesize the details in each operator expression. That makes sense to me.

4. After the steps above, a lot of candidate transformations will be generated. They instruct to select the transformation by a ranking function heuristics that:

1. favors those using existing nodes, instead of constructing new nodes..
2. favors those considering the surrounding context..
3. favors those with shorter patterns..

Honestly, I was expecting more on the ranking functions.. Using, for example, some probabilistic foundations in the ranking / recommendation would be more interesting to me.. They also depicted the ranking function as

… rank them with respect to their trade-off between over-generalization and over-specialization.

But I didn’t find further discussion on this trade-off.. 囧

## End

I’ll skip the evaluation section, since we (well, actually I) are mainly interested in technique details.

To end this note, this is an pretty intersting paper to me. I am aware of several different formalizations of compiler transformation / optimizations. It’s interesting to see that the based-on-AST one can lead to synthesis using FlashMeta framework. I am wondering if similar approaches could be applied to those in a relaxed memory setting.

Again, please correct me if I am wrong. :)