2011/12/07

TOOLS Europe 2011 – Day 2

Filed under: Uncategorized — Tags: — alexbergel @ 16:30

My name is Alexandre Bergel and this blog entry is about the second day of the TOOLS EUROPE 2011 conference.

The day is opened by Judith Bishop, the program co-chair. Before introducing the day’s keynote speaker, Frank Tip, Judith briefly introduced the new Microsoft Academic Search facility; it allows you to search for papers, co-author, impact factor (http://academic.research.microsoft.com). I was surprised at how intuitive it is. Beside the classical g & h indexes and graph of coauthors, it has some exotic features, for example, who cites you the most. It also produces an interactive graph of citations, which is at the very least entertaining.

The second keynote of TOOLS is given by Frank Tip, about security on web applications. I met Frank for the first time in 2003, at the Joint-Modular Language Conference (JMLC) conference on modularity in programming languages. Then, he gave a great talk about security; I am confident his new talk will be excellent.

Making web applications secure is pretty difficult. One reason is that everything happens with low-level string manipulations, and also because of the pervasive use of weakly typed languages. I first was a bit surprised by this statement since the core of the Pharo, Smalltalk and Ruby communities are web developers. You won’t make many friends in these communities if you say that dynamic typing is a bad thing.

As Frank goes on, I understand his argument better: web applications often use some dynamic features that are evil when you need to deal with security. A Web application that modifies itself while being executed usually leads to a complex mixture of different languages (PHP and HTML, at the minimum). Making all web browsers happy is challenging due to their subtle incompatibilities. Last, but not least, web applications are difficult to test because of the event-driven execution model. My personal experience shows that it is easy to write unit tests for this model, but simulating mouse clicks in the browser is not easy to automatize. I am not expert in web development, but I feel that expressing control flows in web application is one source of problems. Some say the Seaside framework addresses it in a sensible way, but I’m not sufficiently familiar with it.

Another typical problem in web applications is execution SQL injection and cross-site referencing. Wikipedia tells me that a cross-site injection is an attack that works by including a link or a script in a pages that accesses a site to which the user is known to have been authenticated. Imagine a link <img src=”bank.example.com/withdraw?account=bob&amp;amount=10000000&amp;for=hacker” alt=”” /> available on a page once Bob has logged in.

Frank’s talk continues, and he addresses general quality problems in Web applications; these are usually tackled by, for example, test generation. The idea of this is to identify inputs that expose failures in PHP applications.

“Apollo” is a tool developed at IBM Research for this, and Frank introduces it. Apollo identifies inputs that expose failures using dynamic symbol execution. The idea of dynamic symbolic executions is described by the following 3 steps: (i) give an empty input to your program then execute your program; (ii) you can deduce path constraints on input parameters based on the control flow predicates (i.e., the condition in if-statements); (iii) from this new path constraint, you can create a new input, you then jump to (ii). Frank empirically demonstrated this is a viable technique: many failures were found in numerous applications.

Another approach to addressing quality problems in web applications is fault localization, which is all about identifying where faults are located in the source code. Using ideas from Apollo, it is possible to generate an input that leads to a crash; but where the error is localized in the source code? This is a highly active area of research. The idea is to use information from failing and passing executions to predict location of faults. For each statement, its suspiciousness rating is computed using some very complicated (yet impressive) formulas.

Frank then moved on to directed test generation: the generation of test suite that enables better fault localization (I liked how the different contributions in Frank’s talk layered on each other). When a test fails, we usually want to know where the fault in our program is. What if no test suite is available? How can we generate a test suite with good fault characteristics? Tests that fail for the same fault can be exploited to localize the fault. Dynamic symbol execution with its complicated formulas is effective to generate tests by being based on similarity metrics (path constraint similarity, input similarity).

Finally, Frank talked about automatic program repair. Usually it is not difficult to determine how to correct  malformed HTML since documents have a regular structure. Most errors are rather simple (e.g., missing ending tag). The easy cases can be solved by analyzing string literals in PHP source code and check for proper nesting and other simple problems (incorrect attributes). But how do we come up with a fix for the PHP program that generated the html code? The most difficult cases seem to involve solving a system of string constraints. Frank suggests to generate one constraint for each test case.

Frank summarised some of the key lessons he’d learned, in particular that we need better languages for developing web applications, and need to get them adopted. Until such languages are widely adopted, we are faced with a fragile mix of weakly typed languages and their problems (structured output constructed using low-level string manipulation, dynamic features are difficult to analyze statically, interactive user input). The analysis technique Frank presented seem promising to aid with this.

Coq Tutorial by Catherine Dubois

A major novelty of TOOLS Europe 2011 this year is the tutorial of Catherine Dubois on Coq. Coq means “Calculus of Construction” and it is a proof assistant. It allows users to write programs and their specifications. I am not sure what this really means, but apparently “Coq is based on a typed lambda-calculus with dependent types and inductive types (the richest in the Barendregt’s cube).”. People who dedicate their life to code generation will be happy to know that Coq can produce ML programs.
Coq has various applications:

  • Formalization of mathematics (constructive mathematics, four color theorem, geometry),
  • program verification (data structure and algorithms),
  • security (electronic banking protocols, mobile devices),
  • programming languages semantics and tools (ML type inference, POPLmark challenge, formalization of FeatherweightJava, JVM, compiler verification)

The tutorial began well, but after 25 minutes I was more lost than the first day I left my hometown. What I learnt is however useful: Coq allows you to write functional programs, pretty much as you would do in a typed version of Scheme. You can then write some specifications, and then find and apply a number of tactics to drive a proof.

Coq, as you may have guessed, is based on defining functions (anonymous or named one), applying them (partially or not) and composing them. Functions must be total (defined for _all_ the domain, e.g., the head function must be defined for an empty and non empty list). A function also has to always terminate. Functions are first class citizens. No side effects and no exceptions are supported. To give you a flavor of the Coq syntax, here are a few examples:

  • Assuming the type of natural numbers and the + function: Definition add3 (x: nat) := x + 3.
  • Assuming the type of lists and the concatenation ++ function: Definition at_end (l: list nat)(x: nat) := l++(xx::nil).
  • Higher order: Definition twice_nat (f: nat -> nat)(x: nat) := f (f x).

Catherine told me that LaTeX can be produced from a proof. Handy to write paper, isn’t it? For people who want to know more about Coq, “Coq in a hurry” is apparently an excellent tutorial (http://cel.archives-ouvertes.fr/docs/00/45/91/39/PDF/coq-hurry.pdf). I read it and understood everything.

Session 14:30 chaired by Juan de Lara

The afternoon technical session ran in parallel with Catherine’s tutorial, so unfortunately I missed the first talk. The second talk was “Systems evolution and software reuse in OOP and AOP” and is presented by Adam Przybylek. The research question pursed in this work is “Is the software built in AOP more maintainable than when written in OOP?”. To answer this question, five maintenance scenarios are solved twice, with AOP and OOP. The number of lines of code is used as a performance metric. As a result, an OOP solution requires less change on the code than with AOP. Stefan Hannenberg worked on a similar topic. His work sparkled an incredible amount of divergence in opinions. I immediately knew that Przybylek will be under some fire.
Christoph Bockisch opened the dance by asking “how are you sure that the aspects you have programmed are done in the right way?”. Other similar questions were raised. Apparently people from the AOP World felt a bit attacked. Dirk concluded “There is a lot more than AspectJ in AOP”. I wished the questions were not so defensive.

Session 16:00 chaired by Manuel Oriol

The final session of the day consisted of four papers, and was chaired by a previous TOOLS Europe programme chair! The first paper is titled “A Generic Solution for Syntax-driven Model Co-evolution” and is presented by Protic, Van Den Brand and Verhoeff. The problem that is addressed in this work is to ensure the evolution of a meta-model if a model evolves. They introduce a special metamodel for metamodels and a bijective transformation for transforming metamodels into models. It is not immediately clear to me why this is not called “instantiation”.

The second paper is “From UML Profiles to EMF Profiles and Beyond”. A UML profile is a lightweight language extension mechanism of the UML. A profile consists of stereotypes. A stereotypes extend base classes and may introduce tagged values. Profiles are applied to UML models. This paper explored how to use EMF profiles (in Eclipse) to support UML profiling concepts and technologies.

The third paper is “Domain-specific profiling”, presented by Jorge Ressia. I coauthored this work. Profiling is the activity of monitoring and analyzing program execution. The motivation is that traditional profilers reason on the stack frames by regularly sampling the method call stack, which is quite unadapted as soon as you have a non-trivial model. Our solution consists of offering profilers that will compute metrics against objects that compose the domain.
Nathaniel Nystrom asked how the profiler maps to the domain? What about “domain specific debugger” was asked by Mark van den Brand? Oh yes! Profiler and Debugger have many commonalities; the domain on which they reason. The model on which traditional code profilers and debuggers reason on is the stack frame: what a pretty poor abstraction for applications made of objects and components. This topic will probably entertain me for the coming years.

Summary of Days 3 and 4

On Day 3, Oscar advertised the Journal of Object Technology (JOT). Oscar has been the Editor-in-Chef of JOT for a little more than one year. JOT offers a transparent reviewing process, accepted submissions are quickly put online, and it has a presence on various social networks. I personally hope JOT will become an ISI journal. This will encourage South-American researchers to submit to JOT. Unfortunately, ISI publications still remain important for emerging countries. Things are changing, but slowly. Our paper TOOLS paper “Domain-Specific Profiling” has been selected in a special issue for JOT.

The Dyla workshop was held on Day 4. Dyla is the workshop on dynamic languages and applications. I am happy to have co-organized it. It was the fifth edition and a sheer crowd attended. Dyla is focussed on demos instead of technical and academic presentation. We had a number of exciting demos, including type systems for dynamic languages, visualizations of program execution, context-oriented programming and meta-object protocols. Dyla will be happen again in 2012.

I hope these two blog posts provide you with something of a flavour of TOOLS Europe in 2011, and I hope to blog from future events!

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress