alexbergel — The JOT Blog

2011/12/07

TOOLS Europe 2011 – Day 2

Filed under: Uncategorized — Tags: tools — alexbergel @ 16:30

My name is Alexandre Bergel and this blog entry is about the second day of the TOOLS EUROPE 2011 conference.

The day is opened by Judith Bishop, the program co-chair. Before introducing the day’s keynote speaker, Frank Tip, Judith briefly introduced the new Microsoft Academic Search facility; it allows you to search for papers, co-author, impact factor (http://academic.research.microsoft.com). I was surprised at how intuitive it is. Beside the classical g & h indexes and graph of coauthors, it has some exotic features, for example, who cites you the most. It also produces an interactive graph of citations, which is at the very least entertaining.

The second keynote of TOOLS is given by Frank Tip, about security on web applications. I met Frank for the first time in 2003, at the Joint-Modular Language Conference (JMLC) conference on modularity in programming languages. Then, he gave a great talk about security; I am confident his new talk will be excellent.

Making web applications secure is pretty difficult. One reason is that everything happens with low-level string manipulations, and also because of the pervasive use of weakly typed languages. I first was a bit surprised by this statement since the core of the Pharo, Smalltalk and Ruby communities are web developers. You won’t make many friends in these communities if you say that dynamic typing is a bad thing.

As Frank goes on, I understand his argument better: web applications often use some dynamic features that are evil when you need to deal with security. A Web application that modifies itself while being executed usually leads to a complex mixture of different languages (PHP and HTML, at the minimum). Making all web browsers happy is challenging due to their subtle incompatibilities. Last, but not least, web applications are difficult to test because of the event-driven execution model. My personal experience shows that it is easy to write unit tests for this model, but simulating mouse clicks in the browser is not easy to automatize. I am not expert in web development, but I feel that expressing control flows in web application is one source of problems. Some say the Seaside framework addresses it in a sensible way, but I’m not sufficiently familiar with it.

Another typical problem in web applications is execution SQL injection and cross-site referencing. Wikipedia tells me that a cross-site injection is an attack that works by including a link or a script in a pages that accesses a site to which the user is known to have been authenticated. Imagine a link <img src=”bank.example.com/withdraw?account=bob&amount=10000000&for=hacker” alt=”” /> available on a page once Bob has logged in.

Frank’s talk continues, and he addresses general quality problems in Web applications; these are usually tackled by, for example, test generation. The idea of this is to identify inputs that expose failures in PHP applications.

“Apollo” is a tool developed at IBM Research for this, and Frank introduces it. Apollo identifies inputs that expose failures using dynamic symbol execution. The idea of dynamic symbolic executions is described by the following 3 steps: (i) give an empty input to your program then execute your program; (ii) you can deduce path constraints on input parameters based on the control flow predicates (i.e., the condition in if-statements); (iii) from this new path constraint, you can create a new input, you then jump to (ii). Frank empirically demonstrated this is a viable technique: many failures were found in numerous applications.

Another approach to addressing quality problems in web applications is fault localization, which is all about identifying where faults are located in the source code. Using ideas from Apollo, it is possible to generate an input that leads to a crash; but where the error is localized in the source code? This is a highly active area of research. The idea is to use information from failing and passing executions to predict location of faults. For each statement, its suspiciousness rating is computed using some very complicated (yet impressive) formulas.

Frank then moved on to directed test generation: the generation of test suite that enables better fault localization (I liked how the different contributions in Frank’s talk layered on each other). When a test fails, we usually want to know where the fault in our program is. What if no test suite is available? How can we generate a test suite with good fault characteristics? Tests that fail for the same fault can be exploited to localize the fault. Dynamic symbol execution with its complicated formulas is effective to generate tests by being based on similarity metrics (path constraint similarity, input similarity).

Finally, Frank talked about automatic program repair. Usually it is not difficult to determine how to correct malformed HTML since documents have a regular structure. Most errors are rather simple (e.g., missing ending tag). The easy cases can be solved by analyzing string literals in PHP source code and check for proper nesting and other simple problems (incorrect attributes). But how do we come up with a fix for the PHP program that generated the html code? The most difficult cases seem to involve solving a system of string constraints. Frank suggests to generate one constraint for each test case.

Frank summarised some of the key lessons he’d learned, in particular that we need better languages for developing web applications, and need to get them adopted. Until such languages are widely adopted, we are faced with a fragile mix of weakly typed languages and their problems (structured output constructed using low-level string manipulation, dynamic features are difficult to analyze statically, interactive user input). The analysis technique Frank presented seem promising to aid with this.

Coq Tutorial by Catherine Dubois

A major novelty of TOOLS Europe 2011 this year is the tutorial of Catherine Dubois on Coq. Coq means “Calculus of Construction” and it is a proof assistant. It allows users to write programs and their specifications. I am not sure what this really means, but apparently “Coq is based on a typed lambda-calculus with dependent types and inductive types (the richest in the Barendregt’s cube).”. People who dedicate their life to code generation will be happy to know that Coq can produce ML programs.
Coq has various applications:

Formalization of mathematics (constructive mathematics, four color theorem, geometry),
program verification (data structure and algorithms),
security (electronic banking protocols, mobile devices),
programming languages semantics and tools (ML type inference, POPLmark challenge, formalization of FeatherweightJava, JVM, compiler verification)

The tutorial began well, but after 25 minutes I was more lost than the first day I left my hometown. What I learnt is however useful: Coq allows you to write functional programs, pretty much as you would do in a typed version of Scheme. You can then write some specifications, and then find and apply a number of tactics to drive a proof.

Coq, as you may have guessed, is based on defining functions (anonymous or named one), applying them (partially or not) and composing them. Functions must be total (defined for _all_ the domain, e.g., the head function must be defined for an empty and non empty list). A function also has to always terminate. Functions are first class citizens. No side effects and no exceptions are supported. To give you a flavor of the Coq syntax, here are a few examples:

Assuming the type of natural numbers and the + function: Definition add3 (x: nat) := x + 3.
Assuming the type of lists and the concatenation ++ function: Definition at_end (l: list nat)(x: nat) := l++(xx::nil).
Higher order: Definition twice_nat (f: nat -> nat)(x: nat) := f (f x).

Catherine told me that LaTeX can be produced from a proof. Handy to write paper, isn’t it? For people who want to know more about Coq, “Coq in a hurry” is apparently an excellent tutorial (http://cel.archives-ouvertes.fr/docs/00/45/91/39/PDF/coq-hurry.pdf). I read it and understood everything.

Session 14:30 chaired by Juan de Lara

The afternoon technical session ran in parallel with Catherine’s tutorial, so unfortunately I missed the first talk. The second talk was “Systems evolution and software reuse in OOP and AOP” and is presented by Adam Przybylek. The research question pursed in this work is “Is the software built in AOP more maintainable than when written in OOP?”. To answer this question, five maintenance scenarios are solved twice, with AOP and OOP. The number of lines of code is used as a performance metric. As a result, an OOP solution requires less change on the code than with AOP. Stefan Hannenberg worked on a similar topic. His work sparkled an incredible amount of divergence in opinions. I immediately knew that Przybylek will be under some fire.
Christoph Bockisch opened the dance by asking “how are you sure that the aspects you have programmed are done in the right way?”. Other similar questions were raised. Apparently people from the AOP World felt a bit attacked. Dirk concluded “There is a lot more than AspectJ in AOP”. I wished the questions were not so defensive.

Session 16:00 chaired by Manuel Oriol

The final session of the day consisted of four papers, and was chaired by a previous TOOLS Europe programme chair! The first paper is titled “A Generic Solution for Syntax-driven Model Co-evolution” and is presented by Protic, Van Den Brand and Verhoeff. The problem that is addressed in this work is to ensure the evolution of a meta-model if a model evolves. They introduce a special metamodel for metamodels and a bijective transformation for transforming metamodels into models. It is not immediately clear to me why this is not called “instantiation”.

The second paper is “From UML Profiles to EMF Profiles and Beyond”. A UML profile is a lightweight language extension mechanism of the UML. A profile consists of stereotypes. A stereotypes extend base classes and may introduce tagged values. Profiles are applied to UML models. This paper explored how to use EMF profiles (in Eclipse) to support UML profiling concepts and technologies.

The third paper is “Domain-specific profiling”, presented by Jorge Ressia. I coauthored this work. Profiling is the activity of monitoring and analyzing program execution. The motivation is that traditional profilers reason on the stack frames by regularly sampling the method call stack, which is quite unadapted as soon as you have a non-trivial model. Our solution consists of offering profilers that will compute metrics against objects that compose the domain.
Nathaniel Nystrom asked how the profiler maps to the domain? What about “domain specific debugger” was asked by Mark van den Brand? Oh yes! Profiler and Debugger have many commonalities; the domain on which they reason. The model on which traditional code profilers and debuggers reason on is the stack frame: what a pretty poor abstraction for applications made of objects and components. This topic will probably entertain me for the coming years.

Summary of Days 3 and 4

On Day 3, Oscar advertised the Journal of Object Technology (JOT). Oscar has been the Editor-in-Chef of JOT for a little more than one year. JOT offers a transparent reviewing process, accepted submissions are quickly put online, and it has a presence on various social networks. I personally hope JOT will become an ISI journal. This will encourage South-American researchers to submit to JOT. Unfortunately, ISI publications still remain important for emerging countries. Things are changing, but slowly. Our paper TOOLS paper “Domain-Specific Profiling” has been selected in a special issue for JOT.

The Dyla workshop was held on Day 4. Dyla is the workshop on dynamic languages and applications. I am happy to have co-organized it. It was the fifth edition and a sheer crowd attended. Dyla is focussed on demos instead of technical and academic presentation. We had a number of exciting demos, including type systems for dynamic languages, visualizations of program execution, context-oriented programming and meta-object protocols. Dyla will be happen again in 2012.

I hope these two blog posts provide you with something of a flavour of TOOLS Europe in 2011, and I hope to blog from future events!

Comments Off

2011/08/04

TOOLS Europe 2011 – Day 1

Filed under: Conference Report — Tags: Tools europe 2011 keynote — alexbergel @ 17:55

My name is Alexandre Bergel and I’m an assistant professor at the University of Chile; for JOT, I’ll be blogging about TOOLS EUROPE 2011. This is the fourth TOOLS EUROPE I’ve attended. This 49th edition took place in the lovely city of Zurich, in the German-speaking part of Switzerland. This blog entry relates the conference as I lived it.

Opening Session

The conference is opened by Judith Bishop, Antonio Vallecillo (the program co-chairs) and Bertrand Meyer. Observation (and evidence!) suggests that this year’s event has more participants than the previous year. The amphitheater is full, and not many attendees had their laptop in their hand, which is a good sign! Judith and Antonio give us some stats, which you can find elsewhere, but what I found noteworthy was that the acceptance rate was 28%; by comparison, ECOOP’11 has an acceptance rate of 28% and OOPSLA’11 36% (!).

This year’s event contains three novelties: Catherine Dubois will give a tutorial about Coq on Wednesday; a best paper prize will be awarded; and a selection of the papers will appear in a special issue of JOT. I like surprises, especially when they are this good. A tutorial by an expert, at no additional cost, is indeed something one should not miss. The award is also an inexpensive mechanism to advertise the conference: winners are always happy to mention it. The special issue is also a good thing. Unfortunately, JOT is (still) not an ISI journal, which means that is not considered by national science agencies in South America. Oscar, the editor-in-chief, is aware of the importance of being in ISI; I cannot resist to raise this with him each time we meet. Apparently JOT has a nice backlog, meaning that the journal camera ready version of papers is due in December.

Bertrand then tells us something about next year. TOOLS EUROPE 2012 will be the 50th event and will be held in Prague at the end of May. The conference will take place just before ICSE, which will be held in Zurich. This is earlier than usual, but Prague is terrific and I hope the conference as a result attracts lots of great papers.

As much as we like surprises, it’s now time to move on to the even better stuff: the talks!

Keynote – Oscar Nierstrasz

Oscar’s keynote is called “Synchronizing Models and Code”. He considered naming it “Mind the Gap”. However, Oscar is Canadian and lives in Bern, where there is no underground 🙂

Oscar divides his keynote into four parts.

Part 1: “programming is modeling”. This first part is about trying to understand what is modeling in the object-oriented paradigm. Oscar asks what is the “OO Paradigm”? What people traditionally think about OOP is “reuse”, “design”, “encapsulation”, “objects+classes+inheritance”, “programs = objects+messages”, or “everything is an object”. Oscar argues that none of these buzzwords gives an adequate definition, and that the essence of object-orientation lies in the possibility for one to define their own domain specific language. “Design your own paradigm” seems to reflect the truth behind OOP. This reflects a slight change from his previous blog on that topic (Ten Things I Hate About Object-Oriented Programming): Oscar now agrees Object-Orientation is indeed a paradigm. I buy Oscar’s claim that the essence of OOP cannot be easily captured, however I am wondering whether the ability to easily express new paradigms and domain specific abstractions is really what crystalizes OOP. Scheme programmers have a powerful macro mechanism that seems to provide equal capability.

Part 2: The second part of the talk is subtitled “mind the gap”. Here, Oscar is arguing that there is a gap between model and code. This can be easily seen from activity by the reverse engineering community, which spends most of its time making sense of code by providing adequate representations. The modeling community is actively working in the other direction (as can be seen by some of the papers at ICMT’11, which is co-located with TOOLS EUROPE 2011). Oscar presents his argument by clarifying that there are actually many gaps. For example, there is a gap between static and dynamic views: what you see in the source code is classes/inheritance; what you see in the executing software is objects sending messages. I like Oscar’s punch. It’s astonishing to see that code debuggers, code profilers and test coverage tools still base their analysis on classes, methods and stack frames. We profile and debug Java applications the same way as COBOL and C. How can one effectively identify the culprit of a bottleneck or a failure by solely focussing on classes and methods? There is indeed something odd here and Oscar puts his finger on it.

Part 3 of the talk is called “bridging the gaps”. After so many open questions, the natural one to dwell on is: how can be we bridge the gaps between model and code? Oscar’s group has extensive experience in reverse engineering and component-based software engineering. Moose, a platform for software and data analysis (moosetechnology.org) is a solution for bridging these gaps. Oscar spends some time talking about Moose’s features, including an extensible metamodel, metrics and visualisations. Polymetric views are particularly interesting. They’ve had a strong impact on my personal work since most of my research results are visually presented using a polymetric view. To demonstrate some of these ideas, Oscar talks about Mondrian for a while; it’s an agile visualization engine that visualizes any arbitrary data from short scripts written in Smalltalk/Pharo. I am myself the maintainer of Mondrian. I encourage the reader to try it out, people find it mind blowing 🙂

Oscar finishes this part of the talk by summarising a number of language and runtime extensions, including Bifrost (for live feature analysis), Pinocchio (an extensible VM for dynamic languages), Classboxes (which I defined in my PhD), which are the precursor to Changeboxes, which enable forking and merging application versions at runtime.

We now arrive at Part 4: “lessons learned”. What Oscar’s group has learned is that “less is more”, i.e., “keep it simple”. All the experience gained by Oscar’s group is that simple programming languages can often be more successful than complex languages. “Keep it simple” is apparently a favorite lesson learnt of many keynote these days (e.g., Craig Chambers at ECOOP’11, David Ungar at ECOOP’09).

What else has Oscar’s group learned? Visualizations often use simple metrics defined on simple structures. Oscar also argues that currently there are too many different tools to support software engineering that are not connected. I indeed agree with Oscar. I often found opportunities by connecting different mechanisms and tools (e.g., profiling and visualization).

Oscar’s conclusion is that the things we talk about in software engineering are about managing change. We spend too much time recreating links between artifacts. Instead, we should maintain the links between them. To do this, we should manipulate models, not code. How can I disagree? Oscar’s talk is a success and leads to more than 30 minutes of questions. I had the impression that most of them are about software visualization.

Session 1

After coffee (much needed), we moved to the first technical session, which was actually a shared session with ICMT. As such, the presentations were chosen to be of interest to both TOOLS EUROPE and ICMT attendees. This is always a nice part of TOOLS, the cross-pollination of presentations from different co-located events.

Jochen Kuester presents the first paper on “Test Suite Quality for Model Transformation Chains”, focusing on issues to do with testing model transformation chains (and not just single transformations, which are easier) based on specifying models used as input and output. The authors have lots of experience in dealing with workflow models in particular. What is interesting here is how the authors put an emphasis on the quality of the test suite, including element coverage. This seems to be a good move towards a proper testing discipline for model transformations.

The second talk of the session is given by Marco Trudel, and is titled “Automated Translation of Java Source Code to Eiffel”. Their work is driven by (i) the need to use Java libraries within Eiffel and (ii) compile Java applications to native code. Translating Java code into Eiffel has a number of challenges: “continue” and “return” Java instruction, exceptions, Java Native Interface (JNI). Only 2.2 overhead performance (based on running Mauve, the java.io unit tests). There are lots of questions after the talk, many of which seem to be abstractions of the question “why translate Java into Eiffel in the first place?”, even though the presenter addressed this at the start.

The third talk is by Markus Lepper, and it is an ICMT paper but from a group outside of the typical ICMT community (more from the compiler construction community). The paper is “Optimizing of Visitor Performance by Reflection-Based Analysis”. This is a bit outside of my personal research interests, but I understood the following: “Everything is a model. Every computation is a model transformation”.

The fourth and final talk of the session is by Fredrik Seehusen entitled “An evaluation of the Graphical Modeling Framework (GMF) based on the Development of the CORAS tool”. GMF is an Eclipse-based framework for development of graphical language editors. The paper evaluates GMF as a result of building a tool called CORAS, and some interesting empirical observations are made, even though the presenter noted that there were potentially simpler approaches to producing the GMF editor (e.g., using a tool like EuGENia) that would lead to different observations. Nevertheless, this is an interesting presentation showing clear benefits, but I admit this conveys an impression of deja-vu for the Smalltalker that I am!

Session 2

The second session of the day focuses purely on TOOLS EUROPE presentations. The first talk is called “KlaperSuite: an Integrated Model-Driven Environment for Non-Functional Requirements Analysis of Component-Based Systems”. The problem on which the authors are focussing on is how to control the cost of enhancing QoS. Their approach is to use an automatic QoS prediction at design-time. It is based on an automatic model transformation of several input models. This is an interesting presentation of a complicated transformation with real applications.

The second talk is titled “Unifying Subjectivity”, by Langone, Ressia and Nierstrasz. “Subjective” means “based on or include by personal feelings”. Subjectivity is related to ContextL, perspective (from Us), roles (from gBeta). The problem addressed by Langone et al. is that all these related work are based on a common point of view. One of their message is that “context is subjective”.

Session 3

The first day ends with three presentations. First up is “Lifted Java: A minimal Calculus for Translation Polymorphism”. Compared to FeatherweightJava, Lifted Java has base classes, role classes, and a playedBy relation. It also has lifting and lowering to both shrink or widen a class during execution. This is an excellent talk on an excellent paper; it turns out that it has been awarded the best paper prize. I observe that today there are a number of papers about context and sujectivity. This is a clear indication that this is a hot topic, and not only for TOOLS!

There are two more talks in the session, “Location Types for Safe Distributed Object-Oriented Programming” and “Static Dominance Inference”, but I am off discussing ideas with some of the previous presenters. Apologies to the last two speakers; there’s so much good research I can’t keep up!

Comments (1)