From foldl to difference lists

November 9th, 2008

Most declarative languages have lists. Usually this lists are either a nil atom (empty list) or a cons (a couple whose second element is a list). It is easy to see how we can append things to the head of the list.

In my former article we explored lists and foldl. Indeed, it is clear that we need an accumulator because appending at the end of the list is basically not possible (with decent performance). Appending at the end of the list means copying the whole list. Why? Consider how lists are done.

You have the head (say X1) and a pointer to another list. And this other list is built in the very same way. The last element has a pointer to an empty list.

 

A list schema

A list schema

 

If we want to add something at the end (even though we had a magical last :: [E] -> E function which got to the last element of a list in O(1) time) we need to “modify” the last element in the process. Since modifying is not allowed, we need to create another last element whose car (first element) is 3 (following our example) and whose second element (cdr) is a cons(4, []). Of course, at this point we should modify another element in the list. Basically we need to create a copy of the list. That’s O(N).

However, if we bound the cdr of the last element not to an empty list but to an “unbound variable” we could simply add at the end cons(4, Unbound2), binding the Unbound1 to cons(3, Unbound1). Amazing… there are two major drawbacks. 

First, we need logical variables. Which is simply true if we are using Prolog or Oz (sorry, no Erlang this time). In Erlang/Haskell you just can’t do almost anything with unbound variables. In Prolog you can.

The second problem is our magical last function. We can build such a function, though it runs O(N). We can’t do better. That means that appending at the end of a list, even though we left the tail of the list unbound, costs a lot. Well, usually O(N) is good, the point is that this operation is very low level and if we got a lot of O(N) stuff in inner “loops” the algorithm  will suck.

Now, let us get back to the reverse with accumulator.

standard_reverse(Xs, Ys) :- standard_reverse(Xs, [], Ys).
standard_reverse([X|Xs], Acc, Ys) :-
    standard_reverse(Xs, [X| Acc], Ys).
standard_reverse([], Ys, Ys).

Let us change the argument order:

standard_reverse2([X|Xs], Ys, Acc) :-
    standard_reverse2(Xs, Ys, [X| Acc]).
standard_reverse2([], Ys, Ys).

Ok, this change seems unimportant to say the least. Let us do a further modification: instead of using two arguments for the reverted list and for the accumulator, let us use one single argument which includes both. For example Reverted-Acc.

diff_reverse([X|Xs], Ys-Acc) :-
    diff_reverse(Xs, Ys-[X|Acc]).
diff_reverse([], Ys-Ys).

Extremely nice! Here we have difference lists. Difference lists are a subject difficult to master. Their declarative meaning is quite obvious (and I will explain it in the next paragraphs), however their procedural/operational meaning is quite messy.

Basically a difference list is a way to represent an ordinary list. For example the list [1, 2, 3] can be represented by the difference lists [1, 2, 3, a]-[a] or by the difference list [1, 2, 3, a, 4]-[a, 4] or by [1, 2, 3]-[] or in another thousand of different ways, the most interesting of which is [1, 2, 3|X]-X. Notice that the latter unifies which any of the others. The careful reader may have noticed that now we may have a way to get to the last element of the list in order to add stuff in O(1).  But let’s not jump to conclusions.

Any ordinary list L can be represented as L-[], so it’s easy to build a difference list representing an ordinary list. Unfortunately this difference lists is fully specified (if L was).

A classical problem in Prolog is appending two lists. This should be an O(M) operation, where M is the length of the shortest list. In fact in Prolog it is O(N) where N is the length of the first list:

myappend([], Xs, Xs).
myappend([H|T], Xs, [H|Ys]) :-
    myappend(T, Xs, Ys).

Now, let’s benchmark the thing:

?- time(myappend([1, 2, 3], [2, 5], L)).
% 4 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
L = [1, 2, 3, 2, 5].

?- time(myappend([1, 2, 3, a, b, c], [2, 5], L)).
% 7 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
L = [1, 2, 3, a, b, c, 2, 5].

However, we can do that in O(1) using difference lists:

append_dl(Xs-Ys, Ys-Zs, Xs-Zs).
?- time(append_dl([1, 2, 3, a, b, c|Xs]-Xs, [2, 5]-[], Z)).
% 1 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
Xs = [2, 5],
Z = [1, 2, 3, a, b, c, 2, 5]-[].

When applicable, this is very nice. There is not much more on difference lists, still having the ability to add elements at the end of a structure is great. How? Think about it:

push_back(X, L-[X|Xs]).
?- X = [1, 2|Xs]-Xs, time(push_back(3, X)).
% 1 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
X = [1, 2, 3|_G305]-[3|_G305],
Xs = [3|_G305].

Now we know how to put new elements to the front or to the back of a difference lists. That means we can build a deque.

Declarative Programming, Programming, Prolog | No Comments | Trackback

Accumulators and Foldl in declarative languages

November 8th, 2008

Using accumulators in declarative programming is common. Basically it is possible to convert not tail recursive functions into tail recursive ones using an accumulator.

Consider a very simple procedure: sum numbers in a list. The naive implementation is:

naive_sum([]) ->
    0;
naive_sum([H|T]) ->
    H + naive_sum(T).

This one is not tail recursive. Every time the second clause matches, naive_sum is called and the return value of the recursive call is needed in order to compute the result. The standard way to write this one is:

standard_sum(L) ->
    standard_sum(L, 0).
standard_sum([], Acc) ->
    Acc;
standard_sum([H|T], Acc) ->
    standard_sum(T, Acc+H).

Notice that this way the last operation of each recursive calling clause is the recursive clause. This means that the new call can take the place of the old one. Simple. It’s so simple (and the pattern is so common), that it has a name foldl.

sum(L) ->
    lists:foldl(fun(X, Y) -> X + Y end, 0, L).

Basically, it’s one of the primitives of functional languages. For example, in Haskell, it would be:

mysum lst = foldl (+) 0 lst

However, an Haskell programmer would write. This trasformation is known as eta-reduction.

mysum = foldl (+) 0

In Prolog, the naive and the “standard” sum becomes:

naive_sum([], 0).
naive_sum([H|T], Res) :-
    naive_sum(T, Res1),
    Res is Res1 + H.

standard_sum(L, Res) :-
    standard_sum(L, 0, Res).
standard_sum([], Acc, Acc).
standard_sum([H|T], Acc, Res) :-
    Acc1 is H * Acc,
    standard_sum(T, Acc1, Res).

Since Prolog clauses can’t return a value (because they are not functions, of course!), we need an extra argument which unifies with the result. Besides, in the Prolog version it is quite explicit that naive_sum can’t be tail recursive. The difference between the naive and the tail recursive version is that the first one uses linear (O(N)) space and the latter uses constant (O(1)) space. Both are O(n) in terms of running time.

Another typical example on accumulators use is the “revert list” in Prolog. This is the naive version:

naive_revert([], []).
naive_revert([H|T], L1) :-
    naive_revert(T, T1),
    append(T1, [H], L1). 

This is both not tail recursive and has a worse than expected running time: indeed it is O(N^2) while it should be O(N).
The first solution is using the built-in reverse predicate. It is even the right thing to do in production, but it is not useful to learn. :)

Let’s examine the following code:

standard_reverse(Xs, Ys) :- standard_reverse(Xs, [], Ys).
standard_reverse([X|Xs], Acc, Ys) :-
    standard_reverse(Xs, [X| Acc], Ys).
standard_reverse([], Ys, Ys).

Once again, we got tail recursion (using an accumulator). Basically we read the first element in the first list and we push it in the accumulator. When we take another element this one is pushed again (and the former is after this one). Then, when we finish the list, we unify the accumulator with the last value. This version is O(n) in time and space.

The Erlang version is extremely similar (and slightly more readable):

standard_revert(L) ->
    standard_revert(L, []).
standard_revert([X|Xs], Acc) ->
    standard_revert(Xs, [X|Acc]);
standard_revert([], Acc) ->
    Acc.

We can explore this definition further:

rev(_Op, [], Acc) ->
    Acc;
rev(Op, [H|T], Acc) ->
    rev(Op, T, Op(Acc, H)).
strange_revert(L) ->
    rev(fun(A, B) -> [B|A] end, L, []).

Here, instead of using the [X|Xs] constructor explicitly, we moved it in anonymous function. Notice that we also reverted the parameters. That is to say we don’t have a cons function:

Cons = fun(H, T) -> [H|T] end

but a :

Flipped_Cons = fun(T, H) -> [H|T] end

This is not unexpected. Let’s consider the definition of the foldl function itself:

foldl(_F, Acc, []) ->
    Acc;
foldl(F, Acc, [H|T]) ->
    foldl(F, F(Acc, H), T).

They look very similar. Indeed they have almost the very same shape. We just have to flip a couple of parameters (foldl has the accumulator as the second parameter, not as the third one). We must notice that the foldl above is not akin Erlang foldl. In Haskell the type of foldl is:

Hugs> :type foldl
foldl :: (a -> b -> a) -> a -> [b] -> a

The binary operation has the accumulator as the first parameter (here it has type a), which is the same convention we adopted for the handmade foldl written above (from now on our:foldl). Erlang standard foldl follows another convention, the binary function has the accumulator as the second parameter.

    foldl(Fun, Acc0, List) -> Acc1

        Types  Fun = fun(Elem, AccIn) -> AccOut
               Elem = term()
               Acc0 = Acc1 = AccIn = AccOut = term()
               List = [term()]

If we want to use foldl to build the reverse function we must chose which foldl to use. our:foldl needs the Flip (Flip_Cons) operation, while the standard Erlang foldl is fine with a Cons operation.

foldl_revert(L) ->
    lists:foldl(fun(H,T) -> [H|T] end, [], L).

foldl_revert2(L) ->
    our:foldl(fun(T,H) -> [H|T] end, [], L).

And here we are foldling in Prolog:

foldl(Functor, Acc, [H|T], Foldled) :-
    call(Functor, Acc, H, Acc1),
    foldl(Functor, Acc1, T, Foldled).
foldl(_Functor, Acc, [], Acc).
 
cons(T, H, [H|T]) :- !.

foldl_revert(L, Reverted) :-
    foldl(cons, [], L, Reverted). 

Declarative Programming, Erlang, Functional Programming, Haskell, Programming, Prolog | Comments Off

Extol of Declarativeness

November 6th, 2008

Far too many programmers underestimate the beauty of declarative programming. This is something which cames from the industry: functional languages are traditionally under-employed and perceived as academic choices. In fact a lot of research has been done with Lisp and Prolog (mostly in U.S.A. and Europe respectively) in AI and formal languages fields.

However, declarative languages are very suited for the industry and to solve practical problems. Indeed, one of the most used domain specific languages (SQL) is a declarative language (even though not a turing complete one). Nobody thinks that SQL is academic. It’s just a good language for the task (and the underlying mathematical model).
Indeed, Prolog would be suited for the task as well (and Datalog is a proof), was it not turing complete (indeed, termination is a very good property to have in a query and data manipulation language.

Programming without explicit state is a nice property. A typical example is transactional semantics: in imperative languages implementing transactional behaviour is quite a hard task. Every operation must be logged (or check-pointed or both). Another option is some kind of shadow copy behaviour. Basically you access data through a pointer, which points to some kind of structure. If you want to edit the structure, you make a copy of it, edit the copy and set the pointer to point the copy if and only if every operation in the transaction has success.

 

Transaction Semantics

Transaction Semantics

 

 

All these operations are costly. Basically this kind of behaviour is free in functional languages. You return from a function and if you had no explicit side effects (which is not possible in haskell and easily avoidable inpr other functional languages) you automatically roll back to the state before the function call. This behaviour has some runtime cost, and this is the reason why functional languages are usually quite fast and not extremely fast (especially in certain tasks). However, its both easy to code (in fact you do not have to do anything explicitly) and rather optimized, since it’s done by the system (which should be rather efficient).

On the other hand, some other operations are quite more costly in functional environments. This happens when you often modify structures: that is to say when the algorithm uses on destructive write operations. Sometimes, declarative languages have limited support for “classical” variables: think about Oz cells, Lisp or OCaml. Erlang has ets, dets and mnesia tables. However, these algorithms are usually rather tricky to parallelize. In the most lucky environment some data decomposition comes natural, unfortunately this is not always the case. On the other hand, in declarative environments parallelism is almost free.

In Erlang parallelism is build into the very language core. In Haskell writing concurrent software is extremely easy (or at least not any more difficult than writing non-parallel software). Oz, with its dataflow variables, is also an interesting choice (and presents a concurrency model extremely educative and different from the ugly thread/synchronized Java vision). Basically in Oz variables themselves can be used to let threads communicate.

In many declarative languages variables can be bound or unbound. Unbound variables shall be bound before being used. That is to say that reading an unbound variable is a programming error (however, this does not go unnoticed and is quite different from reading an uninitialized variable/pointer). In Oz, if you have many threads and a thread tries to read an unbound variable, it hangs until someone binds that variable.

This is a nice way to do concurrency: variables are used. They can be used to build barriers, for example. They are a good IPC primitive as well. Since in Oz variables are logical variables (read-only, once bound), there is no need to worry about multiple writes. Nice.

Declarative Programming, Erlang, Functional Programming, Haskell, Open Source and Free Software, Programming | No Comments | Trackback

Do not program defensively. (Sure?)

November 1st, 2008

How it is true that pragmatics matters in programming! That is to say, best practices in a given environment/platform may be bad practices in another. For example the following lines are quoted from Erlang’s Programming Rules as recommended from Ericsson:

A defensive program is one where the programmer does not “trust” the input data to the part of the system they are programming. In general one should not test input data to functions for correctness. Most of the code in the system should be written with the assumption that the input data to the function in question is correct. Only a small part of the code should actually perform any checking of the data. This is usually done when data “enters” the system for the first time, once data has been checked as it enters the system it should thereafter be assumed correct. 

There is a whole generation (more likely a couple or more) of C programmers grown with the idea of defensive programming. Which may be a good idea, indeed, especially considering the quite large bug number of C programs. Java programmers more or less share the same idea (further misled by a wrong idea of typing, but that’s entirely another story). 

Here, we find that defensive programming not only is not considered a good practice: it’s even considered a bad practice. Indeed, the Unix operating system, about 30 years ago used a similar idea: less code to manage erroneous conditions (in comparison with Multics). Kernel Panic and reboot. Of course there is quite a difference between “not recovering from errors” and “not doing defensive programming”.

The point is that well written code can’t be put in a “wrong” state by a wrong input. At least it should not. A typical example is a buffer overflow: as far as I’m concerned, a legitimate behaviour is a core dump (as long as the dump is not user accessible, of course!). The wrong behaviour is corrupting memory (which is C behaviour out of the box).

Back to Erlang, the example code of the section is:

%% Args: Option is all|normal
get_server_usage_info(Option, AsciiPid) ->
    Pid = list_to_pid(AsciiPid),
    case Option of
        all -> get_all_info(Pid);
        normal -> get_normal_info(Pid)
    end.

Which is quite explicative: if Options is neither all nor normal, Erlang “crashes”, it does not do something implementation dependent or lets the user crack the process. Besides, Erlang was created from real needs from the industry, thus is a practical language suited to solve practical problems (and is also a beautiful functional language).

I want to stress the industrial nature of Erlang to show that the advise does not come from academics or inexperienced programmers unaware of “real world” needs. Maybe there is still hope.

Declarative Programming, Erlang, Functional Programming, Programming | No Comments | Trackback

The Y combinator in Python

October 29th, 2008

I’ve been asked how would the Y combinator look like in Python (see comments of this post). In that post the Y combinator is also explained detailedly (and may be an interesting reading).

Python is an imperative language, however, its functions are first class citizens and can be manipulated in a quite functional way. We have closures, anonymous functions and recursion. That’s all we need to build the Y combinator in Python.

Moreover, the Y combinator in Python looks like a decorator. Quite unfortunately, using recursion in Python is usually a poor choice. Python has a rather low limit on recursive calls (and even if you can rise it, it’s not good programming practice).
In Python function calls are quite expensive and there is no tail recursion optimization.

Indeed, even though Python is great in manipulating functions (and is something I use relatively often), recursion should not the preferred method to loop. It’s much better to use iterators/generators in for loops ore as a second (usually inferior) choice, while loops.

However, this is the implementation of the Y combinator in Python (along with the traditional factorial example):

    def y(f):
        def g(k):
            return f(lambda a: (k(k))(a))
        return g(g)

    @y
    def fact(h):
        def fact_aux(n):
            if n == 0:
                return 1
            else:
                return n * h(n-1)
        return fact_aux

    print fact(5)

Declarative Programming, Functional Programming, Programming, Python | 2 Comments | Trackback

Anonymous Recursive Erlang Functions with and without the Y combinator

October 24th, 2008

In Erlang creating recursive anonymous recursive functions is not trivial. Basically the point is that the anonymous function has no name (gee…), thus you can’t reference a function inside it’s own definition. Which makes sense, by the way. In Haskell, for example, this problem does not exist. You can write:

main =
    let fact 0 = 1
         fact n = n * fact (n-1)
    in show $ mainfact 5

and everything works as expected. If we want to write this in Erlang we can’t just write:

main() ->
    fact = fun(0) -> 1;
              (N) -> N * fact(N-1)
           end,
    io:format("~p~n", [fact(5)]).

because fact is not bound when we examine the anonymous function body. The solution is dramatically trivial.

First think about this:

X = fun(X) -> X(X) end,
X(X).

Ok: you got it, it loops forever (Ctrl-C is your friend). Basically you forever call X.
However, it’s not hard to see how we can implement some kind of counter:

-module(recursive).
-compile(export_all).

main() ->
    F = fun(0, F) -> io:fwrite("Stop.\n");
           (N, F) -> io:format("~p~n", [N]),
                     F(N-1, F)
        end,
    F(3, F).

We just did recursion with Erlang anonymous functions. If you, like me, just wanted a solution to this problem, you may stop reading here. Now we will get to the Y combinator solution extending this way of thinking and then will present the Y combinator from a theoretical point of view.

We’ve got a function with 2 parameters, one of which is just a way to carry around the function itself. What happens if we decide to curry the function? What does currying mean? From wikipedia:

In computer science, currying, invented by Moses Schönfinkel and Gottlob Frege, and independently by Haskell Curry, is the technique of transforming a function that takes multiple arguments (or more accurately an n-tuple as argument) in such a way as it can be called as a chain of functions each with a single argument.

Of course, you want to reverse the parameters so that we have:

main2() ->
    F = fun(F, 0) ->
        io:fwrite("Stop.\n");
       (F, N) ->
        io:format("~p~n", [N]),
        F(F, N-1)
    end,
    F(F, 3).

Then the idea is that:
f: A x B → C
becomes:
f: A → (B → C)
so that the return value of f(x) is a function h: B → C. Thus f(x, y) becomes (f(x))(y).

Ok, nothing new under the sun. Haskell programmers curry and uncurry functions all the time. You can do that even in Python! Well, kind of.

main3() ->
    F = fun(F) ->
        fun(0) -> io:fwrite("Stop.\n");
           (N) -> io:format("~p~n", [N]), (F(F))(N-1)
        end
    end,
    (F(F))(3).

Don’t you find that using (F(F))(x) all the time sucks? I do. The first thing I would do is to define:

g(F, N) ->
    (F(F))(N).

main4() ->
    F = fun(F) ->
        fun(0) -> io:fwrite("Stop.\n");
           (N) -> io:format("~p~n", [N]), g(F, N-1)
        end
    end,
    g(F, 3).

Well, quite more readable (as soon as you find some fancy name for g). However, this approach has at least one major drawback: we can use g to obtain some number, but functional programming is about function manipulation. Being able to build a recursive function from scratch is out ultimate goal.

The idea is that we should “inject” g behaviour into “F”. That is to say we should put into F something that has g behaviour as well. A key idea could be currying g:

g2(H) ->
    fun(N)->
        (H(H))(N)
    end.

main5() ->
    F = fun(H) ->
        fun(0) -> io:fwrite("Stop.\n");
           (N) -> io:format("~p~n", [N]), (g2(H))(N-1)
        end
    end,
    (g2(F))(3).

We are getting nearer to the solution. In fact g2(F) is almost the function we are looking for.
Unfortunately it is not the right candidate to be passed to F as well.

Basically we want to write :

fun(H) ->
    fun(0) -> io:fwrite("Stop.\n");
       (N) -> io:format("~p~n", [N]),  H(N-1)
    end
end

and H should “do the right thing”. Moreover, we want to build H from out anonymous function in a fashion similar to what we got with g2(F). The idea of g2 is (using anonymous functions):

main6() ->
    F = fun(H) ->
		fun(0) -> io:fwrite("Stop.\n");
		   (N) -> io:format("~p~n", [N]),
			  ((fun(K) -> fun(A) -> (K(K))(A) end end)(H))(N-1)
		end
	end,
    ((fun(K) -> fun(A) -> (K(K))(A) end end)(F))(3).

this is done “factoring out” (fun(K) -> fun(A) -> (K(K))(A) end end). We leave H in its place and pass (fun(K) -> fun(A) -> (K(K))(A) end end) from the caller. Code is easier than the explanation.

main7() ->
    F = fun(H) ->
		fun(0) -> io:fwrite("Stop.\n");
		   (N) -> io:format("~p~n", [N]), H(N-1)
		end
	end,
    ((fun(K) -> F(fun(A) -> (K(K))(A) end) end)
     (fun(K) -> F(fun(A) -> (K(K))(A) end) end))(3).

Now things are quite messy and you have almost as many parentheses as if you were coding in Lisp. However, this way we were able to put the recursion logic out of F. And we are almost done.
Let’s give the (fun(K) -> F(fun(A) -> (K(K))(A) end) end) a name (G) and rewrite the messy code:

main8() ->
    F = fun(H) ->
		fun(0) -> io:fwrite("Stop.\n");
		   (N) -> io:format("~p~n", [N]), H(N-1)
		end
	end,
    G = (fun(K) -> F(fun(A) -> (K(K))(A) end) end),
    (G(G))(3).

We notice that:

  1. iteration logic is not in F
  2. G does not substantially depend from how F is done.
  3. We can define a function that takes F and using G returns the G(G) function
y(F) ->
    G = (fun(K) -> F(fun(A) -> (K(K))(A) end) end),
    G(G).

main9() ->
    F = fun(H) ->
		fun(0) -> io:fwrite("Stop.\n");
		   (N) -> io:format("~p~n", [N]), H(N-1)
		end
	end,
    Recursive_F = y(F),
    Recursive_F(3),
    Recursive_F(7).

Well… the function we called y is the Erlang version of the Y combinator. Basically Y can be used to make recursive any erlang function with one argument. Besides, there are versions which can deal with functions with more formal parameters. However, the construction is quite the same as the one I presented for this version.

Declarative Programming, Erlang, Functional Programming, Haskell, Programming | 2 Comments | Trackback

CMake (Part 2 - External Libraries)

October 21st, 2008

After the introductory article on CMake, it is time to tackle with libraries, one of the first problems not easily solvable with plain Makefiles.

While quite often these libraries are installed in default places (which means no need for -L and -I options), sometimes they are not (for example in a given system these libraries may be placed in /opt), for example because developers may want to link against development versions built with debugging symbols.

Moreover, there can be version mismatches, and even though both this and the former problem may be solved if the library comes with some *-config program (which answers question about the library), some don’t.

Generally speaking, things were even worse before Linux relegated most proprietary unices to smaller market shares. Still, even today, something like the autotools is needed to solve dependency problems swiftly.

Luckily enough, CMake is even easier to do. CMake comes with a lot of pre-packaged “finder” scripts which are able to detect installation places for libraries and set appropriate environment variables.

Suppose you are writing C++ software, and you are wisely using Boost. Then you want to include Boost headers and perhaps link against some boost libraries (most Boost packages are header only, since rely heavily on templates and extern templates are yet somewhat experimental).


FIND_PACKAGE(BOOST REQUIRED)

That’s it. Variables ${BOOST_INCLUDE_PATH} and ${BOOST_LIBRARIES} are set. The next step is to tell cmake that some headers are in ${BOOST_INCLUDE_PATH} and some libraries we may link are in ${BOOST_LIBRARIES}. This is accomplished through the INCLUDE_DIRECTORIES and LINK_LIBRARIES commands:


INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR} ${BOOST_INCLUDE_PATH})
LINK_LIBRARIES(${BOOST_LIBRARIES})

The REQUIRED keyword means that cmake should not proceed if the library is not found. Suppose Boost is an optional dependency; for example you only use shared_ptrs, which are in tr:: as well. Thus, if you don’t find boost, you simply use tr:: and the build can go on. Then the command is:


FIND_PACKAGE(BOOST) # (without REQUIRED)

The ${BOOST_FOUND} variable is true if and only if Boost was found. You can test with the IF command:


IF(BOOST_FOUND)
# do something with boost
ELSE(BOOST_FOUND)
# do something without boost
ENDIF(BOOST_FOUND)

Notice that cmake requires the condition to be repeated in ELSE and ENDIF clauses to improve readability. This can be disabled. However, I quite like it and I prefer to leave the default on.

C and C++, Programming | No Comments | Trackback

Python considered harmful?

October 17th, 2008

Yesterday morning I started writing a simple tool which should compile all C++ files in a given directory tree. Ok, simple enough, it’s just a matter of a simple os.walk and some subprocess.call.

After that, I thought that it would be nice if the tool did something more. For example, I could write down a function that was able to “make clean” the executables I built. This is simple as well. You just have to be smart since on *nix executables do not have extensions. And I could make the system Windows friendly. It was a matter of a couple of functions more.

However, at that point I needed some kind of interface. A simple command line tool was nice. But… well, I remembered using a Prolog tool which did some heuristic to get wrong commands right (zsh does this as well)… so, why not? A levenshtein distance + some more checks could do. And it did. Nice.

But… what if I just want to compile one single file? Well, I had already most of the infrastructure. And why not letting me specify a single file to clean? As easy as breathing. Done. By the way, the system at that point supported building and cleaning object files as well.

And I was already wondering that I left out C files. After all I needed to process C++ files, but the tool would be surely more useful if I could use it with C files too. And why not Pascal? Ok, I have the right design which could support kind of configuration to map file types to compilers… and I could parametrize compiler options as well. Somewhere in my mind an automated compiler detection system was already lurking.

I refactored the tool. It became a package, with a lot of useful functions… But well… why rebuilding all the sources? I need only to rebuild executables which are older than their source file. This is easy, a couple of stat…

At that point I realized that if only I added some kind of file dependency discovery I would have basically reinvented make. That is to say I was reinventing a square wheel. One of these days I’ll get the very first script and modify so that instead of compiling files generates a CMakeLists.txt, calls cmake and then Make.

The end of the story is developing in Python is too fast. :P

BTW: the author is not suggesting Python should not be used, just the opposite!

Open Source and Free Software, Programming, Python | 9 Comments | Trackback

CMake (Part 1)

October 15th, 2008

autoreconf, autoconf, automake, libtool… We are talking about the autotools. Everybody (almost) uses autotools, and it seems that everybody has to use them for good.

Luckily enough, this is not the case. There are alternatives. There are many alternatives. scons and cmake just to name two of the most used.

The problem the autotools solve is not an easy one. Unfortunately the solution is unnecessarily complex. Using autotools for simple projects is just a mess and often their full power is not needed. Consequently I looked for alternatives.

Even though I’m quite a Python enthusiast, I chose to learn cmake first (over scons).

Basically a cmake build script can be as simple as:

project(project_name)
add_executable(foo foo.c bar.h bar.c)

And building a library (shared or static) is not any more difficult:
project(mylib)
add_library(mylib SHARED foo.c bar.h bar.c)

Put the lines above in a CMakeLists.txt and run cmake. You can generate unix makefiles, code::blocks projects, xcode projects, visual studio projects, etc.

Nice. Moreover, writing a CMakeLists.txt is quite easier than writing the corresponding Makefile (and the repetitive stages of writing install and clean targets is automated as well).

What about more detailed instructions? Ok, ok, you are right. So…

Suppose you have this project (directories are followed by ‘:’, while files… well, you should recognize them ;) ):

foo:
    bar:
        bar.c
        bar.h
        barbar.c
        barbar.h
    foo.c
    opts.h
    opts.h

If we run ls -R, we get:

% ls -R
CMakeLists.txt bar            foo.c          opts.c         opts.h

./bar:
CMakeLists.txt bar.c          bar.h          barbar.c       barbar.h

Actually you want to create a libbar shared library and the main foo executable (which links libbar).

% cat foo.c
#include <bar/bar.h>
#include <bar/barbar.h>
#include "opts.h"

int main () {
    bar(1);
    barbar();
    opts("ciao");

    return 0;
}

Notice that <bar/bar.h> and <bar/barbar.h> are not local includes. Now we write the CMakeLists.txt for the foo directory. We start examining the main CMakeLists.txt:

% cat CMakeLists.txt
PROJECT(foo)
CMAKE_MINIMUM_REQUIRED(VERSION 2.6)

# bar must be processed
ADD_SUBDIRECTORY(bar)

# sets variable SOURCES to the project source files.
SET(SOURCES foo.c opts.c opts.h)

# puts the bar directory in the headers search path
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})

# puts the bar directory in the dynamic libraries search path
LINK_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR}/bar)

 

# links with libbar
LINK_LIBRARIES(bar)
# creates the executable
ADD_EXECUTABLE(foo ${SOURCES})

I added comments in order to explain statements as you read them. Basically the ADD_SUBDIRECTORY command tells cmake to look for another CMakeLists.txt in the specified directory. Then we set the SOURCES variable. SET sets variables. If you need the value of a variable ${VARIABLE} is the way to get it (remember Make?).

${CMAKE_CURRENT_SOURCE_DIR} and ${CMAKE_CURRENT_BINARY_DIR} are variables set by cmake in order to simplify configuration.

Another useful command is MESSAGE (a ‘print’ command which can be used for debugging purposes).

Now, it is time for the CMakeLists.txt in bar:
% cat bar/CMakeLists.txt
PROJECT(libbar)

SET(SOURCES bar.c bar.h barbar.c barbar.h)
ADD_LIBRARY(bar SHARED ${SOURCES})

We are done!


% cmake .
– The C compiler identification is GNU
– The CXX compiler identification is GNU
– Check for working C compiler: /usr/bin/gcc
– Check for working C compiler: /usr/bin/gcc — works
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working CXX compiler: /usr/bin/c++
– Check for working CXX compiler: /usr/bin/c++ — works
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
/Users/riko/src/foo/bar
– Configuring done
– Generating done
– Build files have been written to: /Users/riko/src/foo


% make
Scanning dependencies of target bar
[ 25%] Building C object bar/CMakeFiles/bar.dir/bar.c.o
[ 50%] Building C object bar/CMakeFiles/bar.dir/barbar.c.o
Linking C shared library libbar.dylib
[ 50%] Built target bar
Scanning dependencies of target foo
[ 75%] Building C object CMakeFiles/foo.dir/foo.c.o
[100%] Building C object CMakeFiles/foo.dir/opts.c.o
Linking C executable foo
[100%] Built target foo

 

More info on CMake can be found in these posts:

  1. CMake and Libraries

C and C++, Programming | 2 Comments | Trackback

ICLP 08

October 13th, 2008

Tomorrow I registered to the ICLP 08 conference.
I’m looking forward to go!

Prolog | No Comments | Trackback

Recent Posts

Blogroll

Siti amici

Misc

Recent Comments

Categories

Enrico Franchi graduated in Maths and Computer Science and is now studying for a Computet Science MSc (though because of italian bureaucracy that very course is to be cancelled).

RiK0's Tech Temple is using WP-Gravatar