The longest journey of my life.

Posted in Vita reale on August 26th, 2006 by bisbiglio
25/08/2006 h: 12:30

I'm in the car and, seized by my usual, fleeting inspiration, I hurried, pen in hand, to find a "surface" to write on...
instead of a little blank page of my ridiculous Harmony, I found myself vomiting a tiny fragment of my soul onto a small green squared notebook.

A strange feeling takes hold of me inside.
Thoughts flowing by,
everything passing..

I stop.

h: 15:52

It has started to rain..
and despite everything, time flows on inexorably.

And it mocks me.

h: 15:58

I don't like writing step by step.
You lose the pleasure of melancholy, and the bittersweet taste of the memory of a lost joy.

h: 19:37

Sometimes I feel a surge of impatience towards the human race.
Or..should I say animal?


Links for 2006-08-25 [del.icio.us]

Posted in Vita reale on August 26th, 2006 by kill-9.it

evaluating-languages

Posted in Vita reale on August 26th, 2006 by Enrico's pages

Evaluating programming languages for playing with Debtags

Since having workable bindings for the C++ Debtags libraries seems to be still a bit in the future, I'm planning to build a bit of native infrastructure in some higher level language. First step is seeing what language I could start playing with.

The problem

At the most basic level, in Debtags we have a number of packages, each of which has a set of tags.

The way I usually save tags is a file with the format:

package1, package2: tag1, tag2, tag3
package3: tag1, tag2

That is, every line has a list of packages with the same tags, and the list of their tags.

Any script I'm going to write has to at least be able to parse this data into something like a package -> tags hash and then print it out again, so that is what I'm going to test.
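To make the format concrete, here is a minimal sketch in Python of splitting one line into its packages and its tags. The function name `parse_line` is just for illustration, and it assumes every line contains the ": " separator exactly as in the example above.

```python
def parse_line(line):
    """Split one 'pkg1, pkg2: tag1, tag2' line into (packages, tags)."""
    # Everything before the first ": " is the package list, the rest is tags.
    pkgs, tags = line.rstrip("\n").split(": ", 1)
    return pkgs.split(", "), set(tags.split(", "))


pkgs, tags = parse_line("package1, package2: tag1, tag2, tag3\n")
print(pkgs)          # ['package1', 'package2']
print(sorted(tags))  # ['tag1', 'tag2', 'tag3']
```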

Let's see how Perl, Python and Ruby perform.

Tests

C++

The reference point for the experiment will be the C++ implementation, tagcoll:

$ time tagcoll copy package-tags > /dev/null
real    0m0.421s
user    0m0.412s
sys     0m0.000s

Perl

First attempt is with Perl, creating the script that parses into a hash of package => set of tags and prints the result.

There are set modules for Perl on CPAN, but I have none handy at the moment. However, since they are implemented using hashes, I can approximate them by using a hash.

Note that I also want to have a different copy of the tag set for every package, so that I can manipulate them in the future without unwanted side effects.
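To show why the separate copies matter, here is a small sketch in Python (used only for illustration; package and tag names are made up). If two packages share one set object, changing the tags of one silently changes the other.

```python
tags = {"tag1", "tag2"}

# Shared reference: both packages point at the same set object.
shared = {"package1": tags, "package2": tags}
shared["package1"].add("tag3")
print(sorted(shared["package2"]))  # ['tag1', 'tag2', 'tag3'] (changed too)

# Independent copies: each package owns its own set.
tags = {"tag1", "tag2"}
copied = {"package1": tags.copy(), "package2": tags.copy()}
copied["package1"].add("tag3")
print(sorted(copied["package2"]))  # ['tag1', 'tag2']
```

In the Perl script, the `{%tags}` expression plays the role of `copy()`.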

Here is the code:

#!/usr/bin/perl -w

use strict;

my %db;

# Read the tag database
while (<>)
{
    # chomp only removes a trailing newline; chop would eat any last character
    chomp;
    my ($pkgs, $tags) = split(': ');
    # Create the tagset using keys of a hash
    my %tags = map { $_ => undef } split(', ', $tags);
    for my $p (split(', ', $pkgs))
    {
        # Make a copy of the tagset
        $db{$p} = {%tags};
    }
}

# Write the tag database
while (my ($pkg, $tags) = each %db)
{
    print "$pkg: ", join(', ', keys %$tags), "\n";
}

Here is the running time:

$ time ./parse.pl package-tags > /dev/null
real    0m0.448s
user    0m0.436s
sys     0m0.008s

Not so bad, comparable with tagcoll.

Python

Then comes Python. I'm not much of a Python fancier, but I'm rather attracted by the new set native type introduced with Python 2.4, which seems to have most of what I need nice and done.

Here is the script:

#!/usr/bin/python

import sys

input = sys.stdin
if len(sys.argv) > 1:
    input = open(sys.argv[1],"r")

# Read the tag database
db = {}
for line in input:
    # Is there a way to remove the last character of a line that does not
    # make a copy of the entire line?
    line = line.rstrip("\n")
    pkgs, tags = line.split(": ")
    # Create the tag set using the native set
    tags = set(tags.split(", "))
    for p in pkgs.split(", "):
        db[p] = tags.copy()

# Write the tag database
for pkg, tags in db.items():
    # Using % here seems awkward to me, but if I use calls to
    # sys.stdout.write it becomes a bit slower
    print "%s:" % (pkg), ", ".join(tags)

Here is the running time:

$ time ./parse.py  package-tags  > /dev/null
real    0m0.418s
user    0m0.376s
sys     0m0.036s

I'm pleased, very pleased. Using the native set seems to be not only handy, but efficient.
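The script above uses the Python 2 print statement, so it will not run unchanged on Python 3. Here is a rough sketch of the same logic for Python 3. The function names `read_db` and `write_db` are mine, not part of the original script, and I have not timed this version, so the measurement above does not apply to it.

```python
import io
import sys


def read_db(lines):
    """Parse 'pkg1, pkg2: tag1, tag2' lines into a package -> set dict."""
    db = {}
    for line in lines:
        pkgs, tags = line.rstrip("\n").split(": ", 1)
        tagset = set(tags.split(", "))
        for pkg in pkgs.split(", "):
            # Every package gets its own copy of the tag set.
            db[pkg] = tagset.copy()
    return db


def write_db(db, out):
    """Write one 'package: tag1, tag2' line per package to `out`."""
    for pkg, tags in db.items():
        out.write("%s: %s\n" % (pkg, ", ".join(tags)))


# Small in-memory demo with made-up data.
demo = io.StringIO("package1, package2: tag1\npackage3: tag2\n")
write_db(read_db(demo), sys.stdout)
```

To use it as a filter like the original, one could call `write_db(read_db(sys.stdin), sys.stdout)` instead of the demo lines.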

Ruby

Finally, Ruby. I like to use Ruby. In this case, however, it lacks a native set implementation, although it has a set module which is implemented using a hash.

Here is the script:

#!/usr/bin/ruby

require 'set'

infile = ARGV[0] ? File.new(ARGV[0]) : $stdin

# Read the tag database
db = {}
infile.each_line do |line|
    # chop/chomp return a new string, so the result has to be assigned
    line = line.chomp
    pkgs, tags = line.split(": ")
    # Create the set using the Set module
    tags = Set.new(tags.split(", "))
    pkgs.split(", ").each do |p|
        # Plain assignment stores a reference, so dup to get a separate copy
        db[p] = tags.dup
    end
end

# Write the tag database
db.each do |key, tags|
    # Ouch, Set does not do join by itself
    puts "#{key}: #{tags.to_a.join(', ')}"
end

Here is the running time:

$ time ./parse.rb package-tags > /dev/null
real    0m1.637s
user    0m1.572s
sys     0m0.052s

I hope I got something wrong in the script, but I can't see what.

Results

As much as I don't fancy Python, it looks like it's currently the best choice for playing around with Debtags. I hope the native sets will bring me joy.

If in the future I'm asked "how come you chose Python for this Debtags thing?", I can point to this page.

autostraBe

Posted in Vita reale on August 25th, 2006 by shammash
Today, when I got to the toll booth to pay the toll,
I ended up in the fastest queue.. strange!

... maybe only because I was early ...

tagging-intro-notes

Posted in Vita reale on August 25th, 2006 by Enrico's pages

Introductory notes about tagging

A group of nice Swiss researchers are doing some research on Debian, and as part of it they are about to tag around 400 libraries using Debtags. They feel that being able to distinguish projects by categories will help them improve their analysis of component (library) reuse.

They asked me if I had useful Debtags information for them besides the website and the Debconf5 paper, so I did some digging and I'm sharing the results here.

The list archives have various useful posts:

Also, the vocabulary itself has short and long comments for each tag, and the long comments sometimes have useful instructions.

In order to contribute properly reviewed tags, I suggested starting by posting tag patches on the mailing list: once we have discussed some of them and we trust each other, I'll be happy to give commit access to the svn repository.

To create a tag patch, one can use tagcoll:

svn cat svn://svn.debian.org/debtags/tagdb/tags > tags
cp tags tags.edited
[...edit tags.edited...]
tagcoll diff tags tags.edited
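Conceptually, a tag patch records, for each package, which tags were added and which were removed between two versions of the database. The following Python sketch illustrates that idea with sets. The function name `diff_tags`, the data and the returned structure are made up for illustration; it does not reproduce the output format of tagcoll diff.

```python
def diff_tags(old, new):
    """Return {package: (added, removed)} for packages whose tags changed."""
    changes = {}
    for pkg in old.keys() | new.keys():
        before = old.get(pkg, set())
        after = new.get(pkg, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[pkg] = (added, removed)
    return changes


old = {"package1": {"tag1", "tag2"}, "package3": {"tag1"}}
new = {"package1": {"tag1", "tag3"}, "package3": {"tag1"}}
print(diff_tags(old, new))  # {'package1': ({'tag3'}, {'tag2'})}
```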

Note that there are probably many new tags in the not-yet-reviewed database, so one may want to proceed this way:

  1. svn cat svn://svn.debian.org/debtags/tagdb/tags > tags
  2. remove from tags all the lines corresponding to the packages you're not interested in
  3. wget http://debtags.alioth.debian.org/tags/tags-current.gz
  4. remove from tags-current all the lines corresponding to the packages you're not interested in
  5. tagcoll diff tags tags-current > changes
  6. edit changes removing those changes that make no sense; you can review the edits you made to the changes patch this way:
       svn cat svn://svn.debian.org/debtags/tagdb/tags | tagcoll --patch-from=changes.orig > tmp1
       svn cat svn://svn.debian.org/debtags/tagdb/tags | tagcoll --patch-from=changes.edited > tmp2
       tagcoll diff tmp1 tmp2
  7. apply the reviewed patch to the svn repository:
       svn cat svn://svn.debian.org/debtags/tagdb/tags | tagcoll --patch-from=changes copy > tags-patched
  8. work from there
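Steps 2 and 4 can be tedious to do by hand. Here is a minimal sketch in Python of one way to do the filtering, assuming the "packages: tags" line format used by the tag files. The function name `keep_packages` and the package names are hypothetical.

```python
def keep_packages(lines, wanted):
    """Yield lines restricted to the packages in `wanted`; drop empty ones."""
    for line in lines:
        pkgs, tags = line.rstrip("\n").split(": ", 1)
        kept = [p for p in pkgs.split(", ") if p in wanted]
        if kept:
            yield "%s: %s\n" % (", ".join(kept), tags)


sample = ["libfoo, bar: tag1, tag2\n", "baz: tag3\n"]
print("".join(keep_packages(sample, {"libfoo"})), end="")  # libfoo: tag1, tag2
```

Note that this also trims uninteresting packages out of lines that list several packages, instead of only dropping whole lines.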

Everyone is of course free to use the list for any questions.

Links for 2006-08-24 [del.icio.us]

Posted in Vita reale on August 25th, 2006 by kill-9.it

Man on the Moon

Posted in Vita reale on August 24th, 2006 by annak

Jim Carrey is masterful
