Automating LibreOffice using Perl and UNO

I've recently taken an interest in macros and automation of office software. While at work I contented myself with writing Excel macros in Visual Basic, but for personal use I was more intrigued at the possibility of combining my favourite programming language, Perl, with my office suite of choice, LibreOffice. Online searching turned up some tutorials about automating LibreOffice with Python and Excel with Perl, and also, more importantly, a Perl module that can indeed interface with LibreOffice. Frustratingly though, the docs for that module seem to consist of a single example and a rather unhelpful suggestion to go read the API docs. Nevertheless, I decided to try it out, and now I hope that by documenting my experience I may be of help to someone else who decides to try this.

I began by getting the module from CPAN, which can be done with a simple:

cpan install OpenOffice::UNO

In order to compile it I needed to have the ODK installed. I have a Linux from Scratch system, so I had compiled and installed LibreOffice as per these instructions. Of course, as noted there, I needed to remove the "--disable-odk" switch to install the LibreOffice SDK! Once that was done, I tried to run the single example. I added the option

"-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager"

to my desktop file for launching LibreOffice so that I wouldn't have to launch it from the command line. I then saved the rest of the example code as a plx file and ran it… and got an error: "Binary URP bridge disposed during call.". I thought my explorations had come to an untimely end, but a bit more research revealed that this is a problem that Pythonistas also face, and that the solution is simple: prior to running the script, one needs to set an environment variable:

export URE_BOOTSTRAP=vnd.sun.star.pathname:/opt/libreoffice-5.4.0.3/lib/libreoffice/program/fundamentalrc

Note that "/opt/libreoffice-5.4.0.3" is specific to my installation; for many it may be /usr or /usr/local, or else something else in /opt. To make this easier to remember, I added an alias in my ~/.bashrc:

alias odkurebootstrap='export URE_BOOTSTRAP=vnd.sun.star.pathname:/opt/libreoffice-5.4.0.3/lib/libreoffice/program/fundamentalrc'

So now I just have to type "odkurebootstrap" before running a Perl or Python script for driving LibreOffice. (Better yet, one could simply put the "export" line directly into the bashrc, and then not have to type anything.)

After that, I tweaked the example to create a new document rather than open an existing one. This involved changing the line

$sdoc = $desktop->loadComponentFromURL("file:///home/jrandom/test1.sxw",
                                       "_blank", 0, [$pv]);

to:

$sdoc = $desktop->loadComponentFromURL("private:factory/swriter",
                                       "_blank", 0, [$pv]);

To figure out how to do this, I needed to wade through Java examples on the Apache OpenOffice wiki, even though I'm using LibreOffice, which gives you some idea of how painfully undocumented this whole thing is.

Right, so I was able to open a document and/or create a new one. How to save it? Well, eventually I found someone who had forked this Perl module and filled their Git repository with some more helpful examples. I took a look at savedoc.pl and it seemed all right, except for the "filter" bit:

$pv2->Name("FilterName");
$pv2->Value("swriter: StarOffice XML (Writer)");

Evidently, I'm not working with StarOffice, and I'm using ODT files, not SXWs. At first I tried using the literal name of the filter from the "Save As" dialog in my LibreOffice, which (since I'm using the Irish localization) was "Téacs ODF (.odt)". As expected, this didn't work. My next idea was to try recording a macro in LibreOffice in which I saved a document. Bingo! Here's an excerpt from the macro:

args1(1).Name = "FilterName"
args1(1).Value = "writer8"

dispatcher.executeDispatch(document, ".uno:SaveAs", "", 0, args1())

So apparently the filter in LibreOffice for ODT files is just "writer8". Fair enough!

This macro excerpt brings up another issue: the use of the "dispatcher". While writing a Perl script (or indeed a Python script or Java program) one will tend to use directly the objects provided by LibreOffice and their methods. However, these macros recorded in Basic instead use a dispatcher to do everything. The reason for this seems to be that the macro recorder just follows menu clicks and whatnot, which use "dispatches" to trigger the document's "controller" (i.e. containing window) to do things to the document. It disregards what the controller actually does in response to these dispatches, which makes it much less sophisticated than Microsoft's macro recorder. Presumably it's also much easier to maintain, which is understandable. Either way, this means that the LibreOffice macro recorder is considerably less helpful than Microsoft's would be for suggesting how to go about automating the software. Oh well!

Another thing I noticed was that a lot of functions seemed to take references to arrays of "com.sun.star.beans.PropertyValue" objects. The two examples I mentioned build up these arrays "by hand" (as do recorded macros, although it's slightly simpler in Basic) but I felt that this was a bit of a waste of space. Surely it would make much more sense to pass these property-value pairs as hashes. Therefore, I decided to create a function to convert a hash to one of these array references, and another to reverse this conversion:

sub createUnoArgSet {
        # Takes UNO object, plus a hash (or reference thereto)
        # and returns a reference to an array of PropertyValues.
        my $uno = shift;
        die "createUnoArgSet called without reference to UNO object!"
                unless ref($uno) eq "OpenOffice::UNO";

        my %inputhash;
        my @outputlist;

        if (scalar(@_) == 1) {
                # Got a scalar - must be a reference!
                %inputhash = %{$_[0]};
        } else {
                # Got more than one value - assume a hash.
                %inputhash = @_;
        }

        for my $arg (keys %inputhash) {
                my $pv = $uno->createIdlStruct("com.sun.star.beans.PropertyValue");
                $pv->Name($arg);
                $pv->Value($inputhash{$arg});
                push @outputlist, $pv;
        }

        return \@outputlist;
}

sub unwrapUnoArgSet {
        # Takes a reference to an array of PropertyValues
        # and returns a reference to a hash.
        my @pvset = @{$_[0]};
        my %outputhash;

        for my $pv (@pvset) {
                $outputhash{$pv->Name()} = $pv->Value();
        }

        return \%outputhash;
}

I was concerned that the lack of ordering in Perl hashes could cause problems, but when I tested it it seemed to work fine. The other problem is that "createUnoArgSet" needs to take the $uno object (created at the start of the program) as its first argument, before the hash. The "die" line wasn't always there, resulting in a lot of head-scratching situations in which I had forgotten to pass this object! Perhaps one could instead get away with creating a new OpenOffice::UNO object each time the function is called, but I didn't try this.

Anyway, at this point, I felt ready to try actually making something. What I had in mind was a script to produce a pretty "trace report". That is, something that would take in the output of an strace command, such as

strace -v -f -s300 -y -yy -q -tt -o example.txt someprogram

and tabulate it in a document. I also wanted to intersperse this report with screenshots. That is, it could search in a defined place for image files, compare their creation time with the timestamps in the trace output, and then insert them between tables in the report. So, I wanted a program that could do this:

tracereport.plx TraceReport.odt ~/TraceScreens example.txt

Here "TraceReport.odt" is the name of the output report file, "~/TraceScreens" is an optional directory containing the screenshots, and "example.txt" is the actual output from the strace command given above.

Some quick experimentation revealed that ODT output would not be suitable due to the limitation on text table sizes, so I decided to make a script to produce ODS spreadsheets instead. After days of trawling through the API docs (which at the time were for LibreOffice 5.3, but now appear to have been bumped to LibreOffice 6.0!), as well as the Developer section of the Apache OpenOffice wiki, I came up with the non-image part of the report generator:

#!/usr/bin/perl
#tracereport.plx
# An attempt to take strace output and put it into a spreadsheet using LibreOffice Calc...
use warnings;
use strict;
use OpenOffice::UNO;

# Check if we have a spreadsheet name.
if (scalar(@ARGV) < 1) { die("You need to specify a spreadsheet path (absolute!) as an argument!\n"); } my $docname = shift @ARGV; # connect to the OpenOffice.org server my $uno = OpenOffice::UNO->new;
my $cxt = $uno->createInitialComponentContext;
my $sm  = $cxt->getServiceManager;
my $resolver = $sm->createInstanceWithContext
                ("com.sun.star.bridge.UnoUrlResolver", $cxt);
my $rsm = $resolver->resolve
    ("uno:socket,host=localhost,port=8100;urp;StarOffice.ServiceManager");

# get an instance of the Desktop service
my $rc = $rsm->getPropertyValue("DefaultContext");
my $desktop = $rsm->createInstanceWithContext("com.sun.star.frame.Desktop", $rc);
# and of the Dispatcher service
#my $dispatcher = $rsm->createInstanceWithContext("com.sun.star.frame.DispatchHelper", $rc);

# create a name/value pair to be used in opening the spreadsheet
my $open_args = createUnoArgSet($uno, {Hidden => OpenOffice::UNO::Boolean->new(1)});

# open a spreadsheet
my $sdoc = $desktop->loadComponentFromURL("private:factory/scalc",
                                       "_blank", 0, $open_args);

# Start recording trace info.
my @trace_output_lines;
my @unfinished_output_lines;
my $mostparams = 0;

while(my $line = <>) {
    # First extract the PID and timestamp.
    $line =~ s/^(\d*)\s*([\d:]*)\s*//;
    my $pid = $1;
    my $timestamp = $2;

    # Next figure out if it's unfinished.
    if($line =~ s/\s*$//) {
        # It's an unfinished line. Record it, along with its call and PID.
        $line =~ /^([^\(]*)\(/;
        my $call = $1;
        chomp($line);
        push @unfinished_output_lines,
            {call => $call,
            line => $line,
            pid => $pid};
    } else {
        # Not unfinished. Is it resumed?
        if($line =~ s/^< \.\.\.\s(.*)\sresumed>//) {
            # This is the resumption of an unfinished line.
            # Identify its call and PID and match it to one we recorded earlier.
            my $call = $1;
            # Loop through list of unfinished lines.
            my $index = 0;
            $index++ until (($unfinished_output_lines[$index]->{"pid"} eq $pid and
                $unfinished_output_lines[$index]->{"call"} eq $call) or
                $index >= $#unfinished_output_lines); # Prevent infinities!
            # Concatenate previous line with this one.
            $line = $unfinished_output_lines[$index]->{"line"}.$line;
            # Take it away from the unfinished list, since it's finished now!
            splice(@unfinished_output_lines, $index, 1);
        }

        # Next parse the full line.
        if($line =~ /^([^\(]*)\((.*)\)\s*=\s*(-?[\d\?]+.*?)$/) {
            my $call = $1;
            my $params = $2;
            my $result = $3;
            # Split up params into an array.
            # Treat structures as single params.
            my @pararray = parseparams($params);
            # Push a nice structure onto our list of lines.
            push @trace_output_lines,
                {pid => $pid,
                timestamp => $timestamp,
                call => $call,
                params => [@pararray],
                result => $result};
            # Determine the longest list of parameters we have.
            $mostparams = scalar(@pararray) if scalar(@pararray) > $mostparams;
        } else {
            # Unparseable.
            chomp($line);
            push @trace_output_lines,
                {pid => $pid,
                timestamp => $timestamp,
                rawline => $line};
        }
    }
}

# Add in the header line.
unshift @trace_output_lines,
    {pid => "PID",
    timestamp => "Timestamp",
    call => "Call",
    result => "Result",
    params => [map {"Parameter $_"} (1..$mostparams)]};

# Get access to our controller.
my $controller = $sdoc->getCurrentController();
# Freeze the header row...
$controller->freezeAtPosition(0,1);

# Get access to our first (and only) spreadsheet.
my $sheet = $sdoc->getSheets()->getByIndex(0);
$sheet->setName("Trace Report");

# Go through the rows.
for my $lineindex (0..$#trace_output_lines) {
    my %linehash = %{$trace_output_lines[$lineindex]};
    $sheet->getCellByPosition(0,$lineindex)->setString($linehash{"pid"});
    $sheet->getCellByPosition(1,$lineindex)->setString($linehash{"timestamp"});
    # Now, was the line parseable or not?
    if (defined $linehash{"rawline"}) {
        # TODO: Merge remaining cells.
        $sheet->getCellByPosition(2,$lineindex)->setString($linehash{"rawline"});
    } else {
        $sheet->getCellByPosition(2,$lineindex)->setString($linehash{"call"});
        $sheet->getCellByPosition(3,$lineindex)->setString($linehash{"result"});
        my @pararray = @{$linehash{"params"}};
        for my $parindex (0..$#pararray) {
            if($parindex == 1019 and $#pararray > 1019) {
                # Max column index is 1023!
                $sheet->getCellByPosition(4+$parindex,$lineindex)->setString("etc.");
                last;
            }
            last if $parindex > 1019;
            my $param = '';
            defined $pararray[$parindex] and $param = $pararray[$parindex];
            # Eval any quote blocks!
            for my $quoteblock ($param =~ /("[^"]*")/g) {
                my $evaled = eval $quoteblock;
                defined $evaled and $param =~ s/("[^"]*")/$evaled/;
            }
            #print "Attempting to put '$param' in cell.\n";
            $sheet->getCellByPosition(4+$parindex,$lineindex)->setString($param);
        }
    }
}

# Make the columns optimal-width.
my $rightmostcol = 4 + $mostparams;
$rightmostcol = 1023 if $rightmostcol > 1023;
for my $colindex (0..$rightmostcol) {
    $sheet->getColumns()->getByIndex($colindex)->setPropertyValue("OptimalWidth",OpenOffice::UNO::Boolean->new(1));
}

# save the spreadsheet
my $save_args = createUnoArgSet($uno, {Overwrite => OpenOffice::UNO::Boolean->new(1),
                    FilterName => "calc8"});
$sdoc->storeAsURL("file://" . $docname, $save_args);

# close the spreadsheet
$sdoc->dispose();

###############
# SUBROUTINES #
###############
sub createUnoArgSet {
    # Takes UNO object, plus a hash (or reference thereto)
    # and returns a reference to an array of PropertyValues.
    my $uno = shift;

    my %inputhash;
    my @outputlist;

    if (scalar(@_) == 1) {
        # Got a scalar - must be a reference!
        %inputhash = %{$_[0]};
    } else {
        # Got more than one value - assume a hash.
        %inputhash = @_;
    }

    for my $arg (keys %inputhash) {
        my $pv = $uno->createIdlStruct("com.sun.star.beans.PropertyValue");
        $pv->Name($arg);
        $pv->Value($inputhash{$arg});
        push @outputlist, $pv;
    }

    return \@outputlist;
}

sub unwrapUnoArgSet {
    # Takes a reference to an array of PropertyValues
    # and returns a reference to a hash.
    my @pvset = @{$_[0]};
    my %outputhash;

    for my $pv (@pvset) {
        $outputhash{$pv->Name()} = $pv->Value();
    }

    return \%outputhash;
}

sub parseparams {
    # Split parameters from calls, respecting structures
    # enclosed by brackets/parentheses/braces.
    # Mostly based on https://stackoverflow.com/a/5052668
    my ($string) = @_;
    my @fields;

    my @comma_separated = split(/,\s*/, $string);

    my @to_be_joined;
    my $depth = 0;
    foreach my $field (@comma_separated) {
        my @brackets = $field =~ /(\(|\)|\[|\]|\{|\})/g;
        foreach (@brackets) {
            $depth++ if /\(|\[|\{/;
            $depth-- if /\)|\]|\}/;
        }

        if ($depth == 0) {
            push @fields, join(", ", @to_be_joined, $field);
            @to_be_joined = ();
        } else {
            push @to_be_joined, $field;
        }
    }

    return @fields;
}

Note that this code is unfinished, not just with respect to the insertion of images. There are other problems with it, such as the lack of sanitation of quoteblocks passed to eval in the parsing section. It also can't deal with timestamps that include microseconds, which are the output given by the example strace -tt command given earlier.

I think this gives the gist though. First it connects to LibreOffice and opens up a new spreadsheet, just to make sure that the setup is functional. The spreadsheet is "Hidden", so the user can't see what's going on. This (I believe) reduces the overhead since stuff doesn't have to be drawn and updated on the screen. Either way, I had "Hidden" set to false during testing so I could see exactly what was going on.

Anyway, the next step is to start reading trace lines from stdin and parsing them. This isn't really relevant to LibreOffice/UNO, so I won't dwell on it. It takes the data and organizes it into an array of hashes. I probably haven't done it the best way!

Once this is done, the really interesting bit starts. We get an object representing the document's "controller" which is a(n invisible) window. We use it to freeze the top row so the headers are always visible in the resulting spreadsheet/report.

We then get an object for the actual spreadsheet itself (the single "tab" of the document). We rename the sheet to "Trace Report" (rather than "Sheet1") and then start filling in its cells, row by row. It's fairly straightforward really. We use the fact that the spreadsheet object implements XCellRange to access its cells using the getCellByPosition method. Each cell object implements XTextRange, meaning we can use setString to insert text directly into the cell. The aforementioned eval is there to turn octal/hex representations of special characters in the strace output into the characters themselves. As it's written here though, it's not really fool-proof. We also make sure we don't go past the 1024th column, since that is the maximum number of columns in a Calc spreadsheet at the moment; attempting to go past it would crash the script with an ignominious "index out of bounds".

Once everything is filled in, we go back to the cosmetic, by going through the columns and making sure they're optimal-width, to improve readability. Again, we make sure not to go past the 1024th! We use the getColumns method of XColumnRowRange to get an object representing the set of columns in the spreadsheet. This object itself implements XIndexAccess, so we can call getByIndex to access a particular column. The column objects have properties (by virtue of implementing XPropertySet), one of which is OptimalWidth, which we can set to a boolean value in the same way as we set the Hidden property of the document when we opened (created) it.

Finally the spreadsheet document is saved, using the "calc8" filter, and then closed using the dispose method. Not bad! We now have something that takes output from strace and summarizes it in a spreadsheet. (Your mileage may vary as to how useful you think that actually is!)

Right, so there are still a few things missing. It doesn't do the image thing at all, and it would also be good if the "unparseable" lines that are just dumped into the spreadsheet "as-is" could be put in merged cells.

Some further perusal of the wiki led me to the conclusion that trying to merge the cells using the "proper" method with UNO objects and methods would be way too convoluted, so instead I opted for the "dispatcher" method as used by a recorded macro. I replaced the TODO line with:

$controller->select($sheet->getCellRangeByPosition(2,$lineindex,$rightmostcol,$lineindex));
$dispatcher->executeDispatch($controller,".uno:MergeCells","",0,[]);

In order for this to work, I also had to uncomment the "my $dispatcher" line near the top of the script, and also move the lines defining $rightmostcol higher up in the script, before the "for my $lineindex" loop. You'll notice that I was able to select a cell range without using a dispatch, with the help of $controller. However, the actual merging is done by a dispatch, which executes a command in LibreOffice that merges the selected cells.

Next I had to get the images working. Naturally, this involved adding a lot of non-LibreOffice-related code, but I won't concentrate on that for now. Suffice it to say, my array of hashes could now contain "lines" specifying only an image path. I dealt with these by adding a block to the start of the "for my $lineindex" loop, just after the declaration of %linehash:

if (defined $linehash{"image"}) {
    # This "line" is a pic.
    # Go to the cell we want to put it at.
    $controller->select($sheet->getCellByPosition(0,$lineindex));
    # Now execute the dispatch to insert the picture.
    # First it wants to know the filter for whatever reason.
    my $filter;
    if ($linehash{"image"} =~ /png$/) {
        $filter = "PNG - Portable Network Graphic";
    } elsif ($linehash{"image"} =~ /gif$/) {
        $filter = "GIF - Graphics Interchange Format";
    } elsif ($linehash{"image"} =~ /jpg$/) {
        $filter = "JPEG - Joint Photographic Experts Group";
    }
    $dispatcher->executeDispatch($controller->getFrame(),".uno:InsertGraphic","",0,createUnoArgSet($uno, {
                FileName => "file://".$linehash{"image"},
                FilterName => $filter,
                AsLink => OpenOffice::UNO::Boolean->new(0)}));
    # Now, none of the rest of this stuff is relevant.
    next;
}

You'll notice I used the dispatcher again. This is because inserting an image into a spreadsheet is something for which I just couldn't seem to find any documentation. I know it's probably not something people do that often, but there seemed to be absolutely nothing that could help me with it, either in the API docs or on the OpenOffice wiki.

Actually, this particular use of the dispatcher was one of the head-scratching incidents I alluded to earlier. I forgot to pass $uno to createUnoArgSet, so it was getting only one argument (my hash reference) and therefore producing an empty array from an empty hash. The result was that the dispatch wasn't getting any arguments passed to it, so the "Insert Picture" dialog kept coming up asking me to choose an image! Eventually I figured it out, but that wasted a lot of my time…

This is fine, but it doesn't adjust the cell height, so the big screenshot covers a load of output lines below it. Adjusting cell height seemed like it should be a straightforward enough task, but no matter what I tried it just kept ignoring me. At first I just wanted to try setting the height to 4.5 cm, roughly ten times the normal, as a proof of concept. This is something like what I had:

# DOESN'T WORK:
my $row = $sheet->getRows()->getByIndex($lineindex);
$row->setPropertyValue("Height",4500); # Dimensions are in hundredths of a millimetre.

These lines of code seemed completely ineffectual, a suspicion confirmed by something along the lines of

# Diagnostic:
print $row->getPropertyValue("Height");

which gave me back the default value. Attempting the same thing with an interactive Python console gave no issues, confirming that the problem was with Perl, or rather with the very-poorly-documented Perl module I use. For the sake of comparing like with like I loaded up an interactive Perl console and tried to mess around with the UNO interface.

I had the feeling that the fact that the integer was being ignored was somehow related to the fact that booleans need to be passed in an "OpenOffice::UNO::Boolean" object. I eventually realized I could put these two fairly simple lines into my console:

use OpenOffice::UNO;
for my $key (keys %OpenOffice::UNO::) { print "$key\n"; }

This gave me a list of symbols available in the OpenOffice::UNO package, which was:

createComponentContext
Struct::
VERSION
dl_load_flags
Interface::
import
Int32::
bootstrap
Any::
createInitialComponentContext
EXPORT
Exception::
DESTROY
Boolean::
ISA
createIdlStruct
Int64::
new

Sure enough, there's an "Int32" type and an "Int64" type. Now you see what I mean about this Perl module having terrible docs!

To that end, the next thing I tried was to modify the non-functional code above to something like:

# WORKS:
my $row = $sheet->getRows()->getByIndex($lineindex);
$row->setPropertyValue("Height",OpenOffice::UNO::Int32->new(4500)); # Dimensions are in hundredths of a millimetre.

What do you know, it worked! As such, I created a new wrapper function, and modified my createUnoArgSet function to invoke it:

sub createUnoObj {
    # Take a number or boolean and turn it into an appropriate
    # Uno object. Useful for setting properties and argsets.
    my $value = shift;
    # First of all, is it already an object?
    if(ref($value) =~ /^OpenOffice::UNO/) {
        return $value; # Send it on its merry way!
    }

    # Okay, let's see what to do with it...
    unless ($value & ~$value) {
        # Numeric.
        # Is it a float?
        if(int($value) - $value) {
            # Just send it...
            return $value;
        } elsif (abs($value) < 2**16) {
            # It's an int...
            return OpenOffice::UNO::Int32->new($value);
        } else {
            # Won't fit in an Int32.
            return OpenOffice::UNO::Int64->new($value);
        }
    } else {
        # String...
        # Okay, is it a Boolean?
        if($value eq "True") {
            return OpenOffice::UNO::Boolean->new(1);
        } elsif($value eq "False") {
            return OpenOffice::UNO::Boolean->new(0);
        } else {
            # Okay, it's just a string.
            return $value;
        }
    }
}

sub createUnoArgSet {
    # Takes UNO object, plus a hash (or reference thereto)
    # and returns a reference to an array of PropertyValues.
    my $uno = shift;
    die "createUnoArgSet called without reference to UNO object!"
        unless ref($uno) eq "OpenOffice::UNO";

    my %inputhash;
    my @outputlist;

    if (scalar(@_) == 1) {
        # Got a scalar - must be a reference!
        %inputhash = %{$_[0]};
    } else {
        # Got more than one value - assume a hash.
        %inputhash = @_;
    }

    for my $arg (keys %inputhash) {
        my $pv = $uno->createIdlStruct("com.sun.star.beans.PropertyValue");
        $pv->Name($arg);
        $pv->Value(createUnoObj($inputhash{$arg}));
        push @outputlist, $pv;
    }

    return \@outputlist;
}

I created another new function to wrap setPropertyValue:

sub setUnoProps {
    # Wrapper of setPropertyValue that sends nice UNO objects.
    my $unoObj = shift;
    die "setUnoProps called without reference to an UNO struct/interface/any!"
        unless ref($uno) =~ /^OpenOffice::UNO/;

    my %inputhash;

    if (scalar(@_) == 1) {
        # Got a scalar - must be a reference!
        %inputhash = %{$_[0]};
    } else {
        # Got more than one value - assume a hash.
        %inputhash = @_;
    }

    for my $arg (keys %inputhash) {
        $unoObj->setPropertyValue($arg,createUnoObj($inputhash{$arg}));
    }
}

Finally I needed to marry these two things: I could insert an image at a cell, and change the height of the row containing that cell; how to change the height of the row to the height of the image? The answer came when I somehow realized that each spreadsheet has a LibreOffice Draw page superimposed on it, which contains all the "shapes", which include drawn shapes, but also images.

Therefore, I added a line to get access to this Draw page, just before the "for my $lineindex" loop:

my $drawpage = $sheet->getDrawPage();

Strangely enough, the Draw page object itself implements XIndexAccess, so a shape (in this case an image) is accessed by calling getByIndex directly on $drawpage. Now, naturally, I want to adjust the height of a row immediately after inserting the image, at which point it is the last object in this list of shapes (because it's the most recently inserted!). Therefore, the height can be obtained using a one-liner:

my $graphicheight = $drawpage->getByIndex($drawpage->getCount()-1)->getSize()->Height;

Funnily enough, it's possible to just call Height on the object representing the image's size, which I have to do because that object doesn't implement XPropertySet. Setting the row height is then a simple matter of:

my $row = $sheet->getRows()->getByIndex($lineindex);
setUnoProps($row,{Height=>$graphicheight});

I took this for a test drive, running a trace on Rayman 2, which I ran through Wine. The script ran for ages, then crashed because it hit the row limit, which is a "mere" 220! Therefore, I modified the script to truncate the report if it hits this limit. The final script looks like this, with a bit more polish:

#!/usr/bin/perl
#tracereport.plx
# An attempt to take strace output and put it into a spreadsheet using LibreOffice Calc...
use warnings;
use strict;
use OpenOffice::UNO;

# Check if we have a spreadsheet name.
if (scalar(@ARGV) < 1) {
    die("You need to specify a spreadsheet path (absolute!) as an argument!\n");
}
my $docname = shift @ARGV;
# Next look for a directory of images.
my $imgdir;
if (scalar(@ARGV) and -d($ARGV[0])) {
    $imgdir = shift @ARGV;
} else {
    use Cwd;
    $imgdir = cwd();
}

my @pics;
my @pictimestamps;

# Check for images in our directory.
open FILELIST, "ls -gt --time-style=\"+%X.%N\" $imgdir/{*.png,*.gif,*.jpg} |";
while() {
    my ($perms,$num,$group,$size,$timestamp,$filepath) =
        /^([^\s]*)\s*([^\s]*)\s*([^\s]*)\s*([^\s]*)\s*([^\s]*)\s*(.*)$/;
    # ls sorts by newest first, so unshift to get oldest first.
    unshift(@pictimestamps,makeseconds($timestamp));
    unshift(@pics,$filepath);
}
close FILELIST;

# connect to the OpenOffice.org server
my $uno = OpenOffice::UNO->new;
my $cxt = $uno->createInitialComponentContext;
my $sm  = $cxt->getServiceManager;
my $resolver = $sm->createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", $cxt);
my $rsm = $resolver->resolve("uno:socket,host=localhost,port=8100;urp;StarOffice.ServiceManager");

# get an instance of the Desktop service
my $rc = $rsm->getPropertyValue("DefaultContext");
my $desktop = $rsm->createInstanceWithContext("com.sun.star.frame.Desktop", $rc);
# and of the Dispatcher service
my $dispatcher = $rsm->createInstanceWithContext("com.sun.star.frame.DispatchHelper", $rc);

# create a name/value pair to be used in opening the spreadsheet
my $open_args = createUnoArgSet($uno, {Hidden => "False"});

# open a spreadsheet
my $sdoc = $desktop->loadComponentFromURL("private:factory/scalc","_blank", 0, $open_args);

# Start recording trace info.
my @trace_output_lines;
my @unfinished_output_lines;
my $mostparams = 0;

while(my $line = <>) {
    # First extract the PID and timestamp.
    $line =~ s/^(\d*)\s*([\d:\.]*)\s*//;
    my $pid = $1;
    my $timestamp = $2;

    # Next figure out if it's unfinished.
    if($line =~ s/\s*<unfinished\s\.\.\.>$//) {
        # It's an unfinished line. Record it, along with its call and PID.
        $line =~ /^([^\(]*)\(/;
        my $call = $1;
        chomp($line);
        push @unfinished_output_lines,
            {call => $call,
            line => $line,
            pid => $pid};
    } else {
        # Not unfinished. Is it resumed?
        if($line =~ s/^<\.\.\.\s(.*)\sresumed>//) {
            # This is the resumption of an unfinished line.
            # Identify its call and PID and match it to one we recorded earlier.
            my $call = $1;
            # Loop through list of unfinished lines.
            my $index = 0;
            $index++ until (($unfinished_output_lines[$index]->{"pid"} eq $pid and
                $unfinished_output_lines[$index]->{"call"} eq $call) or
                $index >= $#unfinished_output_lines); # Prevent infinities!
            # Concatenate previous line with this one.
            $line = $unfinished_output_lines[$index]->{"line"}.$line;
            # Take it away from the unfinished list, since it's finished now!
            splice(@unfinished_output_lines, $index, 1);
        }

        # Next parse the full line.
        my $parsedline;
        if($line =~ /^([^\(]*)\((.*)\)\s*=\s*(-?[\d\?]+.*?)$/) {
            my $call = $1;
            my $params = $2;
            my $result = $3;
            # Split up params into an array.
            # Treat structures as single params.
            my @pararray = parseparams($params);
            # Put together a nice structure to push onto our list of lines.
            $parsedline =
                {pid => $pid,
                timestamp => $timestamp,
                seconds => makeseconds($timestamp),
                call => $call,
                params => [@pararray],
                result => $result};
            # Determine the longest list of parameters we have.
            $mostparams = scalar(@pararray) if scalar(@pararray) > $mostparams;
        } else {
            # Unparseable.
            chomp($line);
            $parsedline =
                {pid => $pid,
                timestamp => $timestamp,
                seconds => makeseconds($timestamp),
                rawline => $line};
        }

        # Next see if there are any images to put in!
        while (scalar @pics and ($pictimestamps[0] < $parsedline->{"seconds"})) {
            push @trace_output_lines,
                {image => $pics[0]};
            # Clean out the arrays as we're going, so we can always check entry zero.
            shift @pics;
            shift @pictimestamps;
        }
        # Now stick in the actual trace output line.
        push @trace_output_lines, $parsedline;
    }
}

# Add in the header line.
unshift @trace_output_lines,
    {pid => "PID",
    timestamp => "Timestamp",
    call => "Call",
    result => "Result",
    params => [map {"Parameter $_"} (1..$mostparams)]};

# Maximum number of rows…
warn "Too many lines in this trace! The report will be truncated!" if (@trace_output_lines > 2**20);

# Get access to our controller.
my $controller = $sdoc->getCurrentController();
# Freeze the header row...
$controller->freezeAtPosition(0,1);

# Get access to our first (and only) spreadsheet.
my $sheet = $sdoc->getSheets()->getByIndex(0);
$sheet->setName("Trace Report");
# Get access to the draw page too, to manage the "shapes" (images).
my $drawpage = $sheet->getDrawPage();

# Go through the rows.
my $rightmostcol = 4 + $mostparams;
$rightmostcol = 1023 if $rightmostcol > 1023;
for my $lineindex (0..$#trace_output_lines) {
    # Have we reached the end of the spreadsheet
    if ($lineindex == (2**20 - 1) and $#trace_output_lines > $lineindex) {
        # Explain what happened…
        $sheet->getCellByPosition(0,$lineindex)->setString("Report truncated!");
        $controller->select($sheet->getCellRangeByPosition(0,$lineindex,$rightmostcol,$lineindex));
        $dispatcher->executeDispatch($controller,".uno:MergeCells","",0,[]);
        # We're done here.
        last;
    }

    # Otherwise tutto va bene.
    my %linehash = %{$trace_output_lines[$lineindex]};
    if (defined $linehash{"image"}) {
        # This "line" is a pic.
        # Go to the cell we want to put it at.
        $controller->select($sheet->getCellByPosition(0,$lineindex));
        # Now execute the dispatch to insert the picture.
        # First it wants to know the filter for whatever reason.
        my $filter;
        if ($linehash{"image"} =~ /png$/) {
            $filter = "PNG - Portable Network Graphic";
        } elsif ($linehash{"image"} =~ /gif$/) {
            $filter = "GIF - Graphics Interchange Format";
        } elsif ($linehash{"image"} =~ /jpg$/) {
            $filter = "JPEG - Joint Photographic Experts Group";
        }
        $dispatcher->executeDispatch($controller->getFrame(),".uno:InsertGraphic","",0,createUnoArgSet($uno, {
                    FileName => "file://".$linehash{"image"},
                    FilterName => $filter,
                    AsLink => "False"}));
        # Grab the newest-inserted graphic.
        my $graphicheight = $drawpage->getByIndex($drawpage->getCount()-1)->getSize()->Height;
        # Now get the row and adjust its height.
        my $row = $sheet->getRows()->getByIndex($lineindex);
        setUnoProps($row,{Height=>$graphicheight});
        # Now, none of the rest of this stuff is relevant.
        next;
    }
    $sheet->getCellByPosition(0,$lineindex)->setString($linehash{"pid"});
    $sheet->getCellByPosition(1,$lineindex)->setString($linehash{"timestamp"});
    # Now, was the line parseable or not?
    if (defined $linehash{"rawline"}) {
        # Just throw in the entire line.
        $sheet->getCellByPosition(2,$lineindex)->setString($linehash{"rawline"});
        # Now merge remaining cells. Need to dispatch this too it seems.
        $controller->select($sheet->getCellRangeByPosition(2,$lineindex,$rightmostcol,$lineindex));
        $dispatcher->executeDispatch($controller,".uno:MergeCells","",0,[]);
    } else {
        $sheet->getCellByPosition(2,$lineindex)->setString($linehash{"call"});
        $sheet->getCellByPosition(3,$lineindex)->setString($linehash{"result"});
        my @pararray = @{$linehash{"params"}};
        for my $parindex (0..$#pararray) {
            if($parindex == 1019 and $#pararray > 1019) {
                # Max column index is 1023!
                $sheet->getCellByPosition(4+$parindex,$lineindex)->setString("etc.");
                last;
            }
            last if $parindex > 1019;
            my $param = '';
            defined $pararray[$parindex] and $param = $pararray[$parindex];
            # Eval any quote blocks!
            for my $quoteblock ($param =~ /("[^"]*")/g) {
                # Get rid of sigils that confuse Perl.
                $quoteblock =~ s/\$/\\\$/g;
                $quoteblock =~ s/\@/\\\@/g;
                $quoteblock =~ s/\%/\\\%/g;
                # Evaluate
                my $evaled = eval $quoteblock;
                $param =~ s/("[^"]*")/$evaled/ if defined $evaled;
            }
            $sheet->getCellByPosition(4+$parindex,$lineindex)->setString($param);
        }
    }
}

# Make the columns optimal-width.
for my $colindex (0..$rightmostcol) {
    setUnoProps($sheet->getColumns()->getByIndex($colindex),{OptimalWidth=>"True"});
}

# save the spreadsheet
my $save_args = createUnoArgSet($uno, {Overwrite => "True",
                    FilterName => "calc8"});
$sdoc->storeAsURL("file://" . $docname, $save_args);

# close the spreadsheet
$sdoc->dispose();

###############
# SUBROUTINES #
###############
sub makeseconds {
    my $timestamp = shift;
    my ($hours, $minutes, $floatseconds) = ($timestamp =~ /(\d\d):(\d\d):(.*)/);
    return $hours * 3600 + $minutes * 60 + $floatseconds;
}

sub createUnoObj {
    # Take a number or boolean and turn it into an appropriate
    # Uno object. Useful for setting properties and argsets.
    my $value = shift;
    # First of all, is it already an object?
    if(ref($value) =~ /^OpenOffice::UNO/) {
        return $value; # Send it on its merry way!
    }

    # Okay, let's see what to do with it...
    unless ($value & ~$value) {
        # Numeric.
        # Is it a float?
        if(int($value) - $value) {
            # Just send it...
            return $value;
        } elsif (abs($value) < 2**16) {
            # It's an int...
            return OpenOffice::UNO::Int32->new($value);
        } else {
            # Won't fit in an Int32.
            return OpenOffice::UNO::Int64->new($value);
        }
    } else {
        # String...
        # Okay, is it a Boolean?
        if($value eq "True") {
            return OpenOffice::UNO::Boolean->new(1);
        } elsif($value eq "False") {
            return OpenOffice::UNO::Boolean->new(0);
        } else {
            # Okay, it's just a string.
            return $value;
        }
    }
}

sub createUnoArgSet {
    # Takes UNO object, plus a hash (or reference thereto)
    # and returns a reference to an array of PropertyValues.
    my $uno = shift;
    die "createUnoArgSet called without reference to UNO object!"
        unless ref($uno) eq "OpenOffice::UNO";

    my %inputhash;
    my @outputlist;

    if (scalar(@_) == 1) {
        # Got a scalar - must be a reference!
        %inputhash = %{$_[0]};
    } else {
        # Got more than one value - assume a hash.
        %inputhash = @_;
    }

    for my $arg (keys %inputhash) {
        my $pv = $uno->createIdlStruct("com.sun.star.beans.PropertyValue");
        $pv->Name($arg);
        $pv->Value(createUnoObj($inputhash{$arg}));
        push @outputlist, $pv;
    }

    return \@outputlist;
}

sub unwrapUnoArgSet {
    # Takes a reference to an array of PropertyValues
    # and returns a reference to a hash.
    my @pvset = @{$_[0]};
    my %outputhash;

    for my $pv (@pvset) {
        $outputhash{$pv->Name()} = $pv->Value();
    }

    return \%outputhash;
}

sub setUnoProps {
    # Wrapper of setPropertyValue that sends nice UNO objects.
    my $unoObj = shift;
    die "setUnoProps called without reference to an UNO struct/interface/any!"
        unless ref($uno) =~ /^OpenOffice::UNO/;

    my %inputhash;

    if (scalar(@_) == 1) {
        # Got a scalar - must be a reference!
        %inputhash = %{$_[0]};
    } else {
        # Got more than one value - assume a hash.
        %inputhash = @_;
    }

    for my $arg (keys %inputhash) {
        $unoObj->setPropertyValue($arg,createUnoObj($inputhash{$arg}));
    }
}

sub parseparams {
    # Split parameters from calls, respecting structures
    # enclosed by brackets/parentheses/braces.
    # Mostly based on https://stackoverflow.com/a/5052668
    my ($string) = @_;
    my @fields;

    my @comma_separated = split(/,\s*/, $string);

    my @to_be_joined;
    my $depth = 0;
    foreach my $field (@comma_separated) {
        my @brackets = $field =~ /(\(|\)|\[|\]|\{|\})/g;
        foreach (@brackets) {
            $depth++ if /\(|\[|\{/;
            $depth-- if /\)|\]|\}/;
        }

        if ($depth == 0) {
            push @fields, join(", ", @to_be_joined, $field);
            @to_be_joined = ();
        } else {
            push @to_be_joined, $field;
        }
    }

    return @fields;
}

Now, a truncated trace report on Rayman 2 probably isn't all that useful. To prevent the truncation, I could find some way of sanitizing the trace before running the script, e.g. by grepping it to get just the "open"s, "read"s and "close"s. Either way, the idea's well over a year old, but I only got around to implementing it recently. In the meantime, other means of reverse-engineering this game have come on leaps and bounds, so any report that this thing generates probably won't tell me anything that others haven't found already. Ah well!

The point is, I rather enjoyed making this, even if it was frustratingly head-scratching sometimes. I hope this can be of help to someone else out there too! For those interested in looking at the code without awkwardly copy-pasting it from this post, I've uploaded it (GPG signature – my key fingerprint is FAFA F12C 4440 460A 89D0 A67F 8D31 13F7 D36F 833C).

links

social