XS is EaSY

fishbot

fish-kwpm@uc.org

C/C++ to Perl

XS is a meta-language for binding C/C++ to Perl
Maps C functions to Perl subroutines
Used all over in the core: POSIX, Scalar::Util

Learning Curve

It's really not that hard
It's like anything, once you can do it, it's easy
Need to know some C, but that's hard to avoid
Can get away with not knowing any Perl internals

First, why bind to C?

It's "faster".
It's definitely easier on the RAM.
To leverage existing code.
I'll give three examples of XS stuff I've done recently. Quick case studies.

Case 1

I've got Perl code, should really have been C.
I know Perl, so I did it in Perl.
That was mostly stupid, though.
I used XS to -port- piecewise.
(fractal code)

Case 2

I have a Perl app that is a good idea for Perl.
One chunk of it is a huge runtime and memory bloat.
I want to rewrite just that one chunk into C, and bring the speed up, and the RAM usage down.
(Mosaic code)

Case 3

I have existing C (C++ in this case) code
Want to use it from Perl
Just want to map it in quickly

XS Worked

I had success with all three projects
Inline might have worked well for Case 2
Swig might have worked best for Case 3
I'd prefer to be good at one

Getting Started

h2xs is your friend.
You've likely all used it with -X -A -n to create a module template.
-X means don't do the XS stuff.
Thus... h2xs -A [maybe some other opts] -n My::Foo

This creates:

This will create a My-Foo directory that has the following files:


./Changes 
./MANIFEST 
./README
./Foo.xs
./lib/My/Foo.pm
./Makefile.PL
./ppport.h (for newer perls)
./t/My-Foo.t

You can edit Foo.xs, Foo.pm, and probably t/My-Foo.t, and be done.

Foo.xs

Let's start in the only unfamilar one, Foo.xs.

During compile, Foo.xs will be compiled to Foo.c
This is done by a program named xsubpp
Then Foo.c is compiled to a .so.a or a .dll.

Foo.xs

Foo.pm will use Dynaloader/whatever to bootstrap the module.
You don't need to care about that, though - h2xs already set that up.
In fact, you can compile your module already, and even make test...
h2xs wrote your first test for you, a use_ok().

Let's go


[ My-Foo ]=> perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for My::Foo
[ My-Foo ]=> make
cp lib/My/Foo.pm blib/lib/My/Foo.pm
/usr/bin/perl.exe /usr/lib/perl5/5.8/ExtUtils/xsubpp  -typemap
/usr/lib/perl5/5.8/ExtUtils/typemap  Foo.xs > Foo.xsc && mv Foo.xsc Foo.c
## compilation noise omitted ##
[ My-Foo ]=> make test
/usr/bin/perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib',
'blib/arch')" t/*.t
t/My-Foo....ok
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.12 cusr +  0.07 csys =  0.20 CPU)

Eeeeasy

I said that XS was easy.
You've just created your first XS module, and you didn't need to open a file.

Harder - opening files


#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"

MODULE = My::Foo        PACKAGE = My::Foo

Actually doing something


#includes as before
#include 

int char_sum( char *str );

int
char_sum( char *str )
{
   int sum = 0;
   int len = strlen( str );
   int i;
   for ( i = 0; i < len; i++ )
      sum += str[i];

   return sum;
}

XSub Mapping

And the mapping for Perl:


MODULE = My::Foo        PACKAGE = My::Foo

int
char_sum( str );
    char * str;

First real test:

XS provides typemaps for int and char *. I can now compile and run this. And test it, even:


use List::Util qw{ sum };
{
    my $str  = "foo";
    my $psum = sum unpack 'C*', $str;
    my $xsum = My::Foo::char_sum( $str );
    ok( $psum == $xsum, "Matches!  Woo!" );
}

And make test lets me know that this passes.

So far...

This code is:

pointless
broken
completely not requiring any knowledge of Perl internals
eaasy
our hello world

Hello world!


#include 

void
hello( void )
{
    printf( "Hello world!\n" );
}

MODULE = My::Foo        PACKAGE = My::Foo

void
hello( );

[ My-Foo ]=> perl -Mblib -MMy::Foo -e 'My::Foo::hello()'
Hello world!

Golf XS


MODULE = My::Foo        PACKAGE = My::Foo

void
hello( );
    CODE:
         printf( "Hello world!\n" );

Return from CODE


int 
sum_two( one, two );
   int one;
   int two;
   CODE:
      RETVAL = one + two;
   OUTPUT:
      RETVAL;

Behind the XS


MODULE = My::Foo        PACKAGE = My::Foo

void
hello( );
    CODE:
       printf( "Hello world!\n" );

xsubpp makes Foo.c


#line 18 "Foo.c"
XS(XS_My__Foo_hello); /* prototype to pass -Wmissing-prototypes */
XS(XS_My__Foo_hello)
{
    dXSARGS;
    if (items != 0)
    Perl_croak(aTHX_ "Usage: My::Foo::hello()");
    {
#line 13 "Foo.xs"
        printf( "Hello world!\n" );
#line 28 "Foo.c"
    }
    XSRETURN_EMPTY;
}

boot strapper


XS(boot_My__Foo); /* prototype to pass -Wmissing-prototypes */
XS(boot_My__Foo)
{
    dXSARGS;
    char* file = __FILE__;

    XS_VERSION_BOOTCHECK ;

    newXS("My::Foo::hello", XS_My__Foo_hello, file);
    XSRETURN_YES;
}

Boooring.

Really, nothing terribly exciting is happening here... in fact it's all pretty tedious. Which is one of two reasons xsubpp exists, to handle the tediousness of it all.

That, and not forcing people to learn and remember all the ins and outs of the Perl calling conventions.

Last .c example


XS(XS_My__Foo_char_sum)
{
    dXSARGS;
    if (items != 1)
    Perl_croak(aTHX_ "Usage: My::Foo::char_sum(str)");
    {
    char *  str = (char *)SvPV_nolen(ST(0));
    int RETVAL;
    dXSTARG;

    RETVAL = char_sum(str);
    XSprePUSH; PUSHi((IV)RETVAL);
    }
    XSRETURN(1);
}

Summary xsubpp

Input checking
Data sheparding, conversion
Stack interaction
Good to know what is happening

CODE versus Indirection

CODE: is faster
Indirection is clearer
Way easier if the C exists
Less flexible, though

Can now start doing things

We actually now can do a lot
This is enough to speed up my fractal code, which is really only interested in passing floats in and out.
But, to be of general use, we need to step back and learn a bit about Perl datatypes.

Datatypes

SV - a scalar value, a string, a number, a reference (number might be an integer, and unsigned integer, a float)
AV - an array
HV - a hash

There are other datatypes that are interesting, but not that common, things like globs (GV) and code values (CV).

Scalars

Scalars can be:

IV - integers
UV - unsigned integers
NV - floating point numbers
RV - references
PV - strings

Docs: pretty good

perldoc perlapi lists the various functions you can use to create and manipulatethese types.
Google for perlguts illustrated for all the composite types as pictures.

Using Perl Datatypes


IV
sum_of_squares( AV *in )
{
    IV last = av_len( in );
    IV sum = 0, i;

    for ( i = 0; i <= last; i++ )
    {
        SV **cur = av_fetch( in, i, 0 );
        sum += pow( SvIV( *cur ), 2 );
    }

    return sum;
}

Mapping with Perl Datatypes


IV
sum_of_squares( in );
   AV *in;

Coercion to references

And we can test it out:


[ My-Foo ]=> perl -Mblib -MMy::Foo -le 
    'print My::Foo::sum_of_squares([1, 2, 3])'
14

Note that this takes an arrayref

Cheating with Prototypes

Actually, we can sort of cheat a bit, with this mapping:


IV
sum_of_squares( in );
   AV *in;
   PROTOTYPE: \@

my @foo = ( 1, 2, 3 );
sum_of_squares( @foo );    # yes
sum_of_squares( 1, 2, 3 ); # no
&sum_of_squares( @foo );   # no

Case study: Photo Mosiac

Imagine you have something like the Photomosaic code that Daniel and I wrote some time back. A very quick summary:

mapped candiate tiles and target tiles to n-dimensional vectors
then iterated over the candidate vectors, looking for the one closest in N spaces to the target.

C Wins Here

You need a large candidate pool. 40k+

Now, here's an area where C will outstrip Perl in two ways:

compactness of storage
speed of comparison

Compactness of Storage

Only need 1 byte, Perl makes me eat 4, or 8.
Additionally, I have all the surrounding Perl structures, which means that my storage looks like:


SV -> RV -> AV -> [ SV -> IVX, SV -> IVX, ... ]

Example dump...


> print Dump $foo = [ 1, 2 ];
SV = RV(0x100381c8) at 0x10010fa0
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x10010eb0
  SV = PVAV(0x10015f00) at 0x10010eb0
    REFCNT = 3
    FLAGS = ()
    IV = 0
    NV = 0
    ARRAY = 0x10025ba8
    FILL = 2
    MAX = 2
    ARYLEN = 0x0
    FLAGS = (REAL)
    Elt No. 0
    SV = IV(0x10029ac0) at 0x10011030
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1
    Elt No. 1
    SV = IV(0x10029ac8) at 0x1002ef24
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 2

Compact custom types

Let's make our own vector type


typedef struct nvector_s {
    UV dims;
    U8 *data; // int[dims]
} nvector_t;

Typemaps

Create a typemap file and add this:


TYPEMAP
nvector_t *     T_PTRREF

T_PTR - just returns the pointer as a an IV
T_PTROBJ - returns essentially a blessed T_PTRREF

Create vector type


nvector_t *
nvector_new( UV dims )
{
    nvector_t *vect;
    U8        *data;

    vect = (nvector_t *) calloc( 1, sizeof( nvector_t ));
    data = (U8 *) calloc( dims, sizeof( U8 ));

    vect->data = data;
    vect->dims = dims;

    return vect;
}

Destroy vector


void
nvector_free( nvector_t *vect )
{
    free( (void *) vect->data );
    free( (void *) vect );
}

Generate from AV


nvector_t *
nvector_fromAV( AV *in )
{
    I32 dim, i;
    nvector_t *vret;

    dim = av_len( in );
    
    if ( dim <= 0 )
        croak( "Vector dimensionality must be at least 1" );
    
    vret = nvector_new( dim + 1 );
    for ( i = 0; i <= dim; i++ )
    {
        SV **posSV = av_fetch( in, i, 0 );
        if ( ! SvIOK( *posSV ) )
            croak( "Array value at %u", i );

        vret->data[i] = SvIVX( *posSV );
    }
    
    return vret;
}

Do something with it...


long
nvector_sqdist( nvector_t *vecta, nvector_t *vectb )
{
    long sum = 0;
    int i;

    if ( vecta->dims != vectb->dims )
        croak( "Vector length mismatch: %u - %u", 
                vecta->dims, vectb->dims );

    for ( i = 0; i < vecta->dims; i++ )
    {
        int dist = abs( vecta->data[i] - vectb->data[i] );
        sum += pow( dist, 2 );
    }

    return sum;
}

Testing vector distance

Let's test it:


my $vecta = nvector_fromAV( [ 5, 2, 8 ] );
my $vectb = nvector_fromAV( [ 1, 3, 1 ] );
my $dist = nvector_sqdist( $vecta, $vectb );
print "$dist\n";

prints 66

Left holding the memories

Note that we have to manually destroy these vectors we build, or we are leaking memory. There are a bunch of strategies you can employ here:

make your C structures into Perl objects, and let DESTROY handle free()
place your C structures inside SvPVs.
force the user to destroy objects themselves.
leak memory.

Cheating by secreting

Just quickly, how to get Perl to garbage collect your C variable:


typedef struct foo_s {
   IV foo; UV bar; NV baz;
} foo_t;

"Encrapsulation"


SV *
make_foo( IV foo, UV bar, NV baz )
{
   SV *ret = newSVpv( "", sizeof( foo_t ));
   foo_t *fobj = (foo_t *) SvPVX( ret );
   fobj->foo = foo;
   fobj->bar = bar;
   fobj->baz = baz;
   return ret;
}

Accessing buried nuts


void
show_foo( SV *val )
{
   foo_t *fobj = (foo_t *) SvPVX( val );
   printf( "foo = %d; bar = %u; baz = %f\n",
      fobj->foo, fobj->bar, fobj->baz );
}

Testing hidden nuts

run:


$ perl -Mblib -MFoo -e 'my $foo = make_foo( 1, 2, 3 ); show_foo( $foo );'
foo = 1; bar = 2; baz = 3.000000

And if we Peek our $foo's PV:


PV = 0xe4860 "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\10@\0"\0

Caveats...

There is a lot of danger here, but a lot of benefit.
Probably should add insome checks that your SV is long enough, and is POK.
Doing this sort of thing with a struct with pointers is like begging for a SEGV. Getting down on your knees and begging.
This method does work nicely for an array or an array of structs, though.

What we can't do (yet)

Variadic parameters
List return
Perl callbacks
Create hashes
A lot of stuff...

Thanks

(questions, etc.)

kwmp - oct2006

XS is EaSY

XS is EaSY

fishbot

fish-kwpm@uc.org

C/C++ to Perl

Other linking methods

Learning Curve

First, why bind to C?

Case 1

Case 2

Case 3

XS Worked

Getting Started

This creates:

Foo.xs

Foo.xs

Let's go

Eeeeasy

Harder - opening files

Actually doing something

XSub Mapping

First real test:

So far...

Hello world!

Golf XS

Return from CODE

Behind the XS

xsubpp makes Foo.c

boot strapper

Boooring.

Last .c example

Summary xsubpp

CODE versus Indirection

Can now start doing things

Datatypes

Scalars

Docs: pretty good

Using Perl Datatypes

Mapping with Perl Datatypes

Coercion to references

Cheating with Prototypes

Case study: Photo Mosiac

C Wins Here

Compactness of Storage

Example dump...

Compact custom types

Typemaps

Create vector type

Destroy vector

Generate from AV

Do something with it...

Testing vector distance

Left holding the memories

Cheating by secreting

"Encrapsulation"

Accessing buried nuts

Testing hidden nuts

Caveats...

What we can't do (yet)

Thanks