Linking, loading and Chicken's libraries, modules and deployment system

Here I'll talk a little about what I've learned about linking, loading, Chicken's module system and software deployment in Chicken. Note that I learned most of what is in here by looking around and trying things out. It is the mental model I currently have for this sort of thing. It is probably not 100% correct, but it is still worth reading.

I've been learning how to use Chicken to develop "real world" programs in Scheme, and for that I wanted to learn how to use csc (the Chicken Scheme compiler). Not long after starting, I ran into trouble packaging a small application: I didn't know how deployment works in Chicken. It is actually simple once you learn it. You can compile a program with the -deploy option and get much of the work done for you. However, there is a little more to deployment that is not directly tied to the -deploy command line option.

To actually have an application bundle (just a folder containing all the files the program depends on to run, libraries included), you have to understand a little of how Chicken's module system works. The -deploy option, which I will say more about later, only generates the application bundle for the scm file you compile. Dependencies won't be there: you have to put them in yourself.

csc -deploy program.scm

That command generates a folder named program containing two files: program and libchicken.so. If your program depends on some extension, the extension won't be there. You have to put it there yourself.

So what is in there? The program file only has what program.scm specified, and next to it sits libchicken.so. The folder doesn't have any of the other shared objects (more on this later) that your program file may need. For example, if you make an extension yourself, you'll have to put the extension's shared object file in there too, as well as its import code (more on this later too). So the -deploy option alone is not enough to bundle your program's dependencies.

Ok then. Let me start.

The problem here isn't only learning how to put dependencies in a folder. That is actually simple and will be explained later in this text. The bigger problem is how to use the module system that Chicken provides, and how it is connected to the shared objects csc can generate for modules.

To explain that, it is necessary to have down the concepts of linkage and loading.

Linkage and Loading

Those two concepts were my first struggle to understand how this whole thing works.

There are two types of linkage: static linkage and shared linkage. The second is more commonly called dynamic linking, but the point is that the first is used to link static libraries and the second is used to link shared libraries.

Linkage normally happens behind the scenes. Most people call the compiler to compile a program, and the compiler calls the linker afterwards. The linker is the program that links two or more binary objects (BOs). Linking one BO to another means connecting them so that, if the first depends on the second, the system can use information in the first to reach what the second defines.

If you have, for example, a C program that calls printf, your program needs either to define printf itself or to be linked to the C standard library, because that is where printf is defined.

By linking all the BOs a given object depends on to that object, you create your working program. It is as if the linker attaches information to the resulting BO that can answer questions such as "where is this external (to the program) procedure X?" (if your program depends on an external procedure, it needs that kind of information, or it won't run).

Static libraries are linked to a BO almost as if by concatenation. The linker gets the information from the static library and from your BO, and creates a new BO containing the data of both. If you link all dependencies as static libraries, your resulting BO will be self-contained and won't depend on anything else.

Shared libraries are more elaborate, and have more advantages. A shared library is attached to a BO by generating a new BO with information on where the library is. Shared libraries are actually shared (thus the name) by many programs. Take printf. Most UNIX programs use printf, and there is no need for a separate copy of printf in memory for each of them. When a program is linked to the shared version of the standard C library, the system can check whether libc was already loaded into memory: if it was, the program uses the already loaded printf; if it wasn't, the system loads it. Therefore, ten programs using libc don't imply ten copies of libc in memory, just one.

When you execute a program, it is loaded into memory, and then the procedure associated with the name "main" is executed. Loading and calling the entry procedure is, in principle, a very simple process: the BOs involved are copied from disk (HDD, CD-ROM, flash drive, ...) to main memory, and execution of the "main" procedure starts.

For statically linked BOs, that is all there is to it. Since everything the program needs is inside it, no extra automatic loading is necessary. For BOs linked to shared libraries, before executing the program (if it has "main"), the system loads all the shared objects it needs that are not loaded already (this is a recursive process that covers the dependencies of every shared object involved). Once the program and all its dependencies are in memory, the program can start running.
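The automatic loading described above can be observed from inside a running C program. The following is only a minimal sketch, assuming Linux with glibc, where the C library's shared object is named libc.so.6. The helper names is_already_loaded and report_loaded are made up for this illustration. RTLD_NOLOAD is not part of POSIX, but glibc and several other systems provide it; it makes dlopen() report on a library without loading it. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the shared object `name` is already
   resident in this process, 0 otherwise. With RTLD_NOLOAD, dlopen() only
   succeeds when the library is already loaded and never loads it. */
int is_already_loaded(const char *name)
{
    void *handle = dlopen(name, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return 0;
    dlclose(handle); /* drop the reference dlopen() just took */
    return 1;
}

/* Hypothetical helper: prints one status line per library name. */
void report_loaded(const char *const names[], int count)
{
    for (int i = 0; i < count; i++)
        printf("%-12s %s\n", names[i],
               is_already_loaded(names[i]) ? "already loaded" : "not loaded");
}
```

In a program dynamically linked against the C library, is_already_loaded("libc.so.6") should report the library as resident as soon as main starts, without the program ever having asked for it, while a shared object the program is not linked against is reported as not loaded.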

Dynamic loading

Another type of loading is dynamic loading. Dynamic loading is to loading what dynamic memory allocation is to the creation of auto variables in a C program. You can get memory on demand, at runtime, in a C program by using malloc(): you say how many bytes you want, and you get them if possible; when you don't need them anymore, you call free(). Dynamic loading is similar. You can load a BO into memory on demand, given its name (man 3 dlopen), get the value of something in it given its name (man 3 dlsym), and close it once you won't use it anymore (man 3 dlclose).
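To make the three calls concrete, here is a small C sketch of the open / look up / close cycle. It assumes Linux with glibc, where the math library's shared object is named libm.so.6. The helper name call_from_library and the unary_fn type are invented for this example. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature we assume the looked-up function has (true for cos, sin, ...). */
typedef double (*unary_fn)(double);

/* Hypothetical helper: loads `lib`, looks up `name` in it, calls it with `x`
   and stores the result in *out. Returns 0 on success, -1 on failure. */
int call_from_library(const char *lib, const char *name, double x, double *out)
{
    void *handle = dlopen(lib, RTLD_NOW);   /* like malloc(): ask on demand */
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                               /* clear any stale error state */
    unary_fn fn;
    *(void **)(&fn) = dlsym(handle, name);   /* ask for the value of a name */
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn(x);
    dlclose(handle);                         /* like free(): done with it */
    return 0;
}
```

Calling call_from_library("libm.so.6", "cos", 0.0, &r) stores the cosine of 0 in r, even though the program was never linked against the math library. The odd-looking cast on the dlsym line is the idiom the POSIX documentation suggests for turning the returned void pointer into a function pointer.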

One way to support plug-ins in a piece of software is with dynamically loaded libraries, which can be, by the way, shared or static libraries. You could make an application that, given a BO, executes its "plugin_main" procedure on a certain application event. For example, I may want a mail client that lets a plugin execute something when an e-mail arrives. This could be done by looking for the name "plugin_mailarrive_event" in the plugin BO, getting its value, and executing it.
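The mail client idea could be sketched in C roughly like this. Everything except the hook name "plugin_mailarrive_event" is an assumption made for the illustration: the function fire_mail_arrived, the mail_hook signature (a hook that receives the subject line) and the return codes are all hypothetical. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature for the hook: it receives the subject of the new mail. */
typedef void (*mail_hook)(const char *subject);

/* Hypothetical dispatcher for the "mail arrived" event.
   Returns  1 if the plugin defined the hook and it was run,
            0 if the plugin loaded but does not define the hook,
           -1 if the plugin could not be loaded at all. */
int fire_mail_arrived(const char *plugin_path, const char *subject)
{
    void *plugin = dlopen(plugin_path, RTLD_NOW | RTLD_LOCAL);
    if (plugin == NULL) {
        fprintf(stderr, "cannot load plugin: %s\n", dlerror());
        return -1;
    }

    /* Look for the hook by name; a plugin is free not to define it. */
    mail_hook hook;
    *(void **)(&hook) = dlsym(plugin, "plugin_mailarrive_event");

    int ran = 0;
    if (hook != NULL) {
        hook(subject);
        ran = 1;
    }

    dlclose(plugin);
    return ran;
}
```

Treating a missing hook as a normal outcome rather than an error is a design choice: it lets one plugin react to some events and ignore others, so the application can call the dispatcher for every plugin on every event.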

In general, when you load a BO into memory for a program, all the names in the loaded BO become available to the program. With automatic loading of shared libraries, or with a statically linked BO, you don't have to ask for the value of a "name" given a BO, but with dynamic loading you do.

As a side note, do not confuse dll files with dynamic loading. The dll file format is the one Windows uses for shared objects. "DLL" (the file format) stands for dynamic-link library, and dynamic loading is commonly abbreviated as dll too, standing for dynamically loaded library. A dynamically loaded library may be static or shared (a dll file may be used as a dll or not =}).

Chicken

Now that linking/loading is explained, let's get to Chicken.

Linkage on Chicken

All that Chicken links to your program is shared objects (it is possible to do static linking, but I won't discuss it here). Here is the result of ldd $(which scmdefs).

linux-vdso.so.1 => (0x00007fffefd0c000)
libchicken.so => /usr/lib/libchicken.so (0x00007f2d13062000)
libm.so.6 => /lib/libm.so.6 (0x00007f2d12de0000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f2d12bdc000)
libc.so.6 => /lib/libc.so.6 (0x00007f2d1287e000)
/lib/ld-linux-x86-64.so.2 (0x00007f2d13638000)

There, only libchicken.so has definitions that my scmdefs uses (scmdefs is a program written in Scheme and compiled with csc). All the other shared BOs are not directly connected to my program.

I am not so sure of this, but it seems that no matter how many dependencies your scheme program has, Chicken will only link libchicken.so to it. This SO (shared object) has the base definitions of Chicken (stuff like add1, +, map, vector-ref and others is there).

Later I will talk a little more about linkage. That is all for now. What I wanted to get across here is that the only "Scheme dependency" Chicken links to your program is the SO libchicken.so. All the other SOs it links are not directly related to your Scheme program.

Chicken modules and libraries

It seems that all Scheme dependencies are dynamically loaded. In Chicken, modules are as important as the library files where they're defined. For that reason, I am going to talk a little about Chicken's module system here.

Before I go on, it is important to mention that I'll be talking about user defined modules and user created libraries, which is everything except what is defined in the libchicken.so and what is created using the (old and deprecated) units system.

Chicken modules can be understood as what libraries or packages are in other languages. There seems to be some confusion about the many operators (syntactic or procedural) related to modules: use, import, require-library, require, and others. Although I am not going to explain specifically what each of those does, I'll explain enough for you to fill in the missing pieces yourself by reading the documentation (installing the chicken-doc egg is very useful here).

Creating a module can be easily done by putting the code

(module modname * <body>)

in a Scheme source file, modname.scm. Here <body> stands for all the module's definitions, which would otherwise be at toplevel. For the sake of explanation, let's suppose the module modname defines the names A, B and C.

To compile a file containing a module, you can do this:

csc -J -dynamic modname.scm

This will generate the SO for the modname.scm source file, modname.so, and it will also generate Scheme source code that can be compiled too: modname.import.scm (this source file will be explained later). Since that source file has no module definition, it doesn't need the -J option.

csc -dynamic modname.import.scm

The first command tells csc that modname.scm should be compiled as a SO (-dynamic), and that it should generate code so that other modules can import the definitions of all the modules in modname.scm (-J). modname.import.scm contains that generated code. The import command uses it to import a module into the current environment (I'll come back to this later).

The second command compiles that generated code as a SO.

After executing both commands, you'll have modname.so and modname.import.so in your current folder.

Loading libraries, and importing modules

In Chicken, the loading process and the module system are somewhat connected.

When you load a library (done with the require command), you load all the modules defined in that library (you also load everything else that is in there, but I'm focusing on the modules). Libraries have to be loaded at runtime (dynamic loading).

So, if you want to load your modname module, you'd first have to do

(require 'modname)

This will cause Chicken to look for modname.so or modname.scm (in this order) in the current folder or in the include path. If it finds one of them, it is loaded into memory.

However, when you create a module, say modname, all names in it, once processed by the module system, become modname#*the original name*. So A becomes modname#A, and B becomes modname#B. That naming scheme is there to avoid name conflicts (more than one module may define the names A, B and C). While you're creating your module, you don't have to, and shouldn't, worry about this, because it all happens automatically. The way you're supposed to use modules is this.

Inside a module, its bindings are visible. Outside the module, its bindings are visible only if imported, and for those imported bindings to actually work, you have to load the library that defines the module.

If you want the module's names to be IMPORTED into the current syntactic environment, you have to do

(import modname)

This command creates bindings in the current environment, binding each name the module exports to the module name, followed by the character #, followed by the name itself: A => modname#A. So,

(import modname)

would generate the bindings A => modname#A, B => modname#B and C => modname#C. Note that you shouldn't rely on this naming convention that Chicken currently uses for user created modules (I am not sure if it has always been this way, or if it will stay this way).

The require-library command is pretty much a wrapper around the require command. In many cases (require 'ID) is equivalent to (require-library ID): require-library is a macro that calls require. Check out the docs for the exact specification of require-library.

Note that, normally, there will be one module per file, but that is not always the case. The require command looks for files, not modules. If a file X.so defines the modules A and B, you could do this (given that you generated and compiled the import files for the modules):

(require 'X) (import A) (import B)

The use command is an alias for the require-extension command, which requires and imports.

(use id)

is pretty much the same as

(require 'id) (import id)

libchicken.so

All I wrote so far is perfectly true for user defined modules and libraries. But for the built in libraries, which seem to have been built using the units system, little of what I said still holds. You still use the commands require and import, but there are some changes.

Built in libraries (here I also call them library units) are all in libchicken.so. These library units were created before the modules system came around. So when it appeared, in order for people to be able to use only the modules system, it seems that chicken wrapped the modules system around the system used to build the libraries, which I believe to be the units system (more on the relation between library units and modules later).

Normally, when you require some symbol, the file whose name is the symbol's string concatenated with ".so" is loaded from some place. That is not what happens for the modules that map to library units. Talking to zbignew and sjamaan on #chicken@irc.freenode.com, I was told that some library units require initializing before they are available. When you require a pre-defined module, you may run the initialization code for zero or more library units (the scheme module isn't associated with any library unit, and the chicken module is associated with three library units).

The important thing here is to know which library units, and more importantly which of the modules that map to them, require initializing. The modules scheme, chicken, extras, data-structures and ports do not require initializing (i.e. they are not associated with library units that require initializing), so an import call is all they need. The rest have to be loaded with the require command.

About internal representation of pre-defined modules, I wouldn't suppose they're actually modules. It doesn't matter though. All you have to actually worry about is explained in the previous paragraph.

The scheme pre-defined module

With the exception of the scheme pre-defined module, all modules are real modules (composed of a module definition and the import code). The scheme module is composed only of the import code. If you try to require something twice, the first require call will succeed, and the second will just do nothing. If you try to require the scheme module, you will get an error, because there is no library associated with it.

Curiosity

Out of curiosity, try poking around to see which names are defined in libchicken.so. You can use the strings Unix command line tool to get started.

strings /usr/lib/libchicken.so | grep -i 'pointer->object'

If you are more interested in these internal "affairs", take a look at Chicken's source code.

Deployment on Chicken

If you read and understood what I wrote so far, then you are ready to create your own libraries with as many modules as you want. Which means that you're golden. Given that you know how to implement what you want, all that is left now is programming, and that is the best part. But before that, there are a few details that you need to know.

Deploying in Chicken is very easy once you have all of that in your head.

Executing

csc -deploy program.scm

will create the folder program with two files: program, which is the main program file (the one that should be executed) and libchicken.so.

However, your program dynamically loads other libraries through calls to require. And since require looks in the current folder to find the SO file, all you have to do is put all the SO files your program needs in the program folder: normally those are the libraries you created plus the libraries of the extensions (eggs) you use. What I believe is best is to have all your libraries installed as eggs (check out the documentation on how to create and use extensions, and on how to use chicken-install).

If you created a library, then you have its SO file. Just copy it into the folder and you're done. But if you depend on an egg, you can "deploy" it, which will install the egg in your program folder instead of installing it in the Chicken extensions folder. To deploy an egg, all you have to do is this.

chicken-install -l $(csi -e "(display (repository-path))") -t local -deploy -p *program folder* *name*

Where *name* is the egg name and *program folder* is the folder the program is in. For this to work, you need to have installed the egg with the -keep option, which keeps things like the egg's source code, installation files, etc. This is needed because chicken-install won't just copy SO files: it will build them from source for you.

If you have access to the internet, then you can do this (it is better to deploy the eggs that you used for development, unless you can guarantee that the version you're downloading and the version you used are the same).

chicken-install -deploy -p *program folder* *name*

Using this guide

This guide, by itself, is very incomplete. Use it as a complement to the Chicken documentation.

Credits

This guide was written by Pedro Henrique Antunes de Oliveira. My e-mail is phao1989 at gmail dot com.