2. Platforms

 

You cannot have a science without measurement.

 R. W. Hamming

Building executables from C source code is a complex task. An innocent-looking call to gcc invokes a pre-processor, a multi-pass compiler, an assembler and finally a linker. Using all these tools to plant virus code into another executable makes the result either prohibitively large or very dependent on the completeness of the target installation.
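To make the hidden stages visible, here is a minimal sketch that spells out one gcc call as four separate commands. It only builds the command lines and runs nothing. The file name hello.c and the helper name gcc_stage_commands are hypothetical; the options -E, -S and -c are the standard gcc switches to stop after pre-processing, compiling and assembling.

```python
def gcc_stage_commands(source="hello.c"):
    """Spell out the stages hidden behind a plain `gcc hello.c -o hello`.

    The file name is a hypothetical example; nothing is executed here.
    """
    stem = source.rsplit(".", 1)[0]
    return [
        ["gcc", "-E", source, "-o", stem + ".i"],       # pre-processor only
        ["gcc", "-S", stem + ".i", "-o", stem + ".s"],  # compiler proper, emits assembly
        ["gcc", "-c", stem + ".s", "-o", stem + ".o"],  # assembler, emits an object file
        ["gcc", stem + ".o", "-o", stem],               # linker, emits the executable
    ]


if __name__ == "__main__":
    for argv in gcc_stage_commands():
        print(" ".join(argv))
```

Running gcc with option -v on a real source file shows the programs it actually starts for each stage.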

Real viruses approach the problem from the other end. They are aggressively optimized for code size and do only what's absolutely necessary. Basically they just copy one chunk of code and patch a few addresses at hard-coded offsets.

However, this has drastic effects:

There are ways to circumvent these limitations. But they are complicated and make the virus more likely to fail.

2.1. Executable and linkable format

Another natural limitation of viruses is their rigid dependency on the file format of target executables. These formats differ a lot, even on the same hardware architecture and under the same operating system. Furthermore, executables are not designed with post link-time modifications in mind. It's rare for a virus to support more than one infection method. This document is about the format used on recent versions of Linux, FreeBSD and Solaris. [1]

2.1.2. Viewers

GNU binutils provides two utilities to view ELF headers, objdump and readelf. [9] The functionality of the two tools overlaps, but I think the output of readelf is nicer. On Solaris the native tools for this purpose are called dump and elfdump.
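To get a feeling for what these viewers read, here is a minimal sketch that decodes the first few fields of an ELF header. It is in no way a replacement for readelf. The function name parse_elf_header and the small name tables are my own choice and cover only the platforms mentioned in this document; field offsets and constants follow the generic ELF layout (16 bytes of e_ident, then e_type, e_machine, e_version and e_entry).

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}                 # e_ident[EI_CLASS]
DATA_NAMES = {1: "little endian", 2: "big endian"}     # e_ident[EI_DATA]
TYPE_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
# Deliberately incomplete: only the machines discussed in this document.
MACHINE_NAMES = {2: "SPARC", 3: "Intel 80386", 43: "SPARC v9", 0x9026: "Alpha"}


def parse_elf_header(data):
    """Decode class, byte order, type, machine and entry point from raw bytes."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in CLASS_NAMES or ei_data not in DATA_NAMES:
        raise ValueError("unknown ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64.
    fmt = endian + "HHI" + ("I" if ei_class == 1 else "Q")
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "class": CLASS_NAMES[ei_class],
        "data": DATA_NAMES[ei_data],
        "type": TYPE_NAMES.get(e_type, hex(e_type)),
        "machine": MACHINE_NAMES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        header = parse_elf_header(handle.read(64))
    for key, value in header.items():
        print(key, hex(value) if key == "entry" else value)
```

Called with the path of an executable as its only argument, the script prints a small subset of what readelf -h reports.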

2.2. Assembly language documentation

ELF is used for a variety of both 32-bit and 64-bit architectures. Obviously you need to handle the assembly language of each platform. A good starting point is "Linux Assembly" [10] and "Assembly Language Related Web Sites". [11]

2.2.1. alpha

Introduction to Alpha [12]
Alpha Assembly Language Guide [13]
Assembly Language Programmer's Guide [14]

2.2.2. i386

Assembly-HOWTO. [15] Description of tools and sites for Linux.
FAQ of comp.lang.asm.x86 [16]
"Robin Miyagi's Linux Programming" [17] features a tutorial and interesting links.
"Assembly resources" [18] covers advanced topics.
IA-32 Intel Architecture Software Developer's Manual [19]
"The Place on the Net to Learn Assembly Language Programming" [20]
The Art of Assembly Language. 32-bit Linux Edition Featuring HLA. [21]
X86 Architecture, low-level programming, freeware [22]
Dr. Dobb's Microprocessor Resources [23]
FreeBSD Assembly Language Tutorial [24]

2.2.3. sparc

SPARC Standards Documents Depository [25]
SPARC Assembly Language Reference Manual [26]
A Laboratory Manual for the SPARC [27]
SPARC technical links [28]

2.3. Assemblers and disassemblers

A debugger lets you see what is going on "inside" another program while it executes. gdb can also show a plain disassembly of the code, and can do so without executing a single instruction. This listing does not include a hex dump of opcodes, however. Pure disassemblers, on the other hand, take shortcuts; they don't have a complete picture of the target executable.

objdump is part of GNU binutils. It is advertised as a means to display information from object files, but it also works on executables and provides the option --disassemble. Since it does not resolve function names in shared libraries, it cannot fully replace gdb, though.

By default all GNU disassembly tools adhere to the syntax of the GNU assembler. Veterans of i386 programming consider this style repulsive, however. gdb provides the command set disassembly-flavor intel to lower the contrast, and objdump has the option -Mintel for similar effect. Still, I prefer ndisasm [29] on i386 and will use it where possible. This tool has absolutely no understanding of ELF (or any other file format). But for the scope of this document this is a feature. The calculations necessary to get at the interesting bytes are interesting themselves.
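The central calculation of this kind is the translation of a virtual address into a file offset. The following is a minimal sketch, assuming the program headers have already been read by other means. The Segment tuple, the function name vaddr_to_offset and all numbers are hypothetical illustrations and not taken from a real executable; only the field names p_offset, p_vaddr and p_filesz are those of the ELF program header.

```python
from collections import namedtuple

# One loadable segment, reduced to the three program header fields needed here.
Segment = namedtuple("Segment", "p_offset p_vaddr p_filesz")


def vaddr_to_offset(segments, vaddr):
    """Translate a virtual address to a file offset via the segment containing it."""
    for seg in segments:
        if seg.p_vaddr <= vaddr < seg.p_vaddr + seg.p_filesz:
            return vaddr - seg.p_vaddr + seg.p_offset
    raise ValueError("address 0x%x is not backed by file contents" % vaddr)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    segments = [
        Segment(p_offset=0x000, p_vaddr=0x08048000, p_filesz=0x500),
        Segment(p_offset=0x500, p_vaddr=0x08049500, p_filesz=0x100),
    ]
    entry = 0x08048080
    print("file offset of entry point: %d" % vaddr_to_offset(segments, entry))
```

The resulting offset is the number of bytes ndisasm has to skip (option -e), while option -o sets the address shown in the first column of the listing.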

In this document input for assemblers (including nasm) is stored in .S files. A traditional cc treats such a file as "assembler code which must be preprocessed by cpp". This is required on platform alpha, where symbolic names for registers are not part of the assembly language. Output of disassemblers ends up in .asm files.

2.4. Be fertile and reproduce

The primary quality of this document is reproducibility. Every tiny bit of information should be proved by a working example. Since I don't trust myself, all output files are rebuilt for every release. All sections titled "Output" are the real product of source code and shell scripts included in this document. Most numbers and calculations are processed by a Perl script parsing these output files.

The document itself is written in DocBook, [30] an XML document type definition. [31] Conversion to HTML is the last step of a Makefile that builds and runs all examples. However, this means that I can't provide one document comparing two platforms. Instead I set up everything for conditional compilation. I then build one consistent variation of the document on a single system.

You are now reading the platform-independent part. The links below lead to actual examples, and the actual story of constantly improving technique. This part continues with general topics and larger chunks of source code. It is a bit like a huge appendix, since the platform parts frequently refer to chapters here.

2.5. i386-redhat8.0-linux

2.6. sparc-debian2.2-linux

2.7. sparc-sunos5.9

Notes

[1]

All examples for Solaris use the value returned by uname(2) as system name, i.e. "SunOS". And the version numbers as told by marketing make little sense. See http://www.ocf.berkeley.edu/solaris/versions/

[2]

A nice introduction for the uninitiated is http://www.tldp.org/LDP/tlk/kernel/processes.html#tth_sEc4.8

[3]

Present on Linux (part of glibc), FreeBSD and SunOS.

[4]

Canonical Postscript document: ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz
A flat-text version: http://www.muppetlabs.com/~breadbox/software/ELF.txt

[5]

http://www.linuxbase.org/spec/gLSB/gLSB/tocobjformat.html

[6]

http://www.netbsd.org/Documentation/elf.html

[7]

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

[8]

This means file vmlinux. vmlinuz is compressed and prefixed with a boot-sector. See http://www.tldp.org/LDP/tlk/kernel/processes.html#tth_sEc4.8

[9]

readelf has been included only since version 2.10 of GNU binutils and is missing on old distributions like SuSE 6.0. This might be the reason that Silvio Cesare does not mention readelf anywhere in his classic works.

[10]

http://linuxassembly.org/

[11]

http://www2.dgsys.com/~raymoon/asmlinks.html

[12]

http://www.cs.hut.fi/~cessu/compilers/alpha-intro.html

[13]

http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15213-f98/doc/alpha-guide.pdf

[14]

http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/HTML/AA-PS31D-TET1_html/TITLE.html

[15]

http://www.tldp.org/HOWTO/Assembly-HOWTO

[16]

http://www2.dgsys.com/~raymoon/x86faqs.html

[17]

http://www.geocities.com/SiliconValley/Ridge/2544/

[18]

http://www.agner.org/assem/

[19]

http://developer.intel.com/design/pentium4/manuals/245470.htm

[20]

http://webster.cs.ucr.edu/index.html

[21]

http://webster.cs.ucr.edu/Page_AoALinux/0_AoAHLA.html

[22]

http://www.goosee.com/x86/

[23]

http://www.x86.org/

[24]

http://www.int80h.org

[25]

http://www.sparc.com/standards.html

[26]

http://docs.sun.com/?p=/doc/816-1681

[27]

http://www.cs.unm.edu/~maccabe/classes/341/labman/labman.html

[28]

http://www.users.qwest.net/~eballen1/sparc.tech.links.html

[29]

http://sourceforge.net/projects/nasm/

[30]

http://docbook.sourceforge.net

[31]

http://xml.coverpages.org/general.html#overview