Linux Assembly HOWTO

Konstantin Boldyshev and François-René Rideau

v0.5i, May 4, 2000


This is the Linux Assembly HOWTO. This document describes how to program in assembly language using FREE programming tools, focusing on development for or from the Linux Operating System, mostly on the IA32 (i386) platform. The material included may or may not be applicable to other hardware and/or software platforms; contributions about them will be gladly accepted. Keywords: assembly, assembler, asm, inline asm, macroprocessor, preprocessor, 32-bit, IA32, i386, x86, gas, as86, nasm, OS, kernel, system, libc, system call, interrupt, small, fast, embedded, hardware, port

1. INTRODUCTION

2. DO YOU NEED ASSEMBLY?

3. ASSEMBLERS

4. METAPROGRAMMING/MACROPROCESSING

5. CALLING CONVENTIONS

6. QUICK START

7. RESOURCES


1. INTRODUCTION

1.1 Legal Blurb

Copyright © 1999-2000 Konstantin Boldyshev.

Copyright © 1996-1999 François-René Rideau.

This document may be distributed only subject to the terms and conditions set forth in the LDP License. It may be reproduced and distributed in whole or in part, in any medium physical or electronic, provided that this license notice is displayed in the reproduction. Commercial redistribution is permitted and encouraged.

All modified documents, including translations, anthologies, and partial documents, must meet the following requirements:

1.2 Foreword

This document aims to answer questions from those who program, or want to program, 32-bit x86 assembly using free software, particularly under the Linux operating system. It also points to other documents about non-free, non-x86, or non-32-bit assemblers, although this is not its primary goal.

Because the main interest of assembly programming is to build the guts of operating systems, interpreters, compilers, and games, where the C compiler fails to provide the needed expressiveness (performance is less and less often an issue), we focus on the development of such software.

1.3 Important Note

This is an interactively evolving document: you are especially invited to ask questions, to answer questions, to correct given answers, to give pointers to new software, and to point the current maintainer to bugs or deficiencies in the pages. In a word, contribute!

To contribute, please contact the Assembly-HOWTO maintainer. At the time of this writing, it is Konstantin Boldyshev, no longer François-René Rideau. I (Faré) had been looking for some time for a serious hacker to replace me as maintainer of this document, and am pleased to announce Konstantin as my worthy successor.

How to use this document

This document contains answers to some frequently asked questions. In many places, Uniform Resource Locators (URLs) are given for some software or documentation repository. Please note that the most useful repositories are mirrored, and that by accessing a nearer mirror site, you relieve the whole Internet of unneeded network traffic while saving your own precious time. In particular, there are large repositories all over the world that mirror other popular repositories. You should learn and note which of those places are near you (network-wise). Sometimes the list of mirrors is given in a file, or in a login message; please heed the advice. Otherwise, you should ask archie about the software you're looking for...

The most recent official version of this document is available from the Linux Assembly and LDP sites. If you are reading a copy that is a few months old, please check those sites for a new version.

Other related documents

1.4 History

Each version includes a few fixes and minor corrections that need not be mentioned every time.

Version 0.5i 04 May 2000

Added HLA, TALC; rearrangements in RESOURCES, QUICK START, ASSEMBLERS; few new pointers

Version 0.5h 09 Apr 2000

finally managed to state LDP license on document, new resources added, misc fixes

Version 0.5g 26 Mar 2000

new resources on different CPUs

Version 0.5f 02 Mar 2000

new resources, misc corrections

Version 0.5e 10 Feb 2000

url updates, changes in GAS example

Version 0.5d 01 Feb 2000

RESOURCES (former POINTERS) section completely redone, various url updates.

Version 0.5c 05 Dec 1999

New pointers, updates and some rearrangements. Rewrite of sgml source.

Version 0.5b 19 Sep 1999

Discussion about libc or not libc continues. New web pointers and overall updates.

Version 0.5a 01 Aug 1999

"QUICK START" section rearranged, added GAS example. Several new web pointers.

Version 0.5 25 July 1999

GAS has 16-bit mode. New maintainer (at last): Konstantin Boldyshev. Discussion about libc or not libc. Added section "QUICK START" with examples of using assembly.

Version 0.4q 22 June 1999

process argument passing (argc,argv,environ) in assembly. This is yet another "last release by Faré before new maintainer takes over". Nobody knows who might be the new maintainer.

Version 0.4p 6 June 1999

clean up and updates.

Version 0.4o 1 December 1998

*

Version 0.4m 23 March 1998

corrections about gcc invocation

Version 0.4l 16 November 1997

release for LSL 6th edition.

Version 0.4k 19 October 1997

*

Version 0.4j 7 September 1997

*

Version 0.4i 17 July 1997

info on 16-bit mode access from Linux.

Version 0.4h 19 Jun 1997

still more on "how not to use assembly"; updates on NASM, GAS.

Version 0.4g 30 Mar 1997

*

Version 0.4f 20 Mar 1997

*

Version 0.4e 13 Mar 1997

Release for DrLinux

Version 0.4d 28 Feb 1997

Vapor announce of a new Assembly-HOWTO maintainer.

Version 0.4c 9 Feb 1997

Added section "DO YOU NEED ASSEMBLY?"

Version 0.4b 3 Feb 1997

NASM moved: now comes before AS86

Version 0.4a 20 Jan 1997

CREDITS section added

Version 0.4 20 Jan 1997

first release of the HOWTO as such.

Version 0.4pre1 13 Jan 1997

text mini-HOWTO transformed into a full linuxdoc-sgml HOWTO, to see what the SGML tools are like.

Version 0.3l 11 Jan 1997

*

Version 0.3k 19 Dec 1996

What? I had forgotten to point to terse???

Version 0.3j 24 Nov 1996

point to French translated version

Version 0.3i 16 Nov 1996

NASM is getting pretty slick

Version 0.3h 6 Nov 1996

more about cross-compiling -- See on sunsite: devel/msdos/

Version 0.3g 2 Nov 1996

Created the History. Added pointers in cross-compiling section. Added section about I/O programming under Linux (particularly video).

Version 0.3f 17 Oct 1996

*

Version 0.3c 15 Jun 1996

*

Version 0.2 04 May 1996

*

Version 0.1 23 Apr 1996

Francois-Rene "Faré" Rideau <fare@tunes.org> creates and publishes the first mini-HOWTO, because "I'm sick of answering ever the same questions on comp.lang.asm.x86"

1.5 Credits

I would like to thank the following persons, in order of appearance:


2. DO YOU NEED ASSEMBLY?

Well, I wouldn't want to interfere with what you're doing, but here is some advice from hard-earned experience.

2.1 Pros and Cons

The advantages of Assembly

Assembly can express very low-level things:

The disadvantages of Assembly

Assembly is a very low-level language (the lowest above hand-coding the binary instruction patterns). This means

Assessment

All in all, you might find that though using assembly is sometimes needed, and might even be useful in a few cases where it is not, you'll want to:

Even in cases when assembly is needed (e.g. OS development), you'll find that not so much of it is, and that the above principles hold.

See the Linux kernel sources concerning this: as little assembly as needed, resulting in a fast, reliable, portable, maintainable OS. Even a successful game like DOOM was written almost entirely in C, with only a tiny part written in assembly for speed.

2.2 How to NOT use Assembly

General procedure to achieve efficient code

As Charles Fiterman says on comp.compilers about human vs. computer-generated assembly code:

" The human should always win and here is why.

The human wins because he can use the machine. "

Languages with optimizing compilers

Languages like Objective CAML, SML, Common Lisp, Scheme, Ada, Pascal, C, and C++, among others, all have free optimizing compilers that will optimize the bulk of your programs, often better than hand-coded assembly even for tight loops. They let you focus on higher-level details, without forbidding you to grab a few percent of extra performance in the above-mentioned way once you've reached a stable design. Of course, there are also commercial optimizing compilers for most of these languages, too!

Some languages have compilers that produce C code, which can be further optimized by a C compiler: LISP, Scheme, Perl, and many others. The resulting speed is fairly good.

General procedure to speed your code up

As for speeding code up, you should do it only for parts of a program that a profiling tool has consistently identified as being a performance bottleneck.

Hence, if you identify some code portion as being too slow, you should

Finally, before you end up writing assembly, you should inspect the generated code to check that the problem really is bad code generation, as this might not be the case at all: compiler-generated code might be better than what you'd have written, particularly on modern multi-pipelined architectures! Slow parts of a program might be intrinsically so. The biggest problems on modern architectures with fast processors are due to delays from memory access, cache misses, TLB misses, and page faults; register optimization becomes useless there, and you'll more profitably rethink data structures and threading to achieve better locality in memory access. Perhaps a completely different approach to the problem might help, then.
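To make the locality point concrete, here is a minimal C sketch (the matrix size and the function names are arbitrary, chosen only for illustration). Both functions compute the same sum over a matrix that C stores row by row; the first walks memory sequentially, while the second jumps a whole row ahead on every step, which on typical cached hardware causes many more cache misses. No assembly is involved: any speed difference comes purely from the access pattern.

```c
#include <stddef.h>

#define N 512  /* illustrative size, large enough to exceed a small L1 cache */

/* C arrays are row-major: matrix[i][0..N-1] are contiguous in memory. */
static double matrix[N][N];

/* Fill with small integers so that every partial sum is exact in a double. */
void fill_matrix(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            matrix[i][j] = (double)((i + j) % 7);
}

/* Sequential traversal: each fetched cache line is fully used. */
double sum_row_major(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += matrix[i][j];
    return s;
}

/* Same data, strided traversal: each step moves N * sizeof(double) bytes,
   touching a different cache line (and often a different page). */
double sum_col_major(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += matrix[i][j];
    return s;
}
```

Timing the two sums with a profiler is the kind of measurement the procedure above calls for; the ratio you observe will depend on your CPU and cache sizes.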

Inspecting compiler-generated code

There are many reasons to inspect compiler-generated assembly code. Here is what you'll do with such code:

The standard way to have assembly code be generated is to invoke your compiler with the -S flag. This works with most Unix compilers, including the GNU C Compiler (GCC), but YMMV. As for GCC, it will produce more understandable assembly code with the -fverbose-asm command-line option. Of course, if you want to get good assembly code, don't forget your usual optimization options and hints!
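As a minimal illustration, assume the function below is saved in a file named dot.c (a hypothetical name). Running `gcc -O2 -S -fverbose-asm dot.c` writes the generated assembly to dot.s instead of producing an object file, so you can check how the loop was compiled before deciding whether hand-written assembly could do any better.

```c
/* dot.c: a small, hot-looking loop worth inspecting.
 *
 * Generate annotated assembly instead of an object file:
 *     gcc -O2 -S -fverbose-asm dot.c        (output goes to dot.s)
 * Compare with an unoptimized build to see what -O2 buys you:
 *     gcc -O0 -S dot.c -o dot-O0.s
 */
long dot(const int *a, const int *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)a[i] * b[i];  /* widen before multiplying */
    return sum;
}
```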

2.3 Linux and assembly

In general, you don't need to use assembly language in Linux programming. Unlike DOS, you do not have to write Linux drivers in assembly (well, actually you can if you really want to). And with modern optimizing compilers, if you care about speed optimization for different CPUs, it's much simpler to write in C. However, if you're reading this, you might have some reason to use assembly instead of C/C++.

You may need to use assembly, or you may want to use assembly. In short, the main practical reasons why you may need to get into Linux assembly are small code and libc independence. The non-practical (and most frequent) reason is simply being an old crazy hacker with a twenty-year-old habit of doing everything in assembly language.
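As a sketch of what libc independence means in practice, the hypothetical helper raw_write below issues the Linux write system call directly with GCC inline assembly instead of calling libc's write(). It assumes Linux on IA32 (`int $0x80`, call number 4) or x86-64 (the `syscall` instruction, call number 1); on any other CPU this sketch simply falls back to libc so that it still builds. A truly libc-free program would also need its own startup code, which is not shown here.

```c
/* Fallback only: non-x86 targets use libc's write(). */
#if !defined(__i386__) && !defined(__x86_64__)
#include <unistd.h>
#endif

/* raw_write (hypothetical helper): write(2) without going through libc.
   Returns the number of bytes written, or a negative errno value. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
#if defined(__x86_64__)
    /* x86-64 Linux: call number in rax (1 = write), args in rdi, rsi, rdx.
       The syscall instruction itself clobbers rcx and r11. */
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)
                          : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
#elif defined(__i386__)
    /* IA32 Linux: call number in eax (4 = write), args in ebx, ecx, edx. */
    __asm__ __volatile__ ("int $0x80"
                          : "=a"(ret)
                          : "a"(4L), "b"(fd), "c"(buf), "d"(len)
                          : "memory");
#else
    ret = (long)write(fd, buf, len);
#endif
    return ret;
}
```

For example, `raw_write(1, "ok\n", 3)` prints "ok" on standard output and returns 3.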

Also, if you're porting Linux to some embedded hardware, you can be quite short on space for the whole system: you need to fit the kernel, libc and all the (file|find|text|sh|etc.) utilities into several hundred kilobytes, and every kilobyte counts. So one of your options is to rewrite some (or all) parts of the system in assembly, which can really save you a lot of space. For instance, a simple httpd written in assembly can take less than 600 bytes; you can fit a webserver, consisting of the kernel and httpd, in 400 KB or less... Think about it.


3. ASSEMBLERS

3.1 GCC Inline Assembly

The well-known GNU C/C++ Compiler (GCC), an optimizing 32-bit compiler at the heart of the GNU project, supports the x86 architecture quite well, and includes the ability to insert assembly code in C programs, in such a way that register allocation can be either specified or left to GCC. GCC works on most available platforms, notably Linux, *BSD, VSTa, OS/2, *DOS, Win*, etc.
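A minimal sketch of both styles, assuming an x86 (IA32 or x86-64) target and falling back to plain C elsewhere; the function names are illustrative only. In add_any, the `"+r"` and `"r"` constraints leave the choice of general-purpose registers to GCC, whereas in add_fixed the `"+a"` and `"c"` constraints pin the operands to %eax and %ecx.

```c
/* Register allocation left to GCC: "r" means any general register,
   "+" marks an operand that is both read and written. */
int add_any(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T operand order: addl source, destination. */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;  /* non-x86 fallback */
#endif
    return a;
}

/* Register allocation specified: "a" is %eax, "c" is %ecx.
   Explicit register names must be written with %% in extended asm. */
int add_fixed(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %%ecx, %%eax" : "+a"(a) : "c"(b) : "cc");
#else
    a += b;  /* non-x86 fallback */
#endif
    return a;
}
```

The "cc" clobber tells GCC that the instruction modifies the condition flags.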

Where to find GCC

The original GCC site is the GNU FTP site ftp://prep.ai.mit.edu/pub/gnu/gcc/, together with all released application software from the GNU project. Linux-configured and precompiled versions can be found in ftp://metalab.unc.edu/pub/Linux/GCC/. There are many FTP mirrors of both sites all around the world, as well as CD-ROM copies.

GCC development split into two branches some time ago (GCC 2.8 and EGCS), but they have merged back, and the current GCC webpage is http://gcc.cygnus.com.

Sources adapted to your favorite OS, and binaries precompiled for it, should be found at your usual FTP sites.

The most popular DOS port of GCC is named DJGPP, and can be found in directories of that name on FTP sites. See:

http://www.delorie.com/djgpp/

There is also a port of GCC to OS/2 named EMX, that also works under DOS, and includes lots of unix-emulation library routines. See around the following site: ftp://ftp-os2.cdrom.com/pub/os2/emx09c/.

Where to find docs for GCC Inline Asm

The GCC documentation comes as files in texinfo format. You can compile them with TeX and print the result, convert them to .info and browse them with emacs, convert them to .html, or convert them (with the right tools) to nearly whatever you like, or just read them as is. The .info files are generally found on any good installation of GCC.

The right section to look for is: C Extensions::Extended Asm::

Section Invoking GCC::Submodel Options::i386 Options:: might help too. Particularly, it gives the i386 specific constraint names for registers: abcdSDB correspond to %eax, %ebx, %ecx, %edx, %esi, %edi and %ebp respectively (no letter for %esp).
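For example, the string instruction `rep movsb` takes its source address in %esi, its destination in %edi and its byte count in %ecx, so a byte-copy sketch uses the S, D and c constraint letters to place its operands exactly there (on x86-64 the same letters select %rsi, %rdi and %rcx). copy_bytes is a hypothetical helper written for illustration, not a replacement for memcpy; non-x86 targets fall back to a plain loop.

```c
/* copy_bytes (hypothetical): copy n bytes from src to dst with rep movsb.
   The ABI guarantees the direction flag is clear on function entry,
   so the copy runs forward. */
void copy_bytes(void *dst, const void *src, unsigned long n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+D" = %edi/%rdi, "+S" = %esi/%rsi, "+c" = %ecx/%rcx;
       all three are advanced or decremented by the instruction. */
    __asm__ __volatile__ ("rep movsb"
                          : "+D"(dst), "+S"(src), "+c"(n)
                          :
                          : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
}
```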

The DJGPP Games resource (not only for game hackers) had a page specifically about assembly, but it's down. Its data have nonetheless been recovered on the DJGPP site, which contains a mine of other useful information: http://www.delorie.com/djgpp/doc/brennan/, and in the DJGPP Quick ASM Programming Guide.

GCC depends on GAS for assembling, and follows its syntax (see the section about GAS below); do mind that extended inline asm needs percent characters to be doubled (written %%) so that a single % is passed to GAS.

Find lots of useful examples in the linux/include/asm-i386/ subdirectory of the sources for the Linux kernel.

Invoking GCC to build proper inline assembly code

Because assembly routines from the kernel headers (and most likely your own headers, if you try making your assembly programming as clean as it is in the linux kernel) are embedded in extern inline functions, GCC must be invoked with the -O flag (or -O2, -O3, etc), for these routines to be available. If not, your code may compile, but not link properly, since it will be looking for non-inlined extern functions in the libraries against which your program is being linked! Another way is to link against libraries that include fallback versions of the routines.

Inline assembly can be disabled with -fno-asm, which will have the compiler die when using extended inline asm syntax, or else generate calls to an external function named asm() that the linker can't resolve. To counter such flag, -fasm restores treatment of the asm keyword.
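Note that -fno-asm only removes the plain `asm` keyword; the alternate spelling `__asm__` remains recognized, which is why system headers use it. A minimal sketch, assuming an x86 target for the `bswap` instruction and a plain C fallback elsewhere (swap32 is an illustrative name); it should still build with `gcc -fno-asm`:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. Spelled __asm__ so that it
   still compiles when the plain asm keyword is disabled. */
uint32_t swap32(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("bswap %0" : "+r"(x));  /* bswap leaves the flags alone */
#else
    x = (x >> 24) | ((x >> 8) & 0x0000ff00u)
      | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
    return x;
}
```

For example, `swap32(0x11223344u)` returns `0x44332211u`.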

More generally, good compile flags for GCC on the x86 platform are


        gcc -O2 -fomit-frame-pointer -W -Wall

-O2 is the right optimization level in most cases. Optimizing beyond it takes longer and yields code that is a lot larger but only a bit faster; such overoptimization might be useful for tight loops only (if any), which you may be doing in assembly anyway. When you really need strong compiler optimization for a few files, do consider using up to -O6.

-fomit-frame-pointer allows generated code to skip the stupid frame pointer maintenance, which makes code smaller and faster, and frees a register for further optimizations. It precludes the easy use of debugging tools (gdb), but when you use these, you just don't care about size and speed anymore anyway.

-W -Wall enables all warnings and helps you catch obvious stupid errors.

You can add a CPU-specific flag such as -m486 so that GCC will produce code better adapted to your precise computer. Note that modern GCC has -mpentium and similar flags (and PGCC has even more), whereas GCC 2.7.x and older versions do not. A good choice of CPU-specific flags can be found in the Linux kernel sources. Check the texinfo documentation of your current GCC installation for more.

-m386 will help optimize for size, hence also for speed on computers whose memory is tight and/or loaded, since big programs cause swapping, which more than counters any "optimization" intended by the larger code. In such settings, it might be useful to stop using C and instead use a language that favors code factorization, such as a functional language and/or FORTH, with a bytecode- or wordcode-based implementation.

Note that you can vary code generation flags from file to file, so performance-critical files will use maximum optimization, whereas other files will be optimized for size.

To optimize even more, the -mregparm=2 option and/or the corresponding function attribute might help, but might pose lots of problems when linking to foreign code, including the libc. There are ways to declare foreign functions correctly so that the right call sequences are generated, or you might want to recompile the foreign libraries to use the same register-based calling convention...
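A sketch of the function-attribute form, assuming a 32-bit x86 target (the regparm attribute is specific to IA32, so the hypothetical macro REGPARM2 applies it only there). The attribute belongs on the prototype that callers see, so that every caller passes the first two integer arguments in %eax and %edx instead of on the stack; code compiled without that declaration would still push them, which is exactly the linking hazard described above.

```c
/* Apply regparm(2) only where it exists; elsewhere expand to nothing. */
#if defined(__i386__)
#define REGPARM2 __attribute__((regparm(2)))
#else
#define REGPARM2
#endif

/* Prototype (normally in a shared header): on IA32, callers that see it
   pass a in %eax and b in %edx. */
REGPARM2 int scale_add(int a, int b);

/* Definition: must use the same convention as the prototype. */
REGPARM2 int scale_add(int a, int b)
{
    return 2 * a + b;
}
```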

Note that you can make these flags the default by editing the file /usr/lib/gcc-lib/i486-linux/2.7.2.3/specs or wherever that is on your system (better not add -W -Wall there, though). The exact location of the GCC specs files on your system can be found by asking gcc -v.

3.2 GAS

GAS is the GNU Assembler, which GCC relies upon.

Where to find it

Find it at the same place where you found GCC, in a package named binutils.

The latest version is available from HJLu at ftp://ftp.varesearch.com/pub/support/hjl/binutils/.

What is this AT&T syntax

Because GAS was invented to support a 32-bit unix compiler, it uses standard AT&T syntax, which closely resembles the syntax of standard m68k assemblers and is standard in the UNIX world. This syntax is no worse, no better than the Intel syntax; it's just different. When you get used to it, you find it much more regular than the Intel syntax, though a bit boring.
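To see AT&T operand order in action, here is a small inline-asm sketch assuming an x86 target (the function names are illustrative). The Intel-syntax instruction `sub eax, ecx` is written `subl %ecx, %eax` in AT&T syntax: source first and destination last, with % on register names, $ on immediates and a size suffix (l for 32 bits) on the mnemonic.

```c
/* AT&T: "subl src, dst" computes dst = dst - src.
   Intel syntax would write the same instruction as "sub dst, src". */
int sub_att(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("subl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a -= b;  /* non-x86 fallback */
#endif
    return a;
}

/* Immediates take a '$' prefix: this adds the constant 10. */
int add_ten(int a)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl $10, %0" : "+r"(a) : : "cc");
#else
    a += 10;  /* non-x86 fallback */
#endif
    return a;
}
```

If the operand order were reversed, `sub_att(10, 3)` would yield -7 instead of 7.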

Here are the major caveats about GAS syntax: