assembly, assembler, asm, inline asm, macroprocessor, preprocessor,
32-bit, IA32, i386, x86, gas, as86, nasm, OS, kernel, system, libc,
system call, interrupt, small, fast, embedded, hardware, port
Copyright © 1999-2000 Konstantin Boldyshev.
Copyright © 1996-1999 François-René Rideau.
This document may be distributed only subject to the terms and conditions set forth in the LDP License. It may be reproduced and distributed in whole or in part, in any medium physical or electronic, provided that this license notice is displayed in the reproduction. Commercial redistribution is permitted and encouraged.
All modified documents, including translations, anthologies, and partial documents, must meet the following requirements:
This document aims to answer the questions of those who program, or want to program, 32-bit x86 assembly using free software, particularly under the Linux operating system. It also points to other documents about non-free, non-x86, or non-32-bit assemblers, although this is not its primary goal.
Because the main interest of assembly programming is to build the guts of operating systems, interpreters, compilers, and games, where a C compiler fails to provide the needed expressiveness (performance is less and less often the issue), we focus on the development of that kind of software.
This is an interactively evolving document: you are especially invited to ask questions, to answer questions, to correct given answers, to give pointers to new software, and to point the current maintainer to bugs or deficiencies in the pages. In a word, contribute!
To contribute, please contact the Assembly-HOWTO maintainer. At the time of this writing, it is Konstantin Boldyshev and no longer François-René Rideau. I (Faré) had been looking for some time for a serious hacker to replace me as maintainer of this document, and am pleased to announce Konstantin as my worthy successor.
This document contains answers to some frequently asked questions. In many places, Uniform Resource Locators (URLs) are given for software or documentation repositories. Please note that the most useful repositories are mirrored, and that by accessing a nearer mirror site you spare the whole Internet unneeded network traffic while saving your own precious time. In particular, there are large repositories all over the world that mirror other popular repositories. You should learn and note which of those places are near you (networkwise). Sometimes the list of mirrors is given in a file or in a login message. Please heed the advice. Otherwise, you should ask archie about the software you're looking for...
The most recent official version of this document is available from the Linux Assembly and LDP sites. If you are reading a few-months-old copy, please check the URLs above for a new version.
COPYING, with a library version in a file named COPYING.LIB. Literature from the FSF (Free Software Foundation) might help you, too.
Each version includes a few fixes and minor corrections that need not be mentioned every time.
Added HLA, TALC; rearrangements in RESOURCES, QUICK START, ASSEMBLERS; few new pointers
finally managed to state LDP license on document, new resources added, misc fixes
new resources on different CPUs
new resources, misc corrections
url updates, changes in GAS example
RESOURCES (former POINTERS) section completely redone, various url updates.
New pointers, updates and some rearrangements. Rewrite of sgml source.
Discussion about libc or not libc continues. New web pointers and overall updates.
"QUICK START" section rearranged, added GAS example. Several new web pointers.
GAS has 16-bit mode. New maintainer (at last): Konstantin Boldyshev. Discussion about libc or not libc. Added section "QUICK START" with examples of using assembly.
process argument passing (argc,argv,environ) in assembly. This is yet another "last release by Faré before new maintainer takes over". Nobody knows who might be the new maintainer.
clean up and updates.
*
corrections about gcc invocation
release for LSL 6th edition.
*
*
info on 16-bit mode access from Linux.
still more on "how not to use assembly"; updates on NASM, GAS.
*
*
Release for DrLinux
Vapor announce of a new Assembly-HOWTO maintainer.
Added section "DO YOU NEED ASSEMBLY?"
NASM moved: now is before AS86
CREDITS section added
first release of the HOWTO as such.
text mini-HOWTO transformed into a full linuxdoc-sgml HOWTO, to see what the SGML tools are like.
*
What? I had forgotten to point to terse???
point to French translated version
NASM is getting pretty slick
more about cross-compiling -- See on sunsite: devel/msdos/
Created the History. Added pointers in cross-compiling section. Added section about I/O programming under Linux (particularly video).
*
*
*
Francois-Rene "Faré" Rideau <fare@tunes.org> creates and publishes the first mini-HOWTO, because "I'm sick of answering ever the same questions on comp.lang.asm.x86"
I would like to thank the following persons, in order of appearance:
Well, I wouldn't want to interfere with what you're doing, but here is some advice from hard-earned experience.
Assembly can express very low-level things:
Assembly is a very low-level language (the lowest above hand-coding the binary instruction patterns). This means
All in all, you might find that though using assembly is sometimes needed, and might even be useful in a few cases where it is not, you'll want to:
Even in cases when assembly is needed (e.g. OS development), you'll find that not so much of it is, and that the above principles hold.
See the Linux kernel sources concerning this: as little assembly as needed, resulting in a fast, reliable, portable, maintainable OS. Even a successful game like DOOM was written almost entirely in C, with only a tiny part written in assembly for speed.
As Charles Fiterman says on comp.compilers about human vs computer-generated assembly code,
" The human should always win and here is why.
Languages like ObjectiveCAML, SML, CommonLISP, Scheme, Ada, Pascal, C, C++, among others, all have free optimizing compilers that will optimize the bulk of your programs, and often do better than hand-coded assembly even for tight loops, while allowing you to focus on higher-level details, and without forbidding you to grab a few percent of extra performance in the above-mentioned way, once you've reached a stable design. Of course, there are commercial optimizing compilers for most of these languages, too!
Some languages have compilers that produce C code, which can be further optimized by a C compiler: LISP, Scheme, Perl, and many others. Speed is fairly good.
As for speeding code up, you should do it only for parts of a program that a profiling tool has consistently identified as being a performance bottleneck.
Hence, if you identify some code portion as being too slow, you should
Finally, before you end up writing assembly, you should inspect the generated code to check that the problem really is bad code generation, as this might well not be the case: compiler-generated code might be better than what you'd have written, particularly on modern multi-pipelined architectures! Slow parts of a program might be intrinsically so. The biggest problems on modern architectures with fast processors are due to delays from memory access, cache misses, TLB misses, and page faults; register optimization becomes useless, and you'll more profitably rethink data structures and threading to achieve better locality in memory access. Perhaps a completely different approach to the problem might help, then.
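To make the point about locality concrete, here is a minimal sketch in C. The array size and all names are invented for this illustration. Both functions compute the same sum; only the order in which memory is touched differs, and that order is typically what a profiler ends up pointing at on large arrays.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Illustrative data set; the dimensions are arbitrary. */
int grid[ROWS][COLS];

/* Fill the grid with a known pattern so both traversals can be compared. */
void fill_grid(void)
{
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Row-major walk: consecutive accesses touch adjacent addresses,
   which is the pattern caches and TLBs favor. */
long sum_row_major(void)
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: same arithmetic, but each access jumps
   COLS * sizeof(int) bytes ahead, so locality is poor. */
long sum_col_major(void)
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Rewriting either loop in assembly would not change its access pattern; swapping the loop order does.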
There are many reasons to inspect compiler-generated assembly code. Here is what you'll do with such code:
The standard way to have assembly code be generated
is to invoke your compiler with the -S flag.
This works with most Unix compilers,
including the GNU C Compiler (GCC), but YMMV.
As for GCC, it will produce more understandable assembly code with
the -fverbose-asm command-line option.
Of course, if you want to get good assembly code,
don't forget your usual optimization options and hints!
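As a small illustration, assume a file named sum.c (both the file name and the function name are made up here) containing the function below. Running gcc -S -O2 -fverbose-asm sum.c should leave the generated assembly in sum.s, where it can be read next to the C source.

```c
/* sum.c: a deliberately tiny function whose generated code is easy to read.
   The file and function names are invented for this illustration.

   Generate annotated assembly (written to sum.s) with:
       gcc -S -O2 -fverbose-asm sum.c
*/
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Comparing the output with and without -O2 is a quick way to see how much the optimizer already does for you.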
In general, you don't need to use assembly language in Linux programming. Unlike DOS, you do not have to write Linux drivers in assembly (well, actually you can if you really want to). And with modern optimizing compilers, if you care about speed optimization for different CPUs, it's much simpler to write in C. However, if you're reading this, you might have some reason to use assembly instead of C/C++.
You may need to use assembly, or you may want to use assembly. In short, the main practical reasons why you may need to get into Linux assembly are small code and libc independence. The non-practical (and most frequent) reason is being just an old crazy hacker with a twenty-year-old habit of doing everything in assembly language.
Also, if you're porting Linux to some embedded hardware
you can be quite short of space for the whole system:
you need to fit the kernel, libc
and all that stuff of (file|find|text|sh|etc.) utils
into several hundred kilobytes,
and every kilobyte is costly.
So one of your options is to rewrite some
(or all) parts of the system in assembly,
and this will really save you a lot of space.
For instance, a simple httpd written in assembly
can take less than 600 bytes;
you can fit a webserver, consisting of kernel and httpd,
in 400 KB or less... Think about it.
The well-known GNU C/C++ Compiler (GCC), an optimizing 32-bit compiler at the heart of the GNU project, supports the x86 architecture quite well, and includes the ability to insert assembly code in C programs, in such a way that register allocation can be either specified or left to GCC. GCC works on most available platforms, notably Linux, *BSD, VSTa, OS/2, *DOS, Win*, etc.
The original GCC site is the GNU FTP site ftp://prep.ai.mit.edu/pub/gnu/gcc/, alongside all released application software from the GNU project. Linux-configured and precompiled versions can be found in ftp://metalab.unc.edu/pub/Linux/GCC/. There are many FTP mirrors of both sites everywhere around the world, as well as CD-ROM copies.
GCC development split into two branches some time ago (GCC 2.8 and EGCS), but they have merged back, and the current GCC webpage is http://gcc.cygnus.com.
Sources adapted to your favorite OS, and binaries precompiled for it, should be found at your usual FTP sites.
The most popular DOS port of GCC is named DJGPP, and can be found in directories of that name on FTP sites. See:
There is also a port of GCC to OS/2 named EMX, that also works under DOS, and includes lots of unix-emulation library routines. See around the following site: ftp://ftp-os2.cdrom.com/pub/os2/emx09c/.
The documentation of GCC includes documentation files in texinfo format. You can compile them with tex and print the result, or convert them to .info and browse them with emacs, or convert them to .html, or (with the right tools) to nearly whatever you like, or just read them as is. The .info files are generally found on any good installation of GCC.
The right section to look for is: C Extensions::Extended Asm::
Section Invoking GCC::Submodel Options::i386 Options:: might help too.
Particularly, it gives the i386 specific constraint names for registers: abcdSDB correspond to %eax, %ebx, %ecx, %edx, %esi, %edi and %ebp respectively (no letter for %esp).
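The following sketch shows how such constraint letters are used in practice. The function name is hypothetical, and the non-x86 fallback is added only so that the example builds everywhere; it is not a substitute for the GCC documentation.

```c
/* Adds two unsigned values with an explicit register assignment.
   "a" pins an operand to %eax and "c" pins one to %ecx;
   "=a" says the result is read back from %eax. */
unsigned add_in_regs(unsigned x, unsigned y)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned sum;
    __asm__ ("addl %%ecx, %%eax"
             : "=a" (sum)          /* output: %eax */
             : "a" (x), "c" (y)    /* inputs: %eax and %ecx */
             : "cc");              /* the flags register is modified */
    return sum;
#else
    return x + y;                  /* fallback for non-x86 builds */
#endif
}
```

Calling add_in_regs(2, 3) behaves like ordinary addition; the point is only that the constraint letters, not the programmer, tell GCC where each value must live around the instruction.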
The DJGPP Games resource (not only for game hackers) had a page specifically about assembly, but it's down. Its data have nonetheless been recovered on the DJGPP site, which contains a mine of other useful information: http://www.delorie.com/djgpp/doc/brennan/, and in the DJGPP Quick ASM Programming Guide.
GCC depends on GAS for assembling, and follows its syntax (see below); do mind that inline asm needs percent characters to be quoted so that they are passed to GAS. See the section about GAS below.
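A short sketch of the percent-quoting rule, assuming GCC's extended asm syntax (the form with operands after colons). In that form a single % introduces an operand placeholder such as %0, so a register named literally has to be written with a doubled percent, for example %%eax. The function name is invented, and the fallback branch assumes a 32-bit unsigned.

```c
/* Rotates v left by 8 bits.
   %0 is a placeholder that GCC replaces with a register of its own
   choosing ("r"). Had we named a register ourselves, it would be
   written %%eax so that a single % reaches GAS. */
unsigned rotate_left_8(unsigned v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("roll $8, %0"
             : "+r" (v)    /* read-write operand in any general register */
             :
             : "cc");      /* rotate instructions change the flags */
    return v;
#else
    return (v << 8) | (v >> 24);   /* fallback, assumes 32-bit unsigned */
#endif
}
```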
Find lots of useful examples in the linux/include/asm-i386/
subdirectory of the sources for the Linux kernel.
Because assembly routines from the kernel headers
(and most likely your own headers,
if you try making your assembly programming as clean
as it is in the linux kernel)
are embedded in extern inline functions,
GCC must be invoked with the -O flag
(or -O2, -O3, etc),
for these routines to be available.
If not, your code may compile, but not link properly,
since it will be looking for non-inlined extern functions
in the libraries against which your program is being linked!
Another way is to link against libraries that include fallback
versions of the routines.
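For your own headers, one further option (a suggestion of this sketch, not something the kernel headers do) is to declare the wrappers static inline rather than extern inline. A static function never relies on an external definition, so the program links whether or not GCC decides to inline it. The names below are hypothetical.

```c
/* Hypothetical header-style wrapper: reverses the byte order of a
   32-bit value. Being static, it does not depend on any library
   providing a non-inlined copy, so it links even without -O. */
static inline unsigned swap_bytes32(unsigned v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("bswap %0" : "+r" (v));   /* bswap needs a 486 or later */
    return v;
#else
    return (v >> 24) | ((v >> 8) & 0xff00u)
         | ((v << 8) & 0xff0000u) | (v << 24);
#endif
}

/* An ordinary function that uses the wrapper. */
unsigned reverse_bytes(unsigned v)
{
    return swap_bytes32(v);
}
```

The cost is that each object file may carry its own copy of the function when it is not inlined, which is usually acceptable for wrappers this small.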
Inline assembly can be disabled with -fno-asm,
which will have the compiler die when using extended inline asm syntax,
or else generate calls to an external function named asm()
that the linker can't resolve.
To counter such a flag, -fasm restores treatment
of the asm keyword.
More generally, good compile flags for GCC on the x86 platform are
gcc -O2 -fomit-frame-pointer -W -Wall
-O2 is a good optimization level in most cases.
Optimizing beyond it takes longer, and yields code that is a lot larger,
but only a bit faster;
such overoptimization might be useful for tight loops only (if any),
which you may be doing in assembly anyway.
In cases when you need really strong compiler optimization for a few files,
do consider using up to -O6.
-fomit-frame-pointer allows generated code to skip the stupid
frame pointer maintenance, which makes code smaller and faster,
and frees a register for further optimizations.
It precludes the easy use of debugging tools (gdb),
but when you use these,
you just don't care about size and speed anymore anyway.
-W -Wall enables all warnings
and helps you catch obvious stupid errors.
You can add some CPU-specific -m486 or such flag so that
GCC will produce code that is more adapted to your precise computer.
Note that modern GCC has -mpentium and such flags
(and
PGCC has even more),
whereas GCC 2.7.x and older versions do not.
A good choice of CPU-specific flags should be in the Linux kernel.
Check the texinfo documentation of your current GCC installation for more.
-m386 will help optimize for size,
hence also for speed on computers whose memory is tight and/or loaded,
since big programs cause swap, which more than counters
any "optimization" intended by the larger code.
In such settings, it might be useful to stop using C,
and use instead a language that favors code factorization,
such as a functional language and/or FORTH,
and use a bytecode- or wordcode- based implementation.
Note that you can vary code generation flags from file to file, so performance-critical files will use maximum optimization, whereas other files will be optimized for size.
To optimize even more, option -mregparm=2
and/or corresponding function attribute might help,
but might pose lots of problems when linking to foreign code,
including the libc.
There are ways to correctly declare foreign functions
so the right call sequences be generated,
or you might want to recompile the foreign libraries
to use the same register-based calling convention...
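As a sketch of the function-attribute route, the declaration below marks a single function as taking its first two arguments in registers, using GCC's regparm attribute. The macro and function names are invented for this illustration, and the macro expands to nothing outside 32-bit x86, where the attribute does not apply.

```c
/* regparm is only meaningful on 32-bit x86; elsewhere the macro expands
   to nothing so the example still builds. FASTCALL2 is a made-up name. */
#if defined(__i386__)
#define FASTCALL2 __attribute__((regparm(2)))
#else
#define FASTCALL2
#endif

/* The declaration carries the attribute, so every caller that sees it
   passes the first two arguments in registers instead of on the stack. */
FASTCALL2 int scale_add(int a, int b);

FASTCALL2 int scale_add(int a, int b)
{
    return 2 * a + b;
}
```

Because the attribute is part of the declaration, callers and callee agree on the calling sequence as long as they share the same header; foreign code compiled without that declaration would not.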
Note that you can make these flags the default by editing file
/usr/lib/gcc-lib/i486-linux/2.7.2.3/specs
or wherever that is on your system
(better not add -W -Wall there, though).
The exact location of the GCC specs files on your system
can be found by asking gcc -v.
GAS is the GNU Assembler, which GCC relies upon.
Find it at the same place where you found GCC, in a package named binutils.
The latest version is available from HJLu at ftp://ftp.varesearch.com/pub/support/hjl/binutils/.
Because GAS was invented to support a 32-bit unix compiler, it uses standard AT&T syntax, which closely resembles the syntax of standard m68k assemblers, and is standard in the UNIX world. This syntax is no worse and no better than the Intel syntax. It's just different. When you get used to it, you find it much more regular than the Intel syntax, though a bit boring.
Here are the major caveats about GAS syntax:
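Register names are prefixed with %, so that
registers are %eax, %dl and so on,
instead of just eax, dl, etc.
This makes it possible to include external C symbols directly
in assembly source, without any risk of confusion, or any need
for ugly underscore prefixes.
Operands are written source first and destination last, so what Intel syntax writes as
mov ax,dx (move contents of
register dx into register ax) will be in GAS syntax
mov %dx, %ax.
The operand length is given as a suffix to the instruction name: b for (8-bit) byte,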
w