The Netwide Disassembler, NDISASM
The Netwide Disassembler is a small companion program to the Netwide Assembler, NASM. It seemed a shame to have an x86 assembler, complete with a full instruction table, and not make as much use of it as possible, so here's a disassembler which shares the instruction table (and some other bits of code) with NASM.
The Netwide Disassembler does nothing except to produce disassemblies of
binary source files. NDISASM does not have any understanding of
object file formats, like objdump, and it will not understand
DOS .EXE files like debug will. It just disassembles.
To disassemble a file, you will typically use a command of the form
ndisasm -b {16|32|64} filename
NDISASM can disassemble 16-, 32- or 64-bit code equally easily, provided
of course that you remember to specify which it is to work with. If no
-b switch is present, NDISASM works in 16-bit mode by default.
The -u switch (for USE32) also invokes 32-bit mode.
Two more command line options are -r, which reports the
version number of NDISASM you are running, and -h, which gives
a short summary of command line options.
To disassemble a DOS .COM file correctly, a disassembler
must assume that the first instruction in the file is loaded at address
0x100, rather than at zero. NDISASM, which assumes by default
that any file you give it is loaded at zero, will therefore need to be
informed of this.
The -o option allows you to declare a different origin for
the file you are disassembling. Its argument may be expressed in any of the
NASM numeric formats: it is decimal by default; if it begins with `$' or
`0x' or ends in `H' it's hex; if it ends in `Q' it's octal; and if it ends
in `B' it's binary.
Hence, to disassemble a .COM file:
ndisasm -o100h filename.com
will do the trick.
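The prefix and suffix rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not NDISASM's own parser; the function name parse_nasm_number is hypothetical, and forms not listed above are deliberately not handled.

```python
def parse_nasm_number(text: str) -> int:
    """Parse a number using only the prefix/suffix rules listed above.

    Hypothetical helper for illustration; not taken from NDISASM.
    """
    s = text.strip().lower()
    # Prefixes are checked first, so that "0x1B" is read as hex
    # rather than as something ending in the binary suffix "B".
    if s.startswith("$"):
        return int(s[1:], 16)
    if s.startswith("0x"):
        return int(s[2:], 16)
    if s.endswith("h"):
        return int(s[:-1], 16)
    if s.endswith("q"):
        return int(s[:-1], 8)
    if s.endswith("b"):
        return int(s[:-1], 2)
    return int(s, 10)


if __name__ == "__main__":
    # Six spellings of the same .COM origin.
    for arg in ("256", "$100", "0x100", "100h", "400q", "100000000b"):
        print(arg, "->", parse_nasm_number(arg))
```

Each of the six spellings in the loop denotes the same value, which is why -o100h and -o0x100 are interchangeable.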
Suppose you are disassembling a file which contains some data which
isn't machine code, and then contains some machine code. NDISASM
will faithfully plough through the data section, producing machine
instructions wherever it can (although most of them will look bizarre, and
some may have unusual prefixes, e.g. `FS OR AX,0x240A'), and
generating `DB' instructions every so often if it's totally stumped. Then it
will reach the code section.
Supposing NDISASM has just finished generating a strange machine instruction from part of the data section, and its file position is now one byte before the beginning of the code section. It's entirely possible that another spurious instruction will get generated, starting with the final byte of the data section, and then the correct first instruction in the code section will not be seen because the starting point skipped over it. This isn't really ideal.
To avoid this, you can specify a `synchronisation' point,
or indeed as many synchronisation points as you like (although NDISASM can
only handle 2147483647 sync points internally). The definition of a sync
point is this: NDISASM guarantees to hit sync points exactly during
disassembly. If it is thinking about generating an instruction which would
cause it to jump over a sync point, it will discard that instruction and
output a `db' instead. So it will start disassembly
exactly from the sync point, and so you will see all the
instructions in your code section.
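The rule in the preceding paragraph can be modelled as a small decision function. The sketch below is a toy model, not NDISASM's code: the name next_step is hypothetical, and the choice to consume exactly one byte for each `db' is an assumption made for the illustration.

```python
from typing import Iterable, Tuple


def next_step(offset: int, insn_len: int,
              sync_points: Iterable[int]) -> Tuple[str, int]:
    """Decide what to emit at `offset` for a candidate instruction.

    Returns ("insn", insn_len) if the candidate can be kept, or
    ("db", 1) if keeping it would jump over a sync point.
    The one-byte `db` is an assumption of this sketch.
    """
    end = offset + insn_len
    # A sync point strictly inside the candidate would be skipped over.
    if any(offset < s < end for s in sync_points):
        return ("db", 1)
    return ("insn", insn_len)


if __name__ == "__main__":
    syncs = [0x120]
    print(next_step(0x11E, 3, syncs))  # candidate would straddle 0x120
    print(next_step(0x11E, 2, syncs))  # candidate ends exactly at 0x120
    print(next_step(0x120, 3, syncs))  # candidate starts at the sync point
```

An instruction that ends exactly on a sync point, or starts on one, is kept; only one that would straddle the sync point is replaced.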
Sync points are specified using the -s option: they are
measured in terms of the program origin, not the file position. So if you
want to synchronise after 32 bytes of a .COM file, you would
have to do
ndisasm -o100h -s120h file.com
rather than
ndisasm -o100h -s20h file.com
As stated above, you can specify multiple sync markers if you need to,
just by repeating the -s option.
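Because sync points are measured from the origin, each -s argument is the origin plus the file position. The following sketch does that arithmetic and assembles the argument list; ndisasm_command is a hypothetical helper, and it writes numbers with the `0x' prefix, which is one of the hex spellings described earlier.

```python
from typing import List, Sequence


def ndisasm_command(filename: str, origin: int,
                    sync_file_positions: Sequence[int]) -> List[str]:
    """Build an ndisasm argument list (hypothetical helper).

    `sync_file_positions` are byte positions within the file; each is
    converted to an origin-relative address for the -s option.
    """
    args = ["ndisasm", "-o0x{:x}".format(origin)]
    for pos in sync_file_positions:
        # Sync points are relative to the origin, not the file position.
        args.append("-s0x{:x}".format(origin + pos))
    args.append(filename)
    return args


if __name__ == "__main__":
    # Synchronise after 32 bytes of a .COM file loaded at 0x100.
    print(" ".join(ndisasm_command("file.com", 0x100, [32])))
```

For the .COM example above this gives -o0x100 -s0x120, the same origin and sync point as the -o100h -s120h command line shown earlier.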
Suppose you are disassembling the boot sector of a DOS
floppy (maybe it has a virus, and you need to understand the virus so that
you know what kinds of damage it might have done to you). Typically, this
will contain a JMP instruction, then some data, then the rest of
the code. So there is a very good chance of NDISASM being
misaligned when the data ends and the code begins. Hence a sync
point is needed.
On the other hand, why should you have to specify the sync point
manually? What you'd do in order to find where the sync point would be,
surely, would be to read the JMP instruction, and then to use
its target address as a sync point. So can NDISASM do that for you?
The answer, of course, is yes: using either of the synonymous switches
-a (for automatic sync) or -i (for intelligent
sync) will enable auto-sync mode. Auto-sync mode automatically
generates a sync point for any forward-referring PC-relative jump or call
instruction that NDISASM encounters. (Since NDISASM is one-pass, if it
encounters a PC-relative jump whose target has already been processed,
there isn't much it can do about it...)
Only PC-relative jumps are processed, since an absolute jump is either through a register (in which case NDISASM doesn't know what the register contains) or involves a segment address (in which case the target code isn't in the same segment that NDISASM is working in, and so the sync point can't be placed anywhere useful).
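To see why a PC-relative jump yields a usable sync point, it helps to work the target out by hand: the target is the address of the next instruction plus the signed displacement. The sketch below does this for three 16-bit encodings only (JMP rel8, JMP rel16 and CALL rel16); it is not how NDISASM is implemented, and the name forward_jump_target is hypothetical.

```python
import struct
from typing import Optional


def forward_jump_target(code: bytes, address: int) -> Optional[int]:
    """Return the target of a forward PC-relative jump or call.

    Handles only EB (JMP rel8), E9 (JMP rel16) and E8 (CALL rel16)
    in 16-bit mode. Returns None for anything else, and for targets
    that do not lie ahead of `address`.
    """
    if len(code) >= 2 and code[0] == 0xEB:
        disp = struct.unpack_from("<b", code, 1)[0]   # signed 8-bit
        length = 2
    elif len(code) >= 3 and code[0] in (0xE8, 0xE9):
        disp = struct.unpack_from("<h", code, 1)[0]   # signed 16-bit
        length = 3
    else:
        return None
    # The displacement is relative to the end of the instruction,
    # and addresses wrap at 16 bits in this mode.
    target = (address + length + disp) & 0xFFFF
    return target if target > address else None


if __name__ == "__main__":
    # Illustrative bytes: a short jump over 0x3C bytes, then a NOP.
    target = forward_jump_target(bytes([0xEB, 0x3C, 0x90]), 0x7C00)
    print(hex(target))
```

The backward or self-referring case returns None here, mirroring the remark above that a one-pass disassembler can do little about targets it has already passed.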
For some kinds of file, this mechanism will automatically put sync points in all the right places, and save you from having to place any sync points manually. However, it should be stressed that auto-sync mode is not guaranteed to catch all the sync points, and you may still have to place some manually.
Auto-sync mode doesn't prevent you from declaring manual sync points: it
just adds automatically generated ones to the ones you provide. It's
perfectly feasible to specify -i and some -s options.
Another caveat with auto-sync mode is that if, by some unpleasant fluke,
something in your data section should disassemble to a PC-relative call or
jump instruction, NDISASM may obediently place a sync point in a totally
random place, for example in the middle of one of the instructions in your
code section. So you may end up with a wrong disassembly even if you use
auto-sync. Again, there isn't much I can do about this. If you have
problems, you'll have to use manual sync points, or use the -k
option (documented below) to suppress disassembly of the data area.
The -e option skips a header on the file, by ignoring the
first N bytes. This means that the header is not counted towards
the disassembly offset: if you give -e10 -o10, disassembly
will start at byte 10 in the file, and this will be given offset 10, not
20.
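The interaction of -e and -o amounts to a simple mapping from file position to displayed offset. A minimal sketch of that mapping, assuming only the behaviour described above (displayed_offset is a hypothetical name):

```python
from typing import Optional


def displayed_offset(file_pos: int, skip: int = 0,
                     origin: int = 0) -> Optional[int]:
    """Map a file position to the offset shown in the disassembly.

    `skip` models the -e argument and `origin` the -o argument.
    Bytes inside the skipped header have no offset, so None is returned.
    """
    if file_pos < skip:
        return None
    # The header is not counted: the first byte after it gets `origin`.
    return origin + (file_pos - skip)


if __name__ == "__main__":
    # -e10 -o10: byte 10 of the file is shown at offset 10, not 20.
    print(displayed_offset(10, skip=10, origin=10))
```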
The -k option is provided with two comma-separated numeric
arguments, the first of which is an assembly offset and the second is a
number of bytes to skip. This will count the skipped bytes towards
the assembly offset: its use is to suppress disassembly of a data section
which wouldn't contain anything you wanted to see anyway.
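Unlike -e, a -k skip leaves the offsets after it unchanged. The sketch below computes which offset ranges would still be disassembled for a given set of skips; it models the description above rather than NDISASM's internals, and disassembled_ranges is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def disassembled_ranges(origin: int, length: int,
                        skips: Sequence[Tuple[int, int]]
                        ) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offset ranges left after skipping.

    Each skip is (assembly_offset, byte_count), as for the -k option.
    Skipped bytes still count towards the offset, so later ranges
    keep their original addresses.
    """
    end = origin + length
    pos = origin
    ranges = []
    for start, count in sorted(skips):
        if start > pos:
            ranges.append((pos, min(start, end)))
        pos = max(pos, start + count)
    if pos < end:
        ranges.append((pos, end))
    return ranges


if __name__ == "__main__":
    # A 0x100-byte image at origin 0x100 with 0x3D data bytes at 0x103.
    for start, stop in disassembled_ranges(0x100, 0x100, [(0x103, 0x3D)]):
        print(hex(start), hex(stop))
```

In this example the code after the data still begins at 0x140, exactly where it would have been without the skip.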