What Is Assembly Language?
First, What Is Machine Language?
Although programmers tend to use C, C++, or Pascal these days, the language closest to the PC hardware
is machine language. Not one second of a PC's powered-on lifetime passes in which the computer is not
executing machine language.
Assembly Language or Machine Language:
To put it simply, assembly language is human-readable text, and machine language is machine-readable
binary code. When you program in assembly language, you are programming at the machine language level.
Programming directly in machine language is tedious, so you use assembly language instead, and use an
assembler to produce the actual machine code.
The assembler language is the symbolic programming language that lies closest to the machine language in
form and content.
You will, therefore, find the assembler language useful when:
You need to control your program closely, down to the byte and even the bit level.
You must write subroutines for functions that are not provided by other symbolic programming languages,
such as COBOL, FORTRAN, or PL/I.
Windows Memory
The first thing you need to know is that Windows memory is numbered: the first byte is at offset 0, and
the highest possible offset for a byte is just under 4.3 billion (ffffffff hex).
When a program executes, it starts at its first instruction (programs are normally loaded at 400000 hex in
Windows 95) and then goes on to the next one in a linear fashion, until a jump or conditional jump happens.
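To make the numbering concrete, here is a small Python sketch (Python is used only to check the arithmetic; it is not part of the disassembly workflow). The names LOWEST_OFFSET, HIGHEST_OFFSET and describe are made up for this illustration.

```python
# Offsets in a 32-bit address space run from 0 up to ffffffff hex.
LOWEST_OFFSET = 0x00000000
HIGHEST_OFFSET = 0xFFFFFFFF


def describe(offset):
    """Return an offset as an 8-digit hex string, the way a disassembler shows it."""
    return f"{offset:08X}"


print(HIGHEST_OFFSET)                # 4294967295, the decimal value of ffffffff hex
print(describe(0x400000))            # 00400000
print(HIGHEST_OFFSET + 1 == 2 ** 32) # True: 32 address bits in total
```

Every offset you see in a disassembly listing is just one of these numbers written in hex.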
Conditional jumps are like a choice: if something happens, jump to here; otherwise just keep going in the
code. For instance, after a character is hit, the code compares the character's hp to 0 and, if it is 0, jumps
to the death routine.
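The hp example can be sketched as a tiny simulation. This is only an illustration of "run in order until a jump happens": the step numbers, the run_hit function and the clamping of hp at 0 are all assumptions made for the sketch, not how any real game does it.

```python
def run_hit(hp, damage):
    """Hypothetical 'character gets hit' routine, written as numbered steps."""
    trace = []  # records which steps actually ran, in order
    pc = 0      # "program counter": the step that runs next
    while True:
        trace.append(pc)
        if pc == 0:
            # Step 0: subtract the damage (clamped at 0 for this sketch).
            hp = max(hp - damage, 0)
            pc += 1
        elif pc == 1:
            # Step 1: conditional jump. If hp is 0, jump to step 3,
            # otherwise fall through to the next step.
            pc = 3 if hp == 0 else pc + 1
        elif pc == 2:
            # Step 2: normal path, the character survives.
            return "alive", trace
        else:
            # Step 3: the death routine.
            return "dead", trace


print(run_hit(10, 3))  # ('alive', [0, 1, 2])
print(run_hit(3, 5))   # ('dead', [0, 1, 3])
```

Notice that step 2 is skipped in the second run: that is all a taken jump does.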
Disassembling (W32Dasm)
Using w32Dasm disassembler on: diablo_s.exe
Start off by loading the project. It may take a while, go fix yourself a drink :)
After this, go to Disassembler menu/Save Disassembly Txt File and Create Project File,
and then the next time you load it, it'll just take 5 seconds...
Ok, now that you've finished loading it... Hmm, so what do all these funky symbols mean? :)
Well, now what ya gotta do is this: go to Font under the Disassembler menu. Then go to Select Font and
select fixedsys. Now you can actually read it! :)
Meanings of all those numbers you see:
Scroll down to about halfway and take a look at the code. For instance:
:0044BD61   0570010000   add eax, 00000170
:<offset>   <code>       <translation into asm>
Also if you look at the bottom of the window:
Code Data @:0044BD61 @Offset 0004B161h in File...
Here @:0044BD61 is the offset in winmem, and @Offset 0004B161h is the offset in the file (h=hex).
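Here is a rough Python sketch of what the listing line is telling you. It only understands the single encoding shown above (the byte 05 followed by a 4-byte number stored lowest byte first, which is the standard x86 encoding of add eax with a 32-bit immediate); the function name decode_add_eax_imm32 is made up for this example. The last line prints the gap between the memory offset and the file offset; treating that gap as constant for nearby instructions is an assumption that usually holds within one section of the file.

```python
IMAGE_ADDRESS = 0x0044BD61  # offset in Windows memory, from the listing
FILE_OFFSET = 0x0004B161    # offset in the .exe file, from the status bar
CODE_BYTES = bytes.fromhex("0570010000")  # the <code> column


def decode_add_eax_imm32(code):
    """Decode one instruction form: opcode 05 followed by a 32-bit immediate."""
    if len(code) != 5 or code[0] != 0x05:
        raise ValueError("not an 'add eax, imm32' instruction")
    # The immediate is stored little-endian: 70 01 00 00 means 00000170.
    immediate = int.from_bytes(code[1:5], "little")
    return f"add eax, {immediate:08X}"


print(decode_add_eax_imm32(CODE_BYTES))      # add eax, 00000170
print(f"{IMAGE_ADDRESS - FILE_OFFSET:08X}")  # 00400C00
```

So the "translation into asm" column is just the code bytes read back in a friendlier form, and the file offset is what you would look up in a hex editor.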
Here are a few clues to understanding the language:
eax, ebx, ecx, and edx are registers: places set aside in the actual CPU to hold information that's being
processed right now.
The code is read sequentially unless there's a jump, in which case execution continues at the jump's target
instead. Basically, a jump is anything with a j at the beginning. There are many different types of jumps; some
are conditional and some are unconditional. A conditional jump checks the result of an earlier instruction
(usually a compare or test) and jumps if certain conditions are met. Here are a few examples:
jmp - unconditional jump (always jumps)
je - jumps if the two numbers are equal
jne - jumps if they're not equal
jg - jumps if the first number is greater than the second (signed comparison)
jl - jumps if the first number is less than the second (signed comparison)
Conditional jumps are preceded by compares or tests. A compare (cmp) checks one number against another,
and the jump that follows is taken only if its condition is met.
eg:
cmp ecx, 00000004
je 0041F786
What this is saying is compare the number that's currently in the ecx register against 4 (hex) and if they're
equal then jump to the instruction in the memory location 41f786 in the program. If it's not equal, then it won't
jump and will just go onto the next instruction.
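The compare-then-jump pattern can be sketched in Python like this. It is a simplification: a real CPU records the outcome of cmp in flag bits, while this sketch just keeps three true/false answers. The helper names (to_signed32, cmp, jump_taken, next_address) and the fall-through address 0041F700 are invented for the example; only 0041F786 and the value 4 come from the listing above.

```python
def to_signed32(value):
    """Interpret a 32-bit value as a signed number, as jg and jl do."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def cmp(first, second):
    """Compare the first operand against the second and remember the outcome."""
    a, b = to_signed32(first), to_signed32(second)
    return {"equal": a == b, "greater": a > b, "less": a < b}


def jump_taken(mnemonic, outcome):
    """Decide whether a jump is taken, given the outcome of the last cmp."""
    conditions = {
        "jmp": True,                 # always jumps
        "je": outcome["equal"],      # first == second
        "jne": not outcome["equal"], # first != second
        "jg": outcome["greater"],    # first > second (signed)
        "jl": outcome["less"],       # first < second (signed)
    }
    return conditions[mnemonic]


def next_address(ecx, fall_through=0x0041F700):
    """cmp ecx, 00000004 / je 0041F786 (fall_through is a hypothetical address)."""
    outcome = cmp(ecx, 0x00000004)
    return 0x0041F786 if jump_taken("je", outcome) else fall_through


print(f"{next_address(4):08X}")  # 0041F786 (jump taken)
print(f"{next_address(5):08X}")  # 0041F700 (falls through)
```

Note that ffffffff hex counts as -1 for jg and jl, so it is "less than" 4 in a signed comparison.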
add simply means add one number to another:
add eax, edx means add the value stored in the edx register to the value in the eax register, leaving the
result in eax.
Likewise with sub (subtract).
One more simple command to learn: move data
mov edx, 000000d8 means move the number 000000d8 (hex) into the edx register
mov edx, eax means copy the value in eax into edx (eax keeps its value).
mov is pretty straight forward...
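If you'd like to play with mov, add and sub without a debugger, here is a minimal Python model. The Registers class is made up for this sketch; it only knows four registers and keeps every result within 32 bits, which is how values wrap around in a 32-bit register.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits, so results wrap around


class Registers:
    """Toy model of four 32-bit registers (illustration only)."""

    def __init__(self):
        self.values = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}

    def _read(self, operand):
        # An operand is either a register name or a plain number.
        if isinstance(operand, str):
            return self.values[operand]
        return operand & MASK32

    def mov(self, dest, source):
        self.values[dest] = self._read(source)

    def add(self, dest, source):
        self.values[dest] = (self.values[dest] + self._read(source)) & MASK32

    def sub(self, dest, source):
        self.values[dest] = (self.values[dest] - self._read(source)) & MASK32


regs = Registers()
regs.mov("edx", 0x000000D8)  # mov edx, 000000d8
regs.mov("eax", 0x00000170)  # mov eax, 00000170
regs.add("eax", "edx")       # add eax, edx  -> eax = 170 + d8 (hex)
regs.mov("edx", "eax")       # mov edx, eax  -> edx gets a copy, eax is unchanged
print(f"{regs.values['eax']:08X}")  # 00000248
print(f"{regs.values['edx']:08X}")  # 00000248
```

The destination always comes first and the source second, which is the same order W32Dasm prints them in.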
Well there's just one more thing that I know quite well...
push and pop commands:
push esi (another register) pushes the value in esi onto the "stack"
The "Stack"
You don't need to know anything about the stack except that it's "last in, first out": the last number pushed
is on top of the stack and will be the first to be taken off (popped); the number pushed just before it will be
taken off at the next pop, and so on.
So pop edi would take the last number put on the stack and put it in edi.
Numbers are often pushed onto the stack, then later popped back into the program.
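The "last in, first out" behaviour is easy to see with a Python list standing in for the stack. This is only a sketch: the register values are invented, and a real stack lives in memory rather than in a list.

```python
class Stack:
    """Toy last-in, first-out stack (illustration only)."""

    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value & 0xFFFFFFFF)

    def pop(self):
        # Removes and returns the most recently pushed value.
        return self.items.pop()


# Hypothetical register contents for the example.
registers = {"esi": 0x11111111, "edi": 0x22222222, "edx": 0}

stack = Stack()
stack.push(registers["esi"])   # push esi
stack.push(0x000000D8)         # push 000000d8

registers["edi"] = stack.pop() # pop edi -> gets d8, the last value pushed
registers["edx"] = stack.pop() # pop edx -> gets the old esi value
print(f"{registers['edi']:08X}")  # 000000D8
print(f"{registers['edx']:08X}")  # 11111111
```

The value pushed last comes off first, so pops happen in the reverse order of the pushes.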