Tuesday 3 May 2016

Intro to reverse engineering - x86 Assembly

Reverse Engineering (RE) is the art of learning how something was made by taking it apart and studying it. It can be applied to almost anything, such as a car or a mobile phone. RE is a very useful skill. In this blog we will discuss reverse engineering computer applications and programs, to learn how they were made.

Reverse engineering is also essential for malware analysis: understanding how a piece of malware works is the first step towards building countermeasures against it.

To learn computer reverse engineering, you should be familiar with a programming language such as C or C++, and knowing x86 Assembly is a big plus. The basics of x86 Assembly are taught here.



X86 ASSEMBLY
*********************

The Central Processing Unit (CPU) is the main part of any computer. It performs all the operations needed to run the computer, in the form of instructions that it executes one after another. To do this, the CPU needs somewhere to keep the data it is working on. This data is stored in small, fast storage locations inside the CPU, called REGISTERS. Accessing registers is much faster than accessing main memory (RAM), so the CPU uses them whenever it can.
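The registers can be grouped into the following types:


1. General Purpose Registers: These are used to manipulate data, to pass parameters when calling a function (depending on the calling convention), and to store the intermediate results of any kind of operation, which can later be transferred to memory.

2. Pointer and Index Registers (also called offset registers): These hold memory offsets, such as the current position in a string, the next instruction to execute or the top of the stack. They are described in detail below.

3. Segment Registers: These hold the segments (areas of memory) where the code (CS), the data (DS, and the extra segments ES, FS, GS) and the Stack (SS, I will explain it later) are located.

4. Flags (Status) Register: Each bit of this register is a flag describing the result of the last operation, for example the Zero Flag (ZF) or the Carry Flag (CF). The conditional jumps described below depend on these flags.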


AX, BX, CX and DX are the most commonly used general purpose registers. These names refer to the 16-bit registers; adding an E (for "extended") in front of them, i.e. EAX, EBX, ECX, EDX and so on, gives the full 32-bit registers.

AX, BX, CX and DX can in turn be divided into two 8-bit registers each, for example:

EAX (32 bits) ----> AX (lower 16 bits of EAX) ----> AL (lower 8 bits of AX) & AH (upper 8 bits of AX)

Naturally, a register can only hold values up to its capacity, which depends on its number of bits: 8 bits = 255d, 16 bits = 65535d, 32 bits = 4 294 967 295d ("d" means decimal; these are the maximum unsigned values a register can contain, i.e. 2^n - 1 for n bits).

The pointer and index registers do not have 8-bit parts, so they have neither an H nor an L half. These registers are:

DI – Destination Index: mainly used when handling string instructions, and is generally associated with Segment Registers DS or ES.
SI – Source Index: used as source data address when it comes to manipulating strings, and is generally associated with Segment Register DS.
BP – Base Pointer: when a subroutine is called with CALL, this register is used together with the SS Segment Register to access data on the stack, such as the subroutine's parameters and local variables. It is generally used for indirect addressing.
IP – Instruction Pointer: associated with the Segment Register CS to indicate the next instruction to execute. It cannot be changed directly; it is modified indirectly by jump instructions, subroutine calls and interrupts.
SP – Stack Pointer: used with Segment Register SS (SS:SP) to point at the top of the stack, i.e. the last element pushed.
(EDI, ESI, EBP, EIP and ESP are the 32-bit versions of these registers.)

Now, the STACK. The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data (such as the return address). When that function returns, the block becomes unused and can be reused the next time a function is called. The stack is always reserved in LIFO (last in, first out) order: the most recently reserved block is always the next block to be freed. This makes the stack very simple to keep track of; freeing a block from the stack is nothing more than adjusting one pointer (ESP).


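The PUSH and POP instructions put data on top of the stack and take (pop) data off the top of the stack. On x86 the stack grows towards lower addresses: a 32-bit PUSH first decreases ESP by 4 and then stores the value at ESP, while POP reads the value at ESP and then increases ESP by 4.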


Some important instructions in Assembly :

The MOV instruction (used as: mov destination,source) copies data from one place to another, i.e. from the source to the destination. The source itself is left unchanged.


The LEA (Load Effective Address) instruction looks similar to MOV, but it never reads memory: lea eax,[ebx+ecx] computes the address EBX+ECX and stores that number itself in EAX, whereas mov eax,[ebx+ecx] reads the value stored in memory at address EBX+ECX and moves that value into EAX. It is important to keep this difference in mind.


The JMP instruction jumps unconditionally to an address in the program. Assembly has no if statements, so conditional jumps are used instead, for example JZ (jump if zero), JNZ (jump if not zero), JE (jump if equal) and JNE (jump if not equal). JZ and JE are the same instruction, as are JNZ and JNE. Each conditional jump tests different flags in the flags (status) register.

jge - Jump if greater or equal ; signed comparison, works with negative numbers
jg - Jump if greater ; signed
jle - Jump if less or equal ; signed
jl - Jump if less ; signed
je - Jump if equal ; works with both signed and unsigned numbers
jne - Jump if not equal ; works with both signed and unsigned numbers
jae - Jump if above or equal ; unsigned comparison, treats every value as positive
ja - Jump if above ; unsigned
jbe - Jump if below or equal ; unsigned
jb - Jump if below ; unsigned

These conditional jumps are used after a comparison instruction such as cmp (used as cmp ax,bx). A comparison instruction sets the flags in the flags (status) register according to its result, and the conditional jump that follows is taken or not depending on those flags. For example, consider this small code:


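push 0          ; put 0 on top of the stack
xor eax,eax     ; eax = eax XOR eax = 0
mov ebx,eax     ; ebx = eax = 0
pop ecx         ; ecx = value taken from the top of the stack = 0
cmp ecx,ebx     ; compute ecx - ebx and set the flags (ZF = 1 here)
jz 402342       ; ZF = 1, so jump to address 402342
jmp 402124      ; reached only if the jump above was not taken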

So, first we push 0 on top of the stack, and then we XOR eax with itself, which makes eax 0. Then we move the value of eax into ebx, making ebx 0. pop ecx takes the value on top of the stack and puts it into ecx; since 0 is on top of the stack, ecx becomes 0. Now cmp ecx,ebx compares the two registers: since they are equal, the zero flag is set, the jz 402342 jump is taken and execution continues at address 402342. If they were not equal, the jump would not be taken and execution would continue with the next instruction, jmp 402124, which always jumps to 402124.


Sometimes, you will also see instructions like this : 


Mov Eax, DWORD PTR DS:[01009000]

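DWORD is a 32-bit value, and DWORD PTR tells us that the memory operand is 32 bits wide. The square brackets mean that the data stored at address 01009000 is loaded, not the number 01009000 itself. DS stands for "data segment": the address is taken relative to the DS segment register. In the flat memory model used by 32-bit Windows programs, DS covers the whole address space, and such an absolute address typically refers to a global variable, for example one in the program's .data section.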

Now we have covered most of the basics of x86 Assembly.
