VM - InCTF 2018
InCTF 2018 also consisted of an individual CTF apart from the main Attack and Defense CTF. There were 5 total challenges in this CTF. So in this post i am going to discuss the “vm” challenge that I solved during the CTF. There were multiple approaches that could have been used for this challenge since the source code of the vm was given. During the CTF i didn’t use this approach and i directly went after the flag by ignoring the VM since it was a very simple VM as source code was given but I decided to solve the challenge again and this time without using the source code because I usually have some trouble solving complex VMs (that 10th flare-on level this year with a kvm was a pain) and this looks like a good challenge to learn something new.
What is a VM?
In the context of this challenge, VM(Virtual Machine) is an application/machine that reads and executes the custom bytecode. We need to write a parser/disassembler so as to convert this bytecode to readable format/pseudo assembly.
Challenge
We are given 3 files in this challenge – net.bsd, vm, vm.c. We are not gonna look at the source in this method and will try to reverse engineer the VM. Also, “net.bsd” contains the custom bytecode which is executed by the VM. Taking a look at the .bss section it is pretty clear that there are 4 virtual registers. Analyzing the Switch Jump Table at 0x2024 and the different functions (handle_….) it is clear that there are 7 different virtual opcodes in the VM and there is a default function which is called when VM encounters an invalid opcode. Now onto the main function, the vm opens net.bsd files and reads it line by line. Each line is converted to integer and then different parts of the instruction are extracted from this integer.
Code Snippet of VM converting the integer to the different parts of the virtual instruction:
.text:0000000000001214 mov eax, [rbp+var_C]
.text:0000000000001217 shr eax, 18h
.text:000000000000121A mov [rbp+var_10], eax
.text:000000000000121D mov eax, [rbp+var_C]
.text:0000000000001220 shr eax, 14h
.text:0000000000001223 and eax, 0Fh
.text:0000000000001226 mov [rbp+var_14], eax
.text:0000000000001229 mov eax, [rbp+var_C]
.text:000000000000122C shr eax, 10h
.text:000000000000122F and eax, 0Fh
.text:0000000000001232 mov [rbp+var_18], eax
.text:0000000000001235 mov eax, [rbp+var_C]
.text:0000000000001238 shr eax, 0Ch
.text:000000000000123B and eax, 0Fh
.text:000000000000123E mov [rbp+var_1C], eax
.text:0000000000001241 mov eax, [rbp+var_C]
.text:0000000000001244 and eax, 0FFFFh
.text:0000000000001249 mov [rbp+var_20], eax
.text:000000000000124C cmp [rbp+var_10], 6
.text:0000000000001250 ja loc_12E7
Some important points we can derive from our knowledge till now:
- Each line in net.bsd contains an integer which is actually an virtual instruction.
- Different parts of the integer are different parts of the instruction.
The last byte that is 24th to 32nd bit of the integer are saved separately and we will refer this as 1st part further in the post. The bits from 20th to 24th bit form the 2nd part. The bits from 16th to 20th bit form the 3rd part and 12th to 16th bit form the 4th part of the instruction. 0 to 16 bit are also saved and lets say this is the 5th part of the instruction. Now lets see what these parts are and where they are used in the VM.
So now we have almost reached to the core of the VM. The 1st part that was found earlier is now compared with 6 and if its greater then handle_default is called and VM exits. So it is the 1st part which decides which function will be called and rest all the parts are actually the custom instructions or the parameters to the functions. Now since we know how the functions are called from the integer lets take a look at all the 7 functions that is the 7 opcodes supported by the VM. We will analyze all the 7 functions and will try to parse the code at last.
Functions
Analyzing all the functions will help us write a parser for this vm :
handle_add – On analysis we can see that this function expects three parameters, edi and esi contain first 2 parameters which are actually the 2nd part (20th to 24th bit) and 3rd part (16th to 20th bit) of the integer which we got in the earlier part. The third parameter is actually the 16 bit integer (5th part). If 2nd or 3rd parameter are greater than 3 then again the VM exits. And this made me remember that there are just 4 virtual registers which are stored in continuous memory as array. First two parameters are actually the indexes of 2 registers which can be further confirmed by looking at the code.
.text:000000000000136C mov eax, [rbp+var_8]
.text:000000000000136F cdqe
.text:0000000000001371 lea rdx, ds:0[rax*4]
.text:0000000000001379 lea rax, reg
.text:0000000000001380 mov edx, [rdx+rax]
.text:0000000000001383 mov eax, [rbp+var_C]
.text:0000000000001386 lea ecx, [rdx+rax]
.text:0000000000001389 mov eax, [rbp+var_4]
.text:000000000000138C cdqe
.text:000000000000138E lea rdx, ds:0[rax*4]
.text:0000000000001396 lea rax, reg
.text:000000000000139D mov [rdx+rax], ecx
.text:00000000000013A0 nop
.text:00000000000013A1 leave
.text:00000000000013A2 retn
So what happens in above code is the value in the virtual register at the index (2nd param) is added to the 3rd param and is stored in vreg (1st param). So its kind of like reg[1st param] = reg[2nd param] + 3rd param.
handle_xor – This function also has 3 parameters but this time the 2nd, 3rd and 4th parts of the integer are passed as parameters. what happens in this function is the register at first param is set equal to the xor of the data at 2nd and 3rd register.
.text:00000000000013D0 mov eax, [rbp+var_8]
.text:00000000000013D3 cdqe
.text:00000000000013D5 lea rdx, ds:0[rax*4]
.text:00000000000013DD lea rax, reg
.text:00000000000013E4 mov edx, [rdx+rax]
.text:00000000000013E7 mov eax, [rbp+var_C]
.text:00000000000013EA cdqe
.text:00000000000013EC lea rcx, ds:0[rax*4]
.text:00000000000013F4 lea rax, reg
.text:00000000000013FB mov eax, [rcx+rax]
.text:00000000000013FE mov ecx, edx
.text:0000000000001400 xor ecx, eax
.text:0000000000001402 mov eax, [rbp+var_4]
.text:0000000000001405 cdqe
.text:0000000000001407 lea rdx, ds:0[rax*4]
.text:000000000000140F lea rax, reg
.text:0000000000001416 mov [rdx+rax], ecx
.text:0000000000001419 nop
.text:000000000000141A leave
.text:000000000000141B retn
handle_out – This function has a single parameter. It expects the index of the virtual register whose data has to be printed. Only a single character is printed at a time.
handle_in – This function reads a value into a virtual register. Only a single character is read at a time.
handle_mov – This function moves a 16 bit value to a register. The index of the register and the value are passed as parameter.
handle_chk – This function checks if 2 virtual registers are equal or not.
handle_hlt – Exits the VM.
This was just some brief info about the VM functions, Now as to parse the custom bytecode set we would need to convert the custom bytecode to the corresponding opcodes. So here is the format of the different opcodes which we will get after parsing the code.
- ADD R1, R2, val – Sets R1 equal to sum of R2 and val.
- XOR R1, R2, R3 – Sets R1 equal to xor of R2 and R3.
- IN R1– Reads a character from the stdin to a virtual register R1.
- OUT R1– Prints data from virtual register R1 to stdout.
- MOV R1, val – Moves val to R1.
- CHK R1, R2 – Checks if R1 is equal to R2 or not, Sets flag = 1 if they are unequal.
- HLT – VM exit
So from the knowledge of all the above opcodes and their working it is pretty easy to write a parser, so here is the parser i wrote for this VM:
# virtual registers - R0, R1, R2, R3
opcodes = ["ADD", "XOR", "CHK", "OUT", "IN", "MOV", "HLT"] # virtual opcodes
netbsd = open("net.bsd", "r")
virt_code = []
for x in netbsd:
virt_code.append(int(x))
netbsd.close()
for inst in virt_code:
instr_str = ""
opcode = inst >> 24
p2 = inst >> 20 & 0xf
p3 = inst >> 16 & 0xf
p4 = inst >> 12 & 0xf
p5 = inst & 0xffff
if opcode == 0:
print opcodes[0] + " R" + str(p2) + ", " + "R" + str(p3) + ", " + str(hex(p5))
elif opcode == 1:
print opcodes[1] + " R" + str(p2) + ", " + "R" + str(p3) + ", " + "R" + str(p4)
elif opcode == 2:
print opcodes[2] + " R" + str(p2) + ", " + "R" + str(p3)
elif opcode == 3:
print opcodes[3] + " R" + str(p2)
elif opcode == 4:
print opcodes[4] + " R" + str(p2)
elif opcode == 5:
print opcodes[5] + " R" + str(p2) + ", " + str(hex(p5))
elif opcode == 6:
print opcodes[6]
else:
print "ERROR!!!!"
Code executed by the VM
By using the above script we get the code executed by the VM:
MOV R1, 0x45 ; moves 0x45 ('E') to R1
OUT R1 ; prints 'E'
MOV R1, 0x6e ; moves 0x6e ('n') to R1
OUT R1 ; prints 'n'
MOV R1, 0x74 ; moves 0x74 ('t') to R1
OUT R1 ; prints 't'
MOV R1, 0x65 ; moves 0x65 ('e') to R1
OUT R1 ; prints e
MOV R1, 0x72 ; moves 0x72 ('r') to R1
OUT R1 ; prints 'r'
MOV R1, 0x20 ; moves 0x20 (' ') to R1
OUT R1 ; prints ' '
MOV R1, 0x70
OUT R1 ; prints 'P'
MOV R1, 0x61
OUT R1 ; prints 'a'
MOV R1, 0x73
OUT R1 ; prints 's'
MOV R1, 0x73
OUT R1 ; prints 's'
MOV R1, 0x20
OUT R1 ; prints ' '
MOV R1, 0x3a
OUT R1 ; prints ':'
MOV R2, 0x20
OUT R2 ; prints ' '
MOV R0, 0x69 ; Moves 0x69 to R0
IN R1 ; Reads a char from stdin to R1
CHK R0, R1 ; Checks if input char is equal to 0x69 ('i')
MOV R1, 0x6e
IN R2
CHK R1, R2 ; Checks if input is equal to 0x6e ('n')
MOV R2, 0x63
IN R1
CHK R1, R2 ; Checks if input == 'c'
MOV R0, 0x74
IN R2
CHK R0, R2 ; Checks if input == 't'
MOV R0, 0x66
IN R2
CHK R0, R2 ; Checks if input == 'f'
MOV R0, 0x7b
IN R1
CHK R0, R1 ; Checks if input == '{'
MOV R0, 0x26 ; moves 0x26 to R0
IN R1 ; input char to R1
MOV R2, 0x17 ; moves 0x17 to R2
XOR R2, R2, R1 ; XOR R2 and R1 and stores result in R2
CHK R0, R2 ; checks R0 == R2, this gives us the char 0x26 ^ 0x17 == 0x31 ('1')
MOV R0, 0x15
IN R1
MOV R2, 0x7b
XOR R2, R2, R1
CHK R0, R2 ; 0x15 ^ 0x7b == 0x6e ('n')
MOV R0, 0x7b
IN R1
MOV R2, 0x38
XOR R2, R2, R1
CHK R0, R2 ; 0x7b ^ 0x38 == 0x43 ('C')
MOV R0, 0x55
IN R1
MOV R2, 0x62
XOR R2, R2, R1
CHK R0, R2 ; 0x55 ^ 0x62 == 0x37 ('7')
MOV R0, 0x1b
IN R1
MOV R2, 0x5d
XOR R2, R2, R1
CHK R0, R2 ; 0x1b ^ 0x5d == 0x46 ('F')
MOV R0, 0x72
IN R1
MOV R2, 0x2d
XOR R2, R2, R1
CHK R0, R2 ; 0x72 ^ 0x2d == 0x5f ('_')
MOV R0, 0x7a
IN R1
MOV R2, 0x38
XOR R2, R2, R1
CHK R0, R2 ; 0x7a ^ 0x38 == 0x42 ('B')
MOV R0, 0x7a
IN R1
MOV R2, 0x49
XOR R2, R2, R1
CHK R0, R2 ; 0x7a ^ 0x49 == 0x33 ('3')
MOV R0, 0x7
IN R1
MOV R2, 0x54
XOR R2, R2, R1
CHK R0, R2 ; 0x7 ^ 0x54 == 0x53 ('S')
MOV R0, 0x23
IN R1
MOV R2, 0x57
XOR R2, R2, R1
CHK R0, R2 ; 0x23 ^ 0x57 == 0x74 ('t')
MOV R1, 0x7d
IN R2
CHK R1, R2 ; checks R2 == '}' ?
HLT
Since the VM only supports 7 virtual opcodes and 4 virtual registers and in the above instructions the ADD opcode and 4th register are not even used, So the analysis of the code is really easy. The comments in the parsed code can help us to understand what the VM is doing.
The flag that we get from code is – inctf{1nC7F_B3St} This is pretty good challenge to play with if you want to get started with VMs.