During the research phase of my Blackhat talk, I was digging into detecting the default layout of a dex file, as generated by the normal dx tool. Originally, I wanted my tool to “stack” things inside the file the same way that the dalvik compiler would, though I couldn’t find any resources on what this actually looked like. After a few hours of digging through code on AOSP and tearing apart an actual dex file to look at the innards, I came up with the quick little ASCII diagram below:
+--------------------------------------------------------------------+
| Dex header
|   * offsets and sizes of all sections
|   - default size 0x70
+--------------------------------------------------------------------+
| String_id_list
|   * offsets into data
|   - size: number of strings * 4
+--------------------------------------------------------------------+
| Type_id_list
|   * index into string_id_list
|   - size: number of types * 4
+--------------------------------------------------------------------+
| Proto_id_list
|   * index into string_id_list
|   * index into type_id_list
|   * offsets into data section (params)
|   - size: number of protos * 12
+--------------------------------------------------------------------+
| Field_id_list
|   * 2 indexes into type_id_list
|   * index into string_id_list
|   - size: number of fields * 8
+--------------------------------------------------------------------+
| Method_id_list
|   * index into Type_id_list
|   * index into Proto_id_list
|   * index into String_id_list
|   - size: number of methods * 8
+--------------------------------------------------------------------+
| Class_def_items
|   * 2 indexes into Type_id_list
|   * offsets into data for interfaces
|     * indexes into Type_id_list
|   * index into string_id_list for source file
|   * offsets into data for annotation
|     * offsets into data for annotation_set
|     * offsets into class data for annotation item
|   * offsets into data for class_data_items
|     * index into method_id
|   * offsets into data for static_values
|   * offsets into data for code_item
|   * offsets into data for debug_item
|   - size: number of classes * 32 (0x20)
+--------------------------------------------------------------------+
| Data section (default layout)
|   * annotation items
|   * code items
|   * annotation_directory
|   * interfaces
|   * parameters - used by proto section
|   * strings
|   * debug items
|   * annotation_sets
|   * static values
|   * class_data
|   * map list
+--------------------------------------------------------------------+
The result of the APKfuscator actually ended up being quite different from the above mappings. It’s definitely possible to retain the structure; however, the sections can easily be interchanged. The resulting sections from my tool look like the following:
Above sections are identical as to layout, but could be shifted around if need be
...
+--------------------------------------------------------------------+
| Data section (default layout)
|   * strings
|   * parameters (proto section)
|   * interfaces
|   * annotation items (visibility of item (flags),
|                       annotation type, number of name,
|                       encoded annotation)
|   * class annotations (size of items, offsets to items)
|   * annotation data (offset to class annotations,
|                      fields size, methods size,
|                      parameters size)
|   * code items
|   * class data
|   * static values
|   * debug items (currently stripped)
|   * map list
+--------------------------------------------------------------------+
The normal dx compiler appears to always lay things out in the same order, so if someone has run a post-compilation modification tool (e.g. APKfuscator or (bak)smali), it might be possible to see that a dex file has been “changed”. If someone were to develop a tool that looks for patterns in how this data is laid out, it could lead to some interesting results. Detecting these changes and patterns, run on a large enough scale, could be an interesting tactic for quickly finding out whether or not someone has messed with a file. Hopefully I’ll have more time to research this area and either prove or disprove this theory. Until then, hopefully the small ASCII layouts might help someone else with whatever work they’re doing on dalvik research.
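As a rough idea of what such a pattern check could start from, here is a minimal sketch that reads the map list of a dex file and returns the section type codes in the order they sit in the file. The class and method names (`MapOrderSketch`, `sectionOrder`) are made up for illustration. The sketch assumes the map offset lives at byte 52 of the header and that each map item is 12 bytes (2-byte type, 2 unused bytes, 4-byte count, 4-byte offset), as described in the dex format documentation. Comparing the resulting order against a known dx baseline is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of APKfuscator.
class MapOrderSketch {
    // Returns the map item type codes, ordered by where each section starts in the file.
    static List<Integer> sectionOrder(byte[] dex) {
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        int mapOff = buf.getInt(52);        // map offset field in the header
        int count = buf.getInt(mapOff);     // the map list starts with its item count
        List<int[]> items = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int pos = mapOff + 4 + i * 12;  // assumed 12 bytes per map item
            int type = buf.getShort(pos) & 0xFFFF;
            int offset = buf.getInt(pos + 8);
            items.add(new int[] {type, offset});
        }
        // Sort by offset so the result reflects the physical layout.
        items.sort(Comparator.comparingInt((int[] item) -> item[1]));
        List<Integer> order = new ArrayList<>();
        for (int[] item : items) {
            order.add(item[0]);
        }
        return order;
    }

    public static void main(String[] args) throws Exception {
        byte[] dex = Files.readAllBytes(Paths.get(args[0]));
        for (int type : sectionOrder(dex)) {
            System.out.printf("0x%04X%n", type);
        }
    }
}
```

Two files whose type codes come out in a different order were, at the very least, not laid out by the same tool.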
It’s been almost a full week since my talk, Dex Education: Practicing Safe Dex, though I think I’m only now beginning to recover. The past few months have truly been a whirlwind of both dissecting malware at Lookout and putting together a solid presentation for BlackHat. So far I’ve been unable to draw a crowd like Charlie, though maybe someday I’ll have people sitting in the aisles fighting for a seat during a presentation. Until then the people who went will just have to deal with the extra legroom. Overall the presentation seemed to go over pretty well, with some interesting chats afterwards with some smart people. A few people were interested in the slides and proof of concept code, so I told them I would tweet it and also make a blog post about it.
My slides are available here with the proof of concept code being hosted on my github page here. The proof of concept crackme code will be on the same github page as well shortly.
I’ve got some extra content that I wasn’t able to fit into the slide deck; heck, it was 96 slides as is after trimming some things out. While I didn’t intend to cover everything possible to break most analysis tools, I wanted to cover as much as possible. Over the course of a few days or weeks, I’ll try to roll out details in my blog about how certain things worked, mainly for people who were unable to attend the presentation, hear my explanations or ask me things at the conference. Feel free to reach out to me if there is anything I’ve missed or you would like a better explanation of.
A few people asked me about Blackhat and Defcon, wondering if they’re worth attending. So to step on a soapbox just for a minute, I’ll give the mini speech that I normally tell people. Conferences are only worth what you put into them; go to talks that seem interesting and are outside of your direct field of work. Why attend talks outside your direct field of work? I’ve found it’s a great way to find different perspectives, which can often be related back to your own work and field. It is also quite hard to appreciate a talk on something that you deal with daily, so it’s important to keep this in mind if you do see those types of talks. As a presenter myself, I found it exceptionally hard to not go too low level while still feeling like I can add value to everyone in the audience. After attending the talks you chose, meet the presenters and pick their brains; this is honestly where you can learn the most. As I have said, it’s really hard to make a presentation accessible for a whole audience, and talking directly with these people will give you so much more information than the slides often do. The people you meet at the bars (for Blackhat @ Caesars, go to the Galleria bar) are often people you talk to online already. Make friends, go outside that comfort zone and buy some people drinks. Most everyone is friendly; if they aren’t, don’t drink with them. Almost all conferences are worth going to, Blackhat and Defcon included, mainly due to the talent they attract, which you can find hanging out at the bars.
Probably the greatest thing about Blackhat for me was meeting some really great people I’ve only had the pleasure of talking to online. Talking with Mila, the mind behind Contagio Dump, was really great; I was able to pay her back a little for all the hard work she does with a beer or two. I got to talk with some of the original DroidSecurity (now AVG) guys, Elad and Oren; it’s never a dull moment talking to an Israeli reverse engineer, just look at Zuk. Another interesting person I got to hang out with was alongside me in the malware talk track, @snare. He did some crazy things with EFI rootkits for OSX, pretty scary and interesting stuff all in the same talk.
People often say it isn’t what you know, but who you know. I’d argue the security space is a yin and yang of both; to be a valuable (reverse) engineer you need to know your stuff and the people to help you succeed.
Enough on this soapbox, hopefully you enjoy the slides and code. If you ever run into me at a conference, let’s have a beer or two and chat.
A recent addition to the android market has been ATD, Android Turret Defense. This is a Plox-like game, though it has the “maze” strategy element combined in it. Strangely, it reminds me of a few old maps I used to play with friends for starcraft… Anyway, I finally got around to beating it, which isn’t too difficult once you get the hang of placing turrets and get a decent strategy. At the end it awards you with a “badge code”. I’m not sure exactly what the author intends to use this for, but I decided to take a look at how these are created. I was interested in how they were generated, and wanted to see if people could easily replicate them, or if there would be any deterrents to keep people from just sharing them. Again, this is possibly completely useless information, since we have no idea what these codes will be used for. They could be used for tournaments, downloads, prizes, or maybe to just “give” you an image of a badge… As of right now we just don’t know.
Below is a dump of the function we will be analyzing with my comments in it (the lines starting with // or wrapped in /* */); they should be pretty easy to follow:
.method private createBadgeCode()Ljava/lang/String;
// Date now = new Date();
new-instance v2,java/util/Date
invoke-direct {v2},java/util/Date/<init> ; <init>()V
// SimpleDateFormat dateFormat = new SimpleDateFormat("yyMMddhhmm");
new-instance v5,java/text/SimpleDateFormat
const-string v7,"yyMMddhhmm"
invoke-direct {v5,v7},java/text/SimpleDateFormat/<init> ; <init>(Ljava/lang/String;)V
// StringBuilder raw = new StringBuilder();
new-instance v7,java/lang/StringBuilder
invoke-direct {v7},java/lang/StringBuilder/<init> ; <init>()V
// raw.append(dateFormat.format(now));
invoke-virtual {v5,v2},java/text/SimpleDateFormat/format ; format(Ljava/util/Date;)Ljava/lang/String;
move-result-object v8
invoke-virtual {v7,v8},java/lang/StringBuilder/append ; append(Ljava/lang/String;)Ljava/lang/StringBuilder;
move-result-object v7
// raw.append(difficulty);
iget v8,v12,tx/games/atd_world.difficulty I
invoke-virtual {v7,v8},java/lang/StringBuilder/append ; append(I)Ljava/lang/StringBuilder;
move-result-object v7
// raw.append("tensaix2j");
const-string v8,"tensaix2j"
invoke-virtual {v7,v8},java/lang/StringBuilder/append ; append(Ljava/lang/String;)Ljava/lang/StringBuilder;
move-result-object v7
// byte[] rawbytes = raw.toString().getBytes();
invoke-virtual {v7},java/lang/StringBuilder/toString ; toString()Ljava/lang/String;
move-result-object v4
invoke-virtual {v4},java/lang/String/getBytes ; getBytes()[B
move-result-object v0
/* The loop below, refined:
int sum = 0;
for(int i = 0; i < rawbytes.length; i++)
sum += rawbytes[i];
*/
// v6 = sum, v3 = i
const/4 v6,0
const/4 v3,0
l3c1e:
// v7 = rawbytes.length;
array-length v7,v0
// if( v3 >= v7 ) goto l3c30
if-ge v3,v7,l3c30
// v7 = rawbytes[v3];
aget-byte v7,v0,v3
// v6 += v7;
add-int/2addr v6,v7
// v3++;
add-int/lit8 v3,v3,1
goto l3c1e
l3c30:
// StringBuilder badge = new StringBuilder();
new-instance v7,java/lang/StringBuilder
invoke-direct {v7},java/lang/StringBuilder/<init> ; <init>()V
// v8 = Math.random();
invoke-static {},java/lang/Math/random ; random()D
nop
move-result-wide v8
// v10 = 4652007308841189376, which is the bit pattern of the double 1000.0
const-wide v10,4652007308841189376 ; 0x408f400000000000
// v8 = Math.round(v8*v10);
mul-double/2addr v8,v10
// round() does take a single double; a wide value occupies the register pair v8/v9
invoke-static {v8,v9},java/lang/Math/round ; round(D)J
move-result-wide v8
// v10 = 1000
const-wide/16 v10,1000
// v8 += v10;
add-long/2addr v8,v10
// badge.append(v8);
invoke-virtual {v7,v8,v9},java/lang/StringBuilder/append ; append(J)Ljava/lang/StringBuilder;
move-result-object v7
// badge.append(dateFormat.format(now));
invoke-virtual {v5,v2},java/text/SimpleDateFormat/format ; format(Ljava/util/Date;)Ljava/lang/String;
move-result-object v8
invoke-virtual {v7,v8},java/lang/StringBuilder/append ; append(Ljava/lang/String;)Ljava/lang/StringBuilder;
move-result-object v7
// badge.append(difficulty);
iget v8,v12,tx/games/atd_world.difficulty I
invoke-virtual {v7,v8},java/lang/StringBuilder/append ; append(I)Ljava/lang/StringBuilder;
move-result-object v7
// badge.append(sum);
invoke-virtual {v7,v6},java/lang/StringBuilder/append ; append(I)Ljava/lang/StringBuilder;
move-result-object v7
// return badge.toString();
invoke-virtual {v7},java/lang/StringBuilder/toString ; toString()Ljava/lang/String;
move-result-object v1
return-object v1
.end method
An example of the output of this function is: 1310090403121501473
Broken down, the output looks like this:
1310 – round(random * 1000) + 1000
0904031215 – Date in yyMMddhhmm format
0 – Difficulty; Noob = 0, Normal = 1, Pro = 3
1473 – Sum of bytes (date + difficulty + “tensaix2j”)
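To double check the breakdown, here is a small Java sketch of what the function appears to do, pieced together from the comments in the dump. It is my reconstruction, not the game’s actual source; the class and method names (`BadgeCodeSketch`, `checksum`, `badge`) are made up, and the random prefix is passed in as a parameter so the example code can be reproduced.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Reconstruction sketch; names are hypothetical, not taken from the game.
class BadgeCodeSketch {
    // Sum of the bytes of date + difficulty + the hard-coded "tensaix2j" string.
    static int checksum(String date, int difficulty) {
        int sum = 0;
        for (byte b : (date + difficulty + "tensaix2j").getBytes()) {
            sum += b;
        }
        return sum;
    }

    // Glues the four parts together in the order the dump appends them.
    static String badge(long prefix, String date, int difficulty) {
        return "" + prefix + date + difficulty + checksum(date, difficulty);
    }

    public static void main(String[] args) {
        String date = new SimpleDateFormat("yyMMddhhmm").format(new Date());
        long prefix = Math.round(Math.random() * 1000.0) + 1000;
        System.out.println(badge(prefix, date, 0));
    }
}
```

Calling `badge(1310, "0904031215", 0)` reproduces the example code above, since the byte sum of “0904031215” + “0” + “tensaix2j” works out to 1473.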
I’ll post more later if the “badge system” is ever finished and released. Hopefully this serves as a decent example of how to reverse simple android programs… Enjoy!
Thanks to my friend Gabor, over at http://mylifewithandroid.blogspot.com/, who has created a really well done dex file disassembler. The direct link for the post is here and the source code is all free and located at dedexer.sourceforge.net.
It’s nice as it outputs jasmin-like syntax, such as the following:
.method public calc1(I)I
packed-switch v2,0
    ps418_422 ; case 0
    ps418_426 ; case 1
    ps418_42a ; case 2
    default: ps418_default
ps418_default:
    const/4 v0,15
l420:
    return v0
ps418_422:
    const/4 v0,2
    goto l420
ps418_426:
    const/4 v0,5
    goto l420
ps418_42a:
    const/4 v0,6
    goto l420
    nop
.end method
As opposed to the normal:
000418: 2b02 0c00 0000                         |0000: packed-switch v2, 0000000c // +0000000c
00041e: 12f0                                   |0003: const/4 v0, #int -1 // #ff
000420: 0f00                                   |0004: return v0
000422: 1220                                   |0005: const/4 v0, #int 2 // #2
000424: 28fe                                   |0006: goto 0004 // -0002
000426: 1250                                   |0007: const/4 v0, #int 5 // #5
000428: 28fc                                   |0008: goto 0004 // -0004
00042a: 1260                                   |0009: const/4 v0, #int 6 // #6
00042c: 28fa                                   |000a: goto 0004 // -0006
00042e: 0000                                   |000b: nop // spacer
000430: 0001 0300 faff ffff 0500 0000 0700 ... |000c: packed-switch-data (10 units)
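One small thing stands out when comparing the two listings: the default case is shown as 15 in the first and as -1 in the second. Both come from the same bytes, 12f0. The literal in const/4 is a signed 4-bit value, so 0xF should read as -1. A minimal sketch of that decoding (class and method names are made up for illustration):

```java
// Illustrative decoder for the operand byte of const/4 (opcode 0x12).
class Const4Sketch {
    // Low nibble of the operand byte: destination register.
    static int destRegister(int operand) {
        return operand & 0x0F;
    }

    // High nibble: the literal, sign-extended from 4 bits.
    static int literal(int operand) {
        return ((byte) operand) >> 4;
    }

    public static void main(String[] args) {
        int[] operands = {0xF0, 0x20, 0x50, 0x60}; // second byte of 12f0, 1220, 1250, 1260
        for (int op : operands) {
            System.out.printf("const/4 v%d, %d%n", destRegister(op), literal(op));
        }
    }
}
```

The cast to `byte` followed by an arithmetic shift is what carries the sign bit of the high nibble through.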
Great work Gabor, and keep up the good work!
I’ve been working on the header a little more – so I figured I’d post some code I just finished throwing together quickly. It’s not all the code, since most of it is experimental and I’m not finished doing it, but this will provide people with the information on how to dump the dex file header information.
/* File: DexNfo.java
 *
 * Coded: Timothy Strazzere
 * Date: 11/22/08
 *
 * Dump header information from a dex file, only supports '035' dex files, though will
 * attempt to dump rest of the information, but will just warn you otherwise.
 *
 * Some code has been removed as I'm not sure it fully works properly yet.
 *
 */
import java.security.*;
import java.util.zip.Adler32;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/* To do...
 *
 * lots...
 *
 */
public class DexNfo{
    public static void main(String[] args) {
        if (args.length == 1) {
            try {
                File file = new File(args[0]);
                byte[] barr, newbytes = null;
                newbytes = barr = getBytesFromFile(file);

                // add switch for this?
                System.out.print("Original information: " + args[0]);

                // Sum of the magic bytes "dex\n035\0" is 483
                int magic = 0;
                for(int i = 0; i<8; i++)
                    magic+=barr[i];
                if(magic!=483) // technically anything higher should be a new dex file... 'dex 036' etc..
                    System.out.printf("\n** Warning: Magic; bad dex file or unsupported version loaded!");

                // Don't output barr[3] since it's a newline char, or barr[7] since it's the null terminator
                System.out.printf("\nMagic: %c%c%c %c%c%c", barr[0], barr[1], barr[2], /*barr[3],*/ barr[4], barr[5], barr[6]);

                System.out.print("\nChecksum: ");
                for(int i = 8; i<12; i+=4)
                    System.out.printf("0x%02X%02X%02X%02X ", barr[i+3], barr[i+2], barr[i+1], barr[i]);

                System.out.print("\nSignature: 0x");
                for(int i = 12; i<32; i+=4)
                    System.out.printf("%02X%02X%02X%02X ", barr[i], barr[i+1], barr[i+2], barr[i+3]);

                // Note: only the low two bytes of each 4-byte field are printed below
                System.out.printf("\nLength: 0x%02X%02X", barr[33], barr[32]);

                if(barr[36]!= 112) // currently is always 0x70==(int)112...
                    System.out.printf("\n** Warning: Header Length; bad dex file or unsupported version loaded!");
                System.out.printf("\nHeader Length: 0x%02X", barr[36]);

                // endian tag
                int endian = 0;
                for(int i = 0; i < 4; i++)
                    endian += barr[40+i];
                if(endian != 276) // Currently always should be 0x78563412, which when added = int 276
                    System.out.printf("\n** Warning: Endian Tag; bad dex file or unsupported version loaded!");
                System.out.printf("\nEndian Tag: 0x%02X%02X%02X%02X", barr[40], barr[41], barr[42], barr[43]);

                // map offset
                System.out.printf("\nMap Offset: 0x%02X%02X", barr[53], barr[52]);
                // string table size
                System.out.printf("\nString Table Size: 0x%02X%02X", barr[57], barr[56]);
                // string table offset
                System.out.printf("\nString Table Offset: 0x%02X%02X", barr[61], barr[60]);
                // type table size
                System.out.printf("\nType Table Size: 0x%02X%02X", barr[65], barr[64]);
                // type table offset
                System.out.printf("\nType Table Offset: 0x%02X%02X", barr[69], barr[68]);
                // Prototype table size
                System.out.printf("\nPrototype Table Size: 0x%02X%02X", barr[73], barr[72]);
                // Prototype table offset
                System.out.printf("\nPrototype Table Offset: 0x%02X%02X", barr[77], barr[76]);
                // Field table size
                System.out.printf("\nField Table Size: 0x%02X%02X", barr[81], barr[80]);
                // Field table offset
                System.out.printf("\nField Table Offset: 0x%02X%02X", barr[85], barr[84]);
                // Method table size
                System.out.printf("\nMethod Table Size: 0x%02X%02X", barr[89], barr[88]);
                // Method table offset
                System.out.printf("\nMethod Table Offset: 0x%02X%02X", barr[93], barr[92]);
                // Class table size
                System.out.printf("\nClass Table Size: 0x%02X%02X", barr[97], barr[96]);
                // Class table offset
                System.out.printf("\nClass Table Offset: 0x%02X%02X", barr[101], barr[100]);
                System.out.println();

                // add switch for this too?
                // The signature must be recalculated first, since the checksum covers it
                calcSignature(newbytes);
                calcChecksum(newbytes);

                System.out.print("\n\nNew Checksum: ");
                for(int i = 8; i<12; i+=4)
                    System.out.printf("0x%02X%02X%02X%02X ", newbytes[i+3], newbytes[i+2], newbytes[i+1], newbytes[i]);

                System.out.print("\nNew Signature: 0x");
                for(int i = 12; i<32; i+=4)
                    System.out.printf("%02X%02X%02X%02X ", newbytes[i], newbytes[i+1], newbytes[i+2], newbytes[i+3]);

                System.out.printf("\nLength: %04X", calcSize(newbytes));

                // output compares to the two, highlight differences...
            }
            catch (Exception e) {
                System.err.println("File input error");
            }
        }
        else
            System.out.println("Invalid parameters");
    }

    public static byte[] getBytesFromFile(File file) throws IOException {
        InputStream is = new FileInputStream(file);

        // Get the size of the file
        long length = file.length();

        if (length > Integer.MAX_VALUE) {
            // File is too large to fit in a single byte array
            is.close();
            throw new IOException("File is too large: "+file.getName());
        }

        // Create the byte array to hold the data
        byte[] bytes = new byte[(int)length];

        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
               && (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
            offset += numRead;
        }

        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            throw new IOException("Could not completely read file "+file.getName());
        }

        // Close the input stream and return bytes
        is.close();
        return bytes;
    }

    // SHA-1 of everything after the signature (offset 32 on), written to offsets 12..31
    private static void calcSignature(byte bytes[])
    {
        MessageDigest md;
        try
        {
            md = MessageDigest.getInstance("SHA-1");
        }
        catch(NoSuchAlgorithmException ex)
        {
            throw new RuntimeException(ex);
        }
        md.update(bytes, 32, bytes.length - 32);
        try
        {
            int amt = md.digest(bytes, 12, 20);
            if(amt != 20)
                throw new RuntimeException((new StringBuilder()).append("unexpected digest write: ").append(amt).append(" bytes").toString());
        }
        catch(DigestException ex)
        {
            throw new RuntimeException(ex);
        }
    }

    // Adler32 of everything after the checksum (offset 12 on), written little-endian to offsets 8..11
    private static void calcChecksum(byte bytes[])
    {
        Adler32 a32 = new Adler32();
        a32.update(bytes, 12, bytes.length - 12);
        int sum = (int)a32.getValue();
        bytes[8] = (byte)sum;
        bytes[9] = (byte)(sum >> 8);
        bytes[10] = (byte)(sum >> 16);
        bytes[11] = (byte)(sum >> 24);
    }

    public static int calcSize(byte bytes[])
    {
        return(bytes.length);
    }
}
This now dumps all the header information from the original file, and will recalculate the signature and checksum in case something has changed. A version should be available shortly to check for differences in all the values, hopefully soon being able to calculate the correct values if something is wrong.
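The dump above only prints the low two bytes of each 4-byte field, which is fine for small files but will truncate anything bigger. A sketch of reading the full little-endian values with a `ByteBuffer` is below. The class name `DexHeaderSketch` and the method `readHeader` are made up for this example, and the field offsets are simply the ones used in the DexNfo listing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical companion to DexNfo; offsets are the ones DexNfo reads from.
class DexHeaderSketch {
    private static final String[] NAMES = {
        "File Size", "Header Size", "Endian Tag", "Map Offset",
        "String Table Size", "String Table Offset",
        "Type Table Size", "Type Table Offset",
        "Prototype Table Size", "Prototype Table Offset",
        "Field Table Size", "Field Table Offset",
        "Method Table Size", "Method Table Offset",
        "Class Table Size", "Class Table Offset"
    };
    private static final int[] OFFSETS = {
        32, 36, 40, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100
    };

    // Reads each 4-byte field as an unsigned little-endian value.
    static Map<String, Long> readHeader(byte[] dex) {
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        Map<String, Long> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], buf.getInt(OFFSETS[i]) & 0xFFFFFFFFL);
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        byte[] dex = Files.readAllBytes(Paths.get(args[0]));
        for (Map.Entry<String, Long> e : readHeader(dex).entrySet()) {
            System.out.printf("%-24s 0x%08X%n", e.getKey(), e.getValue());
        }
    }
}
```

Note that read this way the endian tag comes out as 0x12345678, where DexNfo prints the raw bytes in file order (78563412).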
Maybe this will be useful for someone? Otherwise, oh well it’s just here in case I delete my files. Working on functions to find the new values after patching and to allow patching/injection of code. I’ll have to write up more later as I don’t have an overwhelming amount of time right now, busy day and I’m exhausted. Saw Sara play some volleyball, finished up solo campaign in COD5, spent a few hours reading and researching some dex related things and trying to get some more injection to work. Tomorrow I probably won’t have time to post – but trust me, this stuff will be up sooner or later. It’s a big puzzle I’m chipping away at, and it’s bugging the heck out of me not having the answers.
In my quest to write a successful injector I’ve had to do a ton of digging into the dex file format. While mostly everything is open source, it’s not exactly easy to find all of the information, let alone understand it. A great resource I’ve mentioned previously was the “Dalvik VM Dex File Format” over at retrodev.org. This resource is sadly outdated and no longer updated by pavone, but it does provide a wealth of information. I figured I’d post my results just as pavone has done so that anyone looking for the information will hopefully find it. Note that the dex file pavone was examining was version ‘dex 009’ according to the magic. The current one as of this posting is ‘dex 035’. I’ll repost this data as I figure out more about it and exactly how it is modified.
Magic – 8 bytes – “dex\n035\0”
Checksum – 4 bytes – Adler32 checksum of everything from byte offset 12 on
Signature – 20 bytes – SHA-1 of everything from byte offset 32 on
File Size – 4 bytes – Exactly what it sounds like, the file size
Header Size – 4 bytes – Will always be 0x70
Endian Tag – 4 bytes – Will always be “78563412”
Zeros – 8 bytes – Exactly that, eight bytes of zeros (the link size and link offset fields)
Map Offset – 4 bytes – Leads to the map list; need more research on this though
String Table Size – 4 bytes – Number of entries in the string table
String Table Offset – 4 bytes – Offset to the string table
Type Table Size – 4 bytes – Number of entries in the type table
Type Table Offset – 4 bytes – Offset to the type table
Prototype Table Size – 4 bytes – Number of entries in the prototype table
Prototype Table Offset – 4 bytes – Offset to the prototype table
Field Table Size – 4 bytes – Number of entries in the field table
Field Table Offset – 4 bytes – Offset to the field table
Method Table Size – 4 bytes – Number of entries in the method table
Method Table Offset – 4 bytes – Offset to the method table
Class Table Size – 4 bytes – Number of entries in the class table
Class Table Offset – 4 bytes – Offset to the class table
Data Size – 4 bytes – Size of the data section
Data Offset – 4 bytes – Offset to the data section
You can easily note that the sizes of these fields add up to 0x70, which is the “Header Size”. Also, if the above isn’t clear enough: after a dex file is created, the signature is applied, which is a SHA-1 digest of all the bytes after its position. The checksum is an Adler32 hash of all the bytes after itself, including the signature. I actually discussed this in a previous post where I posted the code for “ReDEX”; the post was entitled “DEX File signature and checksums”.
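Since the signature and checksum each cover everything after themselves, checking whether a file has been touched since it was last sealed is just a matter of recomputing both and comparing against what is stored. A minimal sketch follows; the class and method names are invented for this example and this is not the ReDEX code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical verifier; it only checks the two header fields described above.
class DexVerifySketch {
    // SHA-1 over everything from byte offset 32 on.
    static byte[] sha1From32(byte[] dex) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(dex, 32, dex.length - 32);
            return sha1.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Adler32 over everything from byte offset 12 on (this includes the signature).
    static int adlerFrom12(byte[] dex) {
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return (int) adler.getValue();
    }

    // Compares the recomputed digest with the 20 bytes stored at offsets 12..31.
    static boolean signatureMatches(byte[] dex) {
        return Arrays.equals(sha1From32(dex), Arrays.copyOfRange(dex, 12, 32));
    }

    // Compares the recomputed checksum with the little-endian value stored at offset 8.
    static boolean checksumMatches(byte[] dex) {
        int stored = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(8);
        return adlerFrom12(dex) == stored;
    }

    public static void main(String[] args) throws Exception {
        byte[] dex = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Signature ok: " + signatureMatches(dex));
        System.out.println("Checksum ok:  " + checksumMatches(dex));
    }
}
```

A file that passes both checks can still have been modified, of course; it only means whoever modified it recalculated the two values afterwards.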
I’m actually revamping the “ReDEX” code to check and spit out this relevant information and more, though it’s not fully done. I’m also doing more research into the “Map” field and will hopefully be able to explain more about what is stored, how it is stored and whatnot, more like the information originally presented on retrodev. Until then, this information will have to suffice, enjoy!


