1
2 Fast detection of Android malware Yury Leonychev
3 Introduction
4 Android application (APK)
• Manifest (AndroidManifest.xml)
• Code (classes.dex and native libraries)
• Meta information (META-INF)
• Resources (files and resources.arsc)
5 Brief list of tools for APK analysis
• Androguard (ultimate tool by @adesnos and others), used by VirusTotal, APKInspector, etc.
• SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
• TaintDroid (guys from Intel, Penn State University, Duke University)
• DroidBox (dynamic analysis by Patrik Lantz), used by ApkScan
6 Is this all? Really?
• http://www.apk-analyzer.net
• http://anubis.iseclab.org
• http://apkscan.nviso.be
7 Our task is more complex Malware detector
8 Methods of malware detection. Static analysis
• Advantages
  – An APK has predictable content. Application behavior can be learned by simply reading the file
  – Checks are safe
• Limitations
  – Can be ineffective against sophisticated malware and obfuscation techniques
  – We cannot be sure of the verdict, as we don't execute the app
9 Methods of malware detection. Dynamic analysis
• Advantages
  – Clear results and interpretation
  – Open source solutions available
• Limitations
  – Not fast (enough)
  – Can be detected and bypassed
  – A big ecosystem requires a big infrastructure
10 Methods of malware detection. Signature analysis
• Advantages
  – Effective for known malware
  – Commercial solutions available
• Limitations
  – Signature databases require regular (and frequent) updates
  – Not effective for new malware
  – Do you have a team of virus analysts?
11 Methods of malware detection. It seems that the most efficient way is a hybrid solution
12 MatrixNet What is The Matrix?
13 Why can we use machine learning?
Abstract task description:
• We have a set of objects (APK files). We should divide this set into two subsets (malware and normal)
• For every element of the set we can compute a predictable number of features
• The subsets are just the result of a simple classification task, so we can try to choose effective features
14 What is MatrixNet?
MatrixNet is an implementation of the gradient boosted decision trees algorithm. It differs a bit from the standard algorithm:
• It uses oblivious trees
• It accounts for the sample count in each leaf
15 Why is MatrixNet powerful?
• It is a machine learning algorithm for classification tasks
• A key feature of this method is its resistance to overfitting
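To make the term "oblivious tree" concrete: in such a tree every node at a given depth uses the same (feature, threshold) test, so a tree of depth d is just a table of 2^d leaf values indexed by d comparison bits. The sketch below illustrates that idea only. It is not MatrixNet code, and the feature meanings, thresholds and leaf values are made up for the example.

```python
from typing import List, Sequence, Tuple


class ObliviousTree:
    """Depth-d tree in which each level shares one (feature index, threshold) split."""

    def __init__(self, splits: Sequence[Tuple[int, float]], leaves: Sequence[float]):
        if len(leaves) != 2 ** len(splits):
            raise ValueError("an oblivious tree of depth d needs 2**d leaf values")
        self.splits = list(splits)
        self.leaves = list(leaves)

    def leaf_index(self, x: Sequence[float]) -> int:
        # Each level contributes one bit: 1 if the feature exceeds the threshold.
        index = 0
        for feature, threshold in self.splits:
            index = (index << 1) | (1 if x[feature] > threshold else 0)
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaves[self.leaf_index(x)]


def ensemble_score(trees: List[ObliviousTree], x: Sequence[float]) -> float:
    """A boosted ensemble sums the outputs of all trees."""
    return sum(tree.predict(x) for tree in trees)


# Hypothetical features: x[0] = number of SMS permissions, x[1] = reflection calls.
tree_a = ObliviousTree([(0, 0.5), (1, 10.0)], [-0.4, -0.1, 0.2, 0.7])
tree_b = ObliviousTree([(1, 3.0)], [-0.2, 0.3])
sample = [2.0, 12.0]
score = ensemble_score([tree_a, tree_b], sample)
print(tree_a.leaf_index(sample), score)
```

Because the leaf is found with a few comparisons and one table lookup, evaluating such a tree is cheap, which fits the goal of fast checks.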
16 MatrixNet post learning optimization
17 MatrixNet post learning optimization Copyright © 2013 by Sidney Harris.
18 How does it work?
Offline learning process:
• Choosing features
• Choosing samples
• Manual classification (malware or not)
• Learning on the combined set of apps
• Calculating mistakes
19 Features
What kind of features to use:
• Permissions
• URIs in strings and other resources
• Adware library usage
• Obfuscation methods
• …
20 Samples and classification
Malware applications:
• VirusTotal feed
• Samples from malicious sites
Normal applications:
• Manual testing
• Trusted developers
• Yandex applications
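As an illustration of how such features could be turned into numbers for a classifier, here is a minimal sketch covering two of the feature kinds listed above (permissions and URIs in strings). The watch-list of permissions, the feature names and the function name are assumptions made for the example, not the feature set used in the talk.

```python
import re
from typing import Dict, Iterable

# Hypothetical watch-list; the real feature set is not given in the slides.
WATCHED_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
    "android.permission.READ_LOGS",
]

URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def extract_features(permissions: Iterable[str], strings: Iterable[str]) -> Dict[str, float]:
    """Map parsed APK data to a fixed set of numeric features."""
    requested = set(permissions)
    # One binary feature per watched permission.
    features = {f"perm:{name}": float(name in requested) for name in WATCHED_PERMISSIONS}
    features["perm_count"] = float(len(requested))
    # Count distinct URLs found in string constants and resources.
    urls = {url for text in strings for url in URL_PATTERN.findall(text)}
    features["url_count"] = float(len(urls))
    return features


features = extract_features(
    ["android.permission.SEND_SMS", "android.permission.INTERNET"],
    ["see http://example.com/a and http://example.com/a", "no url here"],
)
print(features)
```

Every APK yields the same keys, so the number of features per sample stays predictable.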
21 [Diagram labels: Normal, Malware, Features, MatrixNet, Learning, Formula, Features weight, Features cost]
22 Measuring mistakes
[Diagram labels: Formula 1, Features cost 1; Formula N, Features cost N; Normal, Malware. Result: a formula with a cool confusion matrix and low cost]
23 Analyzer architecture Fine! I'll go build my own casino, with blackjack and big data
24 Main parts
• Parsers
• Analyzers
• Oracle
• Report
25 In depth
• APK
• Parsers: ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
• Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
• Reports: XHTMLReporter, JSONReporter
• Oracle: MatrixNet
26 ManifestParser
Copes with some obfuscation methods, e.g.:
• HEUR:Backdoor.AndroidOS.Obad.a
27 ManifestParser
<?xml version="1.0" encoding="utf-8"?>
<manifest ="singleTop" android:versionCode="2" ="2.0" android:installLocation="internalOnly" package="com.android.system.admin" xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission ="android.permission.READ_LOGS" />
<uses-permission ="android.permission.WAKE_LOCK" />
…
<uses-permission ="android.permission.RECEIVE_SMS" />
<uses-permission ="android.permission.SEND_SMS" />
<uses-permission ="android.permission.CALL_PHONE" />
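In the listing on this slide several attributes appear without a name, which a strict XML parser would reject. One tolerant way to still recover the requested permissions is to match the permission values themselves rather than rely on the attribute name. The sketch below assumes the manifest has already been decoded from binary form to text. The pattern and the function name are illustrative and are not taken from the talk's ManifestParser.

```python
import re
from typing import List

# Match any quoted android.permission.* value inside a <uses-permission> tag,
# whether or not the attribute in front of it has a name.
PERMISSION_PATTERN = re.compile(
    r'<uses-permission\b[^>]*?"(android\.permission\.[A-Z_]+)"'
)


def permissions_from_manifest_text(text: str) -> List[str]:
    """Return the distinct permissions in order of first appearance."""
    seen: List[str] = []
    for name in PERMISSION_PATTERN.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


manifest_text = """<manifest package="com.android.system.admin">
<uses-permission ="android.permission.READ_LOGS" />
<uses-permission android:name="android.permission.SEND_SMS" />
<uses-permission ="android.permission.SEND_SMS" />
</manifest>"""

permissions = permissions_from_manifest_text(manifest_text)
print(permissions)
```

The same function handles both the well-formed `android:name="..."` form and the nameless form, so the permission features do not disappear when a sample uses this trick.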
28 ClassesParser
• Parser for DEX files
• Internal DEX disassembler
• Call graph builder
• Embeds “real” function/variable names into the disassembly listing
• Builds a list of used procedures and functions
29 ClassesParser. Disassembler
https://github.com/tracer0tong/de
Example:
./de.py test1.dex.dat
[[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
 [2, 'invoke-virtual v0 @13970 // {class}->{method}'],
 [5, 'move-result-object v0'],
 [6, 'check-cast v0, [{type_name}] // type@0958'],
 [8, 'return-object v0']]
30 ReflectionAnalyzer
java.lang.reflect.*
• Classes: Field, Method, etc.
• Functions: getClass(), getDeclaredField(), etc.
31 ReflectionAnalyzer
Output:
• Report: There is some reflections usage:
  1@android.app.Activity->getContentResolver calls: 598@java.lang.Class->forName
  2@android.app.Activity->onActivityResult calls: 598@java.lang.Class->forName
• The number of reflection calls is a feature.
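The idea of counting reflection calls can be sketched over a call graph represented as a mapping from caller to callees. The representation, the list of reflection targets and the function names below are assumptions for illustration. The actual ReflectionAnalyzer and its call graph format are not shown in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical list of methods treated as reflection entry points.
REFLECTION_METHODS = {
    "java.lang.Class->forName",
    "java.lang.Class->getDeclaredField",
    "java.lang.Class->getDeclaredMethod",
    "java.lang.Class->getMethod",
    "java.lang.Object->getClass",
}
REFLECTION_PACKAGE = "java.lang.reflect."


def is_reflection_call(callee: str) -> bool:
    return callee in REFLECTION_METHODS or callee.startswith(REFLECTION_PACKAGE)


def find_reflection_calls(call_graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs where the callee is a reflection API."""
    return [
        (caller, callee)
        for caller, callees in call_graph.items()
        for callee in callees
        if is_reflection_call(callee)
    ]


call_graph = {
    "android.app.Activity->getContentResolver": ["java.lang.Class->forName"],
    "android.app.Activity->onActivityResult": [
        "java.lang.Class->forName",
        "java.lang.String->length",
    ],
}

hits = find_reflection_calls(call_graph)
reflection_call_count = len(hits)  # this number becomes one classifier feature
for caller, callee in hits:
    print(f"{caller} calls: {callee}")
print(reflection_call_count)
```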
32 Service architecture
[Diagram labels, shown twice: Nginx, Gunicorn, Flask, Celery, MongoDB]
33 Case study
34 Let's try it on...
Yandex.Store application feed:
• More than 50K Android applications
• More than 200 new/updated apps per week
• Open for developers (no strict manual verification)
35 Performance. Check timing
~2 ms, ~0.25 s, ~4.5 min
36 Performance. Number of checks
• More than 16,000 applications checked in 1 hour on 1 cluster node
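The throughput figure can be converted into an average time budget per application with simple arithmetic. The sketch below uses only the 16,000 per hour lower bound from the slide. It says nothing about how checks are scheduled or parallelized on the node, which the slides do not describe.

```python
SECONDS_PER_HOUR = 3600
apps_per_hour = 16_000  # lower bound quoted on the slide, one cluster node

apps_per_second = apps_per_hour / SECONDS_PER_HOUR
seconds_per_app = SECONDS_PER_HOUR / apps_per_hour

print(f"{apps_per_second:.2f} apps/s, {seconds_per_app:.3f} s per app on average")
```

An average of about 0.225 s per application is of the same order as the ~0.25 s value on the timing slide.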
37 Confusion matrix
                 Meaning: Malware (Score > 0)    Meaning: Normal (Score < 0)
Fact: Malware    485 (97%)                       15 (3%)
Fact: Normal     25 (5%)                         475 (95%)
38 (Un)predictable results
• Applications with the malicious adware library AirPush were classified as malware
• But we had no special features for adware in the first version
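The usual summary metrics follow directly from these four counts. The sketch below treats "malware" as the positive class, which is an assumption about convention. The numbers themselves are taken from the slide.

```python
# Counts from the confusion matrix (positive class = malware).
true_positive = 485   # malware scored > 0
false_negative = 15   # malware scored < 0
false_positive = 25   # normal apps scored > 0
true_negative = 475   # normal apps scored < 0

total = true_positive + false_negative + false_positive + true_negative

accuracy = (true_positive + true_negative) / total
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
false_positive_rate = false_positive / (false_positive + true_negative)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} fpr={false_positive_rate:.3f}")
# prints: accuracy=0.960 precision=0.951 recall=0.970 fpr=0.050
```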
39 Conclusion It’s alive… alive!
40 It works!
• Analytic methods work fine for detecting Android malware
• Machine learning is not “rocket science” but a cool and effective instrument
• Open API coming soon.
41 Thanks for your attention
42 Yury Leonychev, Application Security Engineer
yleonychev@yandex-team.ru
tracer0tong
© Yandex LLC 2013

Fast detection of Android malware: machine learning approach