



IT MANAGEMENT TITLES FROM AUERBACH PUBLICATIONS AND CRC PRESS
.Net 4 for Enterprise Architects and Developers
Sudhanshu Hate and Suchi Paharia
ISBN 978-1-4398-6293-3
A Tale of Two Transformations: Bringing Lean and Agile
Software Development to Life
Michael K. Levine
ISBN 978-1-4398-7975-7
Antipatterns: Managing Software Organizations and
People, Second Edition
Colin J. Neill, Philip A. Laplante, and Joanna F. DeFranco
ISBN 978-1-4398-6186-8
Asset Protection through Security Awareness
Tyler Justin Speed
ISBN 978-1-4398-0982-2
Beyond Knowledge Management: What Every Leader
Should Know
Edited by Jay Liebowitz
ISBN 978-1-4398-6250-6
CISO’s Guide to Penetration Testing: A Framework to
Plan, Manage, and Maximize Benefits
James S. Tiller
ISBN 978-1-4398-8027-2


Cybersecurity: Public Sector Threats and Responses
Edited by Kim J. Andreasson
ISBN 978-1-4398-4663-6
Cybersecurity for Industrial Control Systems: SCADA,
DCS, PLC, HMI, and SIS
Tyson Macaulay and Bryan Singer
ISBN 978-1-4398-0196-3
Data Warehouse Designs: Achieving ROI with Market
Basket Analysis and Time Variance
Fon Silvers
ISBN 978-1-4398-7076-1
Emerging Wireless Networks: Concepts, Techniques and
Applications
Edited by Christian Makaya and Samuel Pierre
ISBN 978-1-4398-2135-0
Information and Communication Technologies in
Healthcare
Edited by Stephan Jones and Frank M. Groom
ISBN 978-1-4398-5413-6

Information Security Governance Simplified: From the
Boardroom to the Keyboard
Todd Fitzgerald
ISBN 978-1-4398-1163-4




IP Telephony Interconnection Reference: Challenges,
Models, and Engineering
Mohamed Boucadair, Isabel Borges, Pedro Miguel Neves,
and Olafur Pall Einarsson
ISBN 978-1-4398-5178-4
IT’s All about the People: Technology Management That
Overcomes Disaffected People, Stupid Processes, and
Deranged Corporate Cultures
Stephen J. Andriole
ISBN 978-1-4398-7658-9
IT Best Practices: Management, Teams, Quality,
Performance, and Projects
Tom C. Witt
ISBN 978-1-4398-6854-6

Maximizing Benefits from IT Project Management: From
Requirements to Value Delivery
José López Soriano
ISBN 978-1-4398-4156-3
Secure and Resilient Software: Requirements, Test Cases,
and Testing Methods
Mark S. Merkow and Lakshmikanth Raghavan
ISBN 978-1-4398-6621-4
Security De-engineering: Solving the Problems in
Information Risk Management
Ian Tibble
ISBN 978-1-4398-6834-8


Software Maintenance Success Recipes
Donald J. Reifer
ISBN 978-1-4398-5166-1
Software Project Management: A Process-Driven
Approach
Ashfaque Ahmed
ISBN 978-1-4398-4655-1

Web-Based and Traditional Outsourcing
Vivek Sharma, Varun Sharma, and K.S. Rajasekaran, Infosys
Technologies Ltd., Bangalore, India
ISBN 978-1-4398-1055-2




Data Mining Tools for Malware Detection
Mehedy Masud, Latifur Khan, and Bhavani Thuraisingham



CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2011 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an
Informa business
No claim to original U.S. Government works
Version Date: 20120111
International Standard Book Number-13: 978-1-4665-1648-9
(eBook - ePub)
This book contains information obtained from authentic and
highly regarded sources. Reasonable efforts have been made
to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all
materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been
acknowledged please write and let us know so we may rectify
in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this
book may be reprinted, reproduced, transmitted, or utilized in
any form by any electronic, mechanical, or other means, now
known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or
retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically
from this work, please access www.copyright.com
or contact the Copyright
Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit
organization that provides licenses and registration for a
variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be
trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at

and the CRC Press Web site at





Dedication
We dedicate this book to our respective families for their
support that enabled us to write this book.



Contents
PREFACE
Introductory Remarks
Background on Data Mining
Data Mining for Cyber Security
Organization of This Book
Concluding Remarks
ACKNOWLEDGMENTS
THE AUTHORS
COPYRIGHT PERMISSIONS
CHAPTER 1: INTRODUCTION
1.1 Trends
1.2 Data Mining and Security Technologies
1.3 Data Mining for Email Worm Detection
1.4 Data Mining for Malicious Code Detection
1.5 Data Mining for Detecting Remote Exploits



1.6 Data Mining for Botnet Detection
1.7 Stream Data Mining
1.8 Emerging Data Mining Tools for Cyber Security
Applications
1.9 Organization of This Book
1.10 Next Steps
PART I: DATA MINING AND SECURITY
Introduction to Part I: Data Mining and Security
CHAPTER 2: DATA MINING TECHNIQUES
2.1 Introduction
2.2 Overview of Data Mining Tasks and Techniques
2.3 Artificial Neural Network
2.4 Support Vector Machines
2.5 Markov Model
2.6 Association Rule Mining (ARM)
2.7 Multi-Class Problem
2.7.1 One-vs-One
2.7.2 One-vs-All


2.8 Image Mining
2.8.1 Feature Selection
2.8.2 Automatic Image Annotation
2.8.3 Image Classification
2.9 Summary
References
CHAPTER 3: MALWARE
3.1 Introduction

3.2 Viruses
3.3 Worms
3.4 Trojan Horses
3.5 Time and Logic Bombs
3.6 Botnet
3.7 Spyware
3.8 Summary
References
CHAPTER 4: DATA MINING FOR SECURITY APPLICATIONS


4.1 Introduction
4.2 Data Mining for Cyber Security
4.2.1 Overview
4.2.2 Cyber-Terrorism, Insider Threats, and External Attacks
4.2.3 Malicious Intrusions
4.2.4 Credit Card Fraud and Identity Theft
4.2.5 Attacks on Critical Infrastructures
4.2.6 Data Mining for Cyber Security
4.3 Current Research and Development
4.4 Summary

References
CHAPTER 5: DESIGN AND IMPLEMENTATION OF
DATA MINING TOOLS
5.1 Introduction
5.2 Intrusion Detection
5.3 Web Page Surfing Prediction
5.4 Image Classification
5.5 Summary


References
CONCLUSION TO PART I
PART II: DATA MINING FOR EMAIL WORM DETECTION

Introduction to Part II
CHAPTER 6: EMAIL WORM DETECTION
6.1 Introduction
6.2 Architecture
6.3 Related Work
6.4 Overview of Our Approach

6.5 Summary
References
CHAPTER 7: DESIGN OF THE DATA MINING TOOL
7.1 Introduction
7.2 Architecture
7.3 Feature Description
7.3.1 Per-Email Features
7.3.2 Per-Window Features


7.4 Feature Reduction Techniques
7.4.1 Dimension Reduction
7.4.2 Two-Phase Feature Selection (TPS)
7.4.2.1 Phase I
7.4.2.2 Phase II
7.5 Classification Techniques
7.6 Summary
References
CHAPTER 8: EVALUATION AND RESULTS
8.1 Introduction
8.2 Dataset
8.3 Experimental Setup
8.4 Results
8.4.1 Results from Unreduced Data
8.4.2 Results from PCA-Reduced Data
8.4.3 Results from Two-Phase Selection
8.5 Summary




References
CONCLUSION TO PART II
PART III: DATA MINING FOR DETECTING MALICIOUS
EXECUTABLES
Introduction to Part III
CHAPTER 9: MALICIOUS EXECUTABLES
9.1 Introduction
9.2 Architecture
9.3 Related Work
9.4 Hybrid Feature Retrieval (HFR) Model
9.5 Summary
References
CHAPTER 10: DESIGN OF THE DATA MINING TOOL
10.1 Introduction
10.2 Feature Extraction Using n-Gram Analysis
10.2.1 Binary n-Gram Feature
10.2.2 Feature Collection
10.2.3 Feature Selection


10.2.4 Assembly n-Gram Feature
10.2.5 DLL Function Call Feature
10.3 The Hybrid Feature Retrieval Model
10.3.1 Description of the Model
10.3.2 The Assembly Feature Retrieval (AFR) Algorithm
10.3.3 Feature Vector Computation and Classification
10.4 Summary

References
CHAPTER 11: EVALUATION AND RESULTS
11.1 Introduction
11.2 Experiments
11.3 Dataset
11.4 Experimental Setup
11.5 Results
11.5.1 Accuracy
11.5.1.1 Dataset1
11.5.1.2 Dataset2



11.5.1.3 Statistical Significance Test
11.5.1.4 DLL Call Feature
11.5.2 ROC Curves
11.5.3 False Positive and False Negative
11.5.4 Running Time
11.5.5 Training and Testing with Boosted J48
11.6 Example Run
11.7 Summary
References
CONCLUSION TO PART III
PART IV: DATA MINING FOR DETECTING REMOTE
EXPLOITS
Introduction to Part IV
CHAPTER 12: DETECTING REMOTE EXPLOITS
12.1 Introduction
12.2 Architecture

12.3 Related Work
12.4 Overview of Our Approach


12.5 Summary
References
CHAPTER 13: DESIGN OF THE DATA MINING TOOL
13.1 Introduction
13.2 DExtor Architecture
13.3 Disassembly
13.4 Feature Extraction
13.4.1 Useful Instruction Count (UIC)
13.4.2 Instruction Usage Frequencies (IUF)
13.4.3 Code vs. Data Length (CDL)
13.5 Combining Features and Compute Combined Feature
Vector
13.6 Classification
13.7 Summary
References
CHAPTER 14: EVALUATION AND RESULTS
14.1 Introduction
14.2 Dataset


14.3 Experimental Setup
14.3.1 Parameter Settings
14.3.2 Baseline Techniques
14.4 Results

14.4.1 Running Time
14.5 Analysis
14.6 Robustness and Limitations
14.6.1 Robustness against Obfuscations
14.6.2 Limitations
14.7 Summary
References
CONCLUSION TO PART IV
PART V: DATA MINING FOR DETECTING BOTNETS
Introduction to Part V
CHAPTER 15: DETECTING BOTNETS
15.1 Introduction
15.2 Botnet Architecture



15.3 Related Work
15.4 Our Approach
15.5 Summary
References
CHAPTER 16: DESIGN OF THE DATA MINING TOOL
16.1 Introduction
16.2 Architecture
16.3 System Setup
16.4 Data Collection
16.5 Bot Command Categorization
16.6 Feature Extraction
16.6.1 Packet-Level Features
16.6.2 Flow-Level Features

16.7 Log File Correlation
16.8 Classification
16.9 Packet Filtering
16.10 Summary



References
CHAPTER 17: EVALUATION AND RESULTS
17.1 Introduction
17.1.1 Baseline Techniques
17.1.2 Classifiers
17.2 Performance on Different Datasets
17.3 Comparison with Other Techniques
17.4 Further Analysis
17.5 Summary
References
CONCLUSION TO PART V
PART VI: STREAM MINING FOR SECURITY APPLICATIONS
Introduction to Part VI
CHAPTER 18: STREAM MINING
18.1 Introduction
18.2 Architecture
18.3 Related Work


18.4 Our Approach
18.5 Overview of the Novel Class Detection Algorithm
18.6 Classifiers Used
18.7 Security Applications
18.8 Summary
References
CHAPTER 19: DESIGN OF THE DATA MINING TOOL
19.1 Introduction
19.2 Definitions
19.3 Novel Class Detection
19.3.1 Saving the Inventory of Used Spaces during Training
19.3.1.1 Clustering
19.3.1.2 Storing the Cluster Summary Information
19.3.2 Outlier Detection and Filtering
19.3.2.1 Filtering
19.3.3 Detecting Novel Class
19.3.3.1 Computing the Set of Novel Class Instances



19.3.3.2 Speeding up the Computation
19.3.3.3 Time Complexity
19.3.3.4 Impact of Evolving Class Labels on Ensemble
Classification
19.4 Security Applications
19.5 Summary
References
CHAPTER 20: EVALUATION AND RESULTS
20.1 Introduction
20.2 Datasets
20.2.1 Synthetic Data with Only Concept-Drift (SynC)
20.2.2 Synthetic Data with Concept-Drift and Novel Class
(SynCN)
20.2.3 Real Data—KDD Cup 99 Network Intrusion Detection
20.2.4 Real Data—Forest Cover (UCI Repository)
20.3 Experimental Setup
20.3.1 Baseline Method
20.4 Performance Study



20.4.1 Evaluation Approach
20.4.2 Results
20.4.3 Running Time
20.5 Summary
References
CONCLUSION TO PART VI
PART VII: EMERGING APPLICATIONS
Introduction to Part VII
CHAPTER 21: DATA MINING FOR ACTIVE DEFENSE
21.1 Introduction
21.2 Related Work

21.3 Architecture
21.4 A Data Mining-Based Malware Detection Model
21.4.1 Our Framework
21.4.2 Feature Extraction
21.4.2.1 Binary n-Gram Feature Extraction
21.4.2.2 Feature Selection
