1 DATA DRIVEN SECURITY: The value of Machine Learning in Cyber Security
Peter Elmer | Security Expert, EMEA | Office of the CTO, May 2021
©2021 Check Point Software Technologies Ltd. [Protected] Distribution or modification is subject to approval
2 Today we look at …
• Need for Data Driven Security
• Methods used
• Value of Machine Learning powered by human experience
• Effectiveness of Data Driven Security
3 Key Ingredients For Success: Collaboration, Intelligence, Experience
Check Point Software Technologies: founded in 1993, about 5,400 employees, securing more than 100,000 customers, 27 years
4 “Important decision points are taken by machines with logic created from data.” Check Point, Data Scientists Team, October 2020
5 Predicting Results Using Machine Learning: humans deciding on features and labels
[Figure labels: oval, round; smooth surface, undulating surface; sweet, sour; ‘for pie’, ‘for vine’; data remains, data destroyed]
Human experience is key when assigning characteristics (features) for predicting a result (label)
6 Predicting?
7 Logic Created From Data
• Computer logic: Data and Program produce a deterministic result. Humans decide on the best logic to achieve a result prior to ‘feeding’ the machine (context, assumptions, conceptions).
• Machine Learning algorithm: Data and Result produce the Program / Model Logic. Characteristics of data (features) and historic results (labels) are presented to the machine.
• Program / Model Logic applied to new data produces a probabilistic result.
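The contrast on this slide can be sketched in a few lines. This is a toy illustration only: the single feature (a file "entropy" value), the threshold, the sample data, and the function names are all hypothetical, and the nearest-centroid scoring is a deliberately simple stand-in for a real learning algorithm.

```python
from statistics import mean

def rule_based(entropy):
    """Deterministic: a human chose the logic (a fixed threshold) up front."""
    return entropy > 7.0

def train(history):
    """Derive the logic from data: one mean feature value per historic label."""
    labels = {label for _, label in history}
    return {label: mean(x for x, lbl in history if lbl == label) for label in labels}

def predict_proba(model, x):
    """Probabilistic: score in [0, 1] that x is malicious, based on centroid distance."""
    d_mal = abs(x - model["malicious"])
    d_ben = abs(x - model["benign"])
    if d_mal + d_ben == 0:
        return 0.5
    return d_ben / (d_mal + d_ben)

# Hypothetical historic results: (feature value, label)
history = [(3.1, "benign"), (4.0, "benign"), (4.4, "benign"),
           (7.2, "malicious"), (7.6, "malicious"), (6.8, "malicious")]
model = train(history)

print(rule_based(6.5))                       # a yes/no answer
print(round(predict_proba(model, 6.5), 2))   # a likelihood, not a certainty
```

The rule returns the same yes/no answer until a human edits it; the learned model changes whenever the historic data changes, and it reports a likelihood rather than a verdict.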
8 Probabilistic results?
9 Probabilistic Deterministic
10 Vectorising
11 Feeding more data into the machine increases accuracy
12 Limited resources. Increasing attack surface.
13 Attacking Is Easier Than Defending
Process:
• Intent
• Idea
• Plan
• Design Logic
• Source Code
• Compile
• Stream of bits
[Figure labels: Surface, Effort for defending]
14 Understanding Intent. Optimizing Resources.
15 Applying Machine Learning requires eight times fewer resources than preparing the data (ratio 8 : 1)
16 Mathematical Representation. Abstraction.
17 Changing Representation: Turning an image into a number – VGG16 Convolutional Network
• A 224x224 RGB image is transformed by filters, becoming a number
• Convolutional filters capture 3x3 pixels to capture the notion of right/left, up/down, center
• Accuracy of 92.7%
Source: Neurohive – VGG16 Convolutional Network for Classification and Detection
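As a rough illustration of what a single 3x3 filter does, the sketch below slides a hand-written left/right kernel over a tiny grayscale grid. The kernel values and the image are made up for the example; in a network such as VGG16 the filter weights are learned, and many filters are stacked in layers. As is usual for CNNs, the operation is strictly a cross-correlation (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical kernel: responds where brightness increases from left to right.
LEFT_RIGHT = [[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]]

# Tiny 4x5 "image": dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9]]

print(convolve2d(image, LEFT_RIGHT))  # [[0, 27, 27], [0, 27, 27]]
```

The output is zero over the uniform dark area and large wherever the patch straddles the dark-to-bright edge, which is the "notion of right/left" the slide refers to.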
18 Changing Representation: Turning an image into a number – VGG16 Convolutional Network
• Training a VGG16 with photos from Cityscapes
• Enhancing realism of animation
• Eliminating artefacts
Source: Intel - Enhancing Photorealism Enhancement, May 2021
19 Vectorising Elements – Example: Human Language
Describing meaning / intent to achieve an abstraction level
• King, Queen, Man, Woman: Masculinity, Femininity
• swimming, swam, walking, walked: Verb tense
• Vectorising words allows ‘word algebra’; algebra allows Machine Learning
• Vectors represent the abstraction level
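The ‘word algebra’ idea can be tried with hand-made vectors. The two-dimensional values below are invented for illustration (first axis roughly "royalty", second roughly "masculinity"); real word embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical 2-D word vectors: [royalty, masculinity]
VECTORS = {
    "king":   [0.9, 0.8],
    "queen":  [0.9, -0.7],
    "man":    [0.1, 0.9],
    "woman":  [0.1, -0.8],
    "prince": [0.8, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

print(analogy("king", "man", "woman"))  # queen
```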
20 Vectorising Elements – Natural Language Processing (NLP)
Describing meaning / intent to achieve an abstraction level
“NLP is a subfield of computer science and artificial intelligence concerned with interactions between computers and human (natural) languages. It is used to apply machine learning algorithms to text and speech.”
Source: towards data science
21 Vectorising Elements – Why is NLP useful?
Describing meaning / intent to achieve an abstraction level
We know ‘Pineapples are spikey and yellow’
[Figure: two Input / Projection / Output diagrams for this sentence, one labeled ‘Give me the missing word’, the other ‘Give me the context’]
Reference: Tomas Mikolov et al.: Distributed Representations of Words and Phrases and their Compositionality
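The two questions on this slide correspond to two ways of building training examples from a sentence; in the word2vec literature they are commonly called CBOW ("missing word") and skip-gram ("context"). The sketch below only generates the training pairs; it does not train a model, and the function names and window size are assumptions made for the example.

```python
def missing_word_pairs(tokens, window=1):
    """'Give me the missing word': (surrounding words, word to predict)."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def context_pairs(tokens, window=1):
    """'Give me the context': (input word, one surrounding word to predict)."""
    return [(target, neighbour)
            for context, target in missing_word_pairs(tokens, window)
            for neighbour in context]

sentence = "pineapples are spikey and yellow".split()

for context, target in missing_word_pairs(sentence):
    print(context, "->", target)
for word, neighbour in context_pairs(sentence):
    print(word, "->", neighbour)
```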
22 Understanding what makes something different. How can we apply this to Cyber Security?
23 Vectorising Elements – Cyber Security: Applying NLP when Sandboxing executables
• Observing API calls performed against the operating system
• API calls are language and can be vectorised
24 Vectorising Elements – Cyber Security: Applying TF-IDF when disassembling OPCODES
Borrowing the TF-IDF algorithm from word document analysis
“TF-IDF is an information retrieval and information extraction subtask which aims to express the importance of a word to a document which is part of a collection of documents which we usually name a corpus.”
Source: http://filotechnologia.blogspot.com/2014/01/a-simple-java-class-for-tfidf-scoring.html
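A minimal sketch of TF-IDF applied to opcode sequences, treating each disassembled sample as a "document" and each opcode as a "word". The opcode lists are invented, and the formula used is the plain unsmoothed variant (term frequency times log of inverse document frequency); the slides do not say which variant is used in practice.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical opcode sequences from three disassembled samples.
samples = [
    ["push", "mov", "call", "xor", "xor", "xor"],
    ["push", "mov", "call", "ret"],
    ["push", "mov", "add", "ret"],
]

scores = tf_idf(samples)
print(max(scores[0], key=scores[0].get))  # xor
```

Opcodes that appear in every sample (here `push` and `mov`) score zero, while an opcode that is frequent in one sample and rare elsewhere stands out, which is what makes the resulting vector useful for telling samples apart.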
25 Vectorising Elements – Cyber Security: Decoded machine language
Machine code has sequence – sequence has meaning
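One simple way to keep sequence information when vectorising is to count n-grams (runs of n consecutive items) instead of single items. This is a generic technique offered as an illustration; the slides do not state that n-grams are what is used, and the opcode sequence is invented.

```python
from collections import Counter

def ngram_counts(sequence, n=2):
    """Count every run of n consecutive items in the sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

# Hypothetical decoded instruction sequence.
opcodes = ["push", "mov", "call", "push", "mov", "ret"]

for gram, count in ngram_counts(opcodes).items():
    print(gram, count)
```

Two samples containing the same opcodes in a different order produce the same single-opcode counts but different bigram counts.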
26 Changing Representation: Turning an executable file into vectors – VGG16 Convolutional Network
• An executable file is fed into a neural network
• Each ‘filter’ performs a mathematical operation on a sliding patch
[Figure labels: Original, Convolved]
Source: Check Point, Data Scientists Team, October 2020
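Before filters can slide over an executable, its bytes need a grid-like shape. The sketch below lays the raw bytes out as rows of a fixed width, like pixels of a grayscale image. That layout, the row width, and the zero padding are assumptions for illustration; the slide does not describe the actual preprocessing.

```python
def bytes_to_grid(data, width):
    """Arrange raw bytes as rows of `width` values (0-255), zero-padding the last row."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows

# First bytes of a hypothetical executable ("MZ" header followed by more data).
sample = b"\x4d\x5a\x90\x00\x03"

print(bytes_to_grid(sample, 2))  # [[77, 90], [144, 0], [3, 0]]
```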
27 Machine Learning In Cyber Security: Preventing Unknown Attacks
• EXE: Understanding Entropy & Structure; Disassembling; URL Verification; Finding Similarities; File/Registry. Meta Data; Classification using provided Meta Data; Verdict.
• PDF, PPT, DOC, XLS: PDF Analyzer; URL Verification; Macro Analyzer. Meta Data; Classification using provided Meta Data; Verdict.
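"Entropy" in file analysis usually means Shannon entropy over byte values, measured in bits per byte; that reading is an assumption here, since the slide does not define the term. Values near 8 indicate data that looks random, which is typical of compressed or encrypted content, and values near 0 indicate highly repetitive data.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())

print(shannon_entropy(b"aaaa"))             # a single repeated byte: no uncertainty
print(shannon_entropy(bytes(range(256))))   # every byte value once: maximum entropy
```

Such a value would be one feature among many in the metadata passed to classification, not a verdict on its own.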
28 Machine Learning In Cyber Security: Preventing Unknown Attacks
On July 20th 2020 a sample was labeled malicious by our machine learning logic
29 Machine Learning In Cyber Security: Preventing Unknown Attacks
On July 24th 2020 only 45 out of 73 engines on VirusTotal labeled it malicious. Four days later!
30 Machine Learning In Cyber Security: Sharing experience
Source: https://research.checkpoint.com/category/how-to-guides/
31 Machine Learning In Cyber Security: ‘Malware DNA’ based clustering applying TF-IDF
Two-dimensional representation of the 300,000-dimensional space representing the ‘world of malware’ in Check Point Threat Intelligence. Colors represent labels of malware families.
32 Machine Learning In Cyber Security: ‘Malware DNA’ applied to uncover an APT Eco System
Itay Cohen (Check Point) and Omri Ben Bassat (Intezer) mapped out an ecosystem
Results:
• Classification into 60 families and 200 modules
• 22,000 connections between analyzed samples
• Different actors don’t share code
Access the interactive map
• Published as open source
Download the detector tool
• Defend and contribute
Map based on Fruchterman-Reingold algorithm
Read the full report:
33 Machine Learning In Cyber Security: Sharing experience
Understand how vulnerable on-premises and cloud environments are
Source: https://research.checkpoint.com/2021/deep-into-the-sunburst-attack/
Understanding the SolarWinds Orion Platform Security Advisory, 16 December 2020, video, https://community.checkpoint.com/
34 Machine Learning In Cyber Security: The need for defense
BBC article about Colonial Pipeline attack, May 2021
Source: https://www.bbc.com/news/business-57050690
Source: Check Point, Research Blog, May 2021
Update 17th May 2021: DarkSide is offline - https://krebsonsecurity.com/
35 Understanding the DNA of a malware allows attributing ‘family’ characteristics
36 Knowing the ‘family’ … allows applying tools for defense … allows saving resources
37 What’s next?
38 Machine Learning – General Purpose: Comparing NLP-Trained Models
Over 300 apps are using GPT-3: https://openai.com/blog/gpt-3-apps/
GPT-3 API access is controlled: https://openai.com/blog/openai-api/
[Figure labels: 28th May 2020; 14 Apps using GPT-3]
39 Machine Learning Empowers Threat Prevention: Every input for Threat Intelligence becomes a Label
More than 27 years of experience …
• Having access to data
• Knowing the labels
• Selecting the right features
• Creating ML algorithms
• ML empowers Threat Prevention
[Figure labels: Data; Labels; ‘This is’; Feature1: form; Feature2: colour; Next module]
40 Machine Learning Empowers Threat Prevention: The infinity cycle of learning
[Figure labels: Incumbent; New; DATA; Labeling; Training; Stand by evaluation; Decision point; Federated Learning; Using encrypted customer data; Supervised by human expertise; Measuring; Unseen data; Adjusting weights]
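One possible reading of the "Incumbent", "New", "Unseen data", "Measuring" and "Decision point" labels is a comparison in which a newly trained model replaces the current one only if it performs better on data neither has seen. The sketch below illustrates that reading only; the models, the feature, the data, and the promotion rule are all hypothetical and not taken from the slides.

```python
def accuracy(model, labeled_samples):
    """Fraction of unseen samples for which the model predicts the known label."""
    hits = sum(1 for features, label in labeled_samples if model(features) == label)
    return hits / len(labeled_samples)

def decision_point(incumbent, new_model, unseen):
    """Keep the incumbent unless the new model measures strictly better."""
    if accuracy(new_model, unseen) > accuracy(incumbent, unseen):
        return new_model
    return incumbent

# Two hypothetical models that classify on a single feature value.
def incumbent(entropy):
    return "malicious" if entropy > 7.5 else "benign"

def new_model(entropy):
    return "malicious" if entropy > 6.5 else "benign"

# Hypothetical unseen, human-labeled samples: (feature value, label)
unseen = [(7.0, "malicious"), (7.8, "malicious"), (4.2, "benign"), (6.0, "benign")]

serving = decision_point(incumbent, new_model, unseen)
print(serving.__name__, accuracy(serving, unseen))
```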
41 The infinity cycle of learning is powered by us
42 THANK YOU
Peter Elmer | Security Expert, EMEA | Office of the CTO, pelmer@checkpoint.com, May 2021

stackconf 2021 | Data Driven Security
