Data Science and Applications (An Introduction)
Kürşat İNCE, kince@havelsan.com.tr
Kürşat İNCE
1996 – Today: HAVELSAN, Inc.
• 1996 - Development of HVL Firewall (#1 in Turkey)
• 2001 - Developer in various projects: TuAF IS, MELTEM, etc.
• 2010 - YGO Project/Product Manager
• 2014 - Move to the HVL Istanbul Office
• 2014 - Systems Engineer
• 2016 - R&D Coordinator
June 2016 - Organizer at www.DataIstanbul.org
• BSc, Bilkent University Computer Engineering, 1996
• MSc, Bilkent University Computer Engineering, 1999
• PhD, Gebze Technical University Computer Engineering (in progress)
FACILITIES (HAVELSAN KEY FACTS)
• HAVELSAN HEADQUARTERS (ANKARA)
• SIMULATION CENTER
• NAVAL COMBAT SYSTEMS CENTER - İSTANBUL
• R&D CENTER (METU Technopolis)
• TEST & INTEGRATION FACILITIES
• SİSATEM
BUSINESS AREAS (HAVELSAN KEY FACTS)
• COMMAND, CONTROL & COMBAT SYSTEMS: Command & Control Solutions House of Turkey
• TRAINING TECHNOLOGIES & SIMULATION SYSTEMS: A Global Brand in Simulation & Training
• MANAGEMENT INFORMATION SYSTEMS: Leading E-Transformation Company of Turkey
• HOMELAND & CYBER SECURITY SOLUTIONS: Center of Excellence in Security Solutions
• Meetup Community
• Established: March 2016
• Members: ~1500
• Latest Events:
  • Data Structures and Algorithms for Big Data
  • Web Analytics and Conversion Rate Optimization
  • Data Science and the Protection of Personal Data
• Planning a hands-on Data Science course
/data_istanbul/dataistanbul
Agenda
• Data Science
• Roles, Skill Sets, and Process
• Applications
• Final Words
• Resources, etc.
Data Science
Evolution of Sciences
• Before 1600: empirical science
  • Direct observations
• 1600-1950s: theoretical science
  • Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
• 1950s-1990s: computational science
  • Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, physics, or linguistics).
  • Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
• 1990-now: data science
  • The flood of data from new scientific instruments and simulations
  • The ability to economically store and manage petabytes of data online
  • The Internet and computing Grid that make all these archives universally accessible
Jim Gray and Alex Szalay, "The World Wide Telescope: An Archetype for Online Science," Comm. ACM, 45(11): 50-54, Nov. 2002.
Data Science is…
• …the art of turning data into actions.
(The Field Guide to Data Science, Booz Allen Hamilton)
Images: J.M.W. Turner's “The Wreck of a Transport Ship” and Van Gogh's “The Starry Night”, as recreated by an art algorithm (http://www.wired.co.uk/article/art-algorithm-recreates-paintings)
Data Science is…
• …the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.
Where is all the data coming from?
Data Value: From Data to Actions
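"Data Scientist: The Sexiest Job of the 21st Century" (Thomas H. Davenport and D.J. Patil, Harvard Business Review, October 2012): https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century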
The Sexiest Job (cartoon by Marion van de Wiel, DSC/e Workshop with industry, 2014)
Data Science is…
The Roles (layers listed from top to bottom; the potential to support business decisions increases toward the top)
• Decision Making: Customer / End User
• Data Presentation (visualization techniques): Business Analyst
• Modelling and Algorithms (machine learning and statistical models): Data Scientists
• Data Exploration (statistical analysis, data visualization…): Data Scientists
• Data Preprocessing / Integration (missing values, duplicate values…): Data Engineer / DBA
• Data Sources (paper, files, web documents, scientific experiments, database systems): Data Engineer / DBA
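The Data Preprocessing / Integration layer deals with problems such as missing and duplicate values. The minimal Python sketch below uses only the standard library and a hypothetical set of records: it drops exact duplicates, then fills gaps with the median for a numeric field and a placeholder for a categorical one. Median imputation is one common choice, not the only correct one.

```python
from statistics import median

# Hypothetical records as they might arrive from heterogeneous sources.
raw_records = [
    {"id": 1, "age": 34, "city": "Istanbul"},
    {"id": 2, "age": None, "city": "Ankara"},
    {"id": 1, "age": 34, "city": "Istanbul"},  # exact duplicate of the first record
    {"id": 3, "age": 29, "city": None},
]


def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable, order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def impute(records, numeric_field, categorical_field, placeholder="unknown"):
    """Fill missing numeric values with the median and missing categories with a placeholder."""
    known = [r[numeric_field] for r in records if r[numeric_field] is not None]
    fill_value = median(known)
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not mutated
        if rec[numeric_field] is None:
            rec[numeric_field] = fill_value
        if rec[categorical_field] is None:
            rec[categorical_field] = placeholder
        cleaned.append(rec)
    return cleaned


clean = impute(deduplicate(raw_records), "age", "city")
for row in clean:
    print(row)
```

Deduplication runs before imputation so that duplicate rows do not skew the median.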
Data Science Skill Set
Data Scientist: The Engineer of the Future
• Technical Skills
• Business Skills
• Social Skills
Data Science Process
• Frame the Problem
• Collect Raw Data
• Process Data
• Explore Data
• Perform In-depth Analysis
• Communicate the Results
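The six process steps can be walked through end to end on a toy problem. This is a minimal sketch under stated assumptions: the question (does ad spend relate to sales?), the data, and the use of a least-squares line as the "in-depth analysis" are all hypothetical illustrations.

```python
from statistics import mean

# 1. Frame the problem: does weekly ad spend relate to weekly sales? (hypothetical question)
# 2. Collect raw data: hard-coded (spend, sales) pairs stand in for a real data source.
raw = [(10, 25), (20, 41), (None, 30), (30, 62), (40, 79), (50, 101)]

# 3. Process data: drop rows with missing values.
data = [(x, y) for x, y in raw if x is not None and y is not None]

# 4. Explore data: simple descriptive statistics.
xs = [x for x, _ in data]
ys = [y for _, y in data]
print(f"{len(data)} rows, mean spend {mean(xs):.1f}, mean sales {mean(ys):.1f}")

# 5. Perform in-depth analysis: ordinary least-squares fit of sales = intercept + slope * spend.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# 6. Communicate the results: a short, decision-oriented summary.
print(f"Each extra unit of ad spend is associated with about {slope:.2f} units of sales.")
```

The fitted slope describes an association in this sample, not a causal effect; stating that caveat clearly is part of communicating the results.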
Applications of Data Science
The Story of Target
https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
https://www.top500.org/news/watson-proving-better-than-doctors-in-diagnosing-cancer/
http://fortune.com/2016/10/11/ibm-watson-empoyees-cancer-drugs/
Web Analytics
• Web analytics is the measurement, collection, analysis, and reporting of web data for the purpose of understanding and optimizing web usage.
• It is used to measure metrics / key performance indicators (KPIs) such as:
  • Hit
  • Page View
  • Event
  • Visitor
  • Impression
  • Bounce Rate
  • Exit Rate
  • Session Duration
  • Click Path
Web Analytics Software
• Google Analytics
• Yandex Metrica
• Count.ly (of Turkish origin)
• Rakam.io (of Turkish origin)
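Several of these KPIs can be computed directly from a page-view log. The sketch below assumes a hypothetical log of (session, page, timestamp) tuples and uses textbook definitions: bounce rate as the share of single-page sessions, exit rate of a page as exits from it divided by its page views, and session duration as the time between the first and last hit. Tools such as Google Analytics may define or compute these slightly differently.

```python
from collections import Counter, defaultdict

# Hypothetical page-view log: (session_id, page, timestamp in seconds).
page_views = [
    ("s1", "/home", 0), ("s1", "/products", 30), ("s1", "/checkout", 90),
    ("s2", "/home", 10),
    ("s3", "/products", 5), ("s3", "/home", 65),
]

# Group page views by session and order each session by timestamp.
sessions = defaultdict(list)
for session_id, page, ts in page_views:
    sessions[session_id].append((ts, page))
for views in sessions.values():
    views.sort()

# Bounce rate: share of sessions with exactly one page view.
bounce_rate = sum(len(v) == 1 for v in sessions.values()) / len(sessions)

# Exit rate per page: sessions that ended on the page / all page views of that page.
views_per_page = Counter(page for _, page, _ in page_views)
exits_per_page = Counter(v[-1][1] for v in sessions.values())
exit_rate = {page: exits_per_page[page] / n for page, n in views_per_page.items()}

# Session duration: time between the first and the last hit of each session.
durations = {sid: v[-1][0] - v[0][0] for sid, v in sessions.items()}
avg_duration = sum(durations.values()) / len(durations)

print(f"Bounce rate: {bounce_rate:.0%}")
print("Exit rate:", exit_rate)
print(f"Average session duration: {avg_duration:.1f} s")
```

Note that a single-hit session contributes a duration of zero under this definition, which pulls the average down.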
Count.ly
Collaborative Filtering • Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).
Collaborative Filtering
Collaborative Filtering
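A user-based variant of collaborative filtering can be sketched in a few lines: similarity between two users is measured on the items both have rated, and a missing rating is predicted as the similarity-weighted average of the other users' ratings for that item. The users, items, and ratings below are hypothetical, and cosine similarity is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical user -> {item: rating} preferences collected from many users.
ratings = {
    "ayse": {"book": 5, "film": 3, "game": 4},
    "mehmet": {"book": 4, "film": 3, "game": 5, "album": 4},
    "zeynep": {"book": 1, "film": 5, "album": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = cosine_similarity(ratings[user], prefs)
        num += sim * prefs[item]
        den += abs(sim)
    return num / den if den else None  # None: no similar user has rated the item


print(predict("ayse", "album"))
```

The prediction for "ayse" lands closer to the rating of "mehmet", whose tastes match hers more closely. Real recommenders usually mean-center ratings (e.g. Pearson correlation), restrict the average to the k most similar users, or use matrix factorization instead; those refinements are omitted here.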
LinkedIn: People You May Know
Health Care Analytics
• Health care analytics describes the analysis activities that can be undertaken on data collected in healthcare services, namely:
  • clinical data (electronic medical records),
  • patient behavior and sentiment data,
  • pharmaceutical and research and development (R&D) data, and
  • claims and cost data.
Health Care Analytics
• Electronic Health Records (EHRs)
  • Infrastructure and use cases to store, retrieve, and share EHRs securely
• Real-time Alerting
  • Clinical decision support via wearables
• Predictive Analytics in Healthcare
  • Increasing the accuracy of diagnoses; supporting preventive medicine and public health; detecting the risk of diabetes, etc.
• Telemedicine
  • Delivery of remote clinical services such as remote patient monitoring and initial diagnosis
  • Telesurgery with the use of robots, etc.
http://www.datapine.com/blog/big-data-examples-in-healthcare/
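Real-time alerting on wearable data often starts with a simple rule over a sliding window of recent readings. The sketch below is an illustrative assumption, not a clinical algorithm: the class name, window size, and thresholds are hypothetical placeholders, and it only flags a moving average of a simulated heart-rate stream that leaves a fixed band.

```python
from collections import deque


class RollingAlert:
    """Raise an alert when the moving average of a vital sign leaves a safe band.

    Hypothetical helper: the window size and thresholds are illustrative
    placeholders, not clinical values.
    """

    def __init__(self, window=5, low=50.0, high=120.0):
        self.readings = deque(maxlen=window)  # keeps only the latest `window` readings
        self.low = low
        self.high = high

    def update(self, value):
        """Add a reading; return an alert message, or None while the average is in range."""
        self.readings.append(value)
        if len(self.readings) < self.readings.maxlen:
            return None  # not enough data yet to judge
        avg = sum(self.readings) / len(self.readings)
        if avg < self.low or avg > self.high:
            return f"ALERT: moving average {avg:.1f} outside [{self.low}, {self.high}]"
        return None


monitor = RollingAlert(window=3)
stream = [72, 75, 80, 118, 130, 140]  # simulated heart-rate readings (beats per minute)
alerts = [monitor.update(bpm) for bpm in stream]
print(alerts)
```

Averaging over a window trades a little reaction time for robustness against single noisy readings, which wearables produce often.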
Predictive Maintenance
• Seyda Ertekin, Cynthia Rudin, Tyler McCormick. "Reactive Point Processes: A New Approach to Predicting Power Failures in Underground Electrical Systems." Annals of Applied Statistics, 2015.
• A new statistical model designed to predict discrete events in time (e.g. fires, explosions, and power failures) based on their past history.
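As we understand the paper, Reactive Point Processes are self-exciting like a Hawkes process (each event temporarily raises the rate of further events) and additionally self-regulating (inspections and repairs lower that rate). The sketch below shows only the simpler self-exciting part, a standard Hawkes process with an exponential kernel, lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)). It is not the RPP model itself, and the parameter values and event times are hypothetical.

```python
from math import exp


def hawkes_intensity(t, past_events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of an exponential-kernel Hawkes process at time t.

    mu:    baseline event rate
    alpha: jump in the rate caused by each past event
    beta:  how quickly that excitation decays
    All default values are illustrative assumptions.
    """
    return mu + sum(alpha * exp(-beta * (t - ti)) for ti in past_events if ti < t)


events = [1.0, 1.5, 4.0]  # hypothetical failure times (e.g. in days)
for t in [0.5, 2.0, 6.0]:
    print(t, round(hawkes_intensity(t, events), 4))
```

The intensity equals the baseline before any failure, jumps right after a cluster of failures, and decays back toward the baseline as time passes without new events.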
Final Words
Resources and Datasets
• Kaggle Competitions: http://kaggle.com
• UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
• KDnuggets: http://www.kdnuggets.com/datasets/
• DataQuest: https://www.dataquest.io/
• Massive Open Online Courses (MOOCs)
  • Coursera, edX, etc.
• …
Final Words
• Data science is the art of turning data into actions.
• As the volume of data grows, data scientists will become an increasingly scarce resource.
Data Science Process http://www.kdnuggets.com/2016/03/data-science-process.html
Data is Everywhere
Thank you!

GTU GeekDay Data Science and Applications
