CSC1202 Fundamentals of Data Science June 2025 Lecturer: Ms Jayashiry Morgan
Introduction to Data Science 01
Topics Covered 01 Big Data Analytics 02 Data Science Definition 03 Required Skills of Data Scientists
Learning Outcomes • Define data science and differentiate it from big data analytics. • Identify the key skills and interdisciplinary nature required for a data science professional. • Explain the relationship between big data and the field of data science. CSC 1202 Fundamentals of Data Science Chapter 1: Introduction to Data Science
Data Deluge • Hospital patient registries • Electronic point-of-sale data • Stock trades • Telephone calls • Website hits • Catalog orders • Bank transactions • Remote sensing images • Tax returns • Airline reservations • Credit card charges • Web comments • Sensor data
Consequences of the Data Deluge Proactively defining a data collection protocol results in more useful information. This leads to more useful analytics. Proactively analytical companies compete more effectively. Every problem generates data eventually. Every company needs analytics eventually. Proactively analytical people are more marketable and more successful in their work. Everyone needs analytics eventually.
What is Big Data? Big Data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time. These datasets are so huge and complex in volume, velocity, and variety that traditional data management systems cannot store, process, and analyze them.
George Dyson, science historian and TED speaker: "Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away."
Big Data: What is it? The SAS definition of big data: the point at which the volume, velocity, and variety of data exceed an organization’s storage or computation capacity for accurate and timely decision making.
Some factors associated with big data: • Data Volume • Data Complexity • Data Velocity • Data Variety • Data Variability
Data Volume Data volumes are increasing due to use of the following: • social media (Facebook, Twitter, Instagram) • machines talking to machines • improvements in the manufacturing process (quality control) • automated tracking devices • streaming data feeds
Data Velocity • business processes that are more automated • mergers and acquisitions • more use of social media • more use of self-service applications • integration of business applications
Data Variety • Structured data • Unstructured data (e.g., business apps, emails, digital images, articles, blogs, video and audio clips, …) • Streaming data (e.g., stock ticker data, RFID tag data, sensor data, …)
Data Variability • The flow of data changes over time (seasonality, peak response, social media trends, …). • Data values change over time. How much history do you keep? • Data values are different across data sources. • Data is stored in different formats. • Data standards change across time. What was “valid” five years ago might not be “valid” today.
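One small, concrete case of "data is stored in different formats" is the same date written differently by different source systems. The sketch below is a minimal illustration, assuming three hypothetical formats (`KNOWN_FORMATS`) and a hypothetical helper `parse_date`; real systems would need their own list of formats.

```python
from datetime import date, datetime

# Hypothetical date formats that different source systems might use.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first successful parse."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


# The same calendar day as three systems might record it.
raw_dates = ["2025-06-01", "01/06/2025", "20250601"]
normalised = [parse_date(d).isoformat() for d in raw_dates]
print(normalised)
```

Converting everything to one standard representation (here ISO 8601) is a common first step before values from different sources can be compared.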
Data Complexity Data comes from a variety of systems in a variety of formats. This can make it difficult to merge, cleanse, and transform data in a uniform manner.
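The merge, cleanse, and transform steps mentioned above can be sketched on a toy scale. The example below assumes two hypothetical exports (a CSV file and a JSON file) that describe the same customers; the field names and the helper functions `load_customers` and `merge_spend` are invented for illustration.

```python
import csv
import io
import json

# Hypothetical exports from two systems describing the same customers.
CSV_EXPORT = "customer_id,name,city\n1, Alice ,Kuala Lumpur\n2,Bob,\n"
JSON_EXPORT = '[{"id": 1, "spend": "120.50"}, {"id": 2, "spend": "80"}]'


def load_customers(csv_text):
    """Cleanse CSV rows: strip whitespace and turn empty strings into None."""
    customers = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {key: (value.strip() or None) for key, value in row.items()}
        customers[int(cleaned["customer_id"])] = cleaned
    return customers


def merge_spend(customers, json_text):
    """Transform spend to float and merge it onto the matching customer."""
    merged = []
    for record in json.loads(json_text):
        customer = customers.get(record["id"], {})
        merged.append({
            "customer_id": record["id"],
            "name": customer.get("name"),
            "city": customer.get("city"),
            "spend": float(record["spend"]),
        })
    return merged


merged_rows = merge_spend(load_customers(CSV_EXPORT), JSON_EXPORT)
for row in merged_rows:
    print(row)
```

Even in this tiny case the two sources disagree on key names (`customer_id` vs `id`), data types (text vs numbers), and missing values, which is the kind of friction the slide describes.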
Reasons for the Big Data Explosion • increasing “data velocity” due to the following: • streaming data feeds • point of sale (POS) transactional systems • radio frequency identification (RFID) tags • smart metering • bigger and cheaper data storage capabilities • social media • improved and automated business processes • mergers and acquisitions, leading to the merge of multiple data sources • more online self service applications being used
Factors Driving Demand for Big Data Solutions In addition to rapidly increasing data growth rates, consider these factors: • availability of data from social media sources • In-memory technology • demand for mobile business intelligence • increasing requirements around real-time reporting • desire to mine data from social media sources (sentiment analysis)
Data Science Venn Diagram Data Science is a combination of: ● Computer skills ● Mathematical knowledge ● Domain knowledge in the particular field Conway (2010) emphasizes the need to learn a lot!! (Conway,
What is Data Science? Drawing Useful Conclusions: Data science aims to derive meaningful insights from large and varied datasets.
Data Science: A Definition According to SAS "Data Science can be thought of as a multidisciplinary field that combines skills in software engineering and statistics with domain experience to support the end-to-end analysis of large and diverse data sets, ultimately uncovering value for an organization and then communicating that value to stakeholders as actionable results."
Data Science: A Definition According to SAS (diagram) Domain experience + Advanced analytics + Software engineering → Support the end-to-end analysis of large and diverse data sets → Communication to stakeholders as actionable results → value
What is Data Science? Key Activities • Exploration: Finding patterns in information, often using visualizations and descriptive statistics. • Prediction: Making informed guesses about unknown values, primarily through machine learning and optimization. • Inference: Quantifying certainty about patterns and prediction accuracy, using statistical tests and models.
Central Components • Computing: Essential for applying analysis techniques to diverse, large-scale data (numbers, text, images, video, sensor readings). • Statistics: Crucial for drawing robust conclusions from incomplete information.
Levels of Analytics
Analytic Methods • Descriptive Model: helps you understand what happened (diagnostic models help you understand key relationships and determine why something happened). • Predictive Model: uses data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. • Prescriptive Model: tells you what to do by providing information about optimal decisions based on the predicted future scenarios. Types • Classification: predict class membership • Regression: predict a number Techniques • decision trees • linear/logistic regression • neural networks • gradient boosting • random forests • support vector machines
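The difference between the two predictive model types (classification predicts class membership, regression predicts a number) can be shown with a minimal sketch. The data below is hypothetical, and the classifier is a deliberately simple nearest-centroid rule chosen for brevity; it is a stand-in, not one of the techniques listed on the slide. The regression part is an ordinary least-squares line.

```python
from statistics import mean

# Hypothetical training data: monthly spend, whether the customer churned,
# and what the customer spent the following month.
spend = [20.0, 25.0, 30.0, 70.0, 80.0, 90.0]
churned = ["yes", "yes", "yes", "no", "no", "no"]
next_month_spend = [22.0, 27.0, 33.0, 74.0, 83.0, 95.0]


def fit_centroids(xs, labels):
    """Classification: store the mean feature value of each class."""
    return {label: mean(x for x, lab in zip(xs, labels) if lab == label)
            for label in set(labels)}


def predict_class(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


centroids = fit_centroids(spend, churned)
slope, intercept = fit_line(spend, next_month_spend)

print(predict_class(centroids, 28.0))  # output is a class label
print(slope * 50.0 + intercept)        # output is a number
```

Both models are trained on the same input feature; what changes is the kind of target, a label in one case and a numeric value in the other.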
Glossary of Terms Statistics Data Analysis Predictive Analysis Artificial Intelligence Prescriptive Analysis Data Mining Machine Learning Optimization Natural Language Processing (NLP) Computer Vision Deep Learning
Glossary of Terms • Statistics – Numeric study of data relationships • Data Analysis – Find meaningful patterns and knowledge in data • Data Mining – In data, understand what is relevant, assess outcomes, accelerate informed decisions • Machine Learning – Trains a machine how to learn with minimal human intervention • Artificial Intelligence – Machines learn from experience, adjust to new inputs, and perform human-like tasks • Predictive Analysis – Identify the likelihood of future outcomes based on historical data • Prescriptive Analysis – Provide information about optimal decisions based on the predicted future scenarios • Optimization – Delivers the best results given resource constraints • Natural Language Processing – Enables understanding, interaction, and communication between humans and machines • Computer Vision – Analyzes/interprets a picture or video • Deep Learning – Trains a machine to perform human-like tasks
Chandana Gopal, IDC, December 2017 “Analytics is core to success in the digital economy. Data and analytics driven organizations will thrive.”
Organizations That Are Using SAS AI and Analytics Solutions • WildTrack (Data for Good): 90% accuracy for ID of wildlife using tracks • Rogers (Telecom): 53% fewer customer complaints • Amsterdam UMC (Health Care): improved liver and brain tumor diagnosis with AI and analytics • Daiwa (Financial): 2.7x increase in client purchase rates • Honda (Manufacturing): continuous learning and insight from clients to improve design and quality
Who is a Data Scientist? Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems and the curiosity to explore what problems need to be solved. They are part mathematician, part computer scientist, part trend spotter. They are a sign of the times. Their popularity reflects how businesses now think about big data.
Who is a Data Scientist? That unwieldy mass of unstructured information can no longer be ignored and forgotten. It is a virtual gold mine that helps boost revenue as long as there is someone who digs in and unearths business insights that no one thought to look for before. Enter the data scientist.
Typical Job Responsibilities for a Data Scientist • collect large amounts of unruly data and transform it into a more usable format • solve business-related problems using data-driven techniques • work with a variety of programming languages (for example, SAS, R, and Python) • have a solid grasp of statistics, such as statistical tests and distributions • stay on top of analytical techniques such as social network analysis, text analytics, and new methodologies for predictive modeling • communicate and collaborate with both IT and business • look for order and patterns in data
BUT… • There just are not enough data scientists in the workforce. • It is important to realize that one data scientist might not have all the necessary skills. • It is important to develop a team of data scientists that are “scattered across the business.” • There is a rise of easier-to-use analytics tools. • Analytics is so important to society that it cannot be something that is only the domain of experts. • So, companies rely on Citizen Data Scientists (Gartner research director, Alexander Linden, April 2015)
How to find Citizen Data Scientists? In most organizations, they’re already there, working in many different roles and departments throughout the organization. They are citizen data scientists, businesspeople with the right attitude – curious, adventurous, determined – to research and improve things in your organization. The demand for citizen data scientists will increase 5x more quickly than the demand for “traditional”, highly skilled data scientists. https://www.sas.com/en_us/insights/articles/analytics/how-to-find-and-equip-citizen-data-scientists.html
Characteristics of Citizen Data Scientists • tired of looking at the same reports • want to get their hands on all the data themselves and find new ways to get answers • willing to learn new methods and use new tools • analytically minded
Three Roles Working Together… (diagram) Roles: Business Analyst, Citizen Data Scientist, Data Scientist. Expertise labels: domain expertise, advanced analytics, data science expertise. … from basic discovery to data
Data Scientists Skills
Data Scientists Skills
Data Scientists Approach
Applied Data Science
References ● Conway, D. (2010). The Data Science Venn Diagram. Drewconway.com. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram ● Van Der Velden, J. (2021). Introduction to Data Science Course Notes. SAS Institute.
Class Activity #1
Class Activity #2

Chapter 1 Introduction to Data Science (Computing)


Editor's Notes

  • #6 Pronounced "DETA DEH-LUGE". The "data deluge" refers to the overwhelming amount of data being generated, which can exceed the capacity to store, manage, and analyze it effectively.
  • #7 1. Every problem generates data eventually. Simple Meaning: Whenever there's an issue or a challenge, somewhere along the line, information about that problem is created. Example: If your internet keeps disconnecting (a problem), your router probably creates a log of disconnections (data). Your internet provider might log support calls (more data). You might tweet about it (even more data!).
    2. Every company needs analytics eventually. Simple Meaning: No matter what a company does, at some point, they will need to look at their information to make smart decisions. Example: A shoe company needs to know which shoes are selling, which ads work, and why customers return shoes. All of this requires looking at data.
    3. Everyone needs analytics eventually. Simple Meaning: In today's world, whether you're a student, an employee, or a business owner, you'll eventually encounter situations where understanding data helps you. Example: As an employee, you might need to analyze project timelines, customer feedback, or sales figures. Even managing your personal budget involves a form of analytics.
    Conclusion: The world is drowning in data. You can either let that data overwhelm you, or you can learn to harness it. The key message here is that being proactive with data collection and analysis (for problems, for companies, and for individuals) is crucial for success in the modern world.
  • #8 Big data is data that is so incredibly large, so fast-moving, and so varied in its forms that traditional ways of storing and processing information just can't handle it anymore.
  • #9 We save everything now because it's cheaper and easier to keep all the information than it is to decide what to throw away.
  • #15 Data Variability simply means that this stream of data is always changing and can be very inconsistent. It's not a steady, perfectly clean flow.
  • #16 The data is very hard to understand, organize, or connect because it's so varied and interlinked.
  • #23 Three Core Pillars (Inputs):
    Domain experience: This refers to deep knowledge of the specific industry, business, or field from which the data originates. For example, if you're doing data science for a hospital, you need medical domain experience. This helps in understanding the context of the data and formulating relevant questions.
    Advanced analytics: This involves using sophisticated statistical methods, machine learning algorithms, and data modeling techniques to uncover patterns, make predictions, and extract insights from data.
    Software engineering: This provides the technical skills to build, maintain, and scale the systems and tools needed to collect, process, and manage large and diverse datasets. It includes programming, database management, and system architecture.
    Central Function: These three pillars (Domain experience, Advanced analytics, Software engineering) work together to "Support the end-to-end analysis of large and diverse data sets." This means they enable the entire process from data collection and cleaning to analysis and interpretation.
    Crucial Step: After analysis, the next critical step is "Communication." This involves translating complex analytical findings into understandable insights for non-technical audiences. It's about telling the story the data reveals.
    The Outcome: Effective communication leads to delivering insights "to stakeholders as actionable results." Stakeholders are the people who will use these insights to make decisions (e.g., business leaders, managers, clients). "Actionable results" means the insights are clear, practical, and directly applicable to solving problems or seizing opportunities.
    The Ultimate Goal: Delivering actionable results ultimately creates "value." This value can be anything from increased revenue, reduced costs, improved efficiency, better customer satisfaction, or more informed decision-making.
In essence, SAS defines Data Science as the strategic combination of business/domain knowledge, analytical techniques, and technical programming skills to analyze complex data, communicate findings clearly, and ultimately generate tangible value for an organization.