WEB USAGE MINING Monu Chaudhary 071BCT522
INTRODUCTION Web Usage mining is the process of applying data mining techniques for the discovery of usage patterns from Web data, targeted towards various applications.
INTRODUCTION Data collected at different levels: ➢ Server level ➢ Client level ➢ Proxy level
INTRODUCTION Goal: ➢ analyze the behavioral patterns and profiles of users interacting with a Web site ➢ Understand and better serve the needs of Web-based applications
INTRODUCTION Classification based on Usage Data: ➢ Web server Data ➢ Application Server Data ➢ Application Level Data
INTRODUCTION Importance: ➢ Growth of e-commerce ○ Provides a cost-effective way of doing business. ➢ Hidden useful information ○ Visitors’ profile ○ Measure online marketing effort
INTRODUCTION 3 Phases: ➢ Preprocessing ➢ Pattern Discovery ➢ Pattern Analysis
PREPROCESSING Preprocessing consists of converting the: ➢ usage information ➢ content information ➢ structure information contained in the various available data sources into the data abstractions necessary for pattern discovery.
Web Usage Mining Process
Preprocessing of Web Usage Mining
Preprocessing of Web Usage Mining Data Cleaning: removes irrelevant references and fields from server logs, removes erroneous references, and adds references that are missing due to caching.
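The data cleaning step can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the record fields (`url`, `status`, `agent`), the list of ignored suffixes, and the crude robot check are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Assumed list of embedded-resource suffixes that are not page requests.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(records):
    """Keep successful page requests that are not embedded resources or robots."""
    cleaned = []
    for rec in records:
        path = urlparse(rec["url"]).path.lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # irrelevant reference: image, stylesheet, script
        if not 200 <= rec["status"] < 400:
            continue  # erroneous reference: failed request
        if "bot" in rec["agent"].lower():
            continue  # crude robot check (an assumption, not a standard rule)
        cleaned.append(rec)
    return cleaned


# Hypothetical, already-parsed log records.
records = [
    {"url": "/index.html", "status": 200, "agent": "Mozilla/5.0"},
    {"url": "/logo.png", "status": 200, "agent": "Mozilla/5.0"},
    {"url": "/missing.html", "status": 404, "agent": "Mozilla/5.0"},
    {"url": "/index.html", "status": 200, "agent": "ExampleBot/1.0"},
]
cleaned = clean_log(records)
```

Only the first record survives here; the other three are dropped as an embedded image, a failed request, and a robot request respectively.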
Preprocessing of Web Usage Mining Sessionization: grouping each user's requests into sessions, where a session is the set of activities performed by a user from the moment she enters the site until the moment she leaves it.
Sessionization
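One common heuristic for sessionization is an inactivity timeout. The sketch below assumes requests are already attributed to users and uses a 30-minute threshold; the threshold, the tuple layout, and the function name are assumptions for illustration only.

```python
from collections import defaultdict

TIMEOUT = 30 * 60  # assumed inactivity threshold, in seconds


def sessionize(requests, timeout=TIMEOUT):
    """Split each user's requests into sessions using an inactivity timeout.

    requests: iterable of (user_id, timestamp_seconds, url) tuples.
    Returns a list of (user_id, [urls]) pairs, one per session.
    """
    by_user = defaultdict(list)
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical requests: (user, seconds since start of log, page).
requests = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 60, "/home"),
]
sessions = sessionize(requests)
```

In this example user `u1` ends up with two sessions, because the gap between the second and third request exceeds the timeout.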
Preprocessing of Web Usage Mining User Identification: associates the multiple sessions that belong to the same user. The resulting per-user log is called the user activity record.
User Identification
Preprocessing of Web Usage Mining A page view consists of every file that contributes to the display on a user's browser at one time.
Preprocessing of Web Usage Mining Conceptually, each Page view can be viewed as a collection of Web objects or resources representing a specific “user event,” e.g., reading an article, viewing a product page, or adding a product to the shopping cart.
Preprocessing of Web Usage Mining Path Completion: Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached.
Preprocessing of Web Usage Mining Path Completion: For instance, ➢ if a user returns to a page A during the same session, the second access to A will likely result in viewing the previously downloaded version of A that was cached on the client side, and therefore, no request is made to the server.
Preprocessing of Web Usage Mining Path Completion: ➢ This results in the second reference to A not being recorded on the server logs.
Path Completion
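A referrer-based heuristic is one way to approximate path completion. The sketch below assumes each logged request carries a referrer, and that a referrer which does not match the previously logged page means the user stepped back through cached pages. The function name and input layout are hypothetical.

```python
def complete_path(hits):
    """Infer cached back-navigations from referrers for one session.

    hits: list of (url, referrer) pairs in request order.
    Returns the page sequence with the inferred cached views inserted.
    """
    path = []
    for url, referrer in hits:
        if path and referrer is not None and referrer != path[-1] and referrer in path:
            # The user must have gone back to `referrer` without hitting the
            # server. Find its most recent position and replay the pages
            # between there and the current page in reverse order.
            idx = len(path) - 1 - path[::-1].index(referrer)
            path.extend(path[idx:-1][::-1])
        path.append(url)
    return path


# The server saw A, B, C, D, but D was requested from A.
hits = [("A", None), ("B", "A"), ("C", "B"), ("D", "A")]
completed = complete_path(hits)
```

For the example session the inferred path adds the two cached views of B and A between C and D. This is only a heuristic: when several back paths are possible, the log alone cannot tell them apart.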
Preprocessing of Web Usage Mining Episode is a subset or subsequence of a session comprised of semantically or functionally related page views.
PATTERN DISCOVERY Pattern discovery draws upon methods and algorithms developed from several fields such as statistics, data mining, machine learning and pattern recognition.
PATTERN DISCOVERY Methods: ➢ Statistical Analysis ➢ Association Rules ➢ Clustering ➢ Classification ➢ Sequential Patterns
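As an illustration of one of these methods, association rules over sessions are usually scored by support and confidence. The sketch below computes both for a single candidate rule; the session data and page names are made up for the example.

```python
def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(pages <= set(s) for s in sessions) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the sessions."""
    both = set(antecedent) | set(consequent)
    return support(sessions, both) / support(sessions, antecedent)


# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

rule_support = support(sessions, ["/products", "/cart"])
rule_confidence = confidence(sessions, ["/products"], ["/cart"])
```

Here the rule "/products implies /cart" holds in 2 of 4 sessions (support 0.5) and in 2 of the 3 sessions that contain /products (confidence about 0.67).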
PATTERN ANALYSIS The motivation behind pattern analysis is to filter out uninteresting rules or patterns from the set found in the pattern discovery phase.
PATTERN ANALYSIS Methods: ➢ A knowledge query mechanism such as SQL. ➢ Another method is to load usage data into a data cube in order to perform Online Analytical Processing (OLAP) operations.
PATTERN ANALYSIS Methods: ➢ Visualization techniques, such as graphing patterns or assigning colors to different values. ➢ content and structure information can be used to filter out patterns containing pages of a certain usage type, content type, or pages that match a certain hyperlink structure.
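The SQL-style query mechanism mentioned above can be sketched with an in-memory SQLite database. The `rules` table, its columns, and its contents are hypothetical; the point is only that discovered patterns can be stored and then filtered declaratively.

```python
import sqlite3

# Hypothetical table of discovered association rules.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (antecedent TEXT, consequent TEXT, "
    "support REAL, confidence REAL)"
)
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/products", "/cart", 0.50, 0.67),
        ("/home", "/about", 0.25, 0.33),
        ("/home", "/products", 0.50, 0.67),
    ],
)

# Keep only rules that start from a product page and are reasonably reliable.
interesting = conn.execute(
    "SELECT antecedent, consequent FROM rules "
    "WHERE antecedent LIKE '/products%' AND confidence >= 0.5 "
    "ORDER BY confidence DESC"
).fetchall()
conn.close()
```

The same filtering idea carries over to content or structure constraints, for example restricting the query to pages of a given content type.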
Application of Web Usage Mining
Advantages ➢ Personalized marketing. ➢ Fight against terrorism. ➢ Customer Relationship. ➢ Increase profitability by target pricing.
COLLABORATIVE FILTERING Subodh chandra shakya 071BCT543
What is collaborative filtering? Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preference or taste information from many other users (i.e. collaborating on interests).
Application Mostly in e-commerce recommendation systems. Examples: Amazon, Netflix
This is how it works… 1. Weight all users with respect to their similarity with the active user 2. Select a subset of users to use as a set of predictors 3. Compute a prediction from a weighted combination of the selected neighbors’ ratings
Collaborative filtering types Memory based: uses user rating data to compute similarity between users or items, e.g. neighbourhood-based, user-based and item-based methods. Model based: uses data mining and machine learning models, e.g. Bayesian networks, neural embedding models, clustering models, and latent semantic models such as SVD.
Approaches for CF (memory based) User-based CF: compute similarity between users. Item-based CF: compute similarity between items.
User based CF Look for users who share the same rating patterns with the active user (the user whom the prediction is for). Use the ratings from those like-minded users to calculate a prediction for the active user.
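The three steps of user-based CF (weight users, select neighbours, combine their ratings) can be sketched as below. The ratings, the choice of cosine similarity over co-rated items, and the neighbourhood size `k` are assumptions for the example, not a fixed recipe.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave": {"A": 5, "B": 3, "D": 5},
}


def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, active, item, k=2):
    """Predict the active user's rating for `item` from the k nearest users."""
    # Step 1: weight every other user who rated the item by similarity.
    weights = [
        (cosine(ratings[active], r), user)
        for user, r in ratings.items()
        if user != active and item in r
    ]
    # Step 2: keep the k most similar users as predictors.
    neighbours = sorted(weights, reverse=True)[:k]
    # Step 3: similarity-weighted average of the neighbours' ratings.
    total = sum(w for w, _ in neighbours)
    if total == 0:
        return None
    return sum(w * ratings[user][item] for w, user in neighbours) / total


prediction = predict(ratings, "alice", "D")
```

With this data the two nearest neighbours of alice are dave and bob, so the prediction for item D falls between bob's rating of 4 and dave's rating of 5.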
Item based CF 1. Build an item-item matrix determining relationships between pairs of items 2. Infer the tastes of the current user by examining the matrix and matching that user's data
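The two item-based steps can be sketched as follows: first build an item-item similarity matrix, then score an unseen item from the user's own ratings of similar items. The data and the use of cosine similarity are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave": {"A": 5, "B": 3, "D": 5},
}


def cosine(u, v):
    """Cosine similarity over the keys two rating vectors share."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(u[k] ** 2 for k in common))
    norm_v = sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v)


def item_item_matrix(ratings):
    """Step 1: similarity between every pair of items."""
    vectors = {}
    for user, user_ratings in ratings.items():
        for item, score in user_ratings.items():
            vectors.setdefault(item, {})[user] = score
    sim = {item: {} for item in vectors}
    for i, j in combinations(vectors, 2):
        sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


def predict(ratings, sim, user, item):
    """Step 2: weight the user's own ratings by similarity to `item`."""
    rated = ratings[user]
    num = sum(sim[item][j] * score for j, score in rated.items() if j in sim[item])
    den = sum(abs(sim[item][j]) for j in rated if j in sim[item])
    return num / den if den else None


sim = item_item_matrix(ratings)
prediction = predict(ratings, sim, "alice", "D")
```

Because the item-item matrix depends only on the rating data, it can be computed ahead of time and reused for every user.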
A simple similarity measure is cosine similarity
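For reference, cosine similarity between two users can be written as below. The notation is an assumption: r_{u,i} is the rating of user u for item i, and I_{uv} is the set of items rated by both users (restricting the sums to co-rated items is one common convention, not the only one).

```latex
\[
\operatorname{sim}(u, v)
  = \cos(\vec{r}_u, \vec{r}_v)
  = \frac{\sum_{i \in I_{uv}} r_{u,i}\, r_{v,i}}
         {\sqrt{\sum_{i \in I_{uv}} r_{u,i}^{2}}\;
          \sqrt{\sum_{i \in I_{uv}} r_{v,i}^{2}}}
\]
```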
Pearson correlation similarity
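For reference, the Pearson correlation between two users can be written as below. The notation is an assumption: r_{u,i} is the rating of user u for item i, \bar{r}_u is the mean rating of user u, and I_{uv} is the set of items rated by both users.

```latex
\[
\operatorname{sim}(u, v)
  = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
         {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^{2}}\;
          \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^{2}}}
\]
```

Subtracting each user's mean compensates for users who rate consistently high or low, which plain cosine similarity does not do.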
Collaborative Filtering problems Cold start: there must be enough other users already in the system to find a match, and new items need enough ratings before they can be recommended. Popularity bias: it is hard to recommend items to someone with unique tastes.
RECOMMENDER SYSTEMS Atul Khatri 071bct509
Definition ● Estimate a utility function that automatically predicts how a user will like an item ● Based on ○ Past Behavior ○ Relations to other users ○ Item similarity ○ Context
Impact Apparent ● Advertisement ● Restaurants, cafes ● Movies, Tv shows, Music ● Books ● News articles ● Social sites including dating services
Impact(continued) Not so apparent ● Courses in E-learning ● Drug components ● Research papers ● Citations ● Code modules
Architecture
Types ● Collaborative Filtering system ● Content-based system ● Hybrid recommender system ○ Context-based system ○ Knowledge-based system
Paradigms of recommender systems
Content-Based Recommender System
● The system creates a user profile based on the user's explicitly stated likes and dislikes ● Every purchase updates the user profile ● A content-based recommender system matches the profile of an item to the user profile to decide its relevance to the user
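The profile-matching idea can be sketched as below. Items are described by attribute sets, each purchase adds weight to the matching attributes in the user profile, and candidate items are ranked by cosine similarity to that profile. The catalogue, attribute names, and function names are hypothetical.

```python
from math import sqrt

# Hypothetical catalogue: item -> set of descriptive attributes.
items = {
    "laptop": {"electronics", "portable"},
    "e-reader": {"electronics", "portable", "books"},
    "novel": {"books", "fiction"},
    "cookbook": {"books", "cooking"},
}


def update_profile(profile, item_attrs, weight=1.0):
    """Add an item's attributes to the user profile (e.g. after a purchase)."""
    for attr in item_attrs:
        profile[attr] = profile.get(attr, 0.0) + weight
    return profile


def relevance(profile, item_attrs):
    """Cosine similarity between the profile and a binary item vector."""
    dot = sum(profile.get(attr, 0.0) for attr in item_attrs)
    norm_profile = sqrt(sum(w * w for w in profile.values()))
    norm_item = sqrt(len(item_attrs))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)


def recommend(profile, items, exclude=()):
    """Rank items not in `exclude` by relevance to the profile."""
    scored = [
        (relevance(profile, attrs), name)
        for name, attrs in items.items()
        if name not in exclude
    ]
    return [name for _, name in sorted(scored, reverse=True)]


profile = update_profile({}, items["laptop"])  # the user bought a laptop
ranking = recommend(profile, items, exclude={"laptop"})
```

After a single laptop purchase the e-reader ranks first, since it shares two attributes with the profile while the two books share none.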
Storage of items in database
Content Representation ● Structured data ○ Small number of attributes ○ Each item described by same set of attributes ○ Known set of values of attributes
Content Representation(contd...) ● Unstructured data ○ No attribute names with well defined values ○ Need to impose structure on text before use ○ Natural language complexity ■ Same word with different meaning ■ Different word with same meaning
Context-Based Recommender Systems
● The system uses additional data about the context in which an item is consumed. ● Example: time may be used as an additional component to recommend restaurants to consumers, i.e. different restaurants for breakfast, lunch and so on. Further, whether you are going out to eat with friends or family should also vary the recommendation.
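One simple way to use context is to filter the rating data down to the current context before ranking. The sketch below applies this to the restaurant example; the visit records, field names, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical past visits with contextual fields.
visits = [
    {"restaurant": "bakery", "meal": "breakfast", "company": "friends", "rating": 5},
    {"restaurant": "bakery", "meal": "dinner", "company": "family", "rating": 2},
    {"restaurant": "grill", "meal": "dinner", "company": "family", "rating": 5},
    {"restaurant": "grill", "meal": "breakfast", "company": "friends", "rating": 2},
    {"restaurant": "diner", "meal": "dinner", "company": "family", "rating": 4},
]


def recommend_in_context(visits, **context):
    """Rank restaurants by average rating among visits matching the context."""
    totals = defaultdict(lambda: [0, 0])  # restaurant -> [rating sum, count]
    for visit in visits:
        if all(visit.get(key) == value for key, value in context.items()):
            entry = totals[visit["restaurant"]]
            entry[0] += visit["rating"]
            entry[1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return sorted(averages, key=averages.get, reverse=True)


family_dinner = recommend_in_context(visits, meal="dinner", company="family")
breakfast = recommend_in_context(visits, meal="breakfast")
```

The same restaurant can rank first in one context and last in another, which is the effect the time and company dimensions are meant to capture. Filtering also shrinks the data available per context, which relates to the data-sufficiency obstacle.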
Major obstacles for contextual computing ● Obtain sufficient and reliable data describing user context ● Understand the impact of contextual dimensions on personalisation process ● Computational model of contextual dimensions in more classical recommendation technology ● For instance: How to extend Collaborative filtering to include contextual dimensions?
Collective Intelligence Sagun Nakarmi 071bct533
● A shared or group intelligence that emerges from the collaboration and competition of many individuals. ● Groups of people and computers, connected by the Internet, collectively doing intelligent things.
It can be understood as an emergent property from the synergies among: 1) Data - knowledge-information 2) Software-hardware 3) Experts
For instance, Google technology harvests knowledge generated by millions of people creating and linking web pages and then uses this knowledge to answer queries in ways that often seem amazingly intelligent.
In Wikipedia, thousands of people around the world have collectively created a very large and high quality intellectual product with almost no centralized control, and almost all as volunteers!
Online multi-player games are another example of collective intelligence. Games such as Dota 2, Second Life and Call of Duty rely on gamers coming together as a community to form the game’s identity.
Other examples: ● social networking (perhaps the most popular form of collective intelligence) ● Amazon, Hamrobazaar & other e-commerce sites ● etc.
THANK YOU FOR YOUR PATIENCE!!
