Web usage mining

WEB USAGE MINING Monu Chaudhary 071BCT522

INTRODUCTION Web Usage mining is the process of applying data mining techniques for the discovery of usage patterns from Web data, targeted towards various applications.

INTRODUCTION Data collected at different levels: ➢ Server level ➢ Client level ➢ Proxy level

INTRODUCTION Goal: ➢ analyze the behavioral patterns and profiles of users interacting with a Web site ➢ Understand and better serve the needs of Web-based applications

INTRODUCTION Classification based on Usage Data: ➢ Web server Data ➢ Application Server Data ➢ Application Level Data

INTRODUCTION Importance: ➢ Growth of e-commerce ○ Provides a cost-effective way of doing business. ➢ Hidden useful information ○ Visitors’ profiles ○ Measuring online marketing effort

INTRODUCTION 3 Phases: ➢ Preprocessing ➢ Pattern Discovery ➢ Pattern Analysis

PREPROCESSING Preprocessing consists of converting the: ➢ usage information ➢ content information ➢ structure information contained in the various available data sources into the data abstractions necessary for pattern discovery.

Preprocessing of Web Usage Mining

Preprocessing of Web Usage Mining Data Cleaning: removes irrelevant references and fields from server logs, removes erroneous references, and adds references that are missing because of caching.
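The removal part of data cleaning can be sketched as follows. This is only a minimal illustration, assuming the server writes Common Log Format lines; the suffix list, the `clean_log` name, and the rule "keep only successful GET requests" are assumptions made for the example, not something the slides prescribe. Adding the references lost to caching is a separate step (path completion) and is not attempted here.

```python
import re

# Assumed list of suffixes treated as embedded resources rather than pages.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def clean_log(lines):
    """Return parsed entries for successful GET requests to pages only."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        entry = match.groupdict()
        if entry["method"] != "GET":
            continue  # ignore non-GET requests in this sketch
        if not entry["status"].startswith("2"):
            continue  # erroneous reference (e.g. 404)
        path = entry["url"].split("?", 1)[0].lower()
        if path.endswith(IRRELEVANT_SUFFIXES):
            continue  # irrelevant reference (image, stylesheet, script)
        kept.append(entry)
    return kept
```

Which requests count as irrelevant depends on the site; a photo gallery, for example, would probably want to keep its image requests.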

Preprocessing of Web Usage Mining Sessionization: dividing each user's requests into sessions. A session is the set of activities performed by a user from the moment she enters the site until the moment she leaves it.
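Because the server never sees the moment a user leaves, sessions are usually approximated. The sketch below uses an inactivity timeout; the 30-minute value, the `(user, timestamp, url)` tuple layout, and the `sessionize` name are assumptions for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed threshold, not from the slides


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) requests into a list of (user, [urls]) sessions."""
    by_user = defaultdict(list)
    for user, timestamp, url in requests:
        by_user[user].append((timestamp, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = []
        last_time = None
        for timestamp, url in hits:
            # A long gap since the previous request closes the current session.
            if last_time is not None and timestamp - last_time > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_time = timestamp
        if current:
            sessions.append((user, current))
    return sessions
```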

Preprocessing of Web Usage Mining User Identification: associates requests with individual users, so that the multiple sessions of each user are recorded together. This log is called the user activity record.

Preprocessing of Web Usage Mining A page view consists of every file that contributes to the display on a user's browser at one time.

Preprocessing of Web Usage Mining Conceptually, each Page view can be viewed as a collection of Web objects or resources representing a specific “user event,” e.g., reading an article, viewing a product page, or adding a product to the shopping cart.

Preprocessing of Web Usage Mining Path Completion: Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached.

Preprocessing of Web Usage Mining Path Completion: For instance, ➢ if a user returns to a page A during the same session, the second access to A will likely result in viewing the previously downloaded version of A that was cached on the client side, and therefore no request is made to the server.

Preprocessing of Web Usage Mining Path Completion: ➢ This results in the second reference to A not being recorded on the server logs.
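One way to recover such missing references is to use the site's link structure: when a logged page is not linked from the page before it, assume the user pressed Back through cached pages until reaching one that does link to it. The sketch below illustrates that heuristic only; the `complete_path` name and the `links` dictionary are hypothetical, and real logs may need referrer information as well.

```python
def complete_path(session, links):
    """Insert pages the user probably revisited from cache with the Back button.

    session: ordered list of pages actually requested from the server.
    links:   dict mapping each page to the set of pages it links to (assumed known).
    """
    completed = []
    stack = []  # assumed navigation history of the user
    for page in session:
        if stack and page not in links.get(stack[-1], set()):
            # Not reachable from the previous page: search the history backwards
            # for the most recent page that links to the requested one.
            for i in range(len(stack) - 2, -1, -1):
                if page in links.get(stack[i], set()):
                    # Pages between the previous page and stack[i] were re-viewed
                    # from cache, so add them back in reverse order.
                    completed.extend(stack[i:-1][::-1])
                    del stack[i + 1:]
                    break
        completed.append(page)
        stack.append(page)
    return completed
```

For a site where A links to B and C, and B links to D, the logged session A, B, D, C is completed to A, B, D, B, A, C.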

Preprocessing of Web Usage Mining Episode: a subset or subsequence of a session comprising semantically or functionally related page views.

PATTERN DISCOVERY Pattern discovery draws upon methods and algorithms developed from several fields such as statistics, data mining, machine learning and pattern recognition.

PATTERN DISCOVERY Methods: ➢ Statistical Analysis ➢ Association Rules ➢ Clustering ➢ Classification ➢ Sequential Patterns
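As an illustration of one of these methods, association rules over sessions can be found by counting how often pages occur together. The sketch below handles only rules between pairs of pages; the `pair_rules` name and the threshold values are assumptions, and a production system would more likely use an algorithm such as Apriori over larger itemsets.

```python
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (A, B, support, confidence): sessions that visit A also visit B."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    pages = sorted(set().union(*page_sets))
    # Number of sessions containing each page.
    count = {p: sum(p in s for s in page_sets) for p in pages}

    rules = []
    for a, b in combinations(pages, 2):
        both = sum(a in s and b in s for s in page_sets)
        support = both / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]  # estimate of P(y in session | x in session)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```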

PATTERN ANALYSIS The motivation behind pattern analysis is to filter out uninteresting rules or patterns from the set found in the pattern discovery phase.

PATTERN ANALYSIS Methods: ➢ A knowledge query mechanism such as SQL. ➢ Another method is to load usage data into a data cube in order to perform Online Analytical Processing (OLAP) operations.
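A knowledge query over usage data might look like the sketch below, which loads page views into an in-memory SQLite table and ranks pages by the number of distinct sessions that viewed them. The `page_view` table, its columns, and the `top_pages` function are hypothetical names chosen for the example.

```python
import sqlite3


def top_pages(page_views, limit=3):
    """Rank URLs by how many distinct sessions viewed them.

    page_views: iterable of (session_id, url) rows (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_view (session_id INTEGER, url TEXT)")
    conn.executemany("INSERT INTO page_view VALUES (?, ?)", page_views)
    rows = conn.execute(
        """
        SELECT url, COUNT(DISTINCT session_id) AS sessions
        FROM page_view
        GROUP BY url
        ORDER BY sessions DESC, url ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows
```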

PATTERN ANALYSIS Methods: ➢ Visualization techniques, such as graphing patterns or assigning colors to different values. ➢ Content and structure information can be used to filter out patterns containing pages of a certain usage type or content type, or pages that match a certain hyperlink structure.

Application of Web Usage Mining

Advantages ➢ Personalized marketing. ➢ Fight against terrorism. ➢ Customer Relationship. ➢ Increase profitability by target pricing.

COLLABORATIVE FILTERING Subodh Chandra Shakya 071BCT543

What is collaborative filtering? Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preference or taste information from many other users (i.e. collaborating on interests).

Application: mostly in e-commerce recommendation systems, e.g. Amazon, Netflix

This is how it works: 1. Weight all users with respect to their similarity with the active user 2. Select a subset of users to use as a set of predictors 3. Compute a prediction from a weighted combination of the selected neighbors’ ratings
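The three steps can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: it assumes ratings are stored as a dict of dicts, uses cosine similarity over co-rated items as the weight, and takes a plain similarity-weighted average as the prediction. The names `similarity`, `predict`, and `k` are assumptions.

```python
from math import sqrt


def similarity(a, b):
    """Cosine similarity between two users' rating dicts over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, active, item, k=2):
    """Predict the active user's rating for item, or None if no neighbour helps."""
    # Step 1: weight every other user who rated the item by similarity to the active user.
    weighted = [
        (similarity(ratings[active], other), other[item])
        for user, other in ratings.items()
        if user != active and item in other
    ]
    # Step 2: keep the k most similar users as the set of predictors.
    neighbours = sorted(weighted, reverse=True)[:k]
    # Step 3: combine the neighbours' ratings, weighted by similarity.
    total = sum(w for w, _ in neighbours)
    if total == 0:
        return None
    return sum(w * r for w, r in neighbours) / total
```

Many systems also subtract each user's mean rating before combining, so that generous and strict raters become comparable; that refinement is left out here.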

Collaborative filtering types Memory based: uses user rating data to compute the similarity between users or items (e.g. neighbourhood-based, item-based). Model based: uses data mining and machine learning models such as Bayesian networks, neural embedding models, clustering models, and latent semantic models such as SVD.

Approaches for CF (memory based) User-based CF: compute similarity based on users. Item-based CF: compute similarity based on items.

User based CF Look for users who share the same rating patterns with the active user (the user whom the prediction is for). Use the ratings from those like-minded users to calculate a prediction for the active user.

Item based CF 1. Build an item-item matrix determining relationships between pairs of items 2. Infer the tastes of the current user by examining the matrix and matching that user's data
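Both steps of item-based CF are sketched below. The choice of cosine similarity between item rating columns (treating a missing rating as 0) and the scoring rule "similarity times the user's rating" are assumptions made for the example, as are the function names.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Step 1: build an item-item matrix of cosine similarities."""
    columns = defaultdict(dict)  # item -> {user: rating}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            columns[item][user] = rating

    sims = defaultdict(dict)
    for i in columns:
        for j in columns:
            if i == j:
                continue
            common = set(columns[i]) & set(columns[j])
            dot = sum(columns[i][u] * columns[j][u] for u in common)
            norm_i = sqrt(sum(r * r for r in columns[i].values()))
            norm_j = sqrt(sum(r * r for r in columns[j].values()))
            sims[i][j] = dot / (norm_i * norm_j) if norm_i and norm_j else 0.0
    return sims


def recommend(ratings, user, sims):
    """Step 2: rank items the user has not rated by similarity to rated ones."""
    seen = ratings[user]
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim * rating
    return sorted(scores, key=scores.get, reverse=True)
```

The item-item matrix can be computed offline and reused for every user, which is one reason this approach is attractive when there are many more users than items.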

A simple similarity measure is cosine similarity
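For two users u and v, cosine similarity can be written as below. The notation is an assumption introduced here: r_{u,i} is the rating user u gave item i, and I_{uv} is the set of items rated by both users.

```latex
\[
\operatorname{sim}(u, v)
  = \frac{\sum_{i \in I_{uv}} r_{u,i}\, r_{v,i}}
         {\sqrt{\sum_{i \in I_{uv}} r_{u,i}^{2}}\;
          \sqrt{\sum_{i \in I_{uv}} r_{v,i}^{2}}}
\]
```

The value is 1 when the two rating vectors point in the same direction and 0 when they are orthogonal.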

Pearson correlation similarity
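Pearson correlation is cosine similarity applied to mean-centred ratings. In the notation assumed here, r_{u,i} is the rating user u gave item i, I_{uv} is the set of items rated by both users, and \bar{r}_u is the mean rating of user u over I_{uv}.

```latex
\[
\operatorname{sim}(u, v)
  = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
         {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^{2}}\;
          \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^{2}}}
\]
```

Subtracting the means compensates for users who rate everything high or everything low, and the result ranges from -1 (opposite tastes) to 1 (identical rating patterns).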

Collaborative Filtering problems Cold start: there must be enough other users already in the system to find a match, and new items need to get enough ratings. Popularity bias: it is hard to recommend items to someone with unique tastes.

RECOMMENDER SYSTEMS Atul Khatri 071bct509

Definition ● Estimate a utility function that automatically predicts how a user will like an item ● Based on ○ Past Behavior ○ Relations to other users ○ Item similarity ○ Context

Impact Apparent ● Advertisement ● Restaurants, cafes ● Movies, TV shows, Music ● Books ● News articles ● Social sites including dating services

Impact(continued) Not so apparent ● Courses in E-learning ● Drug components ● Research papers ● Citations ● Code modules

Types ● Collaborative Filtering system ● Content-based system ● Hybrid recommender system ○ Context-based system ○ Knowledge-based system

Paradigms of recommender systems

Content-Based Recommender System

● The system creates a user profile based on the user's explicitly stated likes and dislikes. ● Every purchase updates the user profile. ● A content-based recommender system matches the profile of an item to the user profile to decide its relevance to the user.
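The profile update and matching described above could be sketched as follows. The representation is an assumption: each item is a set of attribute labels, the user profile is a `Counter` of attribute weights, and relevance is the cosine similarity between the two. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def update_profile(profile, item_attributes):
    """Add a purchased item's attributes to the user profile (a Counter)."""
    profile.update(item_attributes)
    return profile


def relevance(profile, item_attributes):
    """Cosine similarity between the user profile and an item's attribute set."""
    overlap = sum(profile[a] for a in item_attributes)
    norm_profile = sqrt(sum(w * w for w in profile.values()))
    norm_item = sqrt(len(item_attributes))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return overlap / (norm_profile * norm_item)


def rank_items(profile, catalogue):
    """Order catalogue items (name -> attribute set) from most to least relevant."""
    return sorted(
        catalogue,
        key=lambda name: relevance(profile, catalogue[name]),
        reverse=True,
    )
```

Unlike collaborative filtering, this needs no ratings from other users, only a description of each item.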

Content Representation ● Structured data ○ Small number of attributes ○ Each item described by same set of attributes ○ Known set of values of attributes

Content Representation(contd...) ● Unstructured data ○ No attribute names with well defined values ○ Need to impose structure on text before use ○ Natural language complexity ■ Same word with different meaning ■ Different word with same meaning

Context-Based Recommender Systems

● The system uses additional data about the context in which an item is consumed. ● Example: the time of day may be used to recommend restaurants to consumers, i.e. different restaurants for breakfast, lunch and so on. Further, whether you are going out to eat with friends or with family should also change the recommendation.

Major obstacles for contextual computing ● Obtain sufficient and reliable data describing user context ● Understand the impact of contextual dimensions on personalisation process ● Computational model of contextual dimensions in more classical recommendation technology ● For instance: How to extend Collaborative filtering to include contextual dimensions?

Collective Intelligence Sagun Nakarmi 071bct533

● A shared or group intelligence that emerges from the collaboration and competition of many individuals. ● Groups of people and computers, connected by the Internet, collectively doing intelligent things.

It can be understood as an emergent property from the synergies among: 1) Data - knowledge-information 2) Software-hardware 3) Experts

For instance, Google technology harvests knowledge generated by millions of people creating and linking web pages and then uses this knowledge to answer queries in ways that often seem amazingly intelligent.

In Wikipedia, thousands of people around the world have collectively created a very large and high quality intellectual product with almost no centralized control, and almost all as volunteers!

Online multi-player games are another example of collective intelligence. Games such as Dota 2, Second Life and Call of Duty rely on gamers coming together as a community to form the game’s identity.

Other examples: ● Social networking (perhaps the most popular form of collective intelligence) ● Amazon, Hamrobazaar and other e-commerce sites ● etc.
