#13/19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, Vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218
Website: www.shakastech.com, Email-id: shakastech@gmail.com, info@shakastech.com

Efficient Algorithms for Mining Top-K High Utility Item Sets

ABSTRACT

High utility item set (HUI) mining is an emerging topic in data mining. It refers to discovering all item sets whose utility meets a user-specified minimum utility threshold, min_util. However, setting min_util appropriately is difficult, and finding a suitable value by trial and error is a tedious process for users. If min_util is set too low, too many HUIs are generated, which may make the mining process very inefficient. If min_util is set too high, it is likely that no HUIs will be found. In this paper, we address these issues by proposing a new framework for top-k high utility item set mining, where k is the desired number of HUIs to be mined. Two efficient algorithms, TKU (mining Top-K Utility item sets) and TKO (mining Top-K utility item sets in One phase), are proposed for mining such item sets without the need to set min_util. We provide a structural comparison of the two algorithms and discuss their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.

EXISTING SYSTEM

Frequent item set mining (FIM) is a fundamental research topic in data mining. However, traditional FIM may discover a large number of frequent but low-value item sets and lose the information on valuable item sets that have low selling frequencies. Hence, it cannot satisfy users who want to discover item sets with high utilities, such as high profits.

To address these issues, utility mining has emerged as an important topic in data mining and has received extensive attention in recent years. In utility mining, each item is associated with a utility (e.g. unit profit) and an occurrence count in each transaction (e.g. quantity). The utility of an item set represents its importance, which can be measured in terms of weight, value, quantity or other information depending on the user specification. An item set is called a high utility item set (HUI) if its utility is no less than a user-specified minimum utility threshold min_util. HUI mining is essential to many applications such as streaming analysis, market analysis, mobile computing and biomedicine.

DISADVANTAGES:

1. Efficiently mining HUIs in databases is not an easy task because the downward closure property used in FIM does not hold for the utility of item sets.
2. In other words, pruning the search space for HUI mining is difficult because a superset of a low utility item set can be a high utility item set.

PROPOSED SYSTEM

The transaction-weighted utilization (TWU) model was introduced to improve the performance of the mining task. In this model, an item set is called a high transaction-weighted utilization item set (HTWUI) if its TWU is no less than min_util, where the TWU of an item set is an upper bound on its utility. Therefore, every HUI must be an HTWUI, and all the HUIs are included in the complete set of HTWUIs. A classical TWU model-based algorithm consists of two phases. In the first phase, called phase I, the complete set of HTWUIs is found. In the second phase, called phase II, all HUIs are obtained by calculating the exact utilities of the HTWUIs with one database scan.

ADVANTAGES:

1. Two efficient algorithms named TKU (mining Top-K Utility item sets) and TKO (mining Top-K utility item sets in One phase) are proposed for mining the complete set of top-k HUIs in databases without the need to specify the min_util threshold.
2. By pruning more unpromising items from transactions during the construction of the UP-Tree, the number of nodes maintained in memory can be reduced and the mining algorithm can achieve better performance.

MODULES

1. High Utility Item Set Mining
2. Top-k Pattern Mining
3. Top-k High Utility Pattern Mining
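
The utility and TWU definitions given under EXISTING SYSTEM and PROPOSED SYSTEM can be checked on a toy database. The following is a minimal sketch, assuming Java 9 or later; the class name UtilityDemo, the three items, their unit profits and the three transactions are all hypothetical values chosen for illustration and do not come from the paper.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class UtilityDemo {
    // Hypothetical unit profits per item (illustrative values only).
    static final Map<String, Integer> PROFIT = Map.of("a", 5, "b", 2, "c", 1);

    // Hypothetical transactions: item -> purchased quantity.
    static final List<Map<String, Integer>> DB = List.of(
            Map.of("a", 1, "b", 2),
            Map.of("b", 4, "c", 3),
            Map.of("a", 2, "b", 1, "c", 6));

    // Utility of a whole transaction: sum of quantity * unit profit.
    static int transactionUtility(Map<String, Integer> t) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : t.entrySet()) {
            sum += e.getValue() * PROFIT.get(e.getKey());
        }
        return sum;
    }

    // Exact utility of an item set: only the item set's own items are
    // counted, and only in transactions that contain every one of them.
    static int utility(Set<String> itemSet) {
        int total = 0;
        for (Map<String, Integer> t : DB) {
            if (!t.keySet().containsAll(itemSet)) continue;
            for (String item : itemSet) {
                total += t.get(item) * PROFIT.get(item);
            }
        }
        return total;
    }

    // TWU of an item set: sum of the full transaction utilities of the
    // transactions containing it, hence an upper bound on utility().
    static int twu(Set<String> itemSet) {
        int total = 0;
        for (Map<String, Integer> t : DB) {
            if (t.keySet().containsAll(itemSet)) total += transactionUtility(t);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(utility(Set.of("c")));      // 9
        System.out.println(utility(Set.of("b", "c"))); // 19
        System.out.println(twu(Set.of("c")));          // 29
        System.out.println(twu(Set.of("b", "c")));     // 29
    }
}
```

With min_util = 16 on this toy data, {c} is a low utility item set (9) while its superset {b, c} is a high utility item set (19), so pruning on utility alone would wrongly discard {b, c}. The TWU of {c} is 29, which is at least min_util, so a phase I filter based on TWU keeps {c} as a candidate and the superset is not lost.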

MODULE DESCRIPTION

High Utility Item Set Mining

In recent years, high utility item set mining has received a lot of attention and many efficient algorithms have been proposed, such as Two-Phase. These algorithms work in two phases. In the first phase, they generate a set of candidates that are potential high utility item sets. In the second phase, they calculate the exact utility of each candidate found in the first phase to identify the high utility item sets.

Top-k Pattern Mining

Many studies have been proposed to mine different kinds of top-k patterns, such as top-k frequent item sets, top-k frequent closed item sets, top-k closed sequential patterns, top-k association rules, top-k sequential rules, top-k correlation patterns and top-k cosine similarity interesting pairs.

Top-k High Utility Pattern Mining

Chan et al.'s study considered the utilities of various items, but the quantities of items in transactions were not taken into consideration. The definition of high utility item set used in their study is therefore different from the one used in this work. We define the task of top-k high utility item set mining by considering both the quantities and the profits of items.
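
The top-k task described above, which takes both quantities and profits into account, can be illustrated with a brute-force sketch. This is not TKU or TKO: it enumerates every item set of a tiny hypothetical database, which is only feasible for a handful of items. It does show one way to avoid a user-supplied min_util, namely keeping the k best item sets in a min-heap and raising an internal threshold to the smallest utility in the heap. The class name TopKDemo, the items, profits and quantities are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.StringJoiner;

class TopKDemo {
    // Hypothetical items, unit profits and per-transaction quantities
    // (columns follow ITEMS; 0 means the item is absent).
    static final String[] ITEMS = {"a", "b", "c"};
    static final int[] PROFIT = {5, 2, 1};
    static final int[][] DB = {{1, 2, 0}, {0, 4, 3}, {2, 1, 6}};

    // Exact utility of the item set encoded as a bitmask over ITEMS.
    static int utility(int mask) {
        int total = 0;
        for (int[] t : DB) {
            int u = 0;
            boolean containsAll = true;
            for (int i = 0; i < ITEMS.length; i++) {
                if ((mask & (1 << i)) == 0) continue;
                if (t[i] == 0) { containsAll = false; break; }
                u += t[i] * PROFIT[i];
            }
            if (containsAll) total += u;
        }
        return total;
    }

    // Human-readable form of a bitmask, e.g. 3 -> "{a,b}".
    static String name(int mask) {
        StringJoiner sj = new StringJoiner(",", "{", "}");
        for (int i = 0; i < ITEMS.length; i++) {
            if ((mask & (1 << i)) != 0) sj.add(ITEMS[i]);
        }
        return sj.toString();
    }

    // Returns the k item sets with the highest utility, best first.
    static List<String> topK(int k) {
        // Min-heap ordered by utility; each entry is {mask, utility}.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[1], y[1]));
        int minUtil = 0; // internal threshold, starts at 0 and only rises
        for (int mask = 1; mask < (1 << ITEMS.length); mask++) {
            int u = utility(mask);
            if (u < minUtil) continue;          // cannot enter the top k
            heap.add(new int[] {mask, u});
            if (heap.size() > k) heap.poll();   // drop the current worst
            if (heap.size() == k) minUtil = heap.peek()[1];
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            result.add(0, name(e[0]) + "=" + e[1]); // prepend: best first
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(3)); // [{a,b}=21, {b,c}=19, {a,b,c}=18]
    }
}
```

The user supplies only k. The internal threshold rises from 0 to 18 while the seven candidate item sets are examined, and any item set whose utility falls below the current threshold is skipped without entering the heap.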

SOFTWARE SPECIFICATION

HARDWARE REQUIREMENTS

Processor - Dual core 2.4 GHz
RAM - 1 GB
Hard Disk - 80 GB
Keyboard - Standard Windows Keyboard
Monitor - 5 VGA Colour

SOFTWARE REQUIREMENTS

Operating System - Windows 95/98/2000/XP
Programming Language - Java
