The 5th International Joint conference on Rules and Reasoning Sept. 8–15, 2021 (virtually in) Leuven, Belgium An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions Francesco Cauteruccio and Giorgio Terracina DEMACS, University of Calabria, Italy {cauteruccio, terracina}@mat.unical.it
Outline • Context and Motivation • Proposed Framework • ASP Approach • Experimental Evaluation • Conclusion 1/17
Context and Motivation • Pattern Mining is one of the most studied data mining branches • Find interesting patterns (set of items) in a database of transactions • Frequent pattern mining, sequential pattern mining, etc… • High-Utility Pattern Mining (HUPM) • Find patterns having a high-utility (w.r.t. some utility measure) • Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought together. • Basic assumption: each item is associated with one, static utility . • However… • The utility of an item can be defined from very different point of views, • Transactions are not only flat lists of items but they can provide different level of abstractions. Pattern Mining and High-Utility Pattern Mining 2/17
Context and Motivation • We present a framework for HUPM extending basic notions and introducing: • A higher level of abstraction with transaction set representation • For each transaction, an Object, a Container and a Database level of aggregation can be defined. • The notion of facet • A facet is an attribute which can be associated with an item, a transaction, an object or a container; • Each element may be characterized by more than one facet. • A taxonomy of extended utility functions • Based on the database structure and facets, • They can be combined in several ways to fit different notions of utility. An extended framework for HUPM 3/17
Proposed Framework • The database representation is multi-layer • 𝐷𝑎𝑡𝑎𝑏𝑎𝑠𝑒 → 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 → 𝑂𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛 • Given a database 𝐷 and a set of transactions {𝑇!, … , 𝑇"} • 𝐷 is organized as a set of containers 𝐶 = {𝐶!, … , 𝐶"} • 𝐶# can be associated with a set of objects 𝑂 = {𝑂!, … , 𝑂$} • 𝑂% contains a set of transactions {𝑇!, … , 𝑇&} Extending the database structure Running example depicting a sales database 5/17
Proposed Framework • The utility of 𝑖 may be defined from different perspectives: facets. • Each item 𝑖 can be associated with one or more facets. • Facets can also be defined for transactions, objects and containers. • To describe facets, we use the notion of utility vector. Facets 6/17
Proposed Framework • Item utility vector for an item 𝑖 • 𝐼𝑈! = [𝑖𝑢", 𝑖𝑢#, … , 𝑖𝑢$], and 𝑖𝑢% describes a certain facet of 𝑖 • Transaction utility vector for a transaction 𝑇# • 𝑇𝑈&! = [𝑡𝑢", 𝑡𝑢#, … , 𝑡𝑢'], and 𝑡𝑢% describes a certain facet of 𝑇( • Internal utility of 𝑖 in 𝑇( is available and denoted as 𝑞(𝑖, 𝑇() • Object utility vector • 𝑂𝑈) = 𝑜𝑢", 𝑜𝑢#, … , 𝑜𝑢* , and 𝑜𝑢% describes a certain facet of 𝑂 • Container utility vector • 𝐶𝑈+ = [𝑐𝑢", 𝑐𝑢#, … , 𝑐𝑢,], and 𝑐𝑢% describes a certain facet of 𝐶 Facets and utility vectors facets: price, weight 7/17
Proposed Framework • Intra-pattern utility function • Let 𝑃 be a pattern with 𝑟 items, and 𝑇( be a transaction containing it, • Let 𝐼𝑈𝑆 = {𝐼𝑈", … , 𝐼𝑈-} be the set of item utility vectors of 𝑃 • The intra-pattern utility function 𝐼𝑈&! = 𝑖𝑝(𝑃, 𝑇(, 𝐼𝑈𝑆) generates an unique item utility vector for the pattern occurrence. • 𝑖𝑝 ⋅ can be any function combining the utilities across the facets, such as SUM, MAX, MIN, AVG, etc… • Pattern utility function • The occurrence utility vector for a pattern 𝑃 is 𝑂𝑐𝑐𝑈&! = [𝐼𝑈&! , 𝑇𝑈&! , 𝑂𝑈&! , 𝐶𝑈&! ] • The pattern utility vector 𝑈. is the collection of all the occurrence utility vectors of 𝑃: 𝑈. = ⋃&!∈&" 𝑂𝑐𝑐𝑈&! • 𝑈. is a matrix (occurrence utility vectors × facets). Advanced utility functions In this example, 𝑖𝑝 = ∑ 𝑓𝑎𝑐𝑒𝑡 × 𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙𝑈𝑡𝑖𝑙𝑖𝑡𝑦 8/17
Proposed Framework • The utility 𝑢 of a pattern 𝑃 can be obtained as an arbitrary combination of the values of 𝑈! using a function 𝑢(𝑃) • 𝑢(𝑃) can be classified as • Horizontal first 𝑢 𝑃 = 𝑓?(𝑓@(𝑢.)) combines by row (facets), then by column (occurrences) • Vertical first 𝑢 𝑃 = 𝑓@(𝑓?(𝑢.)) combines by column (occurrences), then by row (facets) • Mixed 𝑢 𝑃 = 𝑓 𝑢. combines the values at once • All of these classifications can be further classified in • Inter-transaction utility • Pattern-vs-object utility • Pattern-vs-container utility • These can be exploited to define utility measures relating item/transaction facets and one of the object/container facets, such as Pearson and Multiple correlation. Advanced utility functions As an example, suppose we select 𝑓# = 𝑓𝑖𝑙𝑡𝑒𝑟 ⋅ and 𝑓$ = max ⋅ , that is we first filter for a single facet, and then we take the maximum across all the occurrences. 9/17
Proposed Framework • Given a pattern 𝑃, we say that 𝑃 is an extended high-utility pattern if its utility 𝑢(𝑃) is greater than a minimum threshold 𝑡ℎ$ and it occurs in at least 𝑡ℎ% transactions. • The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database 𝐷. The e–HUPM problem 10/17
ASP Approach • We want to provide as much flexibility as possible in the definition of what is a useful pattern. • Encoding the problem in Answer Set Programming (ASP) helps achieving the desidered flexibility and modularity. • Classic guess-and-check scenario • We generate one answer set for each valid pattern, • Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo) • Pattern validity criteria and filter can be easily applied by encoding them as rules. The why and the how 11/17
ASP Approach The encoding %%% Input schema: %container(ContainerId) %object(ObjectId,ContainerId) %transaction(Tid, ObjectId) %item(Item, Tid, Position, Q) %itemUtilityVector(Item, I1, ..., Il) %transactionUtilityVector(Tid, T1, ..., Tm) %objectUtilityVector(ObjectId, O1, ..., On) %containerUtilityVector(ContainerId, C1, ..., Co) %%% Parameters occurrencesThreshold(...). utilityTreshold(...). %%% Item pre-filtering usefulItem(I):- item(I,_,_,_),....any condition on the items. %%% Candidate pattern generation {inCandidatePattern(I)}:- usefulItem(I). %%% Occurrences computation and check inTransaction(Tid):- transaction(Tid,_), not incomplete(Tid). incomplete(TiD):- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid). contains(I,Tid):- item(I,Tid,_,_). :- #count{ Tid : inTransaction(Tid)}=N, N < Tho, occurrencesThreshold(Tho). %%% Utility computation patternItemUtilityVectors(Tid,Item,I1,...,Il,Q):- inCandidatePattern(Item), itemUtilityVector(Item, I1, ..., Il), inTransaction(Tid), item(Item, Tid, Position, Q). intraPatternUtilityVector(Tid,I1,...,Il):- &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il). occurrenceUtilityVector(Tid,I1,...,Il,T1,...Tm,O1,...On,C1,...,Co):- inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il) transactionUtilityVector(Tid, T1, ..., Tm), transaction(Tid, ObjectId), objectUtilityVector(ObjectId, O1, ..., On), object(ObjectId , ContainerId) containerUtilityVector(ContainerId, C1, ..., Co). :- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityTreshold(Thu). 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 12/17
Experimental Evaluation • Aspect-based sentiment analysis of scientific reviews [1] • Each review is annotated with 8 different aspects: appropriataness, clarity, originality, empirical/theoretical soudness, meaningful comparison, substance, impact and recommendation • Each aspect has assigned one sentiment value from {𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒, 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒, 𝑛𝑒𝑢𝑡𝑟𝑎𝑙, 𝑎𝑏𝑠𝑒𝑛𝑡}. • 814 papers and a total of 1148 reviews. • 2230 annotated lines with a total of 15124 distinct words. Dataset [1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. pp. 207–216 (2020), ACM 13/17
Experimental Evaluation • Average running time • 𝑢(𝑃): disagreement between originality and decision (% of pattern occurrences showing a positive originality and a reject decision, thresholds: [15, 50, 75, 100]) Quantitative analysis • Comparison with HUPM systems • No analysis involving more than one elements can be devised with such systems • Here, the utility of an item is the absolute value of the appropriateness aspect in the sentence it first appears in. 14/17
Experimental Evaluation • 𝑢 𝑃 : Pearson correlation between an aspect 𝑋 and the final decision on the corresponding paper. • «The set of review sentences containing 𝑃 are characterized by a direct correlation between the sentiment on 𝑋 and the final decision on the corresponding paper» • 𝑢(𝑃): Multiple correlation between two aspects 𝑋 and 𝑌 and the final decision on the corresponding paper. Qualitative analysis 15/17
Conclusion • We introduced a general framework for HUPM with several extensions. • The framework allows to work with multi-dimensional data and with different utility measures. • A versatile and modular ASP encoding has been developed. • We employed a real use case on paper review to carry out both quantitative and qualitative analyses. • Facets and advanced utility functions help reducing the amount of relevant patterns • Useful in providing deep insights on the data. • Not an ending point! • Apply the framework to new contexts, • Derive ad-hoc algorithms for the extended utility functions. 16/17
Thanks for your attention! These slides available at https://bit.ly/ehupm-ruleml21 Francesco Cauteruccio Research Fellow @ DEMACS, University of Calabria cauteruccio@mat.unical.it francescocauteruccio.info @finalfire

An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

  • 1.
    The 5th InternationalJoint conference on Rules and Reasoning Sept. 8–15, 2021 (virtually in) Leuven, Belgium An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions Francesco Cauteruccio and Giorgio Terracina DEMACS, University of Calabria, Italy {cauteruccio, terracina}@mat.unical.it
  • 2.
    Outline • Context andMotivation • Proposed Framework • ASP Approach • Experimental Evaluation • Conclusion 1/17
  • 3.
    Context and Motivation •Pattern Mining is one of the most studied data mining branches • Find interesting patterns (set of items) in a database of transactions • Frequent pattern mining, sequential pattern mining, etc… • High-Utility Pattern Mining (HUPM) • Find patterns having a high-utility (w.r.t. some utility measure) • Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought together. • Basic assumption: each item is associated with one, static utility . • However… • The utility of an item can be defined from very different point of views, • Transactions are not only flat lists of items but they can provide different level of abstractions. Pattern Mining and High-Utility Pattern Mining 2/17
  • 4.
    Context and Motivation •We present a framework for HUPM extending basic notions and introducing: • A higher level of abstraction with transaction set representation • For each transaction, an Object, a Container and a Database level of aggregation can be defined. • The notion of facet • A facet is an attribute which can be associated with an item, a transaction, an object or a container; • Each element may be characterized by more than one facet. • A taxonomy of extended utility functions • Based on the database structure and facets, • They can be combined in several ways to fit different notions of utility. An extended framework for HUPM 3/17
  • 5.
    Proposed Framework • Thedatabase representation is multi-layer • 𝐷𝑎𝑡𝑎𝑏𝑎𝑠𝑒 → 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 → 𝑂𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛 • Given a database 𝐷 and a set of transactions {𝑇!, … , 𝑇"} • 𝐷 is organized as a set of containers 𝐶 = {𝐶!, … , 𝐶"} • 𝐶# can be associated with a set of objects 𝑂 = {𝑂!, … , 𝑂$} • 𝑂% contains a set of transactions {𝑇!, … , 𝑇&} Extending the database structure Running example depicting a sales database 5/17
  • 6.
    Proposed Framework • Theutility of 𝑖 may be defined from different perspectives: facets. • Each item 𝑖 can be associated with one or more facets. • Facets can also be defined for transactions, objects and containers. • To describe facets, we use the notion of utility vector. Facets 6/17
  • 7.
    Proposed Framework • Itemutility vector for an item 𝑖 • 𝐼𝑈! = [𝑖𝑢", 𝑖𝑢#, … , 𝑖𝑢$], and 𝑖𝑢% describes a certain facet of 𝑖 • Transaction utility vector for a transaction 𝑇# • 𝑇𝑈&! = [𝑡𝑢", 𝑡𝑢#, … , 𝑡𝑢'], and 𝑡𝑢% describes a certain facet of 𝑇( • Internal utility of 𝑖 in 𝑇( is available and denoted as 𝑞(𝑖, 𝑇() • Object utility vector • 𝑂𝑈) = 𝑜𝑢", 𝑜𝑢#, … , 𝑜𝑢* , and 𝑜𝑢% describes a certain facet of 𝑂 • Container utility vector • 𝐶𝑈+ = [𝑐𝑢", 𝑐𝑢#, … , 𝑐𝑢,], and 𝑐𝑢% describes a certain facet of 𝐶 Facets and utility vectors facets: price, weight 7/17
  • 8.
    Proposed Framework • Intra-patternutility function • Let 𝑃 be a pattern with 𝑟 items, and 𝑇( be a transaction containing it, • Let 𝐼𝑈𝑆 = {𝐼𝑈", … , 𝐼𝑈-} be the set of item utility vectors of 𝑃 • The intra-pattern utility function 𝐼𝑈&! = 𝑖𝑝(𝑃, 𝑇(, 𝐼𝑈𝑆) generates an unique item utility vector for the pattern occurrence. • 𝑖𝑝 ⋅ can be any function combining the utilities across the facets, such as SUM, MAX, MIN, AVG, etc… • Pattern utility function • The occurrence utility vector for a pattern 𝑃 is 𝑂𝑐𝑐𝑈&! = [𝐼𝑈&! , 𝑇𝑈&! , 𝑂𝑈&! , 𝐶𝑈&! ] • The pattern utility vector 𝑈. is the collection of all the occurrence utility vectors of 𝑃: 𝑈. = ⋃&!∈&" 𝑂𝑐𝑐𝑈&! • 𝑈. is a matrix (occurrence utility vectors × facets). Advanced utility functions In this example, 𝑖𝑝 = ∑ 𝑓𝑎𝑐𝑒𝑡 × 𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙𝑈𝑡𝑖𝑙𝑖𝑡𝑦 8/17
  • 9.
    Proposed Framework • Theutility 𝑢 of a pattern 𝑃 can be obtained as an arbitrary combination of the values of 𝑈! using a function 𝑢(𝑃) • 𝑢(𝑃) can be classified as • Horizontal first 𝑢 𝑃 = 𝑓?(𝑓@(𝑢.)) combines by row (facets), then by column (occurrences) • Vertical first 𝑢 𝑃 = 𝑓@(𝑓?(𝑢.)) combines by column (occurrences), then by row (facets) • Mixed 𝑢 𝑃 = 𝑓 𝑢. combines the values at once • All of these classifications can be further classified in • Inter-transaction utility • Pattern-vs-object utility • Pattern-vs-container utility • These can be exploited to define utility measures relating item/transaction facets and one of the object/container facets, such as Pearson and Multiple correlation. Advanced utility functions As an example, suppose we select 𝑓# = 𝑓𝑖𝑙𝑡𝑒𝑟 ⋅ and 𝑓$ = max ⋅ , that is we first filter for a single facet, and then we take the maximum across all the occurrences. 9/17
  • 10.
    Proposed Framework • Givena pattern 𝑃, we say that 𝑃 is an extended high-utility pattern if its utility 𝑢(𝑃) is greater than a minimum threshold 𝑡ℎ$ and it occurs in at least 𝑡ℎ% transactions. • The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database 𝐷. The e–HUPM problem 10/17
  • 11.
    ASP Approach • Wewant to provide as much flexibility as possible in the definition of what is a useful pattern. • Encoding the problem in Answer Set Programming (ASP) helps achieving the desidered flexibility and modularity. • Classic guess-and-check scenario • We generate one answer set for each valid pattern, • Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo) • Pattern validity criteria and filter can be easily applied by encoding them as rules. The why and the how 11/17
  • 12.
    ASP Approach The encoding %%%Input schema: %container(ContainerId) %object(ObjectId,ContainerId) %transaction(Tid, ObjectId) %item(Item, Tid, Position, Q) %itemUtilityVector(Item, I1, ..., Il) %transactionUtilityVector(Tid, T1, ..., Tm) %objectUtilityVector(ObjectId, O1, ..., On) %containerUtilityVector(ContainerId, C1, ..., Co) %%% Parameters occurrencesThreshold(...). utilityTreshold(...). %%% Item pre-filtering usefulItem(I):- item(I,_,_,_),....any condition on the items. %%% Candidate pattern generation {inCandidatePattern(I)}:- usefulItem(I). %%% Occurrences computation and check inTransaction(Tid):- transaction(Tid,_), not incomplete(Tid). incomplete(TiD):- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid). contains(I,Tid):- item(I,Tid,_,_). :- #count{ Tid : inTransaction(Tid)}=N, N < Tho, occurrencesThreshold(Tho). %%% Utility computation patternItemUtilityVectors(Tid,Item,I1,...,Il,Q):- inCandidatePattern(Item), itemUtilityVector(Item, I1, ..., Il), inTransaction(Tid), item(Item, Tid, Position, Q). intraPatternUtilityVector(Tid,I1,...,Il):- &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il). occurrenceUtilityVector(Tid,I1,...,Il,T1,...Tm,O1,...On,C1,...,Co):- inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il) transactionUtilityVector(Tid, T1, ..., Tm), transaction(Tid, ObjectId), objectUtilityVector(ObjectId, O1, ..., On), object(ObjectId , ContainerId) containerUtilityVector(ContainerId, C1, ..., Co). :- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityTreshold(Thu). 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 12/17
  • 13.
    Experimental Evaluation • Aspect-basedsentiment analysis of scientific reviews [1] • Each review is annotated with 8 different aspects: appropriataness, clarity, originality, empirical/theoretical soudness, meaningful comparison, substance, impact and recommendation • Each aspect has assigned one sentiment value from {𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒, 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒, 𝑛𝑒𝑢𝑡𝑟𝑎𝑙, 𝑎𝑏𝑠𝑒𝑛𝑡}. • 814 papers and a total of 1148 reviews. • 2230 annotated lines with a total of 15124 distinct words. Dataset [1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. pp. 207–216 (2020), ACM 13/17
  • 14.
    Experimental Evaluation • Averagerunning time • 𝑢(𝑃): disagreement between originality and decision (% of pattern occurrences showing a positive originality and a reject decision, thresholds: [15, 50, 75, 100]) Quantitative analysis • Comparison with HUPM systems • No analysis involving more than one elements can be devised with such systems • Here, the utility of an item is the absolute value of the appropriateness aspect in the sentence it first appears in. 14/17
  • 15.
    Experimental Evaluation • 𝑢𝑃 : Pearson correlation between an aspect 𝑋 and the final decision on the corresponding paper. • «The set of review sentences containing 𝑃 are characterized by a direct correlation between the sentiment on 𝑋 and the final decision on the corresponding paper» • 𝑢(𝑃): Multiple correlation between two aspects 𝑋 and 𝑌 and the final decision on the corresponding paper. Qualitative analysis 15/17
  • 16.
    Conclusion • We introduceda general framework for HUPM with several extensions. • The framework allows to work with multi-dimensional data and with different utility measures. • A versatile and modular ASP encoding has been developed. • We employed a real use case on paper review to carry out both quantitative and qualitative analyses. • Facets and advanced utility functions help reducing the amount of relevant patterns • Useful in providing deep insights on the data. • Not an ending point! • Apply the framework to new contexts, • Derive ad-hoc algorithms for the extended utility functions. 16/17
  • 17.
    Thanks for yourattention! These slides available at https://bit.ly/ehupm-ruleml21 Francesco Cauteruccio Research Fellow @ DEMACS, University of Calabria cauteruccio@mat.unical.it francescocauteruccio.info @finalfire