An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

The 5th International Joint conference on Rules and Reasoning Sept. 8–15, 2021 (virtually in) Leuven, Belgium An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions Francesco Cauteruccio and Giorgio Terracina DEMACS, University of Calabria, Italy {cauteruccio, terracina}@mat.unical.it

Outline • Context and Motivation • Proposed Framework • ASP Approach • Experimental Evaluation • Conclusion 1/17

Context and Motivation • Pattern Mining is one of the most studied data mining branches • Find interesting patterns (set of items) in a database of transactions • Frequent pattern mining, sequential pattern mining, etc… • High-Utility Pattern Mining (HUPM) • Find patterns having a high-utility (w.r.t. some utility measure) • Example: in a sales database, the utility of a pattern may be represented by the profit of some items bought together. • Basic assumption: each item is associated with one, static utility . • However… • The utility of an item can be defined from very different point of views, • Transactions are not only flat lists of items but they can provide different level of abstractions. Pattern Mining and High-Utility Pattern Mining 2/17

Context and Motivation • We present a framework for HUPM extending basic notions and introducing: • A higher level of abstraction with transaction set representation • For each transaction, an Object, a Container and a Database level of aggregation can be defined. • The notion of facet • A facet is an attribute which can be associated with an item, a transaction, an object or a container; • Each element may be characterized by more than one facet. • A taxonomy of extended utility functions • Based on the database structure and facets, • They can be combined in several ways to fit different notions of utility. An extended framework for HUPM 3/17

Proposed Framework • The database representation is multi-layer • 𝐷𝑎𝑡𝑎𝑏𝑎𝑠𝑒 → 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 → 𝑂𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛 • Given a database 𝐷 and a set of transactions {𝑇!, … , 𝑇"} • 𝐷 is organized as a set of containers 𝐶 = {𝐶!, … , 𝐶"} • 𝐶# can be associated with a set of objects 𝑂 = {𝑂!, … , 𝑂$} • 𝑂% contains a set of transactions {𝑇!, … , 𝑇&} Extending the database structure Running example depicting a sales database 5/17

Proposed Framework • The utility of 𝑖 may be defined from different perspectives: facets. • Each item 𝑖 can be associated with one or more facets. • Facets can also be defined for transactions, objects and containers. • To describe facets, we use the notion of utility vector. Facets 6/17

Proposed Framework • Item utility vector for an item 𝑖 • 𝐼𝑈! = [𝑖𝑢", 𝑖𝑢#, … , 𝑖𝑢$], and 𝑖𝑢% describes a certain facet of 𝑖 • Transaction utility vector for a transaction 𝑇# • 𝑇𝑈&! = [𝑡𝑢", 𝑡𝑢#, … , 𝑡𝑢'], and 𝑡𝑢% describes a certain facet of 𝑇( • Internal utility of 𝑖 in 𝑇( is available and denoted as 𝑞(𝑖, 𝑇() • Object utility vector • 𝑂𝑈) = 𝑜𝑢", 𝑜𝑢#, … , 𝑜𝑢* , and 𝑜𝑢% describes a certain facet of 𝑂 • Container utility vector • 𝐶𝑈+ = [𝑐𝑢", 𝑐𝑢#, … , 𝑐𝑢,], and 𝑐𝑢% describes a certain facet of 𝐶 Facets and utility vectors facets: price, weight 7/17

Proposed Framework • Intra-pattern utility function • Let 𝑃 be a pattern with 𝑟 items, and 𝑇( be a transaction containing it, • Let 𝐼𝑈𝑆 = {𝐼𝑈", … , 𝐼𝑈-} be the set of item utility vectors of 𝑃 • The intra-pattern utility function 𝐼𝑈&! = 𝑖𝑝(𝑃, 𝑇(, 𝐼𝑈𝑆) generates an unique item utility vector for the pattern occurrence. • 𝑖𝑝 ⋅ can be any function combining the utilities across the facets, such as SUM, MAX, MIN, AVG, etc… • Pattern utility function • The occurrence utility vector for a pattern 𝑃 is 𝑂𝑐𝑐𝑈&! = [𝐼𝑈&! , 𝑇𝑈&! , 𝑂𝑈&! , 𝐶𝑈&! ] • The pattern utility vector 𝑈. is the collection of all the occurrence utility vectors of 𝑃: 𝑈. = ⋃&!∈&" 𝑂𝑐𝑐𝑈&! • 𝑈. is a matrix (occurrence utility vectors × facets). Advanced utility functions In this example, 𝑖𝑝 = ∑ 𝑓𝑎𝑐𝑒𝑡 × 𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙𝑈𝑡𝑖𝑙𝑖𝑡𝑦 8/17

Proposed Framework • The utility 𝑢 of a pattern 𝑃 can be obtained as an arbitrary combination of the values of 𝑈! using a function 𝑢(𝑃) • 𝑢(𝑃) can be classified as • Horizontal first 𝑢 𝑃 = 𝑓?(𝑓@(𝑢.)) combines by row (facets), then by column (occurrences) • Vertical first 𝑢 𝑃 = 𝑓@(𝑓?(𝑢.)) combines by column (occurrences), then by row (facets) • Mixed 𝑢 𝑃 = 𝑓 𝑢. combines the values at once • All of these classifications can be further classified in • Inter-transaction utility • Pattern-vs-object utility • Pattern-vs-container utility • These can be exploited to define utility measures relating item/transaction facets and one of the object/container facets, such as Pearson and Multiple correlation. Advanced utility functions As an example, suppose we select 𝑓# = 𝑓𝑖𝑙𝑡𝑒𝑟 ⋅ and 𝑓$ = max ⋅ , that is we first filter for a single facet, and then we take the maximum across all the occurrences. 9/17

Proposed Framework • Given a pattern 𝑃, we say that 𝑃 is an extended high-utility pattern if its utility 𝑢(𝑃) is greater than a minimum threshold 𝑡ℎ$ and it occurs in at least 𝑡ℎ% transactions. • The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database 𝐷. The e–HUPM problem 10/17

ASP Approach • We want to provide as much flexibility as possible in the definition of what is a useful pattern. • Encoding the problem in Answer Set Programming (ASP) helps achieving the desidered flexibility and modularity. • Classic guess-and-check scenario • We generate one answer set for each valid pattern, • Complex utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo) • Pattern validity criteria and filter can be easily applied by encoding them as rules. The why and the how 11/17

ASP Approach The encoding %%% Input schema: %container(ContainerId) %object(ObjectId,ContainerId) %transaction(Tid, ObjectId) %item(Item, Tid, Position, Q) %itemUtilityVector(Item, I1, ..., Il) %transactionUtilityVector(Tid, T1, ..., Tm) %objectUtilityVector(ObjectId, O1, ..., On) %containerUtilityVector(ContainerId, C1, ..., Co) %%% Parameters occurrencesThreshold(...). utilityTreshold(...). %%% Item pre-filtering usefulItem(I):- item(I,_,_,_),....any condition on the items. %%% Candidate pattern generation {inCandidatePattern(I)}:- usefulItem(I). %%% Occurrences computation and check inTransaction(Tid):- transaction(Tid,_), not incomplete(Tid). incomplete(TiD):- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid). contains(I,Tid):- item(I,Tid,_,_). :- #count{ Tid : inTransaction(Tid)}=N, N < Tho, occurrencesThreshold(Tho). %%% Utility computation patternItemUtilityVectors(Tid,Item,I1,...,Il,Q):- inCandidatePattern(Item), itemUtilityVector(Item, I1, ..., Il), inTransaction(Tid), item(Item, Tid, Position, Q). intraPatternUtilityVector(Tid,I1,...,Il):- &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il). occurrenceUtilityVector(Tid,I1,...,Il,T1,...Tm,O1,...On,C1,...,Co):- inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il) transactionUtilityVector(Tid, T1, ..., Tm), transaction(Tid, ObjectId), objectUtilityVector(ObjectId, O1, ..., On), object(ObjectId , ContainerId) containerUtilityVector(ContainerId, C1, ..., Co). :- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityTreshold(Thu). 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 12/17

Experimental Evaluation • Aspect-based sentiment analysis of scientific reviews [1] • Each review is annotated with 8 different aspects: appropriataness, clarity, originality, empirical/theoretical soudness, meaningful comparison, substance, impact and recommendation • Each aspect has assigned one sentiment value from {𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒, 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒, 𝑛𝑒𝑢𝑡𝑟𝑎𝑙, 𝑎𝑏𝑠𝑒𝑛𝑡}. • 814 papers and a total of 1148 reviews. • 2230 annotated lines with a total of 15124 distinct words. Dataset [1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. pp. 207–216 (2020), ACM 13/17

Experimental Evaluation • Average running time • 𝑢(𝑃): disagreement between originality and decision (% of pattern occurrences showing a positive originality and a reject decision, thresholds: [15, 50, 75, 100]) Quantitative analysis • Comparison with HUPM systems • No analysis involving more than one elements can be devised with such systems • Here, the utility of an item is the absolute value of the appropriateness aspect in the sentence it first appears in. 14/17

Experimental Evaluation • 𝑢 𝑃 : Pearson correlation between an aspect 𝑋 and the final decision on the corresponding paper. • «The set of review sentences containing 𝑃 are characterized by a direct correlation between the sentiment on 𝑋 and the final decision on the corresponding paper» • 𝑢(𝑃): Multiple correlation between two aspects 𝑋 and 𝑌 and the final decision on the corresponding paper. Qualitative analysis 15/17

Conclusion • We introduced a general framework for HUPM with several extensions. • The framework allows to work with multi-dimensional data and with different utility measures. • A versatile and modular ASP encoding has been developed. • We employed a real use case on paper review to carry out both quantitative and qualitative analyses. • Facets and advanced utility functions help reducing the amount of relevant patterns • Useful in providing deep insights on the data. • Not an ending point! • Apply the framework to new contexts, • Derive ad-hoc algorithms for the extended utility functions. 16/17

Thanks for your attention! These slides available at https://bit.ly/ehupm-ruleml21 Francesco Cauteruccio Research Fellow @ DEMACS, University of Calabria cauteruccio@mat.unical.it francescocauteruccio.info @finalfire

An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

More Related Content

What's hot

Similar to An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions

Recently uploaded

An Answer Set Programming based framework for High-Utility Pattern Mining extended with Facets and Advanced Utility Functions