Questions tagged [data-mining]
Using techniques from artificial intelligence and machine learning to extract patterns from large data sets and transform the data into a useful, organized form for further processing.
142 questions
0 votes
0 answers
64 views
Measuring logicality of programming languages?
I have a simple question: how would you measure the logicality of a programming language? EDIT: I was asked to specify the term "logicality", so I will try to provide a stipulation. By ...
0 votes
0 answers
43 views
Restaurant Galaxy schema
I want to make a Galaxy schema for a restaurant. There are 2 fact tables, sales and purchases. Sales are related to customers and purchases to suppliers of ingredients. Now my question is how can I make ...
1 vote
1 answer
173 views
What books are there to learn to implement these graph algorithms?
I saw a post on Reddit (https://www.reddit.com/r/math/comments/ci50d3/visualizing_mathematical_subjects/) that utilizes label propagation, Fruchterman-Reingold algorithm, and edge betweenness ...
0 votes
0 answers
60 views
Persona matching algorithm
I have a project to match two groups of people. Under insurance, if the initial sales agent leaves the insurer, their customers will become so-called “orphan customers”. I've given a big data set ...
1 vote
1 answer
75 views
Machine learning and test split for time series data
I have used different machine learning algorithms to predict solar panels' power output. There are ten independent features for weather data. In all models, I set time as an index and have used the ...
0 votes
0 answers
151 views
Efficient way to do self join with minimum support?
In the Apriori algorithm there's the self-join step. So, say we have the itemsets {1, 3}, {2, 3}, {3, 5}, {2, 5}. If I were to do an exhaustive join I'd end up with a tuple including (1, 3, ...
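A minimal sketch of the standard Apriori candidate-generation step (prefix join followed by subset pruning), reading the values in the excerpt as the frequent 2-itemsets {1, 3}, {2, 3}, {3, 5}, {2, 5}, which is an assumption. The function name `apriori_gen` is illustrative, and the surviving candidates would still need their support counted against the transactions before the minimum-support filter can be applied.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Generate (k+1)-item candidates from frequent k-itemsets.

    Join: two sorted itemsets are merged only if they share their first
    k-1 items. Prune: a candidate is dropped if any of its k-item subsets
    is not itself frequent.
    """
    # Normalise to sorted tuples and drop duplicates (assumption: items are comparable).
    frequent = sorted({tuple(sorted(s)) for s in frequent_k})
    frequent_set = set(frequent)
    candidates = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            # The list is sorted, so once the prefix differs no later itemset matches.
            if a[:-1] != b[:-1]:
                break
            candidate = a + (b[-1],)
            if all(sub in frequent_set for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates

candidates = apriori_gen([(1, 3), (2, 3), (3, 5), (2, 5)])
```

Under that reading, no candidate starting with (1, 3, ...) is ever built, because no other frequent itemset shares the prefix (1,).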
1 vote
3 answers
2k views
How can we express the cosine similarity of two documents as a percentage?
We were doing project work for plagiarism checking. For this purpose, we have taken a term frequency vector of two documents and measured the similarity using a cosine similarity measure. The value of ...
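One possible sketch, assuming raw term-frequency vectors built by whitespace tokenisation (the tokenisation and the function name `cosine_similarity_percent` are assumptions, not the asker's code). Because term frequencies are non-negative, the cosine lies between 0 and 1, so multiplying by 100 gives a value that can be read as a percentage.

```python
import math
from collections import Counter

def cosine_similarity_percent(text_a: str, text_b: str) -> float:
    """Cosine similarity of two term-frequency vectors, scaled to 0-100."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with any other
    return 100.0 * dot / (norm_a * norm_b)

score = cosine_similarity_percent("the cat sat on the mat", "the cat lay on the mat")
```

Whether a given percentage indicates plagiarism is a separate threshold choice that this sketch does not address.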
0 votes
0 answers
158 views
Difference between C-Index and Spearman correlation
Suppose I have a list that reflects the priority of web pages for recrawling: l1 = [3, 2, 1, 4, 2, 5] Now, I have tried to estimate the priorities with two ...
1 vote
1 answer
59 views
Would samples be considered data redundancy if they are similar to each other fairly naturally?
I am working on building an ML/DL solution for a problem where the data is naturally similar, and I am worried that this would be considered data redundancy. My question is: is that so? And ...
1 vote
0 answers
101 views
What do we mean by permissible transformations for the attribute types nominal, ordinal, interval, and ratio? [closed]
I am studying data mining and I stumbled upon types of attributes. They are: nominal, ordinal, interval, ratio. The data mining book by Tan, Steinbach, and Kumar says: Permissible transformations for nominal: ...
0 votes
0 answers
63 views
What are the confusion matrix values?
I'm currently going through past paper questions and was wondering if I could get some help answering this one? 'Consider a classification model which is applied to a set of records, of which 100 ...
1 vote
0 answers
36 views
About the paper Privacy-preserving in association rule mining using an improved discrete binary artificial bee colony
I don't understand two parts in this paper: The min notion on page 4 line 357 (equation 10d): I understand this as to find all the $M_{10}$, $M_{11}$, $M_{01}$ first and then try to minimize the ...
0 votes
1 answer
184 views
How to detect outliers using DBSCAN?
I am working on a fraudulent cash transaction detection system using DBSCAN, and I want to know the proper way to identify outliers. Thank you. Edit: I had a problem with how to represent the ...
1 vote
1 answer
54 views
In a set of sentences, how could I determine the fewest sentences that contain all characters?
So, for the sake of simplicity, I am going to use English characters for this example. Let's say I have a set of strings of characters in English ranked by difficulty: Easy, Intermediate, Advanced. So ...
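The problem in the title is an instance of set cover, which is NP-hard, so a common starting point is the greedy approximation: repeatedly pick the sentence that covers the most characters not yet covered. The sketch below is an illustration under assumptions (whitespace is ignored, difficulty ranking is not used, and the name `greedy_sentence_cover` is hypothetical); it gives a small cover, not necessarily the smallest one.

```python
def greedy_sentence_cover(sentences):
    """Greedy set-cover approximation over the characters of each sentence."""
    # Assumption: whitespace does not count as a character to be covered.
    char_sets = [{c for c in s if not c.isspace()} for s in sentences]
    uncovered = set().union(*char_sets)
    chosen = []
    while uncovered:
        # Pick the sentence that covers the most still-uncovered characters.
        best = max(range(len(sentences)),
                   key=lambda i: len(char_sets[i] & uncovered))
        chosen.append(sentences[best])
        uncovered -= char_sets[best]
    return chosen

cover = greedy_sentence_cover(["abc", "ab", "cd", "d"])
```

If an exact minimum is required, the same character sets could be handed to an integer programming solver instead.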
0 votes
0 answers
66 views
How to handle distribution of values with same attributes into different classes
I'm a student studying a data mining course and have come across a problem. I need to explain the problem with the help of an example scenario as I do not know how to explain the problem in any other ...