Search Results

Results tagged with machine-learning

Search options: not deleted, user 45374

28 results

Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.

1 vote

Machine Learning: Take into account a variable if a condition is met (depending on another v...

If you are interested to see if there are any other girls, you'll likely need to create a new indicator variable in your modelling data - perhaps something like: other_girls_in_class = 1 if gender = …

1,695

answered Sep 10, 2018 at 7:44

1 vote

Gradient Tree Boosting

My layman's understanding is that binary classification is usually calculated using the logit transform. I believe then that the residuals are the difference between the response and the predicted pr …

answered Jun 11, 2018 at 14:38

2 votes

Why decision tree needs categorical variable to be encoded?

As I understand it, decision trees use the rules < threshold_value or >= threshold_value to group observations together, where threshold_value is the value of a variable which minimises the cost funct …

answered May 16, 2019 at 13:47
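
The threshold rule described in the snippet above can be illustrated with a minimal sketch for a single numeric variable. The function name `best_threshold` is hypothetical, and the cost function used here (total squared error of the two groups) is an assumption, since the snippet is cut off before naming one. Because the rule relies on `<` and `>=` comparisons, it needs values that can be ordered, which is one likely reason unordered categories have to be given a numeric representation first.

```python
def best_threshold(values, targets):
    """Return (threshold, cost) for the two-way split with the lowest cost.

    Cost is the summed squared error of each group around its own mean
    (an assumed cost function, chosen only for illustration).
    """
    def sse(ys):
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, targets) if x < threshold]
        right = [y for x, y in zip(values, targets) if x >= threshold]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


threshold, cost = best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

Real tree implementations search over many variables and apply this kind of split recursively; the sketch covers only one split on one variable.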

0 votes

Machine learning algorithm that uses the Pearson or Spearman correlation?

I think it depends on the context - whether or not you are interested in algorithms for modelling + prediction or algorithms for feature selection. I'm not aware of any modelling + prediction algorit …

answered Oct 14, 2019 at 10:25

1 vote

Configuring Incremental XGBoost model

The "optimal" stopping point for XGBoost really depends on the data you feed into it. Using Chunk N and Chunk N+1 for instance, consider the two scenarios: data in Chunk N and Chunk N+1 is very dif …

answered Feb 20, 2018 at 10:00

1 vote

Dealing with a dataset with a mix of continuous and categorical variables

It depends on which algorithm (and implementation) you are using. For instance, the linear regression implemented in sklearn requires all input variables to be numeric and so encoding will be necessa …

answered Nov 29, 2019 at 10:23
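
As a rough illustration of the encoding step mentioned in the snippet above, here is one-hot encoding written in plain Python. This is only a sketch: the helper `one_hot` and the column names are hypothetical, and in practice a library encoder would normally be used instead.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category.

    `rows` is a list of dicts; the original dicts are left unmodified.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"age": 30, "colour": "red"},   # hypothetical example data
    {"age": 41, "colour": "blue"},
]
numeric_rows = one_hot(rows, "colour")
```

Every value in `numeric_rows` is numeric, which is what a model that only accepts numeric inputs requires.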

2 votes

Which feature selection technique to pick up (Boruta vs RFE vs step selection)

Some algorithms perform feature selection inherently - e.g. LASSO, random forests, and gradient-boosted models like XGBoost and LightGBM. If you are using those then there is no need for manual featur …

answered Jan 7, 2020 at 13:34

2 votes

How to handle a large number of categories in one column effectively in machine learning?

Target encoding calculated using an appropriate cross-validation strategy can also be powerful for high-cardinality categorical features. In some instances, frequency encoding can also be useful.

answered Jul 15, 2021 at 13:54
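
The two encodings named in the snippet above can be sketched in plain Python. The function names and the fold assignment (row index modulo the number of folds) are assumptions made for illustration; a real pipeline would usually shuffle or stratify the folds. The point of the cross-validated version is that each row's encoding is computed only from the targets of rows in other folds, so a row never sees its own target.

```python
from collections import Counter, defaultdict


def frequency_encode(categories):
    """Encode each category as its relative frequency in the column."""
    counts = Counter(categories)
    n = len(categories)
    return [counts[c] / n for c in categories]


def target_encode_oof(categories, targets, n_folds=5):
    """Out-of-fold target encoding: the mean target per category,
    computed from the other folds only."""
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate target statistics from rows outside the held-out fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode held-out rows; fall back to the global mean when the
        # category does not occur in the other folds.
        for i in range(n):
            if i % n_folds == fold:
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if counts[c] else global_mean
    return encoded


freq = frequency_encode(["a", "b", "a", "a"])
oof = target_encode_oof(["a", "a", "b", "b"], [1, 0, 1, 1], n_folds=2)
```

For high-cardinality columns, rare categories often fall back to the global mean, which acts as a crude form of smoothing in this sketch.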

1 vote

Adding extra variables to XGboost model is worsening the train and test accuracy

In general, when you change the data being fed into a model you should also consider re-tuning the model parameters. It could be that the addition of the two new features in your data set means that …

answered Nov 8, 2019 at 9:29

4 votes

Accepted

Boruta Feature Selection package

In R, Boruta relies on the ranger implementation of random forest. So: Converting input variables from categorical to numeric is not necessary. You will need to address NA values prior to running th …

answered May 2, 2018 at 12:11

4 votes

Accepted

How to handle "unknown" category in machine learning classification problems?

I think this is one of those topics with the most frustrating answer - it depends. To your questions: How can we handle these data which fall into "unknown" category? There are many ways of doing t …

answered Sep 3, 2018 at 11:59

0 votes

In machine learning how to find feature interdependencies?

Tree-based methods (e.g. random forests) and boosted tree methods (e.g. XGBoost) are usually quite good at detecting underlying relationships between features, and it's usually quite straightforward t …

answered Sep 12, 2018 at 12:56

1 vote

What are the limitations while using XGboost algorithm?

I think you should be more specific about what you mean by "fail". As an example, a practitioner could consider an xgboost model as a failure if it achieves < 80% accuracy. Nevertheless, there are so …

answered Jan 18, 2019 at 9:27

0 votes

Accepted

Random Forest application with 40+ Predictor Variables

The randomForest package supports various tasks using an existing randomForest object. For instance, it offers the predict method which will perform prediction using a given trained forest and a give …

answered May 23, 2019 at 13:09

1 vote

How to model a Machine learning problem considering links between features

Welcome to the site! What you are describing above is known as an interaction. You should consider the algorithm you wish to use and whether it allows for interactions between predictors. Some techn …

answered Jul 3, 2018 at 9:22
