Results tagged with
Search options not deleted user 45374

Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.

1 vote

Machine Learning: Take into account a variable if a condition is met (depending on another v...

If you want to check whether there are any other girls, you'll likely need to create a new indicator variable in your modelling data - perhaps something like: other_girls_in_class = 1 if gender = …
bradS's user avatar
  • 1,695
1 vote

Gradient Tree Boosting

My layman's understanding is that binary classification is usually calculated using the logit transform. I believe then that the residuals are the difference between the response and the predicted pr …
bradS's user avatar
  • 1,695
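
The excerpt above is cut off, so the following is only a sketch of one common formulation, not necessarily what the answer goes on to say. It assumes log-loss, where the model's raw score is on the logit scale and the pseudo-residual fitted by the next tree is the label minus the sigmoid of that score. The function names `sigmoid` and `pseudo_residuals` are hypothetical.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Map a raw (logit-scale) score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def pseudo_residuals(labels, raw_scores):
    """Negative gradient of log-loss w.r.t. the raw score: y - p.

    Assumes binary labels in {0, 1}; each residual is the gap between
    the observed label and the currently predicted probability.
    """
    return [y - sigmoid(f) for y, f in zip(labels, raw_scores)]


# Toy data (made up for illustration): two rows at a raw score of 0,
# i.e. probability 0.5, and one row the model is already confident about.
labels = [1, 0, 1]
raw_scores = [0.0, 0.0, 2.0]
residuals = pseudo_residuals(labels, raw_scores)
```

The next tree in the ensemble would be fitted to `residuals`, so rows that are already well predicted contribute small targets.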
2 votes

Why decision tree needs categorical variable to be encoded?

As I understand it, decision trees use the rules < threshold_value or >= threshold_value to group observations together, where threshold_value is the value of a variable which minimises the cost funct …
bradS's user avatar
  • 1,695
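
To make the threshold rule concrete, here is a minimal sketch of a split search on a single numeric feature. It assumes binary labels and uses weighted Gini impurity as the cost function; the excerpt does not say which cost function is meant, and real implementations differ in detail. The names `gini` and `best_threshold` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Try every observed value as a threshold and keep the cheapest split.

    Rows with value < threshold go left, the rest go right. The comparison
    only makes sense for ordered, numeric values, which is why categorical
    inputs generally need encoding before this kind of search can run.
    """
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        if not left or not right:
            continue  # a split must put rows on both sides
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if cost < best[1]:
            best = (threshold, cost)
    return best


# Toy data (made up): the classes separate cleanly between 3.0 and 10.0.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, cost = best_threshold(values, labels)
```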
0 votes

Machine learning algorithm that uses the Pearson or Spearman correlation?

I think it depends on the context - whether or not you are interested in algorithms for modelling + prediction or algorithms for feature selection. I'm not aware of any modelling + prediction algorit …
bradS's user avatar
  • 1,695
1 vote

Configuring Incremental XGBoost model

The "optimal" stopping point for XGBoost really depends on the data you feed into it. Using Chunk N and Chunk N+1 for instance, consider the two scenarios: data in Chunk N and Chunk N+1 is very dif …
bradS's user avatar
  • 1,695
1 vote

Dealing with a dataset with a mix of continuous and categorical variables

It depends on which algorithm (and implementation) you are using. For instance, the linear regression implemented in sklearn requires all input variables to be numeric and so encoding will be necessa …
bradS's user avatar
  • 1,695
2 votes

Which feature selection technique to pickup(Boruta vs RFE vs step selection)

Some algorithms perform feature selection inherently - e.g. LASSO, random forests, and gradient-boosted models like XGBoost and LightGBM. If you are using those then there is no need for manual featur …
bradS's user avatar
  • 1,695
2 votes

How to handle a large number of categories in one column effectively in machine learning?

Target encoding calculated using an appropriate cross-validation strategy can also be powerful for high-cardinality categorical features. In some instances, frequency encoding can also be useful.
bradS's user avatar
  • 1,695
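
As a rough illustration of the two encodings mentioned above, the sketch below uses only the standard library. The fold assignment (row index modulo `n_folds`), the fallback to the global mean for categories unseen in the other folds, and the absence of smoothing are all simplifying assumptions; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_encode(categories):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(categories)
    n = len(categories)
    return [counts[c] / n for c in categories]


def out_of_fold_target_encode(categories, targets, n_folds=3):
    """Encode each row with the mean target of its category, computed
    only from rows in the other folds, so a row never sees its own label.
    """
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate per-category statistics from the other folds.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode the held-out fold using those statistics.
        for i in range(n):
            if i % n_folds == fold:
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if counts[c] else global_mean
    return encoded


# Toy column (made up): "c" appears once, so it falls back to the global mean.
categories = ["a", "a", "a", "b", "b", "c"]
targets = [1, 0, 1, 0, 0, 1]
freq = frequency_encode(categories)
encoded = out_of_fold_target_encode(categories, targets)
```

In practice the fold assignment would usually be shuffled or stratified, and some smoothing toward the global mean is often added for rare categories.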
1 vote

Adding extra variables to XGboost model is worsening the train and test accuracy

In general, when you change the data being fed into a model you should also consider re-tuning the model parameters. It could be that the addition of the two new features in your data set means that …
bradS's user avatar
  • 1,695
4 votes
Accepted

Boruta Feature Selection package

In R, Boruta relies on the ranger implementation of random forest. So: Converting input variables from categorical to numeric is not necessary. You will need to address NA values prior to running th …
bradS's user avatar
  • 1,695
4 votes
Accepted

How to handle "unknown" category in machine learning classification problems?

I think this is one of those topics with the most frustrating answer - it depends. To your questions: How can we handle these data which fall into "unknown" category? There are many ways of doing t …
bradS's user avatar
  • 1,695
0 votes

In machine learning how to find feature interdependencies?

Tree-based methods (e.g. random forests) and boosted tree methods (e.g. XGBoost) are usually quite good at detecting underlying relationships between features, and it's usually quite straightforward t …
bradS's user avatar
  • 1,695
1 vote

What are the limitations while using XGboost algorithm?

I think you should be more specific about what you mean by "fail". As an example, a practitioner could consider an xgboost model as a failure if it achieves < 80% accuracy. Nevertheless, there are so …
bradS's user avatar
  • 1,695
0 votes
Accepted

Random Forest application with 40+ Predictor Variables

The randomForest package supports various tasks using an existing randomForest object. For instance, it offers the predict method which will perform prediction using a given trained forest and a give …
bradS's user avatar
  • 1,695
1 vote

How to model a Machine learning problem considering links between features

Welcome to the site! What you are describing above is known as an interaction. You should consider the algorithm you wish to use and whether it allows for interactions between predictors. Some techn …
bradS's user avatar
  • 1,695
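
For algorithms that do not model interactions on their own (a plain linear model is the usual example), one option is to add the interaction as an explicit feature. The sketch below assumes rows stored as dictionaries of numeric values and uses a simple product term; the helper `add_interaction` and the column names are hypothetical.

```python
def add_interaction(rows, first, second):
    """Return new rows with an extra column holding first * second.

    The input rows are left unchanged; the new column is named
    "<first>_x_<second>".
    """
    name = f"{first}_x_{second}"
    return [{**row, name: row[first] * row[second]} for row in rows]


# Toy rows (made up for illustration).
rows = [{"age": 2.0, "dose": 3.0}, {"age": 4.0, "dose": 0.5}]
augmented = add_interaction(rows, "age", "dose")
```

Tree-based methods can pick up this kind of relationship through successive splits, so the explicit column matters most for models that are additive in their inputs.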
