Search Results
| Search type | Search syntax |
|---|---|
| Tags | [tag] |
| Exact | "words here" |
| Author | user:1234 user:me (yours) |
| Score | score:3 (3+) score:0 (none) |
| Answers | answers:3 (3+) answers:0 (none) isaccepted:yes hasaccepted:no inquestion:1234 |
| Views | views:250 |
| Code | code:"if (foo != bar)" |
| Sections | title:apples body:"apples oranges" |
| URL | url:"*.example.com" |
| Saves | in:saves |
| Status | closed:yes duplicate:no migrated:no wiki:no |
| Types | is:question is:answer |
| Exclude | -[tag] -apples |
| For more details on advanced search visit our help page | |
Results tagged with machine-learning
Search options not deleted user 45374
Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include modeling building, applications, theory, etc.
1 vote
Machine Learning: Take into account a variable if a condition is met (depending on another v...
If you are interested to see if there are any other girls, you'll likely need to create a new indicator variable in your modelling data - perhaps something like: other_girls_in_class = 1 if gender = …
1 vote
Gradient Tree Boosting
My layman's understanding is that binary classification is usually calculated using the logit transform. I believe then that the residuals are the difference between the response and the predicted pr …
2 votes
Why decision tree needs categorical variable to be encoded?
As I understand it, decision trees use the rules < threshold_value or >= threshold_value to group observations together, where threshold_value is the value of a variable which minimises the cost funct …
0 votes
Machine learning algorithm that uses the Pearson or Spearman correlation?
I think it depends on the context - whether or not you are interested in algorithms for modelling + prediction or algorithms for feature selection. I'm not aware of any modelling + prediction algorit …
1 vote
Configuring Incremental XGBoost model
The "optimal" stopping point for XGBoost really depends on the data you feed into it. Using Chunk N and Chunk N+1 for instance, consider the two scenarios: data in Chunk N and Chunk N+1 is very dif …
1 vote
Dealing with a dataset with a mix of continuous and categorical variables
It depends on which algorithm (and implementation) you are using. For instance, the linear regression implemented in sklearn requires all input variables to be numeric and so encoding will be necessa …
2 votes
Which feature selection technique to pickup(Boruta vs RFE vs step selection)
Some algorithms perform feature selection inherently - e.g. LASSO, random forests, and gradient-boosted models like XGBoost and LightGBM. If you are using those then there is no need for manual featur …
2 votes
How to handle a large number of categories in one column effectively in machine learning?
Target encoding calculated using an appropriate cross-validation strategy can also be powerful for high-cardinality categorical features. In some instances, frequency encoding can also be useful.
1 vote
Adding extra variables to XGboost model is worsening the train and test accuracy
In general, when you change the data being fed into a model you should also consider re-tuning the model parameters. It could be that the addition of the two new features in your data set means that …
4 votes
Accepted
Boruta Feature Selection package
In R, Boruta relies on the ranger implementation of random forest. So: Converting input variables from categorical to numeric is not necessary. You will need to address NA values prior to running th …
4 votes
Accepted
How to handle "unknown" category in machine learning classification problems?
I think this is one of those topics with the most frustrating answer - it depends. To your questions: How can we handle these data which fall into "unknown" category? There are many ways of doing t …
0 votes
In machine learning how to find feature interdepencies?
Tree-based methods (e.g. random forests) and boosted tree methods (e.g. XGBoost) are usually quite good at detecting underlying relationships between features, and it's usually quite straightforward t …
1 vote
What are the limitations while using XGboost algorithm?
I think you should be more specific about what you mean by "fail". As an example, a practitioner could consider an xgboost model as a failure if it achieves < 80% accuracy. Nevertheless, there are so …
0 votes
Accepted
Random Forest application with 40+ Predictor Variables
The randomForest package supports various tasks using an existing randomForest object. For instance, it offers the predict method which will perform prediction using a given trained forest and a give …
1 vote
How to model a Machine learning problem considering links between features
Welcome to the site! What you are describing above is known as an interaction. You should consider the algorithm you wish to use and whether it allows for interactions between predictors. Some techn …