Crash Course to Crack Machine Learning Interview – Part 2: Linear Regression

Karun Thankachan | Last Updated: 07 Nov, 2025
7 min read

When it comes to machine learning interviews, Linear Regression almost always shows up. It’s one of those algorithms that looks simple at first, and that’s exactly why interviewers love it. It’s like the “hello world” of ML: easy to understand on the surface, but full of details that reveal how well you actually know your fundamentals.

A lot of candidates dismiss it as “too basic,” but here’s the truth: if you can’t clearly explain Linear Regression, it’s hard to convince anyone you understand more complex models.

So in this post, I’ll walk you through everything you really need to know: assumptions, optimization, evaluation metrics, and the tricky pitfalls that interviewers love to probe. Think of this as your practical, no-fluff guide to talking about Linear Regression with confidence.

What Linear Regression Really Does

At its heart, Linear Regression is about modeling relationships.

Imagine you’re trying to predict someone’s weight from their height. You know taller people tend to weigh more, right? Linear Regression just turns that intuition into a mathematical equation; basically, it draws the best-fitting line that connects height to weight.

The simple version looks like this:

y = β₀ + β₁x + ε

Here, y is what you want to predict, x is your input, β₀ is the intercept (value of y when x=0), β₁ is the slope (how much y changes when x increases by one unit), and ε is the error, the stuff the line can’t explain.

Of course, real-world data is rarely that simple. Most of the time, you have multiple features. That’s when you move to multiple linear regression:

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε

Now you’re fitting a hyperplane in multi-dimensional space instead of just a line. Each coefficient tells you how much that feature contributes to the target, holding everything else constant. This is one of the reasons interviewers like asking about it: it tests whether you actually understand what your model is doing, not just whether you can run .fit() in scikit-learn.
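
If it helps to see this in code, here’s a minimal sketch using scikit-learn’s LinearRegression on made-up height and weight numbers; the data is purely illustrative.

# Minimal sketch: simple linear regression with scikit-learn on made-up data.
import numpy as np
from sklearn.linear_model import LinearRegression

heights = np.array([[150], [160], [170], [180], [190]])  # cm, one feature column
weights = np.array([52, 58, 66, 74, 83])                 # kg

model = LinearRegression()
model.fit(heights, weights)

print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])
print("Predicted weight at 175 cm:", model.predict([[175]])[0])

The same estimator handles multiple linear regression too; you just pass a feature matrix with more columns.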

The Famous Assumptions (and Why They Matter)

Linear Regression is elegant, but it rests on a few key assumptions. In interviews, you’ll often get bonus points if you can not only name them but also explain why they matter or how to check them.

  1. Linearity – The relationship between features and the target should be linear.
    Test it: Plot residuals vs. predicted values; if you see patterns or curves, it’s not linear.
    Fix it: Try transformations (like log or sqrt), polynomial terms, or even switch to a non-linear model.
  2. Independence of Errors – Errors shouldn’t be correlated. This one bites a lot of people doing time-series work.
    Test it: Use the Durbin–Watson test (around 2 = good).
    Fix it: Consider ARIMA or add lag variables.
  3. Homoscedasticity – The errors should have constant variance. In other words, the spread of residuals should look roughly the same everywhere.
    Test it: Plot residuals again. A “funnel shape” means you have heteroscedasticity.
    Fix it: Transform the dependent variable or try Weighted Least Squares.
  4. Normality of Errors – Residuals should be roughly normally distributed (mostly matters for inference).
    Test it: Histogram or Q–Q plot.
    Fix it: With enough data, this matters less (thanks, Central Limit Theorem).
  5. No Multicollinearity – Predictors shouldn’t be too correlated with each other.
    Test it: Check VIF scores (values >5 or 10 are red flags).
    Fix it: Drop redundant features or use Ridge/Lasso regression.

In practice, these assumptions are rarely perfect. What matters is knowing how to test and fix them; that’s what separates theory from applied understanding.
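
If you want to see what a few of these checks look like in code, here’s a rough sketch using statsmodels on synthetic data; the feature names and numbers are assumptions for illustration, and the residual plots themselves are left to matplotlib.

# Sketch of common assumption checks on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = 3 + 2 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=200)

X_const = sm.add_constant(X)            # statsmodels needs an explicit intercept column
results = sm.OLS(y, X_const).fit()
residuals = results.resid

# 1 & 3. Linearity / homoscedasticity: plot residuals vs. fitted values and look
#        for curves or a funnel shape (plotting code omitted here).
# 2. Independence of errors: Durbin-Watson close to 2 suggests little autocorrelation.
print("Durbin-Watson:", durbin_watson(residuals))

# 4. Normality of errors: inspect a histogram or Q-Q plot of the residuals.
# 5. Multicollinearity: VIF per feature (values above roughly 5-10 are red flags).
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X_const.values, i + 1))  # skip the constant column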

How Linear Regression Learns

Once you’ve set up the equation, how does the model actually learn those coefficients (the βs)?

The goal is simple: find β values that make the predicted values as close as possible to the actual ones.

The most common method is Ordinary Least Squares (OLS), which minimizes the sum of squared errors (the differences between actual and predicted values). Squaring prevents positive and negative errors from canceling out and penalizes big mistakes more.

There are two main ways to find the best coefficients (a small NumPy sketch of both follows this list):

  • Closed-form solution (analytical):
    Directly solve for β using linear algebra:
    β̂ = (XᵀX)⁻¹Xᵀy
    This is exact and fast for small datasets, but it doesn’t scale well when you have thousands of features.
  • Gradient Descent (iterative):
    When the dataset is huge, gradient descent takes small steps in the direction that reduces error the most.
    It’s slower but much more scalable, and it’s the foundation of how neural networks learn today.
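
To make both routes concrete, here’s a minimal NumPy sketch of the normal-equations solution next to a plain gradient descent loop on synthetic data; the learning rate and iteration count are illustrative choices.

# Sketch: closed-form OLS vs. gradient descent on synthetic data.
import numpy as np

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept column + one feature
true_beta = np.array([2.0, 3.5])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Closed-form: solve the normal equations (X^T X) beta = X^T y.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: repeatedly step against the gradient of the mean squared error.
beta_gd = np.zeros(2)
learning_rate = 0.1            # illustrative choice
for _ in range(1000):
    gradient = (2 / n) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print("Closed-form estimate:   ", beta_closed)
print("Gradient descent result:", beta_gd)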

Making Sense of the Coefficients

Each coefficient tells you how much the target changes when that feature increases by one unit, assuming all others stay constant. That’s what makes Linear Regression so interpretable.

For example, if you’re predicting house prices, and the coefficient for “square footage” is 120, it means that (roughly) every extra square foot adds $120 to the price, holding other features constant.

This interpretability is also why interviewers love it. It tests if you can explain models in plain English, a key skill in data roles.
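
Here’s a tiny sketch of how that reading works in practice; the house-price data and feature names are made up so the recovered coefficients land near the values baked into the simulation.

# Sketch: pairing coefficients with feature names for interpretation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, size=300),
    "bedrooms": rng.integers(1, 5, size=300),
    "age": rng.uniform(0, 50, size=300),
})
# Hypothetical prices: roughly $120 per sqft, a bedroom premium, minus depreciation.
price = (50_000 + 120 * df["sqft"] + 8_000 * df["bedrooms"]
         - 500 * df["age"] + rng.normal(scale=10_000, size=300))

model = LinearRegression().fit(df, price)
for name, coef in zip(df.columns, model.coef_):
    print(f"{name}: {coef:,.0f}")   # sqft comes out near 120 -> roughly $120 per extra sqft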

Evaluating Your Model

Once your model is trained, you’ll want to know: how good is it? There are a few go-to metrics:

  • MSE (Mean Squared Error): Average of squared residuals. Penalizes big errors heavily.
  • RMSE (Root MSE): Just the square root of MSE, so it’s in the same units as your target.
  • MAE (Mean Absolute Error): Average of absolute differences. More robust to outliers.
  • R² (Coefficient of Determination): Measures how much variance in the target your model explains.

The closer R² is to 1, the better, though adding features never decreases it, even if they don’t help. That’s why Adjusted R² is often preferred; it penalizes adding useless predictors.

There’s no “best” metric; it depends on your problem. If large mistakes are extra bad (say, predicting medical dosage), go with RMSE. If you want something robust to outliers, MAE is your friend.
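
Here’s a compact sketch of computing these metrics with scikit-learn on hypothetical predictions; Adjusted R² isn’t a built-in metric, so it’s computed from the usual formula.

# Sketch: regression metrics on hypothetical actual vs. predicted values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5, 10.3])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# Adjusted R^2: n = number of samples, p = number of predictors (assume 2 here).
n, p = len(y_true), 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}  Adjusted R2={adj_r2:.3f}")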

Also Read: A Comprehensive Introduction to Evaluating Regression Models

Practical Tips & Common Pitfalls

A few things that can make or break your regression model (a short sketch after this list ties several of them together):

  • Feature scaling: Not strictly required, but essential if you use regularization (Ridge/Lasso).
  • Categorical features: Use one-hot encoding, but drop one dummy to avoid multicollinearity.
  • Outliers: Can heavily distort results. Always check residuals and use robust methods if needed.
  • Overfitting: Too many predictors? Use regularization: Ridge (L2) or Lasso (L1).
    • Ridge shrinks coefficients toward zero.
    • Lasso can actually drop unimportant ones entirely (useful for feature selection).
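
Here’s the sketch mentioned above, tying several of those tips together (scaling numeric features, one-hot encoding with one dummy dropped, and Ridge versus Lasso) in a scikit-learn pipeline on made-up housing data.

# Sketch: scaling + one-hot encoding (drop one dummy) + regularized regression.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, size=200),
    "age": rng.uniform(0, 50, size=200),
    "city": rng.choice(["Austin", "Boston", "Chicago"], size=200),
})
price = 100 * df["sqft"] - 400 * df["age"] + rng.normal(scale=5_000, size=200)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["sqft", "age"]),        # scaling matters for regularization
    ("cat", OneHotEncoder(drop="first"), ["city"]),    # drop one dummy to avoid collinearity
])

ridge = Pipeline([("prep", preprocess), ("model", Ridge(alpha=1.0))]).fit(df, price)
lasso = Pipeline([("prep", preprocess), ("model", Lasso(alpha=500.0))]).fit(df, price)

print("Ridge coefficients:", ridge.named_steps["model"].coef_)
print("Lasso coefficients:", lasso.named_steps["model"].coef_)  # weak ones can hit exactly 0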

And remember, Linear Regression captures association, not causation. Just because a coefficient is positive doesn’t mean changing that variable will cause the target to rise. Interviewers love candidates who acknowledge that nuance.

10 Common Interview Questions on Linear Regression

Here are a few that come up all the time:

Q1. What are the key assumptions of linear regression, and why do they matter?

A. Linear regression comes with a few rules that make sure your model works properly. You need a linear relationship between features and target, independent errors, constant error variance, normally distributed residuals, and no multicollinearity. Basically, these assumptions make your coefficients meaningful and your predictions trustworthy. Interviewers love it when you also mention how to check them, like looking at residual plots, using the Durbin-Watson test, or calculating VIF scores.

Q2. How does ordinary least squares estimate coefficients?

A. OLS finds the best fit line by minimizing the squared differences between predicted and actual values. For smaller datasets, you can solve it directly with a formula. For larger datasets or lots of features, gradient descent is usually easier. It just takes small steps in the direction that reduces the error until it finds a good solution.

Q3. What is multicollinearity and how do you detect and handle it?

A. Multicollinearity happens when two or more features are highly correlated. That makes it hard to tell what each feature is actually doing and can make your coefficients unstable. You can spot it using VIF scores or a correlation matrix. To fix it, drop one of the correlated features, combine them into one, or use Ridge regression to stabilize the estimates.

Q4. What is the difference between R² and Adjusted R²?

A. R² tells you how much of the variance in your target variable your model explains. The problem is it always increases when you add more features, even if they are useless. Adjusted R² fixes that by penalizing irrelevant features. So when you are comparing models with different numbers of predictors, Adjusted R² is more reliable.

Q5. Why might you prefer MAE over RMSE as an evaluation metric?

A. MAE treats all errors equally while RMSE squares the errors, which punishes big mistakes more. If your dataset has outliers, RMSE can make them dominate the results, while MAE gives a more balanced view. But if large errors are really bad, like in financial predictions, RMSE is better because it highlights those mistakes.
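
A tiny numeric illustration with made-up error values: a single large miss moves RMSE far more than it moves MAE.

# Sketch: how a single large error affects RMSE vs. MAE.
import numpy as np

errors_clean = np.array([1.0, -1.0, 2.0, -2.0, 1.5])
errors_outlier = np.array([1.0, -1.0, 2.0, -2.0, 20.0])   # one big miss

for name, e in [("clean", errors_clean), ("with outlier", errors_outlier)]:
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")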

Q6. What happens if residuals are not normally distributed?

A. Strictly speaking, residuals don’t have to be normal to estimate coefficients. But normality matters if you want to do statistical inference like confidence intervals or hypothesis tests. With big datasets, the Central Limit Theorem often takes care of this. Otherwise, you could use bootstrapping or transform variables to make the residuals more normal.

Q7. How do you detect and handle heteroscedasticity?

A. Heteroscedasticity just means the spread of errors is not the same across predictions. You can detect it by plotting residuals against predicted values. If it looks like a funnel, that’s your clue. Statistical tests like Breusch-Pagan also work. To fix it, you can transform your target variable or use Weighted Least Squares so the model doesn’t give too much weight to high-variance points.
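
Here’s a small sketch of the Breusch-Pagan test and a Weighted Least Squares refit with statsmodels, on synthetic data deliberately built so the noise grows with the feature.

# Sketch: detecting heteroscedasticity (Breusch-Pagan) and refitting with WLS.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=300)
X = sm.add_constant(x)
y = 2 + 3 * x + rng.normal(scale=x, size=300)    # noise grows with x -> heteroscedastic

ols = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)       # small p-value -> evidence of heteroscedasticity

# Weighted Least Squares: downweight the noisier, high-variance observations.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print("WLS slope estimate:", wls.params[1])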

Q8. What happens if you include irrelevant variables in a regression model?

A. Adding irrelevant features makes your model more complicated without improving predictions. Coefficients can get inflated and R² might trick you into thinking your model is better than it really is. Adjusted R² or Lasso regression can help keep your model honest by penalizing unnecessary predictors.

Q9. How would you evaluate a regression model when errors have different costs?

A. Not all mistakes are equal in real life. For example, underestimating demand might cost way more than overestimating it. Standard metrics like MAE or RMSE treat all errors the same. In these cases, you could use a custom cost function or Quantile Regression to focus on the more expensive mistakes. This shows you understand the business side as well as the math.
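
As an illustration, here’s a tiny sketch of a custom asymmetric cost where under-prediction is three times as expensive as over-prediction; the 3x penalty is an arbitrary assumption for the example.

# Sketch: a custom cost that penalizes under-prediction more than over-prediction.
import numpy as np

def asymmetric_cost(y_true, y_pred, under_penalty=3.0, over_penalty=1.0):
    """Average cost where under-predicting (y_pred < y_true) costs 3x as much."""
    diff = y_true - y_pred
    cost = np.where(diff > 0, under_penalty * diff, over_penalty * (-diff))
    return cost.mean()

y_true = np.array([100.0, 120.0, 90.0])
print(asymmetric_cost(y_true, y_true - 5))   # under-predict by 5 everywhere -> 15.0
print(asymmetric_cost(y_true, y_true + 5))   # over-predict by 5 everywhere  -> 5.0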

Q10. How do you handle missing data in regression?

A. Missing data can mess up your model if you ignore it. You could impute with the mean, median, or mode, or use regression or k-NN imputation. For more serious cases, multiple imputation accounts for uncertainty. The first step is always to ask why the data is missing. Is it completely random, random based on other variables, or not random at all? The answer changes how you handle it.
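
Here’s a short sketch of two common imputation options with scikit-learn; the DataFrame is made up, and which strategy is appropriate depends on why the values are missing.

# Sketch: median imputation vs. k-NN imputation on a small made-up DataFrame.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

df = pd.DataFrame({
    "sqft": [1200, 1500, np.nan, 2200, 1800],
    "age":  [10, np.nan, 25, 5, 15],
})

median_imputed = SimpleImputer(strategy="median").fit_transform(df)
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(df)   # fills gaps from similar rows

print(pd.DataFrame(median_imputed, columns=df.columns))
print(pd.DataFrame(knn_imputed, columns=df.columns))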

If you can confidently answer those, you’re already ahead of most candidates.

Conclusion

Linear Regression might be old-school, but it’s still the backbone of machine learning. Mastering it isn’t about memorizing formulas; it’s about understanding why it works, when it fails, and how to fix it. Once you’ve nailed that, everything else, from logistic regression to deep learning, starts to make a lot more sense.

Karun Thankachan is a Senior Data Scientist specializing in Recommender Systems and Information Retrieval. He has worked across E-Commerce, FinTech, PXT, and EdTech industries. He has several published papers and 2 patents in the field of Machine Learning. Currently, he works at Walmart E-Commerce improving item selection and availability.

Karun also serves on the editorial board for IJDKP and JDS and is a Data Science Mentor on Topmate. He was awarded the Top 50 Topmate Creator Award in North America (2024), named a Top 10 Data Mentor in the USA (2025), and is a Perplexity Business Fellow. He also writes to 70k+ followers on LinkedIn and is the co-founder of BuildML, a community running weekly research paper discussions and monthly project development cohorts.
