A very simple introduction to Regression with XGBoost

Vicky
2 min read · Jan 29, 2022

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.

Regression problems involve predicting continuous or real values. A few regression algorithms include:

  1. Linear regression
  2. Decision trees

Decision trees can be used for both classification and regression.

An XGBoost regression model can be defined by creating an instance of the XGBRegressor class.

from xgboost import XGBRegressor
xgb_reg = XGBRegressor()

Common regression metrics to evaluate the quality of a regression model are:

Root mean squared error (RMSE): It is calculated by taking the differences between the predicted and actual values, squaring them, computing the mean of those squares, and finally taking the square root of that mean.

Mean absolute error (MAE): It averages the absolute differences between the predicted and actual values across all the samples.
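
Both metrics are easy to compute by hand. Here is a minimal sketch with NumPy, using a few made-up actual and predicted values:

import numpy as np

# Hypothetical values, purely for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE: square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# MAE: average the absolute errors
mae = np.mean(np.abs(y_pred - y_true))

print(rmse)  # ~0.935
print(mae)   # 0.75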

Objective functions

It measures the difference between the estimated and true values for the data. The objective function contains a loss function and a regularization term. The loss function measures how well the model's predictions match the true values, while the regularization term accounts for how complex the model is. Regularization is the idea of penalizing models as they become more complex, which pushes XGBoost toward models that are both accurate and simple.

Loss functions used in XGBoost include:

  • reg:linear — used for regression problems
  • binary:logistic — used when you want the probability
  • reg:logistic — used when you want the decision
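
The loss is selected through the objective parameter when constructing the model. A minimal sketch (note that newer XGBoost releases have renamed reg:linear to reg:squarederror):

from xgboost import XGBRegressor

# Squared-error loss for a regression problem
# ("reg:squarederror" is the current name for the old "reg:linear")
xgb_reg = XGBRegressor(objective="reg:squarederror")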

Regularization Parameters in XGBoost are:

  • lambda — l2 regularization on leaf weights; a smoother penalty than l1 that causes leaf weights to decrease gradually
  • alpha — l1 regularization on leaf weights; the higher the value, the stronger the regularization
  • gamma — a parameter for tree base learners that controls whether a given node will split, based on the expected reduction in loss the split would bring; higher values lead to fewer splits
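
All three can be passed when constructing the model; in the scikit-learn API, lambda and alpha are exposed as reg_lambda and reg_alpha. A minimal sketch with illustrative values (sensible settings depend on the dataset):

from xgboost import XGBRegressor

xgb_reg = XGBRegressor(
    reg_lambda=1.0,  # l2 penalty (lambda) on leaf weights
    reg_alpha=0.1,   # l1 penalty (alpha) on leaf weights
    gamma=0.5,       # minimum loss reduction required to split a node
)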

Base Learners

XGBoost combines many individual models to give a final prediction. These individual models are called base learners. We want each of the base learners to be good at predicting a different part of the dataset.

Here is an example where we use decision trees as base learners with XGBRegressor:
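
A minimal sketch, assuming scikit-learn's California housing data as a stand-in dataset:

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Load a sample regression dataset and hold out a test set
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# booster="gbtree" (the default) uses decision trees as base learners
xgb_reg = XGBRegressor(
    objective="reg:squarederror",
    booster="gbtree",
    n_estimators=100,
    random_state=42,
)
xgb_reg.fit(X_train, y_train)

# Evaluate with RMSE on the held-out data
preds = xgb_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE:", rmse)

Trees are already the default base learner, so booster="gbtree" is shown only to make the choice explicit.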

You can read about classification with XGBoost in the following article:
