Innominds Blog

Don’t Ignore the Deviances. They Tell You Good Things About Your Model

By Ravi Kumar Meduri

Data scientists and analysts often feed in data and train a model such as logistic regression for classification. Most practitioners do not seem to spend enough time on the deviance section of the model output and instead focus only on the top-level diagnostics: the coefficient summary, RMSE or the classification matrix, and perhaps an overall measure such as R² for linear regression or the C-statistic or area under the curve (AUC) for logistic regression. Without denying the value of these top-level diagnostics, it is also important to check model fit rather than only the estimation or classification error. This is where deviance-based diagnostics come in handy.

rainbow.png

Source: By Geek3 - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9884213

The quartile summary of the deviance residuals in the model output shows how the residuals are spread across the quartiles. It can be used to check the independence and normality (a critical assumption for linear regression) of the error terms, whose distribution is unknown and can only be approximated empirically by the residuals. In this case, the quartile distribution of the deviance residuals suggests that there is a basis to reject the independence of the residuals; a residual plot gives more information about how the residuals are distributed.
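As a rough illustration of where such a quartile summary comes from, the sketch below computes deviance residuals for a binary-response model and summarizes them by quartile. The data and the names `deviance_residuals`, `y` and `p` are hypothetical and are not taken from the model discussed in this post; the fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
import statistics


def deviance_residuals(y, p):
    """Deviance residuals for a binary-response model.

    y: observed 0/1 outcomes.
    p: fitted probabilities, assumed strictly inside (0, 1).
    """
    residuals = []
    for yi, pi in zip(y, p):
        # Contribution of this observation to the residual deviance.
        d2 = -2.0 * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
        # The residual takes the sign of (observed - fitted).
        sign = 1.0 if yi > pi else -1.0
        residuals.append(sign * math.sqrt(d2))
    return residuals


# Hypothetical toy data, for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.3, 0.1, 0.4, 0.8, 0.7]

res = deviance_residuals(y, p)

# Quartile summary: minimum, first quartile, median, third quartile, maximum.
q1, median, q3 = statistics.quantiles(res, n=4, method="inclusive")
print(f"Min {min(res):.4f}  1Q {q1:.4f}  Median {median:.4f}  "
      f"3Q {q3:.4f}  Max {max(res):.4f}")

# The squared deviance residuals add up to the residual deviance.
residual_deviance = sum(r * r for r in res)
print(f"Residual deviance: {residual_deviance:.4f}")
```

A median far from zero, or first and third quartiles of very different magnitude, would be a hint that the residuals are skewed and that a residual plot deserves a closer look.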

rainbow1.jpg

The null deviance measures how far the model with just the intercept is from the saturated model, and the residual deviance measures how far the model with the intercept and the predictors is from the saturated model. These deviances approximately follow a chi-square distribution. The upper-tail probability for the residual deviance of 458.52 on 394 degrees of freedom is 0.013651, and for the null deviance of 499.98 on 399 degrees of freedom it is 0.000428.

Both probabilities are below the 0.05 significance threshold, although only the null deviance value (0.000428) is also below 0.01. So the null deviance and the residual deviance in this case are for real: each model departs significantly from the saturated model, rather than explaining better than it. The residual deviance is lower than the null deviance by 41.46 for 5 degrees of freedom, so the predictors do improve on the intercept-only model, but the fit still leaves room for improvement.
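The tail probabilities quoted above can be checked with a few lines of code. The sketch below uses only the standard library and the Wilson-Hilferty normal approximation to the chi-square distribution, which is an assumption made here for self-containment and is reasonable only for large degrees of freedom. The helper name `chi2_upper_tail` is hypothetical; a statistics package (for example `scipy.stats.chi2.sf`) would give the exact value.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(X > x) for X ~ chi-square(df).

    Uses the Wilson-Hilferty cube-root normal approximation,
    which is adequate for large df but is not exact.
    """
    v = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Deviances and degrees of freedom quoted in the post.
residual_p = chi2_upper_tail(458.52, 394)
null_p = chi2_upper_tail(499.98, 399)

print(f"Residual deviance 458.52 on 394 df: p ~ {residual_p:.6f}")
print(f"Null deviance     499.98 on 399 df: p ~ {null_p:.6f}")
```

The approximate values should land close to the 0.013651 and 0.000428 quoted above, and they can then be compared against whichever significance threshold is chosen.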

Hence, whenever these deviance measures appear in the output, it is advisable to use them to test your model fit well before you move on to the rest of the output and diagnostics. There are more goodness-of-fit measures for advanced diagnostics; more on those later.

Ravi Kumar Meduri

Vice President, Engineering Services, Innominds.
