ON USING NONPARAMETRIC REGRESSION METHODS TO ESTIMATE CATEGORICAL OUTCOMES MODELS WITH MIXED DATA TYPES

Authors

  • Ahmed M. Mami Department of Statistics, Faculty of Science, Benghazi University, Benghazi, Libya
  • Ayman Ali Elberjo Department of Statistics, Faculty of Science, Benghazi University, Benghazi, Libya P.O.Box 9480

DOI:

https://doi.org/10.53555/eijas.v1i3.15

Keywords:

Nonparametric regression, logistic regression model, conditional cumulative distribution function (CDF), conditional probability density function (PDF), bandwidth selection, cross-validation criterion (CV), Correct Classification Ratio (CCR), log likelihood (LLK), Household Expenditure Survey (HES)

Abstract

Many data analysis methods are sensitive to the type of data under study, so recognizing the data type is an important first step in any statistical analysis. Data may take numerical values or belong to nominal categories; broadly, there are two types of data, quantitative and qualitative (categorical). General and powerful methods for the analysis of quantitative data have been widely taught for several decades, whereas methods for qualitative data have blossomed only in the past 25 years. The need for categorical data analysis techniques has increased steadily in recent years in economics, health, and the social sciences. However, modeling categorical data is complex when the dependent variable is a binary or multinomial outcome and the explanatory variables are of mixed types. The main goal of this paper is to estimate a nonparametric regression model for binary and multinomial outcomes with mixed explanatory variables, based on the nonparametric conditional CDF and PDF methods with bandwidth selection presented by Li and Racine (2008). We then compare it with one of the most common parametric regression methods, the logistic regression model. The comparison rests on two criteria: classification ability, measured by the Correct Classification Ratio (CCR), and the log-likelihood value (LLK). We conducted several simulation studies using randomly generated data (discrete categorical and continuous) to investigate the performance of both the parametric and the nonparametric model for binary and multinomial outcomes. Interesting results were obtained. The methods are also applied to real data with mixed variables, using a dataset from the Household Expenditure Survey (HES).
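As a rough illustration of the two comparison criteria, the following Python sketch estimates class probabilities with a simple mixed-data kernel estimator and then computes CCR and LLK. It is not the authors' implementation. The function names (`conditional_probs`, `ccr`, `llk`), the simulated data, and the fixed bandwidths `h` and `lam` are all assumptions made for the example. In particular, the bandwidths are fixed by hand here, whereas the approach described above selects them by cross-validation.

```python
import numpy as np


def conditional_probs(x_cont, x_disc, y, h, lam, n_classes):
    """Leave-one-out kernel estimate of P(y = k | x) for every observation.

    Assumed setup: one continuous regressor (Gaussian kernel, bandwidth h)
    and one unordered discrete regressor (Aitchison-Aitken-style kernel
    with smoothing parameter lam). y holds integer labels 0..n_classes-1.
    """
    n_cats = len(np.unique(x_disc))
    # Gaussian kernel weights between all pairs of continuous values.
    u = (x_cont[:, None] - x_cont[None, :]) / h
    k_cont = np.exp(-0.5 * u ** 2)
    # Discrete kernel: weight 1 - lam on a match, lam / (c - 1) otherwise.
    same = x_disc[:, None] == x_disc[None, :]
    k_disc = np.where(same, 1.0 - lam, lam / max(n_cats - 1, 1))
    w = k_cont * k_disc
    np.fill_diagonal(w, 0.0)  # leave each observation out of its own estimate
    # Kernel-weighted class frequencies, normalized to sum to one per row.
    votes = w @ np.eye(n_classes)[y]
    totals = np.maximum(votes.sum(axis=1, keepdims=True), 1e-300)
    return votes / totals


def ccr(y, probs):
    """Correct Classification Ratio: share of cases whose most probable
    class equals the observed class."""
    return float(np.mean(probs.argmax(axis=1) == y))


def llk(y, probs, eps=1e-12):
    """Log likelihood: sum of log estimated probabilities of observed classes."""
    p_obs = probs[np.arange(len(y)), y]
    return float(np.sum(np.log(np.clip(p_obs, eps, 1.0))))


# Simulated binary outcome with one continuous and one binary regressor.
rng = np.random.default_rng(0)
n = 300
x_cont = rng.normal(size=n)
x_disc = rng.integers(0, 2, size=n)
p_true = 1.0 / (1.0 + np.exp(-(1.5 * x_cont + x_disc - 0.5)))
y = (rng.random(n) < p_true).astype(int)

probs = conditional_probs(x_cont, x_disc, y, h=0.5, lam=0.1, n_classes=2)
print("CCR:", ccr(y, probs))
print("LLK:", llk(y, probs))
```

The same `ccr` and `llk` functions can be applied to the fitted probabilities of a logistic regression, which puts the parametric and nonparametric models on the same scale for comparison.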

References

Agresti, A. (1990). Categorical Data Analysis. Wiley, New York.

Agresti, A. (2002). Categorical Data Analysis, 2nd ed. John Wiley & Sons, Inc., Hoboken, New Jersey.

Hosmer, D. W. (2000). Applied Logistic Regression, 2nd ed. John Wiley & Sons, Inc., Hoboken, New Jersey.

Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression, 1st ed. Marcel Dekker, New York.

Hall, P., Racine, J. & Li, Q. (2004). Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association, 99(2), 1015–1026.

Li, Q. & Racine, J. (2008). Nonparametric Estimation of Conditional CDF and Quantile Functions with Mixed Categorical and Continuous Data. Journal of Business and Economic Statistics, 26(4), 423–434.

Published

2015-09-27