I think that, over the past few months, I have been saying incorrect things about classification with categorical covariates, because I never took the time to look at it carefully. Consider some simulated dataset, generated from a logistic model,

> n=1e3
> set.seed(1)
> X1=runif(n)
> q=quantile(X1,(0:26)/26)  # 27 breaks, i.e. 26 equal-frequency intervals
> q[1]=0  # so that the smallest X1 falls in the first interval
> X2=cut(X1,q,labels=LETTERS[1:26])
> p=exp(-.1+qnorm(2*(abs(.5-X1))))/(1+exp(-.1+qnorm(2*(abs(.5-X1)))))  # logistic transform of the score
> Y=rbinom(n,size=1,p)
> df=data.frame(X1=X1,X2=X2,p=p,Y=Y)

Here, the response depends on a continuous covariate, X1, except that it is treated as unobserved. Instead, we have a categorical covariate…
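As a quick sanity check on the construction above, here is a minimal sketch that rebuilds X2 and counts the observations per level. Since the breaks are empirical quantiles of X1, each of the 26 letters should hold about n/26, so roughly 38 observations. The name `counts` is mine and is not part of the original code.

```r
# Rebuild the categorical covariate exactly as in the simulation above
n <- 1e3
set.seed(1)
X1 <- runif(n)
q <- quantile(X1, (0:26) / 26)  # 27 breaks, 26 equal-frequency intervals
q[1] <- 0                       # keep the smallest X1 inside the first interval
X2 <- cut(X1, q, labels = LETTERS[1:26])

# Number of observations per level (hypothetical helper name)
counts <- table(X2)
print(counts)

# No observation should be left unclassified
print(sum(is.na(X2)))
```

Levels of nearly equal size mean that no letter is rare by design, which matters later when a model is fitted on X2 rather than on X1.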