Function for Scaled Difference Tests (chi-square and loglikelihood values)

#For use when comparing the fit of nested models with complex data
# (e.g., TYPE = COMPLEX in Mplus).
# The scaled difference chi-square test statistic is described at
# http://preprints.stat.ucla.edu/260/chisquare.pdf
# This function provides scaled difference tests based on chi-square
# and loglikelihood values.
# The notation below is taken from http://www.statmodel.com/chidiff.shtml
 
#For the chi-square difference test (which is the default in this function),
# the notation from
#www.statmodel.com reads:
# "d0 is the degrees of freedom in the nested model, 
#c0 is the scaling correction factor for the nested model,
#d1 is the degrees of freedom in the comparison model, 
#and c1 is the scaling correction factor for the comparison model.
#Be sure to use the correction factor given in the output for the H0 model.
#... T0 and T1 are the MLM, MLR, 
#or WLSM chi-square values for the nested and comparison model, respectively."
 
#For the difference test using loglikelihood values (loglike=TRUE):
#d0 and d1 are the number of parameters for the H0 and H1 models.
#c0 and c1 are the scaling correction factors for the H0 and H1 models.
# t0 and t1 are the loglikelihood values for the H0 and H1 models.
 
scale.diff.test <- function (d0, d1, c0, c1, t0, t1, loglike=FALSE) {
  if (loglike) {
    cd.2 <- (d0*c0 - d1*c1) / (d0 - d1)
    TRd.2 <- -2*(t0 - t1) / cd.2
    df.2 <- abs(d0 - d1)
    sig.2 <- pchisq(TRd.2, df.2, lower.tail = FALSE)
    cat("scaled loglikelihood difference test for complex data:", "\n")
    cat("value =", TRd.2)
    cat("  df=", df.2)
    cat("  sig. =", sig.2, "\n")
  } else {
    cd <- (d0*c0 - d1*c1) / (d0 - d1)
    TRd <- (t0*c0 - t1*c1) / cd
    df <- abs(d0 - d1)
    sig <- pchisq(TRd, df, lower.tail = FALSE)
    cat("scaled Chi-squared difference test for complex data:", "\n")
    cat("value =", TRd)
    cat("  df=", df)
    cat("  sig. =", sig, "\n")
  }
}
 
scale.diff.test (45, 43, 1.12, 1.123, 246.44, 237.846)
scale.diff.test (39, 47, 1.45, 1.546, -2606, -2583, loglike=TRUE)
 
scale.diff.test (162, 166, 1.088, 1.09, 777.043, 791.371)
scale.diff.test (166, 162, 1.09, 1.088, 791.371, 777.043)
 
scale.diff.test (77, 72, 1, 1, 250, 231)


Examples:
scale.diff.test (45, 43, 1.12, 1.123, 246.44, 237.846)
scale.diff.test (39, 47, 1.45, 1.546, -2606, -2583, loglike=TRUE)

Output:
scaled Chi-squared difference test for complex data:
value = 8.443147 df= 2 sig. = 0.01467553

scaled loglikelihood difference test for complex data:
value = 22.84012 df= 8 sig. = 0.003575695


Collider Example

library (MASS)
covar<-mvrnorm(250, c(0, 0), matrix(c(1, 0.00, 0.00, 1), 2, 2))
mydata<-data.frame(covar)
names(mydata)<-c("sat", "mot")
mydata$admin<- mydata$sat + mydata$mot
mydata$admin2<- ifelse (mydata$admin >=quantile(mydata$admin, .85), "pass", "fail")
 
library(ggplot2)
qplot(sat,mot, data = mydata, color = admin2)
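The collider effect is easy to verify numerically: sat and mot are generated independently, yet conditioning on the admissions outcome (a common effect of both) induces a negative correlation among the admitted group. A minimal sketch of that check, reusing the variable names above (the set.seed call is my addition, for reproducibility):

```r
library(MASS)

set.seed(1)  # added so the correlations are reproducible
covar <- mvrnorm(250, c(0, 0), matrix(c(1, 0, 0, 1), 2, 2))
mydata <- data.frame(sat = covar[, 1], mot = covar[, 2])
mydata$admin <- mydata$sat + mydata$mot
pass <- mydata$admin >= quantile(mydata$admin, .85)

cor(mydata$sat, mydata$mot)              # near zero in the full sample
cor(mydata$sat[pass], mydata$mot[pass])  # negative among the admitted
```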



Stirling’s formula for R

I am sure it is out there, but for my own record keeping, and for anyone interested, here is a quick function for Stirling's approximation of n! (the number of permutations of n objects):

Stirling <- function (n) ((2*pi*n)^.5) * (n/exp(1))^n
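As a quick sanity check (my addition, using R's built-in factorial), the ratio of the approximation to the exact value is already within one percent of 1 at n = 10, and it only improves as n grows:

```r
Stirling <- function (n) ((2*pi*n)^.5) * (n/exp(1))^n

Stirling(10) / factorial(10)   # about 0.9917
```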



Pseudo R Square Values in R

Base R does not report a pseudo R-square value for logistic models, and neither does Mplus. So here is a real quick R function to calculate McFadden's and the adjusted McFadden's pseudo R-square values:

 

#Pseudo R square
#x = loglikelihood value of the full model
#y = loglikelihood value of the null (intercept-only) model
#k = number of parameters in the full model
 
PRsq <- function (x, y, k) {
  a <- 1 - (x/y)
  b <- 1 - ((x - k)/y)
  cat("Pseudo R Square Estimates", "\n")
  cat("McFadden's:          ", a, "\n")
  cat("adjusted McFadden's: ", b, "\n")
}
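A hypothetical usage sketch, assuming x and y are the loglikelihood values of the full and intercept-only models and k counts the full model's parameters (mtcars and glm are built into R; PRsq is restated so the block runs on its own):

```r
# PRsq as defined above
PRsq <- function (x, y, k) {
  a <- 1 - (x/y)
  b <- 1 - ((x - k)/y)
  cat("Pseudo R Square Estimates", "\n")
  cat("McFadden's:          ", a, "\n")
  cat("adjusted McFadden's: ", b, "\n")
}

# Logistic regression of transmission type on mpg and weight
fit  <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,        data = mtcars, family = binomial)
PRsq(as.numeric(logLik(fit)), as.numeric(logLik(null)), length(coef(fit)))
```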


Additional Functions for the MatchIt Package – Part 2

I have reworked the graphing and summarizing functions I wrote for the MatchIt package to be cleaner and to provide more information.

Currently I have a function for a summary table, sorted by absolute size, of the post-matching mean differences:

meandifftable <- function (x){
	post <- data.frame(x$sum.matched[4])
	matchID <- as.vector(row.names(post))
	names(post)[1] <- c("m_mean_diff")
	post$absolute <- abs(post[[1]])
	total2 <- post[order(-post$absolute, na.last = NA), ]
	meandiffover1 <- subset(total2[1], total2[1] > .1 | total2[1] < -.1)
	meandiffover1
}

A graph function (histogram plus density plot) for the pre-matching mean differences (these plotting functions require the ggplot2 package):

all_meandiffplot <- function (x) {
	adiff<-data.frame(x$sum.all)
	names(adiff)[4]<-c("all_mean_diff")
	diffplot<-ggplot(adiff, aes(all_mean_diff) )
	diffplot<- diffplot+ geom_histogram (fill="grey")
	diffplot<- diffplot+ geom_density (colour="red")
	diffplot<-diffplot+xlim(-.5, .5)
	diffplot
	}

and for the post-matching mean differences:

matched_meandiffplot <- function (x) {
	mdiff<-data.frame(x$sum.matched)
	names(mdiff)[4]<-c("matched_mean_diff")
	diffplot<-ggplot(mdiff, aes(matched_mean_diff) )
	diffplot<- diffplot+ geom_histogram (fill="grey")
	diffplot<- diffplot+ geom_density (colour="red")
	diffplot<-diffplot+xlim(-.5, .5)
	diffplot
	}

Note that both plots are on a scale from -.5 to .5 so that they can easily be compared.

Finally, I have a set of simple tables which indicate how many “large” (|d| > .25), “medium” (.20 < |d| ≤ .25), and “small” (|d| ≤ .20) standardized mean differences there were before matching:

all_meandiffcount <- function (x){
	all <- data.frame(x$sum.all[4])
	d <- abs(all[[1]])
	all$all_group[d > .25] <- "Large"
	all$all_group[d > .20 & d <= .25] <- "Medium"
	all$all_group[d <= .20] <- "Small"
	table(all$all_group)
}

and after matching:

matched_meandiffcount <- function (x){
	matched <- data.frame(x$sum.matched[4])
	d <- abs(matched[[1]])
	matched$matched_group[d > .25] <- "Large"
	matched$matched_group[d > .20 & d <= .25] <- "Medium"
	matched$matched_group[d <= .20] <- "Small"
	table(matched$matched_group)
}
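As a sanity check of the cutoffs (my addition), here is a self-contained mock run. The fake object below only mimics the fourth column of a MatchIt summary, standing in for the standardized mean differences, so it is an illustration rather than real matched data:

```r
# Counting function, restated here so the block runs on its own
matched_meandiffcount <- function (x){
  matched <- data.frame(x$sum.matched[4])
  d <- abs(matched[[1]])
  matched$matched_group[d > .25] <- "Large"
  matched$matched_group[d > .20 & d <= .25] <- "Medium"
  matched$matched_group[d <= .20] <- "Small"
  table(matched$matched_group)
}

# Mock stand-in for a MatchIt summary: column 4 holds the std. mean differences
fake <- list(sum.matched = data.frame(a = 0, b = 0, c = 0,
             std.diff = c(.30, -.27, .22, -.21, .10, -.05)))
matched_meandiffcount(fake)   # Large 2, Medium 2, Small 2
```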

Simplified Output for Matchit package

I have been attending a seminar on propensity score matching this week and was asked if I could write a function to reduce the complexity of the MatchIt package's output. I produced two functions.

First I wrote a function to produce a histogram of the matched mean differences, which gives a nice summary of how well the matching procedure has achieved balance (note: you need the ggplot2 package installed to use this function):

meandiffplot <- function (x) {
	mdiff<-data.frame(x$sum.matched)
	names(mdiff)[4]<-c("m_mean_diff")
	diffplot<-ggplot(mdiff, aes(m_mean_diff) )
	diffplot<- diffplot+ geom_histogram (fill="grey")
	diffplot<- diffplot+ geom_density (colour="red")
	diffplot 
	}

Next I produced a function that reports only those matched mean differences with an absolute standardized difference over .1 (sorted by absolute size).

meandifftable <- function (x){
	post <- data.frame(x$sum.matched[4])
	matchID <- as.vector(row.names(post))
	names(post)[1] <- c("m_mean_diff")
	post$absolute <- abs(post[[1]])
	total2 <- post[order(-post$absolute, na.last = NA), ]
	meandiffover1 <- subset(total2[1], total2[1] > .1 | total2[1] < -.1)
	meandiffover1
}

If you use the MatchIt package, I think these functions are really useful.

 


Displaying K-means Results.

I am a fan of K-means approaches to clustering data, particularly when you have a theoretical reason to expect a certain number of clusters and you have a large data set. However, I think plotting only the cluster means can be misleading. Reading through Hadley Wickham’s ggplot2 book, he suggests the following, to which I add a few small changes.

#First we run the kmeans analysis on the dataset (here called clustT1WIN):
#in brackets are the variables used (in this case I only want variables
#1 through 11, hence the [1:11]) and the number of clusters
#I want produced (in this case 4).
 
cl <- kmeans(clustT1WIN[1:11], 4)
 
#We will need to add an id variable for later use. In this case I have called it .row.
 
clustT1WIN$.row <- rownames(clustT1WIN)
 
#At this stage I also make a new variable indicating cluster membership, as below.
#I have a good idea of what my clusters will be called, so
#I gave them those names in the second line of the code.
#Then I put it together and reshape the data into a form that is good for graphing.
#(recode is from the car package and melt is from the reshape package.)
 
cluster <- cl$cluster
 
cl.cluster <- as.vector(recode(cluster, "1='FC'; 2='FV'; 3='SO'; 4='OS'",
  as.numeric.result = FALSE))
 
clustT1WIN2 <- data.frame(clustT1WIN[1:12], cl.cluster)
molten2 <- melt(clustT1WIN2, id = c(".row", "cl.cluster"))
 
#OK, set up the graph background.
#Following the ggplot2 book I also create a jit parameter because it is
#much easier to alter this and type it in than the full code over and over again.
 
pcp_cl <- ggplot(molten2,   aes(variable, value, group = .row, colour = cl.cluster) )          
jit<- position_jitter (width = .08, height = .08)
 
#Ok first graph the cluster means.
 
pcp_cl + stat_summary(aes(group = cl.cluster), fun.y = mean,   geom = "line")
 
#Then we produce a colourful but uninformative parallel coordinates 
#plot with a bit of alpha blending and jitter.
 
pcp_cl + geom_line(position = jit, alpha = 1/5)
 
#All code up to this point is as per Wickham, but
#I also add the cluster-means graph that we first produced,
#as well as changing the angle of the x-axis text so it is readable.
 
pcp_cl + geom_line(position = jit, colour = alpha("black", 1/4)) +
  stat_summary(aes(group = cl.cluster), fun.y = mean, geom = "line", size = 1.5) +
  facet_wrap(~ cl.cluster) + opts(axis.text.x = theme_text(angle = -45, hjust = 0))
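Since clustT1WIN is not a public dataset, here is a self-contained sketch of the same recipe using the built-in iris data. Assumptions in this sketch: reshape2's melt stands in for the reshape package's melt, and theme()/element_text() replace the now-deprecated opts()/theme_text():

```r
library(ggplot2)
library(reshape2)

set.seed(42)
cl <- kmeans(iris[1:4], 3)   # four measures, three clusters

# Add the id and cluster-membership variables, then reshape for graphing
df <- iris[1:4]
df$.row <- rownames(df)
df$cl.cluster <- factor(cl$cluster)
molten <- melt(df, id.vars = c(".row", "cl.cluster"))

# Jittered parallel coordinates, cluster means overlaid, faceted by cluster
jit <- position_jitter(width = .08, height = .08)
p <- ggplot(molten, aes(variable, value, group = .row, colour = cl.cluster)) +
  geom_line(position = jit, alpha = 1/4) +
  stat_summary(aes(group = cl.cluster), fun = mean, geom = "line", size = 1.2) +
  facet_wrap(~ cl.cluster) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))
p
```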

Relatively simple but visually very informative. Here is the final result:

The final product
