Creating Composite Scores in R

There maybe times when you need scores corrected for measurement error and latent factor loading. In these instances it can be beneficial to use  composite scores derived from one factor congeneric models. While easy they are time consuming to calculate by hand and I am unaware of any program that does it for you. The following code in R will take normal Mplus output (which must include FSCOEFFICIENT FSDETERMINACY in the output line of your mplus input syntax) and produce a vector of fit and a vector of weights. I have also included the nifty feature of calculating latent variable reliability estimates. The code is in early stages of development so only supports scales with 2 to 4 items but you will get the idea. Now this looks like a lot of work BUT in reality all you need to do is specify where your mplus output file is located in the first line of code and then simply run all the code giving you proportional weights for composite score calculation in about 5 seconds. Again R code is in italics (compile these with the non-italics instructions removed before running).

Creation of proportional factor score regression: Weights for composite score creation

#First run your model in mplus and read it into R. This is done as follows
out <- readLines(D:MPLUSFILENAME.out” )
#Next we use the grep feature to tell R to find aspects of our data
#to include in the output we will produce. This has the advantage of
#being the same no mater how long or short your Mplus input file is
#(that means you dont need to constantly count lines and change input).
#Lets start with chi square. This will give us the value, degrees of freedom and significance.
# Chi Square
ind1 <- grep( “Chi-Square Test of Model Fit” , out )[1]
chisq <- c( as.numeric( substring( out[ ind1 + 2 ] , 38 ) ) )
df <- c( as.numeric( substring( out[ ind1 + 3 ] , 38 ) ) )
p <- c( as.numeric( substring( out[ ind1 + 4 ] , 38 ) ) )
#Next we will get the fit
ind2 <- grep( “CFI/TLI” , out )
cfi <- c( as.numeric( substring( out[ ind2 + 2 ] , 38 ) ) )
tli<- c(as.numeric( substring( out[ ind2 + 3 ] , 38 ) ) )
ind3 <- grep( “RMSEA” , out )[1]
rmsea <- c( as.numeric( substring( out[ ind3 + 2 ] , 38 ) ) )
ind4 <- grep( “SRMR” , out )[1]
srmr <- c( as.numeric( substring( out[ ind4 + 2 ] , 38 ) ) )
#Next we calculate the latent factor reliabilities. 
#First we grab the factor loadings then the residuals
#Factor Loadings
ind6 <- grep( “STDYX Standardization” , out )
PEST<- c (  as.numeric( substring( out[ ind6 + 6 ] , 23, 28) ),                              
as.numeric( substring( out[ ind6 + 7 ] , 23, 28) ),                             
as.numeric( substring( out[ ind6 + 8 ] , 23, 28) ),                               
as.numeric( substring( out[ ind6 + 9 ] , 23, 28) )
RES<- c(    as.numeric( substring( out[ ind6 + 21 ] , 23, 28) ),
as.numeric( substring( out[ ind6 + 22 ] , 23, 28) ), 
as.numeric( substring( out[ ind6 + 23 ] , 23, 28) ), 
as.numeric( substring( out[ ind6 + 24 ] , 23, 28) )
#Next we calculate the reliability rounded to 2 decimal places.
rel <- round( (sum (PEST)^2)/ ( (sum (PEST)^2) + sum (RES) ), digits = 2)
#We can now wrap the fit into a list and
#move one to calculating the proportional weights
# for the composite scores calculations.
fit<-data.frame(chisq, df, p, cfi, tli, rmsea, srmr, fsd, rel)
#Now specific to the factor scores 
#we want we will get the factor determinates and regression weights.
#Factor score determinates
ind5 <- grep( ”           FACTOR DETERMINACIES”  , out )
fsd <- c(as.numeric( substring( out[ ind5 + 2 ] , 22, 27) ) )
#Factor score regression weights
ind7 <- grep( “SUMMARY OF FACTOR SCORES” , out )
FSRW <- c(  as.numeric( substring( out[ ind7 + 9 ] , 16, 21) ),
as.numeric( substring( out[ ind7 + 9 ] , 30, 35) ),  
as.numeric( substring( out[ ind7 + 9 ] , 44, 49) ),   
as.numeric( substring( out[ ind7 + 9 ] , 58, 63) ),    
as.numeric( substring( out[ ind7 + 9 ] , 72, 77) )
#Now we have the data we just calculate the weights
#Finally we call the output we need.
#First we check the proportional weights add to 1 as required
sum (FSPROP,na.rm=T)
#Next we get the vector of fit values and the latent variable relatibility
#Now all we need is the composite score weights
Posted in Uncategorized | Leave a comment

Corrected Cohen’s D

Calculating Cohen’s D is a pain particularly when you have data which is not independent (e.g. is longitudinal). So I have written a function which take away the hassle. It is in early form but produces a normal Cohen’s D and a corrected Cohen’s D for the paired nature of the data.

CCD<- function (x1, x2) {
dif<-mean(x2,na.rm=TRUE)-mean(x1, ,na.rm=TRUE)
SD<-(sd(x1,na.rm=TRUE) + sd(x2,na.rm=TRUE) )/2
COR<-cor(x1, x2, use="complete.obs")	
CCD<-CD/( (2*(1-COR) ) ^.5)	
list("Cohen's d"=CD, "Corrected Cohen's d"=CCD) 
Posted in Uncategorized | Leave a comment

Data generation in R Part 2 – Covariance matirx

Ok now we have done a simple regression we will move to a more general strategy. In this case we will generate a simple covariance matrix with a mean structure. In this case I am only using this approach to generate 2 continuous variables but it can be extended to include many more. Furthermore this approach can  be used for a number of purposes only limited by the fact I have not figured out how to include categorical variables in this system (let me know if you figure it out please). Note that will will need to install the package MASS. Again R code is in italics.

Generating a covariance structure:

#First install MASS which allows for the creation of covariance structures. 
#Then load said package.
library (MASS)
#Now we need to specify a multivariate normal covariance matrix.
covar<-mvrnorm(100, c(0, 0), matrix(c(1, 0.50, 0.50, 1), 2, 2))
#The first set of numbers = number of cases (here 100). 
#Second set in c() = means (here set to 0). 
#Third set = the covariances (variances here set at 1 and covariances at .5). 
#Fourth set = nature of matrix (here its a 2 by 2). Vary all of these as you like.
#Now lets check the matrix looks right. To check the matrix has the setup you want use the following.
check<-matrix(c(1, 0.50, 0.50, 1), nrow=2,ncol=2,byrow=TRUE)
#Next we wrap it up into a dataframe
#Finally we give the variables names of interest
# (best to let them reflect the sort of thing you do in research commonly) 
#and attach names for use
names(mydata)<-c("x1", "x2")
Posted in Uncategorized | 1 Comment

Data Generation in R

Finding information on how to generate data in R that looks like the data you would use in a research paper is hard. As such I will post a few examples on how to do so here. Many thanks to Dason from Talk Stats for getting me started (R code is in italics).  Lets start with simple regression and I will post more complex examples later.

So to generate a simple regression:

#First create a set of random numbers with a mean of 0 and a SD of 1
#To set different mean and sd use mean = x, sd = x
#For example age<-rnorm(100, mean=5, sd=1.5)
#You will want to simplify the data set to look more real
age<-round(age, digits = 2)
#create a model matrix in which age is the predictor
X = model.matrix(~age)
#To be able to easily set and change the error easily and to not have to worry about change the #code if we decide to alter the number of 'participants' in our randomly generated data
#set the intercept and regression parameter. Intercept at 1 and effect of age on y at 1.3 (can be #whatever you like however)
beta <- c(1,1.3)
#add error keeping in mind the scale of the random variable generated. Vary this to see the effect #of error on significance levels and parameter bias.
error <- rnorm(dim(X)[1],mean = 0, sd = 0.5)
#create outcome variable with parameter estimates plus error
SC <- X%*%beta + error
#round the data to be more realistic
SC <- round(SC, digits = 2)
#package into a data frame
mydata <- data.frame(age,SC)
#See if it works
plot(age, SC)
Posted in Uncategorized | Leave a comment