Ok, now that we have done a simple regression, we will move to a more general strategy. This time we will generate data from a specified covariance matrix and mean structure. Here I only use this approach to generate 2 continuous variables, but it can be extended to include many more. The approach can serve a number of purposes. Its main limitation is that I have not figured out how to include categorical variables in this system (please let me know if you figure it out). Note that we will need the package MASS. As before, the R code follows, with comments marked by #.
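As a sketch of the extension to more variables, the block below simulates three continuous variables. The means, correlations and the names `sigma3`, `mu3` and `sim3` are illustrative assumptions. The block also shows one commonly used workaround for categorical variables, which is not the author's method: simulate a continuous variable and cut it at thresholds (here the tertiles of its population distribution) to form an ordered grouping. Cutting a variable this way weakens its correlations with the other variables, so the associations for the categorical version will be smaller than those written into the matrix.

```r
# Sketch: three continuous variables plus one ASSUMED workaround for a
# categorical variable (cutting a simulated continuous variable into groups).
library(MASS)

set.seed(2024)  # makes the simulated draws reproducible

# Hypothetical target correlations among three variables. The variances
# are 1, so this matrix is also the covariance matrix.
sigma3 <- matrix(c(1.0, 0.5, 0.3,
                   0.5, 1.0, 0.2,
                   0.3, 0.2, 1.0),
                 nrow = 3, ncol = 3, byrow = TRUE)

# Hypothetical means for the three variables.
mu3 <- c(0, 10, 50)

# Draw 500 cases and name the columns.
sim3 <- data.frame(mvrnorm(n = 500, mu = mu3, Sigma = sigma3))
names(sim3) <- c("x1", "x2", "x3")

# Assumed workaround: split x3 at the tertiles of its population
# distribution into a three-level factor ("low", "mid", "high").
cuts <- qnorm(c(1/3, 2/3), mean = mu3[3], sd = sqrt(sigma3[3, 3]))
sim3$group <- cut(sim3$x3,
                  breaks = c(-Inf, cuts, Inf),
                  labels = c("low", "mid", "high"))

str(sim3)
table(sim3$group)
```

To get a binary variable instead, the same idea works with a single threshold, for example `breaks = c(-Inf, mu3[3], Inf)` with two labels.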

**Generating a covariance structure:**
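The code that follows draws each case from a bivariate normal distribution. Written out, the model being simulated is shown below. The identity on the right, which relates each covariance to a correlation and two standard deviations, is standard background rather than something stated in the original post.

$$
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right),
\qquad
\Sigma_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j
$$

Because both variances are 1 ($\sigma_1 = \sigma_2 = 1$), the covariance of 0.5 is also the correlation between $x_1$ and $x_2$. To keep a correlation of 0.5 while giving the variables standard deviations of, say, 2 and 3, the diagonal entries would become $2^2 = 4$ and $3^2 = 9$, and the off-diagonal entries $0.5 \times 2 \times 3 = 3$.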

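```r
# MASS provides mvrnorm() for drawing multivariate normal data.
# MASS ships with standard R installations, so installing it is usually
# only needed if it is missing:
# install.packages("MASS")
# Then load said package.
library(MASS)
# Now we draw cases from a multivariate normal distribution with a chosen
# mean structure and covariance matrix.
covar <- mvrnorm(100, c(0, 0), matrix(c(1, 0.50, 0.50, 1), 2, 2))
# First argument = number of cases (here 100).
# Second argument, the c() = means (here both set to 0).
# Third argument = the covariance matrix, built with matrix():
#   its c() holds the entries (variances of 1 on the diagonal and
#   covariances of .5 off the diagonal), and 2, 2 gives its dimensions
#   (here a 2 by 2 matrix). Vary all of these as you like.
# Now let's check the setup looks right. Printing the target matrix
# shows exactly what was passed to mvrnorm() ...
check <- matrix(c(1, 0.50, 0.50, 1), nrow = 2, ncol = 2, byrow = TRUE)
check
# ... and cov() shows the sample covariance of the simulated data,
# which should be close to (but not exactly equal to) the target.
cov(covar)
# Next we wrap it up into a data frame.
mydata <- data.frame(covar)
# Finally we give the variables names of interest
# (best to let them reflect the sort of thing you commonly study)
# and attach the data frame so the variables can be used by name.
names(mydata) <- c("x1", "x2")
attach(mydata)
```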
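One way to check the simulated data more closely is to compare the target matrix with the sample moments, and then use the data in a regression as in the earlier post. The sketch below does this with the `empirical` argument of `MASS::mvrnorm()`: when `empirical = TRUE`, the draws are rescaled so the sample means and sample covariance match `mu` and `Sigma` exactly. The seed value and the names `sigma`, `exact_data` and `fit` are arbitrary choices for illustration.

```r
# Sketch: comparing simulated data with the target covariance matrix and
# fitting a simple regression. Uses only MASS and base R.
library(MASS)

set.seed(1)  # reproducible draws; the value is arbitrary
sigma <- matrix(c(1, 0.50, 0.50, 1), nrow = 2, ncol = 2, byrow = TRUE)

# Default behaviour: the sample covariance varies around the target.
covar <- mvrnorm(100, mu = c(0, 0), Sigma = sigma)
round(cov(covar), 2)

# empirical = TRUE makes the sample means and covariance equal the
# requested mu and Sigma (up to floating-point error).
covar_exact <- mvrnorm(100, mu = c(0, 0), Sigma = sigma, empirical = TRUE)
exact_data <- data.frame(covar_exact)
names(exact_data) <- c("x1", "x2")
round(cov(exact_data), 2)
colMeans(exact_data)

# With unit variances, the slope of x2 on x1 is cov(x1, x2) / var(x1),
# so here it equals 0.5 and the intercept is 0.
fit <- lm(x2 ~ x1, data = exact_data)
coef(fit)
```

Passing `data = exact_data` to `lm()` is an alternative to `attach()`, which avoids confusion when several attached objects share variable names.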


## About Philip Parker

I am a postdoc in developmental and educational psychology at a German university. I did my PhD at the University of Sydney on stress and well-being. Most days I am hunched over a computer yelling at statistical software or responding to journal editors who always seem to want twice the content in half the words. For fun I like to read up on the latest developments in R and program various functions.

Is this method usable to generate data to test unsupervised feature selection algorithms?