I am a fan of K-means approaches to clustering data, particularly when you have a theoretical reason to expect a certain number of clusters and you have a large data set. However, I think plotting the cluster means on their own can be misleading. Reading through Hadley Wickham’s ggplot2 book, I found that he suggests the following, to which I add a few small changes.
# Load the packages used below: car for recode(), reshape2 for melt(), ggplot2 for the plots.
library(car)
library(reshape2)
library(ggplot2)

# First we run the kmeans analysis. In brackets is the dataset used
# (in this case I only want variables 1 through 11, hence the [1:11])
# and the number of clusters I want produced (in this case 4).
cl <- kmeans(clustT1WIN[1:11], 4)

# We will need to add an id variable for later use. In this case I have called it .row.
clustT1WIN$.row <- rownames(clustT1WIN)

# At this stage I also make a new variable indicating cluster membership.
# I have a good idea of what my clusters will be called, so
# I give them those names in the second line of the code.
# Then I put it together and put the data in a form that is good for graphing.
cluster <- cl$cluster
cl.cluster <- as.vector(recode(cluster, "1='FC'; 2='FV'; 3='SO'; 4='OS'", as.numeric = FALSE))
clustT1WIN2 <- data.frame(clustT1WIN[1:12], cl.cluster)
molten2 <- melt(clustT1WIN2, id.vars = c(".row", "cl.cluster"))

# OK, set up the graph background.
# Following the ggplot2 book I also create a jit parameter, because it is
# much easier to alter this and type it in than the full code over and over again.
pcp_cl <- ggplot(molten2, aes(variable, value, group = .row, colour = cl.cluster))
jit <- position_jitter(width = .08, height = .08)

# OK, first graph the cluster means.
pcp_cl + stat_summary(aes(group = cl.cluster), fun = mean, geom = "line")

# Then we produce a colourful but uninformative parallel coordinates
# plot with a bit of alpha blending and jitter.
pcp_cl + geom_line(position = jit, alpha = 1/5)

# All code up to this point is as per Wickham, but
# I also add the cluster means graph that we first produced,
# and change the angle of the x axis text so it is readable.
pcp_cl +
  geom_line(position = jit, colour = alpha("black", 1/4)) +
  stat_summary(aes(group = cl.cluster), fun = mean, geom = "line", linewidth = 1.5) +
  facet_wrap(~ cl.cluster) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))
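One thing to watch when naming clusters by number as I do in the recode step: kmeans starts from randomly chosen centres, so the group that comes out as cluster 1 in one run may come out as cluster 3 in the next. Below is a minimal sketch of one way around this. It uses the built-in iris data rather than my data set, and the labels and the choice of Petal.Length as the ordering variable are made up purely for illustration. The idea is to tie the names to the cluster centres instead of the raw cluster numbers.

```r
# Sketch only: iris stands in for the real data, and the labels are hypothetical.
set.seed(42)                                   # make the random starts repeatable
cl <- kmeans(iris[1:4], centers = 3, nstart = 25)

# Look at the centres and cluster sizes before deciding on names.
print(round(cl$centers, 2))
print(cl$size)

# Order the cluster numbers by their centre on one variable,
# then attach the names in that order rather than by raw number.
ord <- order(cl$centers[, "Petal.Length"])     # cluster numbers, smallest centre first
labels <- c("short", "medium", "long")         # hypothetical cluster names
name_of <- setNames(labels, ord)               # lookup: cluster number -> name
cl.cluster <- unname(name_of[as.character(cl$cluster)])

print(table(cl.cluster))
```

The resulting cl.cluster vector can be used in the same way as the recoded one in my code, and the names stay attached to the same kind of cluster even if the numbering changes between runs.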
Relatively simple but visually very informative. Here is the final result: