Displaying K-means Results.

I am a fan of K-means approaches to clustering data, particularly when you have a theoretical reason to expect a certain number of clusters and you have a large data set. However, I think plotting only the cluster means can be misleading, because it hides how much the individual cases vary around those means. Reading through Hadley Wickham's ggplot2 book, I found he suggests the following approach, to which I add a few small changes.

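#First we load the packages used below: ggplot2 for plotting,
#car for recode() and reshape2 for melt().
library(ggplot2)
library(car)
library(reshape2)

#Then we run the kmeans analysis. The first argument is the dataset used
#(in this case I only want variables 1 through 11, hence the [1:11])
#and the second is the number of clusters I want produced (in this case 4).

cl <- kmeans(clustT1WIN[1:11], 4)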
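Because kmeans() starts from randomly chosen centres, two runs on the same data can give different (and differently numbered) clusters. A minimal sketch of fixing the seed and asking for several random starts with nstart, using the built-in iris measurements as a stand-in for the data in this post:

```r
# Stand-in data: the four numeric columns of the built-in iris data set
# (not the data used in this post).
dat <- iris[1:4]

# kmeans() picks random starting centres, so fix the seed for reproducibility
# and try several starts; kmeans() keeps the solution with the lowest
# total within-cluster sum of squares.
set.seed(42)
cl_demo <- kmeans(dat, centers = 3, nstart = 25)

# One cluster number per row, and a matrix of cluster means
# (one row per cluster, one column per variable).
table(cl_demo$cluster)
cl_demo$centers
```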
 
#Next we add an id variable so that each case can be drawn as its own line later. I have called it .row.
 
clustT1WIN$.row <- rownames(clustT1WIN)
 
#At this stage I also make a new variable indicating cluster membership, as below.
#I have a good idea of what my clusters will be called, so
#I gave them those names in the second line of the code.
#Then I combine the labels with the data and melt it into long format, which is the form ggplot2 needs for graphing.
 
cluster<-cl$cluster
 
cl.cluster<-as.vector(recode (cluster, "1='FC'; 2='FV'; 3='SO'; 4= 'OS' ", 
as.numeric.result=FALSE) )
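If you would rather not load car just for recode(), indexing a character vector by the cluster number does the same job in base R. A minimal sketch with hypothetical cluster assignments; the label order must match the numbers kmeans() happened to assign, which can change between runs:

```r
# Hypothetical cluster assignments, standing in for cl$cluster.
cluster <- c(1, 3, 2, 4, 4, 1)

# Position k of this vector holds the label for cluster k.
cluster_names <- c("FC", "FV", "SO", "OS")
cl.cluster <- cluster_names[cluster]

cl.cluster
```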
 
#Columns 1 to 12 of clustT1WIN are the 11 clustering variables plus .row.
clustT1WIN2<- data.frame (clustT1WIN [1:12], cl.cluster) 
molten2 <- melt(clustT1WIN2, id = c(".row", "cl.cluster") )
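melt() turns the wide data (one row per case) into long data with one row per case and variable, which lets ggplot2 put the variables on the x axis and draw one line per .row. A base R sketch of the same reshaping on a small made-up data frame (the columns v1 and v2 are hypothetical):

```r
# Small made-up wide data: two cases, two variables, an id and a cluster label.
wide <- data.frame(v1 = c(1.0, 2.0), v2 = c(3.0, 4.0),
                   .row = c("1", "2"), cl.cluster = c("FC", "SO"),
                   stringsAsFactors = FALSE)

measure_vars <- c("v1", "v2")
n <- nrow(wide)

# Stack the measured columns: the id columns are repeated once per variable,
# and 'variable' records which column each value came from.
long <- data.frame(
  .row       = rep(wide$.row, times = length(measure_vars)),
  cl.cluster = rep(wide$cl.cluster, times = length(measure_vars)),
  variable   = factor(rep(measure_vars, each = n), levels = measure_vars),
  value      = unlist(wide[measure_vars], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```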
 
#OK, now set up the graph background.
#Following the ggplot book, I also create a jit parameter, because it is
#much easier to alter this and type it in than to write out the full code over and over again.
 
pcp_cl <- ggplot(molten2,   aes(variable, value, group = .row, colour = cl.cluster) )          
jit<- position_jitter (width = .08, height = .08)
 
#OK, first graph the cluster means.
 
#(Current ggplot2 uses fun = mean; older versions used fun.y = mean.)
pcp_cl + stat_summary(aes(group = cl.cluster), fun = mean, geom = "line")
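The lines drawn by stat_summary() with mean are simply the mean of value within each cluster and variable. A sketch of computing the same numbers directly with aggregate(), on a made-up long data frame shaped like molten2:

```r
# Made-up long data with the same columns as molten2.
molten_demo <- data.frame(
  .row       = c("1", "2", "3", "1", "2", "3"),
  cl.cluster = c("FC", "FC", "SO", "FC", "FC", "SO"),
  variable   = rep(c("v1", "v2"), each = 3),
  value      = c(1, 3, 10, 2, 4, 20),
  stringsAsFactors = FALSE
)

# Mean of 'value' for every cluster/variable combination:
# these are the points the cluster-mean lines connect.
means <- aggregate(value ~ cl.cluster + variable, data = molten_demo, FUN = mean)
means
```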
 
#Then we produce a colourful but uninformative parallel coordinates 
#plot with a bit of alpha blending and jitter.
 
pcp_cl + geom_line(position = jit, alpha = 1/5)
 
#All code up to this point is as per Wickham, but
#I also overlay the cluster means graph that we first produced, draw the individual
#lines in faint black, give each cluster its own panel and change the angle of the x axis text so it is readable.
 
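#Each + goes at the end of a line; otherwise R runs the first line as a complete command and the next line fails.
#opts() and theme_text() have been replaced by theme() and element_text(), fun.y by fun, and size by linewidth for lines.
pcp_cl + geom_line(position = jit, colour = alpha("black", 1/4)) +
  stat_summary(aes(group = cl.cluster), fun = mean, geom = "line", linewidth = 1.5) +
  facet_wrap(~ cl.cluster) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))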
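To try the whole workflow without the original data, here is a self-contained sketch on the built-in iris measurements with three clusters. The labels C1, C2 and C3 are placeholders, the relabelling and reshaping use base R instead of car and reshape2, and the plotting syntax (fun, linewidth, theme()) assumes ggplot2 3.4.0 or later:

```r
library(ggplot2)

# Stand-in data: the four numeric iris measurements, rescaled so the
# variables share a common scale on the y axis.
dat <- as.data.frame(scale(iris[1:4]))

set.seed(1)
cl <- kmeans(dat, centers = 3, nstart = 25)

# Id variable and placeholder cluster labels.
dat$.row <- rownames(dat)
dat$cl.cluster <- c("C1", "C2", "C3")[cl$cluster]

# Long format: one row per case and variable (base R equivalent of melt()).
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
molten <- data.frame(
  .row       = rep(dat$.row, times = length(vars)),
  cl.cluster = rep(dat$cl.cluster, times = length(vars)),
  variable   = factor(rep(vars, each = nrow(dat)), levels = vars),
  value      = unlist(dat[vars], use.names = FALSE)
)

jit <- position_jitter(width = .08, height = .08)

# Individual cases as faint black lines, cluster means as thick coloured
# lines, one panel per cluster.
p <- ggplot(molten, aes(variable, value, group = .row, colour = cl.cluster)) +
  geom_line(position = jit, colour = alpha("black", 1/4)) +
  stat_summary(aes(group = cl.cluster), fun = mean, geom = "line", linewidth = 1.5) +
  facet_wrap(~ cl.cluster) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))

print(p)
```

Rescaling with scale() is a choice made for this sketch so that no single variable dominates the y axis; drop it if your variables are already on a common scale.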

Relatively simple but visually very informative. Here is the final result:

[Figure: The final product]


About Philip Parker

I am a post doc in developmental and educational psychology at a German university. I did my PhD at the University of Sydney in stress and well-being. Most days I am hunched over a computer yelling at statistical software or responding to journal editors who always seem to want twice the amount of content but with half the words. For fun I like to read up on the latest developments in R and programming various functions.
