
Principal coordinate analysis and the quest for a solution to a non-existent problem

I had an interesting experience yesterday - I spent a good few hours on a silly problem. You don't need to know the technical details of the analyses at all, but I'm sure you'll appreciate the humour in this.

I have been running a lot of principal coordinate (PCo) analyses recently. This is because I am using an interesting application of multiple regression and PCoA to phylogeny vs phenotypic variables called phylogenetic eigenvector regression (PVR; Diniz-Filho et al., 1998; Desdevises et al., 2003). In short, you take a phylogenetic tree of a given group of animals (or plants, or whatever your favourite group of organisms is), reduce the complex topology into manageable columns of numbers (by PCoA), and then regress some phenotypic/ecological variable of your choice on these columns using multiple regression to test for any correlations. Sounds pretty easy, and it is, in practice at least. You can script this in R very efficiently, if you know the R language already.
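For anyone who wants to see the shape of that workflow, here is a minimal sketch of the three steps. It is not the exact protocol from the papers cited above: the tree and the trait are simulated placeholders, the choice of three axes is arbitrary, and I am assuming the ape package is installed and using base R's cmdscale() for the PCoA.

```r
# Minimal PVR sketch. The tree and trait are simulated placeholders, not real data.
library(ape)   # assumed installed; provides rtree() and cophenetic() for "phylo" trees

set.seed(1)
tree  <- rtree(20)                   # hypothetical random tree with 20 tips
phy_d <- as.dist(cophenetic(tree))   # pairwise branch-length distances between tips

# Step 1: PCoA reduces the tree to a few columns of numbers
pco  <- cmdscale(phy_d, k = 3, eig = TRUE)   # k = 3 is an arbitrary choice here
vecs <- pco$points                   # one row per tip, one column per PCo axis

# Step 2: a hypothetical phenotypic variable, in the same order as the rows of vecs
trait <- rnorm(nrow(vecs))
names(trait) <- rownames(vecs)

# Step 3: multiple regression of the trait on the PCo axes
fit <- lm(trait ~ vecs)
r2  <- summary(fit)$r.squared        # proportion of trait variance explained by the axes
```

With a random trait like this one, r2 should not mean much; the point is only the order of operations: tree, distance matrix, PCoA, regression.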

Anyway, yesterday I reread the protocol that I had been following for the last few weeks and realised that PCoA on phylogenetic trees can sometimes result in negative eigenvalues - which is kind of annoying if you think of eigenvalues as representing the "amount of variation in the data explained by that axis" (Hammer and Harper 2006); does a negative value indicate a negative contribution??? Supposedly, this happens because phylogenetic distances are not necessarily Euclidean. So I had a look at my PCoA results and realised to my horror that a lot of my values were negative. Holy shit! Do I have to go back and reanalyse?

But first, I did the sensible thing and checked whether my distances were Euclidean (using is.euclid() from the ade4 package in R). Surprisingly, or unsurprisingly, my distances were Euclidean. Strange. But the values are negative...
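If you are curious what such a check involves, the idea can be sketched in base R: square the distances, double-centre them, and see whether any eigenvalue of the result is clearly negative. The helper name is_euclidean() and its tolerance are my own inventions for illustration; this is not the ade4 function itself, just the same idea as I understand it.

```r
# Hypothetical helper: TRUE if the distances can be embedded in Euclidean space.
is_euclidean <- function(d, tol = 1e-7) {
  D2 <- as.matrix(d)^2                        # squared distances
  n  <- nrow(D2)
  J  <- diag(n) - matrix(1 / n, n, n)         # centring matrix
  G  <- -0.5 * J %*% D2 %*% J                 # double-centred (Gower) matrix
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
  min(ev) > -tol * max(abs(ev))               # allow for tiny numerical noise
}

# Distances between real points in space are Euclidean by construction
set.seed(1)
ok <- is_euclidean(dist(matrix(rnorm(30), nrow = 10)))

# A made-up dissimilarity that breaks the triangle inequality (1 + 1 < 3) is not
bad_d <- as.dist(matrix(c(0, 1, 3,
                          1, 0, 1,
                          3, 1, 0), nrow = 3, byrow = TRUE))
not_ok <- is_euclidean(bad_d)
```

Here ok comes out TRUE and not_ok comes out FALSE.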

I sat there scratching my head for a while.

I read further and noted that in cases where you get negative eigenvalues, you may need to transform your original distance matrix following some standard procedures.

I searched for the relevant references and there were several suggested transformation procedures. All of them seemed pretty straightforward. So I tried all of them in turn.

None of them worked. The negative values are still there...

I'm really stuck now. I don't know what the cause of this problem is. Is there something inherently wrong with my data? Is there some other transformation that I could still use? Is there another command in R that could potentially solve this problem (it's really common in R to miss a basic command)? Or is there something fundamentally screwed up with the PCoA command in R, and I've discovered some serious programming failure?
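To give an idea of what such a transformation looks like, here is one correction that is built into base R: the add = TRUE argument of cmdscale(), which adds a constant to the off-diagonal dissimilarities so that they become Euclidean. This is only an illustration on a made-up matrix, and not necessarily one of the procedures referred to above.

```r
# Made-up dissimilarities that are NOT Euclidean: A-B = 1, B-C = 1, but A-C = 3
m <- matrix(c(0, 1, 3, 2,
              1, 0, 1, 2,
              3, 1, 0, 2,
              2, 2, 2, 0), nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

# Plain PCoA: at least one eigenvalue is negative
raw <- cmdscale(d, k = 2, eig = TRUE)

# PCoA with the additive-constant correction switched on
fixed <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)

raw$eig     # contains a negative value
fixed$eig   # no clearly negative values (zeros may show up as tiny +/- numbers)
fixed$ac    # the constant that was added to the dissimilarities
```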

But at this point, it's time for my coffee break. I went out for coffee with my girlfriend and complained to her about it, of course with no solution other than stress relief (for which I am of course extremely grateful). I went back to my office and sat down in front of my computer again for more head scratching - by this time, it's more like head-banging-on-desk.

But then, as I was reviewing the R commands for PCoA, it all hit me. How could I be so stupid?

The PCoA command in R can return two things called "points" and "eig", the former being the coordinates of each specimen along each PCo axis within the multidimensional space, and the latter being the eigenvalues associated with each axis. And only "points" are returned by default. I had been looking at the "points" all this time. Of course the points are going to include negative values, because the whole ordination is done so that the points are scattered around the origin.

I turned the "eig" option on, and R returned the eigenvalues; all positive.
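Here is a small reproduction of the mix-up on made-up data. I am assuming the command in question behaves like base R's cmdscale(), which matches the description: it returns only the coordinates unless you ask for eig = TRUE.

```r
set.seed(42)
d <- dist(matrix(rnorm(40), nrow = 10))   # made-up data: 10 specimens, 4 variables

# Default call (eig = FALSE): a plain matrix of coordinates, nothing else
pts <- cmdscale(d, k = 2)
any(pts < 0)            # TRUE: coordinates are centred on the origin
colMeans(pts)           # both essentially zero

# With eig = TRUE: a list holding the coordinates AND the eigenvalues
res <- cmdscale(d, k = 2, eig = TRUE)
names(res)              # includes "points" and "eig"
res$eig                 # these are the numbers to check for negative values
```

So negative numbers in the default output say nothing about the eigenvalues; they are just specimens sitting on the other side of the origin.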

I never thought I could be extremely happy with myself at the same time as being incredibly furious for making such a stupid mistake.

The moral of this story is: you learn from your mistakes.

Comments

Malacoda said…
I never make these sort of mistaks
es...
Sudharsan said…
I'm a newbie to principal coordinate analysis. In fact, my current assignment is to write about principal coordinate analysis and principal component analysis, but the problem is that I can't find a simple enough reference to get an idea of these methods. Can you please suggest any useful explanation? Thank you.
Raptor's Nest said…
Hi Sudharsan,

I've found it difficult to find a single good reference on principal coordinates and principal components analyses, so I had to read multiple sources, mainly multivariate statistics textbooks. But there is a pretty good essay by Norman MacLeod (Natural History Museum, London) that explains PCA and other related methods in relatively simple terms:

http://www.palass.org/modules.php?name=palaeo_math&page=3

The great thing about this essay series is that it starts with correlations and regressions and extends the line of thought to PCA.
Matt said…
Holy crap! I found this blog post with the EXACT same problem in R. You just saved me some time.
Pauly said…
I'm also having a problem very close to this one, except that I'm actually getting a non-Euclidean distance matrix out of UniFrac (strictly speaking, NOT a positive semi-definite matrix). And I'm also wondering what the hell to do with those negative eigenvalues. Does anyone know what they ultimately mean?
