
R for beginners and intermediate users 4: object oriented programming

The topic of this post was mentioned in a tangential rant in my previous post, and I thought I might as well expand on it a bit. I'm not going to talk about programming language models or anything like that since I'm not a programmer; rather, I will treat this as a tutorial or "pro-tip" kind of post. I will be focusing on an aspect of R that is often taken for granted and may not be well known to entry-level users: R is an object-oriented programming language. If you already know this, then this blog post is not for you. First, I'll list a few interesting/useful features of R:

1. R is interpreted
2. R is based on vectors
3. R can utilise functions (i.e. functional programming)
4. R utilises objects (object-oriented programming)

As I've already mentioned, this post will focus on point 4, that R is an object-oriented programming language (or simply that R can be object-oriented if you don't want to call R a program...
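As a minimal sketch of point 4, assuming base R only: everyday objects such as data frames and model fits already carry a class attribute, and generic functions like print() and summary() use that class to choose which method to run (this is R's S3 system). The "taxon" class and the new_taxon() and describe() functions below are hypothetical names made up for illustration; class(), structure(), inherits() and UseMethod() are standard base R.

```r
# Objects you already use carry a class that generic functions dispatch on.
fit <- lm(dist ~ speed, data = cars)  # 'cars' is a dataset bundled with R
class(cars)  # "data.frame"
class(fit)   # "lm", so summary(fit) actually calls summary.lm()

# Hypothetical S3 class: an S3 object is just a list with a class attribute.
new_taxon <- function(name, clade) {
  stopifnot(is.character(name), is.character(clade))
  structure(list(name = name, clade = clade), class = "taxon")
}

# A method for the existing print() generic, found via the name "print.taxon".
print.taxon <- function(x, ...) {
  cat(sprintf("Taxon: %s (%s)\n", x$name, x$clade))
  invisible(x)
}

# A new (hypothetical) generic with a method for "taxon" and a default fallback.
describe <- function(x, ...) UseMethod("describe")
describe.taxon <- function(x, ...) paste(x$name, "belongs to", x$clade)
describe.default <- function(x, ...) paste("An object of class", class(x)[1])

leo_taxon <- new_taxon("Panthera_leo", "Felidae")
print(leo_taxon)     # dispatches to print.taxon()
describe(leo_taxon)  # "Panthera_leo belongs to Felidae"
describe(1:3)        # "An object of class integer"
```

The same mechanism is why calling summary() or plot() on different kinds of objects gives different, appropriate output without you having to name the specific method.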

Reproducibility of science and open source

I'm all for open access. I'm all for open source. I'm all for reproducible science. I'm all for replicable studies. So I like that data are shared. I like that protocols are shared. I also really appreciate it when code is shared, but only when it is appropriate. One appropriate time to share code, for instance, is when an entirely new method is introduced; then I think it is important to release the code as source code, a package, a program etc. so that other scientists can reproduce your work or use it in their own analyses. However, I've noticed that oftentimes shared code/scripts are nothing more than the authors' workflows, in which case I don't want to see them. Everyone has a different workflow and I don't want to have to get into the heads of other people to figure out exactly what I'm looking at and what the code is doing, because commonly associated with shared workflows are uncommented...

R for beginners and intermediate users 3: plotting with colours

For my third post in my R tutorials for beginners and intermediate users, I shall finally touch on the subject that prompted me to start these tutorials: plotting group structures in colour. If you are familiar with R, you may have noticed that assigning colours by group structure is not all that straightforward. Your dataset may have a column specifically for group structure, such as this:

                            B0         B1         B2  Family
    Acrocanthosaurus     0.308   -0.00329   3.28E-05  Allosauroidea
    Allosaurus           0.302   -0.00285   2.04E-05  Allosauroidea
    Archaeopteryx        0.142  -0.000871   2.98E-06  Aves
    Bambiraptor          0.182   -0.00161   1.10E-05  Dromaeosauridae
    Baryonychid          0.189   -0.00238   2.20E-05  Basal_Tetanurae
    Carcharodontosaurus  0.369   -0.00502   5.82E-05  Allosauroidea
    Carnotaurus          0.312   -0.00324   2.94E-05  Neoceratosauria
    Ceratosaurus         0.377   -0.00522   6.07E-05  Neoceratosauria
    Citipati             0.278   -0.00119   5.08E-06  Ovir...
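A minimal sketch of one common base-R approach, assuming the table above is read in as shown (the truncated Citipati row is left out): turn the group column into a factor and use its integer codes to index a vector of colours, one colour per group. The object names dino, fam_cols and pt_cols, the particular colours, and the choice to plot B1 against B0 are arbitrary illustrations, not the post's own code.

```r
# Read the rows shown above. Because the header has one fewer field than the
# data rows, read.table() uses the first column (taxon names) as row names.
dino <- read.table(text = "
B0 B1 B2 Family
Acrocanthosaurus 0.308 -0.00329 3.28E-05 Allosauroidea
Allosaurus 0.302 -0.00285 2.04E-05 Allosauroidea
Archaeopteryx 0.142 -0.000871 2.98E-06 Aves
Bambiraptor 0.182 -0.00161 1.10E-05 Dromaeosauridae
Baryonychid 0.189 -0.00238 2.20E-05 Basal_Tetanurae
Carcharodontosaurus 0.369 -0.00502 5.82E-05 Allosauroidea
Carnotaurus 0.312 -0.00324 2.94E-05 Neoceratosauria
Ceratosaurus 0.377 -0.00522 6.07E-05 Neoceratosauria
", header = TRUE, stringsAsFactors = TRUE)

# One colour per family; factor levels are sorted alphabetically by default.
fam_cols <- c("red", "blue", "darkgreen", "orange", "purple")
names(fam_cols) <- levels(dino$Family)

# as.integer() on a factor gives each row's level number, which indexes fam_cols.
pt_cols <- fam_cols[as.integer(dino$Family)]

plot(dino$B0, dino$B1, col = pt_cols, pch = 19, xlab = "B0", ylab = "B1")
legend("topright", legend = levels(dino$Family), col = fam_cols, pch = 19)
```

Keeping fam_cols in level order means the same vector can be passed straight to legend(), so the legend colours always match the points.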

R for beginners and intermediate users 2: extracting subsets of data

For my second post on R, I will address how to extract subsets of data based on a selection criterion such as taxon names. For instance, I have a huge dataset of morphometric variables for at least 36 species of cats (living and fossil). Sometimes I'd like to run some stats on a subset of this dataset, like all the living cats or just the Panthera lineage species (Panthera and Neofelis). Until recently, I had been doing most of my dataset manipulation in Excel, filtering out certain taxa from the spreadsheet and copy-pasting them into a text file, which I then read into R. However, you can select subsets of data directly in R based on taxon names. In my dataset, which I call cat, I have a column labelled Taxa that contains all my taxon names, so typing cat$Taxa calls up my taxon names. Let's say I want to extract from my dataset cat just the data for the lion Panthera leo. The associated taxon name in cat$Taxa would be Panthera_leo. So to extract that ...
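A minimal sketch of the idea, assuming a data frame called cat with a Taxa column as described above. The rows, the Status column and the placeholder Var1 values are hypothetical stand-ins, not the real morphometric dataset. Each approach builds a logical vector over the rows and uses it inside the square brackets; subset() is a base-R shorthand for the same thing.

```r
# Hypothetical stand-in for the real dataset; Var1 holds placeholder numbers.
cat <- data.frame(
  Taxa   = c("Panthera_leo", "Panthera_tigris", "Panthera_pardus",
             "Neofelis_nebulosa", "Felis_catus", "Smilodon_fatalis"),
  Status = c("living", "living", "living", "living", "living", "fossil"),
  Var1   = 1:6
)

# Rows for a single taxon: compare the Taxa column against one name.
lion <- cat[cat$Taxa == "Panthera_leo", ]

# Rows for several named taxa at once with %in%.
pantherines <- cat[cat$Taxa %in% c("Panthera_leo", "Panthera_tigris",
                                   "Panthera_pardus", "Neofelis_nebulosa"), ]

# The same subset via a regular expression on the genus part of the name.
pantherines2 <- cat[grepl("^(Panthera|Neofelis)_", cat$Taxa), ]

# subset() evaluates the condition using the data frame's own columns.
living <- subset(cat, Status == "living")
```

Note the trailing comma inside the brackets: the condition selects rows, and leaving the column slot empty keeps all columns.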

R for beginners and intermediate users: reading and manipulating data

I had been preparing a comprehensive tutorial on how to plot in R (The R Project) with different groups differentiated by colour, but Blogger stupidly erased my post and decided to automatically save my empty draft at that precise moment. Since I cannot reproduce the original post, I decided to break it up into a series of smaller topics. There are plenty of R resources available in various places, but I found that they frequently fall into one of two extremes: either too basic or too advanced. I think of myself as an intermediate user (i.e., I can comfortably handle canned packages but want a bit more control than the default settings allow), so the information I find is not very helpful. So I thought it would benefit others like me if I summed up some of the simple things I learned over the last year or two. As the first of these posts, I will deal with reading in and manipulating data. These may be very simple and basic, but some of the things I wanted to do req...
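A minimal sketch of reading in and doing some basic manipulation, assuming a comma-separated text file with a header row. To keep it self-contained, the file is written to a temporary location first; in practice you would pass your own file path to read.csv() (or to read.table() with header = TRUE and the appropriate sep). The column names and values are hypothetical placeholders.

```r
# Write a small hypothetical CSV file so the example runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("Taxa,Group,Var1",
             "Panthera_leo,living,10",
             "Felis_catus,living,100",
             "Smilodon_fatalis,fossil,1000"), path)

# read.csv() is read.table() with header = TRUE and sep = "," preset.
dat <- read.csv(path)

str(dat)    # column names and types
dim(dat)    # number of rows and columns
names(dat)  # column names only

dat$logVar1 <- log10(dat$Var1)    # add a derived column
dat$Group   <- factor(dat$Group)  # treat the grouping column as categorical
dat <- dat[order(dat$Taxa), ]     # sort rows alphabetically by taxon
dat$Var1 <- NULL                  # drop a column you no longer need
```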