Heatmaps using ranks in R and ggplot2

The following fragments of R code illustrate one way of showing quantitative data through a custom heatmap. For this example, I will be using a table of results containing growth changes observed under various conditions for yeast gene deletion mutants (from publicly available data sources). The complete table, “selected_screens_top50_etal20150410raw.csv”, can be downloaded from GitHub.

1. Prepare a dataframe from tabular data:

thedata <- read.csv("selected_screens_top50_etal20150410raw.csv", stringsAsFactors=FALSE)
names(thedata)
# [1] "X" "ORF" "Long_description" "haploids_SR7575_exp1" "haploids_SR7575_exp2"
# [6] "avghaplo" "rank1" "rank2" "Gene" "SGTC_352"
# [11] "SGTC_513" "SGTC_1789" "CDC48_YDL126C_tsq209" "CDC48_YDL126C_tsq208" "RPN4_YDL020C"
# [16] "INO2_YDR123C" "SRP101_YDR292C_damp" "HAC1_YFL031W" "OPI3_YJR073C" "PGA3_YML125C_damp"
# [21] "UBC7_YMR022W" "YNL181W_YNL181W_damp" "X4166_8_HOP_0148"

2. Define the function that will categorize the data nicely:

This function takes a vector of numerical values and returns a category for each value. The categories are customized through breaks and their associated labels. The important library here is scales, whose rescale function maps the ranks onto the 0 to 1 interval.

library(scales) #required for the rescale function
rank_pc <- function(vect, brks, labls){
    rks <- rank(vect, na.last="keep")
    #example: brks = c(-1, 0.1, 0.3), labls = c("lo", "mid", "other")
    #(Inf is appended to the breaks below, so it is not part of brks)
    endclass <- cut(rescale(rks), breaks = c(brks, Inf), labels=labls)
    #rescale(rks) just changes the ranks from 1:4000 to the 0:1 interval;
    #it thus becomes very easy to pick, with the "cut" function,
    #the intervals we want
    return(endclass)
}
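
To see what such a function returns, here is a minimal sketch on a toy vector. It uses base R only: rescale01 is a hypothetical stand-in for the rescale function of scales (a linear mapping onto the 0 to 1 interval), rank_pc_demo is a hypothetical copy of the function above, and the scores, breaks and labels are invented for illustration.

```r
# Hypothetical stand-in for scales::rescale: maps values linearly onto [0, 1]
rescale01 <- function(x) {
    rng <- range(x, na.rm = TRUE)
    (x - rng[1]) / (rng[2] - rng[1])
}

# Same logic as rank_pc, written with base R only
rank_pc_demo <- function(vect, brks, labls) {
    rks <- rank(vect, na.last = "keep")   # NA values stay NA
    cut(rescale01(rks), breaks = c(brks, Inf), labels = labls)
}

# Toy growth scores (invented); the most negative value gets rank 1
scores <- c(-2.5, 0.3, NA, -0.1, 1.2, -1.0)
cats <- rank_pc_demo(scores, c(-1, 0.1, 0.5), c("lo", "mid", "other"))
print(data.frame(scores, cats))
```

With these breaks, rescaled ranks up to 0.1 fall in "lo", those up to 0.5 in "mid", and the rest in "other". Missing values stay missing instead of being forced into a category.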

3. Massage the data to get a reduced set just for the plot:

cut_data <- lapply(thedata[,c(6, 10:13)], rank_pc, c(-1, 0.01, 0.025, 0.05), c("01", "025", "05", "other"))
cut_df <- data.frame(cut_data)
cut_df$ORF <- thedata$ORF
cut_df$Gene <- thedata$Gene
#rearrange columns
cut_df <- cut_df[, c(6, 7, 1:5)]
summary(cut_df)
# ORF               Gene              avghaplo      SGTC_352      SGTC_513      SGTC_1789     CDC48_YDL126C_tsq209
# Length:4909       Length:4909       01   :  50    01   :  41    01   :  41    01   :  41    01   :  30
# Class :character  Class :character  025  :  73    025  :  60    025  :  60    025  :  60    025  :  45
# Mode  :character  Mode  :character  05   : 123    05   : 100    05   : 100    05   : 100    05   :  74
#                                     other:4663    other:3812    other:3812    other:3812    other:2823
#                                                   NA's : 896    NA's : 896    NA's : 896    NA's :1937

Select the ORFs of interest (or the genes, whatever):

goodorfs <- c("YOL013C", "YLR207W", "YDR057W", "YML029W", "YBR201W",
"YIL030C", "YMR264W", "YMR022W", "YML013W", "YML012C-A", "YBR170C",
"YMR067C", "YKL213C", "YMR276W", "YBL067C", "YDL190C", "YDL020C",
"YFL031W", "YHR079C", "YOR067C", "YBR171W", "YBR283C", "YER019C-A",
"YHR193C", "YCL045C", "YKL207W", "YGL231C", "YIL027C", "YLL014W",
"YLR262C", "YLR039C", "YDR137W", "YDR127W", "YGL148W", "YPR060C",
"YBR068C", "YGL013C", "YOR153W")

goodorfs_df <- data.frame(goodorfs)
goodorfs_df$order <- row.names(goodorfs_df)
#we will need the order later to keep them exactly as needed!
cut_subset <- cut_df[which(cut_df$ORF %in% goodorfs),]
#38 observations of 7 variables
toplot <- merge.data.frame(goodorfs_df, cut_subset, by.x="goodorfs", by.y="ORF", all.x=T, all.y=F)
toplot <- toplot[order(-as.integer(toplot$order)), ]
#reversed order, because the y axis grows from bottom to top (this way, the first that we want will be on top)
toplot_simple <- toplot[, c(3:8)]
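
The reason for the extra order column is that merge sorts its result by the key, so the hand-picked order of the ORFs would otherwise be lost. A minimal sketch of the idea, assuming invented identifiers and scores (wanted, values and merged are hypothetical names, and seq_len is used here instead of row.names to get an integer order directly):

```r
# Desired display order (invented identifiers)
wanted <- data.frame(id = c("g3", "g1", "g2"), stringsAsFactors = FALSE)
wanted$order <- seq_len(nrow(wanted))   # 1, 2, 3: position in the wish list

# The data arrive in a different order and lack one of the wanted ids
values <- data.frame(id = c("g1", "g3", "g9"), score = c(10, 30, 90),
                     stringsAsFactors = FALSE)

# merge() sorts by the key, so the original order is lost here...
merged <- merge(wanted, values, by = "id", all.x = TRUE, all.y = FALSE)
# ...and is restored (reversed, for a bottom-to-top y axis) with the order column
merged <- merged[order(-merged$order), ]
print(merged)
```

Because all.x is TRUE, an identifier that is absent from the data ("g2" here) is kept with a missing score, so it still gets its row in the heatmap.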

4. Define and use a plot function with some bells and whistles:

library(ggplot2)
library(reshape2) # for the melt function
library(grid) # for unit function
plotheat <- function(mydf){
    #the data frame has the first column called 'Gene'
    #all the other columns are categorical
    nms <- names(mydf)
    lnms <- length(nms)
    rowsall <- length(mydf$Gene)
    colsall <- length(names(mydf[, c(2:lnms)]))
    rowIndex = rep(1:rowsall, times=colsall)
    # 1 2 3 ...
    colIndex = rep(1:colsall, each=rowsall)
    # 1 1 1 ...
    mydf.m <- cbind(rowIndex, colIndex, melt(mydf, id=c("Gene")))
    p <- ggplot(mydf.m, aes(variable, Gene))
    #each cell is a unit square: (colIndex-1, colIndex) x (rowIndex-1, rowIndex)
    p2 <- p+geom_rect(aes(x=NULL, y=NULL, xmin=colIndex-1, xmax=colIndex,
        ymin=rowIndex-1, ymax=rowIndex, fill=value), colour="grey")
    #axis labels are placed in the middle of the cells
    p2=p2+scale_x_continuous(breaks=(1:colsall)-0.5, labels=colnames(mydf[, c(2:lnms)]))
    p2=p2+scale_y_continuous(breaks=(1:rowsall)-0.5, labels=mydf[, 1])
    p2=p2+theme_bw(base_size=8, base_family = "Helvetica")+
        theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks.length = unit(0, "cm"),
        axis.text.x = element_text(vjust=0, angle=90, size=8))+
        scale_fill_manual(values = c("black", "grey30", "grey60", "grey90"))
    return(p2)
}

plotheat(toplot_simple)
ggsave("myniceheatmap.pdf", width = 18, height = 15, units="cm")
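
The plot function draws each cell as a unit square whose position comes from rowIndex and colIndex, with the axis labels placed at index minus 0.5, in the middle of the cell. A minimal base R sketch of that bookkeeping, assuming an invented toy table (toy and long are hypothetical names, and the long table is built by hand rather than with melt):

```r
# Toy categorical table (invented): first column 'Gene', then one column per screen
toy <- data.frame(Gene = c("A", "B", "C"),
                  screen1 = c("01", "other", "05"),
                  screen2 = c("other", "025", "01"),
                  stringsAsFactors = FALSE)

rowsall <- nrow(toy)
colsall <- ncol(toy) - 1

# Long format: the screen columns are stacked one after another
long <- data.frame(
    rowIndex = rep(1:rowsall, times = colsall),   # 1 2 3 1 2 3
    colIndex = rep(1:colsall, each = rowsall),    # 1 1 1 2 2 2
    Gene     = rep(toy$Gene, times = colsall),
    variable = rep(names(toy)[-1], each = rowsall),
    value    = unlist(toy[-1], use.names = FALSE),
    stringsAsFactors = FALSE)

# Corners of the rectangle drawn for each cell
long$xmin <- long$colIndex - 1
long$xmax <- long$colIndex
long$ymin <- long$rowIndex - 1
long$ymax <- long$rowIndex
print(long)
```

The two index vectors line up with the stacked values only because the columns are stacked in their original order, one complete column after another.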

The result, after some polishing with Inkscape:


One year of desktop Linux


If you are considering a switch to Linux from a Windows or OS X machine, you will find some of my experience in this post, written one year after I started using Linux exclusively for my professional life as a scientist. With the ubiquity of web applications, the desktop environment might seem less important than a few years ago. Still, there are plenty of things that are better done on a desktop, including computation-heavy or specialized analyses. This post assumes you have some knowledge of Linux, specifically how to install applications and how to edit configuration files.

I first tested Linux on my laptop in 2001 and 2002 but was only convinced to switch permanently for my personal use with the arrival of Ubuntu Dapper Drake in 2006. Using Ubuntu at home had an impact on my daily professional life. For example, our collection of reagents is maintained on a LabKey server on Ubuntu in a virtual machine. Experience with the command line definitely helped with the administration of our laboratory backup server, a Synology network disk station. Last, but not least, the extensive experience with the shell and shell scripts was crucial for data analysis projects, both in research and in teaching.

Before going further, I would like to share with you a desktop screenshot, showing my current theme in Gnome, OneStepBack. The easiest way to install the theme is to download and unzip the corresponding file in a .themes folder in your home directory. The theme can be applied with the Gnome Tweaks application.


An important choice when switching to Linux is the distribution, as it affects several aspects of the linuxian life. Based on my previous experience with Ubuntu, I chose it, because “it just works”. The size of the Ubuntu user community is very important, since many questions that I might have had were already asked and answered by someone. Many thanks to those who asked and answered questions, as well as to those sharing their experience with specific issues of application installation or configuration. In addition to generic Linux documentation, one of the best sources of information on arcane Linux configuration options is the ArchWiki, a documentation portal for ArchLinux (which you can try in a user-friendly version called Manjaro).

Installation and basic configuration

It was not an easy choice to change after 18 years of Mac use. I still like the way Macs work and how the desktop looks, but the lack of a Pro solution in recent years, as well as the steady development of user-friendliness in Linux, settled the decision. Ubuntu 17.04 was thus installed on a Dell machine with plenty of RAM, an SSD for the system (more on this later) and two 2 TB data disks.

Initial installation required endless updates of the preinstalled Windows system, shrinking the Windows partition, and installing Ubuntu. The most difficult part was the assignment of the two data disks to a ZFS pool in a ‘mirrored’ configuration. The data, which are also mirrored daily to an external network disk, are thus duplicated across the two disks, without any need for further configuration. The use of ZFS is probably a headache for most users, and unfortunately I did not have enough time to do more than basic configuration.

While it might seem a minor issue for most of the system administrators who use Linux servers mostly through a command line interface, the desktop environment has a major impact on Linux used through a graphical user interface. It has been my pleasure to test and work for some time with most of the major desktop environments and, strangely enough, I like them all. So, no, you won’t find here any of the “The 5 best desktop environments for Linux” nonsense. The current blog post is written under a Gnome Shell environment, which, with extensions, works mostly as I want it to work. KDE Plasma is probably my “go to” desktop environment if I take my laptop on a journey, since I am sure that external monitors, for example, are correctly recognized. On the workstation, Cinnamon, a GTK3-based DE from the Linux Mint creators, works flawlessly. I spent some time as well with XFCE, my desktop environment of choice for my 12-year-old laptop, and with MATE.

Advantages and annoyances of using Linux

My major problem since switching to Linux has been the death of the system SSD in December last year, 8 months after buying the workstation. Maybe it was not Linux’s fault, but I tend to believe that the very frequent writing of logs and other information to the disk somehow had a catastrophic effect. I am not a specialist, but I now prefer to use a 7200 rpm hard disk for the system. So, be cautious about the type of media your operating system lives on. You might want to follow some advice from Pjotr, a Linux user from Holland.

A clear advantage of using Linux is anything linked with development, from scripting in R or Python to installing the software that you need for mapping reads to DNA sequences (where Conda and Bioconda are fantastic tools). Many of the available text editors are fantastic, and I have a personal preference for Kate, for its multiple customization options, its speed, and its overall usability.


A major advantage of using Linux daily is the existence of excellent open-source tools and their ease of installation, mostly with an apt install program command. For Python, I’m using Spyder and for R, RStudio. The manuscript figures benefit from editing in Inkscape, and the manuscripts, as well as other documents, are written with LibreOffice. Bibliography is handled gracefully by Zotero and its extensions, together with Firefox. For any kind of ideas, for collecting figures from papers, and for general notes, CherryTree is a fantastic open-source program. PDF reading benefits from Okular. Image analysis and editing are the realm of ImageJ (and FIJI) and GIMP. For file searches, CatFish is fantastic. For a knowledge database with indexed PDF files, Recoll is my tool of choice. Handling PDF files can be done with an extremely flexible Java tool, jPdftweak.

Switching to Linux is not easy and you have to be ready to invest some time in customizing your experience with the system. Some very strange bugs manifested themselves with the printing system, for example. From time to time, the pages printed on a local network printer have a “Top secret” watermark :-(. Fonts can also be a problem, especially when exchanging files with colleagues who use Macs or Windows machines. Establishing a file share server is not painless either (the best solution I found was to install and configure a Samba server on my machine).

In view of the time and effort spent setting up and customizing my Linux machines, it would be impossible for me to say that I regret my decision. Having invested in this installation places me somewhere between the post-purchase rationalization and IKEA effect cognitive biases. I’ll be back with my experience at two years post-switch, probably in April 2019.

Conclusion: if you are already a ‘power user’ for Linux and you don’t need proprietary applications for your work, switching to Linux is a lot of work coupled with a lot of fun.

If you have any specific question or want to share your positive or negative experience, please go to the comments section. Thank you.