Heatmaps using ranks in R and ggplot2

The following fragments of R code illustrate one way of showing quantitative data through a custom heatmap. For this example, I will be using a table of results containing growth changes observed under various conditions for yeast gene deletion mutants (from published, publicly available data sources). The complete table, “selected_screens_top50_etal20150410raw.csv”, can be downloaded from GitHub.

1. Prepare a dataframe from tabular data:

thedata <- read.csv("selected_screens_top50_etal20150410raw.csv", stringsAsFactors=FALSE)
names(thedata)
# [1] "X" "ORF" "Long_description" "haploids_SR7575_exp1" "haploids_SR7575_exp2"
# [6] "avghaplo" "rank1" "rank2" "Gene" "SGTC_352"
# [11] "SGTC_513" "SGTC_1789" "CDC48_YDL126C_tsq209" "CDC48_YDL126C_tsq208" "RPN4_YDL020C"
# [16] "INO2_YDR123C" "SRP101_YDR292C_damp" "HAC1_YFL031W" "OPI3_YJR073C" "PGA3_YML125C_damp"
# [21] "UBC7_YMR022W" "YNL181W_YNL181W_damp" "X4166_8_HOP_0148"

2. Define the function that will categorize the data nicely:

This function takes a vector of numerical values and returns a category for each value. The categories can be customized via breaks and their associated labels. The important library here is scales, whose rescale function does all the magic.

library(scales) #required for the rescale function
rank_pc <- function(vect, brks, labls){
    rks <- rank(vect, na.last="keep")
    #example: brks = c(-1, 0.1, 0.3), labls = c("lo", "mid", "other")
    #(Inf is appended to brks below, so there is one label per interval)
    endclass <- cut(rescale(rks), breaks = c(brks, Inf), labels=labls)
    #rescale(rks) just maps the ranks from 1:4000 to the 0:1 interval;
    #it thus becomes very easy to pick, with the "cut" function,
    #the intervals we want
    return(endclass)
}
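
As a quick check of what such a function returns, here is a minimal sketch on a made-up vector of ten values. The values, breaks, labels and the name rank_pc_demo are illustrative assumptions, not part of the screen data. To keep the sketch free of package dependencies, the rescale step is written out as a min-max formula, which should match what rescale does with its defaults on a numeric vector.

```r
# Hypothetical stand-in for rank_pc() that avoids the scales dependency
rank_pc_demo <- function(vect, brks, labls) {
    rks <- rank(vect, na.last = "keep")
    # min-max scaling of the ranks to the 0:1 interval
    scaled <- (rks - min(rks, na.rm = TRUE)) / diff(range(rks, na.rm = TRUE))
    cut(scaled, breaks = c(brks, Inf), labels = labls)
}

# Made-up scores, including one missing value
scores <- c(-3.2, 0.5, NA, -1.1, 2.4, 0.0, -0.7, 1.8, -2.5, 0.9)
classes <- rank_pc_demo(scores, c(-1, 0.2, 0.5), c("lowest", "low", "other"))
print(data.frame(scores, classes))
```

The smallest values receive the first label, and the missing value stays NA because of na.last="keep".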

3. Massage the data to get a reduced set just for the plot:

cut_data <- lapply(thedata[,c(6, 10:13)], rank_pc, c(-1, 0.01, 0.025, 0.05), c("01", "025", "05", "other"))
cut_df <- data.frame(cut_data)
cut_df$ORF <- thedata$ORF
cut_df$Gene <- thedata$Gene
#rearrange columns
cut_df <- cut_df[, c(6, 7, 1:5)]
summary(cut_df)
# ORF Gene avghaplo SGTC_352 SGTC_513 SGTC_1789 CDC48_YDL126C_tsq209
# Length:4909 Length:4909 01 : 50 01 : 41 01 : 41 01 : 41 01 : 30
# Class :character Class :character 025 : 73 025 : 60 025 : 60 025 : 60 025 : 45
# Mode :character Mode :character 05 : 123 05 : 100 05 : 100 05 : 100 05 : 74
# other:4663 other:3812 other:3812 other:3812 other:2823
# NA's : 896 NA's : 896 NA's : 896 NA's :1937

Select the ORFs of interest (gene names would work just as well):

goodorfs <- c("YOL013C", "YLR207W", "YDR057W", "YML029W", "YBR201W",
"YIL030C", "YMR264W", "YMR022W", "YML013W", "YML012C-A", "YBR170C",
"YMR067C", "YKL213C", "YMR276W", "YBL067C", "YDL190C", "YDL020C",
"YFL031W", "YHR079C", "YOR067C", "YBR171W", "YBR283C", "YER019C-A",
"YHR193C", "YCL045C", "YKL207W", "YGL231C", "YIL027C", "YLL014W",
"YLR262C", "YLR039C", "YDR137W", "YDR127W", "YGL148W", "YPR060C",
"YBR068C", "YGL013C", "YOR153W")

goodorfs_df <- data.frame(goodorfs)
goodorfs_df$order <- row.names(goodorfs_df)
#we will need the order later to keep them exactly as needed!
cut_subset <- cut_df[which(cut_df$ORF %in% goodorfs),]
#38 observations of 7 variables
toplot <- merge.data.frame(goodorfs_df, cut_subset, by.x="goodorfs", by.y="ORF", all.x=TRUE, all.y=FALSE)
toplot <- toplot[order(-as.integer(toplot$order)), ]
#reversed order, because the y axis grows from bottom to top (so the first ORF we want ends up on top)
toplot_simple <- toplot[, c(3:8)]
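
The explicit order column is needed because merge sorts its result on the key column by default. A small sketch with made-up ORF names and classes (all hypothetical) shows the effect and how the saved order restores the requested sequence:

```r
# Made-up list of wanted identifiers, in the order we want to display them
wanted <- data.frame(orf = c("YC", "YA", "YB"), stringsAsFactors = FALSE)
wanted$order <- seq_len(nrow(wanted))   # remember the requested order

# Made-up table of categorized values
values <- data.frame(orf = c("YA", "YB", "YC", "YD"),
                     class = c("01", "other", "05", "other"),
                     stringsAsFactors = FALSE)

merged <- merge(wanted, values, by = "orf", all.x = TRUE)
print(merged$orf)      # sorted on the key, no longer in the requested order

restored <- merged[order(merged$order), ]
print(restored$orf)    # requested order again
```

Storing the order as integers, as done here, avoids the as.integer conversion that is needed when the order comes from row.names.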

4. Define and use a plot function with some bells and whistles:

library(grid) # for unit function
library(ggplot2)
library(reshape2)
plotheat <- function(mydf){
    #the data frame has the first column called 'Gene'
    #all the other columns are categorical
    nms <- names(mydf)
    lnms <- length(nms)
    rowsall <- length(mydf$Gene)
    colsall<- length(names(mydf[, c(2:lnms)]))
    rowIndex = rep(1:rowsall, times=colsall)
    # 1 2 3 ...
    colIndex = rep(1:colsall, each=rowsall)
    # 1 1 1 ...
    mydf.m <- cbind(rowIndex, colIndex ,melt(mydf, id=c("Gene")))
    p <- ggplot(mydf.m, aes(variable, Gene))
    p2 <- p+geom_rect(aes(x=NULL, y=NULL, xmin=colIndex-1,
    xmax=colIndex,ymin=rowIndex-1,
    ymax=rowIndex, fill=value), colour="grey")
    p2=p2+scale_x_continuous(breaks=(1:colsall)-0.5, labels=colnames(mydf[, c(2:lnms)]))
    p2=p2+scale_y_continuous(breaks=(1:rowsall)-0.5, labels=mydf[, 1])
    p2=p2+theme_bw(base_size=8, base_family = "Helvetica")+
    theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks.length = unit(0, "cm"),
        axis.text.x = element_text(vjust=0, angle=90, size=8)
    )+ 
    scale_fill_manual(values = c("black", "grey30", "grey60", "grey90"))

return(p2)}
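
A side note on the rowIndex and colIndex vectors used inside plotheat: they line up with the melted data because, as far as I can tell, melt stacks the measured columns one after another. The sketch below rebuilds that long format by hand in base R on a made-up three-gene table (the table and its column names are hypothetical), so the pairing of indices and values can be inspected directly.

```r
# Made-up categorical table: first column 'Gene', then two conditions
toy <- data.frame(Gene = c("g1", "g2", "g3"),
                  condA = c("01", "other", "05"),
                  condB = c("other", "other", "01"),
                  stringsAsFactors = FALSE)

n_rows <- nrow(toy)
n_cols <- ncol(toy) - 1

# Long format built by hand, one measured column after another
long <- data.frame(
    rowIndex = rep(seq_len(n_rows), times = n_cols),   # 1 2 3 1 2 3
    colIndex = rep(seq_len(n_cols), each = n_rows),    # 1 1 1 2 2 2
    Gene     = rep(toy$Gene, times = n_cols),
    variable = rep(names(toy)[-1], each = n_rows),
    value    = unlist(toy[-1], use.names = FALSE),
    stringsAsFactors = FALSE
)
print(long)
```

Each row of the long table then describes one rectangle: its column position, its row position and the category used for the fill.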

plotheat(toplot_simple)
ggsave("myniceheatmap.pdf", width = 18, height = 15, units="cm")

The result, after some polishing with Inkscape:

Twitter archive for 2016

A collection of my tweets from the past two years (with only a very limited number of retweets), recovered from Twitter through the archive option found in every user’s profile. They are listed in reverse chronological order.

Will probably edit later to highlight what looks most important to me.

2016-December

A ‘get things done’ file manager for any Java-enabled platform, recently revived is muCommander (with pretty ghosts) mucommander.com/index.html
Teachers need to use a historical perspective when teaching mathematics, but who will teach the teachers ? https://t.co/ic7ZVEduES
Sad how Twitter thinks that the most *popular* tweets are the *best* tweets. No, they are the most *popular* and should be labeled as such.
If politicians do not understand the language of science, scientists should learn the language of politics and translate for them.
If interested in science, mostly physics, its history and science-fiction, a blog to follow is Greg Gbur’s: skullsinthestars.com
Creating a Python3 environment is painless: >conda create -n py3k python=3 anaconda …wait >source activate py3k continuum.io/blog/developer…
Maybe I know myself less than the #Google+ algorithm knows me, but no, I’m not interested that much in the Bitcoin G+ community. Thank you. pic.twitter.com/qI6l89mCk7
Music streaming services, like subscribing to ‘unlimited books’ offers have the side-effect that *no* sharing of music or books can occur.
#Spyder is the #Python equivalent of #RStudio, easy to configure, with completions, block comments and #iPython console. Slow on OSX though.
The fact that tweets most retweeted contain a higher proportion of images does not mean that adding an irrelevant image will increase impact
I like the #StarWars saga because its broad context is the struggle of democratic Rebels to replace a dictatorship, the Empire.
@ToKTeacher Naive questions: are there testable predictions the multiverse model makes that other theories do not ? Has it more reach ?
Realistic human face rendering in movies still not there in 2016. #RogueOne
@Elysee @fhollande @najatvb Digital technology, yes, but putting the emphasis on the scientific method would have more impact in the long term.
To manage strains, plasmids or reagents, an open and mature solution is @LabKey . Used every day in our laboratory labkey.org/home/project-b…
Wondering what is the ratio bot/human in my Twitter #followers. Is this ratio variable, in general ?
Some people spend tremendous amounts of energy to share knowledge, without any obvious immediate benefit for themselves. Why ?
A nice explanation about the hard to grasp concept of wave-particle duality: https://t.co/blojbN35Wa
Open source success: lots of scientific presentations use the open source mouse image by @Lemmling on @openclipart openclipart.org/detail/17558/s…
When nothing else works, or you just want to select columns in a text file, try #jEdit, a great text editor: jedit.org
Have questions about globalization ? Interesting answers and predictions: https://t.co/b2VrNVk0Ef
Some tweets should be retweeted twice: Ten rules for structuring papers @biorxivpreprint https://t.co/22kUubEASv
‘In 2016, lies, the whole lies and nothing but the lies’ Joel Stein’s title of ‘The awesome column’, Time December 2016
@TIME – sad to see your #CRISPR story failing to mention that the discovery was the result of many years of publicly funded research.
Deep learning and expert systems are not artificial intelligence but sophisticated and useful tools for complex data.
My best teachers helped me understand the value of well crafted books for learning chemistry, physics or mathematics.
Going slightly mad with GFFs, GTFs, #gffutils, #Salmon, #bedtools, etc. So many tools, so short the time …
Calling coding sequences “cDNA” is misleading in the #Ensemblgenomes, please correct README files (e.g. ftp.ensemblgenomes.org/pub/fungi/curr…)
«Tweeting is teaching, for better or for worse.» Anonymous
On BFM – the ‘experts’ think that learning a foreign language at school reduces the time available for French… In fact, it is the opposite.
Will not click on links in tweets that begin with “Scientists have discovered a causal link between something and else.” Just #correlations.
@TIME – please refrain from using axes with arbitrary starting values for percentages. It distorts the message. pic.twitter.com/mpyzWAMetF
Learnt today that one of my long-time interests is called epistemology – a scary word designating the study of ways to gain knowledge.
Sun Tzu’s 2500 years old ‘The Art of War’ is probably famous because it reads like a scientific report. Concise, precise and pragmatic.
Wondering how much equivalent of Random Access Memory, #RAM, humans have. Most efficient ways to increase it ?
The innate desire of humans to teach others was probably selected during evolution. *What* we teach is a deliberate choice.
Video tutorials in which a guy explains his slides are a waste of time compared with an equivalent written document.
When you can’t see the top of the head for men in cropped pictures on the net, probability is high that they are bald.
Learning English at school would give access to superb untranslated books, like the one by John Stillwell: Maths and its history.

2016-November

Fantastic open-source Android note taking application: OmniNotes. play.google.com/store/apps/det… pic.twitter.com/Z7CqG7Vtn9
Read 16 out of the 17 science fiction ‘classics’ listed by lifehacker: lifehacker.com.au/2016/09/17-sci… . But why 17 ?
After lots of philosophy today, pragmatic observations on how to touch-type on an Android device with Multiling O: aboutfoto.wordpress.com/2016/11/21/and…
Looked yesterday into a book about happiness through meditation. Scientists should all be cheerful because they never stop meditating.
People think economic progress depends on making more and better ‘products’, when in reality all progress depends on advancing ‘knowledge’.
Is it normal to go and vote for yourself when you are a candidate ? Especially when the only thing at stake is choosing a person ?
About impact: even the most influential books, like those by Darwin or Popper, needed people, active proponents, to spread the word.
‘Refocus journal club discussion from findings to methods and approaches.’ Great advice. mbio.asm.org https://t.co/T4b19quaIW
“It is easy to obtain confirmations, or verifications, for nearly every theory – if we look for confirmations.” stephenjaygould.org/ctrl/popper_fa…
Converting one font format to another works beautifully with #FontForge, a complex program for font design (fontforge.github.io/en-US/)
Philosophers are to science what critics are to art. Good ones are true scientists, just like good art critics are great writers. pic.twitter.com/LlLTxxeDiU
Strange how happy I am to see a free parking space even when I am on foot…

2016-September-October

The historical origins of ‘seconds’ and ‘minutes’ explained by a reddit user: reddit.com/comments/5a1mh… pic.twitter.com/mVljHDNZYe
Human intervention *still* required for #Sanger DNA #sequencing. There are only two “T” bases there. pic.twitter.com/jeyaWeqQt9
“Science is what we have learned about how to keep from fooling ourselves.” Richard Feynman
A pity that with #Bayard and #jaimelirestore the e-books for children are available only in an iPad app. DRM-free ePub would solve that.
Not satisfied by an #eBay seller, I tend to leave no evaluation. A lack of evaluations might be a robust indicator of not that great sellers
Great to see how yeast genetic screens with fungal viruses led to insights into mRNA degradation in humans (#SKIcomplex) #EESmRNA
Removing those annoying newsletters does not require any special program to assist you in clicking on the ‘unsubscribe’ link at msg end. pic.twitter.com/Ee7pw0e7M7
Many kids today believe that #Amazon is a kind of god that answers prayers done on your home computer. Next day delivery.
The person most likely to re-read this tweet some day is my future self. The same with blog entries.
The smartwatch of 1981, with perpetual calendar until … 2019. Pricey #Citizen at that time. 8940 movement. pic.twitter.com/arnlXqTyoR
Batch renaming files is done in OS X #Finder through right click on a list of selected files. Should be a default option in other OSes. pic.twitter.com/IiwU9NXG1M
Cut-throat academia leads to ‘natural selection of bad science’, claims study theguardian.com/science/2016/s…
Wondering if everyone stops dreaming about flying, with age.
Learning to program by making a game is like trying to get a driving licence with a 2 inch replica car.
Most annoying error in simple #bash scripts so far: COMMA not used in array variables.
Instead of dealing with GEO sra files, search for the equivalent fastq files in the European Nucleotide Archive ebi.ac.uk/ena/browse
Why not allow comments on #GooglePlay for apps without having to give “stars” ? Constructive criticism could be done without a rating change
An example of the limits of “deep-learning” methods: filling in the missing information distorts the image too much. twitter.com/dlouapre/ https://t.co/02eyS79WFh

2016-July-August

BBC News – How to be mediocre and be happy with yourself bbc.com/news/business-…
If only R could warn stupid/ignorant/beginner users when they push “Enter” on a command that risks running forever …
I dream of smarter command line interfaces. Imagine a shell that can correct typing mistakes and gives usage hints when things go wrong.
Only 85.5% of viewers will like this cartoon #smbc #hiveworks smbc-comics.com/comic/explosiv…
#PokemonGo illustrates in real life what Vernor Vinge described as widespread in his Hugo-awarded book Rainbows End. en.wikipedia.org/wiki/Rainbows_…
“The Feynman Lectures on Physics,” The Most Popular Physics Book Ever Written, Now Online goo.gl/TRsYFg pic.twitter.com/Ifpu4Pl5cV
A beautiful large-scale study of the impact of combinations of codons on gene expression in yeast. cell.com/cell/fulltext/… https://t.co/siszOQN2Dy

2016-May-June

Nice to see #Musnet in a shop window. A superb comic book by #Kickliy. pic.twitter.com/dqP88IF1Su
CRISPR-directed mitotic recombination enables genetic mapping without crosses science.sciencemag.org/content/early/…
Switching the Linux kernel from 4.4.8 to 4.6 in #Ubuntu is easy and dramatically reduced the idle power draw, from 15 to 9 W.
The #Android app #OrangeRadio is excellent for FM radio stations over the Internet. radio.orange.com/home
If #Gimp or #Inkscape look fuzzy on a high resolution screen in Windows 10, inactivate scaling in the “compatibility” tab of properties.
Switching back to #Windows after 10 years of Linux desktop use. 5 hours of battery life for Windows 10 vs 1.5 hours on #Ubuntu 16.04.
Translation dynamics of single mRNAs in live cells and neurons (Singer lab) science.sciencemag.org/content/early/…

2016-March-April

#DoubleCommander can rename a bunch of files with or without regular expressions. Works well and replaces other batch file renamers.
Finding a four-leaf clover is not a matter of luck but of perseverance. pic.twitter.com/R2JE6CVX2E
UpSetR, our #rstats based alternative to venn diagrams has now more than 5k downloads! github.com/hms-dbmi/UpSetR pic.twitter.com/flqDLtXhrd
App stores should have separate ratings for bugginess, annoyance and ergonomics of apps.
Most scientific, ‘expert’, reviews repeat the conclusions of published papers instead of discussing actual data and potential pitfalls.
Feather: A Fast On-Disk Format for Data Frames for R and Python. This may interest you @biocs. blog.rstudio.org/2016/03/29/fea… #rstats #python
How will #SFR keep its customers while raising their rates? By offering services nobody needs? pic.twitter.com/MXZkHjg7TX
Upgrading to the latest version of the #TiddlyWiki is simple like drag and drop: tiddlywiki.com/upgrade.html
PIN digits distribution, math puzzles and more: datagenetics.com/blog.html
Authentic mail excerpt: “X salutes you for your compendium of writings which immensely help the global society and their #descendants…”
“[Scientific] work is not done for the sake of an application. It is done for the excitement of what is found out.” R. Feynman
#SQL on delim #textfiles as databases. Ex: $ cat some.txt | q -H -t “select * from – where Type like ‘%pattrn%'” harelba.github.io/q/index.html
There is a choice between Tree Style Tab and Tab Tree for tab handling in #Firefox on a wide screen. pic.twitter.com/X4uHYKSmk4
#Inkscape and #Sozi work well together for presentations nicely rendered by any web browser: aboutfoto.wordpress.com/2016/03/28/svg…
One of the reasons why using #TeX is more cumbersome than it could be. We should not need tips to write “32°F”. twitter.com/TeXtip/status/…
Essential #genes might be essential because there was no need to duplicate them in evolution. “Essential” is different from “Important”.
After Ubuntu’s cloud storage, #Copy.com goes extinct. I hope #Hubic and #SpiderOak will continue to operate. pic.twitter.com/QQ8ZrowpSC
“The X journal invites you to view your publication performance” – I didn’t know my poor publication was in a competition for “performance”.
This is when you know you are followed by #bots. pic.twitter.com/0Pu5J1SWRb
Testing #Sozi sozi.baierouge.fr for an #Inkscape based presentation shown in a web browser. Looks promising so far.
Thank you #PlayOnLinux and #wine for the easy install of #AdobeReader on a #Ubuntu machine. Sad that #Adobe dropped support for Linux.
A useful dock in #XFCE is #DockBarX, in its panel integrated version. Install on #Ubuntu: webupd8.org/2015/09/dockba… pic.twitter.com/xO2dgSGxQ6
All JS libraries should be authored in TypeScript staltz.com/all-js-librari… via @andrestaltz
/Soft drop shadows/ is a feature that would greatly improve the aspect of #LibreOffice presentations pic.twitter.com/IDEXaP8E1s
Scientists tend to discover beautiful equations in maths and physics because the brain is rewarded by “beauty” project-syndicate.org/commentary/why…
If a 7 yo kid asks about what fractions are, tell them they were invented. #Mathematics is a mind construct and should be taught as such.
Another highly readable #font for #ebooks mobile devices is #Fontin: exljbris.com/fontin.html pic.twitter.com/tYF9pzrZ3J
Font preferences in #Thunderbird may need the “Other Writing Systems” option to be modified (not only Latin). pic.twitter.com/HamU4jTMz2
Recovered my tweets from #Twitter and found that none of the uploaded images were present in the archive, just links to them. Frustrating…
Quick list of customizations that can be done on a laptop after installing #Ubuntu 14.04. #selfpromotion aboutfoto.wordpress.com/2016/03/07/cus…
You never know for how long you can read your bought #DRMed books. Sadly, #Nook #ebooks and tablets cease operation.
#wordpressdotcom seems to have a technical bug with two-factor authentication. Sending messages to phones all over the world might be hard..
Once well implemented, #Ubuntu #Unity features are time and screen space-saving: the launcher, integrated menus and the dock.
The comments on the article about how basic maths are known and used are very interesting too. twitter.com/treycausey/sta…

2016-January-February

#Zootopie is the modern equivalent of #Monstresetcie. Funny, positive and packed with visual and verbal jokes. #films
Non-Long Term Support versions of #Ubuntu should be called Experimental. Best tested in a virtual environment of an LTS version.
Back to #Ubuntu after a year of using a Mac. Waiting for the next Unity release, to be able to adjust global menus. pic.twitter.com/K5qeCxsdJu
Funny how Windows needs installation of a dozen of drivers when Linux mostly just works. Unless you happen to have an M.2 SSD drive :-((
#Acer built into the #Z630 phone a perfect set of endurance and expandability for 200€. Crop of an ISO 280 image, f/2. pic.twitter.com/nlVd8Xfgzv
Taking notes on #Android is made easy with #ColorNote. play.google.com/store/apps/det…. Plenty of nice ergonomic features. pic.twitter.com/IxfG8xPrHg
Finding the school holiday dates by zone is easier with official calendars: data.gouv.fr/fr/datasets/le…
Hard to decide if the home (and work) laptop should run an #ArchLinux-based (#Manjaro) or a #Debian-based distribution (#Ubuntu, #Mint).
The only thing I learnt from lousy lectures delivered by boring teachers was that I should not inflict the same treatment to my students.
Code: “Good style is important because while your code only has one author, it will usually have multiple readers.” r-pkgs.had.co.nz/r.html
The #Outlook web app asks users to wait until the attachments are uploaded. Much less elegant than #Gmail.
Making interactive plots with #ggiraph and #ggplot2: davidgohel.github.io/ggiraph/introd…. Other extensions here: (ggplot2-exts.github.io/index.html).
@TiddlyWiki is wonderful as a note-keeper – sync on any platform (e.g. Hubic) and easy to use with #TiddlyFox. pic.twitter.com/X1QuhLMKpU
The system font of the old Nokia #Symbian S60 phones is one of the most readable for #ebooks. pic.twitter.com/6X7FWWgUHT
Update on my blog post about sharing folders between host and #VirtualBox machines (important -o gid=1000,uid=1000) aboutfoto.wordpress.com/2016/02/01/vir…
#Doublecommander, a file manager that can synchronize the content of two folders #LinMacWin. doublecmd.sourceforge.net pic.twitter.com/K6vgm4SgJo
Bravo to the teacher who shows CE1 pupils how to edit documents with #LibreOffice. It is not an initiative of the ministry…
Wondering if there is any dual pane file manager that can do folder content synchronization #rsync-like.
“The criterion /upon which we base our decision to reject/ the hypothesis” -> The criterion /used to reject/ the hypothesis. #Gopen/writing
Stable #Debian that plays nice in virtualbox and allows easy install of recent versions of many apps: MX15 Linux. mepiscommunity.org/mx
The worst error of an aspiring statistician is to look at models instead of visualizing the data.
#opensuse is the first distribution tested in the last months that would not install on a virtualbox architecture
Who said #windowsxp is obsolete? Still alive and going strong on machines open to the public… in 2016. pic.twitter.com/4cWznMEm3K
One day, regular users might use #containers on their operating systems, to test new features without adding bloat. linuxcontainers.org/lxc/getting-st…
Visualize how different means vary together; one of the gems in “Street-fighting maths” mitpress.mit.edu/books/street-f… pic.twitter.com/akwz12U4wV
“Quotidiennitude” – a gem heard on the radio #OMGlangue
‘Trying is the first step towards failure.’ — Homer Simpson
“…updating a blog post with new content makes more sense than creating a new post that continues the discussion.” wehart.blogspot.fr/2012/01/differ…
“[…]Much of an expert’s competence stems from having learned to avoid the most common bugs.[…]” – Marvin Minsky web.media.mit.edu/~minsky/OLPC-1…
#Conversations add-on for @mozthunderbird allows easy finding of mails and replies, better than Gmail’s thread view addons.mozilla.org/en-US/thunderb…
“Never use a complex word when a simple word will do.” cgi.duke.edu/web/sciwriting…
At last the Apple FR keyboard behaving on #Linux: setxkbmap -layout fr -model macintosh. Make it persistent by adding to .profile.
Efficient data #backup in #Linux is easier from a specific disk partition than from a /home directory overcrowded by hidden files.
An excellent investment for 10 euros – “Les clés de l’#orthographe” by André Porquet. Small book, excellent explanations.
Not checking mail for a few work hours could be effective. Doing this is not likely to require a special service. twitter.com/hadleywickham/…
A post about how I move columns of a text file or do small but global changes fast in text files with #awk and #sed: aboutfoto.wordpress.com/2016/01/13/col…
“Improving the quality of writing actually improves the quality of thought.” Not only in science. americanscientist.org/issues/id.877,… Gopen, Swan 1990
Keeping offline copies of my #GoogleCalendar and #gmail is pretty easy (takeout.google.com/settings/takeo…).
For an #Xfce panel that overlays windows (webupd8.org/2011/05/get-mi…) use and add a False ‘disable-struts’ to panel.
#Geany is a cross-platform alternative to #TextWrangler for editing or source code print (geany.org). pic.twitter.com/IGS4FPHPWG
#Thunderbird is a more reliable IMAP mail client than #Outlook, on Mac. N=3, but still…
If you like the no-nonsense approach of #Xfce, #Manjaro is an alternative to #Xubuntu. Derived from #ArchLinux, but more user-friendly.
#Archlinux is an interesting choice when you need #Linux in a virtual env. Very reasonable minimal install size. pic.twitter.com/yhQRXArjGO
‘With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.’ — John von Neumann

2015

What’s the secret of good writing? gu.com/p/4efj9/stw
In verlan, “iPhone” becomes “Phonei” (“phoney” means “impostor” in English).
Using +scale+ when saving #ggplots, and other tips twitter.com/leejoeyk/statu…
Slide changed before I could manage to understand a #Venn diagram. Images with 3 or more Venn regions are utterly unreadable.
A “sociologist” on Fr info: “The working classes vote less despite having more free time on weekends.” #abstention (????)
Programming a good tool in #bioinfo, and in #informatique in general, is only possible if the developer wants that tool for themselves.
Trying #Rodeo, a #Python IDE that makes an #RStudio – like interface with #Electron/#Jupyter. Promising but freezes at first tab key press.
Just like a redditor said: reading tweets and reddit front page in the morning replaced the daily newspaper.
“This is a sickness in Europe,” Galbraith says: “The past is valued more than the future.” gu.com/p/4dgvf/stw
Humor: “Users can be very dumb but it’s still the developer’s fault for not making something easy enough.” from davidwalsh.name/impostor-syndr…
#Coding for one hour provides about 0.01% of a programmer’s knowledge. “Teach Yourself Programming in Ten Years” norvig.com/21-days.html
#TiddlyWiki (tiddlywiki.com) became more user-friendly with the #TiddlyFox extension (addons.mozilla.org/en-US/firefox/…). Personal or shared.
#Zotero is used by one third of scientists who answered an online survey (N=882). 101innovations.wordpress.com/2015/06/23/fir…
A cheap way to promote clicks is to hide essential content of a tweet by truncation. This is why
Dismissed #plotly as a commercial tool in the past, but it recently became open source, seems very promising plot.ly/javascript/ope…
Mail clients should focus on standards and very long term support and compatibility instead of ‘revolutionary’ user interfaces.

Reading ‘Tom Sawyer’ for my son is like rediscovering a beautiful landscape filled with delightful details.
#IGV, #ImageJ, #JavaTreeView, #LabKey – some of the most used programs for research are written in #Java. Should reconsider attitude on J.
Challenge: explain to a 7-year-old what a “groupe nominal” (noun phrase) is and, above all, what use it is to know that… #écoleprimaire
Well written and thorough explanations on the benefits of SVG for animated graphs or icons. twitter.com/newsycombinato…
@mozilla developers – please read and take into account the users. @zotero is a major tool for science. twitter.com/adam42smith/st…
Excellent explanations on stat overinterpretations. twitter.com/ClausWilke/sta…
#ElCapitan makes the pointer larger if you wiggle the mouse fast enough. No mouse pointer lost anymore!
I thought I knew English from reading on the Internet, but then I started reading a #book. pic.twitter.com/zJ7mPwaYI4
Less education correlates with distrust of vaccination among the participants in the @GrippeNet study. pic.twitter.com/FvFoAAFZQn
Are people more likely to attend a talk knowing that sweets and coffee are offered afterwards ? An experiment to test this would be fun.
#ElCapitan – slight lack of attention to the detail even before installing it. Polished and nice otherwise. pic.twitter.com/uaPA5IRbxF
Videos and podcasts should not be used as replacements of good old written articles.
How nice would it be if #Zotero could tell me if the article I’m viewing right now is already in the database or not…
Seeing the pixels of the glorious qHD 534 ppi screen of the #LGG3. Because water droplets work as lenses.
Identical size selection in #Gimp is easy by writing the size in the tool option dialog. Handy with #scientific figs pic.twitter.com/rMNj4RiThF
Invitations to ‘guest edit’ an issue of an obscure journal are like ‘free lunch, bring your food’.
The #Nexus5X is available but, at 479€, too expensive for the number of compromises they made (no FM, no SD card, no user-replaceable batt).

How git works? No idea! twitter.com/xkcdComic/stat…
There is a need for an #antitweet – to cancel tweets or blog posts that are notoriously made just to go viral and, for that, distort reality.
For an avid #reader, replacing a #smartphone with one with a bigger screen is like changing from the paperback to #hardcover edition.
#Themartian me: ‘Nobody will like this movie, there are no fights, no action’ 7 yo: ‘But dad, he’s fighting with the entire planet’ me: …
“Motivation: It’s Not the Carrot or the Stick” by @callmemrmorris medium.com/synapse/motiva…
Current reading font: #Andada – beautiful and highly legible. google.com/fonts/specimen…
Apparently #DavMail can be used to access goodies from an Exchange server. Good to know but setup seems complex: davmail.sourceforge.net/thunderbirdima…
Scored an estimated 43 wpm writing speed with #MessagEase #keyboard on #Android. bit.ly/hAjG4W. Without predictive mode!
Entering the second phase of #twittership: ‘blasé’
“Great products happen when people build a product for themselves.” Eric Schmidt medium.com/cs183c-blitzsc…
Did not really feel like a hacker when copy-pasting commands from a forum in a terminal to gain root access to a phone. But it worked.
A few simple and not that simple data analysis tasks done with #R AND #Python and various packages. twitter.com/newsycombinato…
It would make sense to have Twitter as a #publicservice rather than a private company.
Wondering if there is some weekly periodicity when losing weight (with the help of #R and #ggplot2). pic.twitter.com/3nhy55hIFk
Elephants: Large, Long-Living and Less Prone to Cancer nyti.ms/1OnYzVE
Understand why there is no FM radio in so many new devices including #GalaxyS6, #iPhone6s or #Nexus phones: npr.org/sections/allte…
The information content of election posters is close to zero because 99% of their surface is just one big #sourirevotepourmoi. And does it work?
Laziness must correlate pretty well with the retweets to tweets ratio. Or is it lack of self-confidence?
The advanced search dialog in #Thunderbird should be promoted as the only search system, easily accessible from the toolbar.
Fast shoe lace tying – works with reasonably long laces: fieggen.com/shoelace/iankn…
Funny online tool to see what a short script does: Learning programming at scale radar.oreilly.com/2015/08/learni… via @radar
Realized today that I was following a prolific Twitter bot (#YCombinator). Impressed.
Just discovered that the Google #keyboard can assist in spelling corrections on a previously written text. play.google.com/store/apps/det…
The Stanford campus started using #WordPress 8 years ago, and now they have privileged access to #Overleaf. Food for thought.
Potentially a way to use a single image file and load various resolutions of it on a web page or on screen. twitter.com/newsycombinato…
The #nexus5x has a nice camera but non-removable #battery, no #SD card and no #FM radio. Too expensive in Europe too.

Nice discussion and study on the ergonomics of screens while driving, for lovers of #typography and #design : ustwo.com/blog/cluster/
Great advice for faster weight loss: Don’t.
My post about #Substance.io, the library that serves as the basis of Lens viewer in the #eLife journal and others aboutfoto.wordpress.com/2015/09/23/any…
CEO who raised price of old pill more than $700 calls journalist a ‘moron’ for asking why wpo.st/ePSb0
Eating less without knowing it: smaller portions in smaller plates. cochrane.org/news/portion-p…
Nobody looks smart when driving a Ferrari in a traffic jam.
‘Réseautage’ – an ugly ‘modern’ word for ‘se créer des relations’ (making connections) – heard on the radio.
Reminder: a tweet a day keeps the bad doctor away.
@Science_Open Thank you. So, the term ‘peer review’ is recent, not the process in itself. It is an entirely different perspective.
Admiring scientists who seem to know what questions are worth exploring.
Me: Have you heard of Ebola ? Someone: Oh, yes! Me: It’s an RNA virus and I’m working on RNA. (better intro to my research?)
Still far from the time when the smartphone would automatically search for the bus schedule when you walk towards the bus station.
The humble #Nikon E 50mm f/1.8 is compact light and perfectly usable in low light for a scientific meeting. pic.twitter.com/4sfXVeZJhQ
Introducing computers in schools does not lead to improved performance for pupils, probably the opposite. phys.org/news/2015-09-t…
Wondering why online sellers would still send untracked parcels. Cheaper is definitely bad for business.
Packed with inventive ideas and surprising images: “Le tout nouveau testament”. Iconoclastic indeed.
A possible solution for including annotations and data descriptions in a #csv file blog.datacite.org/using-yaml-fro… via @mfenner
Amusing – kibi, mebi, and other units for multiples of bytes … twitter.com/ptilouk/status…
“Humans desire certainty, and science infrequently provides it.” sciencemag.org/content/349/62…
Ediacarans extinction was probably due to ‘new kids on the block’ organisms, currently known as animals. phy.so/360415286
There is a reason why 12, 60 or 360 are very much in use – they are all “colossally abundant numbers”. en.wikipedia.org/wiki/Colossall…

Interesting analogy between automatic elevators and Google cars – @NPR: n.pr/1eFebp9
Messy HTML that cannot be removed in a #gmail “Reply”, can be avoided by highlighting some text before clicking Reply gizmodo.com/5963768/the-be…
“Before the introduction of laboratory mice, scientists endured many difficult years with lab bears”. tinyurl.com/ncam8ca
Would love to see pre-defined shapes like arrows and Pacmans in #Inkscape.
Bioinformatics is about extracting meaning from data. Software engineering principles do not apply, mostly (liorpachter.wordpress.com/2015/07/10/the…)
Shouldn’t have invested any money in an e-book reader. An Android smartphone is a better reading device. #FBReader, #AlReader or #ZXReader.
#Unity interface of #Ubuntu fails completely for several colleague scientists. Their conclusion: “#Linux is hard to use and unfriendly”.
Diets are harmful both physically and emotionally. Slight mods of lifestyle are not. medium.com/@NoelDickover/…
Losing weight for good means losing weight slowly – the well-documented argument of “L’Anti-régime” by Michel #Desmurget (enquete-debat.fr/archives/inter…).
“For a restorative sleep … sleep on a copper sheet.” An ad for fakirs bothered by iron nails.. pic.twitter.com/cGmfS2HEwA
“A key accomplishment of evolution is … piling up endless new ways of doing the same old thing”. Amazon wisdom: amazon.com/review/R1QHCXV…
#Ubuntu 14.04 with default Unity does not have a way to easily set up a time for programmed shutdown on non-laptops. Pretty annoying.

What if MongoDB and NodeJS were not the right choice … A presentation: mcfunley.com/choose-boring-… pic.twitter.com/ejHA1UtYF3
Le prince de Motordu is a great ‘plastique’ (read: classic) of French children’s literature. To be savored from age 6-7.
For those scared of the flood of data in science, excellent reference. twitter.com/vsbuffalo/stat…
Retweeting a link gives credit to the one who found it but means more clicks for those who are interested in the original story.
For those who want a glimpse into the world of drug addiction : Junk by Burgess leslibraires.fr/livre/7123443-…
Comparing versions of text with #Meld (meldmerge.org) is both cool and efficient. Complements LaTeX and git. pic.twitter.com/2kybtGfd8f
Comments with examples about efficiency of numpy use and difficulties in using an alternative like Pypy or Cython (frama.link/numpy)
Pocket spectrophotometer paper – parallel intensity measurements of light filtered through an array of quantum dots: rdcu.be/dimT
“need a change of institutional culture so that, instead of being rewarded, unfeasibly lengthy CVs are discouraged.” bmj.com/content/351/bm…

The #ReaderView, switched on with the book icon in #Firefox’s address bar helps with diminishing eye strain. pic.twitter.com/fi3sAqfVCj
With ggvis and other web based solutions interactive visualizations might come at last to R (#rstudio). twitter.com/Rbloggers/stat…
Lars Arvestad @arvestad: I worry more about natural stupidity than artificial intelligence.
A colleague with prior #LaTeX experience is enthusiastic about #overleaf. Others prefer #GoogleDocs. Having choices is good.
Interesting documentary, showing that some scientists do not make much use of the scientific method. twitter.com/RichardDawkins…
“Data analysis is an easy, relaxing and wasteful part of research, especially if compared with true bench work giving true results” – yeah.
Struggling for quite some time with hard to understand and remember syntax in R – could #dplyr solve that ?
Correlations, correlations. twitter.com/TRyanGregory/s…
Basics of operations on data frames in R, including how to initialize an empty one with named columns and types: blog.datacamp.com/15-easy-soluti…
Strange that major e-ink #ebook readers do not use an #Android-based OS, since #CoolReader, #FBReader are such great reading apps. #DRM?
#RStudio knows now about code snippets: writing the dreadful *stringsAsFactors=F* becomes so much easier. support.rstudio.com/hc/en-us/artic…
Learn how to obtain the best estimate for the length of the nose of the Emperor of China, from Feynman’s autobiography (frama.link/feynman)
RNA degradation intermediates give post-mortem details about how many ribosomes had once travelled along an mRNA. dx.doi.org/10.1016/j.cell…
Carefully crafted tweets with no links are like advice being given without being asked. Makes one think. Pocket literature.

Funny how RSS feeds become redundant after setting up a Twitter account.
Including Python results in a LaTeX document – did not know about Pweave. twitter.com/TeXtip/status/…
Graphics from code – optical illusion images made with LaTeX and TikZ/PSTricks – bit.ly/1SDFVuL twitter.com/DrHammersley/s…
Promising tool (overleaf.com) for shared document editing. Helps students to see how nice reports can be generated with LaTeX.
Science article data analysis suggests fraud and will likely lead to retraction. twitter.com/lakens/status/…
Explaining p value misuse, doing stats in science, also in book form (found on O’Reilly) statisticsdonewrong.com
Managing and recruiting advice from Google (2013) nyti.ms/11MuanK

Pervasiveness of quantitative data in biology raises questions about abusing statistical significance. twitter.com/generalising/s…
Showing more data and using better plots. Yes! twitter.com/KamounLab/stat…
Brilliant, even for the most reluctant students. twitter.com/Rbloggers/stat…

Tweeting more to do better science ? bit.ly/1DlDeZK
“Do the best experiments you can, and always tell the truth. That’s all.” -Sydney Brenner – nautil.us/issue/21/infor…

Column order and decimal point changes with awk and sed

Recently, I was confronted with a simple problem that I usually solve in a spreadsheet application: in a text file, change the order of columns and switch the decimal separator from “,” to “.”. However, since I had many similarly formatted text files and wanted to speed up the conversion, I searched for tools that could help me do these simple tasks without too much hassle. Enter awk, a programming language and tool for text processing, and sed, a stream editor that works line by line, both available by default in MacOS X and Linux. As usual, stackexchange questions and answers were extremely useful to quickly find a solution.

While I was awking happily around, I became aware of a problem that I did not expect: awk uses the line feed character (LF) as line terminator, and if the file comes from Windows, it has a carriage return (CR) followed by LF at the end of each line. Thus, the first step needed to get a clean file was to remove those annoying CRs (clearly visible in the following screenshot, taken in Geany, a fantastic text editor):

[Screenshot: inputfile]

This can be done with the following awk command that removes CR characters while leaving LF in the file:

awk '{ sub(/\r$/,""); print }' infile.txt >outfile.txt

The result is, as expected:

[Screenshot: crremoved]
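Without a text editor that displays line endings, the same check can be done from the command line. What follows is only a minimal sketch: the sample file and the helper name count_cr_lines are made up for this illustration and are not part of the original workflow.

```shell
# Work in a throwaway directory; all file names below are made up.
tmpdir=$(mktemp -d)

# Two-line sample with Windows (CRLF) line endings.
printf 'ORF\tvalue\r\nYAL001C\t1,5\r\n' > "$tmpdir/sample_crlf.txt"

# Hypothetical helper: print the number of lines that end in a CR.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

count_cr_lines "$tmpdir/sample_crlf.txt"    # prints 2

# Strip the CRs with the same awk command as above, then count again.
awk '{ sub(/\r$/,""); print }' "$tmpdir/sample_crlf.txt" > "$tmpdir/sample_lf.txt"
count_cr_lines "$tmpdir/sample_lf.txt"      # prints 0
```

A count of zero means the file is ready for the next steps.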

Now, we can proceed with the following step, which is a change in column order. What I needed was to move the 4th column to the second position. Awk comes to the rescue here as well:

awk -F'\t' -v OFS='\t' '{print $1,$4,$2,$3}' infile.txt > outfile.txt

The result looks good, no more CRs and the order of columns is fine:

[Screenshot: columnorderchanged]

Finally, the numeric values that used “,” as decimal separator were not correctly interpreted by the clustering program. However, blindly changing all the commas to dots was not an option, because column 2 now contains text with meaningful commas. Sed provided a very simple command that replaces only the commas sitting between two digits:

sed 's/\([0-9]\),\([0-9]\)/\1.\2/g' < infile.txt > outfile.txt

To understand how sed does its thing, one must be familiar with regular expressions.
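For readers less familiar with regular expressions, here is the same substitution taken apart piece by piece and tried on a single line. This is only a sketch: the function name fix_decimal_commas and the sample line are invented for the illustration.

```shell
# \([0-9]\)   first capture group: one digit just before the comma
# ,           the literal comma that has to go
# \([0-9]\)   second capture group: one digit just after the comma
# \1.\2       put both digits back, with a dot between them
# g           repeat for every match on the line
fix_decimal_commas() {
    sed 's/\([0-9]\),\([0-9]\)/\1.\2/g'
}

# Made-up sample line: an ORF, a description with a text comma, two values.
printf 'YAL001C\tkinase, putative\t0,25\t-1,5\n' | fix_decimal_commas
# prints: YAL001C	kinase, putative	0.25	-1.5
```

The comma in the description survives because it is followed by a space and not by a digit. One caveat: a comma sitting between two digits inside a text column (a chemical name such as 1,3-beta-glucan, for instance) would be changed as well, so it is worth checking the text columns after the conversion.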

In the end, the file looks exactly as I wanted it to be:

[Screenshot: commadotchanged]

There is no need to keep intermediate files and these commands can be chained using the “|” pipe operator. Alternatively, they can be put together in a small shell script.
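As an illustration, the three steps could be chained as in the sketch below. The function name convert, the temporary directory and the sample line are assumptions made for this example, and the awk field separators are quoted so that the command behaves the same in the different awk flavours.

```shell
# Hypothetical wrapper around the three steps:
# strip CRs, move column 4 to position 2, turn decimal commas into dots.
convert() {
    awk '{ sub(/\r$/, ""); print }' "$1" |
        awk -F'\t' -v OFS='\t' '{ print $1, $4, $2, $3 }' |
        sed 's/\([0-9]\),\([0-9]\)/\1.\2/g'
}

# Made-up one-line sample with a CRLF ending and a text comma in column 4.
workdir=$(mktemp -d)
printf 'YAL001C\t0,25\t-1,5\tkinase, putative\r\n' > "$workdir/sample.txt"

convert "$workdir/sample.txt" > "$workdir/sample_converted.txt"
cat "$workdir/sample_converted.txt"
# prints: YAL001C	kinase, putative	0.25	-1.5
```

Saved in a small script and called from a loop over the input files, the same function would convert a whole folder without leaving any intermediate files behind.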

R – using functions as parameters of other functions

The beauty of R, the programming language and environment, lies in its flexibility. Using functions as arguments of other functions is nothing new, but I did not know exactly how to do it in R. One way was mentioned in a discussion found through a Google search. Since it worked well, here is an example.

Let’s say you want to count how many values from a vector are larger or smaller than a threshold – and do this while varying the threshold values. You could imagine having two different functions, ‘count_larger’ and ‘count_smaller’, but why not use the comparison function as a parameter?

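count_x <- function(data_vector, comp_function, threshold){
   #match.fun accepts a function or its name given as a string, e.g. "<"
   comp_f <- match.fun(comp_function)
   #count the elements for which the comparison is TRUE
   return(length(which(comp_f(data_vector, threshold))))
}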

> data <- seq(1, 100, by=2)
> count_x(data, "<", 10)
[1] 5
> count_x(data, ">", 10)
[1] 45
> count_x(data, "!=", 10)
[1] 50
> count_x(data, "==", 10)
[1] 0

Using match.fun is pretty flexible. Maybe there are other ways to do the same thing; any comment is appreciated.

Edit: to format the code, I used the javascript tool on the format my source code page.