Custom keyboard tweaks in 2017 for Ubuntu

Desktop Linux changes with every passing year, and so do the ways to change the keyboard layout. For many years, xmodmap was the way to go for a Linux user who wanted to alter the configuration of his or her keyboard. This is what I did a few years ago, when I slightly altered my French keyboard layout to swap A with Q, J with O and K with E, as described in one of my previous posts: Custom keyboard layouts on Linux and the jokeft version.

If you wonder why such a change would be of any benefit at all, here is the heatmap of my use of the virtual keyboard of an Android phone after about 3000 words typed in a mix of French, Romanian and English (as provided by SwiftKey):

[Image: typing heatmap after 3000 words]

It is easy to see that A sees more use than Q, and that K and J are much less frequently used than E or O. Now, how can we change the keyboard layout on Linux when xmodmap is no longer an option?

It is both easy and complicated. Easy, because we only need to alter two text files; complicated, because the alterations require reading and understanding, at least partially, what xkb, the keyboard manager, does. A post by Michał Kosmulski was of great help for the tweaks I describe (http://michal.kosmulski.org/computing/articles/custom-keyboard-layouts-xkb.html).

The recipe was only tested on Ubuntu 16.04 and Ubuntu 17.04; if you use another Linux system, the paths to the files to modify might be different. I first modified the file that contains the correspondence between the physical keys and the characters that are sent to the system through xkb. For my French keyboard, all the different layouts are found in the fr text file in the folder:

/usr/share/X11/xkb/symbols/

Make sure to keep a copy of the original file somewhere, in case something goes wrong with the tweaks. Since I was only making small changes to the layout, I modified the original file by adding the following section at the end:

// My own keyboard changes
partial alphanumeric_keys
xkb_symbols "latin9_jokeqa" {

    // Modifies the basic fr-latin9 layout to invert aq, ek, oj

    include "fr(latin9)"

    name[Group1]="French (legacy, alternative, invert aq, ek, oj)";
    
    key <AD01> { [   q,    Q,  acircumflex,    adiaeresis ] };
    key <AD03> { [   k,    K,     EuroSign,    cent ] };
    key <AD09> { [   j,    J,  ocircumflex,    odiaeresis ] };

    key <AC01> { [   a,    A,  Acircumflex,    Adiaeresis ] };
    key <AC07> { [   o,    O,  Ucircumflex,    Udiaeresis ] };
    key <AC08> { [   e,    E,  Icircumflex,    Idiaeresis ] };
};

The rows of the keyboard are named A, B, C, D, starting from the space bar row, and the keys in a row are numbered from the left, so that AD01 corresponds to the first character key in the fourth row (a, on the French azerty keyboard). To get an idea of what this layout change does (colored pairs correspond to the modified keys):

[Image: modified French keyboard layout]
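
To make the key naming scheme concrete, here is a minimal Python sketch. It is not part of xkb, and the function name decode_key is hypothetical. It assumes the convention described above: a leading A, a row letter counted from the space bar row, and a two-digit position counted from the left.

```python
import re

# Row letters as described in the text: A is the space bar row, then B, C, D, E upwards.
ROWS = "ABCDE"

def decode_key(name):
    """Split an xkb key name such as 'AD01' into (row number, position).

    Row 1 is the space bar row; position 1 is the first character key of the row.
    Angle brackets around the name, as written in the symbols file, are accepted.
    """
    match = re.fullmatch(r"<?A([A-E])(\d{2})>?", name)
    if match is None:
        raise ValueError(f"not an alphanumeric key name: {name!r}")
    row_letter, position = match.groups()
    return ROWS.index(row_letter) + 1, int(position)

print(decode_key("AD01"))    # the key that carries a on an azerty keyboard
print(decode_key("<AC07>"))  # the key that carries j on an azerty keyboard
```

With this reading, the six keys touched by the tweak sit on the third and fourth rows, which matches the two groups of three lines in the layout section.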

The unique identifier of this layout is the name that follows xkb_symbols, here "latin9_jokeqa".

The name[Group1] value is important and describes the keyboard layout, as it will be seen by the different GUIs for regional settings in GNOME, KDE or XFCE, here “French (legacy, alternative, invert aq, ek, oj)”.

Once the layout is defined, it needs to be referenced in an XML description file called evdev.xml, located in a different folder:

/usr/share/X11/xkb/rules/

Here, we need to insert the information specific to the newly defined layout variant, inside the variantList element of the fr layout, in the following form:

<variant>
  <configItem>
    <name>latin9_jokeqa</name>
    <description>French (legacy, alternative, invert aq, ek, oj)</description>
  </configItem>
</variant>
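
A quick way to check that the variant ended up in the right place is to list the variants registered under the fr layout. The Python sketch below is only an illustration: the SAMPLE string is a cut-down stand-in for evdev.xml (I assume the usual layoutList / layout / variantList nesting), and the helper name variant_names is hypothetical. It only reads the XML and never writes to it.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for evdev.xml; the real file lists many more layouts.
SAMPLE = """
<xkbConfigRegistry>
  <layoutList>
    <layout>
      <configItem>
        <name>fr</name>
        <description>French</description>
      </configItem>
      <variantList>
        <variant>
          <configItem>
            <name>latin9_jokeqa</name>
            <description>French (legacy, alternative, invert aq, ek, oj)</description>
          </configItem>
        </variant>
      </variantList>
    </layout>
  </layoutList>
</xkbConfigRegistry>
"""

def variant_names(root, layout):
    """Return the names of all variants registered under the given layout."""
    for node in root.iter("layout"):
        if node.findtext("configItem/name") == layout:
            return [v.findtext("configItem/name") for v in node.iter("variant")]
    return []

root = ET.fromstring(SAMPLE)
print(variant_names(root, "fr"))

# For the real file, the root would be loaded with:
# root = ET.parse("/usr/share/X11/xkb/rules/evdev.xml").getroot()
```

If latin9_jokeqa does not show up in the list for fr, the fragment was probably pasted under another layout or outside a variantList.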

Once the modified files are saved in their original locations (with sudo), I’m not sure that the following command is required, but it could help:

sudo dpkg-reconfigure xkb-data

It also helps to reboot the system: without a reboot, the keyboard indicator did not recognize the two versions of the keyboard.

[Image: language and keyboard indicator]

That’s it. Compared with the xmodmap method, this one is persistent and works by default in KDE, GNOME and XFCE. It is also somewhat futureproof, as it should work with the Wayland protocol.


PPA repositories for Ubuntu 16.04

Here are a few repositories that I found useful for recent versions of programs on Ubuntu 16.04 LTS. Using them allows one to have recent stable versions of R, LibreOffice, MS Visual Studio Code, NodeJS or Zotero stand-alone.

deb https://mirror.ibcp.fr/pub/CRAN/bin/linux/ubuntu xenial/ for R

deb http://ppa.launchpad.net/mozillateam/thunderbird-next/ubuntu xenial main for Thunderbird

deb http://ppa.launchpad.net/ne0sight/chrome-gnome-shell/ubuntu xenial main for GNOME extension shell

deb [arch=amd64] https://packages.microsoft.com/repos/vscode stable main for Visual Studio Code

deb http://ppa.launchpad.net/neovim-ppa/stable/ubuntu xenial main for NEOVIM

deb http://ppa.launchpad.net/numix/ppa/ubuntu xenial main for NUMIX

deb https://deb.nodesource.com/node_5.x xenial main for NODEJS

deb http://ppa.launchpad.net/dhor/myway/ubuntu xenial main for XNVIEW and other GRAPHICS

deb http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu xenial main for KUBUNTU

deb http://APT.spideroak.com/ubuntu-spideroak-hardy/ release restricted for SPIDEROAK

deb http://ppa.launchpad.net/libreoffice/ppa/ubuntu xenial main for LIBREOFFICE

deb http://ppa.launchpad.net/smathot/cogscinl/ubuntu xenial main for ZOTERO standalone

deb https://deb.opera.com/opera-stable/ stable non-free #Opera Browser (final releases) for OPERA

deb http://ppa.launchpad.net/inkscape.dev/stable/ubuntu xenial main for INKSCAPE

The list was obtained with a simple command line (from https://askubuntu.com/questions/148932/how-can-i-get-a-list-of-all-repositories-and-ppas-from-the-command-line-into-an):

grep -r --include '*.list' '^deb ' /etc/apt/sources.list /etc/apt/sources.list.d/

To add any of the PPAs to the list of available software sources:

sudo add-apt-repository ppa:<name of the PPA>
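
The name of the PPA can be read directly from the Launchpad lines in the list above. As a small illustration, here is a Python sketch that derives the ppa: shorthand from a deb line; the function name ppa_shorthand is hypothetical, and lines that do not point to ppa.launchpad.net simply give None.

```python
import re

def ppa_shorthand(deb_line):
    """Return 'ppa:owner/archive' for a Launchpad PPA line, or None otherwise."""
    match = re.search(r"ppa\.launchpad\.net/([^/\s]+)/([^/\s]+)/ubuntu", deb_line)
    if match is None:
        return None
    owner, archive = match.groups()
    return f"ppa:{owner}/{archive}"

print(ppa_shorthand("deb http://ppa.launchpad.net/libreoffice/ppa/ubuntu xenial main"))
print(ppa_shorthand("deb https://deb.opera.com/opera-stable/ stable non-free"))
```

The first call gives the string to pass to add-apt-repository; the second line is not a PPA, so it has to be added with its full deb line instead.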

Installing a deb package can also automatically add a repository to the list. Some repositories are more complicated (example for Opera browser):

sudo add-apt-repository 'deb https://deb.opera.com/opera-stable/ stable non-free'
wget -qO- https://deb.opera.com/archive.key | sudo apt-key add -

 

Twitter archive for 2016

A collection of my tweets from the past two years (with only a very limited number of retweets), recovered from Twitter through the archive option found in everyone’s profile. In reverse chronological order.

I will probably edit this later to highlight what looks most important to me.

2016-December

muCommander, a recently revived ‘get things done’ file manager for any Java-enabled platform (with pretty ghosts): mucommander.com/index.html
Teachers need to use a historical perspective when teaching mathematics, but who will teach the teachers ? https://t.co/ic7ZVEduES
Sad how Twitter thinks that the most *popular* tweets are the *best* tweets. No, they are the most *popular* and should be labeled as such.
If politicians do not understand the language of science, scientists should learn the language of politics and translate for them.
If interested in science, mostly physics, its history and science-fiction, a blog to follow is Greg Gbur’s: skullsinthestars.com
Creating a Python3 environment is painless: >conda create -n py3k python=3 anaconda …wait >source activate py3k continuum.io/blog/developer…
Maybe I know myself less than the #Google+ algorithm knows me, but no, I’m not interested that much in the Bitcoin G+ community. Thank you. pic.twitter.com/qI6l89mCk7
Music streaming services, like subscribing to ‘unlimited books’ offers have the side-effect that *no* sharing of music or books can occur.
#Spyder is the #Python equivalent of #RStudio, easy to configure, with completions, block comments and #iPython console. Slow on OSX though.
The fact that tweets most retweeted contain a higher proportion of images does not mean that adding an irrelevant image will increase impact
I like the #StarWars saga because its broad context is the struggle of democratic Rebels to replace a dictatorship, the Empire.
@ToKTeacher Naive questions: are there testable predictions the multiverse model makes that other theories do not ? Has it more reach ?
Realistic human face rendering in movies still not there in 2016. #RogueOne
@Elysee @fhollande @najatvb Digital technology, yes, but putting the emphasis on the scientific method would have more impact in the long term.
To manage strains, plasmids or reagents, an open and mature solution is @LabKey . Used every day in our laboratory labkey.org/home/project-b…
Wondering what is the ratio bot/human in my Twitter #followers. Is this ratio variable, in general ?
Some people spend tremendous amounts of energy to share knowledge, without any obvious immediate benefit for themselves. Why ?
A nice explanation about the hard to grasp concept of wave-particle duality: https://t.co/blojbN35Wa
Open source success: lots of scientific presentations use the open source mouse image by @Lemmling on @openclipart openclipart.org/detail/17558/s…
When nothing else works, or you just want to select columns in a text file, try #jEdit, a great text editor: jedit.org
Have questions about globalization ? Interesting answers and predictions: https://t.co/b2VrNVk0Ef
Some tweets should be retweeted twice: Ten rules for structuring papers @biorxivpreprint https://t.co/22kUubEASv
‘In 2016, lies, the whole lies and nothing but the lies’ Joel Stein’s title of ‘The awesome column’, Time December 2016
@TIME – sad to see your #CRISPR story failing to mention that the discovery was the result of many years of publicly funded research.
Deep learning and expert systems are not artificial intelligence but sophisticated and useful tools for complex data.
My best teachers helped me understand the value of well crafted books for learning chemistry, physics or mathematics.
Going slightly mad with GFFs, GTFs, #gffutils, #Salmon, #bedtools, etc. So many tools, so short the time …
Calling coding sequences “cDNA” is misleading in the #Ensemblgenomes, please correct README files (e.g. ftp.ensemblgenomes.org/pub/fungi/curr…)
«Tweeting is teaching, for better or for worse.» Anonymous
On BFM – the ‘experts’ think that learning a foreign language at school reduces the time available for French… In fact, it is the opposite.
Will not click on links in tweets that begin with “Scientists have discovered a causal link between something and else.” Just #correlations.
@TIME – please refrain from using axes with arbitrary starting values for percentages. It distorts the message. pic.twitter.com/mpyzWAMetF
Learnt today that one of my long-time interests is called epistemology – a scary word designating the study of ways to gain knowledge.
Sun Tzu’s 2500 years old ‘The Art of War’ is probably famous because it reads like a scientific report. Concise, precise and pragmatic.
Wondering how much equivalent of Random Access Memory, #RAM, humans have. Most efficient ways to increase it ?
The innate desire of humans to teach others was probably selected during evolution. *What* we teach is a deliberate choice.
Video tutorials in which a guy explains his slides are a waste of time compared with an equivalent written document.
When you can’t see the top of the head for men in cropped pictures on the net, probability is high that they are bald.
Learning English at school would give access to superb, untranslated books, such as John Stillwell’s: Maths and its history.

2016-November

Fantastic open-source Android note taking application: OmniNotes. play.google.com/store/apps/det… pic.twitter.com/Z7CqG7Vtn9
Read 16 out of the 17 science fiction ‘classics’ listed by lifehacker: lifehacker.com.au/2016/09/17-sci… . But why 17 ?
After lots of philosophy today, pragmatic observations on how to touch-type on an Android device with Multiling O: aboutfoto.wordpress.com/2016/11/21/and…
Looked yesterday into a book about happiness through meditation. Scientists should all be cheerful because they never stop meditating.
People think economic progress depends on making more and better ‘products’, when in reality all progress depends on advancing ‘knowledge’.
Is it normal to go and vote for yourself when you are a candidate? Especially when the only thing at stake is choosing a person?
About impact: even the most influential books, like those by Darwin or Popper, needed people, active proponents, to spread the word.
‘Refocus journal club discussion from findings to methods and approaches.’ Great advice. mbio.asm.org https://t.co/T4b19quaIW
“It is easy to obtain confirmations, or verifications, for nearly every theory – if we look for confirmations.” stephenjaygould.org/ctrl/popper_fa…
Converting one font format to another works beautifully with #FontForge, a complex program for font design (fontforge.github.io/en-US/)
Philosophers are to science what critics are to art. Good ones are true scientists, just like good art critics are great writers. pic.twitter.com/LlLTxxeDiU
Strange how happy I am to see a free parking space even when I am on foot…

2016-September-October

The historical origins of ‘seconds’ and ‘minutes’ explained by a reddit user: reddit.com/comments/5a1mh… pic.twitter.com/mVljHDNZYe
Human intervention *still* required for #Sanger DNA #sequencing. There are only two “T” bases there. pic.twitter.com/jeyaWeqQt9
“Science is what we have learned about how to keep from fooling ourselves.” Richard Feynman
A pity that with #Bayard and #jaimelirestore the e-books for children are only available in an iPad app. DRM-free ePub would solve that.
Not satisfied by an #eBay seller, I tend to leave no evaluation. A lack of evaluations might be a robust indicator of not that great sellers
Great to see how yeast genetic screens with fungal viruses led to insights into mRNA degradation in humans (#SKIcomplex) #EESmRNA
Removing those annoying newsletters does not require any special program to assist you in clicking on the ‘unsubscribe’ link at msg end. pic.twitter.com/Ee7pw0e7M7
Many kids today believe that #Amazon is a kind of god that answers prayers done on your home computer. Next day delivery.
The person most likely to re-read this tweet some day is my future self. The same with blog entries.
The smartwatch of 1981, with perpetual calendar until … 2019. Pricey #Citizen at that time. 8940 movement. pic.twitter.com/arnlXqTyoR
Batch renaming files is done in OS X #Finder through right click on a list of selected files. Should be a default option in other OSes. pic.twitter.com/IiwU9NXG1M
Cut-throat academia leads to ‘natural selection of bad science’, claims study theguardian.com/science/2016/s…
Wondering if everyone stops dreaming about flying, with age.
Learning to program by making a game is like trying to get a driving licence with a 2 inch replica car.
Most annoying error in simple #bash scripts so far: forgetting that a COMMA is not used to separate elements in array variables.
Instead of dealing with GEO sra files, search for the equivalent fastq files in the European Nucleotide Archive ebi.ac.uk/ena/browse
Why not allow comments on #GooglePlay for apps without having to give “stars” ? Constructive criticism could be done without a rating change
An example of the limits of “deep-learning” methods: filling in the missing information distorts the image too much. twitter.com/dlouapre/ https://t.co/02eyS79WFh

2016-July-August

BBC News – How to be mediocre and be happy with yourself bbc.com/news/business-…
If only R could warn stupid/ignorant/beginner users when they push “Enter” on a command that risks running forever …
I dream of smarter command line interfaces. Imagine a shell that can correct typing mistakes and gives usage hints when things go wrong.
Only 85.5% of viewers will like this cartoon #smbc #hiveworks smbc-comics.com/comic/explosiv…
#PokemonGo illustrates in real life what Vernor Vinge described as widespread in his Hugo-awarded book Rainbows End. en.wikipedia.org/wiki/Rainbows_…
“The Feynman Lectures on Physics,” The Most Popular Physics Book Ever Written, Now Online goo.gl/TRsYFg pic.twitter.com/Ifpu4Pl5cV
A beautiful large-scale study of the impact of combinations of codons on gene expression in yeast. cell.com/cell/fulltext/… https://t.co/siszOQN2Dy

2016-May-June

Nice to see #Musnet in the shop window. A superb comic book by #Kickliy. pic.twitter.com/dqP88IF1Su
CRISPR-directed mitotic recombination enables genetic mapping without crosses science.sciencemag.org/content/early/…
Switching the Linux kernel from 4.4.8 to 4.6 in #Ubuntu is easy and dramatically reduced the idle power draw, from 15 to 9 W.
The #Android app #OrangeRadio is excellent for listening to FM radio stations over the Internet. radio.orange.com/home
If #Gimp or #Inkscape look fuzzy on a high resolution screen in Windows 10, inactivate scaling in the “compatibility” tab of properties.
Switching back to #Windows after 10 years of Linux desktop use. 5 hours of battery life for Windows 10 vs 1.5 hours on #Ubuntu 16.04.
Translation dynamics of single mRNAs in live cells and neurons (Singer lab) science.sciencemag.org/content/early/…

2016-March-April

#DoubleCommander can rename a bunch of files with or without regular expressions. Works well and replaces other batch file renamers.
Finding a four-leaf clover is not a matter of luck but of perseverance. pic.twitter.com/R2JE6CVX2E
UpSetR, our #rstats based alternative to venn diagrams has now more than 5k downloads! github.com/hms-dbmi/UpSetR pic.twitter.com/flqDLtXhrd
App stores should have separate ratings for bugginess, annoyance and ergonomics of apps.
Most scientific, ‘expert’, reviews repeat the conclusions of published papers instead of discussing actual data and potential pitfalls.
Feather: A Fast On-Disk Format for Data Frames for R and Python. This may interest you @biocs. blog.rstudio.org/2016/03/29/fea… #rstats #python
How will #SFR keep its customers while raising their rates? By offering services nobody needs? pic.twitter.com/MXZkHjg7TX
Upgrading to the latest version of the #TiddlyWiki is simple like drag and drop: tiddlywiki.com/upgrade.html
PIN digits distribution, math puzzles and more: datagenetics.com/blog.html
Authentic mail excerpt: “X salutes you for your compendium of writings which immensely help the global society and their #descendants…”
“[Scientific] work is not done for the sake of an application. It is done for the excitement of what is found out.” R. Feynman
#SQL on delim #textfiles as databases. Ex: $ cat some.txt | q -H -t "select * from - where Type like '%pattrn%'" harelba.github.io/q/index.html
There is a choice between Tree Style Tab and Tab Tree for tab handling in #Firefox on a wide screen. pic.twitter.com/X4uHYKSmk4
#Inkscape and #Sozi work well together for presentations nicely rendered by any web browser: aboutfoto.wordpress.com/2016/03/28/svg…
One of the reasons why using #TeX is more cumbersome than it could be. We should not need tips to write “32°F”. twitter.com/TeXtip/status/…
Essential #genes might be essential because there was no need to duplicate them in evolution. “Essential” is different from “Important”.
After Ubuntu’s cloud storage, #Copy.com goes extinct. I hope #Hubic and #SpiderOak will continue to operate. pic.twitter.com/QQ8ZrowpSC
“The X journal invites you to view your publication performance” – I didn’t know my poor publication was in a competition for “performance”.
This is when you know you are followed by #bots. pic.twitter.com/0Pu5J1SWRb
Testing #Sozi sozi.baierouge.fr for an #Inkscape based presentation shown in a web browser. Looks promising so far.
Thank you #PlayOnLinux and #wine for the easy install of #AdobeReader on a #Ubuntu machine. Sad that #Adobe dropped support for Linux.
A useful dock in #XFCE is #DockBarX, in its panel integrated version. Install on #Ubuntu: webupd8.org/2015/09/dockba… pic.twitter.com/xO2dgSGxQ6
All JS libraries should be authored in TypeScript staltz.com/all-js-librari… via @andrestaltz
/Soft drop shadows/ is a feature that would greatly improve the aspect of #LibreOffice presentations pic.twitter.com/IDEXaP8E1s
Scientists tend to discover beautiful equations in maths and physics because the brain is rewarded by “beauty” project-syndicate.org/commentary/why…
If a 7 yo kid asks about what fractions are, tell them they were invented. #Mathematics is a mind construct and should be taught as such.
Another highly readable #font for #ebooks mobile devices is #Fontin: exljbris.com/fontin.html pic.twitter.com/tYF9pzrZ3J
Font preferences in #Thunderbird may need the “Other Writing Systems” option to be modified (not only Latin). pic.twitter.com/HamU4jTMz2
Recovered my tweets from #Twitter and found that none of the uploaded images were present in the archive, just links to them. Frustrating…
Quick list of customizations that can be done on a laptop after installing #Ubuntu 14.04. #selfpromotion aboutfoto.wordpress.com/2016/03/07/cus…
You never know for how long you can read your bought #DRMed books. Sadly, #Nook #ebooks and tablets cease operation.
#wordpressdotcom seems to have a technical bug with two-factor authentication. Sending messages to phones all over the world might be hard..
Once well implemented, #Ubuntu #Unity features are time and screen space-saving: the launcher, integrated menus and the dock.
The comments on the article about how basic maths are known and used are very interesting too. twitter.com/treycausey/sta…

2016-January-February

#Zootopie is the modern equivalent of #Monstresetcie. Funny, positive and packed with visual and verbal jokes. #films
Non-Long Term Support versions of #Ubuntu should be called Experimental. Best tested in a virtual environment of an LTS version.
Back to #Ubuntu after a year of using a Mac. Waiting for the next Unity release, to be able to adjust global menus. pic.twitter.com/K5qeCxsdJu
Funny how Windows needs installation of a dozen of drivers when Linux mostly just works. Unless you happen to have an M.2 SSD drive :-((
#Acer built into the #Z630 phone a perfect combination of endurance and expandability for 200€. Crop of an ISO 280 image, f/2. pic.twitter.com/nlVd8Xfgzv
Taking notes on #Android is made easy with #ColorNote. play.google.com/store/apps/det…. Plenty of nice ergonomy features. pic.twitter.com/IxfG8xPrHg
Finding out the school holiday dates by zone is easier with official calendars: data.gouv.fr/fr/datasets/le…
Hard to decide if the home (and work) laptop should run an #ArchLinux-based (#Manjaro) or a #Debian-based distribution (#Ubuntu, #Mint).
The only thing I learnt from lousy lectures delivered by boring teachers was that I should not inflict the same treatment to my students.
Code: “Good style is important because while your code only has one author, it will usually have multiple readers.” r-pkgs.had.co.nz/r.html
The #Outlook web app asks users to wait until the attachments are uploaded. Much less elegant than #Gmail.
Making interactive plots with #ggiraph and #ggplot2: davidgohel.github.io/ggiraph/introd…. Other extensions here: (ggplot2-exts.github.io/index.html).
@TiddlyWiki is wonderful as a note-keeper – sync on any platform (e.g. Hubic) and easy to use with #TiddlyFox. pic.twitter.com/X1QuhLMKpU
The system font of the old Nokia #Symbian S60 phones is one of the most readable for #ebooks. pic.twitter.com/6X7FWWgUHT
Update on my blog post about sharing folders between host and #VirtualBox machines (important -o gid=1000,uid=1000) aboutfoto.wordpress.com/2016/02/01/vir…
#Doublecommander, a file manager that can synchronize the content of two folders #LinMacWin. doublecmd.sourceforge.net pic.twitter.com/K6vgm4SgJo
Bravo to the teacher who shows CE1 children how to edit documents with #LibreOffice. It is not an initiative of the ministry…
Wondering if there is any dual pane file manager that can do folder content synchronization #rsync-like.
“The criterion /upon which we base our decision to reject/ the hypothesis” -> The criterion /used to reject/ the hypothesis. #Gopen/writing
Stable #Debian that plays nice in virtualbox and allows easy install of recent versions of many apps: MX15 Linux. mepiscommunity.org/mx
The worst error of an aspiring statistician is to look at models instead of visualizing the data.
#opensuse is the first distribution tested in the last months that would not install on a virtualbox architecture
Who said #windowsxp is obsolete? Still alive and going strong on machines open to the public… in 2016. pic.twitter.com/4cWznMEm3K
One day, regular users might use #containers on their operating systems, to test new features without adding bloat. linuxcontainers.org/lxc/getting-st…
Visualize how different means vary together; one of the gems in “Street-fighting maths” mitpress.mit.edu/books/street-f… pic.twitter.com/akwz12U4wV
“Quotidiennitude” – a gem heard on the radio #OMGlangue
‘Trying is the first step towards failure.’ — Homer Simpson
“…updating a blog post with new content makes more sense than creating a new post that continues the discussion.” wehart.blogspot.fr/2012/01/differ…
“[…]Much of an expert’s competence stems from having learned to avoid the most common bugs.[…]” – Marvin Minsky web.media.mit.edu/~minsky/OLPC-1…
#Conversations add-on for @mozthunderbird allows easy finding of mails and replies, better than Gmail’s thread view addons.mozilla.org/en-US/thunderb…
“Never use a complex word when a simple word will do.” cgi.duke.edu/web/sciwriting…
At last the Apple FR keyboard behaving on #Linux: setxkbmap -layout fr -model macintosh. Make it persistent by adding to .profile.
Efficient data #backup in #Linux is easier from a specific disk partition than from a /home directory overcrowded by hidden files.
An excellent investment for 10 euros – “Les clés de l’#orthographe” by André Porquet. A small book, excellent explanations.
Not checking mail for a few work hours could be effective. Doing this is not likely to require a special service. twitter.com/hadleywickham/…
A post about how I move columns of a text file or do small but global changes fast in text files with #awk and #sed: aboutfoto.wordpress.com/2016/01/13/col…
“Improving the quality of writing actually improves the quality of thought.” Not only in science. americanscientist.org/issues/id.877,… Gopen, Swan 1990
Keeping offline copies of my #GoogleCalendar and #gmail is pretty easy (takeout.google.com/settings/takeo…).
For an #Xfce panel that overlays windows (webupd8.org/2011/05/get-mi…) use and add a False ‘disable-struts’ to panel.
#Geany is a cross-platform alternative to #TextWrangler for editing or printing source code (geany.org). pic.twitter.com/IGS4FPHPWG
#Thunderbird is a more reliable IMAP mail client than #Outlook, on Mac. N=3, but still…
If you like the no-nonsense approach of #Xfce, #Manjaro is an alternative to #Xubuntu. Derived from, and more user-friendly than, #ArchLinux.
#Archlinux is an interesting choice when you need #Linux in a virtual env. Very reasonable minimal install size. pic.twitter.com/yhQRXArjGO
‘With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.’ — John von Neumann

2015

What’s the secret of good writing? gu.com/p/4efj9/stw
In verlan, “iPhone” becomes “Phonei” (phoney = “impostor” in English).
Using +scale+ when saving #ggplots, and other tips twitter.com/leejoeyk/statu…
Slide changed before I could manage to understand a #Venn diagram. Images with 3 or more Venn regions are utterly unreadable.
A “sociologist” on Fr info: “The working classes vote less despite the fact that they have more free time at the weekend.” #abstention (????)
Programming a good tool in #bioinfo, and in #informatique in general, is only possible if the developer wants that tool for themselves.
Trying #Rodeo, a #Python IDE that makes an #RStudio – like interface with #Electron/#Jupyter. Promising but freezes at first tab key press.
Just like a redditor said: reading tweets and reddit front page in the morning replaced the daily newspaper.
“This is a sickness in Europe,” Galbraith says: “The past is valued more than the future.” gu.com/p/4dgvf/stw
Humor: “Users can be very dumb but it’s still the developer’s fault for not making something easy enough.” from davidwalsh.name/impostor-syndr…
#Coding for one hour provides about 0.01% of a programmer’s knowledge. “Teach Yourself Programming in Ten Years” norvig.com/21-days.html
#TiddlyWiki (tiddlywiki.com) became more user-friendly with the #TiddlyFox extension (addons.mozilla.org/en-US/firefox/…). Personal or shared.
#Zotero is used by one third of scientists who answered an online survey (N=882). 101innovations.wordpress.com/2015/06/23/fir…
A cheap way to promote clicks is to hide essential content of a tweet by truncation. This is why
Dismissed #plotly as a commercial tool in the past, but it recently became open source, seems very promising plot.ly/javascript/ope…
Mail clients should focus on standards and very long term support and compatibility instead of ‘revolutionary’ user interfaces.

Reading ‘Tom Sawyer’ for my son is like rediscovering a beautiful landscape filled with delightful details.
#IGV, #ImageJ, #JavaTreeView, #LabKey – some of the most used programs for research are written in #Java. Should reconsider attitude on J.
Challenge: explain to a 7-year-old child what a “groupe nominal” (noun phrase) is and, above all, what the use of knowing that is… #écoleprimaire
Well written and thorough explanations on the benefits of SVG for animated graphs or icons. twitter.com/newsycombinato…
@mozilla developers – please read and take into account the users. @zotero is a major tool for science. twitter.com/adam42smith/st…
Excellent explanations on stat overinterpretations. twitter.com/ClausWilke/sta…
#ElCapitan makes the pointer larger if you wiggle the mouse fast enough. No mouse pointer lost anymore!
I thought I knew English from reading on the Internet, but then I started reading a #book. pic.twitter.com/zJ7mPwaYI4
Less education correlates with distrust of vaccination among the participants in the @GrippeNet study. pic.twitter.com/FvFoAAFZQn
Are people more likely to attend a talk knowing that sweets and coffee are offered afterwards ? An experiment to test this would be fun.
#ElCapitan – slight lack of attention to the detail even before installing it. Polished and nice otherwise. pic.twitter.com/uaPA5IRbxF
Videos and podcasts should not be used as replacements of good old written articles.
How nice would it be if #Zotero could tell me if the article I’m viewing right now is already in the database or not…
Seeing the pixels of the glorious qHD 534 ppi screen of the #LGG3. Because water droplets work as lenses.
Identical size selection in #Gimp is easy by writing the size in the tool option dialog. Handy with #scientific figs pic.twitter.com/rMNj4RiThF
Invitations to ‘guest edit’ an issue of an obscure journal are like ‘free lunch, bring your food’.
The #Nexus5X is available but, at 479€, too expensive for the number of compromises they made (no FM, no SD card, no user-replaceable batt).

How git works? No idea! twitter.com/xkcdComic/stat…
There is a need for an #antitweet – to cancel tweets or blog posts that are notoriously made just to go viral and, for that, distort reality.
For an avid #reader, replacing a #smartphone with one with a bigger screen is like changing from the paperback to #hardcover edition.
#Themartian me: ‘Nobody will like this movie, there are no fights, no action’ 7 yo: ‘But dad, he’s fighting with the entire planet’ me: …
“Motivation: It’s Not the Carrot or the Stick” by @callmemrmorris medium.com/synapse/motiva…
Current reading font: #Andada – beautiful and highly legible. google.com/fonts/specimen…
Apparently #DavMail can be used to access goodies from an Exchange server. Good to know but setup seems cplx: davmail.sourceforge.net/thunderbirdima…
Scored an estimated 43 wpm writing speed with #MessagEase #keyboard on #Android. bit.ly/hAjG4W. Without predictive mode!
Entering the second phase of #twittership: ‘blasé’
“Great products happen when people build a product for themselves.” Eric Schmidt medium.com/cs183c-blitzsc…
Did not really feel like a hacker when copy-pasting commands from a forum in a terminal to gain root access to a phone. But it worked.
A few simple and not that simple data analysis tasks done with #R AND #Python and various packages. twitter.com/newsycombinato…
It would make sense to have Twitter as a #publicservice rather than a private company.
Wondering if there is some weekly periodicity when losing weight (with the help of #R and #ggplot2). pic.twitter.com/3nhy55hIFk
Elephants: Large, Long-Living and Less Prone to Cancer nyti.ms/1OnYzVE
Understand why there is no FM radio in so many new devices including #GalaxyS6, #iPhone6s or #Nexus phones: npr.org/sections/allte…
The information content of election posters is close to zero, since 99% of their surface is just one big #sourirevotepourmoi (smile, vote for me). And does it work?
Laziness must correlate pretty well with the retweets to tweets ratio. Or is it lack of self-confidence?
The advanced search dialog in #Thunderbird should be promoted as the only search system, easily accessible from the toolbar.
Fast shoe lace tying – works with reasonably long laces: fieggen.com/shoelace/iankn…
Funny online tool to see what a short script does: Learning programming at scale radar.oreilly.com/2015/08/learni… via @radar
Realized today that I was following a prolific Twitter bot (#YCombinator). Impressed.
Just discovered that the Google #keyboard can assist in spelling corrections on a previously written text. play.google.com/store/apps/det…
The Stanford campus started using #WordPress 8 years ago, and now they have privileged access to #Overleaf. Food for thought.
Potentially a way to use a single image file and load various resolutions of it on a web page or on screen. twitter.com/newsycombinato…
The #nexus5x has a nice camera but non-removable #battery, no #SD card and no #FM radio. Too expensive in Europe too.

Nice discussion and study on the ergonomics of screens while driving, for lovers of #typography and #design: ustwo.com/blog/cluster/
Great advice for faster weight loss: Don’t.
My post about #Substance.io, the library that serves as the basis of Lens viewer in the #eLife journal and others aboutfoto.wordpress.com/2015/09/23/any…
CEO who raised price of old pill more than $700 calls journalist a ‘moron’ for asking why wpo.st/ePSb0
Eating less without knowing it: smaller portions in smaller plates. cochrane.org/news/portion-p…
Nobody looks smart when driving a Ferrari in a traffic jam.
‘Réseautage’ – an ugly ‘modern’ French word for ‘building relationships’ – heard on the radio.
Reminder: a tweet a day keeps the bad doctor away.
@Science_Open Thank you. So, the term ‘peer review’ is recent, not the process in itself. It is an entirely different perspective.
Admiring scientists who seem to know what questions are worth exploring.
Me: Have you heard of Ebola ? Someone: Oh, yes! Me: It’s an RNA virus and I’m working on RNA. (better intro to my research?)
Still far from the time when the smartphone would automatically search for the bus schedule when you walk towards the bus station.
The humble #Nikon E 50mm f/1.8 is compact light and perfectly usable in low light for a scientific meeting. pic.twitter.com/4sfXVeZJhQ
Introducing computers in schools does not lead to improved performance for pupils, probably the opposite. phys.org/news/2015-09-t…
Wondering why online sellers would still send untracked parcels. Cheaper is definitely bad for business.
Packed with clever ideas and surprising images, “Le tout nouveau testament”. Iconoclastic indeed.
A possible solution for including annotations and data descriptions in a #csv file blog.datacite.org/using-yaml-fro… via @mfenner
Amusing – kibi, mebi and other units for multiples of bytes … twitter.com/ptilouk/status…
“Humans desire certainty, and science infrequently provides it.” sciencemag.org/content/349/62…
The extinction of the Ediacarans was probably due to ‘new kids on the block’ organisms, currently known as animals. phy.so/360415286
There is a reason why 12, 60 or 360 are very much in use – they are all “colossally abundant numbers”. en.wikipedia.org/wiki/Colossall…

Interesting analogy between automatic elevators and Google cars – @NPR: n.pr/1eFebp9
Messy HTML that cannot be removed in a #gmail “Reply”, can be avoided by highlighting some text before clicking Reply gizmodo.com/5963768/the-be…
“Before the introduction of laboratory mice, scientists endured many difficult years with lab bears”. tinyurl.com/ncam8ca
Would love to see pre-defined shapes like arrows and Pacmans in #Inkscape.
Bioinformatics is about extracting meaning from data. Software engineering principles do not apply, mostly (liorpachter.wordpress.com/2015/07/10/the…)
Shouldn’t have invested any money in an e-book reader. An Android smartphone is a better reading device. #FBReader, #AlReader or #ZXReader.
#Unity interface of #Ubuntu fails completely for several colleague scientists. Their conclusion: “#Linux is hard to use and unfriendly”.
Diets are harmful both physically and emotionally. Slight mods of lifestyle are not. medium.com/@NoelDickover/…
Lasting weight loss is slow weight loss – the well-documented argument of “L’Anti-régime” by Michel #Desmurget (enquete-debat.fr/archives/inter…).
“For restorative sleep … sleep on a copper sheet.” An ad for fakirs bothered by iron nails. pic.twitter.com/cGmfS2HEwA
“A key accomplishment of evolution is … piling up endless new ways of doing the same old thing”. Amazon wisdom: amazon.com/review/R1QHCXV…
#Ubuntu 14.04 with default Unity does not have a way to easily set up a time for programmed shutdown on non-laptops. Pretty annoying.

What if MongoDB and NodeJS were not the right choice … A presentation: mcfunley.com/choose-boring-… pic.twitter.com/ejHA1UtYF3
Le prince de Motordu is a great ‘plastic’ (classic, in the prince’s own twisted words) of French children’s literature. To be enjoyed from age 6-7.
For those scared of the flood of data in science, excellent reference. twitter.com/vsbuffalo/stat…
Retweeting a link gives credit to the one who found it but means more clicks for those who are interested in the original story.
For those who want a glimpse into the world of drug addiction: Junk by Burgess leslibraires.fr/livre/7123443-…
Comparing versions of text with #Meld (meldmerge.org) is both cool and efficient. Complements LaTeX and git. pic.twitter.com/2kybtGfd8f
Comments with examples about efficiency of numpy use and difficulties in using an alternative like Pypy or Cython (frama.link/numpy)
Pocket spectrophotometer paper – parallel intensity measurements of light filtered through an array of quantum dots: rdcu.be/dimT
“need a change of institutional culture so that, instead of being rewarded, unfeasibly lengthy CVs are discouraged.” bmj.com/content/351/bm…

The #ReaderView, switched on with the book icon in #Firefox’s address bar helps with diminishing eye strain. pic.twitter.com/fi3sAqfVCj
With ggvis and other web based solutions interactive visualizations might come at last to R (#rstudio). twitter.com/Rbloggers/stat…
Lars Arvestad @arvestad: I worry more about natural stupidity than artificial intelligence.
A colleague with prior #LaTeX experience is enthusiastic about #overleaf. Others prefer #GoogleDocs. Having choices is good.
Interesting documentary, showing that some scientists do not make much use of the scientific method. twitter.com/RichardDawkins…
“Data analysis is an easy, relaxing and wasteful part of research, especially if compared with true bench work giving true results” – yeah.
Struggling for quite some time with hard to understand and remember syntax in R – could #dplyr solve that ?
Correlations, correlations. twitter.com/TRyanGregory/s…
Basics of operations on data frames in R, including how to initialize an empty one with named columns and types: blog.datacamp.com/15-easy-soluti…
Strange that major e-ink #ebook readers do not use an #Android-based OS, since #CoolReader, #FBReader are such great reading apps. #DRM?
#RStudio knows now about code snippets: writing the dreadful *stringsAsFactors=F* becomes so much easier. support.rstudio.com/hc/en-us/artic…
Learn how to obtain the best estimate for the length of the nose of the emperor of China, from Feynman’s autobiography (frama.link/feynman)
RNA degradation intermediates give post-mortem details about how many ribosomes had once travelled along an mRNA. dx.doi.org/10.1016/j.cell…
Carefully crafted tweets with no links are like advice being given without being asked. Makes one think. Pocket literature.

Funny how RSS feeds become redundant after setting up a Twitter account.
Including Python results in a LaTeX document – did not know about Pweave. twitter.com/TeXtip/status/…
Graphics from code – optical illusion images made with LaTeX and TikZ/PSTricks – bit.ly/1SDFVuL twitter.com/DrHammersley/s…
Promising tool (overleaf.com) for shared document editing. Helps students to see how nice reports can be generated with LaTeX.
Science article data analysis suggests fraud and will likely lead to retraction. twitter.com/lakens/status/…
Explaining p value misuse, doing stats in science, also in book form (found on O’Reilly) statisticsdonewrong.com
Managing and recruiting advice from Google (2013) nyti.ms/11MuanK

Pervasiveness of quantitative data in biology raises questions about abusing statistical significance. twitter.com/generalising/s…
Showing more data and using better plots. Yes! twitter.com/KamounLab/stat…
Brilliant, even for the most reluctant students. twitter.com/Rbloggers/stat…

Tweeting more to do better science ? bit.ly/1DlDeZK
“Do the best experiments you can, and always tell the truth. That’s all.” -Sydney Brenner – nautil.us/issue/21/infor…

Svg based presentations with Inkscape and Sozi

Have you ever heard of deck.js, Impress or Reveal? These are tools that allow one to create nice web-based presentations with plenty of animations. As strange as it may sound, I first looked at these apps because I wanted some smooth drop shadows for the pictures in my own LibreOffice presentations. CSS can be used to get that effect. I tried all the above-mentioned libraries and tools and was disappointed by the fact that adding images and graphics seems to be an afterthought. These tools are excellent for text-based presentations, but placing vector graphics, like svg, in a particular position and making graphically rich presentations is not their main domain of use. Users worked around these limitations by writing extensions, such as those available for deck.js.

Wandering on the web led me to a tool that allows graphics-rich presentations to be made with the best open source tool for vector graphics creation: Inkscape. One of the integrated plug-ins, called JessyInk, allows one to use a succession of layers to create an svg file whose layers, when the file is opened in a web browser, are shown one after the other. JessyInk is an impressive tool, allowing one to annotate the presentation ‘live’ or to show an overview through clever keyboard shortcuts, but I did not like the fact that the original svg had to superpose all the layers. Additionally, JessyInk modifies the svg file, and some elements are not rendered at all in the resulting presentation. Not good.

In search of an alternative, I found Sozi, an application that takes a different approach from JessyInk. It allows something that I did not think possible: take a poster, made with Inkscape or another svg editor, and build a presentation from successive views of different parts of the large image. Everything is packaged in an html/json pair of files that combine the original svg with JavaScript magic. While the Sozi interface needs some ergonomic improvements, the tool is robust enough to let me create a 60-slide presentation in an evening, from a large svg file that I already had in Inkscape. A few of the slides can be viewed on my bitbucket site (although some images are missing from the presentation). Scrolling zooms in and out of the slides, and clicking on the slide number opens a nice menu with all the slide titles, for fast navigation. The first slide, a screenshot of my current Ubuntu desktop (clicking on the image goes to the presentation page):

sozi presentation image
Sozi presentation, first slide

I would very likely use Sozi and Inkscape for presentations in the future. There are a few things to consider though:

  1. Images inserted in the SVG file are better handled through relative links and should be kept in a directory that is close to the svg file itself. Moving the presentation around becomes possible if the html, json and img folder are moved together.
  2. For the moment, Sozi does not make it easy to have several versions of a presentation based on the same svg, but this feature might come in the future. It would also be very useful to support svgs where text was converted to paths, so that one does not need to carry exotic fonts around just to be able to show a presentation. To duplicate a presentation, you have to duplicate the html, json and original svg files and rename them so that they match each other.
  3. One has to be familiar with Inkscape and its quirks.
  4. Web browsers are not exactly designed with presentations in mind: the mouse pointer, for example, insists on showing the title of the web page.
  5. Do not use layers in the original Inkscape drawing. Somehow, Sozi becomes confused and might transform the layers differently, giving a very strange result. It is better to use locked elements if you want to make a pattern of rectangular regions as a template for the graphical elements of the presentation.
  6. Always make a backup of your presentation as a PDF – sozi-to-pdf does it gracefully. It is a module that has to be installed separately. It generates a rather large pdf file because it is composed of high-quality png images. Although large, the file does not depend on locally installed fonts or any additional resources to be displayed as you expect.

Update (April 2017). For the type of presentation I usually do, it is better to distribute the images or drawings across separate svg files. This helps a lot in finding a particular image or result, which is more difficult with very large SVG files such as those used for a Sozi presentation.

Customize Ubuntu 14.04 on an HP Zbook

A few notes about things I did to be able to work with Linux on an HP Zbook 14 G2. Dual boot was problematic because gparted had difficulties with the M.2 SATA SSD. In the end, Linux lives happily on the HDD and an NTFS partition allows files to be exchanged in both directions with Windows.

Once Ubuntu 14.04 LTS was installed, several changes were needed to adapt to the display resolution (1920×1080 pixels), adjust battery usage, add some useful utilities and hide or show disk partitions at will:

1. Make fonts larger on high dpi screen for the grub menu (from http://b.wardje.eu/2014/08/increase-font-in-grub-for-high-dpi.html):
sudo grub-mkfont --output=/boot/grub/fonts/DejaVuSansMono24.pf2 --size=24 /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf

Add the following lines to the grub configuration file (can be edited with “sudo gedit /etc/default/grub”):
## More readable font on high dpi screen, generated with
## sudo grub-mkfont --output=/boot/grub/fonts/DejaVuSansMono24.pf2 \
## --size=24 /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf
GRUB_FONT=/boot/grub/fonts/DejaVuSansMono24.pf2

Update grub to take into account the new configuration:
sudo update-grub

2. Install LibreOffice 5:
sudo add-apt-repository ppa:libreoffice/libreoffice-5-0
sudo apt-get update
sudo apt-get dist-upgrade

3. Install a very capable FTP client (lftp) and an alternative file manager (Double Commander):
sudo apt-get install lftp
sudo add-apt-repository ppa:alexx2000/doublecmd
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install doublecmd-gtk

4. Install exFAT support
sudo apt-get install exfat-fuse exfat-utils

5. Command line information on battery usage:

upower -i /org/freedesktop/UPower/devices/battery_BAT0

Get information on battery usage (11.5 W average):
sudo apt-get install powerstat
powerstat -d 1

Test whether hibernate works (it did not work well for me, so forget about it):
sudo pm-hibernate

6. Optimize energy consumption with tlp:

Install tlp from its PPA (it will be included in the main repository in later versions of Ubuntu):
sudo apt-add-repository ppa:linrunner/tlp
sudo apt-get update
sudo apt-get install tlp

Configure TLP:
cd /etc/default/
cp tlp ~/tlpconfig.backup
sudo gedit tlp
# modify SATA option, from http://refugeeks.com/use-tlp-to-optimize-the-power-consumption-in-ubuntu/
# modify SATA_LINKPWR_ON_BAT=min_power to max_performance

7. Hide Windows partitions from XFCE and Unity and automount partitions:
cd /etc/udev/rules.d/
sudo gedit 99-hide-disks.rules
# use the right names for the disks or their identifiers
#KERNEL=="sda1", ENV{UDISKS_IGNORE}="1"
#KERNEL=="sdb2", ENV{UDISKS_IGNORE}="1"
# after a reboot, the sda1 and sdb2 partitions will not be visible any more
# disable the fast startup option in Windows, otherwise shared disks cannot be mounted in Linux

# Automatically mount removable partitions
# https://help.ubuntu.com/community/AutomaticallyMountPartitions
# First, get the identifier of the partition with "mount" -> /dev/sda3
# Get the UUID:
ls -al /dev/disk/by-uuid/
# lrwxrwxrwx 1 root root 10 mars 2 08:31 someuiid...
# Add to startup items:
/usr/bin/udisksctl mount --block-device /dev/disk/by-uuid/someuiid

8. Packages required to install R on Ubuntu:
# Various packages required for installation of R parts, if packages complain about missing libraries:
sudo apt-get install libxml2-dev
sudo apt-get install libssl-dev
sudo apt-get install libcurl4-gnutls-dev

9. Display customization and other tricks:

Change dpi settings in the “Displays” preference > scale 1.38
Install Unsettings to change Unity settings
sudo add-apt-repository ppa:diesch/testing
sudo apt-get update
sudo apt-get install unsettings

Change the brown color of selected item to something else (blue)
sudo apt-get install gtk-theme-config

Make text and other things larger on high dpi for Firefox
Write: *about:config* in the address bar
Then change the layout.css.devPixelsPerPx value to 1.4 (user set string)
The same can be done for Thunderbird (Preferences > Advanced > Edit config)

 

VirtualBox Linux shared folders how-to

Switching from Ubuntu/Debian to Manjaro/Arch Linux as VirtualBox guest operating systems has consequences.

First, Manjaro/Arch already has the VirtualBox Guest Additions pre-installed, which is very convenient. These additions allow running the virtual machine full screen with adjusted resolution and mounting folders shared with the host OS.

Second, an additional manipulation is required for Ubuntu to make shared folders accessible, as described in the following question and answer:

Question: I configured a shared folder between the Windows host and Ubuntu guest. The folder mounts at start up but it’s empty. It also has a padlock sign.

Answer: You, as a user, were not added to the vboxsf group, to which the mounted shared folder belongs. To allow access to the shared folders permanently, run in a Terminal window on the new Ubuntu:

sudo usermod -G vboxsf -a username

where you replace username with your own user name. This command appends the user “username” to the vboxsf group, which is the owner of the shared folders (found in /media/ on the Ubuntu box). You won’t need to run a script at startup once you own the mounted shared folder.

On Manjaro/Arch Linux (with LXDE), the mount does not happen automagically. I created a ‘shared’ folder in my user home directory. Next, I mounted the shared folder using the virtualbox mount command:

sudo mount.vboxsf temporary /home/myusername/shared

EDIT: the command allows some read access but no write access. To have access in both directions as a user, follow the instructions detailed in a superuser answer:

sudo mount.vboxsf -o gid=1000,uid=1000,rw temporary /home/myusername/shared
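The values gid=1000 and uid=1000 are those of the first regular user on many distributions, which is an assumption rather than a rule. A small sketch that reads the IDs of the current user with id instead; the share name temporary is the one from the command above, the mount point is assumed to be a shared folder in the home directory, and the helper name vbox_mount_opts is invented for illustration.

```shell
# Build the mount options from the IDs of the user running the script.
vbox_mount_opts() {
    printf 'gid=%s,uid=%s,rw' "$(id -g)" "$(id -u)"
}

# Dry run: print the resulting command instead of executing it.
echo "sudo mount.vboxsf -o $(vbox_mount_opts) temporary $HOME/shared"
```

The IDs are evaluated before sudo takes over, so they belong to the regular user and not to root, as long as the script itself is not started with sudo.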

I wrote this blog post as a reminder for myself and in the hope that it could be useful for someone else (even if I already wrote some of this information in a previous blog post).

Column order and decimal point changes with awk and sed

Recently, I was confronted with a simple problem that I usually solve in a spreadsheet application: in a text file, change the order of columns and switch from “,” to “.” as the decimal separator. However, since I had many similarly formatted text files and wanted to speed up the conversion, I searched for tools that could help me do these simple tasks without too much hassle. Welcome awk, a programming language and tool for text processing, and sed, a line-by-line editor, both available by default in MacOS X and Linux. As usual, stackexchange questions and answers were extremely useful to quickly find a solution.

While I was awking happily around, I became aware of a problem that I did not expect: awk uses the line feed character (LF) as line terminator, and if the file comes from Windows, it has a carriage return (CR) followed by LF at the end of each line. Thus, the first step needed to get a clean file was to remove those annoying CRs (clearly visible in the following screenshot from Geany, a fantastic text editor):

inputfile

This can be done with the following awk command that removes CR characters while leaving LF in the file:

awk '{ sub(/\r$/,""); print }' infile.txt >outfile.txt

The result is, as expected:

crremoved
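Without a screenshot, the presence of CR characters can also be checked from the command line. A minimal sketch, in which the function name count_cr_lines and the sample file are made up for the example: it counts the lines ending with CR before and after the cleaning step.

```shell
# Count the lines that end with a CR character (0 means plain LF endings).
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Made-up sample: the first line has a Windows ending, the second does not.
sample=$(mktemp)
printf 'a\tb\r\nc\td\n' > "$sample"

count_cr_lines "$sample"        # lines with CR before cleaning
awk '{ sub(/\r$/,""); print }' "$sample" > "$sample.clean"
count_cr_lines "$sample.clean"  # lines with CR after cleaning
```

If the second count is not zero, the file probably contains CR characters somewhere other than at the end of the lines.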

Now, we can proceed with the next step, which is a change in column order. What I needed was to move the 4th column to the second position. Awk comes to the rescue here as well:

awk -F'\t' -v OFS='\t' '{print $1,$4,$2,$3}' infile.txt > outfile.txt

The result looks good, no more CRs and the order of columns is fine:

columnorderchanged

Finally, the numeric values that used “,” as decimal separator were not correctly interpreted by the clustering program. However, changing all the commas to dots was not an option, because column 2 now contains text with meaningful commas. Sed provided a very simple command that replaces only the commas found between two digits:

sed 's/\([0-9]\),\([0-9]\)/\1.\2/g' < infile.txt > outfile.txt

To understand how sed does its thing, one must be familiar with regular expressions.
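As a small illustration of the substitution pattern: \([0-9]\) captures one digit, a comma must follow it directly, and a second captured digit must come right after; \1.\2 then writes both digits back with a dot between them. The sample line below is made up for the example.

```shell
# Hypothetical sample line: a text field with a comma, then two numbers
# that use "," as decimal separator (fields separated by tabs).
line=$(printf 'gene A, isoform 2\t3,14\t0,5')

# Replace a comma only when it has a digit on each side.
fixed=$(printf '%s\n' "$line" | sed 's/\([0-9]\),\([0-9]\)/\1.\2/g')

printf '%s\n' "$fixed"
```

The comma in the text field is followed by a space and is therefore left alone. A comma sitting between two digits inside a text column (a list such as 1,2) would be converted as well, so it is worth checking the text columns first.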

In the end, the file looks exactly as I wanted it to be:

commadotchanged

There is no need to keep intermediate files and these commands can be chained using the “|” pipe operator. Alternatively, they can be put together in a small shell script.
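A minimal sketch of such a script, assuming tab-separated input with four columns as in the example above; the function name convert is made up. The CR removal and the column reordering are merged into a single awk call, and sed takes care of the decimal commas.

```shell
#!/bin/sh
# Read a file on standard input and write the converted version to
# standard output: strip the trailing CR, move column 4 to position 2,
# then turn decimal commas (digit,digit) into dots.
convert() {
    awk -F'\t' -v OFS='\t' '{ sub(/\r$/, ""); print $1, $4, $2, $3 }' |
    sed 's/\([0-9]\),\([0-9]\)/\1.\2/g'
}

# Made-up one-line sample with a Windows line ending.
printf 'a\t1,5\t2,25\tx, y\r\n' | convert
```

For real files, the call becomes convert < infile.txt > outfile.txt, and a for loop over *.txt handles a whole directory of similarly formatted files.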