MINIMAL COMPUTING

The Most Science from the Least Software


1 MOTIVATIONS

1.1 Having Tools Adapted to the Craft

As a researcher, I do most of my work in front of a computer:

  • Reading and writing articles;
  • Reading and writing emails;
  • Operating a telescope, remotely or on site;
  • Processing raw data;
  • Analyzing processed data;
  • Running model computations;
  • Preparing and giving oral talks;
  • And, more and more often, interacting with students and attending meetings.

The computer is thus our workbench, and we should adapt its ergonomics to the tasks we perform. In particular, it is crucial to type as effortlessly as possible, so as not to break our train of thought.


1.2 Optimizing the Environment

A truly ergonomic computer environment must meet several general requirements.

  1. Minimizing muscle movements is essential to typing fast and working long hours. For that reason, we should avoid using the mouse. This is possible if we have a set of shortcuts to switch between workspaces and applications, and to operate these applications without clicking on icons. It means that we need to memorize a lot of shortcuts. For efficiency, we thus need as few distinct sets of key bindings as possible (avoiding duplicates such as C-f and C-s for searching; C-c, C-S-c or M-w for copying; or C-q, C-w or C-x C-c for closing). This implies having essentially one application for all text-editing purposes (including coding and emailing).
  2. It is also important to have a clear display with as little visual clutter as possible. Surprisingly, I found the best solution to be a mostly text-based display, with color highlighting and indentation. It also implies that we want no icons and no Graphical User Interface (GUI).
  3. Finally, to optimize our focus, it is important to completely suppress sound alerts and pop-up windows. A sound announcing each new email is the surest way to break our precious train of thought.
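As an illustration of the first requirement, here is a minimal python sketch (python being the language discussed in Sect. 2.3) that lists the actions reachable through more than one key sequence across applications. The inventory BINDINGS and the function redundant_bindings are hypothetical. The key sequences are the examples quoted above, and which application uses which binding is an assumption.

```python
from collections import defaultdict

# Hypothetical inventory: for each application, the key sequences bound to
# each action. The sequences are the duplicates quoted in the text.
BINDINGS = {
    "emacs": {"search": ["C-s"], "copy": ["M-w"], "close": ["C-x C-c"]},
    "browser": {"search": ["C-f"], "copy": ["C-c"], "close": ["C-w", "C-q"]},
    "terminal": {"copy": ["C-S-c"]},
}


def redundant_bindings(bindings):
    """Return {action: sorted distinct keys} for every action that is
    bound to more than one key sequence across all applications."""
    keys_per_action = defaultdict(set)
    for app_bindings in bindings.values():
        for action, keys in app_bindings.items():
            keys_per_action[action].update(keys)
    return {
        action: sorted(keys)
        for action, keys in keys_per_action.items()
        if len(keys) > 1
    }


if __name__ == "__main__":
    for action, keys in redundant_bindings(BINDINGS).items():
        print(f"{action}: {', '.join(keys)}")
```

Every action this function reports is one more key sequence to memorize. Moving these tasks into a single application brings each list down to one entry.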

1.3 The Inconsistency in Modern Software Development

Unfortunately, the design of modern software and operating systems runs counter to these requirements in every way. This is mostly because major companies design computers to be used by non-experts. There is also a marketing motivation: fancy graphical interfaces and an environment that imposes its own ergonomics on the user justify selling more licenses and presenting every new release as a © revolution.

A consequence of this design is that updating software or an operating system, or changing computers, is time-consuming and ends up forcing us to modify our set-up. If, like me, one works on several computers simultaneously and changes them frequently, this becomes an unacceptable waste of time. It also forces us to work with a set-up that hampers our productivity.

Finally, commercial software is often just a graphical interface spawning simple commands (e.g. back-up software spawning rsync). Most of the tasks I perform on computers can be done with free, open source software. In the long term, it even provides a productivity gain, better portability and what is nowadays improperly called reproducibility¹.
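To show how thin such a wrapper can be, here is a minimal python sketch that builds and runs the rsync command a back-up GUI would spawn. The function names and the paths are hypothetical. The -a, --delete and --dry-run flags are standard rsync options.

```python
import subprocess
from pathlib import Path


def rsync_command(source, destination, delete=False, dry_run=False):
    """Build the rsync call that a typical back-up GUI would hide.

    -a (archive) recurses and preserves permissions, times and symlinks;
    --delete also removes files that disappeared from the source;
    --dry-run only reports what would be transferred.
    """
    cmd = ["rsync", "-a"]
    if delete:
        cmd.append("--delete")
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies its content, not the folder itself.
    cmd += [str(source).rstrip("/") + "/", str(destination)]
    return cmd


def backup(source, destination, **options):
    """Run the back-up; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(source, destination, **options), check=True)


if __name__ == "__main__":
    # Hypothetical paths: the Documents folder mirrored onto an external disk.
    print(" ".join(rsync_command(Path.home() / "Documents",
                                 "/media/backup/Documents", delete=True)))
```

Passing dry_run=True is a cheap way to check what a back-up would do before running it for real.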


2 PERSONAL IMPLEMENTATION

2.1 Advantages of a Unix Command Line Set-Up

I use Ubuntu on all my computers. This free, open source operating system is customizable enough to implement the requirements listed in Sect. 1. I use a minimal desktop (xubuntu) and remove all icons, docks, etc. I manage my files in bash using an X-terminal, and I launch all my applications from the xterm. I even perform simple image manipulations on the command line, using ImageMagick.

One of the advantages of this set-up is that it is instantly portable from one machine to another. I even use the same set-up files on all my computers. They are automatically synchronized with ownCloud, a service similar to Dropbox or Google Drive. Bash has very good backward compatibility: I started writing these configuration files as a student, twenty years ago, and have never had a problem.

  • My detailed Ubuntu installation: HTML or PDF.
  • My bash configuration files: HTML.
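The synchronization described above can be sketched in python as follows. The sketch assumes, hypothetically, that the set-up files live in an ownCloud folder ~/ownCloud/config and that each machine only holds symbolic links to them. The folder layout, the file names and the link_dotfiles function are illustrative assumptions, not the actual set-up linked above.

```python
from pathlib import Path

# Hypothetical layout: each set-up file is stored once in the synchronized
# folder (e.g. ~/ownCloud/config/bashrc); machines only hold symbolic links.
SYNCED_DIR = Path.home() / "ownCloud" / "config"
DOTFILES = ["bashrc", "bash_aliases", "emacs"]


def link_dotfiles(synced_dir, home, names):
    """Point <home>/.<name> to <synced_dir>/<name>; return the links created.

    Existing regular files are left untouched (never overwritten), and
    existing links are refreshed, so the function can be rerun safely.
    """
    created = []
    for name in names:
        target = Path(synced_dir) / name
        link = Path(home) / f".{name}"
        if link.is_symlink():
            link.unlink()            # refresh a stale or outdated link
        elif link.exists():
            continue                 # a real file: do not destroy it
        link.symlink_to(target)
        created.append(link)
    return created
```

On a new machine, a single call such as link_dotfiles(SYNCED_DIR, Path.home(), DOTFILES) would then restore the environment. Existing regular files are skipped, so nothing is overwritten by accident.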

2.2 Almost Everything with Emacs

Emacs is the obvious text editor to fulfill the requirements of Sect. 1.

  • It implements efficient ASCII/UTF-8 text editing, with a lot of interesting functions, such as:
    • key-bound text selection, including rectangular selection, for cutting and pasting;
    • complex search and replace, using regular expressions;
    • parenthesis matching, etc.
  • It is fully customizable, including its key bindings.
  • It has good syntax highlighting and other functions for coding in most languages.
  • Scientific articles and complex documents can be typed in LaTeX within emacs. The produced PDF file can be viewed in a separate window and we can easily go back and forth between the code and the PDF. Writing LaTeX documents with emacs means that we need to manage our bibliography within emacs, too. This is conveniently done with helm-bibtex, which allows the user to manage the references, the PDF file of each article and the reading notes.
  • I make my talk slides in LaTeX, using the beamer package. I also make my posters in LaTeX. Apart from the high quality of the result, the fact that LaTeX is heavily template-based forces me to make rather simple slides with few text lines.
  • The crown jewel of emacs is its ORG mode. It is a sophisticated mark-up mode allowing me to quickly take notes structured into sections, lists, tables, etc. The source remains a simple, readable text file that can be exported to various formats, including HTML or PDF. All my web pages have been written in ORG mode and exported to HTML or PDF (even the musical scores here, using org-lilypond).
  • ORG mode also allows us to write complex lists of tasks and schedule them on different dates. Such a scheduled-task calendar can be viewed as a regular agenda. External calendars, such as Google Calendar, can be imported into emacs and viewed together with the scheduled tasks. Everyone I know who started using ORG mode to organize their work days has seen a tremendous increase in productivity. This way of working allows me to efficiently organize my dissipative activities (meetings, administrative duties, etc.), which are time-consuming but do not require deep focus, giving me better quality time to devote to the hard stuff (Science).
  • Finally, emails can be efficiently read and written with emacs. ORG mode can be used to write emails that will be exported to HTML before being sent. We can thus effortlessly write emails with lists, tables, mathematical notations, and code snippets. Several mailboxes can be read within the same session. I read the same mailboxes from different computers without any issue. Attachments can be removed before storing the email. The mu utility is one of the most efficient applications to search through thousands of emails.
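Because an ORG file is plain text, any tool can read its scheduled tasks, not only emacs. The python sketch below extracts SCHEDULED entries and sorts them into a small agenda. In practice emacs does this itself (org-agenda). The sample notes and the agenda function are hypothetical, and the parser only handles the simple timestamp form shown.

```python
import re
from datetime import date

# Hypothetical ORG notes: headings start with '*', and scheduled tasks carry
# a timestamp of the form SCHEDULED: <YYYY-MM-DD Day>.
ORG_NOTES = """\
* Research
** TODO Finish the SED fits
   SCHEDULED: <2024-01-12 Fri>
* Administration
** TODO Referee report
   SCHEDULED: <2024-01-08 Mon>
** Notes from the group meeting
"""

HEADING = re.compile(r"^(\*+)\s+(?:TODO\s+|DONE\s+)?(.*)$")
SCHEDULED = re.compile(r"SCHEDULED:\s*<(\d{4})-(\d{2})-(\d{2})")


def agenda(org_text):
    """Return [(date, heading)] for scheduled entries, sorted by date."""
    entries, current = [], None
    for line in org_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2)   # title without stars or TODO keyword
            continue
        scheduled = SCHEDULED.search(line)
        if scheduled and current is not None:
            day = date(*map(int, scheduled.groups()))
            entries.append((day, current))
    return sorted(entries)


if __name__ == "__main__":
    for day, task in agenda(ORG_NOTES):
        print(day.isoformat(), task)
```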

Here is my emacs configuration file: HTML or PDF or ORG. A typical session looks like the following (xterm and emacs, the winning combo).

[Screenshot: a typical xterm and emacs session.]

I use emacs only for text editing. I am more of a "tool-box guy" than a "Swiss-army-knife guy". However, it is possible to do more than text with emacs; see, e.g., Everything with Emacs.

2.3 Python, the Unavoidable Bad Solution

We cannot do everything in bash and awk. More complex tasks require higher-level programming languages. In particular, an astrophysicist needs, in everyday work, a language to:

  • read, manipulate and export data of any type;
  • make figures;
  • manipulate astronomical images and spectra;
  • perform non-CPU-intensive calculations.

Ten years ago, IDL was probably the most popular language for these tasks. It had, however, one disadvantage: the price of its license. People have slowly moved to another language: python. Why the whole community converged toward this solution is a mystery to me. In my opinion, a more suitable solution would have been to invest in GDL (the GNU Data Language, a free, IDL-compatible implementation).

Python indeed presents a lot of problems.

  1. Its backward compatibility is terrible. The syntax of its essential modules often changes: sometimes the interface changes, sometimes a function is moved to another module. Also, the python2 to python3 transition has been apocalyptic. In my own collaborations, important codes written by students in python2 only a few years ago are unusable, even within Anaconda. And these students, now working on other projects, do not have time to maintain them (so much for reproducibility…). This is the most problematic issue. It casts serious doubt on the ability of researchers, whose job is not to spend their time maintaining software, to use python to build well-tested libraries of scientific modules over several years.
  2. It is rather slow and uses a lot of RAM. Looping is inefficient. I understand that it is better to vectorize operations (i.e. apply them to whole arrays at once), but this is not always possible.
  3. It has probably the stupidest array indexing convention. Slices exclude their upper bound, so to take all the elements of an array of size N, one has to type a[0:N]. Most other languages use inclusive bounds, running either from 1 to N (e.g. a(1:N) in Fortran) or from 0 to N-1 (e.g. a[0] to a[N-1] in C++), which is the way you would write it mathematically.
  4. The fact that arrays, which are one of the most fundamental ways of representing scientific data, have to be imported from a module (numpy) shows that this language was not originally intended for scientific use.
  5. The error messages can sometimes be enigmatic.
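The python sketch below illustrates issues 1 to 3. It shows the change in the meaning of / between python2 and python3, an explicit loop versus a single vectorized numpy call, and half-open slicing. The function names are illustrative. The two sums agree exactly here because every partial sum is an integer small enough to be represented exactly in double precision.

```python
import numpy as np

# 1. Backward compatibility: in python2, 7 / 2 evaluated to 3 (integer
#    division); in python3 it evaluates to 3.5, and floor division is //.
print(7 / 2, 7 // 2)  # prints: 3.5 3

# 2. Looping: the same sum of squares as an explicit python loop and as a
#    single vectorized numpy call applied to the whole array at once.
x = np.arange(100_000, dtype=np.float64)


def sum_squares_loop(values):
    total = 0.0
    for v in values:           # one interpreted iteration per element
        total += v * v
    return float(total)


def sum_squares_vectorized(values):
    return float(np.dot(values, values))   # the loop runs in compiled code


print(sum_squares_loop(x) == sum_squares_vectorized(x))

# 3. Indexing: slices are half-open, so a[i:j] stops before index j.
#    a[0:N] is the whole array; a[0:N-1] silently drops the last element.
a = np.array([10, 20, 30, 40])
N = a.size
print(a[0:N], a[0:N - 1], a[N - 1])
```

Timing the two functions, e.g. with timeit.timeit(lambda: sum_squares_loop(x), number=1), shows the cost of the python loop on a given machine. The ratio depends on the hardware and the numpy version.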

Yet it is impossible to avoid python, because of its massive community support and the large number of libraries developed for it. I thus designed my own python library to avoid the issues listed above as much as possible.

  • I try to call as few python library functions as possible directly in my programs. Instead, I define my own modules that call these functions. This way, if a function changes after the next update, I only have to update my module, instead of potentially having to update hundreds of programs using this function.
  • I try to avoid using too many external modules. In particular, I code my own version of simple functions. For instance, there is no point in calling astropy's blackbody function when it takes only one line to code it.
  • I use python mostly for office automation, basic data manipulation and plotting. I move to fortran (a language with an exemplary backward compatibility and the best array handling) each time I have to make a serious calculation.
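The first two practices can be sketched as a small personal module. It is called mytools.py here, a hypothetical name, not the contents of Python de Panurge. The integrate function hides the NumPy 2.0 deprecation of np.trapz in favor of np.trapezoid behind one wrapper. The blackbody function is the one-line Planck function B_ν(T) = (2hν³/c²) / (exp(hν/kT) - 1), in SI units. Using np.expm1 for the exponential minus one is a choice made here for accuracy at low frequency.

```python
"""mytools.py: a hypothetical personal module wrapping library calls.

Programs import mytools instead of calling numpy directly, so a change in
numpy's interface only has to be fixed here.
"""
import numpy as np

# Physical constants, exact values in the current SI.
H_PLANCK = 6.62607015e-34   # J s
K_BOLTZ = 1.380649e-23      # J/K
C_LIGHT = 2.99792458e8      # m/s


def integrate(y, x):
    """Trapezoidal integral of y(x).

    NumPy 2.0 deprecated np.trapz in favor of np.trapezoid, so the choice
    is made once here rather than in every program.
    """
    if hasattr(np, "trapezoid"):
        return np.trapezoid(y, x)
    return np.trapz(y, x)


def blackbody(nu, T):
    """Planck function B_nu(T) in W m^-2 Hz^-1 sr^-1 (nu in Hz, T in K)."""
    return 2 * H_PLANCK * nu**3 / C_LIGHT**2 / np.expm1(H_PLANCK * nu / (K_BOLTZ * T))
```

Both functions accept numpy arrays, so a whole spectrum can be computed and integrated without a python loop, e.g. integrate(blackbody(nu, 20.0), nu) for a frequency grid nu.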

2.4 Open Source Codes

My Gitlab repository:
https://gitlab.com/fredericgalliano.
SwING (SoftWare for Investigating Nebulae and Galaxies):
my library of fortran codes for performing SED fitting.
Python de Panurge:
my multi-purpose python library.
Standalone utilities in python:
  1. structured_output;
  2. robust_moments;
  3. edit_bbl;
  4. diceware_fr.

3 CHEAT SHEETS

Below are my notes on different languages, protocols and editors.

3.1 System

Ubuntu:
HTML or PDF.
Bash:
HTML or PDF.
Awk:
HTML or PDF.

3.2 Editors

Elisp:
HTML or PDF.
Emacs:
HTML or PDF.
Vi:
HTML or PDF.

3.3 Coding

Html:
HTML or PDF.
Makefile:
HTML or PDF.
Git:
HTML or PDF.

3.4 Miscellaneous

Google search:
HTML or PDF.

Footnotes:

1

Nowadays, the term reproducibility, which is an important condition of the scientific method, is being systematically misused. In its original epistemological sense, this term meant that, for a study to belong to the empirical sciences, anyone with the necessary resources and skills should be able to reproduce the different steps presented by this study and arrive at the same conclusion. This was in particular advocated by Popper as a way of addressing the problem of intersubjectivity. Today, a lot of conceited people advocate reproducibility in the sense that every study must be accompanied by computer code that allows anyone running it to reproduce the figures and numerical values presented in the paper. Releasing software is not a bad practice. However, it presents several problems.

  1. This does not ensure true, epistemological reproducibility. For instance, if there is a bug in the code, or if the code is not fully open source, the results might not be reproducible. One of the benefits of reproducibility is allowing different researchers to go independently through all the steps and eventually find issues.
  2. Truly reproducible studies might be excluded from this category if they do not provide their code.
  3. Forcing researchers to systematically release their codes can be counter-productive.
    • Releasing poorly written codes that will not be maintained is already leading to information overload, in which well-written, maintained software becomes less visible.
    • Releasing a code before its authors have fully exploited it may let quick-look studies or better-funded groups scoop the people who invested time writing it.

Author: F. Galliano
Last update: 5 Jan. 2024