Little Programming Note

Some good practices for R, Python and bash programming, plus a few notes to myself.

R

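  • When saving a single object, use .rds (saveRDS()/readRDS()) instead of .RData. readRDS() returns the object, so you can assign it to whatever name you like, while load() restores an .RData object under its original name. That is inconvenient when you run many replicates of a simulation with one set of core code: every replicate's result comes back under the same name and silently overwrites any existing object with that name.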

  • When combining many .rds files into one final output for performance evaluation, list.files() is often useful. With a little knowledge of regular expressions (passed through its pattern argument), your code can be both general and precise.

  • Keep code general: use as few hard-coded parameters as possible, and pass them to the script as command-line arguments instead.

  • Use library() instead of require() when loading packages. require() behaves much like try(library()): if the package is not installed, it only issues a warning and returns FALSE. The code does not stop there, so you only notice the problem much later, when you call a function from the package that was never loaded. library(), in contrast, throws an error right away.

  • Use relative paths instead of absolute paths. Hard-coded absolute paths make it difficult for your collaborators to replicate your work, since their folder organization is not exactly the same as yours. Instead, write paths relative to the working directory: ./ denotes the current folder and ../ the parent folder; to go further up, use ../../ and so on.

  • sample(N) will generate a random permutation of integers 1 to N.

  • If a specific level of a factor should serve as the reference level, use relevel() to redefine the reference level before fitting the model.

  • To obtain the product of a vector with its own transpose, x %*% t(x) (the outer product, a square matrix), use tcrossprod(x).

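  • Explore get() and assign(): get() retrieves an object by its name given as a string, and assign() creates an object under a name given as a string. They may come in handy at some point.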

  • When subsetting a single column from a matrix, R drops the result to a plain vector without dimensions. To keep it as a one-column matrix, use mymatrix[ , i, drop = FALSE].

  • Make good use of the outer() function to avoid nested (two-level) loops.
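A minimal sketch of the .rds versus .RData tip, assuming a small simulation result held in an object called `res` (a hypothetical name) and files written to a temporary folder. It shows that readRDS() lets you choose the name, while load() brings back the original name and only returns that name:

```r
# A hypothetical simulation result
res <- data.frame(rep = 1, estimate = 0.42)

rds_file   <- file.path(tempdir(), "rep001.rds")
rdata_file <- file.path(tempdir(), "rep001.RData")

# saveRDS() stores only the value; readRDS() returns it, so we pick the name
saveRDS(res, rds_file)
fit_rep1 <- readRDS(rds_file)

# save() stores the name as well; load() recreates `res` in the current
# environment (overwriting any existing `res`) and returns only the names
save(res, file = rdata_file)
rm(res)
loaded_names <- load(rdata_file)
print(loaded_names)  # [1] "res"
```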
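The list.files() tip for combining replicate outputs can be sketched as follows. The file naming scheme `sim_rep001.rds`, `sim_rep002.rds`, ... and the output folder are assumptions for illustration; the regular expression passed to pattern matches only those files and skips anything else in the folder:

```r
# Simulate per-replicate output: one .rds file per replicate (hypothetical names)
out_dir <- file.path(tempdir(), "sim_out")
dir.create(out_dir, showWarnings = FALSE)
for (i in 1:5) {
  saveRDS(data.frame(rep = i, estimate = i / 10),
          file.path(out_dir, sprintf("sim_rep%03d.rds", i)))
}
# A stray file that the pattern below should ignore
writeLines("notes", file.path(out_dir, "sim_rep_notes.txt"))

# "sim_rep", exactly three digits, then ".rds" at the very end
files <- list.files(out_dir, pattern = "^sim_rep[0-9]{3}\\.rds$",
                    full.names = TRUE)

# Read every replicate and stack the results into one data frame
results <- do.call(rbind, lapply(files, readRDS))
```

Anchoring the pattern with ^ and $ keeps it precise: a file such as `old_sim_rep001.rds.bak` would not slip in.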
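One way to pass parameters on the command line, as suggested above, is base R's commandArgs(). The script name `run_sim.R`, the argument order (seed, then sample size) and the default values are all hypothetical; this is a sketch, not a fixed convention:

```r
# Usage (hypothetical script name and argument order):
#   Rscript run_sim.R 42 500
args <- commandArgs(trailingOnly = TRUE)

# Fall back to defaults when an argument is not supplied
seed  <- if (length(args) >= 1) as.integer(args[1]) else 1L
n_obs <- if (length(args) >= 2) as.integer(args[2]) else 100L

set.seed(seed)
x <- rnorm(n_obs)

# Name the output after the seed so replicates never overwrite each other
out_file <- file.path(tempdir(), sprintf("sim_rep%03d.rds", seed))
saveRDS(x, out_file)
```

The same core code then serves every replicate; only the arguments change between runs.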
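To see the difference between require() and library() described above, try loading a package that is not installed. The name `notARealPackage123` is made up for this sketch:

```r
# A package name that is (almost certainly) not installed; hypothetical
pkg <- "notARealPackage123"

# require() only warns and returns FALSE, so execution carries on
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)  # [1] FALSE

# library() stops with an error, which surfaces the problem immediately
msg <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) conditionMessage(e))
print(msg)
```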
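A small sketch of the relative-path tip, assuming a hypothetical project layout with a `data/` folder and a `code/` folder side by side. Since relative paths are resolved against the working directory, a script run from `code/` reaches the data through `../data/`:

```r
# Build a throwaway project in a temporary folder (hypothetical layout):
#   project/data/input.csv
#   project/code/
proj <- file.path(tempdir(), "project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "code"), showWarnings = FALSE)
write.csv(data.frame(id = 1:3), file.path(proj, "data", "input.csv"),
          row.names = FALSE)

# Pretend we are running a script that lives in project/code/
old_wd <- setwd(file.path(proj, "code"))

# "../data/input.csv": go up one level, then into data/
dat <- read.csv(file.path("..", "data", "input.csv"))

setwd(old_wd)  # restore the original working directory
```

file.path() joins the pieces with "/", so the same line works for a collaborator whose project sits anywhere on their machine.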
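A quick check of the sample(N) tip. The seed is arbitrary. The last lines show a related pitfall from the same behavior: a numeric vector of length one is treated as N, so sample() on it permutes 1 to N rather than returning the single value:

```r
set.seed(2024)
perm <- sample(10)  # a random permutation of 1:10
print(perm)

# Every integer from 1 to 10 appears exactly once
print(identical(sort(perm), 1:10))  # [1] TRUE

# Pitfall: a length-one numeric x is treated as 1:x
x <- c(7)
print(length(sample(x)))  # [1] 7
```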
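The relevel() tip, sketched with made-up data: a hypothetical `group` factor with levels "control", "drugA" and "drugB". By default the first level in alphabetical order ("control") is the reference; after relevel(), the lm() coefficients are contrasts against "drugA":

```r
set.seed(1)
d <- data.frame(
  group = factor(rep(c("control", "drugA", "drugB"), each = 10)),
  y     = rnorm(30)
)
print(levels(d$group))  # "control" "drugA" "drugB"

# Make drugA the reference level before fitting the model
d$group <- relevel(d$group, ref = "drugA")
fit <- lm(y ~ group, data = d)
print(names(coef(fit)))  # "(Intercept)" "groupcontrol" "groupdrugB"
```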
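For the tcrossprod() tip, a small check that tcrossprod(x) gives the same square matrix as the explicit product x %*% t(x), and as outer(x, x):

```r
x <- c(1, 2, 3)

# tcrossprod(x) computes x %*% t(x), treating x as a column vector;
# the result is a 3 x 3 matrix with entries x[i] * x[j]
m <- tcrossprod(x)
print(m)

# Same result as the explicit product and as outer(x, x)
stopifnot(isTRUE(all.equal(m, x %*% t(x))),
          isTRUE(all.equal(m, outer(x, x))))
```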
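A small illustration of get() and assign(), using hypothetical variable names `rep1`, `rep2`, `rep3` built from strings. mget() (also base R) fetches several objects by name at once and returns a named list:

```r
# assign() creates a variable whose name is built from a string
for (i in 1:3) {
  assign(paste0("rep", i), i^2)
}

# get() looks a variable up by its name, given as a string
print(get("rep2"))  # [1] 4

# Collect several objects back by name into a named list
vals <- mget(paste0("rep", 1:3))
```

When the number of objects grows, keeping them in a named list from the start is often easier to manage than many assign()-ed variables.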
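The drop = FALSE tip in action, on a small example matrix. Without it, the single column loses its dimensions, so a later nrow() or %*% can silently misbehave:

```r
mymatrix <- matrix(1:6, nrow = 2)

col_default <- mymatrix[, 2]                # dimensions dropped: plain vector
col_kept    <- mymatrix[, 2, drop = FALSE]  # stays a 2 x 1 matrix

print(dim(col_default))  # NULL
print(dim(col_kept))     # [1] 2 1
```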
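The outer() tip can be sketched by computing the same grid twice, once with a nested loop and once with outer(). The function x^2 + y is an arbitrary example:

```r
x <- 1:3
y <- 1:4

# Nested-loop version
res_loop <- matrix(NA_real_, nrow = length(x), ncol = length(y))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    res_loop[i, j] <- x[i]^2 + y[j]
  }
}

# outer() evaluates FUN on every (x[i], y[j]) pair in one call;
# FUN must accept vectors, which this anonymous function does
res_outer <- outer(x, y, function(a, b) a^2 + b)
```

outer() passes whole vectors to FUN, so a function that only handles scalars (for example one containing if (a > b)) needs wrapping in Vectorize() first.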

Bash

  • grep -il keyword *.R lists all R files in the current directory that contain the string keyword, ignoring case; drop -l to see the matching lines themselves
  • pdftotext myfile.pdf - | wc -w counts the words in myfile.pdf (the - tells pdftotext to write the text to standard output)

Yishu Xue
Data Scientist / Coder / Novice Sprinter / Gym Enthusiast