GIT tutorial

I finally decided to use GIT for one of my new projects. I know this sounds odd given that everybody seems to use it nowadays… Having used SVN for a while, the transition should be smooth though. There are plenty of tutorials on the web and I do not pretend to write a better one (see e.g. the link at the bottom of this page). This post does NOT explain what GIT is or how it compares to competing tools. Instead, it lists some of the commands I used to set up my first GIT project.

Creating a project and the basics

First, I created an account on bitbucket (I needed a private repo) where instructions to set up the GIT repository were provided, which was definitely useful.

Starting from scratch, create a directory locally and make it GIT compatible:

    mkdir projectName
    cd projectName
    git init

Then, you need to connect your local git (projectName) to the server repository

git remote add origin https://username@bitbucket.org/username/projectName

Here, “remote” refers to a repository hosted elsewhere: on the Internet, on your local network, or even in another directory on the same machine.

Note also that if you clone a repository, Git automatically creates a remote named origin that points back to the cloned repository. This is not the case here since I started the project from scratch using “git init”.
Therefore, the origin remote was not created automatically, hence the “add origin”.

You can then create new files, then add and commit them as if you were using SVN:

echo "# This is my README" >> README.txt
git add README.txt
# commit your file to the local repository
git commit -m "Adding a README file."

The difference with SVN is that the commit is performed locally. You then need to “push” your commits to the server:

git push -u origin master

Here “master” stands for the name of the branch.

A collaborator can then get the server repository by “cloning” it on his/her local machine:

git clone https://bitbucket.org/username/projectName/

And obtain the latest updates by “pulling”:

git pull

Like in SVN, you can see the status as follows:

git status

GIT configuration

Some configuration can be done to have nicer output

git config --global color.ui true
git config --global color.status auto
git config --global color.branch auto

By default Git uses the system default editor which is taken from the VISUAL or EDITOR environment variables if set. You can configure a different one via the following setting.

# setup vim as default editor for Git (Linux)
git config --global core.editor vim
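
The lookup order can be sketched in Python as follows. The precedence used here (GIT_EDITOR, then core.editor, then VISUAL, then EDITOR, then a built-in default that is usually vi) follows the Git documentation for `git var`; the function `pick_editor` and its arguments are hypothetical names that only illustrate the precedence, not how Git is implemented.

```python
def pick_editor(environ, core_editor=None, fallback="vi"):
    """Mimic the precedence Git uses to choose an editor (illustrative only).

    environ     -- mapping of environment variables
    core_editor -- value of the core.editor setting, if any
    fallback    -- built-in default (usually vi)
    """
    candidates = (
        environ.get("GIT_EDITOR"),
        core_editor,
        environ.get("VISUAL"),
        environ.get("EDITOR"),
    )
    # The first non-empty candidate wins
    for candidate in candidates:
        if candidate:
            return candidate
    return fallback

# core.editor takes precedence over VISUAL and EDITOR
print(pick_editor({"EDITOR": "nano"}, core_editor="vim"))  # vim
```

This is why setting core.editor as above works even if EDITOR points to something else.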

Reverting back to previous versions

If you’ve made some local changes (not yet committed) and want to cancel them, use:

    git checkout HEAD <filename>

You can replace HEAD with a commit identifier (found with git log) to restore the file as it was in that commit.

Merging branches

Here is a nice and complete documentation: GIT doc

Posted in Software development

Entropy and information

In everyday parlance, entropy refers to the inevitable deterioration of a system (including a society). As you may remember from a physics course, there is a far more formal definition that originates from thermodynamics. I’ll spare you the equation but in brief, entropy is a measure of the amount of thermal energy (heat) that is NOT available to do work.

Entropy is used in many other contexts such as cosmology and chemistry and, which is what interests us here, in information theory.

In information theory, entropy can refer to a measure of the uncertainty of a random variable, of its unpredictability, or of its information content. Let us consider the latter. What does information mean?

Let us consider a simple example: a card drawn from a standard deck of 52 cards. Consider 3 events E1, E2 and E3.

  • E1 means the card is a heart. E1 probability is 1/4
  • E2 means the card is a 7. E2 probability is 1/13
  • E3 means the card is the seven of hearts. E3 is the intersection of E1 and E2 (both occur at once) and has a probability of 1/52

An event that has a low probability is interesting from the information point of view: it has a high information content. Conversely, a high-probability event has less information content. We are interested in low-probability events.

Let us try to get a feeling for what kind of information the 3 events mentioned above contain. If we try to sort them, it makes sense to write:

  • I(E3) >= I(E2) >= I(E1) (based on the probabilities)

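In addition, since E3 occurs exactly when both E1 and E2 occur, and the suit and the rank of a card are independent, intuition suggests that the information adds up: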

  • I(E3) = I(E2) + I(E1)

Finally, an information function would require

  • I(Ei) >= 0

It turns out that few functions satisfy the 3 conditions above, and it can be shown that they take the following form, where K is a positive constant and a > 1 is the base of the logarithm:

I(E) = – K log_a P(E)
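
As a quick numerical check of the card example, here is a minimal Python sketch. It assumes K = 1 and a = 2 (so that information is measured in bits); these values and the function name `information` are illustrative choices, not part of the definition above.

```python
import math

def information(p, K=1.0, base=2):
    """Information content I(E) = -K * log_base(P(E)) for a probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -K * math.log(p, base)

# Probabilities of the three card events
p_e1 = 1 / 4    # the card is a heart
p_e2 = 1 / 13   # the card is a seven
p_e3 = 1 / 52   # the card is the seven of hearts

i_e1 = information(p_e1)
i_e2 = information(p_e2)
i_e3 = information(p_e3)

# Condition 1: the rarer the event, the more information it carries
assert i_e3 >= i_e2 >= i_e1
# Condition 2: information adds up for independent events
assert math.isclose(i_e3, i_e1 + i_e2)
# Condition 3: information is never negative
assert min(i_e1, i_e2, i_e3) >= 0
```

The additivity holds because the logarithm turns the product 1/4 × 1/13 = 1/52 into a sum.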

Posted in Notes

latex: same footnote but different reference places (with hyperlink)

I had the following issue with LaTeX: I first created a footnote. Later on in the text I wanted to refer to the same footnote. The issue is that the footnote contains a hyperlink. After reading the discussion referenced below, I settled on the following solution, which avoids complicated workarounds and extra packages.

Place a label inside the first footnote and, for the second reference, simply add a superscript that contains a reference to the label:

text \footnote{some text\label{xx}} blablabla \textsuperscript{\ref{xx}}

:reference: http://tex.stackexchange.com/questions/35043/reference-different-places-to-the-same-footnote

Posted in Computer Science

Python: ValueError: unknown locale: UTF-8

When installing bioservices (a Python package) using pip or easy_install, I got this error on Mac OS X 10.7 (Lion):

ValueError: unknown locale: UTF-8

Although the installation is successful, the error will appear as soon as we try:

from bioservices import *

One solution is to fix your environment by typing the following code in a shell:

export LANG="it_IT.UTF-8"
export LC_COLLATE="it_IT.UTF-8"
export LC_CTYPE="it_IT.UTF-8"
export LC_MESSAGES="it_IT.UTF-8"
export LC_MONETARY="it_IT.UTF-8"
export LC_NUMERIC="it_IT.UTF-8"
export LC_TIME="it_IT.UTF-8"
export LC_ALL=

You can check if it works by typing

python -c 'import locale; print(locale.getdefaultlocale());'

If this works without error, then it is fixed and you should be able to import bioservices. If so, make this solution persistent by adding the code to your shell start-up file: just copy and paste it into the file called .bash_profile (or .bashrc) in your home directory.
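
To see which variable is responsible before changing anything, a small Python check can help. This is a minimal sketch under an assumption not stated above: that the error is triggered by a locale variable holding only an encoding name (for instance LC_CTYPE=UTF-8), which is a common cause on Mac OS X. The function name `encoding_only_locale_vars` is hypothetical.

```python
import os

LOCALE_VARS = ("LANG", "LC_ALL", "LC_CTYPE", "LC_COLLATE", "LC_MESSAGES",
               "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

def encoding_only_locale_vars(environ=None):
    """Return {name: value} for locale variables set to a bare encoding.

    A value such as "UTF-8" lacks the language part found in a full
    locale name such as "it_IT.UTF-8".
    """
    environ = os.environ if environ is None else environ
    bad = {}
    for name in LOCALE_VARS:
        value = environ.get(name, "")
        if value.upper() in ("UTF-8", "UTF8"):
            bad[name] = value
    return bad

# Example with an explicit mapping instead of the real environment
print(encoding_only_locale_vars({"LC_CTYPE": "UTF-8", "LANG": "it_IT.UTF-8"}))
```

Called without arguments, the function inspects the current environment; an empty result means none of the variables has this particular problem.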

Reference: http://patrick.arminio.info/blog/2012/02/fix-valueerror-unknown-locale-utf8/

Posted in Uncategorized

Using valgrind to debug R programs

If you want to debug your R program because of a segmentation fault or memory leak, you can use valgrind as follows in a shell:

R -d "valgrind --leak-check=full --show-reachable=yes" -f your_script.R

Posted in R

ipython error

If you get this kind of error when starting ipython:

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/kernmagic/__init__.py in activate(ip, *args)
     14             continue
     15         magic_name = name[len('magic_'):]
---> 16         ip.shell.define_magic(magic_name, getattr(mymagics, name))
     17 
     18 def activate_aliases(ip, *args):
 
AttributeError: 'TerminalInteractiveShell' object has no attribute 'shell'

you may solve the problem by deleting the special directory .ipython in your home directory. It worked for me at least. Of course, you lose your previous ipython configuration; a new directory will be created by ipython.

Posted in Python

Accessing Life Science Web Services from Python

In a previous post, I started discussing Web Services in Life Science and what a maze it is to navigate between them.

Some Web Services provide Python code to access their databases, but not all of them. Besides, the proposed code may cover only a subset of what you need, or may simply not be homogeneous from one service to another.

I wrote a Python package called BioServices (with the help of other people; see link below) that provides easy access to some major Web Services including UniProt, KEGG and PSICQUIC.

If you are interested, please visit the main page hosted on Pypi: https://pypi.python.org/pypi/bioservices/

As an example, here is how to access the UniProt web service to retrieve the unique identifier of a protein known only by its gene name (MEK1):

>>> from bioservices import UniProt
>>> u = UniProt()
>>> d = u.quick_search("MEK1+and+taxonomy:human", limit=5)
>>> for entry in d.keys():
...    print(d[entry]["Entry name"])
MP2K2_HUMAN
MP2K1_HUMAN
WNK2_HUMAN
BIRC6_HUMAN
MK03_HUMAN

Posted in biology, Life Science

LSF cluster commands

According to Wikipedia, Load Sharing Facility (or simply LSF) is a commercial job scheduler sold by Platform Computing. It can be used to execute batch jobs on networked Unix systems.

Here is a summary of some commands I’ve been using. Every command has quite a lot of options, so I will not go into details. Just use the “man” command to get all the details about an LSF command (e.g., man bsub).

Running a single script with bsub

Consider a script called “script.sh”. To launch it, just type:

bsub script.sh

When the job is submitted you should get information about the job id:

Job <9222520> is submitted to queue XXX.
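
If you submit jobs from a script, you will probably want to capture that job id for later use with bjobs or bkill. Here is a minimal Python sketch that extracts it from the message above; the function name `parse_job_id` is hypothetical, and the pattern assumes the message has the form shown here.

```python
import re

# Matches the confirmation message printed by bsub
JOB_ID_PATTERN = re.compile(r"Job <(\d+)> is submitted to queue")

def parse_job_id(bsub_output):
    """Return the job id found in the bsub message as an int, or None."""
    match = JOB_ID_PATTERN.search(bsub_output)
    return int(match.group(1)) if match else None

job_id = parse_job_id("Job <9222520> is submitted to queue XXX.")
print(job_id)  # 9222520
```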

Following the job status with bjobs

Now, you can follow the job status:

bjobs -l 9222520

Killing jobs with bkill

Do you want to delete the job? Use the bkill command:

bkill 9222520

If you have many jobs, you can kill them all as follows:

bkill 0

Posted in Computer Science

Python: pip installation

References:

http://www.pip-installer.org/en/latest/installing.html

The installation of pip depends on your system. For instance, under Ubuntu it is as easy as:

sudo apt-get install python-pip

Another way to get pip, if you are familiar with the virtualenv tool, is to create a virtual environment, since pip is installed into it automatically.

This neither requires root access nor modifies your system Python installation. For instance:

virtualenv my_env
cd my_env
source ./bin/activate
# test if pip works
pip install pip

If you do not have virtualenv, you can still install pip easily by installing setuptools or distribute from your distribution.

For instance you could try:

wget http://python-distribute.org/distribute_setup.py
python distribute_setup.py

Under Mac OS X, use curl -O instead of wget.

Under Windows, you can search the web. I have not tried it myself, but here is an example:

pip on windows

Posted in Linux, Python

R warnings when starting rpy2 and installation

On Fedora 15

When I started rpy2 using these commands:

import rpy2
from rpy2 import robjects

I was getting these annoying warnings, which do not appear when starting a pure R session:

During startup - Warning messages:
1: package ‘methods’ was built under R version 2.15.1 
2: package ‘datasets’ was built under R version 2.15.1 
3: package ‘utils’ was built under R version 2.15.1 
4: package ‘grDevices’ was built under R version 2.15.1 
5: package ‘graphics’ was built under R version 2.15.1 
6: package ‘stats’ was built under R version 2.15.1

I decided to install the newest version of R (2.15.2), which is not available for my current distribution (Fedora 15). So, I got the source files from the R web page and unpacked them into a directory. I then configured and compiled it:

./configure --enable-R-shlib
make

Then, just type

make install

Note: I first started without the --enable-R-shlib option and abruptly stopped the compilation to reconfigure. After typing make again and waiting a while, I got this error:

/usr/bin/ld: CConverters.o: relocation R_X86_64_32S against `a local symbol' can not be used when making a shared object; recompile with -fPIC
  CConverters.o: could not read symbols: Bad value
  collect2: ld returned 1 exit status

The solution was to restart the installation from scratch.

Then, I started ipython and tried to import rpy2 again, but got an error related to a missing libR.so (despite the use of the --enable-R-shlib option):

>>> import rpy2.robjects
ImportError: libR.so: cannot open shared object file: No such file or directory

Quit ipython and fix the issue by telling your system where to find the library (use locate libR.so to find it):

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib64/R/lib/

Starting ipython again and retrying the rpy2 import, I got yet another error message:

>>> import rpy2.robjects
cannot find system Renviron
Fatal error: unable to open the base package

Here, the R_HOME environment variable is missing. It must be exported so that ipython sees it:

export R_HOME=/usr/local/lib64/R

and now everything seems to work without warnings, which was the original issue…
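
The three checks above can be gathered into a small pre-flight script to run before importing rpy2. This is only a sketch: it assumes libR.so lives in the lib subdirectory of R_HOME, as in the paths used above, and the function name `check_r_install` is hypothetical.

```python
import os

def check_r_install(r_home, environ=None):
    """Return a list of problems likely to prevent rpy2 from loading R."""
    environ = os.environ if environ is None else environ
    problems = []
    lib_dir = os.path.normpath(os.path.join(r_home, "lib"))

    # 1. The shared library must exist (R built with --enable-R-shlib)
    if not os.path.isfile(os.path.join(lib_dir, "libR.so")):
        problems.append("libR.so not found in %s" % lib_dir)

    # 2. Its directory must be listed in LD_LIBRARY_PATH
    ld_paths = [os.path.normpath(p)
                for p in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
                if p]
    if lib_dir not in ld_paths:
        problems.append("%s is not in LD_LIBRARY_PATH" % lib_dir)

    # 3. R_HOME must point to the R installation
    if os.path.normpath(environ.get("R_HOME", "")) != os.path.normpath(r_home):
        problems.append("R_HOME is not set to %s" % r_home)

    return problems

for problem in check_r_install("/usr/local/lib64/R"):
    print(problem)
```

An empty list means the three conditions discussed in this post are met; it does not guarantee that the rpy2 import itself will succeed.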

Posted in Computer Science