All about RNA and DNA sequencing

A great resource describing RNA and DNA sequencing: enseqlopedia. If you want to learn about the RNA-seq method, but also Smart-seq, MARS-seq and all the other -seq methods, this is a good starting point.

Posted in bioinformatics

missing SAM header with minimap2 and samtools

When using minimap2 to map sequencing reads onto a reference, you can use this kind of command (be careful, this is wrong as you will see later):

The command is verbose and prints this kind of information. Note the WARNING here:

Then, if you try to convert or read this file, you will most probably get an error. For instance, when converting this SAM file into BAM format (using samtools), you will get this error message:

The solution took me a while to find but is very simple: if you check the help message of minimap2, you will see that the reference should be provided first. So the top command should be:

that is, the reference comes first and then the data.
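As a minimal sketch of the corrected order, assuming SAM output is requested with the -a option; the function name and the file names in the usage line are hypothetical:

```shell
# Map reads onto a reference, then convert the SAM output to BAM.
# Arguments: reference file, reads file, output prefix.
map_reads() {
    # The reference comes first, then the reads; -a requests SAM output.
    minimap2 -a "$1" "$2" > "$3.sam"
    # -b writes BAM; samtools 1.x detects the input format by itself.
    samtools view -b -o "$3.bam" "$3.sam"
}

# Usage (hypothetical file names): map_reads reference.fasta reads.fastq mapped
```

With the arguments in this order, the SAM header describes the reference sequences, which is what samtools expects when it reads the file.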

Posted in bioinformatics

How to get pypi statistics about package download

A while ago, I designed pypiview, a Python package used to fetch the number of downloads for a package hosted on the PyPI website.

It used to work decently, but according to PyPI itself the stored values are not reliable, and indeed they sometimes look weird. Besides, it looks like the numbers are updated for a given release, so if you have no release for a year, you have no downloads associated.

There are now alternatives, such as those based on BigQuery. One such tool is pypinfo.

There is also a Google web interface to BigQuery, available here:

https://bigquery.cloud.google.com/welcome

Posted in Python

How to prevent wget from creating duplicates

wget is used to download files from the internet. For instance:

So far so good, but two things may happen. First, you may interrupt the download. Second, you may download the file again. Sometimes files are huge and you do not want to download the same file twice.

The first case is even worse: imagine you have downloaded half of the file and you interrupt the process. Then you call wget again, you wait, it finishes and you are happy. However, because there was already a local file called “test.csv”, wget saved the new file as test.csv.1! Moreover, it started the download from scratch.

So, the solution is to use the two options -c and -N.

The first one tells wget to continue an interrupted download from where it stopped. The -N option compares timestamps so that a file is not downloaded again unless the remote copy is newer.
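A minimal sketch combining both options; the function name is hypothetical and the URL in the usage line is a placeholder:

```shell
# Download a file, resuming a partial download (-c) and skipping
# files that are not newer than the local copy (-N).
fetch() {
    wget -c -N "$1"
}

# Usage (placeholder URL): fetch https://example.org/test.csv
```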

Posted in Linux

Meaning of Real, User and Sys time statistics

Under Linux, the time command is quite convenient to get the elapsed time taken by a command call. It is very simple to use: just type your command preceded by the time command itself. For instance:

The output looks like

In brief, Real refers to actual elapsed time including other processes that may be running at the same time; User and Sys refer to CPU time used only by the process (here the df command).

More precisely:

  • Real is wall clock time – time from start to finish of the call including time used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
  • User is the actual CPU time used in executing the process. Other processes and time the process spends blocked do not count.
  • Sys is the amount of CPU time spent in the kernel within the process.

So, User + Sys is the actual CPU time used by your process.

For more details, you can consult this quite precise description:
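To make the difference concrete, here is a small sketch that times a command that is blocked most of the time. It assumes bash is installed and relies on bash's TIMEFORMAT variable; the helper name report_times is hypothetical:

```shell
# Print "real user sys" in seconds for a command.
# In TIMEFORMAT, %R is real, %U is user and %S is sys time.
report_times() {
    LC_ALL=C bash -c 'TIMEFORMAT="%R %U %S"; time "$@" >/dev/null' _ "$@" 2>&1
}

# sleep is blocked almost all the time, so real is about 1 second
# while user and sys stay close to 0.
report_times sleep 1
```

A CPU-bound command shows the opposite pattern: User + Sys comes close to Real.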

https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

Posted in Linux

git : How to remove a big file wrongly committed

I added a large file (102 MB) to a git repository, committed and pushed, and got an error due to the file size limit on GitHub.

Here, you see the path of the file (coverage/sensitivity/simualted.bed).

So, the solution is actually quite simple (when you know it): you can use the filter-branch command as follows:
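A minimal sketch of such a call, using the commonly documented index-filter form; the function name is hypothetical. Be aware that this rewrites the history of all branches:

```shell
# Remove a file from every commit of every branch (history is rewritten).
# Argument: path of the file, relative to the top of the repository.
purge_file() {
    git filter-branch --force \
        --index-filter "git rm --cached --ignore-unmatch '$1'" \
        --prune-empty --tag-name-filter cat -- --all
}

# Usage: purge_file coverage/sensitivity/simualted.bed
```

The --ignore-unmatch option lets the filter succeed on commits that do not contain the file, and --prune-empty drops commits that become empty once the file is gone.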

Posted in Computer Science

git and github : skip password typing with https

If you clone a GitHub repository using the https:// method (instead of ssh), you will have to type your username and password every time.

To avoid typing your password every time, you can use the credential helpers, available since git 1.7.9.

where

means “keep the credentials cached for 2 hours” (the default is 15 minutes).

You can also store the credentials permanently using
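A minimal sketch of both settings; the function names are hypothetical, and 7200 seconds corresponds to the 2 hours mentioned above:

```shell
# Keep credentials in memory for the given number of seconds.
cache_credentials() {
    git config --global credential.helper "cache --timeout=$1"
}

# Store credentials permanently, in plain text, in ~/.git-credentials.
store_credentials() {
    git config --global credential.helper store
}

# Usage: cache_credentials 7200
```

The store helper is convenient but keeps the password unencrypted on disk, so the cache helper is the safer choice on a shared machine.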

Posted in Computer Science

failed to convert from cram to bam (parse error CIGAR character)

In order to convert a bioinformatic file from CRAM to BAM format, I naively used the samtools command available on a cluster but got this error:

After a few commands trying to fix the issue, I realised that the error message contained the SAM label. This indicates that the samtools version is a bit old. And indeed it was. I then used version 1.6 of samtools and it worked out of the box.
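For reference, a minimal sketch of the conversion with a recent samtools (1.x). The function name and file names are hypothetical, and the -T option gives the reference FASTA used to decode the CRAM file:

```shell
# Convert a CRAM file to BAM with samtools 1.x.
# Arguments: input CRAM, output BAM, reference FASTA.
cram_to_bam() {
    samtools view -b -T "$3" -o "$2" "$1"
}

# Usage (hypothetical file names): cram_to_bam sample.cram sample.bam reference.fasta
```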

Posted in bioinformatics

How to mount and create a partition on a hard drive dock (fedora)

I got a new hard drive (2.7 TB) and wanted to use it with a docking station. Here are the steps required to use it on my Fedora box.

First, I naively went into the Nautilus File Browser hoping to see the hard drive mounted automatically. Of course it was not there: the hard drive is new and has no partition.

So, first, let us discover and check that the drive can be seen. We can use the fdisk command:

You can see in this case that the disk is on device /dev/sdb.

I then started the tool gparted and in the top right corner you can see the /dev/sdb device that should also indicate the size of your hard drive as shown in this image:

As you can see the partition and file system are unallocated. First, you need to go to the menu

Device/Create Partition Table

to create a partition table on this hard drive.

Then, you can create a new partition by going to

Partition/New

Here, you get a new window that looks like:

I allocated the entire space to one partition. In the menu you need to give a label and a name. The name is for you and the label is for the system, so keep the label simple and do not use special characters (unless you know what you are doing).

For the remaining settings I kept the defaults (gpt). Finally, once you are done, press the Apply button. The drive should be ready in a few seconds.

Go back to Nautilus File Browser and here you can see the new hard drive partition (in theory).

Change permission

Finally, you will see that in Nautilus you cannot create any folders or files: you do not have the permissions. To change this, you need to be in the list of sudo users. Then, go to the path where your hard disk is mounted and type:
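One way to do this, as a minimal sketch assuming you want your own user and primary group to own everything on the disk; the function name is hypothetical:

```shell
# Give the current user ownership of everything under a mount point.
# The group is the user's primary group (an assumption of this sketch).
take_ownership() {
    sudo chown -R "$(id -un):$(id -gn)" "$1"
}

# Usage, from the directory where the disk is mounted: take_ownership .
```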

Posted in Linux

AWK: convert into lower or upper cases

In order to convert a bash variable to lower case with awk, just use this command:

If you want to convert the content of a file (called data.csv) to lower case:

Of course to convert into upper case, simply use the function toupper() instead of tolower().
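A minimal sketch of both cases; the variable names and the sample content written to data.csv are made up for illustration:

```shell
# Lower-case the content of a shell variable.
name="Hello World"
lower=$(echo "$name" | awk '{ print tolower($0) }')
echo "$lower"

# Upper-case every line of a file. A small sample data.csv is created
# here only to make the example self-contained.
printf 'gene,Count\nbrca1,12\n' > data.csv
awk '{ print toupper($0) }' data.csv
```

Both functions are applied to $0, the whole input line, so every field is converted at once.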

Note also that a better tool to avoid issues with special characters might be the tr unix command:
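A minimal sketch with tr, using POSIX character classes rather than explicit ranges such as A-Z:

```shell
# Translate upper-case letters to lower case.
echo "Hello World" | tr '[:upper:]' '[:lower:]'

# tr reads standard input only, so a file has to be redirected:
# tr '[:upper:]' '[:lower:]' < data.csv
```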

Posted in Linux