git error solved: The remote end hung up unexpectedly

Sometimes, when you clone a git repository you may end up with a fatal error “The remote end hung up unexpectedly”.

For instance, typing this git command to clone a repository of mine:

git clone

led to this output, ending with the fatal error:

Cloning into 'sequana-0.8.0'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (40/40), done.
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: early EOF

One solution is to increase the buffer used by git. From the Unix shell, this can be done by exporting this environment variable:


and then it worked out of the box:

Cloning into 'sequana-0.8.0'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 26318 (delta 31), reused 38 (delta 18), pack-reused 26260
Receiving objects: 100% (26318/26318), 17.93 MiB | 8.00 KiB/s, done.
Resolving deltas: 100% (18760/18760), done.
Checking connectivity... done.

You may do that once and for all using the git config command:

git config --global http.postBuffer 100000000

How to find differences between two directories using diff command

When you copy a directory with many files, you may want to check whether the copy is indeed an exact copy of the original. You can do that by comparing all files, either with your own script that checks every single file, or with standard Linux tools such as the diff command:

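diff -r folder folder2 | grep "^diff -r " | sort

Here, the diff command checks all files recursively (-r) and prints the differences in the usual diff format. Each pair of files that differ is introduced by a line starting with "diff -r". If we are only interested in which files differ, we can use grep to keep just those lines, which contain the file names. Note that files present in only one of the two directories are reported on lines starting with "Only in", which this grep filters out.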
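As a possible shortcut, diff also has a -q option that reports only whether files differ, without printing the differences themselves. The sketch below builds two small throwaway directories to show what the report looks like. The directory and file names are made up for the demo, and LC_ALL=C is set only to keep the messages in English.

```shell
# Build two small directories: one identical file, one changed file,
# and one file that exists only in the copy (all names are hypothetical).
work=$(mktemp -d)
mkdir "$work/folder" "$work/folder2"
echo "same" > "$work/folder/a.txt"
echo "same" > "$work/folder2/a.txt"
echo "version 1" > "$work/folder/b.txt"
echo "version 2" > "$work/folder2/b.txt"
echo "extra" > "$work/folder2/c.txt"

# -r recurses into sub-directories, -q only reports which files differ
report=$(cd "$work" && LC_ALL=C diff -rq folder folder2 | sort)
echo "$report"
```

The report has one line per differing file and one line per file found in only one directory. Identical files are not listed.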


pypi upload failed due to non-existent authentication information

After trying to upload a package to the PyPI website using

python setup.py sdist upload

I got this error message:

Submitting dist/XXX-0.7.5.tar.gz to
Upload failed (403): Invalid or non-existent authentication information.
 error: Upload failed (403): Invalid or non-existent authentication information.

This happened despite having registered and successfully uploaded packages before. The cause was that I had updated my account on the PyPI website by changing the password. In such a case, just update your .pypirc file, which can be found in your home directory:

[distutils]
index-servers =
    pypi
[pypi]
username: blabla
password: blabla

format USB key under fedora

I created a bootable USB key and needed to format the key so that others could use it again.

Using the format option in the KDE file environment, I got this error:

    This partition cannot be modified because it contains a partition table; please reinitialize layout of the whole device. (udisks-error-quark, 11)

I then tried to use the gparted tool. You can install it easily using

dnf install gparted

and start the tool with sudo (sudo gparted).

You can select the partition you want to change in the top right corner of the interface.

You can check the name of the partition using the Linux command

df -h

Once the USB stick is selected, click on the “Device” tab > Create Partition Table.
However, there were several errors telling me that the partition could not be changed.

So, I then tried the mkfs Linux command line tool:

 sudo umount /dev/sdc1
 sudo mkfs.ext4 /dev/sdc1

This seemed to work, but wait a minute: the disk shows only 1.7 GB, as if it had not really been cleaned of the Fedora bootable partition. I know that the USB disk is actually 16 GB, so about 13 GB is unallocated.

Somehow, going back to the KDE environment, I clicked the format option, unmounted and mounted the key, and now I can see the 16 GB. This can be checked visually using gparted.


All about RNA and DNA sequencing

Great entry with description of RNA and DNA sequencing: enseqlopedia. If you want to know about RNA-seq method but also smart-seq, MARS-seq and all the -seq methods, this is a good starting point.


missing SAM header with minimap2 and samtools

When using minimap2 to map sequencing reads onto a reference, you can use this kind of command (be careful, this is wrong as you will see later):

minimap2 -a -x map-pb test.fastq reference.fasta > minimap.sam

The command is verbose and prints this kind of information. Note the WARNING:

[M::mm_idx_gen::0.338*0.98] collected minimizers
[M::mm_idx_gen::0.464*1.19] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted.
[M::main::0.464*1.19] loaded/built the index for 863 target sequence(s)

Then, if you try to convert or read this file, you will most probably get an error. For instance, if you convert this SAM file into BAM format (using samtools), you will get this error message:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 2
[main_samview] truncated file.

The solution took me a while but is very simple: if you check the help message of minimap2, you will see that the reference should be provided first. So the top command should be:

minimap2 -a -x map-pb reference.fasta test.fastq > minimap.sam

That is, the reference comes first and then the data.
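Before converting a SAM file, a quick sanity check can catch this mistake early. The sketch below simply looks for @SQ header lines with grep. The helper name sam_has_sq and the two tiny stand-in files are made up for illustration and are not part of minimap2 or samtools.

```shell
# Return success if the SAM file contains at least one @SQ header line.
# (sam_has_sq is a hypothetical helper, not a samtools command.)
sam_has_sq() {
    grep -q '^@SQ' "$1"
}

# Two tiny stand-in files with made-up content to exercise the check.
demo=$(mktemp -d)
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n' > "$demo/good.sam"
printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' > "$demo/bad.sam"

for f in "$demo/good.sam" "$demo/bad.sam"; do
    if sam_has_sq "$f"; then
        echo "$(basename "$f"): @SQ lines found"
    else
        echo "$(basename "$f"): no @SQ lines, check the minimap2 argument order"
    fi
done
```

On a real run you would call the helper on minimap.sam right after minimap2 finishes.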


How to get pypi statistics about package download

A while ago, I designed pypiview, a Python package used to fetch the number of downloads for a package hosted on the PyPI website.

It used to work decently, but according to PyPI itself the stored values are not reliable, and indeed they sometimes look weird. Besides, it looks like the numbers are updated for a given release. So if you have no release for a year, you have no downloads associated.

There are now more alternatives, such as those based on BigQuery. One such tool, called pypinfo, uses BigQuery.

There is also a Google interface available here:

SELECT COUNT(*) AS download_count
WHERE file.project="spectrum"

How to prevent wget from creating duplicates

wget is used to download files from the internet. For instance:

wget http://url/test.csv

So far so good, but two things may happen. First, you may interrupt the download. Second, you may download the file again. Sometimes files are huge and you do not want to download the same file twice.

In the first case, this is even worse: imagine you have downloaded half of the file and you interrupt the process. Then you call wget again, you wait, it finishes and you are happy. However, because there was already a file called “test.csv” locally, wget downloaded the new file into test.csv.1! Moreover, it started the download from scratch.

So, the solution is to use the two options -c and -N.

wget -c -N http://url/test.csv

The first one tells wget to continue an interrupted download where it stopped. The -N option checks timestamps, so that a file is not downloaded again unless the remote copy is newer than the local one.


Meaning of Real, User and Sys time statistics

Under Linux, the time command is quite convenient to get the elapsed time taken by a command call. It is very simple to use: just type your command preceded by the time command itself. For instance:

time df

The output looks like

real	0m3.905s
user	0m2.408s
sys	0m1.238s

In brief, Real refers to actual elapsed time including other processes that may be running at the same time; User and Sys refer to CPU time used only by the process (here the df command).

More precisely:

  • Real is wall clock time: the time from start to finish of the call, including time used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
  • User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. Other processes and time the process spends blocked do not count.
  • Sys is the amount of CPU time spent in the kernel within the process.

So, User + Sys is the actual CPU time used by your process.
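As a small worked example, the arithmetic can be applied to the sample output shown earlier. The sketch below hard-codes those three values (they are not measured by the script) and uses awk for the floating-point sums. The variable names are chosen for this illustration only.

```shell
# Values copied from the sample "time df" output above
real=3.905
user=2.408
sys=1.238

# CPU time actually consumed by the process: user + sys
cpu=$(LC_ALL=C awk -v u="$user" -v s="$sys" 'BEGIN { printf "%.3f", u + s }')

# Elapsed time during which the process was not on a CPU
# (blocked on I/O, or waiting while other processes were running)
waiting=$(LC_ALL=C awk -v r="$real" -v c="$cpu" 'BEGIN { printf "%.3f", r - c }')

echo "CPU time: ${cpu}s"
echo "Elapsed but not on CPU: ${waiting}s"
```

For this sample, the process used 3.646 s of CPU time, and the remaining 0.259 s of the elapsed time was spent off the CPU. For a multi-threaded program running on several cores, User + Sys can be larger than Real.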

For more details, you can consult this quite precise description.


git : How to remove a big file wrongly committed

I added a large file (102 MB) to a git repository, committed and pushed, and got an error due to the file size limit on GitHub:

remote: error: GH001: Large files detected. You may want to try Git Large File Storage -
remote: error: Trace: 7d51855d4f834a90c5a5a526e93d2668
remote: error: See for more information.
remote: error: File coverage/sensitivity/simulated.bed is 102.00 MB; this exceeds GitHub's file size limit of 100.00 MB

Here, you see the path of the file (coverage/sensitivity/simulated.bed).

So, the solution is actually quite simple (when you know it): you can use the filter-branch command as follows:

git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD
git push
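To try this safely before touching a real repository, here is a minimal sketch that replays the situation in a throwaway repository. It uses the --index-filter variant, which the git documentation describes as a faster alternative to --tree-filter because it does not check out each revision. The file names, the identity settings, and the temporary directory are all made up for the demo.

```shell
# Create a scratch repository with one small file and one "big" file
# (big.bed is a hypothetical stand-in for the oversized file).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"
echo "some code" > small.txt
echo "pretend this is 102 MB" > big.bed
git add small.txt big.bed
git commit -q -m "add files, including one that is too big"

# Rewrite history so that big.bed is dropped from every commit.
# The environment variable only silences the filter-branch warning
# on recent git versions.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm --cached --ignore-unmatch -q big.bed' HEAD

# List the files still recorded in the rewritten history
git log --pretty=format: --name-only HEAD
```

After the rewrite, only small.txt should remain in the history reachable from HEAD. git keeps a backup of the original branch under refs/original/, so the old commit is not deleted immediately.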