Thomas Cokelaer's blog | Notes on Data Analysis, Computer Science, Python, Biology, …

Python: how to copy a list

Posted on December 27, 2017 by Thomas Cokelaer

To explain how to create a copy of a list, let us first create a list. We will use a simple list of 4 items:

list1 = [1, 2, "a", "b"]

Why do we want to create a copy anyway? Well, because in Python, this assignment creates a reference to the same list (not a new independent list):

list2 = list1

To convince yourself, change the first item of list2 and then check the content of list1: both names show the modified list, because they refer to the same object.
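A minimal sketch of that check (the value 100 is arbitrary and only used for illustration):

```python
list1 = [1, 2, "a", "b"]
list2 = list1          # no copy: both names refer to the same list object

list2[0] = 100         # modify the list through the second name

print(list1)           # [100, 2, 'a', 'b']
print(list2 is list1)  # True: one object, two names
```

The `is` operator compares identities, so it is a quick way to tell a reference from a real copy.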

So, to actually copy a list, you have several possibilities. From the simplest to the most complex:

you can slice the list.

list2 = list1[:]

you can use the list() built-in function

list2 = list(list1)

you can use the copy() function from the copy module. This is slower than the previous methods though.

import copy
list2 = copy.copy(list1)

finally, if items of the list are objects themselves, you should use a deep copy (see example below):

import copy
list2 = copy.deepcopy(list1)

To convince yourself about the interest of the latter method, consider this list:

import copy
list1 = [1, 2, [3, 4]]
list2 = copy.copy(list1)
list2[2][1] = 40

You should see that by changing list2, you also changed list1. If this is not the intended behaviour, you should consider using deepcopy.
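The difference between the two kinds of copy can be seen side by side in the following sketch (the names shallow and deep are only illustrative):

```python
import copy

list1 = [1, 2, [3, 4]]
shallow = copy.copy(list1)      # new outer list, but the inner list is shared
deep = copy.deepcopy(list1)     # new outer list and a new inner list

shallow[2][1] = 40              # also visible through list1
deep[2][0] = 30                 # only visible in the deep copy

print(list1)    # [1, 2, [3, 40]]
print(shallow)  # [1, 2, [3, 40]]
print(deep)     # [1, 2, [30, 4]]
```

Slicing and list() behave like copy.copy() here: they also produce shallow copies.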

Posted in Python | Tagged copy, deepcopy | 2 Comments

Python: ternary operator

Posted on December 27, 2017 by Thomas Cokelaer

In the C language (and many other languages), there is a ternary conditional operator that acts as a compact if-else construct. For instance, in C, a traditional if-else construct looks like:

if (a > b) {
    result = x;
} else {
    result = y;
}

and the equivalent ternary operator looks like:

result = a>b ? x : y;

As in the if-else code, only one expression x or y is evaluated.

In Python, from version 2.5, you would write:

result = x if a > b else y

More formally the ternary operator is written as:

x if condition else y

So condition is evaluated first then either x or y is returned based on the boolean value of condition.

You can use the ternary operator within a list comprehension. For example:

[1 if item > 0 else -1 for item in [0,1,-5,2]]
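Both points, the evaluation of a single branch and the use inside a list comprehension, can be checked with a short sketch (the values are arbitrary):

```python
a, b = 2, 1

# Only the selected branch is evaluated: 1 / 0 is never executed here,
# so no ZeroDivisionError is raised.
result = "x" if a > b else 1 / 0
print(result)  # x

# Ternary expression inside a list comprehension
signs = [1 if item > 0 else -1 for item in [0, 1, -5, 2]]
print(signs)   # [-1, 1, -1, 1]
```

Note that 0 is mapped to -1 because the condition item > 0 is False for it.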

Posted in Python | Leave a comment

Difference between repr and str in Python

Posted on December 26, 2017 by Thomas Cokelaer

When implementing a class in Python, you usually implement the __repr__ and __str__ methods.

__str__ should return a readable message
__repr__ should return a message that is unambiguous (e.g. name of an identifier, class name, etc).

You can see __str__ as a method for users and __repr__ as a method for developers.

Here is an implementation example for a class that simply stores an attribute (data).

class Length():
    def __init__(self, data):
        self.data = data

__str__ is called when a user calls the print() function, while __repr__ is called when a user just types the name of the instance at the interactive prompt:

>>> l = Length([1,2,3])
>>> print(l)    # would call __str__ if it existed
<__main__.Length object at 0x7faf240acc18>
>>> l
<__main__.Length object at 0x7faf240acc18>

By default, when no __str__ or __repr__ methods are defined, __repr__ returns the name of the class (Length) together with the address of the object, and __str__ calls __repr__.

Now, let us define the __repr__ method ourselves to be more explicit:

class Length():
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Length(%s) " % (len(self.data))

We could use it as follows:

>>> l = Length([1,2,3])
>>> print(l)     # no __str__ defined, so falls back to __repr__
Length(3)
>>> l            # calls __repr__
Length(3)

When using the print() function in Python, __str__ is called if it is defined; otherwise, __repr__ is used. Let us now define both methods:

class Length():
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Length(%s, %s) " % (len(self.data), id(self))
    def __str__(self):
        return "Length(%s) " % (len(self.data))

so now __repr__ and __str__ have different behaviours:

>>> l = Length([1,2,3])
>>> print(l)     # calls __str__
Length(3)
>>> l            # calls __repr__
Length(3, 140175447410224)
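Outside the interactive prompt, the same two methods are reached through str() and repr(). The sketch below reuses the Length class from above (without the trailing space in the format strings) and also shows that containers display their items with __repr__:

```python
class Length:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # unambiguous, developer-oriented representation
        return "Length(%s, %s)" % (len(self.data), id(self))

    def __str__(self):
        # short, user-oriented representation
        return "Length(%s)" % len(self.data)


l = Length([1, 2, 3])
print(str(l))         # Length(3)
print("{}".format(l)) # Length(3): string formatting uses __str__ by default
print(repr(l))        # Length(3, <id of the object>)
print([l])            # a list shows its items using __repr__
```

The identifier printed by repr() changes from one run to another, which is why it is not shown as a fixed value here.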

Posted in Python | Tagged print, python | 3 Comments

Python: how to merge two dictionaries

Posted on December 26, 2017 by Thomas Cokelaer

Let us suppose two dictionaries storing ages of different individuals:

list1 = {"Pierre": 28, "Jeanne": 27}
list2 = {"Marc": 32, "Helene": 34}

If you do not mind losing the original contents of one of the two variables, you can update one with the other as follows:

list1.update(list2)

Now list1 variable contains:

{"Pierre": 28, "Jeanne": 27, "Marc": 32, "Helene": 34}

while list2 is unchanged.

Usually, this is not what you want though. Instead, you would prefer to create a third variable keeping list1 and list2 unchanged.

In Python 3.5 or greater, you can use the following syntax:

full_list = {**list1, **list2}

In Python 2 or 3.4 and below, you need to copy one of the variables and update it:

full_list = list1.copy()  # this keeps list1 unchanged
full_list.update(list2)   # inplace update of the variable full_list

The second method is more generic and more backward compatible (useful if you plan to provide your code to Python 2 users). Indeed, it works for both Python 2 and 3. However, it would be slower for Python 3.5 users (and above).
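One point worth checking is what happens when a key is present in both dictionaries. In the sketch below, the second dictionary is modified on purpose so that "Pierre" appears twice (hypothetical data); with both methods, the value from the second dictionary wins:

```python
list1 = {"Pierre": 28, "Jeanne": 27}
list2 = {"Marc": 32, "Pierre": 29}   # "Pierre" also exists in list1

# Python 3.5+ syntax
merged_a = {**list1, **list2}

# copy-then-update, works in Python 2 and 3
merged_b = list1.copy()
merged_b.update(list2)

print(merged_a == merged_b)  # True
print(merged_a["Pierre"])    # 29: the value from list2 overrides list1
print(list1["Pierre"])       # 28: list1 itself is unchanged
```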

Posted in Python | Tagged dictionary | Leave a comment

Search for a pattern in a set of files using find and grep commands

Posted on December 26, 2017 by Thomas Cokelaer

A common task for developers is to search for a pattern amongst a bunch of files that are in different directories.

For instance, you are looking for the pattern “import sys” within a set of Python files. Those files are in subdirectories, mixed with other documents.

You can use the find command (to look for files ending with the py extension) and pipe the list of files to the grep command, which searches for the pattern “import sys” within all files found by the find command:

find . -name "*.py" | xargs grep "import sys"

Note the double quotes and the use of the xargs command to scan the content of the files (not their names).

Of course, you can use all kinds of wildcards:

find . -name "*py" | xargs grep "import sys"
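For comparison, here is a rough Python sketch of the same search. It is not a replacement for find and grep: the function name grep_files is made up for this example, and it performs a plain substring match rather than a regular expression search.

```python
from pathlib import Path


def grep_files(root, glob_pattern, text):
    """Return (path, line number, line) for each line containing text.

    Files are searched recursively below root; only names matching
    glob_pattern (e.g. "*.py") are opened.
    """
    matches = []
    for path in sorted(Path(root).rglob(glob_pattern)):
        if not path.is_file():
            continue
        # errors="ignore" avoids failures on files with unexpected encodings
        with open(path, errors="ignore") as handle:
            for number, line in enumerate(handle, start=1):
                if text in line:
                    matches.append((str(path), number, line.rstrip("\n")))
    return matches
```

Calling grep_files(".", "*.py", "import sys") returns the matching lines found below the current directory.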

Posted in Linux | Tagged find, grep | Leave a comment

okular: export annotations in the PDF file

Posted on October 27, 2017 by Thomas Cokelaer

One open-source program for adding annotations to PDF files under Linux is okular (https://okular.kde.org/).

One can add annotations easily (go to Tools, tick Review, or just press F6).

Then, it is time to save your document or to send it to a collaborator, but wait a minute: we do not see the annotations when using the acroread reader! No worries, plenty of resources tell you to go to File/Save as.

It seems to work indeed. You quit, open the file again, and there you can see the annotations. Now, I use xpdf to read the PDF file, and here, nothing. Oh, and I sent the PDF to a journal for review; they included the PDF inside another one, and there were no annotations there either….

Final solution: instead of File/Save as, just print the document in … a PDF file: File/Print

This worked for me.

Posted in Linux | Tagged PDF | 3 Comments

No more space left on /tmp under Fedora

Posted on October 16, 2017 by Thomas Cokelaer

Under Fedora, one of my programs requires more than 4 GB of temporary space, and I realised that the /tmp directory is limited to 4 GB. In order to increase the size of /tmp, just edit the /etc/fstab file and add this line (to extend it to 8 GB instead of 4 GB):

 none /tmp tmpfs size=8G 0 0

Then, as root:

mount -a

That’s it. You can check that you now have 8 GB available in /tmp by typing

df -h
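If you prefer to check this from a script, a minimal Python sketch using the standard library gives the same information for /tmp only (the /tmp path is assumed to exist, as on any Fedora system):

```python
import shutil

usage = shutil.disk_usage("/tmp")   # named tuple: total, used, free (in bytes)
gib = 1024 ** 3
print("total: %.1f GiB, free: %.1f GiB" % (usage.total / gib, usage.free / gib))
```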

Posted in Linux | Leave a comment

blasr (pacbio) installation on fedora box

Posted on October 9, 2017 by Thomas Cokelaer

I wanted to use the blasr tool for PacBio mapping and had difficulties using and installing it. I first used a local installation of the tool on the provided cluster, but it looked like the installation was quite old. I then used a bioconda version. The latest version (5.3) was not working on my system (missing library) and this was reported. The previous version (5.2) worked but was missing a needed configuration flag. So, I decided to follow the instructions from PacBio to install a local version. That was not straightforward, but I finally got it to work. My platform is Fedora 23 and the instructions were given for Ubuntu or CentOS 6.

First, download the source code:

git clone https://github.com/PacificBiosciences/pitchfork.git

cd pitchfork
make PREFIX=/tmp/mybuild blasr

This command installs zlib, bzip2, boost and hdf5 to start with.
The first issue arose from an error in the compilation of the hdf5 dependency, due to a missing iostream.h header:

H5 Attribute.cpp fatal error iostream.h no such file

The solution was to edit the Makefile in workspace/hdf5-1.8.16/c++
and to comment out this line (by adding a # character in front):

AM_CXXFLAGS =  -DOLD_HEADER_FILENAME -DH5_NO_NAMESPACE -DNO_STATIC_CAST

Come back to the main pitchfork directory and type the make command again (see above). Another failure occurred, due to a similar compilation error related to the namespace. Again, I edited
workspace/hdf5-1.8.16/c++/test/Makefile and commented out the same line.

Next, I got a linking issue

/usr/bin/ld: cannot find -lstdc++

This was solved by installing the static libstdc++ library. I figured out the solution by typing:

ld -lstdc++

to see that none of the standard paths could find the library despite the presence of /usr/lib/libstdc++.so.6. The fix was:

yum install libstdc++-static

Again, type

make hdf5 PREFIX=/tmp/mybuild

I got

ptableTest.cpp:20:19: error: expected namespace-name before ‘;’ token
 using namespace H5;
                   ^
Makefile:745: recipe for target 'ptableTest.o' failed
make[5]: *** [ptableTest.o] Error 1

So again, I needed to find the culprit Makefile, which was
workspace/hdf5-1.8.16/c++/hl/test/Makefile. Again, comment out the same line as shown above.

Back to blasr, the make then failed when trying to install ncurses.

Here I tried a different strategy and tried to use the packages installed in my conda environment. To do so, I edited the settings.mk file and added:

HAVE_NCURSES=$CONDA_PREFIX

Then, same issue with samtools, so added

HAVE_SAMTOOLS=$CONDA_PREFIX

Then, there was an error in the ./bin/pitchfork module due to a Python 3 issue (I had already switched all print “” statements to print(“”)). The fix was simply to replace _out[0] with str(_out[0]).

Then, blasr compiled successfully… time to run it. The executable seems to be in workspace/blasr/blasr:

./workspace/blasr/blasr: error while loading shared libraries: libpbihdf.so: cannot open shared object file: No such file or directory

Here, you need to add a bunch of paths to your LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PF/blasr_libcpp/hdf/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PF/blasr_libcpp/alignment
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PF/blasr_libcpp/pbdata
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PF/pbbam/build/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PF/hdf5-1.8.16/c++/src/.libs/
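If you would rather not change your shell environment, the same paths can be set for a single process from Python. The sketch below is only an illustration: the helper name with_library_paths and the prefix value are hypothetical, and the prefix must be replaced by whatever $PF points to on your system.

```python
import os

# Sub-directories taken from the export lines above
SUBDIRS = [
    "blasr_libcpp/hdf",
    "blasr_libcpp/alignment",
    "blasr_libcpp/pbdata",
    "pbbam/build/lib",
    "hdf5-1.8.16/c++/src/.libs",
]


def with_library_paths(prefix, subdirs, env=None):
    """Return a copy of env (default: os.environ) with LD_LIBRARY_PATH extended."""
    env = dict(os.environ if env is None else env)
    paths = [os.path.join(prefix, sub) for sub in subdirs]
    current = env.get("LD_LIBRARY_PATH")
    if current:
        paths.insert(0, current)    # keep the existing entries first
    env["LD_LIBRARY_PATH"] = os.pathsep.join(paths)
    return env


# Hypothetical prefix, standing in for the value of $PF
env = with_library_paths("/path/to/PF", SUBDIRS, env={})
print(env["LD_LIBRARY_PATH"])
```

The resulting dictionary can then be passed to subprocess.run() through its env argument when launching the blasr executable.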

Posted in biology | Tagged blasr, pacbio | Leave a comment

swapping two columns with awk keeping tabulation

Posted on September 2, 2017 by Thomas Cokelaer

Assuming you have a data file with N columns and you want to swap the first and second one, just type:

awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file

To keep the tabulation intact, you need to specify the separator.

The input separator is defined by -F $'\t' and the output separator is defined by OFS=$'\t'.
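For readers more at ease with Python, a rough equivalent of the awk one-liner could look like the sketch below (the function name is made up for the example). Lines with fewer than two columns are returned unchanged.

```python
def swap_first_two_columns(line, sep="\t"):
    """Swap the first and second fields of a separator-delimited line."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) >= 2:
        fields[0], fields[1] = fields[1], fields[0]
    return sep.join(fields)   # re-join with the same separator to keep the tabs


print(repr(swap_first_two_columns("a\tb\tc d\n")))  # 'b\ta\tc d'
```

Splitting and joining on the same separator plays the role of -F and OFS in the awk command.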

Posted in Linux | Tagged awk | Leave a comment

set a default version to an environment module

Posted on August 31, 2017 by Thomas Cokelaer

Context: “The environment modules package provides for an easy dynamic modification of a user’s environment via modulefiles. which typically instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. as well as define aliases over a variety of shells.”

Reference: modules.sourceforge

When you type “module av PKGNAME”, you get a list of the available versions of PKGNAME. One of them may be flagged as (default).

By default, the highest lexicographically sorted modulefile under the directory is set as the default. However, you can change this behaviour by creating a .version file in that directory which has a format like the following

#%Module1.0
set ModulesVersion  "native"

Posted in Linux | Tagged modules | Leave a comment
