Pandas dataframe: grouping column by name

Let us consider this data set in CSV format:

DIG1, DIG1, DIG1, DIG2
1, 3, 5, 8
2, 4, 6, 9
3, 5, 7,  10

The problem is to read the data and average the columns that have the same name.

You can read the CSV file with pandas.read_csv function or build the data frame manually as follows:

import pandas as pd
df = pd.DataFrame({"DIG2":[8,9,10], "DIG1": [1,2,3], "DIG1.1": [3,4,5], "DIG1.2": [5,6,7]})

Note that pandas appends suffix after column names that have identical name (here DIG1) so we will need to deal with this issue. First, let us transpose the data

>>> df = df.transpose()
>>> df
        0  1  2
DIG1    1  2  3
DIG1.1  3  4  5
DIG1.2  5  6  7
DIG2    8  9  10

Let us call reset_index so that indices are now in a column

>>> df = df.reset_index()
>>> df
    index  0  1  2
0    DIG1  1  2  3
1  DIG1.1  3  4  5
2  DIG1.2  5  6  7
3    DIG2  8  9  10

We then rename the index column to get rid of the extra dots

df['index'] = [this.split(".")[0] for this in df['index']]

and finally, we can group by name and transpose back

>>> df.groupby("index").mean().transpose()
index  DIG1  DIG2
0         3     8
1         4     9
2         5    10

There may be a better solution to do that but it works for now.

Please follow and like us:
This entry was posted in Python and tagged . Bookmark the permalink.

5 Responses to Pandas dataframe: grouping column by name

Leave a Reply

Your email address will not be published. Required fields are marked *