Let us consider this data set in CSV format:
DIG1, DIG1, DIG1, DIG2 1, 3, 5, 8 2, 4, 6, 9 3, 5, 7, 10 |
The problem is to read the data and average the columns that have the same name.
You can read the CSV file with pandas.read_csv function or build the data frame manually as follows:
import pandas as pd df = pd.DataFrame({"DIG2":[8,9,10], "DIG1": [1,2,3], "DIG1.1": [3,4,5], "DIG1.2": [5,6,7]}) |
Note that pandas appends suffix after column names that have identical name (here DIG1) so we will need to deal with this issue. First, let us transpose the data
>>> df = df.transpose() >>> df 0 1 2 DIG1 1 2 3 DIG1.1 3 4 5 DIG1.2 5 6 7 DIG2 8 9 10 |
Let us call reset_index so that indices are now in a column
>>> df = df.reset_index() >>> df index 0 1 2 0 DIG1 1 2 3 1 DIG1.1 3 4 5 2 DIG1.2 5 6 7 3 DIG2 8 9 10 |
We then rename the index column to get rid of the extra dots
df['index'] = [this.split(".")[0] for this in df['index']] |
and finally, we can group by name and transpose back
>>> df.groupby("index").mean().transpose() index DIG1 DIG2 0 3 8 1 4 9 2 5 10 |
There may be a better solution to do that but it works for now.
Please follow and like us:
5 Responses to Pandas dataframe: grouping column by name