When you work with a large dataset in pandas, you often have a column of strings/characters and want to know how many unique elements the column contains. The pandas library for Python makes it easy to find unique values. In this post we look at examples of getting the unique values of a column using two built-in functions.
We will first use the pandas unique() function to get the unique values of a column, and then use the pandas drop_duplicates() function to do the same.
Let's start with examples from a real dataset, the gapminder data.
Loading the gapminder data
import pandas as pd
# gapminder_csv_url holds the location of the gapminder CSV file
gapminder = pd.read_csv(gapminder_csv_url)
Let's look at the basic information in the data frame. We see that the continent and country variables are objects/strings, so we can find the unique values for them.
gapminder.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
year         1704 non-null int64
pop          1704 non-null float64
continent    1704 non-null object
lifeExp      1704 non-null float64
gdpPercap    1704 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 79.9+ KB
How to get the unique values of a column in pandas?
We can apply the Pandas unique() function to the variable of interest to obtain the unique values of the column.
For example, suppose we want the unique values of the continent column, that is, every continent that appears in the data frame. We can call the pandas unique() function on the column of interest. It returns a NumPy array with the unique values of the column.
>gapminder['continent'].unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
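To try unique() without downloading anything, here is a minimal sketch on a small hand-made data frame. The frame `toy` and its rows are made up for illustration and are not the gapminder data; `dtype=object` is passed only to mirror the object columns shown above. It suggests that unique() reports values in the order they first appear, without sorting them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gapminder data frame (values are made up).
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "C", "C"],
        "continent": ["Asia", "Asia", "Europe", "Africa", "Africa"],
    },
    dtype=object,
)

# unique() on an object column returns a NumPy array.
continents = toy["continent"].unique()
print(type(continents))
print(continents)  # values appear in order of first occurrence, not sorted
```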
We can also use dot notation to select the column as a pandas Series and chain unique() onto it to get the unique values.
>gapminder.continent.unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
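One caveat with dot notation is worth a quick sketch: it only works when the column name does not clash with an existing DataFrame attribute. The toy frame below is hypothetical (made-up values); its `pop` column mirrors the population column name in gapminder, which collides with the DataFrame.pop method.

```python
import pandas as pd

# Hypothetical toy frame; "pop" mirrors the population column name.
toy = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Europe"],
        "pop": [10.0, 12.0, 10.0],
    }
)

# Dot notation and bracket notation select the same column here.
same = (toy.continent.unique() == toy["continent"].unique()).all()
print(same)

# toy.pop is the DataFrame.pop method, not the column, so use brackets.
print(callable(toy.pop))
pop_values = toy["pop"].unique()
print(pop_values)
```

Bracket notation works for any column name, so it is the safer choice when names are not known in advance.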
If we need the unique values of a column as a list, we can simply chain the tolist() function onto the previous call.
>gapminder['continent'].unique().tolist()
['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']
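As a small sketch of why the list form can be handy, the example below converts the unique values to a plain Python list and then sorts it. The frame `toy` is a made-up stand-in, not the gapminder data.

```python
import pandas as pd

# Hypothetical toy frame with repeated continent names.
toy = pd.DataFrame(
    {"continent": ["Asia", "Europe", "Asia", "Africa"]},
    dtype=object,
)

# Chain tolist() onto unique() to get a plain Python list.
as_list = toy["continent"].unique().tolist()
print(as_list)

# A list works with ordinary Python tools such as sorted().
in_order = sorted(as_list)
print(in_order)
```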
If you try the unique() function on the country column of the data frame, the result is a large NumPy array.
>gapminder['country'].unique()
Instead, we can simply count the number of unique values in the country column and find that there are 142 countries in the dataset.
>len(gapminder['country'].unique())
142
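As an alternative to len() on the unique values, pandas also provides Series.nunique(), which this post does not otherwise cover. The sketch below uses a made-up toy frame with one missing value to show where the two counts can differ: unique() keeps a missing value as its own entry, while nunique() skips missing values unless `dropna=False` is passed.

```python
import pandas as pd

# Hypothetical toy frame; the None stands for a missing country name.
toy = pd.DataFrame({"country": ["A", "A", "B", None, "C"]})

# len() of unique() counts the missing value as one of the unique entries.
n_len = len(toy["country"].unique())

# nunique() ignores missing values by default.
n_nunique = toy["country"].nunique()

# dropna=False makes nunique() count the missing value as well.
n_with_na = toy["country"].nunique(dropna=False)

print(n_len, n_nunique, n_with_na)
```

When a column has no missing values, both approaches give the same count.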
How to get unique column values with the drop_duplicates() function
Another, somewhat intuitive way to get the unique values of a column is to use the pandas drop_duplicates() function. Calling drop_duplicates() on a variable/column removes all duplicated values and returns a pandas Series.
For example, to get the unique values of the continent variable, we use the pandas drop_duplicates() function as follows.
>gapminder.continent.drop_duplicates()
0         Asia
12      Europe
24      Africa
48    Americas
60     Oceania
Name: continent, dtype: object
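The index labels in that output are the row positions where each value first occurs. The sketch below, on a made-up toy frame rather than the gapminder data, illustrates this and shows one way to renumber the result with reset_index(drop=True) if the original row labels are not needed.

```python
import pandas as pd

# Hypothetical toy frame with repeated continent names.
toy = pd.DataFrame(
    {"continent": ["Asia", "Asia", "Europe", "Africa", "Europe"]}
)

# drop_duplicates() keeps the first occurrence of each value in a Series.
deduped = toy["continent"].drop_duplicates()
print(deduped.index.tolist())  # row labels of the first occurrences
print(deduped.tolist())

# Renumber the result 0..n-1 when the original row labels do not matter.
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())
```

The values match what unique() returns for the same column; the difference is that drop_duplicates() returns a Series that keeps the row labels.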