How to Get Unique Values from a Column in a Pandas Data Frame?

When you work with a large dataset in Pandas, you often have a column of strings/characters and want to know which unique elements it contains. The Pandas library in Python makes it easy to find unique values. In this lesson we look at examples of obtaining the unique values of a column using two built-in functions.

We will first use the Pandas unique() function to get the unique values of the column, and then the Pandas drop_duplicates() function to get the unique values of the column.

Let’s start with examples from a real dataset, the gapminder data.

Loading the gapminder data

import pandas as pd

# gapminder_csv_url holds the location of the gapminder CSV file
gapminder = pd.read_csv(gapminder_csv_url)
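To follow along without the CSV file, a small stand-in data frame can be built by hand. The sketch below is only an illustration: the name `toy` is hypothetical, and the rows are invented to mimic a few gapminder column names, not taken from the real data.

```python
import pandas as pd

# Hypothetical stand-in for the gapminder data: gapminder-style column
# names, but invented rows (this is NOT the real dataset).
toy = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Albania", "Algeria", "Algeria"],
        "continent": ["Asia", "Asia", "Europe", "Africa", "Africa"],
        "year": [1952, 1957, 1952, 1952, 1957],
    }
)

print(toy.shape)          # (5, 3)
print(list(toy.columns))  # ['country', 'continent', 'year']
```

Any of the calls shown in this lesson should work the same way on `toy` as on the full data frame, only with fewer values.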

Let’s look at the basic information about the data frame. We see that the continent and country variables are objects/strings, so we can find the unique values for them.

gapminder.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
year         1704 non-null int64
pop          1704 non-null float64
continent    1704 non-null object
lifeExp      1704 non-null float64
gdpPercap    1704 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 79.9+ KB

How to get the unique values of a column in Pandas?

We can apply the Pandas unique() function to the variable of interest to obtain the unique values of the column.

For example, suppose we want to find the unique values of the continent column, that is, every continent that appears in the data frame. We can call the Pandas unique() function on the column of interest, and it returns a NumPy array with the unique values of the column.

>gapminder['continent'].unique()

array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
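Note that unique() returns values in order of first appearance rather than sorted. A minimal sketch on a hypothetical toy column (the `continent` Series here is invented, with dtype=object set explicitly to mirror the string columns above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; dtype=object mirrors the gapminder string columns.
continent = pd.Series(
    ["Asia", "Europe", "Asia", "Africa", "Europe"], dtype=object, name="continent"
)

values = continent.unique()

print(isinstance(values, np.ndarray))  # True for this object column
print(list(values))                    # ['Asia', 'Europe', 'Africa'] (first-appearance order)
print(sorted(values))                  # ['Africa', 'Asia', 'Europe'] (sort explicitly if needed)
```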

We can also use Pandas method chaining: access the column as an attribute of the data frame, which gives the corresponding Pandas Series, and call unique() on it to obtain the unique values.

>gapminder.continent.unique()

array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
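One caveat with attribute access is worth a quick sketch: it only reaches a column when the name does not clash with an existing DataFrame attribute. The frame below is hypothetical, with invented numbers, and uses a "pop" column because DataFrame already has a pop() method.

```python
import pandas as pd

# Hypothetical two-row frame with gapminder-style column names (invented values).
toy = pd.DataFrame({"continent": ["Asia", "Europe"], "pop": [1000000.0, 2000000.0]})

# For a plain name like "continent", both access styles give the same Series.
same = toy.continent.equals(toy["continent"])
print(same)  # True

# "pop" is also a DataFrame method, so attribute access returns the method,
# not the column. Bracket access always returns the column.
print(callable(toy.pop))    # True
print(toy["pop"].tolist())  # [1000000.0, 2000000.0]
```

Bracket access is therefore the safer default when column names are not known in advance.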

If we need the unique values of a column as a list, we can simply chain the tolist() function onto the previous call.

>gapminder['continent'].unique().tolist()

['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']
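A small difference between tolist() and the built-in list() shows up on numeric columns. The sketch below uses a hypothetical `years` Series with invented values: tolist() converts the elements to plain Python numbers, while list() keeps them as NumPy scalars.

```python
import pandas as pd

# Hypothetical integer column (invented values).
years = pd.Series([1952, 1957, 1952, 1962], name="year")

unique_years = years.unique().tolist()  # plain Python ints
as_numpy = list(years.unique())         # NumPy integer scalars

print(unique_years)           # [1952, 1957, 1962]
print(type(unique_years[0]))  # <class 'int'>
print(type(as_numpy[0]) is int)  # False
```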

If we try the unique() function on the country column of the data frame, the result is a large NumPy array.

> …unique()

Instead, we can simply count the number of unique values in the country column and find that there are 142 countries in the dataset.

>len(gapminder['country'].unique())

142
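Pandas also offers nunique() for counting directly. The two approaches can disagree when a column has missing values, as this sketch on a hypothetical `country` Series (invented values, with one NaN) suggests: unique() keeps NaN as a value, while nunique() skips it by default.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (invented data).
country = pd.Series(["Chad", "Chile", "Chad", np.nan, "China"], dtype=object)

n_via_len = len(country.unique())         # NaN is counted as a value
n_default = country.nunique()             # missing values skipped by default
n_with_na = country.nunique(dropna=False) # include NaN in the count

print(n_via_len, n_default, n_with_na)    # 4 3 4
```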

How to get unique column values with the drop_duplicates() function

Another, somewhat intuitive way to obtain unique column values is to use the Pandas drop_duplicates() function. It removes all duplicate values in the variable/column and returns a Pandas Series.

For example, to obtain the unique values of the continent variable, we use the Pandas drop_duplicates() function as follows.

>gapminder.continent.drop_duplicates()

0         Asia
12      Europe
24      Africa
48    Americas
60     Oceania
Name: continent, dtype: object
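Unlike unique(), drop_duplicates() keeps the original index labels, which is why the result above shows the row where each value was first seen. A minimal sketch on a hypothetical `continent` Series (invented values) shows the retained index, the keep="last" option, and how to renumber the result:

```python
import pandas as pd

# Hypothetical toy column (invented values).
continent = pd.Series(
    ["Asia", "Asia", "Europe", "Asia", "Africa"], dtype=object, name="continent"
)

first = continent.drop_duplicates()            # keeps the first occurrence
last = continent.drop_duplicates(keep="last")  # keeps the last occurrence
renumbered = first.reset_index(drop=True)      # fresh 0..n-1 index

print(first.index.tolist())       # [0, 2, 4]
print(last.index.tolist())        # [2, 3, 4]
print(renumbered.index.tolist())  # [0, 1, 2]
print(renumbered.tolist())        # ['Asia', 'Europe', 'Africa']
```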

 

 
