How to Iterate Over Rows in a Pandas DataFrame

Sometimes you need to loop over a Pandas DataFrame and perform some operation on each row. Pandas offers at least two ways to iterate over the rows of a DataFrame.

Let’s look at some examples of looping over a Pandas DataFrame. First we will use the Pandas iterrows() function to iterate over the rows of a DataFrame. In addition to iterrows(), Pandas also has a useful function called itertuples(), and we will look at examples of using it to iterate over rows as well. There are subtle differences between the two, and we will see them along the way.

We will use an interesting dataset available in the vega_datasets package for Python.

    # import the dataset collection from vega_datasets
    from vega_datasets import data
    # import pandas
    import pandas as pd

Let’s look at the datasets available in vega_datasets and load the flights_2k dataset.

    # list the names of the available datasets
    data.list_datasets()
    # load the flights_2k dataset as a DataFrame
    flights = data.flights_2k()

It contains information on departures, arrivals and distances for 2000 flights.

    flights.head(3)
                     date  delay destination  distance origin
    0 2001-01-14 21:55:00      0         SMF       480    SAN
    1 2001-03-26 20:15:00    -11         SLC       507    PHX
    2 2001-03-05 14:55:00     -3         LAX       714    ELP

How to Iterate Over Rows with Pandas iterrows()

Pandas has the iterrows() function, which lets you loop over every row of a DataFrame. iterrows() returns an iterator that yields the index of each row together with the data of that row as a Series.

Since iterrows() returns an iterator, you can use the built-in next() function to see what it yields. You can see that iterrows() returns a tuple containing the row index and the row data as a Series object.

    >next(flights.iterrows())
    (0, date           2001-01-14 21:55:00
     delay                            0
     destination                    SMF
     distance                       480
     origin                         SAN
     Name: 0, dtype: object)

The row content can be obtained by taking the second element of the tuple.

    >row = next(flights.iterrows())[1]
    >row
    date           2001-01-14 21:55:00
    delay                            0
    destination                    SMF
    distance                       480
    origin                         SAN
    Name: 0, dtype: object
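The (index, Series) pair can also be unpacked directly. The following is a minimal sketch that does not need vega_datasets: flights_small is a hypothetical stand-in built by hand from the sample rows shown earlier (without the date column), not the real flights_2k dataset.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of flights_2k (date column omitted)
flights_small = pd.DataFrame({
    "delay": [0, -11, -3],
    "destination": ["SMF", "SLC", "LAX"],
    "distance": [480, 507, 714],
    "origin": ["SAN", "PHX", "ELP"],
})

# iterrows() yields (index, Series) tuples; next() pulls the first one
pair = next(flights_small.iterrows())
index, row = pair  # unpack the tuple into the row index and the row data

print(type(pair).__name__)  # tuple
print(type(row).__name__)   # Series
print(index, row["origin"])
```

Unpacking into two names is the same thing the for loops in this post do on every iteration.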

We can easily loop over the Pandas DataFrame and access the index and the content of each row. Here we print each item yielded by iterrows() and see that for each row we get an index and a Series.

    for index, row in flights.head(n=2).iterrows():
        print(index, row)

    0 date           2001-01-14 21:55:00
    delay                            0
    destination                    SMF
    distance                       480
    origin                         SAN
    Name: 0, dtype: object
    1 date           2001-03-26 20:15:00
    delay                          -11
    destination                    SLC
    distance                       507
    origin                         PHX
    Name: 1, dtype: object

Because each row is returned as a Series, we can use the column names to access the value of each column in the row. Here we loop over the rows, unpack each item into the variables index and row, and then access the row data using the column names of the DataFrame.

    for index, row in flights.head().iterrows():
        print(index, row['delay'], row['distance'], row['origin'])

    0 0 480 SAN
    1 -11 507 PHX
    2 -3 714 ELP
    3 12 342 SJC
    4 2 373 SMF
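The same column-name access works inside any per-row logic, not only print(). As a rough sketch, the loop below collects the origins of flights that arrived early (negative delay). The flights_small frame is a hypothetical hand-built copy of the five rows printed in this post, so the example runs without vega_datasets.

```python
import pandas as pd

# Hypothetical stand-in for flights.head(), values copied from the output above
flights_small = pd.DataFrame({
    "delay": [0, -11, -3, 12, 2],
    "destination": ["SMF", "SLC", "LAX", "SNA", "LAX"],
    "distance": [480, 507, 714, 342, 373],
    "origin": ["SAN", "PHX", "ELP", "SJC", "SMF"],
})

early_origins = []
for index, row in flights_small.iterrows():
    # row is a Series, so values are looked up by column name
    if row["delay"] < 0:
        early_origins.append(row["origin"])

print(early_origins)  # ['PHX', 'ELP']
```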

Because iterrows() returns a Series for each row, it does not preserve dtypes across a row. Dtypes are preserved only across the columns of a DataFrame. Let’s look at a simple example to illustrate this.

Let’s create a simple DataFrame with one row and two columns, where one column holds an int and the other a float.

    >df = pd.DataFrame([[3, 5.5]], columns=['int_column', 'float_column'])
    >print(df)
       int_column  float_column
    0           3           5.5

Let’s get the content of the row using iterrows() and check the data type of the int_column value. In the original DataFrame, int_column is an integer. However, in the row returned by iterrows(), the int_column value has been upcast to a float, because the whole row is stored in a single Series with one dtype.

    >row = next(df.iterrows())[1]
    >print(row['int_column'].dtype)
    float64

How to Iterate Over Rows with Pandas itertuples()

A better way to iterate over the rows of a Pandas DataFrame is to use the itertuples() function. As the name suggests, itertuples() loops over the rows of a DataFrame and returns each one as a named tuple. The first element of the tuple is the row’s index, and the remaining elements are the row’s data. Unlike iterrows(), the row data is not stored in a Series.
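One practical consequence of not going through a Series is that values are not upcast to a common dtype. The sketch below reuses a one-row int/float DataFrame and compares the value types produced by the two methods. The variable names series_row and tuple_row are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[3, 5.5]], columns=["int_column", "float_column"])

# iterrows(): the whole row becomes one Series, so 3 is upcast to 3.0
series_row = next(df.iterrows())[1]
# itertuples(): each field keeps the value as stored in its own column
tuple_row = next(df.itertuples())

print(df.dtypes["int_column"])                   # int64
print(series_row.dtype)                          # float64
print(type(series_row["int_column"]).__name__)   # a float type
print(type(tuple_row.int_column).__name__)       # an integer type
```

If the integer-ness of a value matters in your loop, for example when it is used as a list index, this difference is worth keeping in mind.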

Let’s loop over the DataFrame and print each row yielded by itertuples().

    for row in flights.head().itertuples():
        print(row)

    Pandas(Index=0, date=Timestamp('2001-01-14 21:55:00'), delay=0, destination='SMF', distance=480, origin='SAN')
    Pandas(Index=1, date=Timestamp('2001-03-26 20:15:00'), delay=-11, destination='SLC', distance=507, origin='PHX')
    Pandas(Index=2, date=Timestamp('2001-03-05 14:55:00'), delay=-3, destination='LAX', distance=714, origin='ELP')
    Pandas(Index=3, date=Timestamp('2001-01-07 12:30:00'), delay=12, destination='SNA', distance=342, origin='SJC')
    Pandas(Index=4, date=Timestamp('2001-01-18 12:00:00'), delay=2, destination='LAX', distance=373, origin='SMF')

We can see that itertuples() simply returns the content of each row as a named tuple whose fields are the column names. So we can access the data by column name, and the index through the Index field, for example:

    for row in flights.head().itertuples():
        print(row.Index, row.date, row.delay)

We get each row as:

    0 2001-01-14 21:55:00 0
    1 2001-03-26 20:15:00 -11
    2 2001-03-05 14:55:00 -3
    3 2001-01-07 12:30:00 12
    4 2001-01-18 12:00:00 2
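itertuples() also takes two optional parameters, index and name, that control what the tuples look like. The sketch below shows both on a hypothetical hand-built frame (flights_small, with values copied from the rows above and the date column omitted). The tuple name "Flight" is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of flights_2k (date column omitted)
flights_small = pd.DataFrame({
    "delay": [0, -11, -3],
    "destination": ["SMF", "SLC", "LAX"],
    "distance": [480, 507, 714],
    "origin": ["SAN", "PHX", "ELP"],
})

# index=False drops the Index field; name sets the named tuple's type name
named_rows = list(flights_small.itertuples(index=False, name="Flight"))
print(named_rows[0])
print(named_rows[0]._fields)

# name=None yields plain tuples instead of named tuples
plain_rows = list(flights_small.itertuples(index=False, name=None))
print(plain_rows[0])
```

Plain tuples give up access by column name, so values have to be read by position.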

Another advantage of itertuples() is that it is generally faster than iterrows().
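One way to check this on your own machine is to time both loops with the standard-library timeit module. This is only a rough sketch: it uses synthetic data rather than the flights dataset, the helper function names are made up for this example, and the actual timings will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data assumed for illustration only (not the flights dataset)
df = pd.DataFrame({"delay": np.arange(2000), "distance": np.arange(2000) * 2})


def total_with_iterrows(frame):
    """Sum the distance column by looping with iterrows()."""
    total = 0
    for _, row in frame.iterrows():
        total += row["distance"]
    return total


def total_with_itertuples(frame):
    """Sum the distance column by looping with itertuples()."""
    total = 0
    for row in frame.itertuples():
        total += row.distance
    return total


# Run each loop a few times and report the total elapsed time
t_rows = timeit.timeit(lambda: total_with_iterrows(df), number=3)
t_tuples = timeit.timeit(lambda: total_with_itertuples(df), number=3)
print(f"iterrows:   {t_rows:.4f} s")
print(f"itertuples: {t_tuples:.4f} s")
```

Both functions compute the same total, so any difference in the printed times comes from the iteration method itself.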

 

 
