Day 6. Frequent operations with pandas - subsetting, filtering, deletion

2019. 6. 15. 20:43

#subsetting, filtering, insertion, deletion, aggregation

In [2]:

import pandas as pd  # explicit alias instead of a wildcard import

In [3]:

df = pd.read_csv("./ml/movies.csv", sep=",")  # load the movie table from a comma-separated file
df.head()

Out[3]:

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

In [6]:

df[['title', 'genres']].head()  # extract specific columns

Out[6]:

	title	genres
0	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	Jumanji (1995)	Adventure|Children|Fantasy
2	Grumpier Old Men (1995)	Comedy|Romance
3	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	Father of the Bride Part II (1995)	Comedy
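
Single and double brackets return different types, which is easy to miss in the cell above. The sketch below illustrates this on a small frame; `movies` is a hypothetical stand-in built from a few rows shown above, not the full movies.csv.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv (assumption: a few rows copied from the output above).
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy",
               "Comedy|Romance"],
})

# A single label returns one column as a Series.
as_series = movies["title"]
# A list of labels returns a DataFrame, even when the list has one entry.
as_frame = movies[["title", "genres"]]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```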

In [9]:

df[df['movieId'] > 20].head()  # keep only the rows that satisfy a condition

Out[9]:

	movieId	title	genres
20	21	Get Shorty (1995)	Comedy|Crime|Thriller
21	22	Copycat (1995)	Crime|Drama|Horror|Mystery|Thriller
22	23	Assassins (1995)	Action|Crime|Thriller
23	24	Powder (1995)	Drama|Sci-Fi
24	25	Leaving Las Vegas (1995)	Drama|Romance
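
Several conditions can be combined in the same way. This is a minimal sketch rather than part of the original notebook: `movies` is a hypothetical stand-in frame, and the threshold and the "Comedy" genre are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv (assumption: a few rows copied from the outputs above).
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 21, 22],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
              "Get Shorty (1995)", "Copycat (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy",
               "Comedy|Romance",
               "Comedy|Crime|Thriller",
               "Crime|Drama|Horror|Mystery|Thriller"],
})

# Each condition is a boolean Series with one True/False per row.
is_late = movies["movieId"] > 2
# regex=False treats "Comedy" as a plain substring, not a pattern.
is_comedy = movies["genres"].str.contains("Comedy", regex=False)

# & means "and", | means "or"; Python's own `and`/`or` do not work on Series.
late_comedies = movies[is_late & is_comedy]
print(late_comedies)
```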

In [10]:

df['movieId2'] = df['movieId'] + 1  # add a new column
df.head()

Out[10]:

	movieId	title	genres	movieId2
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	2
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3
2	3	Grumpier Old Men (1995)	Comedy|Romance	4
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	5
4	5	Father of the Bride Part II (1995)	Comedy	6
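
Assigning to `df['movieId2']` changes `df` itself. When the original frame should stay untouched, `DataFrame.assign` is one possible alternative. A small sketch, using a hypothetical stand-in frame `movies`:

```python
import pandas as pd

# Hypothetical stand-in for movies.csv (assumption: only the columns needed here).
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})

# assign() returns a new DataFrame with the extra column; `movies` is not modified.
with_next = movies.assign(movieId2=movies["movieId"] + 1)

print(list(movies.columns))
print(list(with_next.columns))
```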

In [14]:

df.loc[0] = [1, "newRow", "newGenres", None]  # replace the first row with new contents
df.head()

Out[14]:

	movieId	title	genres	movieId2
0	1	newRow	newGenres	NaN
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.0
2	3	Grumpier Old Men (1995)	Comedy|Romance	4.0
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	5.0
4	5	Father of the Bride Part II (1995)	Comedy	6.0
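
The `movieId2` values above print as 3.0, 4.0, ... instead of 3, 4, ... because the `None` in the new row is stored as NaN, and a plain integer column cannot hold NaN, so the column becomes float. The sketch below shows the same effect on a standalone Series. The names `with_gap` and `nullable` are illustrative, and using the nullable `Int64` dtype is a suggestion that is not in the original notebook.

```python
import pandas as pd

# A missing value among integers makes pandas fall back to float64, with NaN for the gap.
with_gap = pd.Series([None, 3, 4])
print(with_gap.dtype)   # float64

# The nullable "Int64" extension dtype keeps whole numbers next to a missing value (<NA>).
nullable = with_gap.astype("Int64")
print(nullable.dtype)   # Int64
```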

In [19]:

df = df.drop(df.index[[0]])  # drop the row at position 0 (drop() expects index labels)
df.head()

Out[19]:

	movieId	title	genres	movieId2
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.0
2	3	Grumpier Old Men (1995)	Comedy|Romance	4.0
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	5.0
4	5	Father of the Bride Part II (1995)	Comedy	6.0
5	6	Heat (1995)	Action|Crime|Thriller	7.0
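
After the drop, the index starts at 1 because the remaining rows keep their original labels. The sketch below drops rows by label and then renumbers them. The frame `movies` is a hypothetical stand-in, and renumbering with `reset_index` is an optional extra step that the original notebook does not perform.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv (assumption: only the columns needed here).
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Toy Story (1995)", "Jumanji (1995)",
              "Grumpier Old Men (1995)", "Waiting to Exhale (1995)"],
})

# drop() removes rows by index label and returns a new frame.
trimmed = movies.drop(index=[0, 2])
print(trimmed.index.tolist())      # surviving rows keep their old labels

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = trimmed.reset_index(drop=True)
print(renumbered.index.tolist())
```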

In [20]:

del df['movieId2']  # delete a column in place
df.head()

Out[20]:
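
`del` modifies `df` in place. A possible alternative, sketched here on a hypothetical stand-in frame `movies`, is `drop(columns=...)`, which returns a new frame and leaves the original as it was.

```python
import pandas as pd

# Hypothetical stand-in for the frame above (assumption: a subset of rows and columns).
movies = pd.DataFrame({
    "movieId": [2, 3, 4],
    "title": ["Jumanji (1995)", "Grumpier Old Men (1995)", "Waiting to Exhale (1995)"],
    "movieId2": [3, 4, 5],
})

# drop(columns=...) returns a copy without the listed columns.
without_copy = movies.drop(columns=["movieId2"])

print(list(without_copy.columns))
print(list(movies.columns))   # the original still has movieId2
```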

In [24]:

df['groupName'] = df['movieId'] % 10  # group key: the last digit of movieId
df.head()

Out[24]:

	movieId	title	genres	groupName
1	2	Jumanji (1995)	Adventure|Children|Fantasy	2
2	3	Grumpier Old Men (1995)	Comedy|Romance	3
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	4
4	5	Father of the Bride Part II (1995)	Comedy	5
5	6	Heat (1995)	Action|Crime|Thriller	6

In [25]:

df.groupby('groupName').mean(numeric_only=True)  # average the numeric columns per group; text columns cannot be averaged

Out[25]:
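
To compute more than one statistic per group, `agg` accepts a list of function names. The sketch below uses a tiny hypothetical frame with made-up `movieId` values, so its numbers do not correspond to the output of the cell above.

```python
import pandas as pd

# Hypothetical data (assumption: IDs chosen so that two groups are easy to check by hand).
movies = pd.DataFrame({"movieId": [2, 3, 12, 13, 22]})
movies["groupName"] = movies["movieId"] % 10   # last digit as the group key

# One row per group, one column per requested statistic.
summary = movies.groupby("groupName")["movieId"].agg(["count", "mean"])
print(summary)
```

Group 2 collects the IDs 2, 12 and 22, and group 3 collects 3 and 13, so the counts are 3 and 2.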
