Day 6. Frequent operations with pandas - Summary

2019. 6. 16. 00:11

import pandas as pd

# Load the ratings, movies, and tags tables
ratings = pd.read_csv("./ml/ratings.csv")
movies  = pd.read_csv("./ml/movies.csv")
tags    = pd.read_csv("./ml/tags.csv")


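# as_index=False keeps movieId as an ordinary column instead of the index,
# so the result gets a fresh sequential index (0, 1, 2, ...)
avg_ratings = ratings.groupby('movieId', as_index=False).mean()
avg_ratings.head()
# The averaged userId and timestamp columns carry no meaning, so drop them
del avg_ratings['userId']
del avg_ratings['timestamp']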

In [7]:

avg_ratings.head()

Out[7]:

	movieId	rating
0	1	3.920930
1	2	3.431818
2	3	3.259615
3	4	2.357143
4	5	3.071429
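
The same two-column table can also be built without deleting columns afterwards, by selecting rating before aggregating. A minimal sketch on a small made-up frame (the values below are hypothetical and not taken from ratings.csv):

```python
import pandas as pd

# Hypothetical toy data standing in for ratings.csv
ratings = pd.DataFrame({
    "userId": [1, 2, 1, 3],
    "movieId": [10, 10, 20, 20],
    "rating": [4.0, 3.0, 5.0, 4.0],
    "timestamp": [100, 200, 300, 400],
})

# Select 'rating' first so only that column is averaged per movieId
avg_ratings = ratings.groupby("movieId", as_index=False)["rating"].mean()
print(avg_ratings)
```

This avoids computing means of userId and timestamp that are thrown away right after.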

In [8]:

box_office = movies.merge(avg_ratings, on = 'movieId', how = 'inner')
box_office.tail()

Out[8]:

	movieId	title	genres	rating
9719	193581	Black Butler: Book of the Atlantic (2017)	Action|Animation|Comedy|Fantasy	4.0
9720	193583	No Game No Life: Zero (2017)	Animation|Comedy|Fantasy	3.5
9721	193585	Flint (2017)	Drama	3.5
9722	193587	Bungo Stray Dogs: Dead Apple (2018)	Action|Animation	3.5
9723	193609	Andrew Dice Clay: Dice Rules (1991)	Comedy	4.0
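
With how = 'inner', a movie that has no ratings is dropped from the result. The sketch below, on made-up frames (titles and values are hypothetical), compares it with how = 'left' and uses indicator=True to show which rows found no match:

```python
import pandas as pd

# Hypothetical toy data; movie 3 has no ratings
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
avg_ratings = pd.DataFrame({"movieId": [1, 2], "rating": [4.0, 3.5]})

# Inner join: only movieIds present in both frames survive
inner = movies.merge(avg_ratings, on="movieId", how="inner")

# Left join: every movie is kept; unmatched ones get NaN for rating,
# and the _merge column added by indicator=True marks them 'left_only'
left = movies.merge(avg_ratings, on="movieId", how="left", indicator=True)

print(inner)
print(left)
```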

In [10]:

is_highly_rated = box_office['rating'] >= 4.0
is_highly_rated.head()

Out[10]:

0    False
1    False
2    False
3    False
4    False
Name: rating, dtype: bool

In [13]:

box_office[is_highly_rated][:10]

Out[13]:

	movieId	title	genres	rating
27	28	Persuasion (1995)	Drama|Romance	4.227273
28	29	City of Lost Children, The (Cité des enfants p...	Adventure|Drama|Fantasy|Mystery|Sci-Fi	4.013158
36	40	Cry, the Beloved Country (1995)	Drama	4.250000
46	50	Usual Suspects, The (1995)	Crime|Mystery|Thriller	4.237745
48	53	Lamerica (1994)	Adventure|Drama	5.000000
50	55	Georgia (1995)	Drama	4.000000
52	58	Postman, The (Postino, Il) (1994)	Comedy|Drama|Romance	4.027027
66	74	Bed of Roses (1996)	Drama|Romance	4.000000
69	77	Nico Icon (1995)	Documentary	4.000000
72	80	White Balloon, The (Badkonake sefid) (1995)	Children|Drama	4.000000
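
Note that the filtered rows keep their original index labels (27, 28, 36, ...). A small sketch of the same boolean-mask pattern on made-up data (titles and ratings are hypothetical), written with .loc and .head():

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration
box_office = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [3.0, 4.5, 2.5, 4.0],
})

# Comparing a column with a scalar yields a boolean Series (the mask)
is_highly_rated = box_office["rating"] >= 4.0

# True counts as 1, so summing the mask counts the matching rows
n_high = int(is_highly_rated.sum())

# .loc with a mask keeps the original index labels of the selected rows
top = box_office.loc[is_highly_rated].head(10)

print(n_high)
print(top)
```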

In [15]:

is_comedy = box_office['genres'].str.contains('Comedy')

box_office[is_comedy & is_highly_rated][-10:]

Out[15]:

	movieId	title	genres	rating
9680	184997	Love, Simon (2018)	Comedy|Drama	4.0
9694	188189	Sorry to Bother You (2018)	Comedy|Fantasy|Sci-Fi	4.5
9697	188751	Mamma Mia: Here We Go Again! (2018)	Comedy|Romance	4.5
9698	188797	Tag (2018)	Comedy	4.0
9699	188833	The Man Who Killed Don Quixote (2018)	Adventure|Comedy|Fantasy	4.5
9708	190209	Jeff Ross Roasts the Border (2017)	Comedy	4.0
9713	191005	Gintama (2017)	Action|Adventure|Comedy|Sci-Fi	4.5
9716	193571	Silver Spoon (2014)	Comedy|Drama	4.0
9719	193581	Black Butler: Book of the Atlantic (2017)	Action|Animation|Comedy|Fantasy	4.0
9723	193609	Andrew Dice Clay: Dice Rules (1991)	Comedy	4.0
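
Masks combine element-wise with & (and), | (or) and ~ (not). The sketch below uses made-up rows (hypothetical titles, genres, and ratings) and passes na=False to str.contains, on the assumption that a genres value could be missing; without it, a missing value would produce NaN in the mask instead of False.

```python
import pandas as pd

# Hypothetical toy data; the last row has a missing genres value
box_office = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Comedy|Drama", "Drama", "Comedy", None],
    "rating": [4.5, 4.0, 3.0, 4.2],
})

# na=False treats a missing genres value as "not a comedy"
is_comedy = box_office["genres"].str.contains("Comedy", na=False)
is_highly_rated = box_office["rating"] >= 4.0

both = box_office[is_comedy & is_highly_rated]    # comedy AND highly rated
either = box_office[is_comedy | is_highly_rated]  # comedy OR highly rated
not_comedy = box_office[~is_comedy]               # everything that is not a comedy

print(both)
print(either)
print(not_comedy)
```

When a comparison is written inline rather than stored in a variable, it needs its own parentheses, for example box_office[is_comedy & (box_office["rating"] >= 4.0)], because & binds more tightly than >=.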
