Day 6. Movie Data Analysis Part 2

2019. 6. 15. 19:46

from pandas import *

In [42]:

ratings.describe()

Out[42]:

	userId	movieId	rating
count	100836.000000	100836.000000	100836.000000
mean	326.127564	19435.295718	3.501557
std	182.618491	35530.987199	1.042529
min	1.000000	1.000000	0.500000
25%	177.000000	1199.000000	3.000000
50%	325.000000	2991.000000	3.500000
75%	477.000000	8122.000000	4.000000
max	610.000000	193609.000000	5.000000
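
The 25%, 50%, and 75% rows of describe() are quartiles, and std is the sample standard deviation. As a minimal sketch of how those rows relate to the individual Series methods, the toy Series below is made-up data, not the ratings table:

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

summary = toy.describe()

# Each percentile row of describe() matches the corresponding quantile() call
q1 = toy.quantile(0.25)
median = toy.median()
q3 = toy.quantile(0.75)

print(summary["25%"], q1)      # both 2.0
print(summary["50%"], median)  # both 3.0
print(summary["75%"], q3)      # both 4.0

# std() uses ddof=1 (sample standard deviation) by default, as describe() does
print(summary["std"], toy.std())
```

Reading describe() this way also shows why the userId and movieId columns above are less informative: they are identifiers, so their mean and std carry no real meaning.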

In [44]:

ratings.corr()

Out[44]:

	userId	movieId	rating
userId	1.000000	0.006773	-0.049348
movieId	0.006773	1.000000	-0.004061
rating	-0.049348	-0.004061	1.000000
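
All off-diagonal values are close to zero, so neither ID column has a linear relationship with the rating. As a rough sketch of what corr() computes by default (the Pearson coefficient), the example below recomputes it by hand on a made-up frame; the column names x and y are hypothetical:

```python
import pandas as pd

# Toy data, invented for illustration only (not the real ratings table)
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 5.0, 4.0, 5.0],
})

# Pearson r = cov(x, y) / (std(x) * std(y)), all with sample (ddof=1) statistics
manual_r = toy["x"].cov(toy["y"]) / (toy["x"].std() * toy["y"].std())

# DataFrame.corr() defaults to method="pearson"
pandas_r = toy.corr().loc["x", "y"]

print(manual_r, pandas_r)  # the two values agree up to floating-point error
```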

In [48]:

ratings['rating'].describe()

Out[48]:

count    100836.000000
mean          3.501557
std           1.042529
min           0.500000
25%           3.000000
50%           3.500000
75%           4.000000
max           5.000000
Name: rating, dtype: float64

In [49]:

ratings['rating'].mean()

Out[49]:

3.501556983616962

In [50]:

ratings.mean()

Out[50]:

userId       326.127564
movieId    19435.295718
rating         3.501557
dtype: float64

In [51]:

ratings['rating'].min()

Out[51]:

0.5

In [52]:

ratings['rating'].std()

Out[52]:

1.0425292390605359

In [53]:

ratings['rating'].mode()  # the value that occurs most frequently

Out[53]:

0    4.0
dtype: float64
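
mode() returns a Series rather than a scalar because several values can tie for most frequent. A small sketch on made-up ratings (not the real data) shows the tie case, with value_counts() giving the full frequency table:

```python
import pandas as pd

# Made-up ratings for illustration only; 3.0 and 4.0 both appear twice
toy_ratings = pd.Series([4.0, 3.0, 4.0, 5.0, 3.0, 2.5])

# With a tie, mode() returns every tied value, sorted ascending
modes = toy_ratings.mode()
print(modes.tolist())  # [3.0, 4.0]

# value_counts() lists how often each distinct value occurs
counts = toy_ratings.value_counts()
print(counts)
```

If a single number is needed, taking `modes.iloc[0]` is one option, with the caveat that it silently picks the smallest of the tied values.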

In [54]:

ratings.corr()

Out[54]:

	userId	movieId	rating
userId	1.000000	0.006773	-0.049348
movieId	0.006773	1.000000	-0.004061
rating	-0.049348	-0.004061	1.000000

In [56]:

filter_l = ratings['rating'] > 5  # Boolean Series: True where a rating exceeds 5
filter_l.any()                    # True if at least one rating is above 5

Out[56]:

False
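
The False result agrees with the maximum of 5.0 reported by describe(). The same idea can check both ends of the range at once. The sketch below uses made-up ratings with two deliberately invalid entries, and assumes the valid range is 0.5 to 5.0, as the min and max above suggest:

```python
import pandas as pd

# Made-up ratings; the 5.5 and 0.0 entries are deliberately invalid
toy = pd.DataFrame({"rating": [0.5, 3.0, 5.5, 4.0, 0.0]})

# between() is inclusive on both ends by default
in_range = toy["rating"].between(0.5, 5.0)

# ~ inverts the mask, selecting the rows that violate the expected range
bad_rows = toy[~in_range]
print(bad_rows)

# all() is the counterpart of any(): True only if every value passes the check
print(in_range.all())  # False here, because two rows fall outside the range
```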

In [62]:

movies.shape

Out[62]:

(9742, 3)

In [63]:

movies.head()

Out[63]:

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

In [65]:

movies.isnull().any()

Out[65]:

movieId    False
title      False
genres     False
dtype: bool

In [66]:

ratings.shape

Out[66]:

(100836, 3)

In [67]:

ratings.isnull().any()

Out[67]:

userId     False
movieId    False
rating     False
dtype: bool

In [68]:

tags.shape

Out[68]:

(3683, 3)

In [69]:

tags.isnull().any()

Out[69]:

userId     False
movieId    False
tag        False
dtype: bool

In [70]:

tags = tags.dropna()  # drops rows with missing values; a no-op here, since Out[69] shows none
tags.head()

Out[70]:

	userId	movieId	tag
0	2	60756	funny
1	2	60756	Highly quotable
2	2	60756	will ferrell
3	2	89774	Boxing story
4	2	89774	MMA
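
Because none of the three tables contains missing values, dropna() leaves tags unchanged. To show what these calls do when data is actually missing, here is a minimal sketch on a made-up frame (the values are invented, only the column names mirror tags):

```python
import pandas as pd

# Made-up frame with one missing tag; not the real tags table
toy_tags = pd.DataFrame({
    "userId": [1, 2, 3],
    "movieId": [10, 20, 30],
    "tag": ["funny", None, "classic"],
})

# isnull().any() answers "is anything missing?" per column;
# isnull().sum() counts how many values are missing per column
print(toy_tags.isnull().any())
missing_per_column = toy_tags.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = toy_tags.dropna()
print(cleaned.shape)  # (2, 3)
```

Counting with sum() before dropping is a cheap way to see how many rows a dropna() call is about to discard.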
