Day 5. Movie Data Analysis, Part 1

2019. 6. 13. 22:59

!ls

'Day 1.introduction.ipynb'		     'Day 3.Pandas.ipynb'
'Day 2. numpy.ipynb'			      ml
'Day 2. satellite image data analyis.ipynb'   wifire
'Day 3.Movie Data Analysis.ipynb'

In [2]:

!ls ./ml

links.csv  movies.csv  ratings.csv  README.txt	tags.csv

In [4]:

!wc -l < ./ml/movies.csv

In [6]:

!head -5 ./ml/movies.csv

In [7]:

!tail -5 ./ml/movies.csv

In [8]:

!head -5 ./ml/ratings.csv

In [10]:

# Import only the name used below instead of a wildcard import
from pandas import read_csv

In [11]:

movies = read_csv('./ml/movies.csv')  # ',' is the default separator

In [12]:

type(movies)

Out[12]:

pandas.core.frame.DataFrame

In [14]:

movies.head(15)

Out[14]:

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy
5	6	Heat (1995)	Action|Crime|Thriller
6	7	Sabrina (1995)	Comedy|Romance
7	8	Tom and Huck (1995)	Adventure|Children
8	9	Sudden Death (1995)	Action
9	10	GoldenEye (1995)	Action|Adventure|Thriller
10	11	American President, The (1995)	Comedy|Drama|Romance
11	12	Dracula: Dead and Loving It (1995)	Comedy|Horror
12	13	Balto (1995)	Adventure|Animation|Children
13	14	Nixon (1995)	Drama
14	15	Cutthroat Island (1995)	Action|Adventure|Romance
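
The genres column packs several genres into one string separated by `|`. A minimal sketch of one way to unpack it, using three rows copied from the output above; the names `sample`, `genre_list`, `long_form` and `genre_counts` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the movies.head(15) output (hypothetical sample frame).
sample = pd.DataFrame({
    "movieId": [1, 5, 6],
    "title": [
        "Toy Story (1995)",
        "Father of the Bride Part II (1995)",
        "Heat (1995)",
    ],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Comedy",
        "Action|Crime|Thriller",
    ],
})

# str.split turns each pipe-delimited string into a Python list.
sample["genre_list"] = sample["genres"].str.split("|")

# explode() produces one row per (movie, genre) pair.
long_form = sample.explode("genre_list")

# Count how many of the sample movies carry each genre.
genre_counts = long_form["genre_list"].value_counts()
print(genre_counts)
```

The same two calls should work on the full movies frame, assuming every genres value is a string.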

In [20]:

tags = read_csv('./ml/tags.csv')
tags.head()

Out[20]:

	userId	movieId	tag	timestamp
0	2	60756	funny	1445714994
1	2	60756	Highly quotable	1445714996
2	2	60756	will ferrell	1445714992
3	2	89774	Boxing story	1445715207
4	2	89774	MMA	1445715200

In [18]:

# timestamp holds Unix epoch seconds; parse_dates leaves such integers unconverted, so it is omitted
ratings = read_csv('./ml/ratings.csv')
ratings.head()

Out[18]:

	userId	movieId	rating	timestamp
0	1	1	4.0	964982703
1	1	3	4.0	964981247
2	1	6	4.0	964982224
3	1	47	5.0	964983815
4	1	50	5.0	964982931
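
The timestamp values are Unix epoch seconds, which is why they still show up as plain integers. If the dates were needed, one possible conversion is `to_datetime` with `unit='s'`. This is a sketch on three rows copied from the output above; the column name `rated_at` is an assumption made for illustration.

```python
import pandas as pd

# Three rows copied from the ratings.head() output (hypothetical sample frame).
sample = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [1, 3, 6],
    "rating": [4.0, 4.0, 4.0],
    "timestamp": [964982703, 964981247, 964982224],
})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC.
sample["rated_at"] = pd.to_datetime(sample["timestamp"], unit="s")

print(sample[["movieId", "rated_at"]])
```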

In [24]:

# Drop the timestamp column from both frames (modifies them in place)
del ratings['timestamp']
del tags['timestamp']

In [25]:

row_0 = tags.iloc[0]
type(row_0)

Out[25]:

pandas.core.series.Series

In [26]:

row_0

Out[26]:

userId         2
movieId    60756
tag        funny
Name: 0, dtype: object

In [28]:

row_0.index

Out[28]:

Index(['userId', 'movieId', 'tag'], dtype='object')

In [29]:

row_0['userId']

Out[29]:

2

In [30]:

'rating' in row_0

Out[30]:

False
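
The `in` test on a Series looks at the index labels, not the stored values, much like `in` on a dict looks at keys. A small sketch with a Series rebuilt by hand from the row shown above; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the first row of tags by hand (same labels and values as row_0).
row = pd.Series({"userId": 2, "movieId": 60756, "tag": "funny"})

# Membership is tested against the index labels.
has_rating_label = "rating" in row   # no such label
has_tag_label = "tag" in row         # label exists
has_funny_label = "funny" in row     # "funny" is a value, not a label

# To test the values instead, look inside the values explicitly.
has_funny_value = "funny" in row.tolist()

# get() returns a default instead of raising KeyError for a missing label.
rating_or_none = row.get("rating")

print(has_rating_label, has_tag_label, has_funny_label, has_funny_value, rating_or_none)
```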

In [32]:

tags.head()

Out[32]:

	userId	movieId	tag
0	2	60756	funny
1	2	60756	Highly quotable
2	2	60756	will ferrell
3	2	89774	Boxing story
4	2	89774	MMA

In [33]:

tags.index

Out[33]:

RangeIndex(start=0, stop=3683, step=1)

In [34]:

tags.columns

Out[34]:

Index(['userId', 'movieId', 'tag'], dtype='object')

In [37]:

tags.iloc[[0, 11, 2000]]

Out[37]:

	userId	movieId	tag
0	2	60756	funny
11	18	431	gangster
2000	474	5450	women
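
`iloc` selects by integer position. With the default RangeIndex, positions and labels coincide, so the difference from label-based `loc` is easy to miss. The sketch below keeps the three rows above with their original labels to make the difference visible; `sample`, `by_label` and `by_position` are hypothetical names.

```python
import pandas as pd

# The three rows from the output above, keeping their original index labels.
sample = pd.DataFrame(
    {
        "userId": [2, 18, 474],
        "movieId": [60756, 431, 5450],
        "tag": ["funny", "gangster", "women"],
    },
    index=[0, 11, 2000],
)

# loc uses index labels: rows labelled 0 and 2000.
by_label = sample.loc[[0, 2000]]

# iloc uses positions: the first and third rows of this smaller frame.
by_position = sample.iloc[[0, 2]]

print(by_label.equals(by_position))

# Position 11 does not exist here, so iloc raises IndexError.
try:
    sample.iloc[11]
    position_11_exists = True
except IndexError:
    position_11_exists = False
print(position_11_exists)
```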
