Day 6. Handling Timestamps with Pandas

2019. 6. 16. 01:21

from pandas import *

In [2]:

tags = read_csv("./ml/tags.csv")

In [3]:

# The timestamp column holds seconds since the Unix epoch, so pass unit = 's'
tags['parsed_tim'] = to_datetime(tags['timestamp'], unit = 's')

In [4]:

tags.head()

Out[4]:

	userId	movieId	tag	timestamp	parsed_tim
0	2	60756	funny	1445714994	2015-10-24 19:29:54
1	2	60756	Highly quotable	1445714996	2015-10-24 19:29:56
2	2	60756	will ferrell	1445714992	2015-10-24 19:29:52
3	2	89774	Boxing story	1445715207	2015-10-24 19:33:27
4	2	89774	MMA	1445715200	2015-10-24 19:33:20
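
The conversion in In [3] can be tried without the CSV file. The sketch below is only an illustration: the `sample` frame is hand-built from three rows of Out[4], and it uses the explicit `pd.` prefix rather than the wildcard import.

```python
import pandas as pd

# Three rows copied from Out[4]; the integers are seconds since the Unix epoch.
sample = pd.DataFrame({
    "tag": ["funny", "Highly quotable", "will ferrell"],
    "timestamp": [1445714994, 1445714996, 1445714992],
})

# unit="s" tells pandas to read the integers as seconds;
# without it, integers are interpreted as nanoseconds.
sample["parsed_time"] = pd.to_datetime(sample["timestamp"], unit="s")
print(sample)
```

The parsed column should match the `parsed_tim` values shown in Out[4] for the same rows.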

In [5]:

greater_than_t = tags['parsed_tim'] > '2015-02-01'

In [6]:

selected_rows = tags[greater_than_t]

In [7]:

selected_rows.head()

Out[7]:

	userId	movieId	tag	timestamp	parsed_tim
0	2	60756	funny	1445714994	2015-10-24 19:29:54
1	2	60756	Highly quotable	1445714996	2015-10-24 19:29:56
2	2	60756	will ferrell	1445714992	2015-10-24 19:29:52
3	2	89774	Boxing story	1445715207	2015-10-24 19:33:27
4	2	89774	MMA	1445715200	2015-10-24 19:33:20
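
The filtering in In [5] and In [6] works by building a boolean mask and indexing with it. A minimal sketch on hypothetical data (timestamps borrowed from Out[4] and Out[9]; `cutoff`, `recent` and `in_2015` are illustrative names) might look like this:

```python
import pandas as pd

# Hypothetical rows: two 2015 timestamps and one from 2006.
sample = pd.DataFrame({
    "tag": ["funny", "MMA", "Shakespeare"],
    "timestamp": [1445714994, 1445715200, 1137179352],
})
sample["parsed_time"] = pd.to_datetime(sample["timestamp"], unit="s")

# Comparing a datetime column with a Timestamp yields a boolean Series (the mask).
cutoff = pd.Timestamp("2015-02-01")
mask = sample["parsed_time"] > cutoff
recent = sample[mask]

# Series.between selects rows inside a closed date range.
in_2015 = sample[sample["parsed_time"].between(
    pd.Timestamp("2015-01-01"), pd.Timestamp("2015-12-31 23:59:59")
)]
print(recent)
```

Using an explicit `pd.Timestamp` instead of a bare string makes it obvious that the comparison is between dates rather than text.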

In [9]:

# Sort values by parsed_tim (oldest first) and show the first 10 rows
tags.sort_values(by = 'parsed_tim', ascending = True)[ : 10]

Out[9]:

	userId	movieId	tag	timestamp	parsed_tim
1756	474	3181	Shakespeare	1137179352	2006-01-13 19:09:12
2212	474	6912	Rita Hayworth can dance!	1137179371	2006-01-13 19:09:31
1636	474	2494	Hungary	1137179426	2006-01-13 19:10:26
1635	474	2494	Holocaust	1137179426	2006-01-13 19:10:26
1497	474	1836	No DVD at Netflix	1137179444	2006-01-13 19:10:44
1961	474	4969	In Netflix queue	1137179563	2006-01-13 19:12:43
2409	474	26242	In Netflix queue	1137179570	2006-01-13 19:12:50
2413	474	27741	In Netflix queue	1137179587	2006-01-13 19:13:07
2231	474	7025	In Netflix queue	1137179593	2006-01-13 19:13:13
2485	474	41997	In Netflix queue	1137179603	2006-01-13 19:13:23

In [11]:

tags = read_csv("./ml/tags.csv")
tags.dtypes

Out[11]:

userId        int64
movieId       int64
tag          object
timestamp     int64
dtype: object

In [14]:

tags['parsed_time'] = to_datetime(tags['timestamp'], unit = 's')

In [17]:

tags['parsed_time'].dtype
# M8 is NumPy's type code for datetime64; '<M8[ns]' means little-endian datetime64 with nanosecond resolution

Out[17]:

dtype('<M8[ns]')
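
Once a column has a datetime dtype, its calendar fields are available through the `.dt` accessor. The sketch below assumes two timestamps taken from the outputs above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

parsed = pd.to_datetime(pd.Series([1445714994, 1137179352]), unit="s")

# 'M' is NumPy's kind code for datetime64 values.
print(parsed.dtype.kind)

# 'M8[ns]' is simply the short spelling of 'datetime64[ns]'.
same_type = np.dtype("M8[ns]") == np.dtype("datetime64[ns]")

# The .dt accessor exposes calendar fields of each timestamp.
years = parsed.dt.year
weekdays = parsed.dt.day_name()
print(years.tolist(), weekdays.tolist())
```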

In [18]:

tags.sort_values(by = 'parsed_time', ascending = True)[ : 10]

Out[18]:

	userId	movieId	tag	timestamp	parsed_time
1756	474	3181	Shakespeare	1137179352	2006-01-13 19:09:12
2212	474	6912	Rita Hayworth can dance!	1137179371	2006-01-13 19:09:31
1636	474	2494	Hungary	1137179426	2006-01-13 19:10:26
1635	474	2494	Holocaust	1137179426	2006-01-13 19:10:26
1497	474	1836	No DVD at Netflix	1137179444	2006-01-13 19:10:44
1961	474	4969	In Netflix queue	1137179563	2006-01-13 19:12:43
2409	474	26242	In Netflix queue	1137179570	2006-01-13 19:12:50
2413	474	27741	In Netflix queue	1137179587	2006-01-13 19:13:07
2231	474	7025	In Netflix queue	1137179593	2006-01-13 19:13:13
2485	474	41997	In Netflix queue	1137179603	2006-01-13 19:13:23

In [33]:

ratings = read_csv("./ml/ratings.csv")
average_rating = ratings[['movieId', 'rating']].groupby('movieId', as_index = False).mean()

In [34]:

average_rating.tail()

Out[34]:

	movieId	rating
9719	193581	4.0
9720	193583	3.5
9721	193585	3.5
9722	193587	3.5
9723	193609	4.0

In [35]:

movies = read_csv("./ml/movies.csv")
joined = movies.merge(average_rating, on = 'movieId', how = 'inner')
joined.head()

Out[35]:

	movieId	title	genres	rating
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	3.920930
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.431818
2	3	Grumpier Old Men (1995)	Comedy|Romance	3.259615
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	2.357143
4	5	Father of the Bride Part II (1995)	Comedy	3.071429
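
The two steps above (average per movie, then an inner join on `movieId`) can be sketched on toy data. The frames below are hypothetical stand-ins for ratings.csv and movies.csv, chosen so that one movie has no ratings.

```python
import pandas as pd

# Hypothetical data: movie 1 has two ratings, movie 2 has one, movie 3 has none.
ratings = pd.DataFrame({"movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.0]})
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["A (1995)", "B (1996)", "C (1997)"],
})

# as_index=False keeps movieId as a regular column, so it can be used as the merge key.
average_rating = ratings.groupby("movieId", as_index=False)["rating"].mean()

# how="inner" keeps only movies present in both frames, so movie 3 is dropped.
joined = movies.merge(average_rating, on="movieId", how="inner")
print(joined)
```

With `how = 'left'` instead, movie 3 would be kept with a missing rating, which may be preferable when unrated movies should stay visible.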

In [36]:

# Correlate only the numeric columns; title and genres are text
joined[['movieId', 'rating']].corr()

Out[36]:

	movieId	rating
movieId	1.000000	0.027841
rating	0.027841	1.000000

In [37]:

joined.head()

Out[37]:

	movieId	title	genres	rating
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	3.920930
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.431818
2	3	Grumpier Old Men (1995)	Comedy|Romance	3.259615
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	2.357143
4	5	Father of the Bride Part II (1995)	Comedy	3.071429

In [38]:

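# Capture the text inside the last pair of parentheses in the title (normally the release year).
# The raw string keeps \( and \) as regex escapes; expand = False returns a Series.
joined['year'] = joined['title'].str.extract(r".*\((.*)\).*", expand = False)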

In [39]:

joined

Out[39]:

	movieId	title	genres	rating	year
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	3.920930	1995
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.431818	1995
2	3	Grumpier Old Men (1995)	Comedy|Romance	3.259615	1995
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	2.357143	1995
4	5	Father of the Bride Part II (1995)	Comedy	3.071429	1995
5	6	Heat (1995)	Action|Crime|Thriller	3.946078	1995
6	7	Sabrina (1995)	Comedy|Romance	3.185185	1995
7	8	Tom and Huck (1995)	Adventure|Children	2.875000	1995
8	9	Sudden Death (1995)	Action	3.125000	1995
9	10	GoldenEye (1995)	Action|Adventure|Thriller	3.496212	1995
10	11	American President, The (1995)	Comedy|Drama|Romance	3.671429	1995
11	12	Dracula: Dead and Loving It (1995)	Comedy|Horror	2.421053	1995
12	13	Balto (1995)	Adventure|Animation|Children	3.125000	1995
13	14	Nixon (1995)	Drama	3.833333	1995
14	15	Cutthroat Island (1995)	Action|Adventure|Romance	3.000000	1995
15	16	Casino (1995)	Crime|Drama	3.926829	1995
16	17	Sense and Sensibility (1995)	Drama|Romance	3.776119	1995
17	18	Four Rooms (1995)	Comedy	3.700000	1995
18	19	Ace Ventura: When Nature Calls (1995)	Comedy	2.727273	1995
19	20	Money Train (1995)	Action|Comedy|Crime|Drama|Thriller	2.500000	1995
20	21	Get Shorty (1995)	Comedy|Crime|Thriller	3.494382	1995
21	22	Copycat (1995)	Crime|Drama|Horror|Mystery|Thriller	3.222222	1995
22	23	Assassins (1995)	Action|Crime|Thriller	3.125000	1995
23	24	Powder (1995)	Drama|Sci-Fi	3.125000	1995
24	25	Leaving Las Vegas (1995)	Drama|Romance	3.625000	1995
25	26	Othello (1995)	Drama	3.500000	1995
26	27	Now and Then (1995)	Children|Drama	3.333333	1995
27	28	Persuasion (1995)	Drama|Romance	4.227273	1995
28	29	City of Lost Children, The (Cité des enfants p...	Adventure|Drama|Fantasy|Mystery|Sci-Fi	4.013158	1995
29	30	Shanghai Triad (Yao a yao yao dao waipo qiao) ...	Crime|Drama	3.000000	1995
...	...	...	...	...	...
9694	188189	Sorry to Bother You (2018)	Comedy|Fantasy|Sci-Fi	4.500000	2018
9695	188301	Ant-Man and the Wasp (2018)	Action|Adventure|Comedy|Fantasy|Sci-Fi	3.666667	2018
9696	188675	Dogman (2018)	Crime|Drama	3.500000	2018
9697	188751	Mamma Mia: Here We Go Again! (2018)	Comedy|Romance	4.500000	2018
9698	188797	Tag (2018)	Comedy	4.000000	2018
9699	188833	The Man Who Killed Don Quixote (2018)	Adventure|Comedy|Fantasy	4.500000	2018
9700	189043	Boundaries (2018)	Comedy|Drama	2.500000	2018
9701	189111	Spiral (2018)	Documentary	3.000000	2018
9702	189333	Mission: Impossible - Fallout (2018)	Action|Adventure|Thriller	3.750000	2018
9703	189381	SuperFly (2018)	Action|Crime|Thriller	2.500000	2018
9704	189547	Iron Soldier (2010)	Action|Sci-Fi	1.000000	2010
9705	189713	BlacKkKlansman (2018)	Comedy|Crime|Drama	2.500000	2018
9706	190183	The Darkest Minds (2018)	Sci-Fi|Thriller	3.500000	2018
9707	190207	Tilt (2011)	Drama|Romance	1.500000	2011
9708	190209	Jeff Ross Roasts the Border (2017)	Comedy	4.000000	2017
9709	190213	John From (2015)	Drama	1.000000	2015
9710	190215	Liquid Truth (2017)	Drama	1.500000	2017
9711	190219	Bunny (1998)	Animation	1.000000	1998
9712	190221	Hommage à Zgougou (et salut à Sabine Mamou) (2...	Documentary	1.000000	2002
9713	191005	Gintama (2017)	Action|Adventure|Comedy|Sci-Fi	4.500000	2017
9714	193565	Gintama: The Movie (2010)	Action|Animation|Comedy|Sci-Fi	3.500000	2010
9715	193567	anohana: The Flower We Saw That Day - The Movi...	Animation|Drama	3.000000	2013
9716	193571	Silver Spoon (2014)	Comedy|Drama	4.000000	2014
9717	193573	Love Live! The School Idol Movie (2015)	Animation	4.000000	2015
9718	193579	Jon Stewart Has Left the Building (2015)	Documentary	3.500000	2015
9719	193581	Black Butler: Book of the Atlantic (2017)	Action|Animation|Comedy|Fantasy	4.000000	2017
9720	193583	No Game No Life: Zero (2017)	Animation|Comedy|Fantasy	3.500000	2017
9721	193585	Flint (2017)	Drama	3.500000	2017
9722	193587	Bungo Stray Dogs: Dead Apple (2018)	Action|Animation	3.500000	2018
9723	193609	Andrew Dice Clay: Dice Rules (1991)	Comedy	4.000000	1991

9724 rows × 5 columns
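
The extracted `year` column holds strings, not numbers. A stricter variant, shown here only as a sketch and not as the pattern used in In [38], accepts just a four-digit year at the end of the title and converts it to an integer. The third title is a hypothetical example of a title without a year.

```python
import pandas as pd

titles = pd.Series([
    "Toy Story (1995)",
    "City of Lost Children, The (Cité des enfants perdus, La) (1995)",
    "Untitled",  # hypothetical title with no year
])

# \d{4} accepts only a four-digit year; \s*$ anchors the match to the end of the title.
year_text = titles.str.extract(r"\((\d{4})\)\s*$", expand=False)

# Convert to a nullable integer so the column sorts and plots numerically;
# titles without a year become <NA> instead of raising an error.
year = pd.to_numeric(year_text).astype("Int64")
print(year)
```

Whether the stricter pattern is worth it depends on the data: it trades a few unmatched titles for a column that can be used directly in numeric comparisons.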

In [40]:

yearly_average = joined[['year','rating']].groupby('year', as_index = False).mean()

In [41]:

yearly_average.head()

Out[41]:

	year	rating
0	1902	3.5000
1	1903	2.5000
2	1908	4.0000
3	1915	2.0000
4	1916	3.5625

In [43]:

yearly_average[-20 : ].plot(x = 'year' , y = 'rating' , figsize = (15,10), grid = True)

Out[43]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f2b533f77b8>
