Day 6. String Operations with Pandas

2019. 6. 16. 00:39

from pandas import *

In [2]:

d = { 'one' : Series(['city_0', 'city_1']), 'two' : Series(['user_0', 'user_1'])}

df = DataFrame(d)
df

Out[2]:

	one	two
0	city_0	user_0
1	city_1	user_1

In [3]:

df['one'].str.split('_')

Out[3]:

0    [city, 0]
1    [city, 1]
Name: one, dtype: object

In [4]:

type(df['one'].str.split('_'))  # The result is still a Series; each value is a list of strings

Out[4]:

pandas.core.series.Series
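
Because the split result is a Series of lists, the individual pieces can be read back through the same `.str` accessor. The following is a minimal sketch on a small stand-in Series (the names `parts`, `prefix`, `suffix` and `wide` are illustrative, not from the original notebook):

```python
import pandas as pd

# Stand-in for df['one'] from the cells above
s = pd.Series(['city_0', 'city_1'], name='one')

parts = s.str.split('_')               # Series whose values are Python lists
prefix = parts.str[0]                  # first element of each list
suffix = parts.str.get(1)              # second element of each list
wide = s.str.split('_', expand=True)   # DataFrame with one column per piece

print(prefix.tolist(), suffix.tolist(), wide.shape)
```

Passing `expand=True` is usually the more convenient option when every row splits into a similar number of pieces, since the result can be joined back onto the original frame.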

In [6]:

df['one'].str.contains('1')  # Check whether each row contains '1'

Out[6]:

0    False
1     True
Name: one, dtype: bool
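
One detail worth keeping in mind: `str.contains` treats its pattern as a regular expression by default, and missing values propagate unless `na` is set. A small sketch, assuming a stand-in Series with one missing value (the sample data is illustrative):

```python
import pandas as pd

# Stand-in data with a missing value
s = pd.Series(['city_0', 'city_1', None])

# '.' as a regex matches any character, so every non-missing row is True
as_regex = s.str.contains('.', na=False)

# With regex=False the pattern is a literal dot, which no row contains
as_literal = s.str.contains('.', regex=False, na=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Setting `na=False` makes missing rows count as "no match", which keeps the result usable as a boolean mask.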

In [8]:

df['one'].str.replace('_', '##')

Out[8]:

0    city##0
1    city##1
Name: one, dtype: object

In [9]:

df['one'].str.extract('(_[0-9])')

Out[9]:

	0
0	_0
1	_1
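
`str.extract` returns one column per capture group. As a hedged variation on the cell above, a named group gives the column a readable label, and `expand=False` returns a Series when there is only one group (the group name `idx` is an illustrative choice):

```python
import pandas as pd

# Stand-in for df['one'] from the cells above
s = pd.Series(['city_0', 'city_1'], name='one')

# Named group: the resulting DataFrame column is called 'idx'
named = s.str.extract(r'_(?P<idx>[0-9])')

# Single group with expand=False: the result is a Series, not a DataFrame
as_series = s.str.extract(r'_([0-9])', expand=False)

print(named)
print(as_series.tolist())
```

Here the underscore sits outside the group, so only the digit is captured.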

In [11]:

movies = read_csv("./ml/movies.csv")

In [12]:

movies.head()

Out[12]:

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

In [18]:

movies['genres'].str.split("|", expand = True).head()

Out[18]:

In [20]:

movie_genres = movies['genres'].str.split("|", expand = True)
movie_genres['isComedy'] = movies['genres'].str.contains('Comedy')

In [21]:

movie_genres[ : 10]

Out[21]:

	0	1	2	3	4	5	6	7	8	9	isComedy
0	Adventure	Animation	Children	Comedy	Fantasy	None	None	None	None	None	True
1	Adventure	Children	Fantasy	None	None	None	None	None	None	None	False
2	Comedy	Romance	None	None	None	None	None	None	None	None	True
3	Comedy	Drama	Romance	None	None	None	None	None	None	None	True
4	Comedy	None	None	None	None	None	None	None	None	None	True
5	Action	Crime	Thriller	None	None	None	None	None	None	None	False
6	Comedy	Romance	None	None	None	None	None	None	None	None	True
7	Adventure	Children	None	None	None	None	None	None	None	None	False
8	Action	None	None	None	None	None	None	None	None	None	False
9	Action	Adventure	Thriller	None	None	None	None	None	None	None	False
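
The split-then-flag approach above produces positional columns (0 to 9) plus a single hand-made `isComedy` flag. As an alternative sketch, `str.get_dummies` builds one indicator column per genre in a single call. The sample below uses a few genre strings copied from the output above; the names `flags` and `is_comedy` are illustrative:

```python
import pandas as pd

# A few genre strings in the same pipe-separated format as movies['genres']
genres = pd.Series(['Adventure|Children|Fantasy', 'Comedy|Romance', 'Comedy'])

# One 0/1 column per distinct genre, with columns in sorted order
flags = genres.str.get_dummies(sep='|')

# Equivalent of the isComedy column, derived from the indicator table
is_comedy = flags['Comedy'].astype(bool)

print(flags)
print(is_comedy.tolist())
```

This avoids the `None` padding that `split(..., expand=True)` leaves behind and makes it easy to filter on any genre, not only Comedy.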

In [22]:

movies[:5]

Out[22]:

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

In [28]:

movies['title'].str.extract(r'.*\((.*)\).*', expand = True).head()  # raw string so \( and \) reach the regex engine unchanged

Out[28]:

In [29]:

movies['year'] = movies['title'].str.extract(r'.*\((.*)\).*', expand = True)  # capture the text inside the last pair of parentheses

In [30]:

movies.tail()

Out[30]:

	movieId	title	genres	year
9737	193581	Black Butler: Book of the Atlantic (2017)	Action|Animation|Comedy|Fantasy	2017
9738	193583	No Game No Life: Zero (2017)	Animation|Comedy|Fantasy	2017
9739	193585	Flint (2017)	Drama	2017
9740	193587	Bungo Stray Dogs: Dead Apple (2018)	Action|Animation	2018
9741	193609	Andrew Dice Clay: Dice Rules (1991)	Comedy	1991
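
The pattern above captures whatever sits inside the last pair of parentheses and stores it as a string. A stricter sketch, assuming the year is always four digits at the end of the title, could look like the following; the sample titles are illustrative and the names `year` and `year_num` are not from the original notebook:

```python
import pandas as pd

# Illustrative titles: a plain one, one with extra parentheses, one with no year
titles = pd.Series([
    'Toy Story (1995)',
    'Postman, The (Postino, Il) (1994)',
    'Untitled',
])

# Only accept four digits in parentheses at the very end of the title
year = titles.str.extract(r'\((\d{4})\)\s*$', expand=False)

# Convert to a nullable integer column; titles without a year become <NA>
year_num = pd.to_numeric(year, errors='coerce').astype('Int64')

print(year_num)
```

Storing the year as a number makes later filtering and sorting (for example, movies released after 2000) straightforward.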
