Pandas

Series excerpts

In [772]:
import numpy as np
import pandas as pd

Creation

Creating an empty Series

In [773]:
s = pd.Series(dtype='float64')  # explicit dtype; the default for an empty Series varies by pandas version
print(s)
Series([], dtype: float64)

Creating from a scalar

In [774]:
print(type(5))
pd.Series(5)
<class 'int'>
Out[774]:
0    5
dtype: int64

Creating from a list

In [775]:
l=[1,3,5,np.nan,45]
print(type(l))
s = pd.Series(l)
s
<class 'list'>
Out[775]:
0     1.0
1     3.0
2     5.0
3     NaN
4    45.0
dtype: float64

Creating from a dict

In [776]:
d = {'b' : 1, 'a' : 0, 'c' : 2}
print(type(d))
pd.Series(d)
<class 'dict'>
Out[776]:
b    1
a    0
c    2
dtype: int64
In [777]:
d = {'a' : 0., 'b' : 1., 'c' : 2.}  
pd.Series(d)  # index order follows the dict's insertion order
Out[777]:
a    0.0
b    1.0
c    2.0
dtype: float64
In [778]:
pd.Series(d, index=['b', 'c', 'd', 'a'])  # index order follows the given index; keys missing from the dict become NaN
Out[778]:
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
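The two dict behaviours above can be sketched together. The `prices` dict and its labels are made up for illustration and are not part of the original notebook. Without `index`, the Series keeps the dict's insertion order; with `index`, each label is looked up in the dict and labels that are absent become NaN.

```python
import pandas as pd

prices = {'pear': 3.0, 'apple': 1.5, 'fig': 4.0}  # hypothetical data

by_insertion = pd.Series(prices)                         # keeps the dict's insertion order
by_request = pd.Series(prices, index=['apple', 'kiwi'])  # 'kiwi' is not a key of the dict

print(by_insertion.index.tolist())
print(by_request)
```

Keys that are not listed in `index` ('pear' and 'fig' here) are dropped from the result.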

Creating from an ndarray

In [779]:
index=list('abcde')
s = pd.Series(np.random.randn(5), index=index)
print(type(np.random.randn(5)))
s
<class 'numpy.ndarray'>
Out[779]:
a    0.418042
b   -0.454595
c   -0.402959
d    1.897117
e    0.558703
dtype: float64

Supplying an index at creation

In [780]:
l=[1,3,5,np.nan,45]
index=list('ABCDE')
s = pd.Series(l, index=index)
s
Out[780]:
A     1.0
B     3.0
C     5.0
D     NaN
E    45.0
dtype: float64
In [781]:
s.index
Out[781]:
Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

API excerpts

Naming: s.name

  • Lets you give a Series object a name, i.e. a label for the column it represents. Note that name is an attribute, not a method.
In [782]:
series = pd.Series([2, 43, 9, 27, np.nan], name='Jasper')
series.name
Out[782]:
'Jasper'

Renaming: s.rename()

In [783]:
series = pd.Series([2, 43, 9, 27, np.nan], name='Jasper')
series2 = series.rename("Steven")  # Note that series and series2 refer to different objects.
print(series.name)
series2.name
Jasper
Out[783]:
'Steven'
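As a small sketch with illustrative values, `rename()` behaves differently depending on its argument: a scalar changes the Series name, while a mapping relabels index entries and leaves the name alone. In both cases the original Series is untouched.

```python
import pandas as pd

s = pd.Series([2, 43, 9], name='Jasper')

renamed = s.rename('Steven')            # scalar: changes the Series name only
relabeled = s.rename({0: 'a', 2: 'c'})  # mapping: changes index labels, keeps the name

print(renamed.name, s.name)
print(relabeled.index.tolist(), relabeled.name)
```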

Comparison (equal): s.eq(), ==

  • Series.eq(other, level=None, fill_value=None)
  • Returns the element-wise result of caller series == other series.
In [784]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.eq(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)==series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1    False
2    False
3     True
4    False
dtype: bool
--------------------------------------------------
0    False
1    False
2    False
3     True
4    False
dtype: bool
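In the cell above, `eq(..., fill_value=10)` and the `fillna(10)` comparison agree, but as far as I understand the flex methods they are not equivalent in general: `fill_value` is only substituted where exactly one side is missing. The sketch below, with made-up values, shows a position where both sides are NaN.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, np.nan])
b = pd.Series([1.0, 5.0, np.nan])

flex = a.eq(b, fill_value=5)         # fills a[1] only; position 2 stays NaN on both sides
filled = a.fillna(5) == b.fillna(5)  # fills every NaN, so position 2 compares 5 == 5

print(flex.tolist())
print(filled.tolist())
```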

Comparison (not equal): s.ne(), !=

  • Series.ne(other, level=None, fill_value=None)
  • The results are returned on the basis of comparison caller series != other series.
In [785]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.ne(series2, fill_value = replace_nan) 
print(result)
result2=series1.fillna(10)!=series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1     True
2     True
3    False
4     True
dtype: bool
0     True
1     True
2     True
3    False
4     True
dtype: bool

Comparison (less than or equal): s.le(), <=

  • Series.le(other, level=None, fill_value=None, axis=0)
  • Returns the element-wise result of caller series <= other series.
  • For strings, the comparison is lexicographic and based on Unicode code points (which coincide with ASCII values for ASCII characters).
In [786]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.le(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)<=series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1    False
2    False
3     True
4     True
dtype: bool
--------------------------------------------------
0     True
1    False
2    False
3     True
4     True
dtype: bool

Comparison (greater than or equal): s.ge(), >=

  • Series.ge(other, level=None, fill_value=None, axis=0)
  • The results are returned on the basis of comparison caller series >= other series.
In [787]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.ge(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)>=series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1     True
2     True
3     True
4    False
dtype: bool
--------------------------------------------------
0    False
1     True
2     True
3     True
4    False
dtype: bool

Comparison (less than): s.lt(), <

  • Series.lt(other, level=None, fill_value=None, axis=0)
  • Returns the element-wise result of caller series < other series.
In [788]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.lt(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)<series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1    False
2    False
3    False
4     True
dtype: bool
--------------------------------------------------
0     True
1    False
2    False
3    False
4     True
dtype: bool

Comparison (greater than): s.gt(), >

  • Series.gt(other, level=None, fill_value=None, axis=0)
  • Returns the element-wise result of caller series > other series.
In [789]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.gt(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)>series2.fillna(10)
print(result2)
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1     True
2     True
3    False
4    False
dtype: bool
--------------------------------------------------
0    False
1     True
2     True
3    False
4    False
dtype: bool

Comparison (range): s.between()

  • Series.between(left, right, inclusive='both')
  • Checks which values lie between left and right; NaN always yields False.
  • inclusive: which boundaries to include: 'both' (default), 'neither', 'left' or 'right'. Older pandas versions took a Boolean here (True by default); that form was deprecated in pandas 1.3.
In [790]:
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, 8])
series.between(2,8)
Out[790]:
0     True
1     True
2    False
3    False
4    False
5    False
6    False
7     True
8     True
dtype: bool
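A minimal sketch of the boundary options, assuming pandas 1.3 or later (where `inclusive` accepts strings). The values are chosen so that both boundaries appear in the data.

```python
import pandas as pd

s = pd.Series([2, 5, 8])

both = s.between(2, 8)                          # default: both ends included
neither = s.between(2, 8, inclusive='neither')  # both ends excluded
left = s.between(2, 8, inclusive='left')        # only the left end included

print(both.tolist(), neither.tolist(), left.tolist())
```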

Combining: s.combine_first()

  • Series.combine_first(other)
  • combine_first() merges two Series into one. The result's index is the union of both indexes. Where the caller has a null value, the value from the passed Series is used; where both are null at the same label, the result is null there.
  • It differs from Series.combine(), which takes a function that decides each output value.
In [791]:
series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, np.nan])  
print(series2)
print('-'*50)
result = series1.combine_first(series2) 
print(result)  
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
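In the cell above the only gap in the caller is also a gap in the other Series, so nothing visibly gets filled. The following sketch uses made-up values where the fill and the index union are both visible.

```python
import numpy as np
import pandas as pd

caller = pd.Series([1.0, np.nan, 3.0])
other = pd.Series([10.0, 20.0, np.nan, 40.0])

patched = caller.combine_first(other)
# label 1: caller is NaN, so 20.0 is taken from other
# label 3: exists only in other, so it is added to the result
print(patched.tolist())
```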

Combining: s.combine()

  • Series.combine(other, func, fill_value=nan)
  • Series.combine() merges two Series element by element: func is called with the two values at each label, and its return value becomes the result. The result's index is the union of both indexes; when a label is missing from one Series, fill_value is passed to func in its place, so the two Series do not need to have the same length.
In [792]:
first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, 1] 
first = pd.Series(first) 
second = pd.Series(second) 
#result = first.combine(second, (lambda x1,x2: x1+x2))
result = first.combine(second, (lambda x1, x2: x1 if x1 < x2 else x2)) 
result 
Out[792]:
0     1
1     2
2     2
3     1
4     3
5     7
6    11
7     0
8     1
dtype: int64
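A short sketch of `combine()` on Series whose indexes only partly overlap. The labels and values are illustrative; `fill_value=0` stands in for whichever side lacks a label.

```python
import pandas as pd

a = pd.Series({'x': 1, 'y': 5})
b = pd.Series({'y': 3, 'z': 7})

# max() receives (value from a, value from b); a missing side is replaced by 0
largest = a.combine(b, max, fill_value=0)
print(largest)
```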

Count (including missing values): s.size

  • Returns the number of elements in the underlying data, NaN included. size is an attribute, not a method.
In [793]:
first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, np.nan] 
first = pd.Series(first) 
second = pd.Series(second) 
print(first.size)
print(second.size)
9
9

Count (non-missing values): s.count()

  • Returns number of non-NA/null observations in the Series
In [794]:
first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, np.nan] 
first = pd.Series(first) 
second = pd.Series(second) 
print(first.count())
print(second.count())
9
8

Arithmetic: s.add(), s.radd()

  • Series.add(other, level=None, fill_value=None, axis=0)
  • Adds a Series, a list-like object of the same length, or a scalar to the caller series. Equivalent to series + other.
  • Series.radd(other, level=None, fill_value=None, axis=0)
  • Equivalent to other + series. Both methods can substitute a fill_value for missing data in one of the inputs.
In [795]:
index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series+series)
print('-'*50)
print(series.add(series))
print('-'*50)
print(series.radd(series))
a    2.910306
b   -0.488479
c   -0.605480
d   -0.522796
e   -0.581465
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
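Adding a Series to itself, as above, never exercises `fill_value`. The sketch below uses two made-up Series with partly different labels to show what the parameter changes compared with the plain `+` operator.

```python
import pandas as pd

a = pd.Series([1, 2], index=['a', 'b'])
b = pd.Series([10, 20], index=['b', 'c'])

plain = a + b                    # labels present on one side only give NaN
filled = a.add(b, fill_value=0)  # the missing side is treated as 0

print(plain)
print(filled)
```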

Arithmetic: s.sub()

  • Subtracts a Series, a list-like object of the same length, or a scalar from the caller series
  • Series.sub(other, level=None, fill_value=None, axis=0)
  • Equivalent to series - other, but with support to substitute a fill_value for missing data in one of the inputs.

  • Parameters:

other : Series or scalar value

axis : {0 or 'index'}

  • Unused for a Series; kept for compatibility with DataFrame

level : int or name

  • Broadcast across a level, matching Index values on the passed MultiIndex level

fill_value : None or float value, default None

  • Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing, the result will be missing
In [796]:
index = pd.date_range('1/1/2000', periods=8)
series1 = pd.Series(np.random.randn(8), index=index )
print(series1)
print('-'*50)
series2 = pd.Series(np.random.randn(8), index=index )
print(series2)
print('-'*50)
series1.sub(series2)
2000-01-01    0.985423
2000-01-02   -0.467066
2000-01-03    0.151035
2000-01-04    0.383096
2000-01-05   -0.655538
2000-01-06   -0.046151
2000-01-07   -0.602910
2000-01-08    0.973474
Freq: D, dtype: float64
--------------------------------------------------
2000-01-01   -0.168359
2000-01-02    2.243687
2000-01-03    0.756375
2000-01-04    0.544233
2000-01-05    1.050438
2000-01-06    0.797982
2000-01-07    0.176542
2000-01-08    0.242590
Freq: D, dtype: float64
--------------------------------------------------
Out[796]:
2000-01-01    1.153781
2000-01-02   -2.710752
2000-01-03   -0.605340
2000-01-04   -0.161137
2000-01-05   -1.705976
2000-01-06   -0.844133
2000-01-07   -0.779452
2000-01-08    0.730884
Freq: D, dtype: float64

Arithmetic: s.mul()

  • Multiplies the caller series by a Series, a list-like object of the same length, or a scalar
In [797]:
index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series ** 2)
series.mul(series)
a    0.792009
b    1.401266
c    0.360041
d   -0.351529
e    0.039101
dtype: float64
--------------------------------------------------
a    0.627278
b    1.963546
c    0.129629
d    0.123572
e    0.001529
dtype: float64
Out[797]:
a    0.627278
b    1.963546
c    0.129629
d    0.123572
e    0.001529
dtype: float64

Arithmetic: s.div()

  • Divides the caller series by a Series, a list-like object of the same length, or a scalar. Equivalent to series / other.
In [798]:
index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series.div(2))
series.div(series)
a    0.027431
b   -1.470867
c   -0.060773
d   -2.087408
e   -0.379648
dtype: float64
--------------------------------------------------
a    0.013715
b   -0.735433
c   -0.030387
d   -1.043704
e   -0.189824
dtype: float64
Out[798]:
a    1.0
b    1.0
c    1.0
d    1.0
e    1.0
dtype: float64
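Because the direction of the division is easy to mix up, here is a small sketch contrasting `div()` with its reflected counterpart `rdiv()`. The values are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0])

halves = s.div(2)      # series / 2
inverted = s.rdiv(2)   # 2 / series

print(halves.tolist())
print(inverted.tolist())
```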

Unique values as an array: s.unique()

  • Returns the distinct values of a Series (for example, one DataFrame column) in order of first appearance
In [799]:
first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
type(first.unique())
first.unique()
Out[799]:
array([1, 2, 3, 7, 0, 4])

Number of unique values: s.nunique()

  • Returns the number of distinct values; NaN is excluded by default
In [800]:
first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
first.nunique()
Out[800]:
6

Uniqueness check: s.is_unique

  • Attribute that is True if all values in the object are unique, and False otherwise.
In [801]:
first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
first.is_unique
Out[801]:
False
In [802]:
second =[1, 2, 3, 4, 5, 6, 7] 
second = pd.Series(second) 
second.is_unique
Out[802]:
True

Statistics (maximum): s.max()

  • Returns the highest value in a Series; NaN is skipped by default
In [803]:
series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
series.max()
Out[803]:
21.0

Statistics (label of the maximum): s.idxmax()

  • Returns the index label of the highest value in a Series. If the maximum occurs more than once, the first occurrence is returned
In [804]:
series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
print(series)
print('-'*50)
print('The index of the highest value is ', series.idxmax())
0     5.0
1     3.0
2     2.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
--------------------------------------------------
The index of the highest value is  6

Statistics (minimum): s.min()

  • Returns the lowest value in a Series; NaN is skipped by default
In [805]:
series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
series.min()
Out[805]:
1.0

Statistics (label of the minimum): s.idxmin()

  • Returns the index label of the lowest value in a Series. If the minimum occurs more than once, the first occurrence is returned
In [806]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
print('-'*50)
print('The index of the lowest value is ', series.idxmin())
0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
--------------------------------------------------
The index of the lowest value is  2
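With the default integer index, labels and positions coincide, which hides the fact that `idxmax()` and `idxmin()` return labels. A sketch with a made-up string index makes the difference visible; `argmax()` is used here for the position.

```python
import pandas as pd

s = pd.Series([5, 21, 3], index=['a', 'b', 'c'])

label = s.idxmax()     # index label of the largest value
position = s.argmax()  # integer position of the largest value

print(label, position)
```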

Statistics (frequencies): s.value_counts()

  • Method to count the number of the times each unique value occurs in a Series
In [807]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.value_counts()
Out[807]:
3.0     3
1.0     2
21.0    1
9.0     1
5.0     1
dtype: int64
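The output above omits NaN because `value_counts()` drops missing values by default. A brief sketch of two commonly used options, on illustrative data: `dropna=False` to count NaN as well, and `normalize=True` to get relative frequencies.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 3, 1, np.nan])

with_nan = s.value_counts(dropna=False)  # NaN gets its own row
shares = s.value_counts(normalize=True)  # fractions of the non-missing values

print(with_nan)
print(shares)
```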

Statistics (sum): s.sum()

  • Returns the sum of the values for the requested axis
In [808]:
series =pd.Series([5, 3, 1, np.nan])
series.sum()
Out[808]:
9.0

Statistics (product): s.prod()

  • Returns the product of the values for the requested axis
In [809]:
series =pd.Series([5, 3, 1, np.nan])
series.prod()
Out[809]:
15.0

Statistics (mean): s.mean()

  • Returns the mean of the values for the requested axis
In [810]:
series =pd.Series([5, 3, 0, 0, np.nan])
series.mean()
Out[810]:
2.0
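The results for `sum()`, `prod()` and `mean()` above all rely on NaN being skipped, which is the default (`skipna=True`). A minimal sketch of the alternative, on illustrative data:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 3, 1, np.nan])

skipped = s.sum()                 # NaN ignored
propagated = s.sum(skipna=False)  # NaN propagates into the result
mean_skipped = s.mean()           # mean of the three non-missing values

print(skipped, propagated, mean_skipped)
```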

Statistics (power): s.pow()

  • Series.pow(other, level=None, fill_value=None, axis=0)
  • Raises each element of the caller series to the power of the corresponding element of the passed series and returns the result. Equivalent to series ** other.
In [811]:
series1 =pd.Series([2, 3, 4, np.nan, 3, 1])
series2 =pd.Series([1, 2, 3, 2, np.nan, np.nan])
series1.pow(series2)
Out[811]:
0     2.0
1     9.0
2    64.0
3     NaN
4     NaN
5     1.0
dtype: float64
In [812]:
series1 =pd.Series([2, 3, 4, np.nan, 3, 1])
series2 =pd.Series([1, 2, 3, 2, np.nan, np.nan])
series1.pow(series2,  fill_value=1)
Out[812]:
0     2.0
1     9.0
2    64.0
3     1.0
4     3.0
5     1.0
dtype: float64

Statistics (absolute value): s.abs()

  • Method is used to get the absolute numeric value of each element in Series/DataFrame
In [813]:
series =pd.Series([-2, -3, 4, np.nan])
series.abs()
Out[813]:
0    2.0
1    3.0
2    4.0
3    NaN
dtype: float64

Statistics (floor division "//" and modulo "%"): divmod()

In [814]:
series = pd.Series(np.arange(10))
print(series.tolist())
print('-'*50)
div, rem = divmod(series, 3)
print('use divmod')
print(div.tolist())
print(rem.tolist())
print('-'*50)
print('use // & %')
div2=series//3
rem2=series%3
print(div2.tolist())
print(rem2.tolist())
print('-'*50)
div4, rem4 = divmod(series, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6])  # elementwise divmod()    
print('use divmod and a list')
print(div4.tolist())
print(rem4.tolist())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
--------------------------------------------------
use divmod
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
--------------------------------------------------
use // & %
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
--------------------------------------------------
use divmod and a list
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
[0, 1, 2, 0, 0, 1, 1, 2, 2, 3]

Sorting by values: s.sort_values()

  • Method is called on a Series to sort the values in ascending or descending order
In [815]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values()
Out[815]:
2     1.0
3     1.0
1     3.0
4     3.0
7     3.0
0     5.0
5     9.0
6    21.0
8     NaN
dtype: float64
In [816]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values(ascending=False)
Out[816]:
6    21.0
5     9.0
0     5.0
7     3.0
4     3.0
1     3.0
3     1.0
2     1.0
8     NaN
dtype: float64

Sorting by index: s.sort_index()

  • Method is called on a pandas Series to sort it by the index instead of its values
In [817]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values().sort_index()
Out[817]:
0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

Selecting elements: s.get()

  • Extracts the value for a given index label. It is an alternative to bracket syntax that returns a default (None unless specified) instead of raising an error when the label is missing
In [818]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.get(1)
Out[818]:
3.0
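The practical difference from bracket syntax shows up with missing labels. A small sketch, using a made-up string index:

```python
import pandas as pd

s = pd.Series([5, 3, 1], index=['a', 'b', 'c'])

found = s.get('b')                 # value stored under label 'b'
missing = s.get('z')               # missing label: None instead of a KeyError
fallback = s.get('z', default=-1)  # missing label with an explicit default

print(found, missing, fallback)
```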

Selecting elements: s.head()

  • Returns the first n rows of a Series (5 by default) as a new Series
In [819]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.head()
Out[819]:
0    5.0
1    3.0
2    1.0
3    1.0
4    3.0
dtype: float64

Selecting elements: s.tail()

  • Returns the last n rows of a Series (5 by default) as a new Series
In [820]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.tail()
Out[820]:
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

Restricting to a range: s.clip()

  • Series.clip(lower=None, upper=None, axis=None, inplace=False)
  • Series.clip() limits values to a given range: values below lower are set to lower, and values above upper are set to upper. This is useful, for example, in signal processing, where a signal has to stay between fixed low and high levels.
In [821]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip(2,4, inplace=True) # inplace=True modifies the original Series
series
0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
Out[821]:
0    4.0
1    3.0
2    2.0
3    2.0
4    3.0
5    4.0
6    4.0
7    3.0
8    NaN
dtype: float64

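Restricting to a range (lower bound): s.clip(lower=...), formerly s.clip_lower()

  • Clips values below a given minimum. clip_lower() was removed in pandas 1.0; use clip(lower=...) instead
In [822]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip(lower=2)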
0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
Out[822]:
0     5.0
1     3.0
2     2.0
3     2.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

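Restricting to a range (upper bound): s.clip(upper=...), formerly s.clip_upper()

  • Clips values above a given maximum. clip_upper() was removed in pandas 1.0; use clip(upper=...) instead
In [823]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip(upper=4)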
0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
Out[823]:
0    4.0
1    3.0
2    1.0
3    1.0
4    3.0
5    4.0
6    4.0
7    3.0
8    NaN
dtype: float64

Type conversion: s.astype()

  • Series.astype(dtype, copy=True, errors='raise')
  • Used to convert a Series to another data type.
  • Non-finite values (NA or inf) cannot be converted to integer. With errors='raise' (the default) the conversion raises an exception; with errors='ignore' the exception is suppressed and the original object is returned unchanged.
In [824]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print('dtype is', series.dtypes)
print('dtype is', series.astype(int, errors='ignore').dtypes)
print('dtype is', series.fillna(0).astype(int).dtypes)
print('dtype is', series.fillna('A').astype(str).dtypes)
print('dtype is', series.dropna().astype(str).dtypes)
series.fillna(0).astype(int)
#series.dropna().astype(int)
dtype is float64
dtype is float64
dtype is int64
dtype is object
dtype is object
Out[824]:
0     5
1     3
2     1
3     1
4     3
5     9
6    21
7     3
8     0
dtype: int64
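Filling NaN with 0 before the cast, as above, changes the data. One alternative worth knowing, sketched here under the assumption of a reasonably recent pandas, is the nullable integer dtype `'Int64'` (capital "I"), which keeps missing values as `<NA>`.

```python
import numpy as np
import pandas as pd

floats = pd.Series([5, 3, np.nan])

nullable = floats.astype('Int64')  # nullable integer dtype; NaN becomes <NA>

print(nullable.dtype)
print(nullable.isna().tolist())
```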

Type conversion: s.tolist()

  • Series.tolist()
  • Converts a Series (type pandas.core.series.Series) into a Python list.
In [825]:
series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(type(series))
print(type(series.tolist()))
series.tolist()
<class 'pandas.core.series.Series'>
<class 'list'>
Out[825]:
[5.0, 3.0, 1.0, 1.0, 3.0, 9.0, 21.0, 3.0, nan]

Positional codes: s.factorize()

  • Returns: a tuple (codes, uniques), where codes is an integer array and uniques holds the distinct values
  • factorize() encodes an array numerically by identifying its distinct values. It is available both as pandas.factorize() and as Series.factorize().
In [826]:
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
series.factorize()
Out[826]:
(array([ 0,  1,  2,  2, -1,  3,  4,  1, -1]),
 Float64Index([5.0, 3.0, 1.0, 9.0, 21.0], dtype='float64'))
In [827]:
# Equivalent to locating each element of series in the index of the Series below; NaN maps to -1
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
print(pd.Series(series.dropna().unique()))
0     5.0
1     3.0
2     1.0
3     9.0
4    21.0
dtype: float64

Mapping: s.map()

  • Series.map(arg, na_action=None)
  • Applies arg (a function, dict or Series) to each element of the Series
In [828]:
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
#series.map(lambda x: x*10, na_action='ignore')
#series.fillna(0).map(lambda x: x*10 if x < 5 else (x*2 if x < 10 else x/10))
series.map(lambda x: x*10 if x < 5 else (x*2 if x < 10 else (0 if np.isnan(x) else x/10))) # nested conditional expressions
Out[828]:
0    10.0
1    30.0
2    10.0
3    10.0
4     0.0
5    18.0
6     2.1
7    30.0
8     0.0
dtype: float64
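Besides a function, `map()` also accepts a dict, and `na_action='ignore'` keeps NaN away from the function. Both are sketched below on made-up string data.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', 'owl', np.nan])

young = s.map({'cat': 'kitten', 'dog': 'puppy'})  # values without a dict entry become NaN
shout = s.map(str.upper, na_action='ignore')      # NaN is passed through, not given to str.upper

print(young.tolist())
print(shout.tolist())
```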

Mapping: s.apply()

  • s.apply(func, convert_dtype=True, args=())
    • func: the function to apply to each value of the Series.
    • convert_dtype: try to find a better dtype for the element-wise results.
    • args=(): positional arguments passed to func after the Series value.
    • Return Type: the Series that results from applying the function.
  • Differences
    • apply() applies a function along the columns or rows of a DataFrame
    • applymap() applies a function to every element of a DataFrame
    • map() applies a function to every element of a Series
In [829]:
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
series.apply(lambda x: x*10 if x < 5 else (x*2 if x < 10 else (0 if np.isnan(x) else x/10)))
Out[829]:
0    10.0
1    30.0
2    10.0
3    10.0
4     0.0
5    18.0
6     2.1
7    30.0
8     0.0
dtype: float64
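A minimal sketch of how extra arguments reach the function. The helper `scale` and its parameters are hypothetical names for illustration: entries in `args` follow the element as positional arguments, and additional keyword arguments are forwarded as well.

```python
import pandas as pd

def scale(x, factor, offset=0):
    """Hypothetical helper: multiply by factor, then add offset."""
    return x * factor + offset

s = pd.Series([1, 2, 3])

scaled = s.apply(scale, args=(10,), offset=1)  # calls scale(x, 10, offset=1) per element
print(scaled.tolist())
```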

Shifting: s.shift()

In [830]:
series = pd.Series([1,3,5,np.nan,6,8])
print(series)
series2=series.shift(2)
series2
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
Out[830]:
0    NaN
1    NaN
2    1.0
3    3.0
4    5.0
5    NaN
dtype: float64
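Two variations on the cell above, sketched with illustrative values: a negative `periods` shifts values toward the start, and `fill_value` replaces the NaN that would otherwise be introduced.

```python
import pandas as pd

s = pd.Series([1, 3, 5])

backward = s.shift(-1)              # values move up; the last slot becomes NaN
padded = s.shift(1, fill_value=0)   # values move down; the first slot is filled with 0

print(backward.tolist())
print(padded.tolist())
```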

Label alignment

In [831]:
series1 = pd.Series([1,3,5,np.nan,6,8])
print(series1)
series2=series1.shift(2)
print(series2)
series1+series2
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
0    NaN
1    NaN
2    1.0
3    3.0
4    5.0
5    NaN
dtype: float64
Out[831]:
0     NaN
1     NaN
2     6.0
3     NaN
4    11.0
5     NaN
dtype: float64
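In the cell above both Series share the same index, so alignment is hard to tell apart from positional addition. The following sketch uses made-up data with the same labels in reverse order to show that values are matched by label, not by position.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = a + b                          # label 0: 1 + 30, label 1: 2 + 20, label 2: 3 + 10
by_position = a.to_numpy() + b.to_numpy() # plain arrays ignore the labels

print(by_label.to_dict())
print(by_position.tolist())
```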

Slicing: s[]

In [832]:
index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print('series[0] is\n', series[0])  # an integer key on a label index falls back to position; newer pandas prefers series.iloc[0]
print('-'*50)
print('series[:3] is \n', series[:3])
print('-'*50)
print('series[[4, 3, 1]] is \n', series[[4, 3, 1]])   #  ndarray-like; newer pandas prefers series.iloc[[4, 3, 1]]
print('-'*50)
print('series["a"] is \n', series['a'])               # dict-like
print('-'*50)
print('series["e"] is \n', series['e'])
a   -1.159010
b   -1.377384
c   -1.117303
d    0.439501
e   -1.185345
dtype: float64
--------------------------------------------------
series[0] is
 -1.1590104009506388
--------------------------------------------------
series[:3] is 
 a   -1.159010
b   -1.377384
c   -1.117303
dtype: float64
--------------------------------------------------
series[[4, 3, 1]] is 
 e   -1.185345
d    0.439501
b   -1.377384
dtype: float64
--------------------------------------------------
series["a"] is 
 -1.1590104009506388
--------------------------------------------------
series["e"] is 
 -1.1853449477967675
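Since plain brackets mix positional and label-based access, the explicit accessors are often clearer. A small sketch with fixed, illustrative values: `iloc` is purely positional and excludes the end of a slice, while `loc` is purely label-based and includes it.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first = s.iloc[0]                 # by position
head_two = s.iloc[:2]             # positions 0 and 1; the end is excluded
a_to_c = s.loc['a':'c']           # labels 'a' through 'c'; the end is included
picked = s.loc[['d', 'b']]        # list of labels, in the requested order

print(first, head_two.tolist(), a_to_c.tolist(), picked.tolist())
```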

Filtering: s[s > s.median()]

In [833]:
index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
series[series > series.median()]
a   -2.405354
b    0.705862
c    1.615508
d    0.176146
e    0.018139
dtype: float64
Out[833]:
b    0.705862
c    1.615508
dtype: float64

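Period-over-period percentage change: s.pct_change()

  • pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
  • Change between the current and a prior element, expressed as a fraction (0.011111 means about 1.1%).
  • s.pct_change(periods=12) gives the year-over-year change when the data is monthly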
In [834]:
s1 = pd.Series([90, 91, 85])
s2=s1.shift(periods=1)
print(s2)
print('-'*50)
s3=s1.diff(periods=1)
print(s3)
print('-'*50)
s4=s3.div(s2)
print(s4)
print('-'*50)
s1.pct_change(periods=1)
0     NaN
1    90.0
2    91.0
dtype: float64
--------------------------------------------------
0    NaN
1    1.0
2   -6.0
dtype: float64
--------------------------------------------------
0         NaN
1    0.011111
2   -0.065934
dtype: float64
--------------------------------------------------
Out[834]:
0         NaN
1    0.011111
2   -0.065934
dtype: float64

Missing values: s.isna()

  • Return a boolean same-sized object indicating if the values are NA.
In [835]:
ser = pd.Series([5, 6, np.nan])  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
print(ser)
ser.isna()
0    5.0
1    6.0
2    NaN
dtype: float64
Out[835]:
0    False
1    False
2     True
dtype: bool

Missing values: s.notna()

  • Return a boolean same-sized object indicating if the values are not NA.
In [836]:
ser = pd.Series([5, 6, np.nan])  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
print(ser)
ser.notna()
0    5.0
1    6.0
2    NaN
dtype: float64
Out[836]:
0     True
1     True
2    False
dtype: bool

Missing values: s.interpolate()

  • Interpolate values according to different methods.
  • Series.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)
In [837]:
ser = pd.Series([0, 1, np.nan, 9, np.nan, 5])
ser.interpolate()
Out[837]:
0    0.0
1    1.0
2    5.0
3    9.0
4    7.0
5    5.0
dtype: float64
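A short sketch of how the default linear method treats a run of consecutive gaps, and how `limit` caps the number of values filled per run. The data is illustrative, and forward fill is shown only for comparison.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, 9])

linear = s.interpolate()          # evenly spaced between 0 and 9
capped = s.interpolate(limit=1)   # fills at most one consecutive NaN
carried = s.ffill()               # forward fill: repeats the last valid value

print(linear.tolist())
print(capped.tolist())
print(carried.tolist())
```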
