Series excerpts

1 Creation
2 API excerpts

import numpy as np
import pandas as pd

Creation¶

Create an empty Series¶

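s = pd.Series(dtype='float64')  # explicit dtype; the default dtype of an empty Series differs across pandas versions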
print(s)

Series([], dtype: float64)

Create from a scalar constant¶

print(type(5))
pd.Series(5)

<class 'int'>

0    5
dtype: int64

Create from a list¶

l=[1,3,5,np.nan,45]
print(type(l))
s = pd.Series(l)
s

<class 'list'>

0     1.0
1     3.0
2     5.0
3     NaN
4    45.0
dtype: float64

Create from a dict¶

d = {'b' : 1, 'a' : 0, 'c' : 2}
print(type(d))
pd.Series(d)

<class 'dict'>

b    1
a    0
c    2
dtype: int64

d = {'a' : 0., 'b' : 1., 'c' : 2.}  
pd.Series(d)  # index order follows the dict's key order

a    0.0
b    1.0
c    2.0
dtype: float64

pd.Series(d, index=['b', 'c', 'd', 'a'])  # index order follows the given index; keys missing from the dict become NaN

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

Create from an ndarray¶

index=list('abcde')
s = pd.Series(np.random.randn(5), index=index)
print(type(np.random.randn(5)))
s

<class 'numpy.ndarray'>

a    0.418042
b   -0.454595
c   -0.402959
d    1.897117
e    0.558703
dtype: float64

Create with an explicit index¶

l=[1,3,5,np.nan,45]
index=list('ABCDE')
s = pd.Series(l, index=index)
s

A     1.0
B     3.0
C     5.0
D     NaN
E    45.0
dtype: float64

s.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

API excerpts¶

Name s.name¶

The name attribute gives a Series a name; it becomes the column label when the Series is used as a DataFrame column.

series = pd.Series([2, 43, 9, 27, np.nan], name='Jasper')
series.name

'Jasper'

Rename s.rename()¶

series = pd.Series([2, 43, 9, 27, np.nan], name='Jasper')
series2 = series.rename("Steven")  # Note that series and series2 refer to different objects.
print(series.name)
series2.name

Jasper

'Steven'

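Compare: equal s.eq() ==¶

Series.eq(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series == other series as a boolean Series. With fill_value, a NaN on one side is replaced by that value before comparing.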

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.eq(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)==series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1    False
2    False
3     True
4    False
dtype: bool
--------------------------------------------------
0    False
1    False
2    False
3     True
4    False
dtype: bool
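
The two results above agree because no index is missing in both Series at once. A minimal sketch of the case where they differ, using made-up values (the names left, right, flex and filled are illustrative only): fill_value is applied only when exactly one side is missing, while fillna() replaces every NaN.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, np.nan, np.nan])
right = pd.Series([1.0, np.nan, 5.0])

# fill_value replaces a NaN only when the other side has a value at that index
flex = left.eq(right, fill_value=5)

# fillna() replaces every NaN, so two missing values end up comparing equal
filled = left.fillna(5) == right.fillna(5)

print(flex.tolist())    # [True, False, True]
print(filled.tolist())  # [True, True, True]
```

Index 1 is missing on both sides, so eq() leaves it as NaN and the comparison gives False.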

Compare: not equal s.ne() !=¶

Series.ne(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series != other series as a boolean Series.

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.ne(series2, fill_value = replace_nan) 
print(result)
result2=series1.fillna(10)!=series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1     True
2     True
3    False
4     True
dtype: bool
0     True
1     True
2     True
3    False
4     True
dtype: bool

Compare: less than or equal s.le() <=¶

Series.le(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series <= other series as a boolean Series.
For strings, the comparison is lexicographic, based on character code points (the same as ASCII order for ASCII text).

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.le(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)<=series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1    False
2    False
3     True
4     True
dtype: bool
--------------------------------------------------
0     True
1    False
2    False
3     True
4     True
dtype: bool

Compare: greater than or equal s.ge() >=¶

Series.ge(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series >= other series as a boolean Series.

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.ge(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)>=series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1     True
2     True
3     True
4    False
dtype: bool
--------------------------------------------------
0    False
1     True
2     True
3     True
4    False
dtype: bool

Compare: less than s.lt() <¶

Series.lt(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series < other series as a boolean Series.

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.lt(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)<series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0     True
1    False
2    False
3    False
4     True
dtype: bool
--------------------------------------------------
0     True
1    False
2    False
3    False
4     True
dtype: bool

Compare: greater than s.gt() >¶

Series.gt(other, level=None, fill_value=None, axis=0)
Returns the element-wise result of caller series > other series as a boolean Series.

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, 54])  
print(series2)
print('-'*50)
replace_nan = 10
result = series1.gt(series2, fill_value = replace_nan) 
print(result)  
print('-'*50)
result2=series1.fillna(10)>series2.fillna(10)
print(result2)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4    54.0
dtype: float64
--------------------------------------------------
0    False
1     True
2     True
3    False
4    False
dtype: bool
--------------------------------------------------
0    False
1     True
2     True
3    False
4    False
dtype: bool

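Compare: range s.between()¶

Series.between(left, right, inclusive=True)
Returns a boolean Series marking which values lie between left and right. NaN values give False.
inclusive: True by default, so both bounds count as inside; if False, values equal to either bound are excluded. pandas 1.3 and later expect one of the strings 'both', 'neither', 'left' or 'right' here instead of a Boolean.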

series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, 8])
series.between(2,8)

0     True
1     True
2    False
3    False
4    False
5    False
6    False
7     True
8     True
dtype: bool
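
As a cross-check, between() with its default bounds can be written out as two comparisons joined with &. This is a small sketch on the same data; the names inside and manual are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, 8])

inside = series.between(2, 8)            # both bounds included by default
manual = (series >= 2) & (series <= 8)   # the same test written out

print(inside.equals(manual))  # True
```

NaN fails both comparisons, which is why index 4 is False in either form.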

Combine s.combine_first()¶

Series.combine_first(other)
combine_first() merges two Series into one: wherever the caller has a null value, the value from the passed Series at the same index is used instead. The result's index is the union of both indexes. If both Series are null at an index, the result is null there as well.
This differs from Series.combine(), which takes a function as a parameter to decide each output value.

series1 = pd.Series([2, 43, 9, 27, np.nan])
print(series1)
print('-'*50)
series2 = pd.Series([np.nan, 23, 5, 27, np.nan])  
print(series2)
print('-'*50)
result = series1.combine_first(series2) 
print(result)

0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     NaN
1    23.0
2     5.0
3    27.0
4     NaN
dtype: float64
--------------------------------------------------
0     2.0
1    43.0
2     9.0
3    27.0
4     NaN
dtype: float64
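
The example above uses two Series with identical indexes. A short sketch with different indexes (made-up labels and values) shows the union behaviour: labels that exist only in the passed Series are carried over.

```python
import numpy as np
import pandas as pd

caller = pd.Series([1.0, np.nan], index=['a', 'b'])
other = pd.Series([10.0, 20.0, 30.0], index=['a', 'b', 'c'])

# 'a' keeps the caller's value, 'b' is patched from other, 'c' exists only in other
patched = caller.combine_first(other)

print(patched.to_dict())  # {'a': 1.0, 'b': 20.0, 'c': 30.0}
```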

Combine s.combine()¶

Series.combine(other, func, fill_value=nan)
Series.combine() merges two Series element by element. Each output value is decided by the function passed as func, which receives the caller's value and the other Series' value at the same index. The two Series are aligned on the union of their indexes, and fill_value is substituted wherever one of them has no entry for an index.

first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, 1] 
first = pd.Series(first) 
second = pd.Series(second) 
#result = first.combine(second, (lambda x1,x2: x1+x2))
result = first.combine(second, (lambda x1, x2: x1 if x1 < x2 else x2)) 
result

0     1
1     2
2     2
3     1
4     3
5     7
6    11
7     0
8     1
dtype: int64
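
A sketch of combine() on Series of different lengths, to show where fill_value comes in. The data and the choice of 100 as fill_value are arbitrary; it is picked large so that min() always prefers the value that actually exists.

```python
import pandas as pd

first = pd.Series([1, 5], index=['a', 'b'])
second = pd.Series([3, 2, 7], index=['a', 'b', 'c'])

# 'c' is absent from first, so func receives fill_value (100) in its place
smallest = first.combine(second, min, fill_value=100)

print(smallest.index.tolist())  # ['a', 'b', 'c']
print(smallest.tolist())        # [1, 2, 7]
```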

Count (including NaN) s.size¶

Returns the number of elements in the underlying data, NaN included.

first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, np.nan] 
first = pd.Series(first) 
second = pd.Series(second) 
print(first.size)
print(second.size)

9
9

Count (non-null) s.count()¶

Returns number of non-NA/null observations in the Series

first =[1, 2, 5, 6, 3, 7, 11, 0, 4] 
second =[5, 3, 2, 1, 3, 9, 21, 3, np.nan] 
first = pd.Series(first) 
second = pd.Series(second) 
print(first.count())
print(second.count())

9
8
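
The gap between size and count() is the number of missing values, which isna().sum() gives directly. A minimal check with made-up data:

```python
import numpy as np
import pandas as pd

values = pd.Series([5, 3, np.nan, 1, np.nan])

missing = values.isna().sum()  # True counts as 1 when summed

print(values.size, values.count(), missing)  # 5 3 2
```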

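Arithmetic s.add() s.radd()¶

Series.add(other, level=None, fill_value=None, axis=0)
Adds another Series, a list-like object of the same length, or a scalar to the caller, element by element. Equivalent to series + other, but with support to substitute a fill_value for missing data in one of the inputs.
Series.radd(other, level=None, fill_value=None, axis=0)
Equivalent to other + series, with the same fill_value support.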

index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series+series)
print('-'*50)
print(series.add(series))
print('-'*50)
print(series.radd(series))

a    2.910306
b   -0.488479
c   -0.605480
d   -0.522796
e   -0.581465
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
--------------------------------------------------
a    5.820613
b   -0.976958
c   -1.210961
d   -1.045592
e   -1.162931
dtype: float64
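
Adding a Series to itself cannot show what fill_value does. A sketch with two partly overlapping indexes (the labels and values are made up) brings out the difference between + and add():

```python
import pandas as pd

a = pd.Series([1.0, 2.0], index=['x', 'y'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

plain = a + b                    # labels found in only one Series give NaN
filled = a.add(b, fill_value=0)  # the missing side is treated as 0
reflected = a.radd(b)            # same as b + a

print(plain.isna().tolist())  # [True, False, True]
print(filled.to_dict())       # {'x': 1.0, 'y': 12.0, 'z': 20.0}
```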

Arithmetic s.sub()¶

Series.sub(other, level=None, fill_value=None, axis=0)
Subtracts another Series, a list-like object of the same length, or a scalar from the caller, element by element. Equivalent to series - other, but with support to substitute a fill_value for missing data in one of the inputs.
Parameters:
- other: Series or scalar value.
- level: int or name. Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value: None or float value, default None. Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result will be missing.
- axis: {0 or 'index'}. Unused for a Series; kept for compatibility with DataFrame.sub().

index = pd.date_range('1/1/2000', periods=8)
series1 = pd.Series(np.random.randn(8), index=index )
print(series1)
print('-'*50)
series2 = pd.Series(np.random.randn(8), index=index )
print(series2)
print('-'*50)
series1.sub(series2)

2000-01-01    0.985423
2000-01-02   -0.467066
2000-01-03    0.151035
2000-01-04    0.383096
2000-01-05   -0.655538
2000-01-06   -0.046151
2000-01-07   -0.602910
2000-01-08    0.973474
Freq: D, dtype: float64
--------------------------------------------------
2000-01-01   -0.168359
2000-01-02    2.243687
2000-01-03    0.756375
2000-01-04    0.544233
2000-01-05    1.050438
2000-01-06    0.797982
2000-01-07    0.176542
2000-01-08    0.242590
Freq: D, dtype: float64
--------------------------------------------------

2000-01-01    1.153781
2000-01-02   -2.710752
2000-01-03   -0.605340
2000-01-04   -0.161137
2000-01-05   -1.705976
2000-01-06   -0.844133
2000-01-07   -0.779452
2000-01-08    0.730884
Freq: D, dtype: float64
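
The rule about locations missing on both sides can be checked with a small sketch (made-up values): fill_value stands in for a NaN on one side only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, np.nan])
b = pd.Series([np.nan, 2.0, np.nan])

# index 0: 1 - 0, index 1: 0 - 2, index 2: missing on both sides, stays NaN
diff = a.sub(b, fill_value=0)

print(diff.tolist())  # [1.0, -2.0, nan]
```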

Arithmetic s.mul()¶

Multiplies the caller by another Series, a list-like object of the same length, or a scalar, element by element.

index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series ** 2)
series.mul(series)

a    0.792009
b    1.401266
c    0.360041
d   -0.351529
e    0.039101
dtype: float64
--------------------------------------------------
a    0.627278
b    1.963546
c    0.129629
d    0.123572
e    0.001529
dtype: float64

a    0.627278
b    1.963546
c    0.129629
d    0.123572
e    0.001529
dtype: float64

Arithmetic s.div()¶

Divides the caller by another Series, a list-like object of the same length, or a scalar, element by element.

index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print(series.div(2))
series.div(series)

a    0.027431
b   -1.470867
c   -0.060773
d   -2.087408
e   -0.379648
dtype: float64
--------------------------------------------------
a    0.013715
b   -0.735433
c   -0.030387
d   -1.043704
e   -0.189824
dtype: float64

a    1.0
b    1.0
c    1.0
d    1.0
e    1.0
dtype: float64

Unique values array s.unique()¶

Returns the distinct values of a Series as a NumPy array, in order of first appearance.

first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
type(first.unique())
first.unique()

array([1, 2, 3, 7, 0, 4])

Unique values count s.nunique()¶

Returns the number of distinct values; NaN is not counted by default.

first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
first.nunique()

6

Unique values check s.is_unique¶

Property that is True if all values in the Series are unique and False otherwise.

first =[1, 2, 3, 1, 2, 7, 3, 0, 4] 
first = pd.Series(first) 
first.is_unique

False

second =[1, 2, 3, 4, 5, 6, 7] 
second = pd.Series(second) 
second.is_unique

True
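
The three members treat NaN differently, which the integer examples above cannot show. A brief sketch with one missing value (made-up data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan])

print(len(s.unique()))          # 3, because unique() keeps NaN
print(s.nunique())              # 2, because NaN is dropped by default
print(s.nunique(dropna=False))  # 3
print(s.is_unique)              # False, because 2 appears twice
```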

Statistics: maximum s.max()¶

Returns the largest value in a Series; NaN is skipped by default.

series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
series.max()

21.0

Statistics: index of maximum s.idxmax()¶

Returns the index label of the largest value in a Series. If the maximum occurs more than once, the label of the first occurrence is returned.

series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
print(series)
print('-'*50)
print('The index of the highest value is ', series.idxmax())

0     5.0
1     3.0
2     2.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
--------------------------------------------------
The index of the highest value is  6
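
With the default integer index, label and position coincide. A sketch with string labels (made-up data; the names scores, label and position are illustrative) separates the two, using NumPy's argmax for the position:

```python
import numpy as np
import pandas as pd

scores = pd.Series([5, 21, 3], index=['x', 'y', 'z'])

label = scores.idxmax()                       # index label of the maximum
position = int(np.argmax(scores.to_numpy()))  # integer position of the maximum

print(label, position)  # y 1
```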

Statistics: minimum s.min()¶

Returns the smallest value in a Series; NaN is skipped by default.

series =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, np.nan])
series.min()

1.0

Statistics: index of minimum s.idxmin()¶

Returns the index label of the smallest value in a Series. If the minimum occurs more than once, the label of the first occurrence is returned.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
print('-'*50)
print('The index of the lowest value is ', series.idxmin())

0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64
--------------------------------------------------
The index of the lowest value is  2

Statistics: frequency s.value_counts()¶

Counts how many times each unique value occurs in a Series, sorted by descending frequency. NaN is left out by default.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.value_counts()

3.0     3
1.0     2
21.0    1
9.0     1
5.0     1
dtype: int64
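
Two optional parameters change what is counted: dropna=False includes NaN, and normalize=True returns relative frequencies instead of counts. A sketch on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])

counts = s.value_counts()                # NaN excluded
with_nan = s.value_counts(dropna=False)  # NaN counted as its own entry
shares = s.value_counts(normalize=True)  # fractions of the non-null total

print(counts.sum(), with_nan.sum())  # 8 9
print(shares.loc[3.0])               # 0.375
```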

Statistics: sum s.sum()¶

Returns the sum of the values for the requested axis

series =pd.Series([5, 3, 1, np.nan])
series.sum()

9.0

Statistics: product s.prod()¶

Returns the product of the values for the requested axis

series =pd.Series([5, 3, 1, np.nan])
series.prod()

15.0

Statistics: mean s.mean()¶

Returns the mean of the values for the requested axis

series =pd.Series([5, 3, 0, 0, np.nan])
series.mean()

2.0
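
The results for sum(), prod() and mean() above all ignore the NaN entry. A short sketch of what that means for the mean, and of the skipna=False switch that turns the behaviour off:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 3, 0, 0, np.nan])

print(s.mean())             # 2.0
print(s.sum() / s.count())  # 2.0, the divisor is the 4 non-null values
print(s.sum(skipna=False))  # nan
```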

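Statistics: exponentiation s.pow()¶

Series.pow(other, level=None, fill_value=None, axis=0)
Raises each element of the caller to the power given by the corresponding element of the passed Series and returns the result. Equivalent to series ** other, but with support to substitute a fill_value for missing data in one of the inputs.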

series1 =pd.Series([2, 3, 4, np.nan, 3, 1])
series2 =pd.Series([1, 2, 3, 2, np.nan, np.nan])
series1.pow(series2)

0     2.0
1     9.0
2    64.0
3     NaN
4     NaN
5     1.0
dtype: float64

series1 =pd.Series([2, 3, 4, np.nan, 3, 1])
series2 =pd.Series([1, 2, 3, 2, np.nan, np.nan])
series1.pow(series2,  fill_value=1)

0     2.0
1     9.0
2    64.0
3     1.0
4     3.0
5     1.0
dtype: float64

Statistics: absolute value s.abs()¶

Returns the absolute value of each element in a Series or DataFrame.

series =pd.Series([-2, -3, 4, np.nan])
series.abs()

0    2.0
1    3.0
2    4.0
3    NaN
dtype: float64

Statistics: floor division "//" and modulo "%" divmod()¶

series = pd.Series(np.arange(10))
print(series.tolist())
print('-'*50)
div, rem = divmod(series, 3)
print('use divmod')
print(div.tolist())
print(rem.tolist())
print('-'*50)
print('use // & %')
div2=series//3
rem2=series%3
print(div2.tolist())
print(rem2.tolist())
print('-'*50)
div4, rem4 = divmod(series, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6])  # elementwise divmod()    
print('use divmod and a list')
print(div4.tolist())
print(rem4.tolist())

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
--------------------------------------------------
use divmod
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
--------------------------------------------------
use // & %
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
--------------------------------------------------
use divmod and a list
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
[0, 1, 2, 0, 0, 1, 1, 2, 2, 3]

Sort by values s.sort_values()¶

Sorts a Series by its values, in ascending order by default or descending with ascending=False. NaN values are placed last by default.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values()

2     1.0
3     1.0
1     3.0
4     3.0
7     3.0
0     5.0
5     9.0
6    21.0
8     NaN
dtype: float64

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values(ascending=False)

6    21.0
5     9.0
0     5.0
7     3.0
4     3.0
1     3.0
3     1.0
2     1.0
8     NaN
dtype: float64

Sort by index s.sort_index()¶

Sorts a Series by its index labels instead of its values.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.sort_values().sort_index()

0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

Return selected elements s.get()¶

Retrieves the value for a given index label. This is an alternative to the bracket syntax that returns a default (None unless specified) instead of raising KeyError when the label is missing.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.get(1)

3.0
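
A sketch of where get() and brackets differ, namely a label that does not exist. The labels here are made up.

```python
import pandas as pd

s = pd.Series([5, 3], index=['a', 'b'])

print(s.get('a'))      # 5
print(s.get('z'))      # None, the label is missing and no default was given
print(s.get('z', -1))  # -1

try:
    s['z']
except KeyError:
    print('brackets raise KeyError for a missing label')
```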

Return selected elements s.head()¶

Returns the first n rows of a Series (n=5 by default). The method returns a brand new Series.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.head()

0    5.0
1    3.0
2    1.0
3    1.0
4    3.0
dtype: float64

Return selected elements s.tail()¶

Returns the last n rows of a Series (n=5 by default). The method returns a brand new Series.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
series.tail()

4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

Return values within a range s.clip()¶

Series.clip(lower=None, upper=None, axis=None, inplace=False)
Series.clip() limits values to a range: values below lower are set to lower, and values above upper are set to upper. This is useful in tasks such as signal processing, where a signal must be restricted to a specific range. NaN values are left unchanged.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip(2, 4, inplace=True)  # inplace=True modifies the original Series
series

0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

0    4.0
1    3.0
2    2.0
3    2.0
4    3.0
5    4.0
6    4.0
7    3.0
8    NaN
dtype: float64

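Return values within a range s.clip_lower()¶

Clips values below a passed minimum value. clip_lower() was deprecated in pandas 0.24 and removed in pandas 1.0; series.clip(lower=2) is the replacement.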

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip_lower(2)

0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

0     5.0
1     3.0
2     2.0
3     2.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

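Return values within a range s.clip_upper()¶

Clips values above a passed maximum value. clip_upper() was deprecated in pandas 0.24 and removed in pandas 1.0; series.clip(upper=4) is the replacement.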

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(series)
series.clip_upper(4)

0     5.0
1     3.0
2     1.0
3     1.0
4     3.0
5     9.0
6    21.0
7     3.0
8     NaN
dtype: float64

0    4.0
1    3.0
2    1.0
3    1.0
4    3.0
5    4.0
6    4.0
7    3.0
8    NaN
dtype: float64
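
The three clipping cases can all be written with clip() alone, which also runs on pandas versions that no longer have clip_lower() and clip_upper(). A sketch on the same data, without inplace so the original stays untouched (the result names are illustrative):

```python
import numpy as np
import pandas as pd

series = pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])

floor_only = series.clip(lower=2)     # stands in for clip_lower(2)
cap_only = series.clip(upper=4)       # stands in for clip_upper(4)
both = series.clip(lower=2, upper=4)  # returns a new Series

print(both.tolist())  # [4.0, 3.0, 2.0, 2.0, 3.0, 4.0, 4.0, 3.0, nan]
print(series.max())   # 21.0, the original is unchanged
```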

Change type s.astype()¶

Series.astype(dtype, copy=True, errors='raise')
Converts a Series to another data type.
Non-finite values (NA or inf) cannot be converted to an integer dtype. The errors parameter controls what happens then: 'raise' raises the exception, while 'ignore' suppresses it and returns the original object unchanged.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print('dtype is', series.dtypes)
print('dtype is', series.astype(int, errors='ignore').dtypes)
print('dtype is', series.fillna(0).astype(int).dtypes)
print('dtype is', series.fillna('A').astype(str).dtypes)
print('dtype is', series.dropna().astype(str).dtypes)
series.fillna(0).astype(int)
#series.dropna().astype(int)

dtype is float64
dtype is float64
dtype is int64
dtype is object
dtype is object

0     5
1     3
2     1
3     1
4     3
5     9
6    21
7     3
8     0
dtype: int64
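
Filling NaN with 0 changes the data. As an alternative sketch, pandas' nullable integer dtype 'Int64' (capital I) keeps the missing entry as <NA> while storing the rest as integers. This assumes every non-null value is a whole number, as in the made-up data below.

```python
import numpy as np
import pandas as pd

series = pd.Series([5, 3, 1, np.nan])

# 'Int64' is the nullable extension dtype, distinct from NumPy's 'int64'
nullable = series.astype('Int64')

print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
```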

Change type s.tolist()¶

Series.tolist()
Converts a Series to a Python list. The Series starts as type pandas.core.series.Series, and tolist() returns its values as a list.

series =pd.Series([5, 3, 1, 1, 3, 9, 21, 3, np.nan])
print(type(series))
print(type(series.tolist()))
series.tolist()

<class 'pandas.core.series.Series'>
<class 'list'>

[5.0, 3.0, 1.0, 1.0, 3.0, 9.0, 21.0, 3.0, nan]

Positional codes s.factorize()¶

Returns: a tuple (codes, uniques), where codes is an integer array giving the position of each value in uniques.
factorize() produces a numeric representation of an array by identifying its distinct values. It is available as both pandas.factorize() and Series.factorize().

series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
series.factorize()

(array([ 0,  1,  2,  2, -1,  3,  4,  1, -1]),
 Float64Index([5.0, 3.0, 1.0, 9.0, 21.0], dtype='float64'))

# Equivalent to locating each element of series in the index of the Series below; NaN maps to -1
series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
print(pd.Series(series.dropna().unique()))

0     5.0
1     3.0
2     1.0
3     9.0
4    21.0
dtype: float64
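
The codes can be used to rebuild the non-missing values from uniques, which is one way to confirm what they mean. In this sketch the -1 codes are masked out first, because -1 would otherwise index the last unique value.

```python
import numpy as np
import pandas as pd

series = pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
codes, uniques = series.factorize()

present = codes >= 0  # False where the code is -1, i.e. the value was NaN
rebuilt = uniques.to_numpy()[codes[present]]

print(rebuilt.tolist())  # [5.0, 3.0, 1.0, 1.0, 9.0, 21.0, 3.0]
```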

Map s.map()¶

Series.map(arg, na_action=None)
Applies arg to every element of the Series; arg may be a function, a dict, or a Series.

series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
#series.map(lambda x: x*10, na_action='ignore')
#series.fillna(0).map(lambda x: x*10 if x < 5 else (x*2 if x < 10 else x/10))
series.map(lambda x: x*10 if x < 5 else (x*2 if x < 10 else (0 if np.isnan(x) else x/10))) # nested conditional expressions

0    10.0
1    30.0
2    10.0
3    10.0
4     0.0
5    18.0
6     2.1
7    30.0
8     0.0
dtype: float64
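
Besides a function, map() accepts a dict as a lookup table, and na_action='ignore' passes NaN through without calling the function. A sketch with made-up string data:

```python
import numpy as np
import pandas as pd

animals = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# dict lookup: values without a matching key become NaN
mapped = animals.map({'cat': 'kitten', 'dog': 'puppy'})

# na_action='ignore' skips NaN instead of formatting it into the string
greeted = animals.map('I am a {}'.format, na_action='ignore')

print(mapped.isna().tolist())  # [False, False, True, True]
print(greeted.iloc[0])         # I am a cat
```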

Map s.apply()¶

s.apply(func, convert_dtype=True, args=())
- func: the function to apply to each value of the Series.
- convert_dtype: try to find a better dtype for the results of the function.
- args: tuple of additional positional arguments passed to func after the Series value.
- Returns: a Series holding the results of the function.
Differences:
- apply() on a DataFrame applies a function along rows or columns.
- applymap() applies a function to every element of a DataFrame.
- map() applies a function to every element of a Series.

series =pd.Series([5, 3, 1, 1, np.nan, 9, 21, 3, np.nan])
series.apply(lambda x: x*10 if x < 5 else (x*2 if x < 10 else (0 if np.isnan(x) else x/10)))

0    10.0
1    30.0
2    10.0
3    10.0
4     0.0
5    18.0
6     2.1
7    30.0
8     0.0
dtype: float64
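
The args parameter is not exercised by the lambda above. A minimal sketch with a hypothetical helper, scale(), that takes two extra arguments; they can be passed positionally through args or by keyword:

```python
import pandas as pd

def scale(x, factor, offset):
    """Hypothetical helper: multiply by factor, then add offset."""
    return x * factor + offset

series = pd.Series([1, 2, 3])

by_args = series.apply(scale, args=(10, 1))          # extra positional arguments
by_kwargs = series.apply(scale, factor=10, offset=1) # the same via keywords

print(by_args.tolist())  # [11, 21, 31]
```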

Shift s.shift()¶

series = pd.Series([1,3,5,np.nan,6,8])
print(series)
series2=series.shift(2)
series2

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

0    NaN
1    NaN
2    1.0
3    3.0
4    5.0
5    NaN
dtype: float64

Label alignment¶

series1 = pd.Series([1,3,5,np.nan,6,8])
print(series1)
series2=series1.shift(2)
print(series2)
series1+series2

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
0    NaN
1    NaN
2    1.0
3    3.0
4    5.0
5    NaN
dtype: float64

0     NaN
1     NaN
2     6.0
3     NaN
4    11.0
5     NaN
dtype: float64

Slice s[]¶

index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
print('-'*50)
print('series[0] is\n', series[0])  # positional access on a labelled index; newer pandas prefers series.iloc[0]
print('-'*50)
print('series[:3] is \n', series[:3])
print('-'*50)
print('series[[4, 3, 1]] is \n', series[[4, 3, 1]])   # ndarray-like; newer pandas prefers series.iloc[[4, 3, 1]]
print('-'*50)
print('series["a"] is \n', series['a'])               # dict-like
print('-'*50)
print('series["e"] is \n', series['e'])

a   -1.159010
b   -1.377384
c   -1.117303
d    0.439501
e   -1.185345
dtype: float64
--------------------------------------------------
series[0] is
 -1.1590104009506388
--------------------------------------------------
series[:3] is 
 a   -1.159010
b   -1.377384
c   -1.117303
dtype: float64
--------------------------------------------------
series[[4, 3, 1]] is 
 e   -1.185345
d    0.439501
b   -1.377384
dtype: float64
--------------------------------------------------
series["a"] is 
 -1.1590104009506388
--------------------------------------------------
series["e"] is 
 -1.1853449477967675

Filter s[s > s.median()]¶

index=['a', 'b', 'c', 'd', 'e']
series = pd.Series(np.random.randn(5), index=index)
print(series)
series[series > series.median()]

a   -2.405354
b    0.705862
c    1.615508
d    0.176146
e    0.018139
dtype: float64

b    0.705862
c    1.615508
dtype: float64

Period-over-period percent change s.pct_change()¶

Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Percentage change between the current and a prior element.
s.pct_change(periods=12) gives the year-over-year change when the data is monthly.

s1 = pd.Series([90, 91, 85])
s2=s1.shift(periods=1)
print(s2)
print('-'*50)
s3=s1.diff(periods=1)
print(s3)
print('-'*50)
s4=s3.div(s2)
print(s4)
print('-'*50)
s1.pct_change(periods=1)

0     NaN
1    90.0
2    91.0
dtype: float64
--------------------------------------------------
0    NaN
1    1.0
2   -6.0
dtype: float64
--------------------------------------------------
0         NaN
1    0.011111
2   -0.065934
dtype: float64
--------------------------------------------------

0         NaN
1    0.011111
2   -0.065934
dtype: float64
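
A sketch of the periods=12 case on made-up monthly data: twelve months at 100 followed by one month at 110, so only the last entry has a value twelve periods earlier to compare against.

```python
import pandas as pd

monthly = pd.Series(
    [100.0] * 12 + [110.0],
    index=pd.date_range('2000-01-01', periods=13, freq='MS'),
)

yoy = monthly.pct_change(periods=12)      # compare with the same month a year before
manual = monthly / monthly.shift(12) - 1  # the same calculation written out

print(yoy.isna().sum())          # 12
print(round(yoy.iloc[-1], 4))    # 0.1
```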

Missing values s.isna()¶

Return a boolean same-sized object indicating if the values are NA.

ser = pd.Series([5, 6, np.nan])  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
print(ser)
ser.isna()

0    5.0
1    6.0
2    NaN
dtype: float64

0    False
1    False
2     True
dtype: bool

Missing values s.notna()¶

Return a boolean same-sized object indicating if the values are not NA.

ser = pd.Series([5, 6, np.nan])  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
print(ser)
ser.notna()

0    5.0
1    6.0
2    NaN
dtype: float64

0     True
1     True
2    False
dtype: bool

Missing values s.interpolate()¶

Interpolate values according to different methods.
Series.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)

ser = pd.Series([0, 1, np.nan, 9, np.nan, 5])
ser.interpolate()

0    0.0
1    1.0
2    5.0
3    9.0
4    7.0
5    5.0
dtype: float64
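
The limit parameter from the signature caps how many consecutive NaN values are filled. A sketch on a made-up Series with a gap of two:

```python
import numpy as np
import pandas as pd

gap = pd.Series([0.0, np.nan, np.nan, 3.0])

full = gap.interpolate()            # linear: fills the whole gap
limited = gap.interpolate(limit=1)  # fills only the first NaN of the gap

print(full.tolist())     # [0.0, 1.0, 2.0, 3.0]
print(limited.tolist())  # [0.0, 1.0, nan, 3.0]
```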
