Python

ISSS622 - Python Programming and Data Analysis


1. Basics

1.1. Naming Convention

[Screenshot: naming conventions]

1.2. Operators

  • 4 basic data types: String, Integer, Float and Boolean
  • Logical operators: not, and, or

Operator | Name | Description
a / b | True division | Quotient of a and b (always a float)
a // b | Floor division | Quotient of a and b, rounded down to the nearest whole number
a % b | Modulus | Remainder after division of a by b
a ** b | Exponentiation | a raised to the power of b

  • Membership Operators: in and not in
  • Identity Operators: is and is not test whether two names refer to the same object (for example, whether the type of a variable is a given class)
x = 5
type(x) is int #True
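
The operators listed above can be checked directly in the interpreter. The following is a minimal sketch; the operand values and the list `nums` are chosen for illustration only.

```python
# Illustrative operands (not from the course material)
a, b = 7, 2

print(a / b)    # 3.5  true division always returns a float
print(a // b)   # 3    floor division rounds down
print(-a // b)  # -4   rounds toward negative infinity, not toward zero
print(a % b)    # 1    remainder
print(a ** b)   # 49   exponentiation

# Membership and identity operators
nums = [1, 2, 3]
print(2 in nums)       # True
print(5 not in nums)   # True
print(type(a) is int)  # True
```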

1.3. Iterables and Iterators

  • iterable: any object that can be looped over. Common iterables:
    • list/tuple/str/dict
    • zip/enumerate/range/reversed
  • iterator: passing an iterable to the built-in function iter() returns an iterator, an object that hands back the next item each time next() is called on it
it = iter([4, 3, 2, 1]) 
print(next(it))#4
print(next(it))#3
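
What happens when the iterator runs out is worth seeing once. This sketch assumes a two-item list for brevity; it shows the StopIteration exception, the optional default of next(), and the equivalent for loop.

```python
it = iter([4, 3])
print(next(it))  # 4
print(next(it))  # 3

# A third call has nothing left to return
try:
    next(it)
except StopIteration:
    print("iterator exhausted")

# next() accepts a default value to return instead of raising
print(next(it, None))  # None

# A for loop performs the iter()/next() calls for you
collected = []
for value in [4, 3]:
    collected.append(value)
print(collected)  # [4, 3]
```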


1.4. Zip and Enumerate

  • zip(): pairs up the items of two (or more) iterables position by position
  • enumerate(): returns each item together with its index in the iterable
l1, l2 = [ 1, 2, 3, 4, 5 ], ['h', 'e', 'l', 'l', 'o']
for item in zip(l1, l2):
    print(item)

(1, 'h')
(2, 'e')
(3, 'l')
(4, 'l')
(5, 'o')
l1 = ['h', 'e', 'l', 'l', 'o']
for idx, item in enumerate(l1):
    print(idx,item)

0 h
1 e
2 l
3 l
4 o
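
Two details of zip() and enumerate() are easy to miss: zip() stops at the shortest input, and enumerate() can start counting from any number. A small sketch, using made-up `names` and `scores` lists:

```python
names = ['Ann', 'Ben', 'Cal']   # hypothetical data
scores = [90, 85]               # one item shorter on purpose

# zip() stops at the shortest iterable, so 'Cal' is dropped
pairs = list(zip(names, scores))
print(pairs)   # [('Ann', 90), ('Ben', 85)]

# dict() accepts the zipped pairs directly
lookup = dict(zip(names, scores))
print(lookup)  # {'Ann': 90, 'Ben': 85}

# enumerate() can start counting from a value other than 0
ranked = list(enumerate(names, start=1))
print(ranked)  # [(1, 'Ann'), (2, 'Ben'), (3, 'Cal')]
```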

(Back to top)

2. Functions

2.1. Argument Types

  • 2.1.1. Positional Arguments

  • 2.1.2. Keyword Arguments

  • 2.1.3. Default Arguments

  • 2.1.4. Variable-Length Arguments

    1. *args (Non-Keyword Arguments): collects any extra positional arguments (possibly none) into a tuple
    2. **kwargs (Keyword Arguments): collects any extra keyword arguments into a dictionary that maps each keyword to the value passed alongside it

    Example of *args

    def info(name, *args):
        hobby = []
        for a in args:
            hobby.append(a)
        print(name +"'s hobbies: " + ', '.join(hobby))
    info('Mike')                      #Mike's hobbies:
    info('Mike', 'hiking', 'reading') #Mike's hobbies: hiking, reading

    Example of **kwargs

    def info(name, **kwargs):
        hobby = []
        for k, v in kwargs.items():
            hobby.append(k+'-'+v)
        print(name +"'s hobbies: " + ', '.join(hobby))
    info('Mike', first='hiking', second='reading') #Mike's hobbies: first-hiking, second-reading
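
Positional, keyword and default arguments (2.1.1 to 2.1.3) can be illustrated with one function. This is a sketch only; `describe` and its parameters are hypothetical names, not part of the course material.

```python
def describe(name, hobby, city='Singapore'):  # city has a default value
    return f"{name} likes {hobby} and lives in {city}"

# Positional arguments: matched by order
print(describe('Mike', 'hiking'))
# Mike likes hiking and lives in Singapore

# Keyword arguments: matched by name, so the order no longer matters
print(describe(hobby='reading', name='Mike'))
# Mike likes reading and lives in Singapore

# Default argument overridden by the caller
print(describe('Mike', 'hiking', city='London'))
# Mike likes hiking and lives in London
```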

(Back to top)

2.2. Variable Scopes

  • There are 2 variable scopes: local and global

    • global variable
    y = 'global'
    def test(): 
        global y #Declares that y refers to the global variable; only required when the function assigns to y
        print(y)
    test()    #will print 'global'
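
The difference between the two scopes shows up when a function assigns to a name. A minimal sketch, with a hypothetical `counter` variable and three small functions:

```python
counter = 0          # global variable

def read_only():
    return counter   # reading a global needs no declaration

def shadow():
    counter = 99     # creates a new local variable; the global is untouched
    return counter

def increment():
    global counter   # required because the function assigns to the global
    counter += 1

print(read_only())   # 0
print(shadow())      # 99
print(counter)       # 0
increment()
print(counter)       # 1
```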

(Back to top)

3. Lambda Expressions

  • Syntax: lambda argument_list: expression
    • argument_list (same as argument list in functions): x, y, *args, **kwargs
    • expression (the return value) must be a single expression; statements are not allowed

Example of lambda

lambda x, y: x*y          #input: x, y; output: x*y
lambda *args: sum(args)   #input: any number of parameters; output: their summation
lambda x: 1               #input: x; output: 1

3.1. Sorted

  • Syntax: sorted(iterable, key=None, reverse=False) returns a new list with the elements of the given iterable sorted by key
sorted([1, 2, 3, 4, 5], key = lambda x: abs(3 - x))  #[3, 2, 4, 1, 5]
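
The key and reverse parameters can be combined. The sketch below uses a made-up list of words and the built-in len as the key; note that items with equal keys keep their original relative order.

```python
words = ['pear', 'fig', 'banana', 'kiwi']   # hypothetical data

by_length = sorted(words, key=len)
print(by_length)       # ['fig', 'pear', 'kiwi', 'banana']

by_length_desc = sorted(words, key=len, reverse=True)
print(by_length_desc)  # ['banana', 'pear', 'kiwi', 'fig']

# sorted() returns a new list; the original is unchanged
print(words)           # ['pear', 'fig', 'banana', 'kiwi']
```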

3.2. Filter and Map

  • Filter syntax: filter(function, iterable) keeps only the items of the iterable for which the function returns True
  • Map syntax: map(function, iterable) applies the function to each item of the iterable
  • Note: both filter() and map() return a lazy iterator, so use list() to convert the result to a list

Example of filter and map

list(filter(lambda n: n % 2 == 1, [1, 2, 3, 4, 5]))  #[1, 3, 5]

list(map(lambda x: x + 1, [1, 2, 3]))                #[2, 3, 4]
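
Because filter() and map() return iterators, their results can be consumed only once. The sketch below shows that, along with list comprehensions that produce the same results; the variable names are illustrative.

```python
nums = [1, 2, 3, 4, 5]

odds = filter(lambda n: n % 2 == 1, nums)
first_pass = list(odds)
second_pass = list(odds)   # the iterator is already exhausted
print(first_pass)          # [1, 3, 5]
print(second_pass)         # []

# Equivalent list comprehensions
odds_lc = [n for n in nums if n % 2 == 1]
plus_one_lc = [n + 1 for n in nums]
print(odds_lc)             # [1, 3, 5]
print(plus_one_lc)         # [2, 3, 4, 5, 6]
```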

(Back to top)

4. Module

4.1. Random Module

import random
random.seed(42) #make results reproducible

random.random() #return a random float in the range [0.0, 1.0)
#e.g. 0.35553263284394376

random.randint(0, 10) #return a random integer between the two endpoints (both inclusive)
#e.g. 7

items = ['one', 'two', 'three', 'four', 'five']
random.choice(items) #choose a single element from a sequence
#e.g. 'four'
random.choices(items, k=2) #choose multiple elements with replacement (duplicates are possible)
#e.g. ['three', 'three']
random.choices(items, k=3)
#e.g. ['three', 'five', 'four']


random.shuffle(items) #randomize a sequence in-place; returns None
print(items)
#e.g. ['four', 'three', 'two', 'one', 'five']

(Back to top)

5. Class


5.1. Object

5.1.1. Variable Assignment and Aliasing

  • Aliasing: several variables (a, b) refer to the same object, e.g. the list [1,2], so a change made through one name is visible through the other
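
A short sketch of aliasing in practice. The variable `c` is added here to contrast an alias with an independent copy.

```python
a = [1, 2]
b = a            # b is an alias: no new list is created
b.append(3)
print(a)         # [1, 2, 3]  the change made through b is visible through a
print(a is b)    # True

c = list(a)      # builds a new list with the same items
c.append(4)
print(a)         # [1, 2, 3]
print(c)         # [1, 2, 3, 4]
print(a is c)    # False
```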


5.1.2. Comparison Operators

  • ==: compares the values of the objects
  • is: compares identity, i.e. whether both names refer to the same object in memory
a = [1,2]
b = [1,2] 

print(id(a)) #e.g. 2661200625736 
print(id(b)) #e.g. 2661202091528 (a different object)

a == b #True
a is b #False

5.1.3. Integer Caching

  • CPython (the standard interpreter) caches small integers in the range of -5 to 256.
  • These integer objects are created when the interpreter is launched, and every later use of one of these values refers to the same object in memory.
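
The effect can be observed with the is operator. This is a sketch of CPython behaviour only: integer caching is an implementation detail, so the identity results may differ on other interpreters. The values are built with int() at run time so that the compiler does not merge equal constants.

```python
small_a, small_b = int('256'), int('256')
big_a, big_b = int('257'), int('257')

print(small_a is small_b)  # True on CPython: both names share the cached object
print(big_a is big_b)      # usually False on CPython: two separate objects
print(big_a == big_b)      # True: always compare integer values with ==
```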


5.1.4. Shallow Copy vs Deep Copy

import copy
a = [[0, 1], 2, 3]
b = copy.copy(a)
c = copy.deepcopy(a)

Shallow Copy - copy()

  • Shallow Copy creates a new object for the parent layer only.
  • It does NOT create new objects for the child layers; the copy shares them with the original.


Deep Copy - deepcopy()

  • Deep Copy will create new objects for the parent & child layers.
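
The difference between the two copies becomes visible once the original is mutated. A minimal sketch that reuses the lists a, b and c from the start of this section:

```python
import copy

a = [[0, 1], 2, 3]
b = copy.copy(a)       # shallow: new outer list, shared inner list
c = copy.deepcopy(a)   # deep: new outer list and new inner list

a[0].append(99)        # mutate the nested (child) list
a.append(4)            # mutate the outer (parent) list

print(a)  # [[0, 1, 99], 2, 3, 4]
print(b)  # [[0, 1, 99], 2, 3]   sees the child change, not the parent change
print(c)  # [[0, 1], 2, 3]       sees neither change
print(b[0] is a[0])  # True
print(c[0] is a[0])  # False
```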


5.1.5. Data Mutability

  • Immutable (the value cannot be changed in place; any "change" creates a new object): integers, strings, and tuples
  • Mutable (the value can be changed after creation): lists, dictionaries, and sets
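
The same += operator behaves differently on an immutable tuple and a mutable list. In this sketch, `old_t` and `old_lst` are extra references kept only to show which object was affected.

```python
t = (1, 2)
old_t = t
t += (3,)              # builds a new tuple and rebinds t
print(t is old_t)      # False
print(old_t)           # (1, 2)  the original tuple is unchanged

lst = [1, 2]
old_lst = lst
lst += [3]             # extends the same list in place
print(lst is old_lst)  # True
print(old_lst)         # [1, 2, 3]

try:
    old_t[0] = 9       # item assignment is not allowed on a tuple
except TypeError as err:
    print('TypeError:', err)
```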


(Back to top)

5.2. Class

5.2.1. Class Definition

  • A class is a "blueprint" for creating objects
    • For example: individual cars are not exactly the same, but they all share the same structure.


5.2.2. Class Syntax

  • Class attribute: Student.num_of_stu is shared by the whole class. Update it through the class name; assigning through self (e.g. self.num_of_stu += 1) would create a separate instance attribute instead
  • Init method: __init__ takes self as its first argument
  • Instance method: takes at least one argument, self, and can include other arguments such as birth_year
class Student:
    #Class attribute
    num_of_stu = 0 
    
    #Special init method
    def __init__(self, first, last): #use self as the first argument
      self.first = first
      self.last = last
      self.email = first + '.' + last + '@smu.edu.sg'
      Student.num_of_stu += 1 #update the class attribute via the class name; self.num_of_stu += 1 would create a separate instance attribute
    
    def full_name(self, birth_year): #instance method: self first, then any other arguments such as birth_year
      return self.first + ' ' + self.last + ' was born in '  + birth_year 

print(Student.num_of_stu) #0 
stu_1 = Student('Ryan','Tan') 
stu_1.full_name('1995') # "Ryan Tan was born in 1995"
print(Student.num_of_stu) #1
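
Why the counter is updated through the class name rather than through self can be shown with a small experiment. `Tally` is a hypothetical class used only for this sketch.

```python
class Tally:                      # hypothetical class for illustration
    count = 0                     # class attribute shared by all instances

    def bump_shared(self):
        Tally.count += 1          # updates the attribute on the class

    def bump_shadowed(self):
        self.count += 1           # reads the class value, then creates an instance attribute

t1, t2 = Tally(), Tally()
t1.bump_shared()
print(Tally.count, t1.count, t2.count)  # 1 1 1

t1.bump_shadowed()
print(Tally.count, t1.count, t2.count)  # 1 2 1  only t1 changed
```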

5.3. Inheritance

  • For example, create a Representative class (Rep) based on the Student class
  • super(): gives access to the parent class, so the child __init__ can reuse the parent's initialization and then add more information
  • Override: a method defined in the child class with the same name replaces the parent's method
class Rep(Student):
    def __init__(self, first, last, cat):
      super().__init__(first, last) #parent class handles existing arguments 
      self.cat = cat #new information
    def full_name(self): #override the full_name method of parent class, Student
      return self.cat + ' representative: ' + self.first + ' ' + self.last 
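
A usage sketch for the inheritance example. To keep the block self-contained it uses a trimmed-down Student (no email, no birth_year argument), and the category 'Sports' is a made-up value.

```python
class Student:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def full_name(self):
        return self.first + ' ' + self.last

class Rep(Student):
    def __init__(self, first, last, cat):
        super().__init__(first, last)   # reuse the parent initializer
        self.cat = cat                  # new information

    def full_name(self):                # override the parent method
        return self.cat + ' representative: ' + super().full_name()

rep_1 = Rep('Ryan', 'Tan', 'Sports')
print(rep_1.full_name())           # Sports representative: Ryan Tan
print(isinstance(rep_1, Student))  # True: a Rep is also a Student
```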

(Back to top)

5.4. Magic Methods

  • Magic methods in Python are the special methods that start and end with the double underscores __
  • Built-in classes in Python define many magic methods. Use the dir() function to list the attributes of a class, including its magic methods.
    >>> dir(int)
    ['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', ...]
  • Magic methods are most frequently used to define behaviors of predefined operators in Python
    • For example: the __str__() method is called when an object is printed or converted with str(). A class can override __str__() to control that output. For instance:
      class Human:
          def __init__(self, id, name, addresses=None, maps=None):
              self.id = id
              self.name = name
              #Use None as the default: a mutable default ([] or {}) would be shared by every instance
              self.addresses = addresses if addresses is not None else []
              self.maps = maps if maps is not None else {}
      
          def __str__(self):
              return f'Id {self.id}: {self.name}'
      human = Human(1, 'Quan Nguyen', ['Address1', 'Address1'], {'London':2, 'UK':3})
      print(human) #Id 1: Quan Nguyen
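
Magic methods can also define how operators such as + and == behave on your own objects. The sketch below uses a hypothetical `Money` class; the method names __add__, __eq__ and __repr__ are the standard hooks.

```python
class Money:                             # hypothetical class for illustration
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):            # called for: self + other
        return Money(self.amount + other.amount)

    def __eq__(self, other):             # called for: self == other
        return isinstance(other, Money) and self.amount == other.amount

    def __repr__(self):                  # used by print() when __str__ is not defined
        return f'Money({self.amount})'

total = Money(5) + Money(7)
print(total)               # Money(12)
print(total == Money(12))  # True
```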

(Back to top)

6. Regular Expression

6.0. Regex Summary

  • Character class: [] specifies a set of characters to match
  • Metacharacters: \w [a-zA-Z0-9_], \W [^a-zA-Z0-9_], \d (digit), \D (non-digit), \s (white-space), \S (non white-space), . matches any character except \n
  • \ removes the special meaning of a metacharacter. For example, \. (or [.]) matches a literal dot "." in the text instead of any character
  • Anchors: ^, $, \b match a position in the text rather than a character:
    • ^ beginning of text, $ end of text; use re.M to make ^ and $ match at the beginning and end of every line
    • \b word boundary: the position between a word character [a-zA-Z0-9_] and a non-word character (or the start/end of the text)
  • Quantifiers: * zero or more, ? zero or one, + one or more, {m} m repetitions, {m,n} any number of repetitions from m to n, inclusive. A quantifier repeats the preceding literal/metacharacter/group/backreference
  • Group: () captures part of the match so it can be extracted on its own, or repeated with a backreference
  • Backreference: \1, \2, \3 refer to numbered groups; groups are numbered by their opening parenthesis, from left to right (outer groups before inner ones)
  • Look ahead & Look behind: match only if a pattern does (or does not) follow or precede, without consuming it

6.1. What is Regex

  • Regex: a small pattern-matching language used for searching and manipulating text
  • re module: the Python module that contains the regex engine and provides the regular expression functionality


6.2. Search for a pattern

  • To search for a pattern, there are 2 steps:
    • Step 1: Compile the pattern
    • Step 2: Perform the search


6.2.1. Compile the pattern

  • re.compile() compiles a pattern into a re.Pattern object that the re engine uses to perform the search.
import re

pat = re.compile(r'abc')
print(pat)
print(type(pat))

re.compile('abc')
<class 're.Pattern'>

6.2.2. Perform the search

6.2.2.1. Match()

  • match(): matches the pattern only at the beginning of the text.
mat_abc1 = pat.match('ABC,ABc,AbC,abc')
mat_abc2 = pat.match('abc,ABc,AbC,abc')
print(mat_abc1) #None because the pattern 'abc' does not appear at the beginning
print(mat_abc2) #<re.Match object; span=(0, 3), match='abc'>

6.2.2.2. Search()

  • search(): scans the text for the pattern at any position and returns the match as a re.Match object (or None if there is no match).
  • BUT it only returns the first match
sear_abc1 = pat.search('ABC,ABc,AbC,abc')
sear_abc2 = pat.search('abc,ABc,AbC,abc')

print(sear_abc1) #<re.Match object; span=(12, 15), match='abc'>
print(sear_abc2) #<re.Match object; span=(0, 3), match='abc'>
print(type(sear_abc1))#<class 're.Match'>

6.2.2.3. Findall()

  • findall() method: finds all the matched strings and returns them in a list.
find_abc1 = pat.findall('ABC,ABc,AbC,abc')
find_abc2 = pat.findall('abc,ABc,AbC,abc')

print(find_abc1) #['abc']
print(find_abc2) #['abc', 'abc']

6.2.2.4. FindIter()

  • Unlike findall(), which returns all the matched strings in a list at once, finditer() returns an iterator that lazily yields re.Match objects one at a time.
finditer_abc = pat.finditer('abc,ABc,AbC,abc')

print(finditer_abc) #<callable_iterator object at 0x7ff650853040>

for m in finditer_abc:
    print(m)
    #<re.Match object; span=(0, 3), match='abc'>
    #<re.Match object; span=(12, 15), match='abc'>
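
Each re.Match object yielded by finditer() carries the matched text and its position. A minimal sketch using the group(), start() and end() methods; the list `positions` is an illustrative name.

```python
import re

pat = re.compile(r'abc')
positions = []
for m in pat.finditer('abc,ABc,AbC,abc'):
    # group() is the matched text; start()/end() are its offsets in the input
    positions.append((m.group(), m.start(), m.end()))

print(positions)  # [('abc', 0, 3), ('abc', 12, 15)]
```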

6.3. Metacharacters

The full set of metacharacters is:

  • . ^ $ * + ? { } [ ] \ | ( )

They can be categorized into several types as below:

  • Type 1: Metacharacters that match a single character: . [] - ^ \d \D \w \W \s \S

    • . Dot: match any single character except the newline \n character

      p = re.compile(r'.at')
      m = p.findall('cat bat\n sat cap') #['cat', 'bat', 'sat']
    • [] character class: specify a set of characters to match

      • Most metacharacters lose their special meaning inside a character class.
      p = re.compile(r'[abcABC]')
      m = p.findall('abcABC') #['a', 'b', 'c', 'A', 'B', 'C']
    • - hyphen: specify a range of characters to match

      • If you want to match a literal hyphen, put it in the beginning or the end inside [], for ex: [-a-e] or [a-e-]
      p = re.compile(r'[a-z0-9]')
      m = p.findall('d0A3z6P') #['d', '0', '3', 'z', '6']
      
      p = re.compile(r'[-a-e]') # or [a-e-] if you want to match a hyphen -
      m = p.findall('e-a-s-y, easy') #['e', '-', 'a', '-', '-', 'e', 'a']
    • ^ caret: match any character NOT in the character class

      • A caret ^ that is not at the beginning of a character class is treated as a normal character.
      • A caret outside a character class has a different meaning (see Anchors).
      p = re.compile(r'[^0-9a-z]') #Pattern excludes 0-9 and lowercase a to z
      m = p.findall('1 2 3 Go') #Result: [' ', ' ', ' ', 'G'] -> only the spaces and G match
      
      p = re.compile(r'[0-9^a-z]') #^ is not at the beginning of the character class, so it is a normal character
      m = p.findall('1 2 3 ^Go') #['1', '2', '3', '^', 'o']
    • \d vs \D digits: \d (numeric digits) \D (non-digit, including \n)

      p = re.compile(r'\d')
      m = p.findall('a1\nA#')   #['1']
      p = re.compile(r'\D')
      m = p.findall('a1\nA#') #['a', '\n', 'A', '#']
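    • \w vs \W word characters: \w ([a-zA-Z0-9_] for ASCII text; in Python 3 it also matches Unicode letters and digits unless re.ASCII is set) \W (anything \w does not match)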


      p = re.compile(r'\w')
      m = p.findall('_#a!E$4-') #['_', 'a', 'E', '4']
      
      p = re.compile(r'\W')
      m = p.findall('_#a!E$4-') #['#', '!', '$', '-']
    • \s vs \S white space: \s (white-space) \S (non white-space) match based on whether a character is a whitespace


      text = 'Name\tISSS610\tISSS666\nJoe Jones\tA\tA\n' 
      
      p = re.compile(r'\s')
      m = p.findall(text) #['\t', '\t', '\n', ' ', '\t', '\t', '\n']
  • Type 2: Escaping metacharacters: \ Removes the special meaning of a metacharacter

    p1 = re.compile(r'.')
    p2 = re.compile(r'\.')
    m1 = p1.findall('smu.edu.sg') #['s', 'm', 'u', '.', 'e', 'd', 'u', '.', 's', 'g']
    m2 = p2.findall('smu.edu.sg') #['.', '.']
    
    p = re.compile(r'\d\\d') #The first \d matches any digit; \\d then matches the literal text "\d"
    m = p.findall(r'135\d') #['5\\d'] i.e: 5\d (the raw string keeps the backslash in the text)
  • Type 3: Anchors: ^ beginning of text, $ end of text, \b word boundary

    • ^ beginning of text: We have seen a caret used in a character class. Here the caret is used without a character class.
      • It matches the starting position in the text.
      • For multiline text, add the flag re.MULTILINE (or re.M) in re.compile to make ^ match at the start of every line
      p = re.compile(r'^a[ab]c') 
      m = p.findall('''aac\nabc''') #['aac']
      
      p = re.compile(r'^a[ab]c', re.M) #Add flag re.M so that ^ matches at the start of each line
      m = p.findall('''aac\nabc''') #['aac', 'abc']
    • $ end of text:
      • It matches the ending position in the text
      • Like the caret, the dollar sign by default matches only at the end of the whole text, not at the end of each line; this behavior can be changed with re.MULTILINE or re.M
      p = re.compile(r'ab.$')
      m = p.findall('abc abd abe abf') #['abf']
      
      p = re.compile(r'[ab]c$', re.M) #Add flag re.M so that $ matches at the end of each line
      m = p.findall('ac\nbc') #['ac', 'bc']
    • \b word boundary: matches at a position between a word character and a non-word character (or the start/end of the text)
      p = re.compile(r'\b\d\d\b')
      m = p.findall('1 2 3 11 12 13 111 112 113') #['11', '12', '13']
      
      p = re.compile(r'\b\w\w\b')
      m = p.findall('aa,ab;ac(AA)AB AC') #['aa', 'ab', 'ac', 'AA', 'AB', 'AC']
  • Type 4: Quantifiers:

    • *: zero or more
    • ?: zero or one
    • +: one or more
    • {m}: m repetitions
    • {m,n}: any number of repetitions from m to n, inclusive (write it without a space after the comma, otherwise it is treated as literal text).
    p = re.compile(r'a[ab]*c')
    m = p.findall('a ab ac abc aac aabc aaac ababc') #['ac', 'abc', 'aac', 'aabc', 'aaac', 'ababc']
    
    p = re.compile(r'a[ab]+c')
    m = p.findall('a ab ac abc aac aabc aaac ababc') #['abc', 'aac', 'aabc', 'aaac', 'ababc']
    
    p = re.compile(r'a[ab]?c')
    m = p.findall('a ab ac abc aac aabc aaac ababc') #['ac', 'abc', 'aac', 'abc', 'aac', 'abc']
    
    p = re.compile(r'\d{3}')
    m = p.findall('1 2 3 11 12 13 111 112 113') #['111', '112', '113']
    
    p = re.compile(r'\d{2,3}')
    m = p.findall('1 2 3 11 12 13 111 112 113') #['11', '12', '13', '111', '112', '113']

6.4. Grouping Constructs


6.4.1. Grouped Pattern

  • We can use () to group a pattern into sub-patterns
    p = re.compile(r'(\w+): (\d+)') #Two groups (sub-patterns)
    m = p.findall('Course: Grade\nMath: 89\nPhysics: 92\n English: 78') #[('Math', '89'), ('Physics', '92'), ('English', '78')]
    
    chapters = ('Chapter 12: Numpy\n'
                'Chapter 13: Pandas\n'
                'Chapter 14: Data Visualization')
    p = re.compile(r'^Chapter (\d+: .+)', re.M)
    m = p.findall(chapters) #['12: Numpy', '13: Pandas', '14: Data Visualization']

6.4.2. Alternation

  • | matches either the sub-pattern before it or the one after it

    p = re.compile(r'(\w+)\.(bat|zip|exe)')
    m = p.findall('game.exe auto.bat text.zip') #[('game', 'exe'), ('auto', 'bat'), ('text', 'zip')]

6.4.3. re.Match.groups() vs re.Match.group()

  • .groups(): returns all captured groups as a tuple
  • .group(): returns selected groups, chosen by their indices.
    • group(0) returns the whole match.
    • group(1) returns the 1st captured group.
    • group(2, 3, 4) returns the 2nd, 3rd and 4th groups as a tuple.
      #Ex 1:  re.Match.groups() vs re.Match.group()
      p = re.compile(r'(\w+\.\w+)\s(\w+\.\w+)')
      m = p.search('game.exe auto.bat text.zip')
      
      print(m.groups()) #('game.exe', 'auto.bat')
      print(m.group(1)) # game.exe
      
      #Ex 2: re.Match.group()
      pattern = r'(\w+)\W+(\w+)\W+(\w+)\W+(\w+)'
      p = re.compile(pattern)
      m = p.search('one,,,two:three++++++4')
      print(m.group(0)) #one,,,two:three++++++4 (i.e: the whole match)
      print(m.group(1)) #one (i.e: match only group 1)
      print(m.group(2, 3, 4)) #('two', 'three', '4') 

6.4.4. Back Reference (\1, \2, \3, ...)

  • '(\w+)-\1' is different from '(\w+)-\w+'
  • '(\w+)-\1': once the first group has matched, \1 must match exactly the same text that group 1 captured
  • For example: both patterns match ‘one-one’, but the one with the backreference, '(\w+)-\1', won’t match ‘one-two’.
    # The pattern matches a number that starts with a few digits, followed by one digit,
    # followed by the same first few digits again.
    p = re.compile(r'((\d+)\d\2)')
    m = p.finditer('1234123, 11311, 123, 54345')
    for string in m:
        print(string.group(1, 2)) 
        #('1234123', '123') (i.e: 123 - 4 - same as group 2, in this case is 123)
        #('11311', '11')
        #('434', '4')
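
The ‘one-one’ / ‘one-two’ comparison described above can be verified directly. This sketch uses fullmatch() so that the whole string has to match; the two pattern variable names are illustrative.

```python
import re

with_backref = re.compile(r'(\w+)-\1')      # second half must repeat group 1
without_backref = re.compile(r'(\w+)-\w+')  # second half can be any word

print(bool(with_backref.fullmatch('one-one')))     # True
print(bool(with_backref.fullmatch('one-two')))     # False
print(bool(without_backref.fullmatch('one-one')))  # True
print(bool(without_backref.fullmatch('one-two')))  # True
```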

6.5. Flags

Three common flags that are very useful are:

  • re.MULTILINE or re.M : make “^”/“$” match starting/ending position of each line.
  • re.IGNORECASE or re.I: match letters in a case-insensitive way.
  • re.DOTALL or re.S : make “.” match any character, including newlines \n.
p1 = re.compile(r'abc')
m1 = p1.findall('abc ABC aBC Abc') #['abc']

p2 = re.compile(r'abc', re.I)
m2 = p2.findall('abc ABC aBC Abc') #['abc', 'ABC', 'aBC', 'Abc'] because re.I means Ignore Case
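
The other two flags, and the way flags are combined with |, can be sketched on a small two-line string. The sample `text` is made up for this illustration.

```python
import re

text = 'abc\nABC'

no_flag = re.findall(r'^abc$', text)             # $ only matches at the very end
multi = re.findall(r'^abc$', text, re.M)         # ^ and $ work per line
multi_icase = re.findall(r'^abc$', text, re.M | re.I)  # flags combined with |
dot_default = re.findall(r'abc.ABC', text)       # . does not match \n
dot_all = re.findall(r'abc.ABC', text, re.S)     # . matches \n as well

print(no_flag)      # []
print(multi)        # ['abc']
print(multi_icase)  # ['abc', 'ABC']
print(dot_default)  # []
print(dot_all)      # ['abc\nABC']
```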

6.6. Module-Level re Methods

6.6.1. re.match, re.search, re.findall, re.finditer

  • The module-level functions skip the explicit compile step: they take the pattern as their first argument.
match = re.match(r'abc', 'abc')
search = re.search(r'abc', 'a abc')
findall = re.findall(r'abc', 'abc abc ab bc a b c')
finditer = re.finditer(r'abc', 'abc abc ab bc a b c')

print(f'match: {match}') #match: <re.Match object; span=(0, 3), match='abc'>
print(f'search: {search}') #search: <re.Match object; span=(2, 5), match='abc'>
print(f'findall: {findall}') #findall: ['abc', 'abc']
print(f'finditer: {finditer}') #finditer: <callable_iterator object at 0x7fb2e9796a30>

6.6.2. String-modifying methods

Split()

  • By default, the split() method returns a list of strings broken down, excluding the matched strings.
  • It is also possible to make split() return the matched strings, simply by using a group to capture the whole pattern.
p = re.compile(r'\W+')
split = p.split('The~split*method-is%powerful') #['The', 'split', 'method', 'is', 'powerful'], by default

#Capturing the whole pattern in a group makes split() return the separators as well
p = re.compile(r'(\W+)')
split = p.split('The~split*method-is%powerful') #['The', '~', 'split', '*', 'method', '-', 'is', '%', 'powerful']

Sub(replacement string, text to match the pattern) & Subn()

  • sub() returns a new string after replacement.
  • subn() returns a tuple containing the new string and the number of replacements.
p = re.compile(r'Toko')
sub = p.sub('Tokyo', 'Toko is a large city.') #Tokyo is a large city.
subn = p.subn('Tokyo', 'Toko is Toko') #('Tokyo is Tokyo', 2)
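
The replacement string passed to sub() may refer to captured groups with \1, \2, and the optional count argument limits the number of replacements. A minimal sketch; the sample strings are made up.

```python
import re

# Swap "Last, First" into "First Last" using group references
p = re.compile(r'(\w+), (\w+)')
swapped = p.sub(r'\2 \1', 'Tan, Ryan')
print(swapped)  # Ryan Tan

# count=1 replaces only the first occurrence
p = re.compile(r'Toko')
first_only = p.sub('Tokyo', 'Toko is Toko', count=1)
print(first_only)  # Tokyo is Toko
```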

6.7. Look ahead and Look behind

6.7.1. Look ahead (Look Forward)

Look ahead positive (?=)

  • Find expression A only where expression B follows: A(?=B)
p = re.compile(r"\s(\w+(-\w+){1,3}(?=[\s.]))") #(?=[\s.]) accepts the match only if it is followed by whitespace or a dot; the lookahead itself is not part of the match
m = p.findall('''
The man is good-looking and rich.
The eleven-year-old twenty-five-storey building was developed by a famous developer in town.
This art piece is one-of-a-kind.
There is a five-and-one-half-foot-long sign at the outskirt of the town.''')
#[('good-looking', '-looking'), ('eleven-year-old', '-old'), ('twenty-five-storey', '-storey'), ('one-of-a-kind', '-kind')]

Look ahead negative (?!)

  • Find expression A where expression B does not follow: A(?!B)

6.7.2. Look behind (Look Backward)

Look behind positive (?<=)

  • Find expression A where expression B precedes: (?<=B)A

Look behind negative (?<!)

  • Find expression A where expression B does not precede: (?<!B)A
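
The three remaining lookaround forms can be sketched as follows. The sample `text` with prices is made up for illustration; in each case the lookaround checks the surrounding text without including it in the match.

```python
import re

text = 'price: $30, cost: 45, fee: $7'   # made-up sample text

# Negative lookahead: 'a' not followed by 'b'
neg_ahead = re.findall(r'a(?!b)', 'ab ac ad')
print(neg_ahead)   # ['a', 'a']

# Positive lookbehind: digits preceded by '$'
pos_behind = re.findall(r'(?<=\$)\d+', text)
print(pos_behind)  # ['30', '7']

# Negative lookbehind: digits not preceded by '$' or by another digit
neg_behind = re.findall(r'(?<![$\d])\d+', text)
print(neg_behind)  # ['45']
```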

(Back to top)
