
Python-Basics

It's fun learning Python (again!) :-)

Important links / Cheat sheets

https://www.cheatography.com/desmovalvo/cheat-sheets/python3-data-structures/pdf/
https://ehmatthes.github.io/pcc/cheatsheets/README.html
https://www.analyticsvidhya.com/blog/2015/06/quick-guide-text-data-cleaning-python/
https://www.analyticsvidhya.com/blog/2015/06/data-visualization-in-python-cheat-sheet/
https://www.analyticsvidhya.com/blog/2015/06/infographic-cheat-sheet-data-exploration-python/
https://dbader.org/blog/

global variable

https://stackoverflow.com/questions/423379/using-global-variables-in-a-function-other-than-the-one-that-created-them

glob_var = 0

def set_1():
    global glob_var
    glob_var = 10   # rebinds the module-level variable

def set_2():
    glob_var = 20   # no global declaration: this only creates a local variable

print(glob_var)
set_1()
print(glob_var)
set_2()
print(glob_var)

0
10
10

__init__.py

The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.

if __name__ == "__main__":

https://stackoverflow.com/questions/419163/what-does-if-name-main-do

  • The global variable __name__, in the module that is the entry point to your program, is '__main__'. Otherwise, it's the name you import the module by.

  • So, code under the if block will only run if the module is the entry point to your program.

  • It allows the code in the module to be importable by other modules, without executing the code block beneath on import.
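A minimal sketch of how the guard behaves (the module and function names here are made up for illustration):

```python
# my_module.py (hypothetical file name)

def greet():
    return "hello from my_module"

if __name__ == "__main__":
    # Runs only when executed directly: python my_module.py
    # Does NOT run on `import my_module`
    print(greet())
```

On `import my_module`, only `greet` is defined; the `print` call never fires.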

yield

https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do

  • yield is a keyword that is used like return, except the function will return a generator.
def getGenerator():
    nums = range(5)   # avoid shadowing the built-in name 'list'
    for i in nums:
        yield i*i
myGen = getGenerator()
for i in myGen:
    print(i)
  • Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:
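The one-shot nature can be demonstrated with a generator expression:

```python
gen = (i * i for i in range(3))  # values are produced lazily, not stored
print(list(gen))  # first pass consumes the generator: [0, 1, 4]
print(list(gen))  # second pass yields nothing, it is exhausted: []
```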

*args and **kwargs

https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters
http://sys-exit.blogspot.com/2013/07/python-positional-arguments-and-keyword.html

  • *args and **kwargs are a common idiom to allow an arbitrary number of arguments to functions
  • *args will give you all positional arguments as a tuple
  • **kwargs will give you all keyword arguments, except for those corresponding to a formal parameter, as a dictionary
def test(*args):
    for a in args:
        print(a)
test(1)
test(1,2,3,4,5)

def test_new(**kwargs):
    for a in kwargs:
        print(a,kwargs[a])
test_new(name="vb",age=27,sal=2000)

1
1
2
3
4
5
name vb
age 27
sal 2000

Python Scope LEGB Rule.

https://stackoverflow.com/questions/291978/short-description-of-the-scoping-rules

  • L, Local — Names assigned in any way within a function (def or lambda), and not declared global in that function.
  • E, Enclosing-function locals — Name in the local scope of any and all statically enclosing functions (def or lambda), from inner to outer.
  • G, Global (module) — Names assigned at the top-level of a module file, or by executing a global statement in a def within the file.
  • B, Built-in (Python) — Names preassigned in the built-in names module: open, range, SyntaxError, ...
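A small sketch of the L/E/G/B lookup order in action:

```python
x = "global"              # G: bound at module level

def outer():
    x = "enclosing"       # E: enclosing-function local
    def inner():
        x = "local"       # L: the function's own local wins first
        return x
    return inner(), x

print(outer())            # ('local', 'enclosing')
print(x)                  # 'global'
print(len("abc"))         # 'len' is resolved from builtins (B)
```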

The Pipfile, i.e. virtual environments

https://robots.thoughtbot.com/how-to-manage-your-python-projects-with-pipenv

pipenv install <abc>
pipenv install <xyz>
# OR
pipenv install -r requirements.txt
# this will create a 'Pipfile'

Once the Pipfile is created, run the command below to...

  • automagically locate the Pipfiles
  • create a new virtual environment
  • and install the necessary packages
pipenv install

To activate virtual environment

pipenv shell

Running code in the virtual environment

pipenv run python <myCode>.py

And finally, to deactivate:

exit

https://docs.pipenv.org/basics/

- Generally, keep both Pipfile and Pipfile.lock in version control.
- Do not keep Pipfile.lock in version control if multiple versions of Python are being targeted.
- Specify your target Python version in your Pipfile’s [requires] section. Ideally, you should only have one target Python version, as this is a deployment tool.
- pipenv install is fully compatible with pip install syntax, for which the full documentation can be found here.

Deployment pipeline for python project, with Makefile

https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/index.html

A. test module

make test
python setup.py test

  • pull docker image from nexus
  • run docker container for test

B. build and release

make compile    # runs: python setup.py bdist_wheel

  • this will create <>.whl

make release    # runs: pipenv run pip install twine && pipenv run twine upload --config-file $(PYPIRC_PATH) -r nexus <>.zip

python packages management

https://docs.python.org/3/reference/import.html
Need to understand the three things below:
- a. python egg and wheel
- b. Makefile
- c. setup.py

  • Package - A folder/directory that contains an __init__.py file.
  • Module - A valid python file with .py extension.
  • Distribution - How one package relates to other packages and modules.
Regular packages | Namespace packages
--- | ---
Typically implemented as a directory containing an __init__.py file. | A composite of various portions, where each portion contributes a subpackage to the parent package.
When imported, the __init__.py file is implicitly executed, and the objects it defines are bound to names in the package's namespace. | There is no __init__.py file.

a. python egg and wheel

https://packaging.python.org/discussions/wheel-vs-egg/
http://peak.telecommunity.com/DevCenter/PythonEggs
https://www.blog.pythonlibrary.org/2012/07/12/python-101-easy_install-or-how-to-create-eggs/
https://mrtopf.de/en/a-small-introduction-to-python-eggs/
https://stackoverflow.com/questions/46915070/wheel-files-what-is-the-meaning-of-none-any-in-protobuf-3-4-0-py2-py3-none-a

  • An egg is the same concept as a .jar file in Java: it is a .zip file with some metadata files, renamed to .egg, for distributing code as bundles.
  • It is a logical structure embodying the release of a specific version of a Python project, comprising its code, resources, and metadata.
wheel | egg
--- | ---
Wheel is a distribution format, i.e. a packaging format. | Egg is both a distribution format and a runtime installation format (if left zipped), and was designed to be importable.
Wheel archives do not include .pyc files. | Eggs are to Python as Jars are to Java...
Wheel is internally organized by sysconfig path type, therefore making it easier to convert to other formats. | Python eggs are a way of bundling additional information with a Python project, that allows the project's dependencies to be checked and satisfied at runtime, as well as allowing projects to provide plugins for other projects.
&nbsp; | .egg files are a "zero installation" format for a Python package; no build or install step is required, just put them on PYTHONPATH or sys.path and use them.
  • for spark submit we rename .whl to .zip !!!

  • 'sample_project' project structure

	/sample_project
		/src
			main.py
			__init__.py
	Pipfile
	README.md
	setup.py
  • to build a wheel for project 'sample_project': python setup.py bdist_wheel
    • this will create the below files/folders
    • wheel will go to : dist/yourproject-.whl
    • Ex: dist/example_pkg-0.0.1-py3-none-any.whl
    • whl file names are built from these components : {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    • if the 'abi tag' is 'none', that means it is a pure-Python (source) package
    • it creates a .whl file and an egg-info directory
	/sample_project
		/src
			main.py
			__init__.py
		/example_pkg.egg-info
			top_level.txt
			SOURCES.txt
			PKG-INFO
			dependency_links.txt
		/dist
			example_pkg-0.0.1-py3-none-any.whl
		/build
			... # lib, etc
	Pipfile
	README.md
	setup.py
  • to build an egg for project 'sample_project': python setup.py bdist_egg
    • three new folders: build, dist, and mymath.egg-info.
    • The only one we care about is the dist folder, in which you will find your egg file, sample_project-0.1-py2.6.egg.
    • The egg file itself is basically a zip file.
    • If you change the extension to “zip”, you can look inside it and see that it has two folders: mymath and EGG-INFO.
    • At this point, you should be able to point easy_install at your egg on your file system and have it install your package.

pipenv run pip wheel -r requirements.txt --only-binary :all:

  • this will download all dependencies from the requirements.txt file and build wheel files for them.

b. Makefile

https://krzysztofzuraw.com/blog/2016/makefiles-in-python-projects.html
http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
https://swcarpentry.github.io/make-novice/02-makefiles/

  • Makefiles are a simple way to organize code compilation.

  • Complicated shell commands can be put under a rule in the Makefile.

  • This is a build file, which for Make is called a Makefile - a file executed by Make

  • Makefiles do not have to be called Makefile. If we call it something else, we need to tell Make where to find it. This we can do using the -f flag.

  • An additional thing that you want to add is something called .PHONY. By default, Make operates on files, so if there is a file called clean-pyc it will try to use it instead of running the command. To avoid this, declare .PHONY targets at the beginning of your Makefile.

  • make -f MyOtherMakefile

# example Makefile
.PHONY: init clean clean-build clean-dist wheel bdist_wheel-depen
    
clean-build: 
	rm -rf build/
	
clean-dist: 
	rm -rf dist/

clean: clean-build clean-dist 
	rm -rf .eggs/
	rm -f Pipfile.*
	rm -f VERSION
	find . -name "*.egg-info" -exec rm -rf {} +
	find . -name "*.dist-info" -exec rm -rf {} +
	find . -name "*.egg" -exec rm -rf {} +
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -rf {} +

init: clean
	pipenv --python $(PYTHON_VERSION)
	pipenv run pip install pylint

bdist_wheel-depen: init
	mkdir -p build && cd build && pipenv run pip wheel -r ../requirements.txt --only-binary all

wheel: bdist_wheel-depen
	mkdir -p dist && mv build/*.whl dist && pipenv run python setup.py bdist_wheel

c. setup.py

https://stackoverflow.com/questions/1471994/what-is-setup-py
https://pythonhosted.org/an_example_pypi_project/setuptools.html

  • setup.py tells you that the module/package you are about to install has been packaged and distributed with Distutils, which is the standard for distributing Python Modules.
  • setup() is the main function
  • it uses the setuptools library : from setuptools import setup
  • used to register a python package with PyPI
  • and to distribute / build the package ( wheel the package )
  • it is used like:
python setup.py <cmd>

# python setup.py --help :  to get all the <cmd>
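A minimal setup.py sketch, assuming the 'src' layout shown elsewhere in these notes; the name, version, and metadata values below are placeholders, not a definitive implementation:

```python
# setup.py -- minimal sketch; all values below are placeholders
from setuptools import setup, find_packages

setup(
    name="sample_project",
    version="0.0.1",
    description="Example package",
    packages=find_packages("src"),   # look for packages under src/
    package_dir={"": "src"},         # map the package root to src/
    install_requires=[],             # runtime dependencies go here
    python_requires=">=3.6",
)
```

Then `python setup.py bdist_wheel` produces the wheel in dist/.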

Anaconda

https://linuxhint.com/anaconda-python-tutorial/

  • Anaconda is a data science and machine learning platform for the Python and R programming languages.
  • It is designed to make the process of creating and distributing projects simple, stable and reproducible across systems.
  • Anaconda is a Python based platform that curates major data science packages including pandas, scikit-learn, SciPy, NumPy and Google’s machine learning platform, TensorFlow.
  • It comes packaged with conda (a pip-like install tool), Anaconda Navigator for a GUI experience, and Spyder as an IDE.

Conda - environment manager

https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c
https://conda.io/docs/_downloads/conda-cheatsheet.pdf
https://conda.io/docs/user-guide/tasks/manage-environments.html#activating-an-environment
https://community.hortonworks.com/articles/58418/running-pyspark-with-conda-env.html
https://www.slideshare.net/AaronMeurer/conda-a-binary-scipy2014

  • A conda environment is a directory that contains a specific collection of conda packages that you have installed.
  • On a machine the environment is made out of variables linking to different target folders containing executable or other resource files.
  • So if you execute a command it is either referenced from your PATH, PYTHON_LIBRARY, or any other defined variable.
  • These variables link to files in directories like /usr/bin, /usr/local/bin or any other referenced location.
  • They are called hard links or absolute references, as they start from the root /.
  • Environments using hard links are not easily transportable.
  • Therefore, it is necessary to use relative links in a transportable/relocatable environment.
  • This is especially true for conda envs, as conda creates hard links by default.
  • By making the conda env relocatable, it can be used in an application by referencing it from the application root.
  • You can also share your environment with someone by giving them a copy of your environment.yaml file
$ conda create -n MyCondaEnv python=3.6 pandas


$ source activate MyCondaEnv
(MyCondaEnv) $
  • Cloning an environment

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
conda create --name myclone --clone myenv

https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/

  • Conda and pip

    • pip installs python packages within any environment; conda installs any package within conda environments.
    • If you want to, say, manage Python packages within an existing system Python installation, conda can't help you: by design, it can only install packages within conda environments.
  • Conda and Anaconda

    • Conda is a package manager; Anaconda is a distribution.
    • A software distribution is a pre-built and pre-configured collection of packages that can be installed and used on a system.
    • A package manager is a tool that automates the process of installing, updating, and removing packages.

pyspark with CONDA

https://mapr.com/blog/python-pyspark-condas-pt1/

  • PYSPARK_PYTHON : the variable controlling the Python environment for Python applications in Spark.
# client mode
export PYTHONPATH=/opt/cloudera/parcels/Anaconda/envs/<conda_env_name>
export PATH=${PYTHONPATH}/bin:$PATH 

spark2-submit --master yarn --deploy-mode client ...

# cluster mode
spark2-submit --master yarn --deploy-mode cluster \
--conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/envs/<conda_env_name>/bin/python"  \
--conf "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/envs/<conda_env_name>/bin/python" \
pySpark_script.py

TypeError vs ValueError

https://www.datacamp.com/community/tutorials/exception-handling-python

TypeError | ValueError
--- | ---
Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError. | Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value.
Whenever you see an error that include 'NoneType' that means that you have an operand or an object that is None when you were expecting something else.
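A quick illustration of the distinction using built-ins:

```python
# ValueError: right type (str), inappropriate value
try:
    int("abc")
except ValueError as e:
    print("ValueError:", e)

# TypeError: wrong type altogether
try:
    len(5)
except TypeError as e:
    print("TypeError:", e)
```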

package management

pipenv | Docker | Conda
--- | --- | ---
requirements.txt, setup.py, Makefile | Dockerfile, setup.py, Makefile | setup.py, Makefile

Unit tests [TODO]

https://www.pythonsheets.com/notes/python-tests.html
https://python.g-node.org/python-summerschool-2009/_media/cheat_sheets.pdf
https://docs.python-guide.org/writing/tests/

  • mock.Mock vs mock.patch

    • patch replaces the class with a mock object and lets you work with the mock instance
    • Once you patch a class, references to the class are completely replaced by the mock instance.
    • mock.patch is usually used when you are testing something that creates a new instance of a class inside of the test.
  • Mock vs MagicMock

    • Mock is used for replacing or mocking an object in Python unit tests.
    • MagicMock is a subclass of Mock with all the "magic methods" pre-created and ready to use.
    • Below are the pre-created magic methods and its default values for MagicMock.
      __lt__: NotImplemented
      __gt__: NotImplemented
      __le__: NotImplemented
      __ge__: NotImplemented
      __int__: 1
      __contains__: False
      __len__: 0
      __iter__: iter([])
      ...
  • @pytest.fixture(autouse=True)

    • Autouse fixtures (xUnit setup on steroids). A class-level transact fixture marked with autouse=True implies that all test methods in the class will use this fixture, without a need to state it in the test function signature or with a class-level usefixtures decorator.
@pytest.fixture(autouse=True)
def my_fixture():
    ...

unittest

  • Ex 1
import unittest

def fun(x):
    return x + 1

class MyTest(unittest.TestCase):
    def test(self):
        self.assertEqual(fun(3), 4)
	
  • Ex 2
#Basic structure of a test suite
import unittest

class FirstTestCase(unittest.TestCase):
    def setUp(self):
        """setUp is called before every test"""
        pass
    def tearDown(self):
        """tearDown is called at the end of every test"""
        pass
    def testtruisms(self):
        """All methods beginning with 'test' are executed"""
        self.assertTrue(True)
        self.assertFalse(False)

class SecondTestCase(unittest.TestCase):
    def testapproximation(self):
        self.assertAlmostEqual(1.1, 1.15, 1)

if __name__ == '__main__':
	 # run all TestCase's in this module
	 unittest.main()

Assert methods in unittest.TestCase. Most assert methods accept an optional msg argument, which is used as an explanation for the error.

method | result
--- | ---
assertTrue(expr[, msg]) (old alias: assert_) | Fail if expr is False
assertFalse(expr[, msg]) | Fail if expr is True
assertEqual(first, second[, msg]) | Fail if first is not equal to second
assertNotEqual(first, second[, msg]) | Fail if first is equal to second
assertAlmostEqual(first, second[, places[, msg]]) | Fail if first is not equal to second up to the decimal place indicated by places
assertNotAlmostEqual(first, second[, places[, msg]]) | Fail if first is equal to second up to the decimal place indicated by places
assertRaises(exception, callable, ...) | Fail if callable does not raise an exception of class exception. Additional positional or keyword arguments are passed to callable.
fail([msg]) | Always fail

pytest

  • By default, pytest runs tests in sequential order
  • pytest provides us with an option to run tests in parallel. For this, we need to first install the pytest-xdist plugin.
pip install pytest-xdist
# pytest -n <num>
pytest -n 3

https://pypi.org/project/pytest-runner/
https://stackoverflow.com/questions/38155169/how-do-i-get-pytest-to-run-all-functions-as-test
https://docs.pytest.org/en/latest/goodpractices.html

  • Ex 1
import pytest 
def fun(x):
    return x+1

def test_fun():
    assert fun(5) == 6
  • Ex 2
import pytest 

def fun(x):
    try:
        return x+1
    except Exception as ex:
        raise

def test_fun():
    with pytest.raises(Exception) as exc_info:
        fun('5')
    assert exc_info.typename == "TypeError"
  • Code coverage with pytest-cov package
pip install pytest-cov

pytest \
--cov-report term-missing \
--cov=coverage_1 \
test_coverage_1.py
  • Use coverage as a tool to find blind spots in the code, but not as a metric or target goal.

  • Pytest fixtures are reusable features that let us feed our tests with data or objects, in order to test more effectively and without repetition.

Auto unit testing with setuptools

Logging

http://zyxue.github.io/2015/08/05/quick-setup-for-python-logging.html

import logging  
logging.basicConfig(
    level=logging.DEBUG, format='%(asctime)s|%(levelname)s|%(message)s')

logging.info('some message')
import logging

def sample_function(secret_parameter):
    logger = logging.getLogger(__name__)  # __name__=projectA.moduleB
    logger.debug("Going to perform magic with '%s'",  secret_parameter)
    ...
    try:
        result = do_magic(secret_parameter)
    except IndexError:
        logger.exception("OMG it happened again, someone please tell Laszlo")
    except:
        logger.info("Unexpected exception", exc_info=True)
        raise
    else:
        logger.info("Magic with '%s' resulted in '%s'", secret_parameter, result, stack_info=True)
# 1: logging.yaml  

version: 1
disable_existing_loggers: False
formatters:
    simple:
        format: "%(asctime)s - %(levelname)s -[%(name)s]  %(message)s"
handlers:
    console:
        class: logging.StreamHandler
        level: DEBUG
        formatter: simple
        stream: ext://sys.stdout

    file:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: app_test.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8

loggers:
    <module name>.<py file>:
        level: DEBUG
        handlers: [console]
        propagate: no
    <module name>:
        level: DEBUG
        handlers: [console]
        propagate: no

root:
    level: INFO
    handlers: [console, file]


# 2: main.py

import os
import yaml
import logging
import logging.config

def setup_logging(default_path='logging.yaml', default_level=logging.INFO, env_key='LOG_CFG'):
    """Setup logging configuration
    """
    print("Path:")
    for i in os.listdir("."):
        print(i)
    config_path = default_path
    value = os.getenv(env_key, None)
    if value:
        config_path = value
    if os.path.exists(config_path):
        with open(config_path, 'rt') as f:
            config = yaml.safe_load(f.read())
        logging.config.dictConfig(config)
        logging.info("Logging successfully configured from {0}".format(config_path))
    else:
        logging.basicConfig(level=default_level)
        logging.warning("No logging configuration found. Using default configuration.")

# 3: test.py

import logging

def testF(abc):
    logger = logging.getLogger(__name__)
    try:
        logger.debug("Starting ...")
        ...
        logger.debug("...finished ")
        return result

    except Exception as e:
        logger.error("Error processing : {0}".format(e))

CI vs CD in Python

https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment

code coverage

https://coverage.readthedocs.io/en/v4.5.x/

pip install coverage 

coverage run --source <PY PACKAGE FOLDER> -m pytest 	# find the py package 
coverage report -m --fail-under=80			# fail if less than 80% code coverage
coverage html						# generate html report

style check

https://realpython.com/python-code-quality/

flake8 <PY PACKAGE FOLDER>
pylint <PY PACKAGE FOLDER> --ignore=<PY PACKAGE FOLDER>/tests

tox

https://tox.readthedocs.io/en/latest/

  • tox aims to automate and standardize testing in Python.

-tox.ini

[tox]
envlist = py36

[testenv]
setenv =
    PYTHONPATH={toxinidir}
    LC_ALL=en_US.UTF-8
    LANG=en_US.UTF-8
whitelist_externals =
    pytest
commands =
    pytest 

[testenv:abc]
whitelist_externals =
commands =

Function chaining

https://medium.com/@adamshort/python-gems-5-silent-function-chaining-a6501b3ef07e

Resizing Jupyter notebook

https://medium.com/@1522933668924/using-matplotlib-in-jupyter-notebooks-comparing-methods-and-some-tips-python-c38e85b40ba1

from IPython.display import display, HTML

display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

Python Anti-Patterns

https://docs.quantifiedcode.com/python-anti-patterns/index.html

  1. Correctness
  2. Maintainability
  3. Readability
  4. Security
  5. Performance

Clean Code notes...

The KeyWords

  • DRY/OAOO The ideas of Don't Repeat Yourself (DRY) and Once and Only Once (OAOO)

  • YAGNI YAGNI (short for You Ain't Gonna Need It) is an idea you might want to keep in mind very often when writing a solution if you do not want to over-engineer it.

  • KIS KIS (stands for Keep It Simple) relates very much to the previous point. When you are designing a software component, avoid overengineering it; ask yourself if your solution is the minimal one that fits the problem.

  • EAFP/LBYL EAFP stands for Easier to Ask Forgiveness than Permission, while LBYL stands for Look Before You Leap.

  • Mixins A mixin is a base class that encapsulates some common behavior with the goal of reusing code. Typically, a mixin class is not useful on its own, and extending this class alone will certainly not work, because most of the time it depends on methods and properties that are defined in other classes. The idea is to use mixin classes along with other ones, through multiple inheritance, so that the methods or properties used on the mixin will be available.

  • SOLID principles

    • S: Single responsibility principle; a class must have only one responsibility. Again, the smaller the class, the better.
    • O: Open/closed principle; open to extension but closed for modification.
    • L: Liskov's substitution principle; a series of properties that an object type must hold to preserve reliability on its design.
    • I: Interface segregation principle; interfaces (the set of methods an object exposes) should be small.
    • D: Dependency inversion principle; make your code independent of things that are fragile, volatile, or out of our control.
  • Decorators : decorators wrap a function, modifying its behavior.

    • is a mechanism to simplify the way functions and methods are defined when they have to be modified after their original definition.
    • Decorators are just syntax sugar for calling whatever is after the decorator as a first parameter of the decorator itself, and the result would be whatever the decorator returns.
    • original is the decorated function, often also called a wrapped object.
    • Use of decorators: Transforming parameters, Tracing code, Validate parameters, Implement retry operations, Simplify classes by moving some (repetitive) logic into decorators
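A minimal tracing-decorator sketch (the trace/add names are invented for illustration); functools.wraps keeps the wrapped function's metadata intact:

```python
import functools

def trace(func):
    """Log calls to the decorated function (toy example)."""
    @functools.wraps(func)  # preserves func.__name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@trace
def add(a, b):
    return a + b

add(2, 3)  # prints the trace lines and returns 5
```

Without functools.wraps, add.__name__ would report 'wrapper' instead of 'add'.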

Docstring vs Annotations

class cls_no_use:
    def __init__(self, a, b):
        self.a = a
        self.b = b

def _func_add(a: int = 10, b: int = 10) -> cls_no_use:
    """This func does nothing but addition!"""
    return a + b
  • Docstring
_func_add.__doc__

'This func does nothing but addition!'
  • Annotations
_func_add.__annotations__

{'a': int, 'b': int, 'return': __main__.cls_no_use}
  • doctest The doctest module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown.
def my_fun():
    """
    this is my_fun
    Example:
    ::
        >>> assert my_fun() == None
        >>>
    """
  • Finally
print(f'_func_add(): {_func_add()}')
print(f'_func_add(20): {_func_add(20)}')
print(f'_func_add(20,20): {_func_add(20,20)}')

_func_add(): 20
_func_add(20): 30
_func_add(20,20): 40
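The doctest idea above can be sketched as a runnable module (square is a made-up example function):

```python
import doctest

def square(x):
    """Return x squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x

if __name__ == "__main__":
    # executes the interactive examples in the docstrings above
    print(doctest.testmod())
```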

Context managers

  • Using with is recommended!

  • General way

def _func_stop():
    print("Stop")

def _func_start():
    print("Start")
    
class DoNothing:
    def __enter__(self):
        _func_stop()
        return self
    
    def __exit__(self,str_type,str_value,str_traceback):
        _func_start()
        
def func_rolling():
    print("Let's get rolling!")
    

with DoNothing():
    func_rolling()
Stop
Let's get rolling!
Start
  • Yet another way
import contextlib

@contextlib.contextmanager
def func_do_nothing():
    _func_stop()
    yield
    _func_start()

with func_do_nothing():
    func_rolling()
    
Stop
Let's get rolling!
Start

EAFP stands for Easier to Ask Forgiveness than Permission, while LBYL stands for Look Before You Leap.

https://docs.quantifiedcode.com/python-anti-patterns/readability/asking_for_permission_instead_of_forgiveness_when_working_with_files.html

# LBYL
if os.path.exists(filename):
    with open(filename) as f:
        ...

Prefer EAFP over LBYL.

# EAFP
try:
    with open(filename) as f:
        ...
except FileNotFoundError as e:
    logger.error(e)

Mutable default arguments

Simply put, don't use mutable objects as the default arguments of functions. If you use mutable objects as default arguments, you will get results that are not the expected ones.
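The classic pitfall, and the usual None-sentinel fix, sketched with invented helper names:

```python
def append_bad(item, target=[]):      # the default list is created ONCE, at def time
    target.append(item)
    return target

def append_good(item, target=None):   # use a sentinel and build a fresh list
    if target is None:
        target = []
    target.append(item)
    return target

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2] -- the same default list is reused across calls
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```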

How arguments are copied to functions

The first rule in Python is that all arguments are passed by value. Always.

def _func_test_args(arg):
    arg += " in func"
    return arg

immutable = " hello world"
print(_func_test_args(immutable))
print(immutable)

mutable = list(" hello world")
print(_func_test_args(mutable))
print(mutable)
 hello world in func
 hello world
[' ', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ', 'i', 'n', ' ', 'f', 'u', 'n', 'c']
[' ', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ', 'i', 'n', ' ', 'f', 'u', 'n', 'c']

Don't mutate function arguments. In general, try to avoid side-effects in functions as much as possible.

Variable number of arguments

we can use the packing mechanism and pass them all together in a single instruction:

def _func_pack_args(arg1, arg2, arg3):
    print(arg1)
    print(arg2)
    print(arg3)

v1 = 10
v2 = 20
v3 = 30
l = [v1, v2, v3]
# _func_pack_args(l)
# TypeError: _func_pack_args() missing 2 required positional arguments: 'arg2' and 'arg3'
_func_pack_args(*l)
10
20
30

Pass any # of args.

def _func_variable_args(*args):
    for arg in args:
        print(arg)

_func_variable_args(10, 20, 30)
10
20
30

Static Type Checking in Python 3.6

https://medium.com/@ageitgey/learn-how-to-use-static-type-checking-in-python-3-6-in-10-minutes-12c86d72677b

EDA with Seaborn and matplotlib

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

sub plot management
https://www.kaggle.com/bombatkarvivek/pyspark-ml-pipeline-with-titanic-dataset-eda

Jupyter Notebook call from command line

This even can run the spark applications.

FILE_NAME=dummy3 jupyter nbconvert --to notebook --execute dummy_utility.ipynb 
[NbConvertApp] Converting notebook dummy_utility.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3
...
  • in notebook
file_name = os.environ.get('FILE_NAME','dummy_default_name')

argparse

https://docs.python.org/3/library/argparse.html

parser = argparse.ArgumentParser(description='some description', 
			formatter_class=argparse.RawDescriptionHelpFormatter)

parser.add_argument('--arg1', 
		help='what is arg1...',
		type=str, 
		default='Other', 
		dest='environment', 
		required=False)

args = parser.parse_args()
print(args.environment)
python test_argparse.py --arg1="hello world!"

structlog

http://www.structlog.org/en/stable/getting-started.html

chain = [
    structlog.stdlib.filter_by_level,
    structlog.stdlib.add_log_level,
    structlog.processors.StackInfoRenderer(),
    structlog.processors.format_exc_info,
    structlog.processors.JSONRenderer()
]

log_format = "%(filename)s:%(lineno)s %(funcName)10s %(message)s"


logging.basicConfig(level=log_level,
		stream=sys.stdout,
		format=log_format)

structlog.configure_once(
processors=chain,
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
wrapper_class=structlog.stdlib.BoundLogger,
cache_logger_on_first_use=True,
)

logger = structlog.get_logger()

logger.info("some event")
logger.debug("some debug detail")
...

logdecorator

https://pypi.org/project/logdecorator/

from logdecorator import log_on_start, log_on_end, log_on_error

logger = structlog.get_logger()

@log_on_start(logging.INFO, "start ...", logger=logger)
@log_on_end(logging.INFO, "finish ...", logger=logger)
@log_on_error(logging.ERROR, "Error ...",
              reraise=True)
def handle(*, country: int,
...

truffleHog

https://github.com/dxa4481/truffleHog/blob/dev/README.md

  • Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.

DESIGN PATTERNS in PYTHON

https://refactoring.guru/design-patterns/python

Design patterns are typical solutions to commonly occurring problems in software design. They are like pre-made blueprints that you can customize to solve a recurring design problem in your code.

Creational patterns | Structural patterns | Behavioral patterns
--- | --- | ---
Provide object creation mechanisms that increase flexibility and reuse of existing code. | Explain how to assemble objects and classes into larger structures, while keeping the structures flexible and efficient. | Take care of effective communication and the assignment of responsibilities between objects.
  • 1. Creational Patterns
| Pattern | Description |
| --- | --- |
| Abstract Factory | Lets you produce families of related objects without specifying their concrete classes. |
| Builder | Lets you construct complex objects step by step. The pattern allows you to produce different types and representations of an object using the same construction code. |
| Factory Method | Provides an interface for creating objects in a superclass, but allows subclasses to alter the type of objects that will be created. |
| Prototype | Lets you copy existing objects without making your code dependent on their classes. |
| Singleton | Lets you ensure that a class has only one instance, while providing a global access point to this instance. |
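For instance, a minimal Singleton in Python (a naive, not thread-safe sketch; one of several common variants):

```python
class Singleton:
    """Naive Singleton: every instantiation returns the same object."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)  # True
```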
  • 2. Structural Patterns
| Pattern | Description |
| --- | --- |
| Adapter | Allows objects with incompatible interfaces to collaborate. |
| Bridge | Lets you split a large class or a set of closely related classes into two separate hierarchies (abstraction and implementation) which can be developed independently of each other. |
| Composite | Lets you compose objects into tree structures and then work with these structures as if they were individual objects. |
| Decorator | Lets you attach new behaviors to objects by placing these objects inside special wrapper objects that contain the behaviors. |
| Facade | Provides a simplified interface to a library, a framework, or any other complex set of classes. |
| Flyweight | Lets you fit more objects into the available amount of RAM by sharing common parts of state between multiple objects instead of keeping all of the data in each object. |
| Proxy | Lets you provide a substitute or placeholder for another object. A proxy controls access to the original object, allowing you to perform something either before or after the request gets through to the original object. |
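As a quick illustration of the first entry, a toy Adapter (the socket/device domain is invented for the example):

```python
class EuropeanSocket:
    def voltage(self) -> int:
        return 230

class USDevice:
    """Expects roughly a 120 V supply."""
    def power_on(self, volts: int) -> str:
        return "on" if volts <= 120 else "fried"

class SocketAdapter:
    """Adapter: exposes the EuropeanSocket through an interface the
    USDevice can safely consume."""
    def __init__(self, socket: EuropeanSocket):
        self._socket = socket

    def voltage(self) -> int:
        return self._socket.voltage() // 2  # crude step-down transformer

device = USDevice()
adapter = SocketAdapter(EuropeanSocket())
print(device.power_on(adapter.voltage()))  # on
```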
  • 3. Behavioral Patterns
| Pattern | Description |
| --- | --- |
| Chain of Responsibility | Lets you pass requests along a chain of handlers. Upon receiving a request, each handler decides either to process the request or to pass it to the next handler in the chain. |
| Iterator | Lets you traverse elements of a collection without exposing its underlying representation (list, stack, tree, etc.). |
| Memento | Lets you save and restore the previous state of an object without revealing the details of its implementation. |
| State | Lets an object alter its behavior when its internal state changes. It appears as if the object changed its class. |
| Template Method | Defines the skeleton of an algorithm in the superclass but lets subclasses override specific steps of the algorithm without changing its structure. |
| Command | Turns a request into a stand-alone object that contains all information about the request. This transformation lets you parameterize methods with different requests, delay or queue a request's execution, and support undoable operations. |
| Mediator | Lets you reduce chaotic dependencies between objects. The pattern restricts direct communications between the objects and forces them to collaborate only via a mediator object. |
| Observer | Lets you define a subscription mechanism to notify multiple objects about any events that happen to the object they're observing. |
| Strategy | Lets you define a family of algorithms, put each of them into a separate class, and make their objects interchangeable. |
| Visitor | Lets you separate algorithms from the objects on which they operate. |
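A small Strategy sketch to make the idea concrete (in Python, plain callables often stand in for strategy classes):

```python
from typing import Callable, List

# Strategy: interchangeable algorithms behind a single call site.
def ascending(data: List[int]) -> List[int]:
    return sorted(data)

def descending(data: List[int]) -> List[int]:
    return sorted(data, reverse=True)

class Sorter:
    def __init__(self, strategy: Callable[[List[int]], List[int]]):
        self.strategy = strategy

    def sort(self, data: List[int]) -> List[int]:
        return self.strategy(data)

print(Sorter(ascending).sort([3, 1, 2]))   # [1, 2, 3]
print(Sorter(descending).sort([3, 1, 2]))  # [3, 2, 1]
```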

BDD - Behavior Driven Development

https://behave.readthedocs.io/en/latest/philosophy.html https://behave.readthedocs.io/en/latest/tutorial.html

  • It extends TDD by writing test cases in a natural language that non-programmers can read.

  • Behavior-driven developers use their native language in combination with the ubiquitous language of domain-driven design to describe the purpose and benefit of their code.

  • This allows the developers to focus on why the code should be created, rather than the technical details, and

  • minimizes translation between the technical language in which the code is written and the domain language spoken by the business, users, stakeholders, project management, etc.

  • TDD vs BDD

  • behave vs pytest-bdd vs cucumber

class method vs static method

https://www.geeksforgeeks.org/class-method-vs-static-method-python/ https://stackabuse.com/pythons-classmethod-and-staticmethod-explained/

  • A class method takes cls as first parameter while a static method needs no specific parameters.
  • A class method can access or modify class state while a static method can’t access or modify it.
  • In general, static methods know nothing about class state. They are utility type methods that take some parameters and work upon those parameters. On the other hand class methods must have class as parameter.
  • We use @classmethod decorator in python to create a class method and we use @staticmethod decorator to create a static method in python.
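A minimal illustration of the points above (the class and method names are invented for the example):

```python
class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    @classmethod
    def margherita(cls):
        # classmethod: receives the class as `cls`, so it can act as an
        # alternative constructor and can read/modify class state
        return cls(["mozzarella", "tomatoes"])

    @staticmethod
    def circle_area(radius: float) -> float:
        # staticmethod: a plain utility, no access to cls or self
        return 3.14159 * radius ** 2

p = Pizza.margherita()
print(p.ingredients)         # ['mozzarella', 'tomatoes']
print(Pizza.circle_area(2))  # 12.56636
```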

PEP 8 -- Style Guide for Python Code

https://www.python.org/dev/peps/pep-0008/

  • Indentation Use 4 spaces per indentation level.
  • Tabs or Spaces? Spaces are the preferred indentation method.
  • Maximum Line Length Limit all lines to a maximum of 79 characters.
  • Should a Line Break Before or After a Binary Operator? : Before.
# Correct:
# easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends))
  • Blank Lines
    • Surround top-level function and class definitions with two blank lines.
    • Method definitions inside a class are surrounded by a single blank line.
  • Imports Imports should usually be on separate lines:
  • "dunders" (i.e. names with two leading and two trailing underscores) such as all, author, version,
  • String Quotes In Python, single-quoted strings and double-quoted strings are the same.
  • Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.
  • Object type comparisons should always use isinstance() instead of comparing types directly:
  • Don't compare boolean values to True or False using ==:
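The last three recommendations in one small snippet:

```python
name = "python-basics.md"

# prefix/suffix checks: startswith/endswith, not slicing
print(name.endswith(".md"))   # True  (instead of name[-3:] == ".md")

# type checks: isinstance, not direct type comparison
print(isinstance(name, str))  # True  (instead of type(name) == str)

# booleans: truth-test directly, never compare with == True
greeting = True
if greeting:                  # not: if greeting == True:
    print("hello")
```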

Built in functions

https://docs.python.org/3/library/functions.html#property

  • TODO

click

https://click.palletsprojects.com/

  • The third-party click module is used to create command-line (CLI) applications.
import click

@click.command()
@click.option('-a', '--arg1', required=True, help='argument # 1')
def hello(arg1):
    click.echo(f"hello {arg1}")

if __name__ == '__main__':
    hello()

ABC, abstractmethod, abstractproperty

from abc import ABC, abstractmethod, abstractproperty

  • Note: abstractproperty is deprecated since Python 3.3; stack @property and @abstractmethod instead.
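A short sketch of how these fit together (class names are invented for the example):

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        ...

    @property
    @abstractmethod
    def name(self) -> str:
        ...  # replaces the deprecated abstractproperty

class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2

    @property
    def name(self) -> str:
        return "square"

# Shape() would raise TypeError: can't instantiate abstract class
print(Square(3).area())  # 9
```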

Importing modules from parent folder

import os, sys, inspect

# absolute path of the directory containing the current file
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
# one level up
parentdir = os.path.dirname(currentdir)
# prepend it so modules in the parent folder take priority on import
sys.path.insert(0, parentdir)
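A shorter modern equivalent using pathlib, assuming the snippet lives in a regular module file (so __file__ is defined):

```python
import sys
from pathlib import Path

# parent of the directory containing this file
parentdir = str(Path(__file__).resolve().parent.parent)
if parentdir not in sys.path:
    sys.path.insert(0, parentdir)
```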