
brunocampos01/understanding-the-python-ecosystem

Becoming a Python Expert

Python 3 License

This project focuses on understanding the language ecosystem, not getting into programming details.

Summary

🌄 Python's Habitat

This topic describes how to set up the environment for Python development.


🐍 Python's Taxonomy

This topic describes the features and patterns of Python projects.


💢 Python's Behavior

This topic describes how the language is designed and how it works.


🐛 Python's Feeding

This topic describes formatting patterns following a style guide.


🔍 Python's Other Features

Extra topics to see.






Preparing the Environment for Python

Linux

Python needs a set of tools that are system requirements. If necessary, install these requirements with this command:

sudo apt update
sudo apt install \
  software-properties-common \
  build-essential \
  libffi-dev \
  python3-pip \
  python3-dev \
  python3-venv \
  python3-setuptools \
  python3-pkg-resources

Now the environment is ready to install Python:

sudo apt install python3

Windows

On Windows, I recommend using the Chocolatey package manager and running PowerShell as administrator. See this tutorial.

Now, install Python

choco install python 

Test

python --version 



Check Python Configuration

Check current version

python --version

Check where Python is installed

which python

Check which Python versions are installed

sudo update-alternatives --list python
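
The same checks can also be made from inside Python. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is an illustrative assumption, not part of any tool mentioned above.

```python
import shutil
import sys


def describe_interpreter() -> dict:
    """Collect basic facts about the running interpreter."""
    return {
        # Same information as `python --version`.
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Absolute path of the interpreter executing this code.
        "executable": sys.executable,
        # What `which python` would resolve to on PATH (None if not found).
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If "executable" and "python_on_path" differ, the shell command python is not the interpreter that ran the script, which is often the first hint that several versions are installed.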


Advanced settings of Python

Install multiple Python versions

Sometimes you might work on several projects at the same time that require different versions of Python. Anaconda is usually the easiest solution; however, its use may be restricted in some environments.
  1. Add repository

    This PPA contains more recent Python versions packaged for Ubuntu.

    sudo add-apt-repository ppa:deadsnakes/ppa -y
  2. Update packages

    sudo apt update -y
  3. Check which python version is installed

    python --version
  4. Install Python

    sudo apt install python3.<VERSION>

Change system's Python

Once other versions of Python are installed, you need to set which one the system will use.

  1. Use update-alternatives

    You can use the update-alternatives command to set priorities for different versions of the same software installed on Ubuntu systems. Define the priority of each version (a higher number means a higher priority):

    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 2
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 3
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 4

    A symbolic link will be created in the /usr/bin directory: /usr/bin/python -> /etc/alternatives/python*

  2. Choose version

    sudo update-alternatives --config python
  3. Test

    python --version

Change Python2 to Python3

If the command returns Python 2, try setting an alias in /home/$USER/.bashrc, as in this example:

alias python=python3

NOTE: The important thing to realize is that Python 3 is not backwards compatible with Python 2. This means that if you try to run Python 2 code as Python 3, it will probably break.


Set Python's Environment Variables
  • For an individual project: PYTHONPATH adds directories to the module search path. Example: Apache Airflow reads the dags/ folder and automatically adds any file in this directory.
  • For the interpreter: PYTHONHOME indicates the location of the standard libraries.

Set PYTHONPATH
  1. Open profile

    vim ~/.bashrc
  2. Insert the Python path

    export PYTHONPATH=<PROJECT_DIRECTORY>
  3. Update profile/bashrc

    source ~/.bashrc
  4. Test

    >>> import sys
    >>> from pprint import pprint
    >>> pprint(sys.path)
    ['',
     '/usr/lib/python311.zip',
     '/usr/lib/python3.11',
     '/usr/lib/python3.11/lib-dynload',
     '/usr/local/lib/python3.11/dist-packages',
     '/usr/lib/python3/dist-packages']

    Example with Apache Airflow

    >>> import sys
    >>> from pprint import pprint
    >>> pprint(sys.path)
    ['',
     '/home/project_name/dags',
     '/home/project_name/config',
     '/home/project_name/utilities',
     ...
    ]
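
To see the effect of PYTHONPATH without editing any profile file, the variable can be set for a single child interpreter. This is a minimal sketch; the function name sys_path_with_pythonpath and the temporary directory are assumptions made for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile


def sys_path_with_pythonpath(extra_dir: str) -> list:
    """Start a child interpreter with PYTHONPATH set and return its sys.path."""
    # Copy the current environment and override only PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project_dir:
        for entry in sys_path_with_pythonpath(project_dir):
            print(repr(entry))
```

The directory passed through PYTHONPATH shows up near the front of the child's sys.path, ahead of the installation directories.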



What is a virtual environment and how it works

Python can run in a virtual environment with isolation from the system.


Architecture of Execution

Virtualenv enables us to create multiple Python environments which are isolated from the global Python environment as well as from each other.


When Python starts, it analyzes the path of its binary. In a virtual environment, this binary is actually just a copy of, or a symbolic link to, your system's Python binary. Python then sets the sys.prefix location, which is used to locate site-packages (third-party packages/libraries).


Symbolic link

  • sys.prefix points to the virtual environment directory.
  • sys.base_prefix points to the non-virtual environment.
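
The two attributes above are enough to detect whether code is running inside a virtual environment. A minimal sketch (the function name in_virtual_environment is an assumption for illustration):

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Outside a virtual environment both attributes point to the same
    # installation; inside one, sys.prefix points to the environment directory.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", getattr(sys, "base_prefix", sys.prefix))
    print("virtual env:", in_virtual_environment())
```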

Folder of virtual environment

ll
# random.py -> /usr/lib/python3.6/random.py
# reprlib.py -> /usr/lib/python3.6/reprlib.py
# re.py -> /usr/lib/python3.6/re.py
# ...

tree
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── easy_install-3.8
│   ├── pip
│   ├── pip3
│   ├── pip3.8
│   ├── python -> python3.8
│   ├── python3 -> python3.8
│   └── python3.8 -> /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8
├── include
├── lib
│   └── python3.8
│       └── site-packages
└── pyvenv.cfg

Create Virtual Environment

Create virtual environment

virtualenv -p python3 <NAME_ENVIRONMENT>

Activate

source <NAME_ENVIRONMENT>/bin/activate
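
An environment can also be created from Python code. Note that this sketch uses the standard-library venv module rather than the virtualenv command shown above; the two produce a similar directory layout, but they are different tools. The function name create_environment and the directory name demo_env are assumptions for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target: Path) -> Path:
    """Create a virtual environment at `target` and return its pyvenv.cfg path."""
    # with_pip=False keeps the sketch fast and avoids depending on ensurepip.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / "demo_env")
        # pyvenv.cfg records, among other things, the "home" of the base interpreter.
        print(cfg.read_text())
```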



Package manager

Pipenv

Pipenv automatically creates and manages a virtualenv for your projects, and adds/removes packages from your Pipfile as you install/uninstall them. It also generates the ever-important Pipfile.lock, which is used to produce deterministic builds.

Features

  • Deterministic builds
  • Separates development and production packages within a single Pipfile
  • Automatically adds/removes packages from your Pipfile
  • Automatically creates and manages a virtualenv
  • Checks PEP 508 requirements
  • Checks installed package safety

Pipfile vs. requirements.txt

# Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
requests = "*"
numpy = "==1.18.1"
pandas = "==1.0.1"
wget = "==3.2"

[requires]
python_version = "3.8"
platform_system = 'Linux'

# requirements.txt
requests
matplotlib==3.1.3
numpy==1.18.1
pandas==1.0.1
wget==3.2

Install

pip3 install --user pipenv

Create Pipfile and virtual environment

  1. Create environment

    pipenv --python 3
  2. See where virtual environment is installed

    pipenv --venv
  3. Activate environment

    pipenv shell
  4. Install packages with Pipfile

    pipenv install flask
    # or
    pipenv install --dev flask
  5. Create lock file

    pipenv lock

Python Package Index

Doc Python Package Index

Poetry

Doc Poetry

Conda

Doc Conda



Requirements File

requirements.txt is a file containing a list of items to be installed using pip install.

Main Commands
  1. List installed packages
pip3 freeze
  2. Generate the requirements.txt file
pip3 freeze > requirements.txt
  3. Test
cat requirements.txt
  4. Install the packages listed in requirements.txt
pip3 install -r requirements.txt



Deterministic Build

Using pip with a requirements.txt file has a real issue: the build isn't deterministic. That is, given the same input (the requirements.txt file), pip does not always produce the same environment.
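
One visible source of non-determinism is a requirement with no exact version. The sketch below flags such lines; the function name unpinned_requirements is an assumption, and the check is deliberately simple (it only looks for "=="). Even a fully pinned file can still resolve different versions of transitive dependencies that are not listed in it.

```python
def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Contents of a requirements.txt with one unpinned entry.
sample = [
    "requests",
    "matplotlib==3.1.3",
    "numpy==1.18.1",
    "pandas==1.0.1",
    "wget==3.2",
]
print(unpinned_requirements(sample))  # ['requests']
```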

pip-tools

A set of command line tools to help you keep your pip-based packages fresh.

Features

  • Distinguish direct dependencies and versions
  • Freeze a set of exact packages and versions that we know work
  • Make it reasonably easy to update packages
  • Take advantage of pip's hash checking to give a little more confidence that packages haven't been modified (DNS attack)
  • Stable
Main Commands
  1. Install
pip install pip-tools
  2. Get the package versions
pip3 freeze > requirements.in
  3. Generate hashes and list dependencies
pip-compile --generate-hashes requirements.in

output: requirements.txt

  4. Install packages with hash checking
pip-sync requirements.txt
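
Conceptually, hash checking means comparing the digest of a downloaded file against a digest recorded in advance. The sketch below shows that comparison with the standard-library hashlib module; it is not how pip or pip-tools are implemented internally, and the function names sha256_of and matches_expected are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest against a previously recorded one."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "package.whl"  # stand-in for a downloaded file
        artifact.write_bytes(b"example package contents")
        recorded = sha256_of(artifact)
        print(matches_expected(artifact, recorded))
        artifact.write_bytes(b"tampered contents")
        print(matches_expected(artifact, recorded))
```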



Compiler and interpreter

CPython can be defined as both an interpreter and a compiler.

  • The compiler converts the .py source file into a .pyc bytecode for the Python virtual machine.
  • The interpreter executes this bytecode on the virtual machine.
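
The compile step can be triggered by hand with the standard-library py_compile module. A minimal sketch, assuming a throwaway source file named hello.py and a helper called compile_to_pyc (both are illustrative names):

```python
import py_compile
import tempfile
from pathlib import Path


def compile_to_pyc(source_file: Path) -> Path:
    """Compile a .py file to bytecode and return the path of the .pyc file."""
    # doraise=True turns compilation errors into exceptions instead of messages.
    pyc_path = py_compile.compile(str(source_file), doraise=True)
    return Path(pyc_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hello.py"
        source.write_text("print('hello')\n")
        pyc = compile_to_pyc(source)
        # By default the bytecode lands in a __pycache__ directory next to the source.
        print(pyc.parent.name, pyc.name)
```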


CPython's Design

A principal feature of CPython is its use of a global interpreter lock (GIL). This is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread can execute at a time.
Therefore, for a CPU-bound task, a single-process, multi-threaded Python program does not improve performance. However, this does not mean multithreading is useless in Python. For an I/O-bound task, multiple threads can improve program performance.

Multithreading in Python

Python supports multithreading despite the GIL. With Python threads, we can make better use of a CPU that would otherwise sit idle while waiting on I/O-bound operations, such as memory I/O, hard drive I/O, or network I/O.

This can happen when multiple threads are servicing separate clients. One thread may be waiting for a client to reply, and another may be waiting for a database query to execute, while a third thread is actually processing Python code. Another example is reading multiple images from disk.
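
A minimal sketch of threads overlapping I/O waits. Here time.sleep stands in for a blocking I/O call (like real blocking I/O, it releases the GIL while waiting); the function names and the 0.2 second delay are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id: int, delay: float = 0.2) -> int:
    """Stand-in for an I/O wait such as a network reply or a disk read."""
    time.sleep(delay)
    return task_id


def run_concurrently(n_tasks: int = 4):
    """Run the tasks in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io_task, range(n_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently(4)
    # Run one after another, four 0.2 s waits would take about 0.8 s.
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time stays close to a single delay rather than the sum of all of them.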

NOTE: we would have to be careful and use locks when necessary. Lock and unlock make sure that only one thread could write to memory at one time, but this will also introduce some overhead.
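
The note about locks can be sketched as follows: several threads increment a shared counter, and a threading.Lock makes each read-modify-write step atomic. The function name count_with_lock and the thread counts are assumptions for illustration.

```python
import threading


def count_with_lock(n_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may update the shared value.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # 40000
```

Acquiring and releasing the lock on every increment is the overhead the note refers to.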


Community Consensus

Removing the GIL would have made Python 3 slower than Python 2 in single-threaded performance. Another problem is that removing the GIL would break the existing C extensions, which depend heavily on the guarantees the GIL provides.
Although many proposals have been made to eliminate the GIL, the general consensus has been that in most cases, the advantages of the GIL outweigh the disadvantages; in the few cases where the GIL is a bottleneck, the application should be built around the multiprocessing structure.




How Python runs a program

  1. Tokenize the source code: Parser/tokenizer.c
  2. Parse the stream of tokens into an Abstract Syntax Tree (AST): Parser/parser.c
  3. Transform AST into a Control Flow Graph: Python/compile.c
  4. Emit bytecode based on the Control Flow Graph: Python/compile.c
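
The stages above happen inside CPython's C code, but the standard library exposes Python-level counterparts that make them observable. This is a sketch for exploration, not a description of the C implementation; the control flow graph is internal to compile() and is not visible here.

```python
import ast
import dis
import io
import tokenize

SOURCE = "total = 1 + 2\n"

# 1. Tokenize the source code.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# 2. Parse the source into an Abstract Syntax Tree.
tree = ast.parse(SOURCE)

# 3-4. Compile the AST into a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Finally, the interpreter executes the bytecode.
namespace = {}
exec(code, namespace)

if __name__ == "__main__":
    print(token_strings)
    print(ast.dump(tree))
    dis.dis(code)  # bytecode listing; the exact output varies by Python version
    print(namespace["total"])
```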


How Python searches for modules

When Python executes this statement:

import my_lib

The interpreter searches for my_lib.py in a list of directories assembled from the following sources:

  • Current directory
  • The list of directories contained in the PYTHONPATH environment variable
  • The installation-dependent directories where Python was installed, e.g. /usr/lib/python3.8

The resulting search path can be accessed using the sys module:

import sys
sys.path
# ['', '/usr/lib/python38.zip',
# '/usr/lib/python3.8',
# '/usr/lib/python3.8/lib-dynload',
# '/home/campos/.local/lib/python3.8/site-packages',
# '/usr/local/lib/python3.8/dist-packages',
# '/usr/lib/python3/dist-packages']

Now, to see where a package was imported from, you can use the __file__ attribute:

import zipp
zipp.__file__
# '/usr/lib/python3/dist-packages/zipp.py'

NOTE: you can see that the __file__ directory is in the list of directories searched by the interpreter.
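
It is also possible to ask where a module would be loaded from without importing it, using importlib.util.find_spec from the standard library. A minimal sketch; the helper name locate_module is an assumption for illustration.

```python
import importlib.util


def locate_module(name: str):
    """Return the location a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not found anywhere on the search path
    # For built-in or frozen modules, origin is a label rather than a file path.
    return spec.origin


if __name__ == "__main__":
    print(locate_module("json"))
    print(locate_module("module_that_does_not_exist"))
```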



How Python manages process and threads

TODO



How Python manages memory

TODO



How to deeply understand Python code execution










Best Practices

"Readability counts"

Indentation and Length

  • 4 spaces
  • Limit docstring and comment lines to a maximum of 72 characters
  • Limit code lines to a maximum of 79 characters
# Aligned with opening delimiter.
foo = long_function_name(var_one=0.0, var_two=0.0,
                         var_three=0.0, var_four=0.0)

Naming Convention

  • Class names (CapWords): CapWords
  • Variables (snake_case): cat_words
  • Constants: MAX_OVERFLOW
Line Break Before a Binary Operator
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
Encoding

By default: UTF-8

# -*- coding: UTF-8 -*-
<code>
Strings ' ' and " "

Single-quoted strings and double-quoted strings are the same.

Comments #
  • The first word should be capitalized.
  • Separate inline comments from the statement by at least 2 spaces.
x = x + 1  # Compensate for border
Imports

Following order:

  1. Standard library imports.
  2. Related third-party imports.
  3. Local application/library specific imports.
import argparse
import configparser
import os

import mysql.connector

import my_module

Yes:

import os
import sys

No:

import os, sys

No problems:

from subprocess import Popen, PIPE
Dunders to Documentation
__version__ = '0.1'
__author__ = 'Bruno Campos'
String Concatenation
  • Use ''.join() to concatenate 3 or more strings:
''.join([stringA, stringB, stringC, stringD])
  • Repeated concatenation with + relies on an optimization that is fragile even in CPython. Avoid:
stringA + stringB + stringC + stringD
String Methods
  • Use string methods instead of the string module, because string methods are always much faster.
  • Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.
Yes: if foo.startswith('bar'):
No: if foo[:3] == 'bar':
Exception

Limit the try clause to the minimum amount of code necessary.

Yes:

try:
    value = collection[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)

No:

try:
    # Too broad!
    return handle_value(collection[key])
except KeyError:
    # Will also catch KeyError raised by handle_value()
    return key_not_found(key)
  • Aim to answer the question "What went wrong?" programmatically, rather than only stating that "A problem occurred".
Return

"Should explicitly state this as return None"

  • Be consistent in return statements.
  • Either all return statements in a function should return an expression, or none of them should.

Yes:

def foo(x):
    if x >= 0:
        return math.sqrt(x)
    else:
        return None

No:

def foo(x):
    if x >= 0:
        return math.sqrt(x)
Type Comparisons
  • Always use isinstance()
Yes: if isinstance(obj, int):
No: if type(obj) is type(1):
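
The practical difference is that isinstance() respects inheritance, while an exact type comparison does not. A minimal sketch; the Animal and Dog classes are hypothetical and exist only for this example.

```python
class Animal:
    pass


class Dog(Animal):
    pass


dog = Dog()

print(isinstance(dog, Animal))  # True: a Dog is also an Animal
print(type(dog) is Animal)      # False: the exact type is Dog
print(isinstance(True, int))    # True: bool is a subclass of int
print(type(True) is int)        # False
```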

Annotation Functions

"Don’t use comments to specify a type, when you can use type annotation."

  • Acts as a very powerful linter (a code analyzer that reports errors).
  • Python does not attach any meaning to these annotations.
  • Examples:

Method arguments and return values

from typing import List

def func(a: int) -> List[int]:
    ...

def hello_name(name: str) -> str:
    return f'Hello {name}'

Declare the type of a variable (type hints)

a = SomeFunc() # type: SomeType

In hello_name above, the annotation states that the expected type of the name argument is str. Likewise, the expected return type is str.

Type Hints
def send_email(address,     # type: Union[str, List[str]]
               sender,      # type: str
               cc,          # type: Optional[List[str]]
               bcc,         # type: Optional[List[str]]
               subject='',
               body=None    # type: List[str]
               ):
    """Send an email message. Return True if successful."""
    <code>
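
For comparison, the same signature can be written with inline annotations instead of type comments. This is a sketch: the function body is a placeholder invented for the example, and the annotations for subject and the return value are assumptions, since the original leaves them untyped.

```python
from typing import List, Optional, Union, get_type_hints


def send_email(address: Union[str, List[str]],
               sender: str,
               cc: Optional[List[str]],
               bcc: Optional[List[str]],
               subject: str = '',
               body: Optional[List[str]] = None) -> bool:
    """Send an email message. Return True if successful."""
    # Placeholder logic: a real implementation would hand the message to a mail client.
    recipients = [address] if isinstance(address, str) else list(address)
    return bool(recipients) and bool(sender)


if __name__ == "__main__":
    # Annotations are stored on the function and can be inspected at runtime.
    for name, hint in get_type_hints(send_email).items():
        print(name, hint)
```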

TODO

References

Docstrings

  • Docstrings must have:
    • Args
    • Returns
    • Raises

Simple Example

def say_hello(name):
    """
    A simple function that says hello... Richie style
    """
    print(f"Hello {name}, is it me you're looking for?")

Google style example

def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by big_table. Silly things may happen if
    other_silly_variable is not None.

    Args:
        big_table: An open Bigtable Table instance.
        keys: A sequence of strings representing the key of each table row
            to fetch.
        other_silly_variable: Another optional variable, that has a much
            longer name than the other args, and which does nothing.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {'Serak': ('Rigel VII', 'Preparer'),
         'Zim': ('Irk', 'Invader'),
         'Lrrr': ('Omicron Persei 8', 'Emperor')}

        If a key from the keys argument is missing from the dictionary,
        then that row was not found in the table.

    Raises:
        IOError: An error occurred accessing the bigtable.Table object.
    """
    return None

__doc__

Such a docstring becomes the __doc__ special attribute of that object.

  • Simple Example
print(say_hello.__doc__)
# A simple function that says hello... Richie style
  • Google style example

help()
  • Works like a manual page (man).
  • help() is a built-in function that prints out the object's docstring.
>>> help(say_hello)
Help on function say_hello in module __main__:
# say_hello(name)
#     A simple function that says hello... Richie style
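
When a docstring is needed as data rather than printed, inspect.getdoc from the standard library returns it with surrounding blank lines and indentation cleaned up. A minimal sketch reusing the say_hello example:

```python
import inspect


def say_hello(name):
    """
    A simple function that says hello... Richie style
    """
    print(f"Hello {name}, is it me you're looking for?")


raw = say_hello.__doc__            # docstring as stored, including blank lines
clean = inspect.getdoc(say_hello)  # same text, normalized for display

if __name__ == "__main__":
    print(repr(raw))
    print(repr(clean))
```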

Scripts with Docstrings
  • Docstrings must show how to use the script
  • They must document:
    • Usage: command-line syntax
    • Examples
    • Arguments, required and optional
"""
Example of program with many options using docopt.

Usage:
  options_example.py [-hvqrf FILE PATH]
  my_program tcp <host> <port> [--timeout=<seconds>]

Examples:
  calculator_example.py 1 + 2 + 3 + 4 + 5
  calculator_example.py 1 + 2 '*' 3 / 4 - 5 # note quotes around '*'
  calculator_example.py sum 10 , 20 , 30 , 40

Arguments:
  FILE input file
  PATH out directory

Options:
  -h --help show this help message and exit
  --version show version and exit
  -v --verbose print status messages
  -q --quiet quiet mode
  -f --force
  -t, --timeout TIMEOUT set timeout TIMEOUT seconds
  -a, --all List everything.
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__, version='1.0.0rc2')
    print(arguments)
Functions with Docstrings

A docstring for a function or method must summarize:

  • behavior
  • arguments required
  • arguments optional
  • default value of arguments
  • returns
  • exceptions raised

Example

def says(self, sound=None):
    """Prints what the animals name is and what sound it makes.

    If the argument `sound` isn't passed in, the default Animal
    sound is used.

    Parameters
    ----------
    sound : str, optional
        The sound the animal makes (default is None)

    Raises
    ------
    NotImplementedError
        If no sound is set for the animal or passed in as a parameter.
    """
    if self.sound is None and sound is None:
        raise NotImplementedError("Silent Animals are not supported!")

    out_sound = self.sound if sound is None else sound
    print(self.says_str.format(name=self.name, sound=out_sound))
Class with Docstrings

The docstring for a class should summarize its behavior and list the public methods and instance variables. If the class is intended to be subclassed and has an additional interface for subclasses, this interface should be listed separately (in the docstring). The class constructor should be documented in the docstring for its __init__ method. Individual methods should be documented by their own docstrings.

Example

class SimpleClass:
    """Class docstrings go here."""

    def say_hello(self, name: str):
        """Class method docstrings go here."""
        print(f'Hello {name}')

Class docstrings should contain the following information:

  • A brief summary of its purpose and behavior
  • Any public methods, along with a brief description
  • Any class properties (attributes)
  • Anything related to the interface for subclassers, if the class is intended to be subclassed
References

Methods with numerous parameters

Methods with numerous parameters are a challenge to maintain, especially if most of them share the same datatype.
These situations usually denote the need for new objects to wrap the numerous parameters.

Example(s):

  • too many arguments
def add_person(birthYear: int, birthMonth: int, birthDate: int,
               height: int, weight: int, ssn: int):
    '''too many arguments'''
    ...
  • preferred approach
def add_person(birthdate: 'Date', measurements: 'BodyMeasurements', ssn: int):
    '''preferred approach'''
    ...
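
One way to build such wrapper objects is with dataclasses. In this sketch, BodyMeasurements is a hypothetical class whose fields are taken from the first signature, datetime.date stands in for the 'Date' type, and the function body is a placeholder invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BodyMeasurements:
    """Groups the measurements that previously travelled as separate ints."""
    height: int
    weight: int


def add_person(birthdate: date, measurements: BodyMeasurements, ssn: int) -> dict:
    """Preferred approach: three meaningful parameters instead of six ints."""
    # Placeholder body: return the record that would be stored.
    return {"birthdate": birthdate, "measurements": measurements, "ssn": ssn}


if __name__ == "__main__":
    person = add_person(date(1990, 5, 17), BodyMeasurements(height=180, weight=75), 123456789)
    print(person)
```

With distinct types, swapping two arguments by mistake (for example height and birth month) is much harder than with six positional integers.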

Cyclomatic Complexity

Cyclomatic complexity counts the number of decision points in a method.
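
A rough approximation of this metric can be computed with the standard-library ast module by counting branching nodes and adding one. This sketch is a simplification under stated assumptions (the set of node types counted is a choice made for this example); dedicated tools apply more complete rules.

```python
import ast

# Node types treated as decision points in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


EXAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(EXAMPLE))  # 5
```

The example scores 5: one base path, two if statements, one for loop, and one and operator.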




Interview Questions on Virtual Environment
  1. What is virtual environment in Python?
  2. How to create and use a virtual environment in Python?
  3. How do Python virtual environments work?


References


Gmail Stackoverflow LinkedIn GitHub Medium Creative Commons License