| other:python:misc_by_jyp [2022/02/21 17:31]  – [numpy related stuff] Added ufuncs jypeter | other:python:misc_by_jyp [2025/08/29 11:17] (current)  – [Extra tutorials] Added the "Stats stuff" category jypeter | 
|---|
| </WRAP> | </WRAP> | 
|  |  | 
| ==== Reading/setting environments variables ==== | ===== Extra tutorials ===== | 
|  |  | 
|  | Read these only **when you have already read all the content of this page several times** and are looking for new ideas | 
|  |  | 
|  | * [[https://medium.com/data-science/calculating-distance-between-two-geolocations-in-python-26ad3afe287b|Calculating distance between two geo-locations in Python]]: | 
|  | * ''[[https://github.com/mapado/haversine|haversine]]'', ''[[https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html|haversine_distances]] @ scikit-learn'' and [[https://en.wikipedia.org/wiki/Haversine_formula|Haversine formula]] | 
|  | * Looking at table data with ''pandas'' | 
|  | * [[https://blog.devgenius.io/data-profiling-in-python-common-ways-to-explore-your-data-part-1-0efd0dedff75|Summary information]] | 
|  | * [[https://blog.devgenius.io/data-profiling-in-python-common-ways-to-explore-your-data-part-2-396384522e91|More detailed information]] | 
|  | * [[https://blog.devgenius.io/data-cleansing-in-python-common-ways-to-clean-your-data-3459a256dd85|Table data cleaning]] | 
|  | * Stats stuff | 
|  | * [[https://medium.com/@tubelwj/python-outlier-detection-iqr-method-and-z-score-implementation-8e825edf4b32|Python Outlier Detection: IQR Method and Z-score Implementation]] | 
|  | * [[https://medium.com/pythons-gurus/clean-code-in-python-good-vs-bad-practices-examples-2df344bddacc|Clean Code in Python: Good vs. Bad Practices Examples]] | 
|  | * [[https://peps.python.org/pep-0008/|PEP 8 – Style Guide for Python Code]] | 
|  | * [[https://realpython.com/python-pep8/|How to Write Beautiful Python Code With PEP 8]] | 
|  | * [[https://www.datacamp.com/tutorial/pep8-tutorial-python-code|PEP-8 Tutorial: Code Standards in Python]] | 
|  | * Some checkers/linters: [[https://docs.astral.sh/ruff/|ruff]], [[https://flake8.pycqa.org/en/stable/|flake8]] | 
|  | * [[https://medium.com/@yaduvanshineelam09/ultimate-python-cheat-sheet-practical-python-for-everyday-tasks-8a33abc0892f|Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks]] | 
|  | * [[https://medium.com/pythoneers/16-hacks-that-will-take-your-python-skills-to-the-next-level-12e7a9b97421|16 Hacks That Will Take Your Python Skills to the Next Level]] | 
|  | * [[https://levelup.gitconnected.com/modular-coding-in-python-finally-solve-your-import-errors-af2fd172fcf7|Modular Coding in Python: Finally Solve your Import Errors]] (understanding and fixing ModuleNotFoundError and ImportError) | 
|  | * [[https://medium.com/@moraneus/understanding-multithreading-and-multiprocessing-in-python-1ed39bb078d5|Understanding Multithreading and Multiprocessing in Python]] | 
|  | ===== Reading/setting environment variables ===== | 
|  |  | 
| <code>>>> os.environ['TMPDIR'] | <code>>>> os.environ['TMPDIR'] | 
| </code> | </code> | 
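The short sketch below shows one possible way to read, set and remove an environment variable with ''os.environ''. The variable name ''MY_DEMO_TMPDIR'' and the directory names are hypothetical and only used for illustration.

```python
import os

# Hypothetical variable name, used only for this illustration
var_name = 'MY_DEMO_TMPDIR'

# Reading: os.environ behaves like a dictionary, so a missing variable
# raises a KeyError. get() returns a default value instead
tmp_dir = os.environ.get(var_name, '/tmp')
print(tmp_dir)

# Setting: the value must be a string. The change is visible in this
# process and in the child processes it starts, NOT in the parent shell
os.environ[var_name] = '/some/scratch/dir'
print(os.environ[var_name])

# Removing the variable again
del os.environ[var_name]
print(var_name in os.environ)
```

Using ''get()'' with a default value avoids a ''KeyError'' when the variable is not defined.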
|  |  | 
| ==== Generating (aka raising) an error ==== |  | 
|  | ===== Generating (aka raising) an error ===== | 
|  |  | 
| This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors | This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors | 
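A minimal sketch of both cases described above: a function raises an error, and the calling code explicitly catches it. The function name ''check_latitude'' and the error message are hypothetical.

```python
def check_latitude(lat):
    """Return lat unchanged, or raise an error if it is out of range."""
    if not -90 <= lat <= 90:
        raise ValueError(f'Invalid latitude: {lat}')
    return lat


# The caller explicitly catches the error, so the script goes on
try:
    check_latitude(123)
except ValueError as err:
    message = str(err)
    print('Caught an error:', message)

# Without the try/except block, the same call would stop the script
# and display a traceback
print(check_latitude(45))
```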
|  |  | 
|  |  | 
| ==== Stopping a script ==== | ===== Using log files (aka logging) ===== | 
|  |  | 
|  | It is always possible to display information messages using the ''print()'' function, but it is more efficient to use //logging// tools when you want to **correctly display a lot of information about a script's progress** | 
|  | * [[https://loguru.readthedocs.io/|Loguru]] is a library which aims to bring enjoyable logging in Python | 
|  | * See also [[https://betterstack.com/community/guides/logging/loguru/|A Complete Guide to Logging in Python with Loguru]] | 
|  | * More on [[https://betterstack.com/community/guides/logging/#python|logging with python]] | 
|  | * The default (but not easy to use) Python ''[[https://docs.python.org/3/library/logging.html|logging]]'' module | 
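As an illustration of the default ''logging'' module listed above, here is a minimal sketch that writes messages to a log file. The logger name, the file name and the message format are assumptions chosen for this example, and the file is created in a temporary directory.

```python
import logging
import os
import tempfile

# Hypothetical log file, created in a temporary directory
log_path = os.path.join(tempfile.mkdtemp(), 'my_script.log')

# Hypothetical logger name
logger = logging.getLogger('my_script')
logger.setLevel(logging.INFO)  # Messages below INFO are ignored

handler = logging.FileHandler(log_path, mode='w')
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)

logger.debug('Detailed information (filtered out at the INFO level)')
logger.info('Processing step 1')
logger.warning('Something unexpected happened, but we can go on')

# Flush and release the log file
handler.close()
logger.removeHandler(handler)

with open(log_path) as f_log:
    log_lines = f_log.read().splitlines()
print(log_lines)
```

Changing the level to ''logging.DEBUG'' would also keep the first message, without touching the rest of the script.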
|  | ===== Stopping a script ===== | 
|  |  | 
| A user can use ''CTRL-C'' or ''kill'' to stop a script, or ''CTRL-Z'' to suspend it temporarily (use ''fg'' to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error | A user can use ''CTRL-C'' or ''kill'' to stop a script, or ''CTRL-Z'' to suspend it temporarily (use ''fg'' to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error | 
|  |  | 
| <code>sys.exit('Some optional message about why we are stopping')</code> | <code>sys.exit('Some optional message about why we are stopping')</code> | 
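Note that ''sys.exit()'' works by raising the ''SystemExit'' exception, so ''finally'' blocks are still executed. The sketch below catches the exception only to demonstrate this; the ''main'' function is hypothetical.

```python
import sys


def main():
    try:
        sys.exit('Some optional message about why we are stopping')
    finally:
        # Still executed, because sys.exit() raises SystemExit
        print('Cleaning up before leaving')


try:
    main()
except SystemExit as exc:
    # Only for this demonstration: a real script would let SystemExit
    # propagate, so that the interpreter actually stops
    exit_arg = exc.code
    print('Exit requested:', exit_arg)
```

When the exception is not caught, a string argument is printed to the standard error and the exit status of the script is ''1''.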
|  | ===== Checking if a file/directory is writable by the current user ===== | 
|  |  | 
| ==== Checking if a file/directory is writable by the current user ==== |  | 
|  |  | 
| <code>>>> os.access('/', os.W_OK) | <code>>>> os.access('/', os.W_OK) | 
| >>> os.access('/home/jypmce/.bashrc', os.W_OK) | >>> os.access('/home/jypmce/.bashrc', os.W_OK) | 
| True</code> | True</code> | 
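A small self-contained sketch of the same check, using a temporary directory so that it can be run anywhere. The file name ''does_not_exist.txt'' is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # A freshly created temporary directory is writable by its owner
    dir_writable = os.access(tmp_dir, os.W_OK)

    # os.access returns False (no error is raised) for a missing path
    missing = os.path.join(tmp_dir, 'does_not_exist.txt')
    missing_writable = os.access(missing, os.W_OK)

print(dir_writable, missing_writable)
```

Permissions can change between the check and the actual write, so it is often safer to just try to open the file and catch ''PermissionError''.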
|  |  | 
|  |  | 
|  | ===== Playing with strings ===== | 
|  |  | 
|  | ==== String formatting ==== | 
|  |  | 
|  | * Knowing how to display/print a string correctly is always useful for information and debugging purposes | 
|  | * There are lots of different ways to display strings | 
|  |  | 
|  | === String formatting examples === | 
|  |  | 
|  | You will find below some examples of //quick printing//, as well as using //old style formatting//, //formatted string literals (f-strings)// and the //String ''format()'' Method//. More details in the next section | 
|  |  | 
|  | <code python> | 
|  | >>> # Basic (but quick and efficient) printing | 
|  |  | 
|  | >>> year = 1984 | 
|  | >>> print(year) | 
|  | 1984 | 
|  | >>> print('[', year, 'is a famous book ]') | 
|  | [ 1984 is a famous book ] | 
|  |  | 
|  | >>> # Old style formatting | 
|  |  | 
|  | >>> print('[ %i is a famous book ]' % (year,)) | 
|  | [ 1984 is a famous book ] | 
|  | >>> print('[ %10i is a famous book ]' % (year,)) | 
|  | [       1984 is a famous book ] | 
|  | >>> print('[ %-10i is a famous book ]' % (year,)) | 
|  | [ 1984       is a famous book ] | 
|  | >>> print('[ %010i is a famous book ]' % (year,)) | 
|  | [ 0000001984 is a famous book ] | 
|  |  | 
|  | >>> # Formatted string literals (f-strings) | 
|  |  | 
|  | >>> print(f'[ {year} is a famous book ]') | 
|  | [ 1984 is a famous book ] | 
|  | >>> print(f'[ {year=} is a famous book ]') | 
|  | [ year=1984 is a famous book ] | 
|  | >>> print(f'[ {year:10} is a famous book ]') | 
|  | [       1984 is a famous book ] | 
|  | >>> print(f'[ {year:<10} is a famous book ]') | 
|  | [ 1984       is a famous book ] | 
|  | >>> print(f'[ {year:010} is a famous book ]') | 
|  | [ 0000001984 is a famous book ] | 
|  | >>> print(f'[ {year:10.2f} is a famous book (yes, {year}!) ]') | 
|  | [    1984.00 is a famous book (yes, 1984!) ] | 
|  |  | 
|  | >>> # The String format() Method | 
|  |  | 
|  | >>> print('[ {} is a famous book ]'.format(year)) | 
|  | [ 1984 is a famous book ] | 
|  | >>> print('[ {:10} is a famous book ]'.format(year)) | 
|  | [       1984 is a famous book ] | 
|  | >>> print('[ {:<10} is a famous book ]'.format(year)) | 
|  | [ 1984       is a famous book ] | 
|  | >>> print('[ {:010} is a famous book ]'.format(year)) | 
|  | [ 0000001984 is a famous book ] | 
|  | >>> print('[ {:10.2f} is a famous book  (yes, {}!) ]'.format(year, year)) | 
|  | [    1984.00 is a famous book  (yes, 1984!) ] | 
|  | >>> print('[ {title:10.2f} is a famous book  (yes, {title}!) ]'.format(title=year)) | 
|  | [    1984.00 is a famous book  (yes, 1984!) ] | 
|  | >>> print('[ {title:10.2e} is a famous book ]'.format(title=year)) | 
|  | [   1.98e+03 is a famous book ]</code> | 
|  |  | 
|  | === String formatting references === | 
|  |  | 
|  | * [[https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals|Formatted String Literals]] (//f-strings//) | 
|  | * Available in Python >= 3.6 | 
|  | * [[https://docs.python.org/3/reference/lexical_analysis.html#f-strings|More documentation]] | 
|  | * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]] | 
|  | * See also the [[https://pyformat.info/|PyFormat site]] | 
|  |  | 
|  | * [[https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method|The String format() Method]] | 
|  | * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]] | 
|  | * See also the [[https://pyformat.info/|PyFormat site]] | 
|  |  | 
|  | * [[https://pyformat.info/|PyFormat site]]: string formatting using the //old style// and the //String ''format()'' method// | 
|  | * <wrap hi>Hint</wrap>: this can also be used as an **easy documentation for //f-strings// format**! | 
|  |  | 
|  | * [[https://docs.python.org/3/tutorial/inputoutput.html#old-string-formatting|Old string formatting]] | 
|  | ==== Splitting (complex) strings ==== | 
|  |  | 
|  | It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings | 
|  |  | 
|  | <code>>>> str_with_blanks = 'one    two\t3\t\tFOUR' | 
|  | >>> str_with_blanks.split() | 
|  | ['one', 'two', '3', 'FOUR'] | 
|  |  | 
|  | >>> str_with_simple_delimiters = '1,2,3.14,  4' | 
|  | >>> str_with_simple_delimiters.split(',') | 
|  | ['1', '2', '3.14', '  4'] | 
|  |  | 
|  | >>> complex_string='-o 1 --long "A string with accented chars: é è à ç"' | 
|  | >>> complex_string.split() | 
|  | ['-o', '1', '--long', '"A', 'string', 'with', 'accented', 'chars:', 'é', 'è', 'à', 'ç"'] | 
|  |  | 
|  | >>> import shlex | 
|  | >>> shlex.split(complex_string) | 
|  | ['-o', '1', '--long', 'A string with accented chars: é è à ç']</code> | 
|  |  | 
|  |  | 
| ==== Working with paths and filenames ==== | ==== Working with paths and filenames ==== | 
|  |  | 
| If you are in a hurry, you can just use string functions to work with path and file names. But you will need some specific functions to check if a file exists, and similar operations. All these are available in 2 libraries that have similar functions. Both of these libraries can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers | If you are in a hurry, you can just use string functions to work with paths and file names. | 
|  |  | 
| * [[https://docs.python.org/3/library/os.path.html|os.path]] //Common pathname manipulations// |  | 
|  | You will need some specific objects and functions to check if a file exists, and for similar operations. Check the libraries listed below, which can automatically deal with Unix-type paths on Linux and macOS computers, and Windows-type paths on Windows computers | 
|  |  | 
|  | * [[https://docs.python.org/3/library/os.path.html|os.path]]: //common pathname manipulations// | 
| * Available since... a long time! Use this if you want to avoid backward compatibility problems | * Available since... a long time! Use this if you want to avoid backward compatibility problems | 
| * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]] | * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]] | 
| * [[https://docs.python.org/3/library/pathlib.html|pathlib]] //Object-oriented filesystem paths// | * [[https://docs.python.org/3/library/pathlib.html|pathlib]]: a **more recent** //object-oriented// way to deal with //filesystem paths// | 
| * Available since Python version 3.4 | * Available since Python version 3.4 | 
| * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]] | * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]] | 
| * [[https://docs.python.org/3/library/shutil.html|High-level file operations]] | * [[https://docs.python.org/3/library/shutil.html|shutil]]: High-level file operations, e.g. copy/move a file or directory tree | 
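To compare the two libraries listed above, the sketch below splits the same file name with ''os.path'' functions and with a ''pathlib'' object. The file name is hypothetical, and ''PurePosixPath'' is used so that the example does not touch the filesystem.

```python
import os.path
from pathlib import PurePosixPath

# Hypothetical file name, written with the Unix separator
file_name = '/home/users/my_login/data/tas_test.nc'

# os.path: functions working on plain strings
dir_name = os.path.dirname(file_name)
base_name = os.path.basename(file_name)
root, ext = os.path.splitext(base_name)
print(dir_name, base_name, root, ext)

# pathlib: the same information as attributes of a path object
p = PurePosixPath(file_name)
print(p.parent, p.name, p.stem, p.suffix)

# Building a new name from an existing one
new_path = p.with_suffix('.txt')
print(new_path)
```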
|  |  | 
|  |  | 
| === Example: getting the full path of the Python used === | === Example: getting the full path of the Python executable used === | 
|  |  | 
| Note: the actual python may be different from the default python! | Note: the actual python may be different from the default python! | 
| /usr/bin/python | /usr/bin/python | 
|  |  | 
| $ /modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python | $ /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python | 
| >>> import sys, shutil | >>> import sys, shutil | 
| >>> shutil.which('python') | >>> shutil.which('python') | 
| '/usr/bin/python' | '/usr/bin/python' | 
| >>> sys.executable | >>> sys.executable | 
| '/modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python'</code> | '/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python'</code> | 
|  |  | 
|  |  | 
| </code> | </code> | 
|  |  | 
|  |  | 
|  | === Example: system independent paths with pathlib === | 
|  |  | 
|  | Note: the following example was generated on a Linux server and uses a <wrap em>/</wrap> character as a path separator | 
|  |  | 
|  | <code>>>> from pathlib import Path | 
|  | >>> my_home = Path.home() | 
|  | >>> my_home | 
|  | PosixPath('/home/users/my_login') | 
|  | >>> my_conf = my_home / '.config' / 'evince' | 
|  | >>> my_conf | 
|  | PosixPath('/home/users/my_login/.config/evince') | 
|  | >>> my_conf.is_dir() | 
|  | True | 
|  | >>> my_conf.is_file() | 
|  | False | 
|  | >>> list(my_conf.glob('*')) | 
|  | [PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')] | 
|  | >>> [ ff.name for ff in my_conf.glob('*') ] | 
|  | ['evince_toolbar.xml', 'accels'] | 
|  | </code> | 
|  |  | 
| === Example: getting the size(s) of all the files in a directory === | === Example: getting the size(s) of all the files in a directory === | 
| >>> f_tmp.close() | >>> f_tmp.close() | 
| >>> os.remove(f_tmp.name)</code> | >>> os.remove(f_tmp.name)</code> | 
| ==== Using command-line arguments ==== |  | 
|  |  | 
| === The extremely easy but non-flexible way: sys.argv === |  | 
|  | ===== Using command-line arguments ===== | 
|  |  | 
|  | ==== The extremely easy but non-flexible way: sys.argv ==== | 
|  |  | 
| The name of a script, the number of arguments (including the name of the script), and the arguments (as strings) can be accessed through the ''sys.argv'' strings' list | The name of a script, the number of arguments (including the name of the script), and the arguments (as strings) can be accessed through the ''sys.argv'' strings' list | 
| 2 tas_tes.nc</code> | 2 tas_tes.nc</code> | 
|  |  | 
| === The C-style way: getopt === |  | 
|  | ==== The C-style way: getopt ==== | 
|  |  | 
| Use [[https://docs.python.org/3/library/getopt.html|getopt]] (//C-style parser for command line options//) | Use [[https://docs.python.org/3/library/getopt.html|getopt]] (//C-style parser for command line options//) | 
|  |  | 
| === The deprecated Python way: optparse === |  | 
|  | ==== The deprecated Python way: optparse ==== | 
|  |  | 
| [[https://docs.python.org/3/library/optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://docs.python.org/3/library/argparse.html#upgrading-optparse-code|Upgrading optparse code]] for converting from ''optparse'' to ''argparse'') | [[https://docs.python.org/3/library/optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://docs.python.org/3/library/argparse.html#upgrading-optparse-code|Upgrading optparse code]] for converting from ''optparse'' to ''argparse'') | 
|  |  | 
| === The current Python way: argparse === |  | 
|  | ==== The current Python way: argparse ==== | 
|  |  | 
| [[https://docs.python.org/3/library/argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//) is available since Python version 3.2 | [[https://docs.python.org/3/library/argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//) is available since Python version 3.2 | 
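A minimal ''argparse'' sketch is shown below. The option names, the help texts and the file name are hypothetical, and the arguments are passed as an explicit list so that the example can be run without a command line.

```python
import argparse

parser = argparse.ArgumentParser(description='Process an input file')

# One mandatory positional argument, and two optional ones
parser.add_argument('in_file', help='name of the input file')
parser.add_argument('-n', '--nb-steps', type=int, default=1,
                    help='number of time steps to process')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='display more information')

# A real script would call parser.parse_args() without arguments,
# in order to parse the content of sys.argv
args = parser.parse_args(['-n', '2', 'tas_test.nc'])
print(args.in_file, args.nb_steps, args.verbose)
```

The ''--nb-steps'' option is available as ''args.nb_steps'', already converted to an integer, and a ''-h/--help'' option is generated automatically.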
|  |  | 
| ==== Using ordered dictionaries ==== |  | 
|  | ===== Using ordered dictionaries ===== | 
|  |  | 
| **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (this was already the case in CPython 3.6, but only as an implementation detail) | **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (this was already the case in CPython 3.6, but only as an implementation detail) | 
| Check the [[https://docs.python.org/3/library/collections.html#collections.OrderedDict|OrderedDict class]] (''from collections import OrderedDict'') and the [[https://realpython.com/python-ordereddict/|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial | Check the [[https://docs.python.org/3/library/collections.html#collections.OrderedDict|OrderedDict class]] (''from collections import OrderedDict'') and the [[https://realpython.com/python-ordereddict/|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial | 
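The sketch below illustrates the insertion order of a regular dictionary, and two things that ''OrderedDict'' adds: reordering with ''move_to_end'', and order-sensitive comparisons. The keys and values are arbitrary.

```python
from collections import OrderedDict

# A regular dictionary remembers the insertion order
plain = {}
plain['c'] = 3
plain['a'] = 1
plain['b'] = 2
plain_keys = list(plain)
print(plain_keys)

# OrderedDict has extra methods for changing the order
od = OrderedDict(plain)
od.move_to_end('c')
od_keys = list(od)
print(od_keys)

# Comparing two OrderedDict objects takes the order into account,
# comparing two regular dictionaries does not
same_od = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
same_dict = dict(a=1, b=2) == dict(b=2, a=1)
print(same_od, same_dict)
```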
|  |  | 
| ==== Using sets ==== |  | 
|  | ===== Using sets ===== | 
|  |  | 
| [[https://docs.python.org/3/tutorial/datastructures.html#sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //something// and you can easily determine the **intersection**, **union** (and other similar operations) of sets. | [[https://docs.python.org/3/tutorial/datastructures.html#sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //something// and you can easily determine the **intersection**, **union** (and other similar operations) of sets. | 
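A short sketch of the operations mentioned above. The lists of names are hypothetical, and ''sorted()'' is only used to get a reproducible display, since sets are not ordered.

```python
# Hypothetical lists, with a duplicated element in the first one
models_a = ['IPSL', 'CNRM', 'MPI', 'IPSL']
models_b = ['MPI', 'UKESM']

set_a = set(models_a)
set_b = set(models_b)

unique_a = sorted(set_a)          # Unique elements of models_a
common = set_a & set_b            # Intersection
all_models = sorted(set_a | set_b)  # Union
only_a = sorted(set_a - set_b)    # Difference

print(unique_a)
print(common)
print(all_models)
print(only_a)
```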
|  |  | 
| ==== Printing a readable version of long lists or dictionaries ==== |  | 
|  | ===== Printing a readable version of long lists or dictionaries ===== | 
|  |  | 
| The [[https://docs.python.org/3/library/pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries, ...). It will wrap long lines in a meaningful way | The [[https://docs.python.org/3/library/pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries, ...). It will wrap long lines in a meaningful way | 
| </code> | </code> | 
|  |  | 
| ==== Sorting ==== |  | 
|  | ===== Storing objects and data in a file (shelve and friends) ===== | 
|  |  | 
|  | The built-in [[other:python:jyp_steps#the_shelve_package|shelve]] module can be **easily** used for storing temporary/intermediate data | 
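A minimal sketch of storing and reading back intermediate data with ''shelve''. The shelf name, the keys and the stored values are hypothetical, and the shelf is created in a temporary directory that is removed at the end.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical shelf name (the actual file names depend on the
    # database backend available on the computer)
    shelf_name = os.path.join(tmp_dir, 'my_results')

    # A shelf is used like a dictionary with string keys
    with shelve.open(shelf_name) as db:
        db['params'] = {'nb_steps': 10, 'title': 'Test run'}
        db['values'] = [1.5, 2.5, 3.5]

    # Later (possibly in another script): read the data back
    with shelve.open(shelf_name) as db:
        stored_keys = sorted(db.keys())
        params = db['params']

print(stored_keys)
print(params)
```

The stored values can be any object supported by ''pickle'', so a shelf should only be opened when it comes from a trusted source.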
|  |  | 
|  | More options: | 
|  | * Some [[other:python:jyp_steps#data_file_formats|non-NetCDF]] file formats | 
|  | * Working with [[other:python:jyp_steps#netcdf_filesusing_cdms2_xarray_and_netcdf4|NetCDF]] files | 
|  |  | 
|  |  | 
|  | ===== Using a configuration file ===== | 
|  |  | 
|  | The built-in [[https://docs.python.org/3/library/configparser.html|configparser]] module can be easily used for reading (**and** writing!) text configuration files. | 
|  |  | 
|  | Note: a configuration file is also a way to easily store and exchange text data! | 
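The sketch below writes a small configuration and reads it back with ''configparser''. The section name and the option names are hypothetical, and an in-memory text buffer is used instead of a real file.

```python
import configparser
import io

# Writing: values are stored as strings
config = configparser.ConfigParser()
config['plot'] = {'title': 'Test plot', 'nb_levels': '12', 'show_grid': 'yes'}

# A real script would rather use: with open('my_conf.ini', 'w') as f: config.write(f)
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)

# Reading: helper methods convert the strings to the requested type
config_2 = configparser.ConfigParser()
config_2.read_string(config_text)
title = config_2['plot']['title']
nb_levels = config_2.getint('plot', 'nb_levels')
show_grid = config_2.getboolean('plot', 'show_grid')
print(title, nb_levels, show_grid)
```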
|  |  | 
|  |  | 
|  | ===== Working with global variables ===== | 
|  |  | 
|  | There is a good chance you don't actually want/need a //global// variable. Be sure to use the ''global'' statement correctly if you want to avoid side-effects... | 
|  |  | 
|  | * [[https://docs.python.org/3/faq/programming.html?highlight=global#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value|Using (and changing) a global variable inside a script or module]] | 
|  | * Simple module example\\ <code>_myvar = 10 | 
|  |  | 
|  | def set_myvar(new_val): | 
|  |     # Note: need to explicitly define a global variable (of a module) | 
|  |     # as 'global' BEFORE changing its value in a function! | 
|  |     # Otherwise, the value will not be REdefined outside the function | 
|  |     global _myvar | 
|  |     _myvar = new_val | 
|  |  | 
|  | def get_myvar(): | 
|  |     return _myvar | 
|  |  | 
|  | def myfunc(nb_repeat = 10): | 
|  |     print(nb_repeat * _myvar)</code> | 
|  | * [[https://docs.python.org/3/faq/programming.html?highlight=global#how-do-i-share-global-variables-across-modules|Sharing global variables across modules]] | 
|  | ===== Sorting ===== | 
|  |  | 
| * When dealing with **numerical values**, you should use the [[https://numpy.org/doc/stable/reference/routines.sort.html|numpy sorting, searching, and counting routines]]! | * When dealing with **numerical values**, you should use the [[https://numpy.org/doc/stable/reference/routines.sort.html|numpy sorting, searching, and counting routines]]! | 
| ['c', 'd', 'b', 'a']</code> | ['c', 'd', 'b', 'a']</code> | 
|  |  | 
| ==== numpy related stuff ==== |  | 
|  |  | 
| === Finding and counting unique values === | ===== Efficient looping with numpy, map, itertools and list comprehension ===== | 
|  |  | 
|  | <wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce a script's execution time! | 
|  |  | 
|  | * **''numpy'' arrays** should be used when dealing with //numerical data// | 
|  | * **Masked arrays** can be used to deal with //special cases// and remove tests from loops | 
|  |  | 
|  | * The built-in [[https://docs.python.org/3/library/functions.html?highlight=map#map|map]] function (and similar functions like [[https://docs.python.org/3/library/functions.html?highlight=zip#zip|zip]], [[https://docs.python.org/3/library/functions.html?highlight=filter#filter|filter]], ...) can be used to efficiently apply a function (possibly a //simple// [[https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions|lambda]] function) to all the elements of a list | 
|  | * <code>>>> my_ints = [1, 2, 3] | 
|  |  | 
|  | >>> list(map(str, my_ints)) # map() returns an iterator in Python 3 | 
|  | ['1', '2', '3'] | 
|  |  | 
|  | >>> list(map(lambda ii: str(10*ii + 5), my_ints)) | 
|  | ['15', '25', '35']</code> | 
|  |  | 
|  | * The [[https://docs.python.org/3/library/itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping | 
|  | * Example: replacing nested loops with [[https://docs.python.org/3/library/itertools.html#itertools.product|product]] | 
|  | * <code>>>> import itertools as it | 
|  | >>> it.product('AB', '01') | 
|  | <itertools.product object at 0x2b35a7b5f100> | 
|  |  | 
|  | >>> list(it.product('AB', '01')) | 
|  | [('A', '0'), ('A', '1'), ('B', '0'), ('B', '1')] | 
|  |  | 
|  | >>> for c1, c2 in it.product('AB', '01'): | 
|  | ...   print(c1 + c2) | 
|  | ... | 
|  | A0 | 
|  | A1 | 
|  | B0 | 
|  | B1 | 
|  |  | 
|  | >>> for c1, c2 in it.product(['A', 'B'], ['0', '1']): | 
|  | ...   print(c1 + c2) | 
|  | ... | 
|  | A0 | 
|  | A1 | 
|  | B0 | 
|  | B1 | 
|  |  | 
|  | >>> for c1, c2, c3 in it.product('AB', '01', '$!'): | 
|  | ...   print(c1 + c2 + c3, end=', ') | 
|  | ... | 
|  | A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</code> | 
|  |  | 
|  | * The [[https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists | 
|  | * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''map'' function detailed above | 
|  | * <code>>>> my_ints = [1, 2, 3] | 
|  |  | 
|  | >>> [ str(ii) for ii in my_ints ] | 
|  | ['1', '2', '3']</code> | 
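The built-in ''zip'' and ''filter'' functions mentioned in the list above have no example there, so here is a minimal sketch. The variable names and values are hypothetical.

```python
# Hypothetical variable names and units
names = ['tas', 'pr', 'psl']
units = ['K', 'kg m-2 s-1', 'Pa']

# zip: loop over several lists at the same time, without using indices
labels = [f'{name} ({unit})' for name, unit in zip(names, units)]
print(labels)

# filter: keep only the elements for which the function returns True
values = [3, -1, 4, -1, 5]
positive = list(filter(lambda v: v > 0, values))
print(positive)

# The same selection with a list comprehension
positive_lc = [v for v in values if v > 0]
print(positive_lc)
```

Like ''map'', ''zip'' and ''filter'' return iterators in Python 3, hence the ''list()'' call when an actual list is needed.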
|  | ===== numpy related stuff ===== | 
|  |  | 
|  | ==== Using a numpy array to store arbitrary objects ==== | 
|  |  | 
|  | The numpy arrays are usually used to store [[https://numpy.org/doc/stable/reference/arrays.scalars.html|scalars]] of the same type (see also the [[https://numpy.org/doc/stable/reference/arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values. | 
|  |  | 
|  | It is also possible to store **arbitrary** Python objects in an array, rather than using nested lists or dictionaries! | 
|  |  | 
|  | <code>>>> import numpy as np | 
|  | >>> some_array = np.empty((2, 3), dtype=object) | 
|  | >>> some_array | 
|  | array([[None, None, None], | 
|  | [None, None, None]], dtype=object) | 
|  | >>> some_array.shape | 
|  | (2, 3) | 
|  | >>> print(some_array[-1, -1]) | 
|  | None | 
|  | >>> some_array[-1, 0] = filled_contour # e.g. save an existing cartopy filled contour object | 
|  | >>> some_array | 
|  | array([[None, None, None], | 
|  | [<cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>, | 
|  | None, None]], dtype=object)</code> | 
|  |  | 
|  |  | 
|  | ==== Dealing with a variable number of indices ==== | 
|  |  | 
|  | [[https://numpy.org/doc/stable/user/basics.indexing.html#dealing-with-variable-indices|Official reference]] | 
|  |  | 
|  | <code>>>> i10 = np.identity(10) | 
|  | >>> i10 | 
|  | array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], | 
|  | [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.], | 
|  | ... | 
|  | [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]) | 
|  | >>> i10.shape | 
|  | (10, 10) | 
|  |  | 
|  | >>> i10[3:7, 4:6] | 
|  | array([[0., 0.], | 
|  | [1., 0.], | 
|  | [0., 1.], | 
|  | [0., 0.]]) | 
|  |  | 
|  | >>> s0 = slice(3, 7) | 
|  | >>> s1 = slice(4, 6) | 
|  | >>> i10[s0, s1] | 
|  | array([[0., 0.], | 
|  | [1., 0.], | 
|  | [0., 1.], | 
|  | [0., 0.]]) | 
|  |  | 
|  | >>> my_slices = (s0, s1) | 
|  | >>> i10[my_slices] | 
|  | array([[0., 0.], | 
|  | [1., 0.], | 
|  | [0., 1.], | 
|  | [0., 0.]]) | 
|  |  | 
|  | >>> my_fancy_slices = (s0, Ellipsis) | 
|  | >>> i10[my_fancy_slices] | 
|  | array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], | 
|  | [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.], | 
|  | [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], | 
|  | [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]]) | 
|  | >>> i10[my_fancy_slices].shape | 
|  | (4, 10) | 
|  |  | 
|  | >>> # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY | 
|  | >>> # and that you can change the content of the original array by mistake | 
|  | >>> my_view = i10[my_slices] | 
|  | >>> my_view[:, :] = -1 | 
|  | >>> my_view | 
|  | array([[-1., -1.], | 
|  | [-1., -1.], | 
|  | [-1., -1.], | 
|  | [-1., -1.]]) | 
|  | >>> i10 | 
|  | array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.], | 
|  | [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.], | 
|  | [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</code> | 
|  |  | 
|  |  | 
|  | ==== Finding and counting unique values ==== | 
|  |  | 
| Use ''np.unique'', do **not** try to use histogram related functions! | Use ''np.unique'', do **not** try to use histogram related functions! | 
| array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</code> | array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</code> | 
|  |  | 
| === Applying a ufunc over all the elements of an array === |  | 
|  | ==== Applying a ufunc over all the elements of an array ==== | 
|  |  | 
| There are all sorts of //ufuncs// (Universal Functions), and we will just use below ''add'' from the [[https://numpy.org/doc/stable/reference/ufuncs.html#math-operations|math operations]], applied on the arrays defined in [[#finding_and_counting_unique_values|Finding and counting unique values]] | There are all sorts of //ufuncs// (Universal Functions), and we will just use below ''add'' from the [[https://numpy.org/doc/stable/reference/ufuncs.html#math-operations|math operations]], applied on the arrays defined in [[#finding_and_counting_unique_values|Finding and counting unique values]] | 
| >>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum() | >>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum() | 
| (3.0, 4.5, 8.0)</code> | (3.0, 4.5, 8.0)</code> | 
|  |  | 
|  |  | 
|  | ==== Applying a ufunc over specified sections of an array ==== | 
|  |  | 
|  | The [[https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html#numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit python loops, and improve the speed (but not the readability...) of a script. The example below //improves// what has been shown above | 
|  |  | 
|  | <code># Define a list with the boundaries of the intervals we want to apply the 'add' function to | 
|  | # We need to add the beginning index (0), AND remove the last index | 
|  | # (reduceat will automatically go to the end of the input array) | 
|  | >>> nb_unique | 
|  | array([3, 3, 4]) | 
|  | >>> slices_indices = [0] + list(np.add.accumulate(nb_unique)) | 
|  | >>> slices_indices.pop() # Remove last element | 
|  | 10 | 
|  | >>> slices_indices | 
|  | [0, 3, 6] | 
|  |  | 
|  | # Compute the sums over the selected intervals with just one call | 
|  | >>> np.add.reduceat(np.sort(vals), slices_indices) | 
|  | array([3. , 4.5, 8. ])</code> | 
|  |  | 
|  | ==== Exercise your brain with numpy ==== | 
|  |  | 
|  | Have a look at [[https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb|100 numpy exercises]] | 
|  |  | 
|  | ===== matplotlib related stuff ===== | 
|  |  | 
|  | ==== Working with time axes (and ticks) ==== | 
|  |  | 
|  | If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the: | 
|  | * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]] | 
|  | * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]] | 
|  |  | 
|  |  | 
|  | ===== Data representation ===== | 
|  |  | 
|  | A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format// | 
|  |  | 
|  | FIXME Add parts (pages 28 to 37) of this [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|old tutorial]] to this section | 
|  |  | 
|  | ==== Base notions ==== | 
|  |  | 
|  | * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s ...), grouped in bytes! | 
|  | * Some things can be stored exactly (integers, characters, ...) | 
|  | * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximations//** | 
|  |  | 
|  | * 1 byte <=> 8 bits | 
|  | * ''REAL*4'' <=> 4 bytes <=> 32 bits | 
|  | * For easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]] (characters ''0'', ''1'', ..., ''A'', ''B'', ..., ''F'') | 
|  | * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F'' | 
|  | * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'') | 
|  | * ''11111101'' in //base 2// <=> ''1111 1101'' <=> ''FD'' in //hexadecimal// <=> ''253'' (''15 * 16 + 13'') in //decimal// | 
|  |  | 
|  | * Base conversion with Python | 
|  | * <code>>>> hex(13) # Decimal to Hexadecimal conversion | 
|  | '0xd' | 
|  | >>> hex(253) | 
|  | '0xfd' | 
|  | >>> hex(256) | 
|  | '0x100' | 
|  | >>> int('0x100', 16) # Hexadecimal to Decimal conversion | 
|  | 256 | 
|  | >>> int('1111', 2) # Binary to Decimal conversion | 
|  | 15 | 
|  | >>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253 | 
|  | 253 | 
|  | >>> 0o13 # DANGER! Octal literals need the 0o prefix in Python 3 (a bare leading 0, as in 013, meant OCTAL in Python 2 and is now a SyntaxError) | 
|  | 11 | 
|  | >>> int('13', 8) # 1*8 + 3 | 
|  | 11</code> | 
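The conversions above go from a string or a decimal number to hexadecimal or decimal. A minimal sketch of the other direction (decimal to binary), using only the built-in ''bin'' and ''format'' functions. The variable names are only illustrative.

```python
# Decimal -> binary string, with and without the '0b' prefix
value = 253
binary_prefixed = bin(value)            # '0b11111101'
binary_padded = format(value, '08b')    # '11111101', zero-padded to 8 bits
hex_padded = format(value, '02X')       # 'FD', upper-case, 2 hexadecimal digits

# Split a byte into its two groups of 4 bits
high_group = value >> 4       # 15 <=> 'F'
low_group = value & 0x0F      # 13 <=> 'D'

# Integer literals can be written directly in base 2, 8 or 16
same_value = (0b11111101 == 0o375 == 0xFD == 253)

print(binary_prefixed, binary_padded, hex_padded)
print(high_group, low_group, same_value)
```

The ''08b'' and ''02X'' format specifications are convenient when you want a fixed number of digits, e.g. to line up several values.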
|  |  | 
|  | * More technical topics | 
|  | * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit) | 
|  | * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]: the art of ordering bytes | 
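A small sketch of what //endianness// means in practice, using only the standard ''int.to_bytes'' and ''int.from_bytes'' methods. The choice of 4 bytes is an assumption made here to mimic a 4-byte integer.

```python
import sys

value = 1

# Same integer, stored on 4 bytes with the two possible byte orders
big = value.to_bytes(4, byteorder='big')        # most significant byte first
little = value.to_bytes(4, byteorder='little')  # least significant byte first

print(big.hex(' '))     # 00 00 00 01
print(little.hex(' '))  # 01 00 00 00

# Reading the bytes with the wrong byte order gives a different number
wrong = int.from_bytes(little, byteorder='big')
print(wrong)            # 16777216

# Byte order of the machine running the interpreter ('little' or 'big')
print(sys.byteorder)
```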
|  | ==== Numerical values ==== | 
|  |  | 
|  | * Binary data representation of some numbers (only some common types are listed here): | 
|  | * Languages and packages **references** used below: | 
|  | * Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]] | 
|  | * NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]] | 
|  | * Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]] | 
|  | * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]] | 
|  | * Range: | 
|  | * 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647'' | 
|  | * Python: ''numpy.int32'' | 
|  | * NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT'' | 
|  | * Fortran: ''INTEGER*4'' | 
|  | * 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807'' | 
|  | * Python: ''numpy.int64'' | 
|  | * NetCDF: ''int64'', ''NC_INT64'' | 
|  | * Fortran: ''INTEGER*8'' | 
|  | * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers | 
|  | * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//) | 
|  | * Range: | 
|  | * 4-byte float: ''~7 significant digits * 1E±38'' | 
|  | * Python: ''numpy.float32'' | 
|  | * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT'' | 
|  | * Fortran: ''REAL*4'' | 
|  | * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]] | 
|  | * 8-byte float: ''~15 significant digits * 1E±308'' | 
|  | * Python: ''numpy.float64'' | 
|  | * NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE'' | 
|  | * Fortran: ''REAL*8'' | 
|  | * **Special values**: | 
|  | * [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number// | 
|  | * Python: ''numpy.nan'' | 
|  | * Infinity | 
|  | * Python: ''-numpy.inf'' and ''numpy.inf'' | 
|  | * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|NumPy masked arrays]]) rather than ''NaN''s, when you have to deal with missing values! | 
|  | * <wrap hi>The RISKS of working with (the wrong) floats</wrap>: | 
|  | * [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]] | 
|  | * [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]] | 
|  | * [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]] | 
|  | * A rather technical example: we //play// with a numpy 4-byte integer scalar | 
|  | * <code>>>> one_int32 = np.int32(1) | 
|  | >>> one_int32 | 
|  | 1 | 
|  | >>> type(one_int32) | 
|  | <class 'numpy.int32'> | 
|  | >>> one_int32.dtype | 
|  | dtype('int32') | 
|  | >>> one_int32.shape # A numpy SCALAR is an ARRAY WITH NO SHAPE! | 
|  | () | 
|  | >>> one_int32[0] | 
|  | Traceback (most recent call last): | 
|  | File "<stdin>", line 1, in <module> | 
|  | IndexError: invalid index to scalar variable. | 
|  | >>> one_int32[()] # Note how to access the single element, when there is NO SHAPE | 
|  | 1 | 
|  | >>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element | 
|  | 0 | 
|  | >>> one_int32.size | 
|  | 1 | 
|  | >>> one_int32.nbytes # The element requires 4 bytes of storage | 
|  | 4 | 
|  | >>> hex(one_int32) # We can print the hexadecimal representation of INTEGER scalars | 
|  | '0x1' | 
|  | >>> hex(one_int32 * 15) | 
|  | '0xf' | 
|  | >>> hex(one_int32 * 16) | 
|  | '0x10' | 
|  |  | 
|  | # 'Serialize' the data (i.e. change the data to a series of bytes) | 
|  | # Note: on a little-endian machine (e.g. x86), the least significant byte comes first, so the bytes are printed in the reverse order of 'hex(one_int32)' | 
|  | >>> one_int32_serialized = one_int32.tobytes() | 
|  | >>> type(one_int32_serialized) | 
|  | <class 'bytes'> | 
|  | >>> len(one_int32_serialized) | 
|  | 4 | 
|  | >>> one_int32_serialized | 
|  | b'\x01\x00\x00\x00' | 
|  | >>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values | 
|  | '01 00 00 00' | 
|  |  | 
|  | # Use the following in the unlikely case where you need to change the endianness (bytes ordering) | 
|  | >>> one_int32_reversed_endian = one_int32.byteswap() | 
|  | >>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course) | 
|  | 16777216 | 
|  | >>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above | 
|  | '0x1000000' | 
|  | >>> one_int32_reversed_endian.tobytes() | 
|  | b'\x00\x00\x00\x01'</code> | 
|  | * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how the bytes are swapped within each group of 4 bytes, because each ''int32'' value uses 4 bytes | 
|  | * <code>>>> array_example = np.asarray((3, 17), dtype=np.int32) | 
|  | >>> array_example | 
|  | array([ 3, 17], dtype=int32) | 
|  | >>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes | 
|  | ((2,), 1, 2, 8) | 
|  | >>> array_example.tobytes().hex(' ', 4) | 
|  | '03000000 11000000' | 
|  | >>> array_example.byteswap().tobytes().hex(' ', 4) | 
|  | '00000003 00000011' | 
|  | </code> | 
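The integer and float ranges listed above do not have to be memorized: a minimal sketch, assuming ''numpy'' is installed, that queries them with ''np.iinfo'' and ''np.finfo''. Note that the ''precision'' attribute is a conservative count of decimal digits, so it can be lower than the approximate figures usually quoted.

```python
import numpy as np

# Integer types: exact bounds
for int_type in (np.int32, np.int64):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Float types: decimal digits that are always preserved, largest value,
# and spacing between 1.0 and the next representable number (eps)
for float_type in (np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.precision, info.max, info.eps)

# Two's complement: -1 is stored with all the bits set to 1
minus_one_bytes = np.int32(-1).tobytes()
print(minus_one_bytes.hex(' '))   # ff ff ff ff
```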
|  |  | 
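A few hedged illustrations of the float risks and special values mentioned above (round-off, cancellation, ''NaN'' vs masked arrays). This is only a sketch, assuming ''numpy'' is installed; the values were picked to make the effects visible.

```python
import math
import numpy as np

# Round-off: 0.1, 0.2 and 0.3 have no exact binary representation
sum_is_exact = (0.1 + 0.2 == 0.3)
sum_is_close = math.isclose(0.1 + 0.2, 0.3)   # compare with a tolerance instead
print(sum_is_exact, sum_is_close)             # False True

# A 4-byte float stores a coarser approximation of 0.1 than an 8-byte float
print(float(np.float32(0.1)))

# Absorption: 1 is smaller than the spacing between float32 values near 1e8
big = np.float32(1e8)
absorbed = bool(big + np.float32(1) == big)
print(absorbed)                               # True

# Cancellation: the small term is lost before the subtraction takes place
tiny = 1e-16
lost = (1.0 + tiny) - 1.0
print(lost)                                   # 0.0

# NaN propagates in computations, and is not even equal to itself
values = np.array([1.0, np.nan, 3.0])
print(values.mean())                          # nan
print(np.nan == np.nan)                       # False

# Masked array: the missing value is ignored
masked_values = np.ma.masked_invalid(values)
print(masked_values.mean())                   # 2.0
```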
|  | * Manipulating binary data with [[https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview|bytes, bytearray, memoryview]] | 
|  |  | 
|  | * Array addressing | 
|  | * [[https://www.geeksforgeeks.org/calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]] | 
|  | * In other words: //using indices to go from 1-D to n-dimensional data// | 
|  | * The [[https://en.wikipedia.org/wiki/Array_(data_structure)|array]] structure | 
|  | * python/C vs Fortran... | 
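A minimal sketch of the //row-major// (C, Python) vs //column-major// (Fortran) addressing mentioned above, assuming ''numpy'' is installed. The 2x3 shape and the ''(0, 2)'' indices are arbitrary choices made for the illustration.

```python
import numpy as np

shape = (2, 3)                        # 2 rows, 3 columns
flat = np.arange(6, dtype=np.int32)   # 1-D data: 0, 1, 2, 3, 4, 5

# Same 1-D data viewed as 2-D, with the two possible orderings
c_array = flat.reshape(shape, order='C')   # row-major: [[0, 1, 2], [3, 4, 5]]
f_array = flat.reshape(shape, order='F')   # column-major: [[0, 2, 4], [1, 3, 5]]

# Offset of element (i, j) in the 1-D data, computed by hand
i, j = 0, 2
n_rows, n_cols = shape
offset_c = i * n_cols + j   # row-major:    2
offset_f = j * n_rows + i   # column-major: 4

# numpy can compute the same offsets
print(np.ravel_multi_index((i, j), shape, order='C'), offset_c)
print(np.ravel_multi_index((i, j), shape, order='F'), offset_f)

# The element found at (i, j) depends on the ordering
print(c_array[i, j], f_array[i, j])

# Number of bytes to skip to move by one index along each dimension
print(c_array.strides, f_array.strides)
```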
|  |  | 
|  | * disk and ram usage: how to check the usage (available ram and disk), best practice on multi-user systems (how much allowed?) | 
|  | * ''du'', ''df'', ''cat /proc/meminfo'', ''top'' | 
|  |  | 
|  | * understanding and reverse-engineering //binary// format | 
|  | * ''od'', ''strings'' | 
|  |  | 
|  | * binary vs text format: ascii, utf, raw | 
|  | * text related functions in python: ''str'', ''int'', ''float'', ''ord'', ... | 
|  | * lists conversion with ''map'' and ''join'' | 
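A short sketch of the text related functions listed above, and of list conversions with ''map'' and ''join''. The content of the ''line'' string is made up for the example.

```python
# Character <=> integer code
print(ord('A'), chr(65))                 # 65 A

# Text => numbers: split a line of text and convert each field
line = '3 17 253'
numbers = list(map(int, line.split()))
print(numbers)                           # [3, 17, 253]

# Numbers => text: convert each value to a string, then join the strings
hex_line = ' '.join(map(hex, numbers))
print(hex_line)                          # 0x3 0x11 0xfd

csv_line = ','.join(map(str, numbers))
print(csv_line)                          # 3,17,253

# A string holding a real number
float_value = float('1.5e3')
print(float_value)                       # 1500.0
```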
|  |  | 
|  | * Misc : ''md5sum'' | 
|  |  | 
|  | ==== Strings ==== | 
|  |  | 
|  | * Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ... | 
|  |  | 
|  | * Getting the binary representation of a string | 
|  | * <code>>>> test_string = 'A B 0 1 à µ' | 
|  | >>> type(test_string) | 
|  | <class 'str'> | 
|  | >>> len(test_string) | 
|  | 11 | 
|  | >>> test_string_bin = test_string.encode('utf-8') | 
|  | >>> test_string_bin | 
|  | b'A B 0 1 \xc3\xa0 \xc2\xb5' | 
|  | >>> type(test_string_bin) | 
|  | <class 'bytes'> | 
|  | >>> len(test_string_bin) | 
|  | 13 | 
|  | >>> test_string_bin.hex('-') | 
|  | '41-20-42-20-30-20-31-20-c3-a0-20-c2-b5' | 
|  | </code> | 
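The number of bytes depends on the encoding, and the bytes have to be decoded with the encoding that was used to write them. A minimal sketch reusing the same test string; the choice of ''latin-1'' as the second encoding is only an assumption made for the comparison.

```python
test_string = 'A B 0 1 à µ'

utf8_bytes = test_string.encode('utf-8')       # 'à' and 'µ' need 2 bytes each
latin1_bytes = test_string.encode('latin-1')   # 1 byte per character

print(len(test_string), len(utf8_bytes), len(latin1_bytes))   # 11 13 11

# Decoding with the same encoding gives the original string back
round_trip_ok = (utf8_bytes.decode('utf-8') == test_string)
print(round_trip_ok)                           # True

# Decoding with the wrong encoding does not fail here, but the text is garbled
garbled = utf8_bytes.decode('latin-1')
print(garbled)

# Plain ASCII cannot represent the accented characters at all
try:
    test_string.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                                # False
```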
|  |  | 
|  |  | 
| /* | /* | 
| ==== Tip template ==== | ===== Tip template ===== | 
|  |  | 
| <code>Some code</code> | <code>Some code</code> |