other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2021/08/25 12:00]
jypeter [Working with paths and filenames] Added more examples
other:python:misc_by_jyp [2023/12/08 15:51] (current)
jypeter [Efficient looping with numpy, map and itertools] Added list comprehension
Line 5: Line 5:
 </​WRAP>​ </​WRAP>​
  
-==== Reading/setting environment variables ====
  
 +===== Reading/setting environment variables =====
  
 <​code>>>>​ os.environ['​TMPDIR'​] <​code>>>>​ os.environ['​TMPDIR'​]
Line 17: Line 17:
 </​code>​ </​code>​
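The snippet below is a minimal sketch of the usual read/set/remove operations on ''os.environ''. The variable name ''MY_DEMO_VAR'' and the ''/tmp'' fallback are assumptions made for the illustration only.

```python
import os

# Read a variable with a fallback value, instead of risking a KeyError
tmp_dir = os.environ.get('TMPDIR', '/tmp')

# Set a variable (the value must be a string)
# Processes started later by this script will inherit it
os.environ['MY_DEMO_VAR'] = '42'  # hypothetical variable name

# Remove the variable again, without failing if it does not exist
removed = os.environ.pop('MY_DEMO_VAR', None)
```

Changes made this way only affect the running script and the processes it starts, not the shell that launched it.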
  
-==== Generating (aka raising) an error ====+ 
 +===== Generating (aka raising) an error =====
  
 This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors
Line 25: Line 26:
  
  
-==== Stopping a script ====+===== Stopping a script ​=====
  
 A user can use ''​CTRL-C''​ or ''​kill''​ to stop a script, or ''​CTRL-Z''​ to suspend it temporarily (use ''​fg''​ to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error A user can use ''​CTRL-C''​ or ''​kill''​ to stop a script, or ''​CTRL-Z''​ to suspend it temporarily (use ''​fg''​ to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error
  
 <​code>​sys.exit('​Some optional message about why we are stopping'​)</​code>​ <​code>​sys.exit('​Some optional message about why we are stopping'​)</​code>​
- +===== Checking if a file/​directory is writable by the current user =====
- +
-==== Checking if a file/​directory is writable by the current user ====+
  
 <​code>>>>​ os.access('/',​ os.W_OK) <​code>>>>​ os.access('/',​ os.W_OK)
Line 38: Line 37:
 >>>​ os.access('/​home/​jypmce/​.bashrc',​ os.W_OK) >>>​ os.access('/​home/​jypmce/​.bashrc',​ os.W_OK)
 True</​code>​ True</​code>​
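A possible way to wrap the check above in a small helper is sketched below. ''is_writable'' is a hypothetical name, and the temporary directory is only there to make the example self-contained.

```python
import os
import tempfile


def is_writable(path):
    """Return True if 'path' exists and the current user may write to it."""
    return os.path.exists(path) and os.access(path, os.W_OK)


with tempfile.TemporaryDirectory() as tmp_dir:
    # A directory we have just created should be writable
    writable_dir = is_writable(tmp_dir)

# The temporary directory has been removed when leaving the 'with' block
missing_path = is_writable(os.path.join(tmp_dir, 'does_not_exist'))
```

Note that the answer can become outdated between the check and the actual write, so the code that opens the file should still be ready to handle an error.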
 +
 +
 +===== Playing with strings =====
 +
 +
 +==== Splitting (complex) strings ====
 +
 +It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings
 +
 +<​code>>>>​ str_with_blanks = '​one ​   two\t3\t\tFOUR'​
 +>>>​ str_with_blanks.split()
 +['​one',​ '​two',​ '​3',​ '​FOUR'​]
 +
 +>>>​ str_with_simple_delimiters = '​1,​2,​3.14, ​ 4'
 +>>>​ str_with_simple_delimiters.split(','​)
 +['​1',​ '​2',​ '​3.14',​ ' ​ 4']
 +
 +>>>​ complex_string='​-o 1 --long "A string with accented chars: é è à ç"'​
 +>>>​ complex_string.split()
 +['​-o',​ '​1',​ '​--long',​ '"​A',​ '​string',​ '​with',​ '​accented',​ '​chars:',​ '​\xc3\xa9',​ '​\xc3\xa8',​ '​\xc3\xa0',​ '​\xc3\xa7"'​]
 +
 +>>>​ import shlex
 +>>>​ shlex.split(complex_string)
 +['​-o',​ '​1',​ '​--long',​ 'A string with accented chars: \xc3\xa9 \xc3\xa8 \xc3\xa0 \xc3\xa7'​]</​code>​
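The escaped bytes (''\xc3\xa9'', ...) in the output above look like what Python 2 prints. The sketch below shows the same split with Python 3, where the accented characters are kept as-is. ''shlex.join'' requires Python 3.8 or later.

```python
import shlex

complex_string = '-o 1 --long "A string with accented chars: é è à ç"'

# The quoted part is kept as ONE element, and the quotes are removed
args = shlex.split(complex_string)
print(args)

# shlex.join does the reverse operation and quotes what needs quoting
rebuilt = shlex.join(args)
print(rebuilt)
```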
 +
  
 ==== Working with paths and filenames ==== ==== Working with paths and filenames ====
  
-If you are in a hurry, you can just use string functions to work with path and file names. ​But you will need some specific functions to check if a file exists, and similar operations. ​All these are available in 2 libraries that have similar functions. Both of these libraries ​can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers+If you are in a hurry, you can just use string functions to work with paths and file names. 
 + 
 + 
 +You will need some specific ​objects and functions to check if a file exists, and similar operations. ​Check the libraries ​listed below, ​that can automatically ​deal with Unix-type paths on Linux and MacOS computers, and Windows-type paths on Windows computers
  
-  * [[https://​docs.python.org/​3/​library/​os.path.html|os.path]] //Common ​pathname manipulations//​+  * [[https://​docs.python.org/​3/​library/​os.path.html|os.path]]//common ​pathname manipulations//​
     * Available since... a long time! Use this if you want to avoid backward compatibility problems     * Available since... a long time! Use this if you want to avoid backward compatibility problems
     * Some functions are directly in [[https://​docs.python.org/​3/​library/​os.html|os]] //​Miscellaneous operating system interfaces//​\\ e.g. [[https://​docs.python.org/​3/​library/​os.html#​os.remove|os.remove]] and [[https://​docs.python.org/​3/​library/​os.html#​os.rmdir|os.rmdir]]     * Some functions are directly in [[https://​docs.python.org/​3/​library/​os.html|os]] //​Miscellaneous operating system interfaces//​\\ e.g. [[https://​docs.python.org/​3/​library/​os.html#​os.remove|os.remove]] and [[https://​docs.python.org/​3/​library/​os.html#​os.rmdir|os.rmdir]]
-  * [[https://​docs.python.org/​3/​library/​pathlib.html|pathlib]] //Object-oriented filesystem paths//+  * [[https://​docs.python.org/​3/​library/​pathlib.html|pathlib]]: a **more recent** ​//object-oriented// way to deal with //filesystem paths//
     * Available since Python version 3.4     * Available since Python version 3.4
     * [[https://​docs.python.org/​3/​library/​pathlib.html#​correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]     * [[https://​docs.python.org/​3/​library/​pathlib.html#​correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]
-  * [[https://​docs.python.org/​3/​library/​shutil.html|High-level file operations]]+  * [[https://​docs.python.org/​3/​library/​shutil.html|shutil]]: ​High-level file operations, e.g copy/move a file or directory tree
  
  
-=== Example: getting the full path of the Python used ===+=== Example: getting the full path of the Python ​executable ​used ===
  
-<​code>>>>​ import shutil +Note: the actual python may be different from the default python! 
->>> ​my_python = shutil.which('​python'​) + 
->>> ​my_python+<​code>​$ which python 
 +/​usr/​bin/​python 
 + 
 +$ /​home/​share/​unix_files/​cdat/​miniconda3_21-02/​envs/​cdatm_py3/​bin/​python 
 +>>>​ import ​sys, shutil 
 +>>>​ shutil.which('​python'​) 
 +'/​usr/​bin/​python'​ 
 +>>> ​sys.executable
 '/​home/​share/​unix_files/​cdat/​miniconda3_21-02/​envs/​cdatm_py3/​bin/​python'</​code>​ '/​home/​share/​unix_files/​cdat/​miniconda3_21-02/​envs/​cdatm_py3/​bin/​python'</​code>​
  
Line 73: Line 107:
 </​code>​ </​code>​
  
 +
 +=== Example: system independent paths with pathlib ===
 +
 +Note: the following example was generated on a Linux server and uses a <wrap em>/</​wrap>​ character as a path separator
 +
 +<code>>>> from pathlib import Path
 +>>> my_home = Path.home()
 +>>>​ my_home
 +PosixPath('/​home/​users/​my_login'​)
 +>>>​ my_conf = my_home / '​.config'​ / '​evince'​
 +>>>​ my_conf
 +PosixPath('/​home/​users/​my_login/​.config/​evince'​)
 +>>>​ my_conf.is_dir()
 +True
 +>>>​ my_conf.is_file()
 +False
 +>>>​ list(my_conf.glob('​*'​))
 +[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
 +>>>​ [ ff.name for ff in my_conf.glob('​*'​) ]
 +['​evince_toolbar.xml',​ '​accels'​]
 +</​code>​
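The sketch below uses the //pure// path classes of ''pathlib'' to show how the same code produces Unix-type or Windows-type paths. Pure paths never access the disk, so this runs on any computer. All the directory and file names are made up for the example.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure paths only manipulate names, they never touch the filesystem
posix_conf = PurePosixPath('/home/users/my_login') / '.config' / 'evince'
win_conf = PureWindowsPath('C:/Users/my_login') / 'AppData' / 'evince'

print(posix_conf)  # /home/users/my_login/.config/evince
print(win_conf)    # C:\Users\my_login\AppData\evince

# Path() automatically uses the flavour of the computer running the script
data_file = Path('results') / 'run_01' / 'tas.nc'  # hypothetical file
plot_name = data_file.with_suffix('.png').name
print(data_file.stem, data_file.suffix, plot_name)  # tas .nc tas.png
```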
  
 === Example: getting the size(s) of all the files in a directory === === Example: getting the size(s) of all the files in a directory ===
Line 95: Line 149:
 >>>​ sum(files_sizes) >>>​ sum(files_sizes)
 64792</​code>​ 64792</​code>​
-==== Using command-line arguments ==== 
  
-=== The extremely easy but non-flexible way: sys.argv ===+==== Generating file names ==== 
 + 
 +=== Name depending on the current date/time === 
 + 
 +<​code>>>>​ import time 
 +>>>​ plot_version = time.strftime('​%Y%m%d_%H%M'​) 
 +>>>​ f_name = '​test_%s.nc'​ % (plot_version,​) 
 +>>>​ f_name 
 +'​test_20210827_1334.nc'​ 
 +</​code>​ 
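The same kind of name can also be built with a ''datetime'' object and an f-string, which makes it easy to use a date other than //now//. This is only a sketch, and ''dated_name'' is a hypothetical helper.

```python
from datetime import datetime


def dated_name(prefix, when=None, ext='.nc'):
    """Build a name like 'test_20210827_1334.nc' (hypothetical helper)."""
    if when is None:
        when = datetime.now()
    # The part after ':' is a strftime format applied to 'when'
    return f'{prefix}_{when:%Y%m%d_%H%M}{ext}'


f_name = dated_name('test', datetime(2021, 8, 27, 13, 34))
print(f_name)  # test_20210827_1334.nc
```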
 + 
 +=== Temporary file === 
 + 
 +<​code>>>>​ import tempfile, os 
 +>>>​ f_tmp = tempfile.NamedTemporaryFile(mode='​w',​ suffix='​.nc',​ delete=False) 
 +>>>​ f_tmp 
 +<​tempfile._TemporaryFileWrapper object at 0x2b5614743820>​ 
 +>>>​ f_tmp.name 
 +'/​tmp/​tmpi6uk9hre.nc'​ 
 +>>>​ f_tmp.close() 
 +>>>​ os.remove(f_tmp.name)</​code>​ 
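When several temporary files are needed, a temporary //directory// used as a context manager may be more convenient, because everything is removed automatically. A minimal sketch, with a made-up file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory(prefix='demo_') as tmp_dir:
    tmp_file = Path(tmp_dir) / 'scratch.txt'  # hypothetical file name
    tmp_file.write_text('intermediate results\n')
    content = tmp_file.read_text()
    existed_inside = tmp_file.exists()

# The directory and its content are removed when leaving the 'with' block
exists_after = Path(tmp_dir).exists()
```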
 + 
 + 
 +===== Using command-line arguments ===== 
 + 
 +==== The extremely easy but non-flexible way: sys.argv ​====
  
 The name of a script, the number of arguments (including the name of the script), and the arguments (as strings) can be accessed through the ''sys.argv'' list of strings The name of a script, the number of arguments (including the name of the script), and the arguments (as strings) can be accessed through the ''sys.argv'' list of strings
Line 119: Line 197:
 2 tas_tes.nc</​code>​ 2 tas_tes.nc</​code>​
  
-=== The C-style way: getopt ===+ 
 +==== The C-style way: getopt ​====
  
 Use [[https://​docs.python.org/​3/​library/​getopt.html|getopt]] (//C-style parser for command line options//) Use [[https://​docs.python.org/​3/​library/​getopt.html|getopt]] (//C-style parser for command line options//)
  
-=== The deprecated Python way: optparse ===+ 
 +==== The deprecated Python way: optparse ​====
  
 [[https://​docs.python.org/​3/​library/​optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://​docs.python.org/​3/​library/​argparse.html#​upgrading-optparse-code|Upgrading optparse code]] for converting from ''​optparse''​ to ''​argparse''​) [[https://​docs.python.org/​3/​library/​optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://​docs.python.org/​3/​library/​argparse.html#​upgrading-optparse-code|Upgrading optparse code]] for converting from ''​optparse''​ to ''​argparse''​)
  
-=== The current Python way: argparse ===+ 
 +==== The current Python way: argparse ​====
  
 [[https://​docs.python.org/​3/​library/​argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//​) is available since Python version 3.2 [[https://​docs.python.org/​3/​library/​argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//​) is available since Python version 3.2
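A minimal ''argparse'' sketch is given below. The option names (''--nb-steps'', ''--verbose'') and the ''in_file'' argument are assumptions chosen for the illustration, not something used elsewhere on this page.

```python
import argparse


def build_parser():
    """Create the parser (all the option names are made up for this demo)."""
    parser = argparse.ArgumentParser(description='Demo of argparse')
    parser.add_argument('in_file', help='name of the input file')
    parser.add_argument('-n', '--nb-steps', type=int, default=1,
                        help='number of steps (default: %(default)s)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print more information')
    return parser


# In a real script, call parse_args() without arguments to use sys.argv
# An explicit list is used here to make the example reproducible
args = build_parser().parse_args(['-n', '2', 'tas_tes.nc'])
print(args.in_file, args.nb_steps, args.verbose)  # tas_tes.nc 2 False
```

The ''-h'' / ''--help'' option is generated automatically from the ''help'' strings.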
  
-==== Using ordered dictionaries ====+ 
 +===== Using ordered dictionaries ​=====
  
 **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (CPython **3.6** already preserved it, but only as an implementation detail) **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (CPython **3.6** already preserved it, but only as an implementation detail)
Line 137: Line 219:
 Check the [[https://​docs.python.org/​3/​library/​collections.html#​collections.OrderedDict|OrderedDict class]] (''​from collections import OrderedDict''​) and the [[https://​realpython.com/​python-ordereddict/​|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial Check the [[https://​docs.python.org/​3/​library/​collections.html#​collections.OrderedDict|OrderedDict class]] (''​from collections import OrderedDict''​) and the [[https://​realpython.com/​python-ordereddict/​|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial
  
-==== Printing a readable version of long lists or dictionaries ====+ 
 +===== Using sets ===== 
 + 
 +[[https://​docs.python.org/​3/​tutorial/​datastructures.html#​sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //​something//​ and you can easily determine the **intersection**,​ **union** (and other similar operations) of sets. 
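The operations mentioned above can be sketched as follows. The lists of model names are made-up values, and ''sorted'' is only used to get a reproducible display order, since sets are not ordered.

```python
models_a = ['IPSL', 'CESM2', 'AWI', 'IPSL']  # made-up values, with a duplicate
models_b = ['CESM2', 'MIROC', 'IPSL']

set_a, set_b = set(models_a), set(models_b)

unique_a = sorted(set_a)        # ['AWI', 'CESM2', 'IPSL']
in_both = sorted(set_a & set_b)  # intersection: ['CESM2', 'IPSL']
in_any = sorted(set_a | set_b)   # union: ['AWI', 'CESM2', 'IPSL', 'MIROC']
only_a = sorted(set_a - set_b)   # difference: ['AWI']
```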
 + 
 + 
 +===== Printing a readable version of long lists or dictionaries ​=====
  
 The [[https://​docs.python.org/​3/​library/​pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries,​ ...). It will wrap long lines in a meaningful way The [[https://​docs.python.org/​3/​library/​pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries,​ ...). It will wrap long lines in a meaningful way
  
-<​code>>>> ​from collections ​import ​OrderedDict+<​code>>>>​ import ​pprint
  
->>> test_dic = OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}), ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}), ('IPSL-CM6A-LR_IPSL', {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}})])+>>> test_dic = {'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
  
 >>> print(test_dic) >>> print(test_dic)
-OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}), ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}), ('IPSL-CM6A-LR_IPSL', {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}})])+{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
  
 >>> pprint.pprint(test_dic) >>> pprint.pprint(test_dic)
-OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}),+{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}},
-             ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}),+ 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}},
-             ('IPSL-CM6A-LR_IPSL',+ 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'},
-              {'r1i1p1f1': {'grid': 'gr'}, +                       'r1i1p1f2': {'grid': 'gr'},
-               'r1i1p1f2': {'grid': 'gr'}, +                       'r1i1p1f3': {'grid': 'gr'},
-               'r1i1p1f3': {'grid': 'gr'}, +                       'r1i1p1f4': {'grid': 'gr'}}}
-               'r1i1p1f4': {'grid': 'gr'}})])+
 +>>>​ dir(test_dic) 
 +['​__class__',​ '​__contains__',​ '​__delattr__',​ [... lots of unreadable stuff removed...'​setdefault',​ '​update',​ '​values'​] 
 + 
 +>>> pprint.pprint(dir(test_dic))
 +['​__class__',​ 
 + '​__contains__',​ 
 + 
 +[... lots of lines removed in this example ] 
 + 
 + '​setdefault',​ 
 + '​update',​ 
 + '​values'​] 
 </​code>​ </​code>​
 +
 +
 +===== Storing objects and data in a file (shelve and friends) =====
 +
 +The built-in [[other:​python:​jyp_steps#​the_shelve_package|shelve]] module can be **easily** used for storing temporary/​intermediate data
 +
 +More options:
 +  * Some [[other:​python:​jyp_steps#​data_file_formats|non-NetCDF]] file formats
 +  * Working with [[other:​python:​jyp_steps#​netcdf_filesusing_cdms2_xarray_and_netcdf4|NetCDF]] files
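A minimal ''shelve'' sketch is given below, assuming the data only has to be stored between two steps of a script. The keys and values are made up, and a temporary directory is used so that the example cleans up after itself.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_name = os.path.join(tmp_dir, 'demo_shelf')

    # Store any picklable Python object, using dictionary-like syntax
    with shelve.open(shelf_name) as db:
        db['grid'] = {'nlat': 96, 'nlon': 144}  # made-up values
        db['models'] = ['IPSL', 'CESM2']

    # Open the shelf again later (possibly from another script)
    with shelve.open(shelf_name) as db:
        stored_keys = sorted(db.keys())
        grid = db['grid']
```

The files written by ''shelve'' depend on the available ''dbm'' backend, so they are better suited to temporary storage than to exchanging data between computers.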
 +
 +
 +===== Using a configuration file =====
 +
 +The built-in [[https://​docs.python.org/​3/​library/​configparser.html|configparser]] module can be easily used for reading (**and** writing!) text configuration files.
 +
 +Note: a configuration file is also a way to easily store and exchange text data !
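The sketch below writes a configuration and reads it back. An in-memory text buffer replaces a real file to keep the example self-contained, and the ''plot'' section and its keys are assumptions made for the illustration.

```python
import configparser
import io

# Create a configuration with one section (values are stored as strings)
config = configparser.ConfigParser()
config['plot'] = {'title': 'Demo', 'nb_levels': '10'}

# Write it to a text buffer (use open('some_file.ini', 'w') for a real file)
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()
print(ini_text)

# Read the text back, and convert a value to an integer when getting it
config_in = configparser.ConfigParser()
config_in.read_string(ini_text)
nb_levels = config_in.getint('plot', 'nb_levels')
title = config_in['plot']['title']
```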
 +
 +
 +===== Working with global variables =====
 +
 +There is a good chance you don't actually want/need a //global// variable. Be sure to use the ''​global''​ statement correctly if you want to avoid side-effects...
 +
 +  * [[https://​docs.python.org/​3/​faq/​programming.html?​highlight=global#​why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value|Using (and changing) a global variable inside a script or module]]
 +    * Simple module example\\ <​code>​_myvar = 10
 +
 +def set_myvar(new_val):​
 +    # Note: need to explicitly define a global variable (of a module)
 +    # as '​global'​ BEFORE changing its value in a function!
 +    # Otherwise, the value will not be REdefined outside the function
 +    global _myvar
 +    _myvar = new_val
 +
 +def get_myvar():​
 +    return _myvar
 +
 +def myfunc(nb_repeat = 10):
 +    print(nb_repeat * _myvar)</​code>​
 +  * [[https://​docs.python.org/​3/​faq/​programming.html?​highlight=global#​how-do-i-share-global-variables-across-modules|Sharing global variables across modules]]
 +===== Sorting =====
 +
 +  * When dealing with **numerical values**, you should use the [[https://​numpy.org/​doc/​stable/​reference/​routines.sort.html|numpy sorting, searching, and counting routines]]!
 +  * [[https://​docs.python.org/​3/​howto/​sorting.html|Sorting HOW TO]]
 +  * Example: sorting the keys and the values of a dictionary, and then using the ''​key''​ parameter to sort the keys of a dictionary according to the value associated with the key
 +    * If we provide a ''​key''​ function, the ''​sort''​ function will sort the elements by the values returned by the function, instead of sorting by the initial values. The function used for generating the key below is very simple and we can use a //lambda// (i.e //in place//) function
 +    * <​code>>>>​ demo_dic = {'​a':​10,​ '​b':​5,​ '​c':​-1,​ '​d':​0}
 +
 +>>>​ sorted(demo_dic.keys())
 +['​a',​ '​b',​ '​c',​ '​d'​]
 +
 +>>>​ sorted(demo_dic.values())
 +[-1, 0, 5, 10]
 +
 +>>>​ sorted(demo_dic.keys(),​ key=lambda key_name:​demo_dic[key_name])
 +['​c',​ '​d',​ '​b',​ '​a'​]</​code>​
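As a complement to the example above, the sketch below sorts the ''(key, value)'' pairs directly, and uses ''reverse=True'' to get the keys with the largest values first.

```python
demo_dic = {'a': 10, 'b': 5, 'c': -1, 'd': 0}

# Sort the (key, value) pairs according to the value
by_value = sorted(demo_dic.items(), key=lambda item: item[1])
print(by_value)  # [('c', -1), ('d', 0), ('b', 5), ('a', 10)]

# demo_dic.get can replace the lambda function when sorting the keys
largest_first = sorted(demo_dic, key=demo_dic.get, reverse=True)
print(largest_first)  # ['a', 'b', 'd', 'c']
```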
 +
 +
 +===== Efficient looping with numpy, map, itertools and list comprehension =====
 +
 +<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce a script's execution time!
 +
 +  * **''​numpy''​ arrays** should be used when dealing with //numerical data//
 +    * **Masked arrays** can be used to deal with //special cases// and remove tests from loops
 +
 +  * The built-in [[https://​docs.python.org/​3/​library/​functions.html?​highlight=map#​map|map]] function (and similar functions like [[https://​docs.python.org/​3/​library/​functions.html?​highlight=zip#​zip|zip]],​ [[https://​docs.python.org/​3/​library/​functions.html?​highlight=filter#​filter|filter]],​ ...) can be used to efficiently apply a function (possibly a //simple// [[https://​docs.python.org/​3/​tutorial/​controlflow.html#​lambda-expressions|lambda]] function) to all the elements of a list
 +    * <​code>>>>​ my_ints = [1, 2, 3]
 +
 +>>> list(map(str, my_ints)) # In Python 3, map() returns an iterator: use list() to see the values
 +['1', '2', '3']
 +
 +>>> list(map(lambda ii: str(10*ii + 5), my_ints))
 +['​15',​ '​25',​ '​35'​]</​code>​
 +
 +  * The [[https://​docs.python.org/​3/​library/​itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping
 +    * Example: replacing nested loops with [[https://​docs.python.org/​3/​library/​itertools.html#​itertools.product|product]]
 +      * <code>>>> import itertools as it
 +>>> it.product('AB', '01')
 +<​itertools.product object at 0x2b35a7b5f100>​
 +
 +>>>​ list(it.product('​AB',​ '​01'​))
 +[('​A',​ '​0'​),​ ('​A',​ '​1'​),​ ('​B',​ '​0'​),​ ('​B',​ '​1'​)]
 +
 +>>>​ for c1, c2 in it.product('​AB',​ '​01'​):​
 +...   ​print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>>​ for c1, c2 in it.product(['​A',​ '​B'​],​ ['​0',​ '​1'​]):​
 +...   ​print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>>​ for c1, c2, c3 in it.product('​AB',​ '​01',​ '​$!'​):​
 +...   ​print(c1 + c2 + c3, end=', ')
 +...
 +A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</​code>​
 +
 +  * The [[https://​docs.python.org/​3/​tutorial/​datastructures.html?​highlight=comprehension#​list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists
 +    * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''​map''​ function detailed above
 +      * <​code>>>>​ my_ints = [1, 2, 3]
 +
 +>>>​ [ str(ii) for ii in my_ints ]
 +['​1',​ '​2',​ '​3'​]</​code>​
 +===== numpy related stuff =====
 +
 +==== Using a numpy array to store arbitrary objects ====
 +
 +The numpy arrays are usually used to store [[https://​numpy.org/​doc/​stable/​reference/​arrays.scalars.html|scalars]] of the same type (see also the [[https://​numpy.org/​doc/​stable/​reference/​arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values.
 +
 +It is also possible to store **arbitrary** Python objects in an array, rather than using nested lists or dictionaries!
 +
 +<code>>>> import numpy as np
 +>>> some_array = np.empty((2, 3), dtype=object)
 +>>>​ some_array
 +array([[None,​ None, None],
 +       ​[None,​ None, None]], dtype=object)
 +>>>​ some_array.shape
 +(2, 3)
 +>>>​ print(some_array[-1,​ -1])
 +None
 +>>>​ some_array[-1,​ 0] = filled_contour # e.g. save an existing cartopy filled contour object
 +>>>​ some_array
 +array([[None,​ None, None],
 +       ​[<​cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>,​
 +        None, None]], dtype=object)</​code>​
 +
 +        ​
 +==== Dealing with a variable number of indices ====
 +
 +[[https://​numpy.org/​doc/​stable/​user/​basics.indexing.html#​dealing-with-variable-indices|Official reference]]
 +
 +<​code>>>>​ i10 = np.identity(10)
 +>>>​ i10
 +array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 +       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
 +...
 +       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
 +>>>​ i10.shape
 +(10, 10)
 +
 +>>>​ i10[3:7, 4:6]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ s0 = slice(3, 7)
 +>>>​ s1 = slice(4, 6)
 +>>>​ i10[s0, s1]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ my_slices = (s0, s1)
 +>>>​ i10[my_slices]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ my_fancy_slices = (s0, Ellipsis)
 +>>>​ i10[my_fancy_slices]
 +array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
 +>>>​ i10[my_fancy_slices].shape
 +(4, 10)
 +
 +>>>​ # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY
 +>>>​ # and that you can change the content of the original array by mistake
 +>>>​ my_view = i10[my_slices]
 +>>>​ my_view[:, :] = -1
 +>>>​ my_view
 +array([[-1.,​ -1.],
 +       [-1., -1.],
 +       [-1., -1.],
 +       [-1., -1.]])
 +>>>​ i10
 +array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</​code>​
 +
 +
 +==== Finding and counting unique values ====
 +
 +Use ''​np.unique'',​ do **not** try to use histogram related functions!
 +
 +<code>>>> vals = np.random.randint(2, 5, (10,)) * 0.5 # Get 10 discrete float values
 +>>>​ vals
 +array([1. , 2. , 1. , 2. , 2. , 1.5, 1. , 1.5, 2. , 1.5])
 +
 +>>>​ np.unique(vals)
 +array([1. , 1.5, 2. ])
 +>>>​ unique_vals,​ nb_unique = np.unique(vals,​ return_counts=True)
 +>>>​ unique_vals
 +array([1. , 1.5, 2. ])
 +>>>​ nb_unique
 +array([3, 3, 4])
 +
 +>>>​ sorted_vals = np.sort(vals) # Sorted copy, in order to check the result
 +>>>​ sorted_vals
 +array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</​code>​
 +
 +
 +==== Applying a ufunc over all the elements of an array ====
 +
 +There are all sorts of //ufuncs// (Universal Functions), and we will just use below ''​add''​ from the [[https://​numpy.org/​doc/​stable/​reference/​ufuncs.html#​math-operations|math operations]],​ applied on the arrays defined in [[#​finding_and_counting_unique_values|Finding and counting unique values]]
 +
 +<​code>#​ Get the sum of all the elements of '​vals'​
 +>>>​ np.add.reduce(vals)
 +15.5
 +>>>​ np.add.reduce(sorted_vals)
 +15.5
 +>>>​ vals.sum() # The usual and easy way to do it
 +15.5
 +
 +# Compute the sum of the elements of '​nb_unique'​
 +# AND keep (accumulate) the intermediate results
 +>>>​ nb_unique
 +array([3, 3, 4])
 +>>>​ np.add.accumulate(nb_unique)
 +array([ 3,  6, 10])
 +
 +# The accumulated values can be used as indices to separate the different groups of sorted values!
 +>>>​ sorted_vals
 +array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
 +>>>​ sorted_vals[0:​3]
 +array([1., 1., 1.])
 +>>>​ sorted_vals[3:​6]
 +array([1.5, 1.5, 1.5])
 +>>>​ sorted_vals[6:​10]
 +array([2., 2., 2., 2.])
 +
 +# Compute the sum of each equal-value group
 +>>>​ sorted_vals[0:​3].sum(),​ sorted_vals[3:​6].sum(),​ sorted_vals[6:​10].sum()
 +(3.0, 4.5, 8.0)</​code>​
 +
 +
 +==== Applying a ufunc over specified sections of an array ====
 +
 +The [[https://​numpy.org/​doc/​stable/​reference/​generated/​numpy.ufunc.reduceat.html#​numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit python loops, and improve the speed (but not the readability...) of a script. The example below //​improves//​ what has been shown above
 +
 +<​code>#​ Define a list with the boundaries of the intervals we want to apply the '​add'​ function to
 +# We need to add the beginning index (0), AND remove the last index
 +# (reduceat will automatically go to the end of the input array)
 +>>>​ nb_unique
 +array([3, 3, 4])
 +>>>​ slices_indices = [0] + list(np.add.accumulate(nb_unique))
 +>>>​ slices_indices.pop() # Remove last element
 +10
 +>>>​ slices_indices
 +[0, 3, 6]
 +
 +# Compute the sums over the selected intervals with just one call
 +>>>​ np.add.reduceat(np.sort(vals),​ slices_indices)
 +array([3. , 4.5, 8. ])</​code>​
 +
 +==== Exercise your brain with numpy ====
 +
 +Have a look at [[https://​github.com/​rougier/​numpy-100/​blob/​master/​100_Numpy_exercises.ipynb|100 numpy exercises]]
 +
 +===== matplotlib related stuff =====
 +
 +==== Working with time axes (and ticks) ====
 +
 +If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the:
 +  * [[https://​matplotlib.org/​stable/​gallery/​index.html#​ticks|Ticks examples'​ gallery]]
 +  * [[https://​matplotlib.org/​stable/​gallery/​text_labels_and_annotations/​date.html|Date tick labels example]]
 +
 +
 +===== Data representation =====
 +
 +A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
 +
 +FIXME Add parts (pages 28 to 37) of this [[https://​wiki.lsce.ipsl.fr/​pmip3/​doku.php/​other:​python:​jyp_steps#​part_2|old tutorial]] to this section
 +
 +==== Base notions ====
 +
 +  * **Never forget** that all the bits and pieces of information we use are coded in [[https://​en.wikipedia.org/​wiki/​Binary_number#​Counting_in_binary|base 2]] (''​0''​s and ''​1''​s ...), grouped in bytes!
 +    * Some things can be stored exactly (integers, characters, ...)
 +    * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximations//**
 +
 +  * 1 byte <=> 8 bits
 +    * ''​REAL*4''​ <=> 4 bytes <=> 32 bits
 +    * For easier written/​displayed representation,​ 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://​en.wikipedia.org/​wiki/​Hexadecimal|hexadecimal representation]] (characters ''​0'',​ ''​1'',​ ..., ''​A'',​ ''​B'',​ ..., ''​F''​)
 +      * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
 +      * ''​1101''​ <=> ''​D''​ in hexadecimal <=> ''​13''​ in decimal (''​**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1''​)
 +      * ''​11111101''​ in //base 2// <=> ''​1111 1101''​ <=> ''​FD''​ in //​hexadecimal//​ <=> ''​253''​ (''​15 * 16 + 13''​) in //decimal//
 +
 +  * Base conversion with Python
 +    * <​code>>>>​ hex(13) # Decimal to Hexadecimal conversion
 +'​0xd'​
 +>>>​ hex(253)
 +'​0xfd'​
 +>>>​ hex(256)
 +'​0x100'​
 +>>>​ int('​0x100',​ 16) # Hexadecimal to Decimal conversion
 +256
 +>>>​ int('​1111',​ 2) # Binary to Decimal conversion
 +15
 +>>>​ int('​11111101',​ 2) # '​11111101'​ <=> '1111 1101' <=> '​FD'​ <=> 15 * 16 + 13 = 253
 +253
 +>>> 0o13 # DANGER! Octal literals use the 0o prefix. Python 2 also read a plain leading 0 (e.g. 013) as octal, Python 3 rejects it with a SyntaxError
 +11
 +>>>​ int('​13',​ 8) # 1*8 + 3
 +11</​code>​
 +
 +  * More technical topics
 +    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit)
 +    * [[https://​en.wikipedia.org/​wiki/​Endianness|Endianness]]:​ the art of ordering bytes
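The effect of the byte order can be sketched with plain Python integers, without ''numpy''. The value ''0xfd01'' is an arbitrary choice for the example, and ''bytes.hex'' with a separator requires Python 3.8 or later.

```python
value = 253 * 256 + 1  # 0xfd01, an arbitrary 2-byte value

# Store the value in 4 bytes, using both byte orders
big = value.to_bytes(4, byteorder='big')
little = value.to_bytes(4, byteorder='little')
print(big.hex(' '), '|', little.hex(' '))  # 00 00 fd 01 | 01 fd 00 00

# Reading the bytes with the matching byte order gives back the value
same_value = int.from_bytes(little, byteorder='little')

# Reading them with the wrong byte order gives a different number
wrong_value = int.from_bytes(little, byteorder='big')
print(same_value, hex(wrong_value))  # 64769 0x1fd0000
```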
 +==== Numerical values ====
 +
 +  * Binary data representation of some numbers (only some common types are listed here):
 +    * Languages and packages **references** used below:
 +      * Python: [[https://​numpy.org/​doc/​stable/​reference/​arrays.scalars.html#​sized-aliases|NumPy Sized aliases]]
 +      * NetCDF: [[https://​docs.unidata.ucar.edu/​nug/​current/​md_types.html|Data Types]], [[https://​docs.unidata.ucar.edu/​netcdf-fortran/​current/​f90-variables.html#​f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://​docs.unidata.ucar.edu/​nug/​current/​_c_d_l.html#​cdl_data_types|CDL Data Types]]
 +      * Fortran: Intel Fortran Compiler [[https://​www.intel.com/​content/​www/​us/​en/​docs/​fortran-compiler/​developer-guide-reference/​2023-1/​intrinsic-data-types.html|Intrinsic Data Types]]
 +    * [[https://​en.wikipedia.org/​wiki/​Integer_(computer_science)|Integers]]
 +      * Range:
 +        * 4-byte //signed// integers: ''​−2,​147,​483,​648''​ to ''​2,​147,​483,​647''​
 +          * Python: ''​numpy.int32''​
 +          * NetCDF: ''​int'',​ ''​NC_INT''​ or ''​NC_LONG'',​ ''​NF90_INT''​
 +          * Fortran: ''​INTEGER*4''​
 +        * 8-byte //signed// integers: ''​−9,​223,​372,​036,​854,​775,​808''​ to ''​9,​223,​372,​036,​854,​775,​807''​
 +          * Python: ''​numpy.int64''​
 +          * NetCDF: ''​int64'',​ ''​NC_INT64''​
 +          * Fortran: ''​INTEGER*8''​
 +      * Tech note: signed integers use [[https://​en.wikipedia.org/​wiki/​Two%27s_complement|two'​s complement]] for coding negative integers
 +    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Floating-Point Arithmetic//)
 +      * Range:
 +        * 4-byte float: ''~7 significant digits * 10^±38''
 +          * Python: ''​numpy.float32''​
 +          * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
 +          * Fortran: ''REAL*4''
 +          * See also [[https://​en.wikipedia.org/​wiki/​Single-precision_floating-point_format|Single-precision floating-point format]]
 +        * 8-byte float: ''~15 significant digits * 10^±308''
 +          * Python: ''​numpy.float64''​
 +          * NetCDF: ''​double'',​ ''​NC_DOUBLE'',​ ''​NF90_DOUBLE''​
 +          * Fortran: ''​REAL*8''​
 +      * **Special values**:
 +        * [[https://​en.wikipedia.org/​wiki/​NaN|NaN]]:​ //Not a Number//
 +          * Python: ''​numpy.nan''​
 +        * Infinity
 +          * Python: ''​-numpy.inf''​ and ''​numpy.inf''​
 +        * Note: it is cleaner to use masks (and [[https://​numpy.org/​doc/​stable/​reference/​maskedarray.generic.html|Numpy masked arrays]]) rather than ''​NaN''​s,​ when you have to deal with missing values !
 +      * <wrap hi>The RISKS of working with (the wrong) floats</​wrap>:​
 +        * [[https://​en.wikipedia.org/​wiki/​Round-off_error|Round-off error]]
 +        * [[https://​en.wikipedia.org/​wiki/​Catastrophic_cancellation|Catastrophic cancellation]]
 +          * [[https://​docs.oracle.com/​cd/​E19957-01/​806-3568/​ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
 +    * A rather technical example: we //play// with a numpy 4-byte integer scalar
 +      * <​code>>>>​ one_int32 = np.int32(1)
 +>>>​ one_int32
 +1
 +>>>​ type(one_int32)
 +<class '​numpy.int32'>​
 +>>>​ one_int32.dtype
 +dtype('​int32'​)
 +>>> one_int32.shape # A numpy SCALAR behaves like an ARRAY WITH AN EMPTY SHAPE (0 dimensions)!
 +()
 +>>>​ one_int32[0]
 +Traceback (most recent call last):
 +  File "<​stdin>",​ line 1, in <​module>​
 +IndexError: invalid index to scalar variable.
 +>>>​ one_int32[()] # Note how to access the single element, when there is NO SHAPE
 +1
 +>>>​ one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
 +0
 +>>>​ one_int32.size
 +1
 +>>>​ one_int32.nbytes # The element requires 4 bytes of storage
 +4
 +>>> hex(one_int32) # We can print the hexadecimal representation of INTEGER scalars (not of whole arrays)
 +'​0x1'​
 +>>>​ hex(one_int32 * 15)
 +'​0xf'​
 +>>>​ hex(one_int32 * 16)
 +'​0x10'​
 +
 +# 'Serialize' the data (i.e. change the data to a series of bytes)
 +# Note: the serialized bytes are printed in the reverse order of 'hex(one_int32)',
 +# because most current CPUs (e.g. x86) are little-endian: the least significant byte is stored first
 +>>>​ one_int32_serialized = one_int32.tobytes()
 +>>>​ type(one_int32_serialized)
 +<class '​bytes'>​
 +>>>​ len(one_int32_serialized)
 +4
 +>>>​ one_int32_serialized ​
 +b'​\x01\x00\x00\x00'​
 +>>>​ one_int32_serialized.hex('​ ') # Another way to print the hexadecimal values
 +'01 00 00 00'
 +
 +# Use the following in the unlikely case where you need to change the endianness (byte order)
 +>>>​ one_int32_reversed_endian = one_int32.byteswap()
 +>>>​ one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
 +16777216
 +>>>​ hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
 +'​0x1000000'​
 +>>>​ one_int32_reversed_endian.tobytes()
 +b'​\x00\x00\x00\x01'</​code>​
 +    * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how the bytes are swapped within each group of 4 bytes, because each int32 element uses 4 bytes
 +      * <​code>>>>​ array_example = np.asarray((3,​ 17), dtype=np.int32)
 +>>>​ array_example
 +array([ 3, 17], dtype=int32)
 +>>>​ array_example.shape,​ array_example.ndim,​ array_example.size,​ array_example.nbytes
 +((2,), 1, 2, 8)
 +>>>​ array_example.tobytes().hex('​ ', 4)
 +'​03000000 11000000'​
 +>>>​ array_example.byteswap().tobytes().hex('​ ', 4)
 +'​00000003 00000011'​
 +</​code>​
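The ranges and risks listed above can be checked directly with numpy. The following is only a minimal sketch: the variable names are illustrative, and the two small experiments were chosen to show round-off in ''float32''; they do not cover every floating point pitfall.

```python
import numpy as np

# Limits of the signed integer types listed above
int32_info = np.iinfo(np.int32)
int64_info = np.iinfo(np.int64)
print(int32_info.min, int32_info.max)
print(int64_info.min, int64_info.max)

# Characteristics of the float types: 'precision' is the number of
# decimal digits that numpy considers reliable for the type
float32_info = np.finfo(np.float32)
float64_info = np.finfo(np.float64)
print(float32_info.precision, float32_info.max)
print(float64_info.precision, float64_info.max)

# Round-off: above 2**24, float32 can no longer represent every integer,
# so adding 1 does not change the value
big_float32 = np.float32(2**24)
lost_increment = (big_float32 + np.float32(1)) == big_float32

# Subtracting nearly equal numbers exposes the digits that were lost:
# the small term is absorbed in float32, but survives in float64
small = 1e-8
cancelled_float32 = (np.float32(1) + np.float32(small)) - np.float32(1)
cancelled_float64 = (np.float64(1) + np.float64(small)) - np.float64(1)
print(lost_increment, cancelled_float32, cancelled_float64)
```

If a computation mixes very large and very small values, this kind of quick test can help decide whether ''float32'' is precise enough or ''float64'' is needed.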
 +
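The note above about masks versus ''NaN''s can be illustrated with a short sketch. The sample values are made up for the example, and ''numpy.ma.masked_invalid'' is only one of several ways to build a masked array.

```python
import numpy as np

# Hypothetical data with one missing value stored as NaN
raw_values = np.array([1.0, np.nan, 3.0, 5.0])

# A single NaN contaminates a plain reduction
nan_mean = raw_values.mean()

# Masking the invalid values (NaN and infinities) keeps them out of the computation
masked_values = np.ma.masked_invalid(raw_values)
masked_mean = masked_values.mean()
n_valid = masked_values.count()  # Number of non-masked elements

print(nan_mean, masked_mean, n_valid)
```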
 +  * Manipulating binary data with [[https://​docs.python.org/​3/​library/​stdtypes.html#​binary-sequence-types-bytes-bytearray-memoryview|bytes,​ bytearray, memoryview]]
 +
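A minimal sketch of the three binary sequence types mentioned above, using only built-in Python features. The byte values are arbitrary and reuse the 4-byte integer from the numpy example.

```python
# bytes: an immutable sequence of integers in the range 0..255
immutable = bytes([1, 0, 0, 0])

# The same 4 bytes decode to different integers depending on the byte order
value_little = int.from_bytes(immutable, byteorder='little')
value_big = int.from_bytes(immutable, byteorder='big')

# bytearray: a mutable copy of the same data
mutable = bytearray(immutable)
mutable[0] = 0xff

# memoryview: a window on existing data, created without copying it
view = memoryview(mutable)
first_two = view[:2]

# Because no copy was made, later changes to the bytearray are visible through the view
mutable[1] = 0x7f
print(value_little, value_big, bytes(first_two))
```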
 +  * Array addressing
 +    * [[https://​www.geeksforgeeks.org/​calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/​|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
 +      * In other words: //using indices to go from 1-D to n-dimensional data//
 +    * The [[https://​en.wikipedia.org/​wiki/​Array_(data_structure)|array]] structure
 +    * Python/C (row-major order) vs Fortran (column-major order)
 +
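The array addressing items above can be sketched with a small 2-D example. The shape and the indices are arbitrary. For an array of shape ''(nrows, ncols)'', the row-major (C) offset of element ''(i, j)'' is ''i * ncols + j'', and the column-major (Fortran) offset is ''i + j * nrows''.

```python
import numpy as np

shape = (2, 3)  # (nrows, ncols)
data_1d = np.arange(6)

# The same 1-D data laid out in row-major (C) and column-major (Fortran) order
c_order = data_1d.reshape(shape, order='C')
f_order = data_1d.reshape(shape, order='F')

i, j = 1, 0

# Offsets computed by hand...
offset_c = i * shape[1] + j
offset_f = i + j * shape[0]

# ...and by numpy
offset_c_numpy = np.ravel_multi_index((i, j), shape, order='C')
offset_f_numpy = np.ravel_multi_index((i, j), shape, order='F')

# Going back from a 1-D offset to 2-D indices (C order by default)
indices_from_offset = np.unravel_index(offset_c, shape)

print(c_order[i, j], f_order[i, j], offset_c, offset_f, indices_from_offset)
```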
 +  * Disk and RAM usage: how to check the current usage (available RAM and disk space), and best practices on multi-user systems (how much is each user allowed to use?)
 +    * ''​du'',​ ''​df'',​ ''​cat /​proc/​meminfo'',​ ''​top''​
 +
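The same information can also be read from a script. This is a hedged sketch: ''read_meminfo'' is a hypothetical helper written for this example, and it assumes a Linux system, where ''/proc/meminfo'' exists (it returns an empty dictionary elsewhere).

```python
import shutil

# Disk usage of the file system holding the current directory (similar to 'df .')
usage = shutil.disk_usage('.')
free_gib = usage.free / 1024**3
print(f'Disk: {free_gib:.1f} GiB free out of {usage.total / 1024**3:.1f} GiB')


def read_meminfo(path='/proc/meminfo'):
    """Return the numeric fields of a Linux meminfo file as a dictionary.

    Hypothetical helper: returns an empty dictionary if the file cannot be read.
    """
    info = {}
    try:
        with open(path) as meminfo_file:
            for line in meminfo_file:
                name, _, rest = line.partition(':')
                fields = rest.split()
                if fields:
                    info[name] = int(fields[0])  # Most values are in kB
    except OSError:
        pass
    return info


meminfo = read_meminfo()
if 'MemAvailable' in meminfo:
    print(f"RAM: {meminfo['MemAvailable'] / 1024**2:.1f} GiB available")
```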
 +  * Understanding and reverse-engineering //binary// formats
 +    * ''​od'',​ ''​strings''​
 +
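For small buffers, the idea behind ''od'' and ''strings'' can be imitated in Python. This is only a sketch: the 12-byte record layout below is invented for the example, and ''hex_dump'' and ''find_strings'' are hypothetical helpers, not replacements for the real tools.

```python
import re
import struct

# Hypothetical binary record: a little-endian int32, a 4-character tag, a float32
header = struct.pack('<i4sf', 42, b'DATA', 1.5)


def hex_dump(data, width=8):
    """Return an od-like dump: offset, then the bytes in hexadecimal."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        lines.append(f'{start:06x}  {chunk.hex(" ")}')
    return '\n'.join(lines)


def find_strings(data, min_length=4):
    """Return the runs of printable ASCII characters, like 'strings' does."""
    pattern = rb'[\x20-\x7e]{%d,}' % min_length
    return [match.group().decode('ascii') for match in re.finditer(pattern, data)]


print(hex_dump(header))
print(find_strings(header))

# Once the layout is understood, struct.unpack decodes the record
record = struct.unpack('<i4sf', header)
print(record)
```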
 +  * Binary vs text formats: ASCII, UTF, raw
 +    * Text-related functions in Python: ''str'', ''int'', ''float'', ''ord'', ...
 +      * List conversion with ''map'' and ''join''
 +
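A short sketch of the text-related functions listed above, with made-up input strings:

```python
# Text to numbers: split the string, then convert each piece with map
numbers_as_text = '3 17 42'
numbers = list(map(int, numbers_as_text.split()))

# Numbers back to text: convert each number to str, then join the pieces
back_to_text = ', '.join(map(str, numbers))

# float understands the usual scientific notation
value = float('1.5e3')

# ord gives the code point of a character, chr does the reverse
code_points = list(map(ord, 'AB0'))
rebuilt = ''.join(map(chr, code_points))

print(numbers, back_to_text, value, code_points, rebuilt)
```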
 +  * Misc : ''​md5sum''​
 +
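The checksum printed by ''md5sum'' can also be computed from Python with the standard ''hashlib'' module. ''md5_of_file'' is a hypothetical helper name. It reads the file in chunks, so that large files do not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, 'rb') as binary_file:
        # Read chunk_size bytes at a time until read() returns an empty bytes object
        for chunk in iter(lambda: binary_file.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared to the first column of the ''md5sum'' output for the same file, for instance to check that a file was not damaged during a transfer.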
 +==== Strings ====
 +
 +  * Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|Unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ...
 +
 +  * Getting the binary representation of a string
 +    * <​code>>>>​ test_string = 'A B 0 1 à µ'
 +>>>​ type(test_string)
 +<class '​str'>​
 +>>>​ len(test_string)
 +11
 +>>>​ test_string_bin = test_string.encode('​utf-8'​)
 +>>>​ test_string_bin
 +b'A B 0 1 \xc3\xa0 \xc2\xb5'​
 +>>>​ type(test_string_bin)
 +<class '​bytes'>​
 +>>>​ len(test_string_bin)
 +13
 +>>>​ test_string_bin.hex('​-'​)
 +'​41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'​
 +</​code>​
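As a complement to the session above, the following sketch decodes the bytes back to a string and shows that the number of bytes depends on the encoding. The same test string is written with escape sequences, so that the example does not depend on the encoding of the script itself. The choice of ''latin-1'' and ''ascii'' is only for illustration.

```python
# Same string as above: '\u00e0' is 'a with grave accent', '\u00b5' is the micro sign
test_string = 'A B 0 1 \u00e0 \u00b5'

# UTF-8 needs 2 bytes for each of the two non-ASCII characters
encoded_utf8 = test_string.encode('utf-8')

# latin-1 stores every character of this string in a single byte
encoded_latin1 = test_string.encode('latin-1')

# Decoding with the matching encoding gives the original string back
decoded = encoded_utf8.decode('utf-8')

# ASCII cannot represent the accented characters
try:
    test_string.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Unless we explicitly ask for a replacement character
ascii_with_replacement = test_string.encode('ascii', errors='replace')

print(len(encoded_utf8), len(encoded_latin1), decoded == test_string)
print(ascii_failed, ascii_with_replacement)
```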
 +
  
 /* /*
-==== Tip template ====+===== Tip template ​=====
  
 <​code>​Some code</​code>​ <​code>​Some code</​code>​
other/python/misc_by_jyp.1629892804.txt.gz · Last modified: 2021/08/25 12:00 by jypeter