other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2022/03/08 16:40]
jypeter Added a string section
other:python:misc_by_jyp [2023/12/08 15:51] (current)
jypeter [Efficient looping with numpy, map and itertools] Added list comprehension
Line 5: Line 5:
 </​WRAP>​ </​WRAP>​
  
-==== Reading/​setting environments variables ==== 
  
 +===== Reading/setting environment variables =====
  
 <​code>>>>​ os.environ['​TMPDIR'​] <​code>>>>​ os.environ['​TMPDIR'​]
Line 17: Line 17:
 </​code>​ </​code>​
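A minimal sketch of reading, setting and removing an environment variable from a script. The ''/tmp'' fallback and the ''MY_SKETCH_VAR'' name are assumptions made for this illustration, they are not part of the original page.

```python
import os

# Reading: os.environ behaves like a dictionary of strings.
# get() avoids a KeyError when the variable is not defined
# ('/tmp' is only a fallback chosen for this sketch)
tmp_dir = os.environ.get('TMPDIR', '/tmp')
print(tmp_dir)

# Setting: the new value is visible in this process and in the child
# processes started afterwards, but NOT in the parent shell
os.environ['MY_SKETCH_VAR'] = 'some value'  # hypothetical variable name
print(os.environ['MY_SKETCH_VAR'])

# Removing the variable again
del os.environ['MY_SKETCH_VAR']
print('MY_SKETCH_VAR' in os.environ)
```

Values are always strings, so a numerical setting has to be converted explicitly (e.g. with ''int'').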
  
-==== Generating (aka raising) an error ====+ 
 +===== Generating (aka raising) an error =====
  
 This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors
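The behaviour described above can be sketched as follows. The ''check_positive'' function is hypothetical and only serves as an illustration: it raises an error, and the calling code decides to catch it instead of letting the script stop.

```python
def check_positive(value):
    """Return value, or raise an error if it is not strictly positive"""
    if value <= 0:
        raise ValueError(f'Expected a positive value, got {value}')
    return value


try:
    check_positive(-1)
except ValueError as err:
    # Without this try/except, the script would stop here with a traceback
    error_message = str(err)
    print('Caught an error:', error_message)
```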
Line 25: Line 26:
  
  
-==== Stopping a script ====+===== Stopping a script ​=====
  
 A user can use ''​CTRL-C''​ or ''​kill''​ to stop a script, or ''​CTRL-Z''​ to suspend it temporarily (use ''​fg''​ to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error A user can use ''​CTRL-C''​ or ''​kill''​ to stop a script, or ''​CTRL-Z''​ to suspend it temporarily (use ''​fg''​ to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error
  
 <​code>​sys.exit('​Some optional message about why we are stopping'​)</​code>​ <​code>​sys.exit('​Some optional message about why we are stopping'​)</​code>​
- +===== Checking if a file/​directory is writable by the current user =====
- +
-==== Checking if a file/​directory is writable by the current user ====+
  
 <​code>>>>​ os.access('/',​ os.W_OK) <​code>>>>​ os.access('/',​ os.W_OK)
Line 39: Line 38:
 True</​code>​ True</​code>​
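A small self-contained variant of the check above, using a temporary directory so that it can run anywhere. The ''does_not_exist'' name is just a placeholder for a missing path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # A directory we have just created should be writable
    dir_is_writable = os.access(tmp_dir, os.W_OK)
    # os.access simply returns False for a path that does not exist
    missing_path = os.path.join(tmp_dir, 'does_not_exist')
    missing_is_writable = os.access(missing_path, os.W_OK)

print(dir_is_writable, missing_is_writable)
```

Note that the permissions can change between the check and the actual write, so it is usually safer to also wrap the write itself in a ''try''/''except'' block.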
  
-==== Playing with strings ==== 
  
-=== Filenames, etc ===+===== Playing with strings =====
  
-Check [[other:​python:​misc_by_jyp#​working_with_paths_and_filenames|Working with paths and filenames]] and [[other:​python:​misc_by_jyp#​generating_file_names|Generating file names]] 
  
-=== Splitting strings ===+==== Splitting ​(complex) ​strings ​====
  
 It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings
Line 64: Line 61:
 >>>​ shlex.split(complex_string) >>>​ shlex.split(complex_string)
 ['​-o',​ '​1',​ '​--long',​ 'A string with accented chars: \xc3\xa9 \xc3\xa8 \xc3\xa0 \xc3\xa7'​]</​code>​ ['​-o',​ '​1',​ '​--long',​ 'A string with accented chars: \xc3\xa9 \xc3\xa8 \xc3\xa0 \xc3\xa7'​]</​code>​
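The difference between a plain ''split'' and ''shlex.split'' can be sketched with a shorter string than the one used above (the command line below is made up for this illustration):

```python
import shlex

command_line = '-o 1 --long "A quoted sub-string"'

# str.split cuts at every blank, even inside the quotes
naive_split = command_line.split()
print(naive_split)

# shlex.split follows shell-like rules and keeps the quoted part together
shell_split = shlex.split(command_line)
print(shell_split)
```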
 +
 +
 ==== Working with paths and filenames ==== ==== Working with paths and filenames ====
  
-If you are in a hurry, you can just use string functions to work with path and file names. ​But you will need some specific functions to check if a file exists, and similar operations. ​All these are available in 2 libraries that have similar functions. Both of these libraries ​can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers+If you are in a hurry, you can just use string functions to work with paths and file names. 
 + 
 + 
 +You will need some specific objects and functions to check if a file exists, and for similar operations. The libraries listed below automatically deal with Unix-type paths on Linux and macOS computers, and with Windows-type paths on Windows computers
  
-  * [[https://​docs.python.org/​3/​library/​os.path.html|os.path]] //Common ​pathname manipulations//​+  * [[https://​docs.python.org/​3/​library/​os.path.html|os.path]]//common ​pathname manipulations//​
     * Available since... a long time! Use this if you want to avoid backward compatibility problems     * Available since... a long time! Use this if you want to avoid backward compatibility problems
     * Some functions are directly in [[https://​docs.python.org/​3/​library/​os.html|os]] //​Miscellaneous operating system interfaces//​\\ e.g. [[https://​docs.python.org/​3/​library/​os.html#​os.remove|os.remove]] and [[https://​docs.python.org/​3/​library/​os.html#​os.rmdir|os.rmdir]]     * Some functions are directly in [[https://​docs.python.org/​3/​library/​os.html|os]] //​Miscellaneous operating system interfaces//​\\ e.g. [[https://​docs.python.org/​3/​library/​os.html#​os.remove|os.remove]] and [[https://​docs.python.org/​3/​library/​os.html#​os.rmdir|os.rmdir]]
-  * [[https://​docs.python.org/​3/​library/​pathlib.html|pathlib]] //Object-oriented filesystem paths//+  * [[https://​docs.python.org/​3/​library/​pathlib.html|pathlib]]: a **more recent** ​//object-oriented// way to deal with //filesystem paths//
     * Available since Python version 3.4     * Available since Python version 3.4
     * [[https://​docs.python.org/​3/​library/​pathlib.html#​correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]     * [[https://​docs.python.org/​3/​library/​pathlib.html#​correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]
-  * [[https://docs.python.org/3/library/shutil.html|High-level file operations]]+  * [[https://docs.python.org/3/library/shutil.html|shutil]]: High-level file operations, e.g. copy/move a file or directory tree
  
  
-=== Example: getting the full path of the Python used ===+=== Example: getting the full path of the Python ​executable ​used ===
  
 Note: the actual python may be different from the default python! Note: the actual python may be different from the default python!
Line 84: Line 86:
 /​usr/​bin/​python /​usr/​bin/​python
  
-$ /modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python+$ /home/share/unix_files/cdat/​miniconda3_21-02/envs/cdatm_py3/bin/python
 >>>​ import sys, shutil >>>​ import sys, shutil
 >>>​ shutil.which('​python'​) >>>​ shutil.which('​python'​)
 '/​usr/​bin/​python'​ '/​usr/​bin/​python'​
 >>>​ sys.executable >>>​ sys.executable
-'/modfs/modtools/miniconda3//envs/analyse_3.6_test/​bin/​python'</​code>​+'/home/share/unix_files/cdat/​miniconda3_21-02/envs/cdatm_py3/​bin/​python'</​code>​
  
  
Line 105: Line 107:
 </​code>​ </​code>​
  
 +
 +=== Example: system independent paths with pathlib ===
 +
 +Note: the following example was generated on a Linux server and uses a <wrap em>/</​wrap>​ character as a path separator
 +
 +<code>>>> from pathlib import Path
 +>>> my_home = Path.home()
 +>>>​ my_home
 +PosixPath('/​home/​users/​my_login'​)
 +>>>​ my_conf = my_home / '​.config'​ / '​evince'​
 +>>>​ my_conf
 +PosixPath('/​home/​users/​my_login/​.config/​evince'​)
 +>>>​ my_conf.is_dir()
 +True
 +>>>​ my_conf.is_file()
 +False
 +>>>​ list(my_conf.glob('​*'​))
 +[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
 +>>>​ [ ff.name for ff in my_conf.glob('​*'​) ]
 +['​evince_toolbar.xml',​ '​accels'​]
 +</​code>​
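A self-contained sketch of the same ''pathlib'' ideas, that can be run on any system because it works in a temporary directory. The directory and file names are invented for this illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    base_dir = Path(tmp_name)

    # The / operator builds a path with the separator of the current system
    conf_dir = base_dir / 'config' / 'my_app'
    conf_dir.mkdir(parents=True)

    # Create two small text files
    (conf_dir / 'settings.txt').write_text('verbose = yes\n')
    (conf_dir / 'notes.txt').write_text('nothing yet\n')

    # glob returns the files in an arbitrary order, so we sort the names
    file_names = sorted(ff.name for ff in conf_dir.glob('*.txt'))
    settings_suffix = (conf_dir / 'settings.txt').suffix
    conf_is_dir = conf_dir.is_dir()

print(file_names, settings_suffix, conf_is_dir)
```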
  
 === Example: getting the size(s) of all the files in a directory === === Example: getting the size(s) of all the files in a directory ===
Line 149: Line 171:
 >>>​ f_tmp.close() >>>​ f_tmp.close()
 >>>​ os.remove(f_tmp.name)</​code>​ >>>​ os.remove(f_tmp.name)</​code>​
-==== Using command-line arguments ==== 
  
-=== The extremely easy but non-flexible way: sys.argv ===+ 
 +===== Using command-line arguments ===== 
 + 
 +==== The extremely easy but non-flexible way: sys.argv ​====
  
 The name of a script, the number of arguments (including the name of the script), and the arguments themselves (as strings) can be accessed through ''sys.argv'', which is a list of strings The name of a script, the number of arguments (including the name of the script), and the arguments themselves (as strings) can be accessed through ''sys.argv'', which is a list of strings
Line 173: Line 197:
 2 tas_tes.nc</​code>​ 2 tas_tes.nc</​code>​
  
-=== The C-style way: getopt ===+ 
 +==== The C-style way: getopt ​====
  
 Use [[https://​docs.python.org/​3/​library/​getopt.html|getopt]] (//C-style parser for command line options//) Use [[https://​docs.python.org/​3/​library/​getopt.html|getopt]] (//C-style parser for command line options//)
  
-=== The deprecated Python way: optparse ===+ 
 +==== The deprecated Python way: optparse ​====
  
 [[https://​docs.python.org/​3/​library/​optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://​docs.python.org/​3/​library/​argparse.html#​upgrading-optparse-code|Upgrading optparse code]] for converting from ''​optparse''​ to ''​argparse''​) [[https://​docs.python.org/​3/​library/​optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://​docs.python.org/​3/​library/​argparse.html#​upgrading-optparse-code|Upgrading optparse code]] for converting from ''​optparse''​ to ''​argparse''​)
  
-=== The current Python way: argparse ===+ 
 +==== The current Python way: argparse ​====
  
 [[https://​docs.python.org/​3/​library/​argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//​) is available since Python version 3.2 [[https://​docs.python.org/​3/​library/​argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//​) is available since Python version 3.2
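A minimal ''argparse'' sketch. The option names (''--nb-repeat'', ''--verbose'') are invented for this example, and the arguments are passed as a list so that the sketch can run without a real command line.

```python
import argparse

parser = argparse.ArgumentParser(description='Sketch of a script with options')
parser.add_argument('input_file', help='name of the file to process')
parser.add_argument('-n', '--nb-repeat', type=int, default=1,
                    help='number of repetitions')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='print more details')

# A real script would call parser.parse_args() without arguments,
# in order to parse the content of sys.argv
args = parser.parse_args(['-n', '3', 'tas_tes.nc'])

# Note: '--nb-repeat' is available as args.nb_repeat
print(args.input_file, args.nb_repeat, args.verbose)
```

Running a script defined this way with ''-h'' prints a help message generated from the ''help'' strings.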
  
-==== Using ordered dictionaries ====+ 
 +===== Using ordered dictionaries ​=====
  
 **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (this was already the case in CPython **3.6**, but only as an implementation detail) **Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (this was already the case in CPython **3.6**, but only as an implementation detail)
Line 191: Line 219:
 Check the [[https://​docs.python.org/​3/​library/​collections.html#​collections.OrderedDict|OrderedDict class]] (''​from collections import OrderedDict''​) and the [[https://​realpython.com/​python-ordereddict/​|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial Check the [[https://​docs.python.org/​3/​library/​collections.html#​collections.OrderedDict|OrderedDict class]] (''​from collections import OrderedDict''​) and the [[https://​realpython.com/​python-ordereddict/​|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial
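A short sketch of two things an ''OrderedDict'' can do that a regular dictionary does not: reordering existing keys, and taking the order into account when comparing. The keys and values are arbitrary.

```python
from collections import OrderedDict

plain = {'c': 3, 'a': 1, 'b': 2}
plain_keys = list(plain)  # keys come back in insertion order
print(plain_keys)

# An OrderedDict can move an existing key to the end (or to the beginning)
ordered = OrderedDict(plain)
ordered.move_to_end('c')
ordered_keys = list(ordered)
print(ordered_keys)

# Comparing regular dictionaries ignores the order of the keys,
# comparing OrderedDicts does not
same_as_plain = (dict(a=1, b=2) == dict(b=2, a=1))
same_as_ordered = (OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))
print(same_as_plain, same_as_ordered)
```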
  
-==== Using sets ====+ 
 +===== Using sets =====
  
 [[https://​docs.python.org/​3/​tutorial/​datastructures.html#​sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //​something//​ and you can easily determine the **intersection**,​ **union** (and other similar operations) of sets. [[https://​docs.python.org/​3/​tutorial/​datastructures.html#​sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //​something//​ and you can easily determine the **intersection**,​ **union** (and other similar operations) of sets.
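The operations mentioned above can be sketched as follows. The variable names in the two lists are arbitrary examples.

```python
model_vars = ['tas', 'pr', 'tas', 'psl', 'pr']
obs_vars = ['pr', 'tos', 'tas']

# Converting a list to a set removes the duplicates
unique_model_vars = set(model_vars)

common = unique_model_vars & set(obs_vars)      # intersection
all_vars = unique_model_vars | set(obs_vars)    # union
only_model = unique_model_vars - set(obs_vars)  # difference

# Sets are not ordered: sort them when a reproducible output is needed
print(sorted(common), sorted(all_vars), sorted(only_model))
```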
  
-==== Printing a readable version of long lists or dictionaries ====+ 
 +===== Printing a readable version of long lists or dictionaries ​=====
  
 The [[https://​docs.python.org/​3/​library/​pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries,​ ...). It will wrap long lines in a meaningful way The [[https://​docs.python.org/​3/​library/​pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries,​ ...). It will wrap long lines in a meaningful way
Line 229: Line 259:
 </​code>​ </​code>​
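A small additional sketch: ''pformat'' returns the //pretty// text as a string instead of printing it, which can be handy for log files. The dictionary below is generated only to have something long enough to wrap.

```python
from pprint import pformat, pprint

# A dictionary that does not fit on one line
long_dict = {f'variable_{ii:02d}': list(range(ii, ii + 12)) for ii in range(3)}

# Print it directly, with lines limited to 60 characters
pprint(long_dict, width=60)

# Or get the same text as a string
as_text = pformat(long_dict, width=60)
```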
  
-==== Sorting ====+ 
 +===== Storing objects and data in a file (shelve and friends) ===== 
 + 
 +The built-in [[other:​python:​jyp_steps#​the_shelve_package|shelve]] module can be **easily** used for storing temporary/​intermediate data 
 + 
 +More options: 
 +  * Some [[other:​python:​jyp_steps#​data_file_formats|non-NetCDF]] file formats 
 +  * Working with [[other:​python:​jyp_steps#​netcdf_filesusing_cdms2_xarray_and_netcdf4|NetCDF]] files 
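A minimal ''shelve'' sketch, assuming that the stored objects can be pickled. The shelf name and the stored values are invented, and a temporary directory is used so that nothing is left on disk.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_name = os.path.join(tmp_dir, 'intermediate_results')  # hypothetical name

    # Store some objects: a shelf is used like a dictionary with string keys
    with shelve.open(shelf_name) as shelf:
        shelf['years'] = [1850, 1851, 1852]
        shelf['settings'] = {'model': 'some_model', 'nb_levels': 19}

    # Later (possibly in another script), read the objects back
    with shelve.open(shelf_name) as shelf:
        stored_keys = sorted(shelf.keys())
        years = shelf['years']

print(stored_keys, years)
```

The actual file name(s) created on disk depend on the database backend available on the system, so always use the name given to ''shelve.open''.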
 + 
 + 
 +===== Using a configuration file ===== 
 + 
 +The built-in [[https://​docs.python.org/​3/​library/​configparser.html|configparser]] module can be easily used for reading (**and** writing!) text configuration files. 
 + 
 +Note: a configuration file is also a way to easily store and exchange text data ! 
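A minimal ''configparser'' sketch. The section and option names are made up, and the configuration is read from a string so that the example is self-contained.

```python
import configparser
import io

config_text = """
[paths]
input_dir = /some/input/dir
output_dir = /some/output/dir

[plot]
nb_levels = 12
use_colors = yes
"""

config = configparser.ConfigParser()
# With a real file, use config.read('some_file.ini') instead
config.read_string(config_text)

# Values are stored as strings, but there are helpers for conversions
input_dir = config['paths']['input_dir']
nb_levels = config.getint('plot', 'nb_levels')
use_colors = config.getboolean('plot', 'use_colors')
print(input_dir, nb_levels, use_colors)

# Writing: change a value, then dump everything to a file-like object
config['plot']['nb_levels'] = '20'
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```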
 + 
 + 
 +===== Working with global variables ===== 
 + 
 +There is a good chance you don't actually want/need a //global// variable. Be sure to use the ''​global''​ statement correctly if you want to avoid side-effects... 
 + 
 +  * [[https://​docs.python.org/​3/​faq/​programming.html?​highlight=global#​why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value|Using (and changing) a global variable inside a script or module]] 
 +    * Simple module example\\ <​code>​_myvar = 10 
 + 
 +def set_myvar(new_val):​ 
 +    # Note: need to explicitly define a global variable (of a module) 
 +    # as '​global'​ BEFORE changing its value in a function! 
 +    # Otherwise, the value will not be REdefined outside the function 
 +    global _myvar 
 +    _myvar = new_val 
 + 
 +def get_myvar():​ 
 +    return _myvar 
 + 
 +def myfunc(nb_repeat = 10): 
 +    print(nb_repeat * _myvar)</​code>​ 
 +  * [[https://​docs.python.org/​3/​faq/​programming.html?​highlight=global#​how-do-i-share-global-variables-across-modules|Sharing global variables across modules]] 
 +===== Sorting ​=====
  
   * When dealing with **numerical values**, you should use the [[https://​numpy.org/​doc/​stable/​reference/​routines.sort.html|numpy sorting, searching, and counting routines]]!   * When dealing with **numerical values**, you should use the [[https://​numpy.org/​doc/​stable/​reference/​routines.sort.html|numpy sorting, searching, and counting routines]]!
Line 246: Line 313:
 ['​c',​ '​d',​ '​b',​ '​a'​]</​code>​ ['​c',​ '​d',​ '​b',​ '​a'​]</​code>​
  
-==== numpy related stuff ==== 
  
-=== Finding and counting unique values ===+===== Efficient looping with numpy, map, itertools and list comprehension ===== 
 + 
 +<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce a script's execution time!
 + 
 +  * **''​numpy''​ arrays** should be used when dealing with //numerical data// 
 +    * **Masked arrays** can be used to deal with //special cases// and remove tests from loops 
 + 
 +  * The built-in [[https://​docs.python.org/​3/​library/​functions.html?​highlight=map#​map|map]] function (and similar functions like [[https://​docs.python.org/​3/​library/​functions.html?​highlight=zip#​zip|zip]],​ [[https://​docs.python.org/​3/​library/​functions.html?​highlight=filter#​filter|filter]],​ ...) can be used to efficiently apply a function (possibly a //simple// [[https://​docs.python.org/​3/​tutorial/​controlflow.html#​lambda-expressions|lambda]] function) to all the elements of a list 
 +    * <​code>>>>​ my_ints = [1, 2, 3] 
 + 
 +>>> list(map(str, my_ints))  # in Python 3, map returns an iterator: use list() to get the values
 +['1', '2', '3']
 +
 +>>> list(map(lambda ii: str(10*ii + 5), my_ints))
 +['​15',​ '​25',​ '​35'​]</​code>​ 
 + 
 +  * The [[https://​docs.python.org/​3/​library/​itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping 
 +    * Example: replacing nested loops with [[https://​docs.python.org/​3/​library/​itertools.html#​itertools.product|product]] 
 +      * <code>>>> import itertools as it
 +>>> it.product('AB', '01')
 +<​itertools.product object at 0x2b35a7b5f100>​ 
 + 
 +>>>​ list(it.product('​AB',​ '​01'​)) 
 +[('​A',​ '​0'​),​ ('​A',​ '​1'​),​ ('​B',​ '​0'​),​ ('​B',​ '​1'​)] 
 + 
 +>>>​ for c1, c2 in it.product('​AB',​ '​01'​):​ 
 +...   ​print(c1 + c2) 
 +... 
 +A0 
 +A1 
 +B0 
 +B1 
 + 
 +>>>​ for c1, c2 in it.product(['​A',​ '​B'​],​ ['​0',​ '​1'​]):​ 
 +...   ​print(c1 + c2) 
 +... 
 +A0 
 +A1 
 +B0 
 +B1 
 + 
 +>>>​ for c1, c2, c3 in it.product('​AB',​ '​01',​ '​$!'​):​ 
 +...   ​print(c1 + c2 + c3, end=', ') 
 +... 
 +A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</​code>​ 
 + 
 +  * The [[https://​docs.python.org/​3/​tutorial/​datastructures.html?​highlight=comprehension#​list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists 
 +    * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''​map''​ function detailed above 
 +      * <​code>>>>​ my_ints = [1, 2, 3] 
 + 
 +>>>​ [ str(ii) for ii in my_ints ] 
 +['​1',​ '​2',​ '​3'​]</​code>​ 
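The three approaches listed above can be compared on the same small problem. This sketch only checks that they give the same result; it does not measure the execution time.

```python
import itertools

letters, digits = 'AB', '01'

# Explicit nested loops
nested = []
for c1 in letters:
    for c2 in digits:
        nested.append(c1 + c2)

# The same thing with itertools.product and a list comprehension
with_product = [c1 + c2 for c1, c2 in itertools.product(letters, digits)]

# The same thing with map: each tuple generated by product is joined
with_map = list(map(''.join, itertools.product(letters, digits)))

print(nested, with_product, with_map)
```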
 +===== numpy related stuff ===== 
 + 
 +==== Using a numpy array to store arbitrary objects ==== 
 + 
 +The numpy arrays are usually used to store [[https://​numpy.org/​doc/​stable/​reference/​arrays.scalars.html|scalars]] of the same type (see also the [[https://​numpy.org/​doc/​stable/​reference/​arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values. 
 + 
 +It is also possible to store **arbitrary** Python objects in an array, rather than using nested lists or dictionaries! 
 + 
 +<​code>>>>​ some_array = np.empty((2,​ 3), dtype=object) 
 +>>>​ some_array 
 +array([[None,​ None, None], 
 +       ​[None,​ None, None]], dtype=object) 
 +>>>​ some_array.shape 
 +(2, 3) 
 +>>>​ print(some_array[-1,​ -1]) 
 +None 
 +>>>​ some_array[-1,​ 0] = filled_contour # e.g. save an existing cartopy filled contour object 
 +>>>​ some_array 
 +array([[None,​ None, None], 
 +       ​[<​cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>,​ 
 +        None, None]], dtype=object)</​code>​ 
 + 
 +         
 +==== Dealing with a variable number of indices ==== 
 + 
 +[[https://​numpy.org/​doc/​stable/​user/​basics.indexing.html#​dealing-with-variable-indices|Official reference]] 
 + 
 +<​code>>>>​ i10 = np.identity(10) 
 +>>>​ i10 
 +array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
 +       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.], 
 +... 
 +       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]) 
 +>>>​ i10.shape 
 +(10, 10) 
 + 
 +>>>​ i10[3:7, 4:6] 
 +array([[0., 0.], 
 +       [1., 0.], 
 +       [0., 1.], 
 +       [0., 0.]]) 
 +        
 +>>>​ s0 = slice(3, 7) 
 +>>>​ s1 = slice(4, 6) 
 +>>>​ i10[s0, s1] 
 +array([[0., 0.], 
 +       [1., 0.], 
 +       [0., 1.], 
 +       [0., 0.]]) 
 +        
 +>>>​ my_slices = (s0, s1) 
 +>>>​ i10[my_slices] 
 +array([[0., 0.], 
 +       [1., 0.], 
 +       [0., 1.], 
 +       [0., 0.]]) 
 +        
 +>>>​ my_fancy_slices = (s0, Ellipsis) 
 +>>>​ i10[my_fancy_slices] 
 +array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], 
 +       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.], 
 +       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], 
 +       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]]) 
 +>>>​ i10[my_fancy_slices].shape 
 +(4, 10) 
 + 
 +>>>​ # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY 
 +>>>​ # and that you can change the content of the original array by mistake 
 +>>>​ my_view = i10[my_slices] 
 +>>>​ my_view[:, :] = -1 
 +>>>​ my_view 
 +array([[-1.,​ -1.], 
 +       [-1., -1.], 
 +       [-1., -1.], 
 +       [-1., -1.]]) 
 +>>>​ i10 
 +array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.], 
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.], 
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</​code>​ 
 + 
 + 
 +==== Finding and counting unique values ​====
  
 Use ''​np.unique'',​ do **not** try to use histogram related functions! Use ''​np.unique'',​ do **not** try to use histogram related functions!
Line 268: Line 473:
 array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</​code>​ array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</​code>​
  
-=== Applying a ufunc over all the elements of an array ===+ 
 +==== Applying a ufunc over all the elements of an array ====
  
 There are all sorts of //ufuncs// (Universal Functions), and we will just use below ''​add''​ from the [[https://​numpy.org/​doc/​stable/​reference/​ufuncs.html#​math-operations|math operations]],​ applied on the arrays defined in [[#​finding_and_counting_unique_values|Finding and counting unique values]] There are all sorts of //ufuncs// (Universal Functions), and we will just use below ''​add''​ from the [[https://​numpy.org/​doc/​stable/​reference/​ufuncs.html#​math-operations|math operations]],​ applied on the arrays defined in [[#​finding_and_counting_unique_values|Finding and counting unique values]]
Line 301: Line 507:
 (3.0, 4.5, 8.0)</​code>​ (3.0, 4.5, 8.0)</​code>​
  
-=== Applying a ufunc over specified sections of an array ===+ 
 +==== Applying a ufunc over specified sections of an array ====
  
 The [[https://​numpy.org/​doc/​stable/​reference/​generated/​numpy.ufunc.reduceat.html#​numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit python loops, and improve the speed (but not the readability...) of a script. The example below //​improves//​ what has been shown above The [[https://​numpy.org/​doc/​stable/​reference/​generated/​numpy.ufunc.reduceat.html#​numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit python loops, and improve the speed (but not the readability...) of a script. The example below //​improves//​ what has been shown above
Line 319: Line 526:
 >>>​ np.add.reduceat(np.sort(vals),​ slices_indices) >>>​ np.add.reduceat(np.sort(vals),​ slices_indices)
 array([3. , 4.5, 8. ])</​code>​ array([3. , 4.5, 8. ])</​code>​
 +
 +==== Exercise your brain with numpy ====
 +
 +Have a look at [[https://​github.com/​rougier/​numpy-100/​blob/​master/​100_Numpy_exercises.ipynb|100 numpy exercises]]
 +
 +===== matplotlib related stuff =====
 +
 +==== Working with time axes (and ticks) ====
 +
 +If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the:
 +  * [[https://​matplotlib.org/​stable/​gallery/​index.html#​ticks|Ticks examples'​ gallery]]
 +  * [[https://​matplotlib.org/​stable/​gallery/​text_labels_and_annotations/​date.html|Date tick labels example]]
 +
 +
 +===== Data representation =====
 +
 +A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
 +
 +FIXME Add parts (pages 28 to 37) of this [[https://​wiki.lsce.ipsl.fr/​pmip3/​doku.php/​other:​python:​jyp_steps#​part_2|old tutorial]] to this section
 +
 +==== Base notions ====
 +
 +  * **Never forget** that all the bits and pieces of information we use are coded in [[https://​en.wikipedia.org/​wiki/​Binary_number#​Counting_in_binary|base 2]] (''​0''​s and ''​1''​s ...), grouped in bytes!
 +    * Some things can be stored exactly (integers, characters, ...)
 +    * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store a **//good enough approximation//**
 +
 +  * 1 byte <=> 8 bits
 +    * ''​REAL*4''​ <=> 4 bytes <=> 32 bits
 +    * For easier written/​displayed representation,​ 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://​en.wikipedia.org/​wiki/​Hexadecimal|hexadecimal representation]] (characters ''​0'',​ ''​1'',​ ..., ''​A'',​ ''​B'',​ ..., ''​F''​)
 +      * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
 +      * ''​1101''​ <=> ''​D''​ in hexadecimal <=> ''​13''​ in decimal (''​**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1''​)
 +      * ''​11111101''​ in //base 2// <=> ''​1111 1101''​ <=> ''​FD''​ in //​hexadecimal//​ <=> ''​253''​ (''​15 * 16 + 13''​) in //decimal//
 +
 +  * Base conversion with Python
 +    * <​code>>>>​ hex(13) # Decimal to Hexadecimal conversion
 +'​0xd'​
 +>>>​ hex(253)
 +'​0xfd'​
 +>>>​ hex(256)
 +'​0x100'​
 +>>>​ int('​0x100',​ 16) # Hexadecimal to Decimal conversion
 +256
 +>>>​ int('​1111',​ 2) # Binary to Decimal conversion
 +15
 +>>>​ int('​11111101',​ 2) # '​11111101'​ <=> '1111 1101' <=> '​FD'​ <=> 15 * 16 + 13 = 253
 +253
 +>>> 0o13 # DANGER! Octal literals use the 0o prefix. Python 2 also read a bare leading 0 (e.g. 013) as OCTAL, but this is a SyntaxError in Python 3
 +11
 +>>>​ int('​13',​ 8) # 1*8 + 3
 +11</​code>​
 +
 +  * More technical topics
 +    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit)
 +    * [[https://​en.wikipedia.org/​wiki/​Endianness|Endianness]]:​ the art of ordering bytes
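Endianness can be sketched with the standard library only, by storing the integer ''1'' on 4 bytes with both byte orders. Note that ''bytes.hex'' only accepts a separator in recent Python versions (3.8 and later).

```python
import struct
import sys

value = 1

# The same 4-byte integer, stored with the two possible byte orders
little = value.to_bytes(4, byteorder='little')
big = value.to_bytes(4, byteorder='big')
print(little.hex(' '), '|', big.hex(' '))

# The struct module uses '<' for little-endian and '>' for big-endian
same_as_struct = (struct.pack('<i', value) == little
                  and struct.pack('>i', value) == big)

# Reading the bytes with the wrong byte order gives a different number
misread = int.from_bytes(little, byteorder='big')
print(misread)

# Byte order of the machine running this sketch: 'little' or 'big'
print(sys.byteorder)
```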
 +==== Numerical values ====
 +
 +  * Binary data representation of some numbers (only some common types are listed here):
 +    * Languages and packages **references** used below:
 +      * Python: [[https://​numpy.org/​doc/​stable/​reference/​arrays.scalars.html#​sized-aliases|NumPy Sized aliases]]
 +      * NetCDF: [[https://​docs.unidata.ucar.edu/​nug/​current/​md_types.html|Data Types]], [[https://​docs.unidata.ucar.edu/​netcdf-fortran/​current/​f90-variables.html#​f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://​docs.unidata.ucar.edu/​nug/​current/​_c_d_l.html#​cdl_data_types|CDL Data Types]]
 +      * Fortran: Intel Fortran Compiler [[https://​www.intel.com/​content/​www/​us/​en/​docs/​fortran-compiler/​developer-guide-reference/​2023-1/​intrinsic-data-types.html|Intrinsic Data Types]]
 +    * [[https://​en.wikipedia.org/​wiki/​Integer_(computer_science)|Integers]]
 +      * Range:
 +        * 4-byte //signed// integers: ''​−2,​147,​483,​648''​ to ''​2,​147,​483,​647''​
 +          * Python: ''​numpy.int32''​
 +          * NetCDF: ''​int'',​ ''​NC_INT''​ or ''​NC_LONG'',​ ''​NF90_INT''​
 +          * Fortran: ''​INTEGER*4''​
 +        * 8-byte //signed// integers: ''​−9,​223,​372,​036,​854,​775,​808''​ to ''​9,​223,​372,​036,​854,​775,​807''​
 +          * Python: ''​numpy.int64''​
 +          * NetCDF: ''​int64'',​ ''​NC_INT64''​
 +          * Fortran: ''​INTEGER*8''​
 +      * Tech note: signed integers use [[https://​en.wikipedia.org/​wiki/​Two%27s_complement|two'​s complement]] for coding negative integers
 +    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
 +      * Range:
 +        * 4-byte float: ''~7 significant digits * 10^±38''
 +          * Python: ''​numpy.float32''​
 +          * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
 +          * Fortran: ''REAL*4''
 +          * See also [[https://​en.wikipedia.org/​wiki/​Single-precision_floating-point_format|Single-precision floating-point format]]
 +        * 8-byte float: ''~15 significant digits * 10^±308''
 +          * Python: ''​numpy.float64''​
 +          * NetCDF: ''​double'',​ ''​NC_DOUBLE'',​ ''​NF90_DOUBLE''​
 +          * Fortran: ''​REAL*8''​
 +      * **Special values**:
 +        * [[https://​en.wikipedia.org/​wiki/​NaN|NaN]]:​ //Not a Number//
 +          * Python: ''​numpy.nan''​
 +        * Infinity
 +          * Python: ''​-numpy.inf''​ and ''​numpy.inf''​
 +        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) rather than ''NaN''s, when you have to deal with missing values!
 +      * <wrap hi>The RISKS of working with (the wrong) floats</​wrap>:​
 +        * [[https://​en.wikipedia.org/​wiki/​Round-off_error|Round-off error]]
 +        * [[https://​en.wikipedia.org/​wiki/​Catastrophic_cancellation|Catastrophic cancellation]]
 +          * [[https://​docs.oracle.com/​cd/​E19957-01/​806-3568/​ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
 +    * A rather technical example: we //play// with a numpy 4-byte integer scalar
 +      * <​code>>>>​ one_int32 = np.int32(1)
 +>>>​ one_int32
 +1
 +>>>​ type(one_int32)
 +<class '​numpy.int32'>​
 +>>>​ one_int32.dtype
 +dtype('​int32'​)
 +>>>​ one_int32.shape # A numpy SCALAR, is an ARRAY WITH NO SHAPE !
 +()
 +>>>​ one_int32[0]
 +Traceback (most recent call last):
 +  File "<​stdin>",​ line 1, in <​module>​
 +IndexError: invalid index to scalar variable.
 +>>>​ one_int32[()] # Note how to access the single element, when there is NO SHAPE
 +1
 +>>>​ one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
 +0
 +>>>​ one_int32.size
 +1
 +>>>​ one_int32.nbytes # The element requires 4 bytes of storage
 +4
 +>>>​ hex(one_int32) # We can print the hexadecimal representation for INTEGERS scalars and arrays
 +'​0x1'​
 +>>>​ hex(one_int32 * 15)
 +'​0xf'​
 +>>>​ hex(one_int32 * 16)
 +'​0x10'​
 +
 +# '​Serialize'​ the data (i.e. change the data to a series of bytes)
 +# Note: the bytes are printed in the reverse order of 'hex(one_int32)' because this example ran
 +# on a little-endian machine, where the least significant byte is stored first
 +>>>​ one_int32_serialized = one_int32.tobytes()
 +>>>​ type(one_int32_serialized)
 +<class '​bytes'>​
 +>>>​ len(one_int32_serialized)
 +4
 +>>>​ one_int32_serialized ​
 +b'​\x01\x00\x00\x00'​
 +>>>​ one_int32_serialized.hex('​ ') # Another way to print the hexadecimal values
 +'01 00 00 00'
 +
 +# Use the following in the unlikely case where you need to change the endianness (bytes ordering)
 +>>>​ one_int32_reversed_endian = one_int32.byteswap()
 +>>>​ one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
 +16777216
 +>>>​ hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
 +'​0x1000000'​
 +>>>​ one_int32_reversed_endian.tobytes()
 +b'​\x00\x00\x00\x01'</​code>​
 +    * Another technical example: we use an array of 2 integers\\ When using ''​byteswap()'',​ notice how bytes are swapped by groups of 4 bytes, because int32 use 4 bytes
 +      * <​code>>>>​ array_example = np.asarray((3,​ 17), dtype=np.int32)
 +>>>​ array_example
 +array([ 3, 17], dtype=int32)
 +>>>​ array_example.shape,​ array_example.ndim,​ array_example.size,​ array_example.nbytes
 +((2,), 1, 2, 8)
 +>>>​ array_example.tobytes().hex('​ ', 4)
 +'​03000000 11000000'​
 +>>>​ array_example.byteswap().tobytes().hex('​ ', 4)
 +'​00000003 00000011'​
 +</​code>​
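The risks of working with floats, listed earlier in this section, can be sketched with the standard library only. The values are classic textbook cases, chosen for this illustration.

```python
import math
import struct

# Round-off: 0.1, 0.2 and 0.3 have no exact binary representation,
# so a strict comparison fails while a tolerant comparison works
exact_equality = (0.1 + 0.2 == 0.3)
tolerant_equality = math.isclose(0.1 + 0.2, 0.3)

# Accumulated round-off, and a sum that keeps track of the lost digits
naive_sum = sum([0.1] * 10)
careful_sum = math.fsum([0.1] * 10)

# A small number added to a huge one can vanish completely
big = 1e16
vanished = (big + 1.0) - big

# Storing a value as a 4-byte float keeps fewer digits than an 8-byte float
as_float32 = struct.unpack('<f', struct.pack('<f', 0.1))[0]

print(exact_equality, tolerant_equality)
print(naive_sum, careful_sum)
print(vanished)
print(as_float32)
```

In practice, compare floats with a tolerance (''math.isclose'', or ''numpy.isclose'' for arrays) rather than with ''==''.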
 +
 +  * Manipulating binary data with [[https://​docs.python.org/​3/​library/​stdtypes.html#​binary-sequence-types-bytes-bytearray-memoryview|bytes,​ bytearray, memoryview]]
 +
 +  * Array addressing
 +    * [[https://​www.geeksforgeeks.org/​calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/​|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
 +      * In other words: //using indices to go from 1-D to n-dimensional data//
 +    * The [[https://​en.wikipedia.org/​wiki/​Array_(data_structure)|array]] structure
 +    * python/C vs Fortran...
 +
 +  * disk and ram usage: how to check the usage (available ram and disk), best practice on multi-user systems (how much allowed?)
 +    * ''​du'',​ ''​df'',​ ''​cat /​proc/​meminfo'',​ ''​top''​
 +
 +  * understanding and reverse-engineering //binary// format
 +    * ''​od'',​ ''​strings''​
 +
 +  * binary vs text format: ascii, utf, raw
 +    * text related functions in python: ''​str'',​ ''​int'',​ ''​float'',​ ''​ord'',​ ...
 +      * lists conversion with ''​map''​ and ''​join''​
 +
 +  * Misc : ''​md5sum''​
 +
 +==== Strings ====
 +
 +  * Encoding, [[https://​en.wikipedia.org/​wiki/​ASCII|ASCII]],​ [[https://​en.wikipedia.org/​wiki/​Unicode|unicode]],​ [[https://​en.wikipedia.org/​wiki/​UTF-8|UTF-8]],​ ...
 +
 +  * Getting the binary representation of a string
 +    * <​code>>>>​ test_string = 'A B 0 1 à µ'
 +>>>​ type(test_string)
 +<class '​str'>​
 +>>>​ len(test_string)
 +11
 +>>>​ test_string_bin = test_string.encode('​utf-8'​)
 +>>>​ test_string_bin
 +b'A B 0 1 \xc3\xa0 \xc2\xb5'​
 +>>>​ type(test_string_bin)
 +<class '​bytes'>​
 +>>>​ len(test_string_bin)
 +13
 +>>>​ test_string_bin.hex('​-'​)
 +'​41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'​
 +</​code>​
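A complementary sketch showing what happens when the bytes are decoded, possibly with the wrong encoding. The test string is the same as above, written with escape sequences so that the example does not depend on the encoding of the source file.

```python
# Same string as above: 'A B 0 1', then a-grave (U+00E0) and micro sign (U+00B5)
test_string = 'A B 0 1 \u00e0 \u00b5'

as_utf8 = test_string.encode('utf-8')      # accented chars need 2 bytes
as_latin1 = test_string.encode('latin-1')  # 1 byte for each character
print(len(test_string), len(as_utf8), len(as_latin1))

# Decoding with the encoding used for writing gives back the original string
round_trip = as_utf8.decode('utf-8')

# Decoding with another encoding does not fail here, but gives garbage
wrong_decoding = as_utf8.decode('latin-1')
print(repr(wrong_decoding))

# ASCII can not represent the accented characters: replace them with '?'
ascii_with_replacement = test_string.encode('ascii', errors='replace')
print(ascii_with_replacement)
```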
 +
  
 /* /*
-==== Tip template ====+===== Tip template ​=====
  
 <​code>​Some code</​code>​ <​code>​Some code</​code>​
other/python/misc_by_jyp.1646757610.txt.gz · Last modified: 2022/03/08 16:40 by jypeter