other:python:misc_by_jyp [2022/02/21 14:47] jypeter [numpy related stuff] Added np.unique example
other:python:misc_by_jyp [2022/12/12 13:50] jypeter Improved by changing the sections' levels
[...]
</WRAP>
===== Reading/setting environment variables =====
<code>>>> os.environ['TMPDIR']
[...]
</code>
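A minimal sketch of reading, setting and removing environment variables with the standard ''os'' module. The variable names ''JYP_HYPOTHETICAL_VAR'' and ''JYP_HYPOTHETICAL_UNSET_VAR'' are made up for the example

```python
import os

# Reading: os.environ['NAME'] raises a KeyError if NAME is not defined,
# so os.environ.get() or os.getenv() with a default value is often safer
tmp_dir = os.environ.get('TMPDIR', '/tmp')
missing = os.getenv('JYP_HYPOTHETICAL_UNSET_VAR', 'default value')

# Setting: only affects the current process and the processes it starts
# (e.g. with subprocess), NOT the shell that launched the script
os.environ['JYP_HYPOTHETICAL_VAR'] = '42'  # values must be strings
read_back = os.environ['JYP_HYPOTHETICAL_VAR']

# Removing (the None default avoids a KeyError if the variable is not set)
os.environ.pop('JYP_HYPOTHETICAL_VAR', None)
```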

===== Generating (aka raising) an error =====
This will stop the script, unless the code raising the error is running (directly, or indirectly in a called function) inside a ''try'' block whose ''except'' clause explicitly catches and deals with this type of error
[...]
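A minimal sketch, using a hypothetical ''check_positive'' function, showing an error being raised and then caught by the calling code

```python
def check_positive(value):
    """Raise a ValueError if 'value' is not strictly positive (hypothetical helper)"""
    if value <= 0:
        raise ValueError(f'Expected a positive value, got {value}')
    return value

# The calling code catches the error and decides what to do...
try:
    check_positive(-3)
except ValueError as err:
    error_message = str(err)
# ...without the try/except, the error would stop the script with a traceback
```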
===== Stopping a script =====
A user can use ''CTRL-C'' or ''kill'' to stop a script, or ''CTRL-Z'' to suspend it temporarily (use ''fg'' to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error
[...]
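One standard way for a script to stop itself is ''sys.exit()''. A minimal sketch (the ''stop_here'' function is hypothetical) showing that ''sys.exit()'' actually raises a ''SystemExit'' exception, which matters if the calling code catches exceptions too broadly

```python
import sys

def stop_here(status=0):
    """Stop the script: status 0 means success, non-zero means failure"""
    sys.exit(status)

# sys.exit() raises SystemExit, which derives from BaseException (not from
# Exception): 'except Exception:' does not catch it, but a bare 'except:'
# or 'except SystemExit:' does, and would prevent the script from stopping
try:
    stop_here(2)
except SystemExit as exc:
    exit_status = exc.code  # the status passed to sys.exit()
```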
===== Checking if a file/directory is writable by the current user =====
<code>>>> os.access('/', os.W_OK)
[...]
>>> os.access('/home/jypmce/.bashrc', os.W_OK)
True</code>
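A small sketch using a temporary directory, so that it runs anywhere. The Python documentation warns that checking with ''os.access()'' before opening a file leaves room for race conditions, so simply trying to write and catching ''OSError'' is often the more robust approach

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Permission checks with os.access() (based on the real user id)
    dir_writable = os.access(tmp_dir, os.W_OK)
    test_file = os.path.join(tmp_dir, 'test.txt')
    file_exists = os.access(test_file, os.F_OK)  # False: not created yet

    # Alternative: just try to write, and deal with the error if it fails
    try:
        with open(test_file, 'w') as f_out:
            f_out.write('hello')
        can_write = True
    except OSError:
        can_write = False
```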


===== Playing with strings =====

==== Filenames, etc... ====

Check [[other:python:misc_by_jyp#working_with_paths_and_filenames|Working with paths and filenames]] and [[other:python:misc_by_jyp#generating_file_names|Generating file names]]

==== Splitting strings ====

It's easy to split a string on blanks (spaces, tabs, several blanks in a row, ...) or on a specific delimiter, but it is harder to deal with quoted sub-strings that contain blanks. The ''shlex'' module can split a string using shell-like syntax

<code>>>> str_with_blanks = 'one two\t3\t\tFOUR'
>>> str_with_blanks.split()
['one', 'two', '3', 'FOUR']

>>> str_with_simple_delimiters = '1,2,3.14, 4'
>>> str_with_simple_delimiters.split(',')
['1', '2', '3.14', ' 4']

>>> complex_string = '-o 1 --long "A string with accented chars: é è à ç"'
>>> complex_string.split()
['-o', '1', '--long', '"A', 'string', 'with', 'accented', 'chars:', 'é', 'è', 'à', 'ç"']

>>> import shlex
>>> shlex.split(complex_string)
['-o', '1', '--long', 'A string with accented chars: é è à ç']</code>

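If the fields are separated by a delimiter //and// some blanks, you can strip each field after splitting, and ''re.split()'' can handle several different delimiters at once. A minimal sketch (the ''mixed'' string is a made-up example)

```python
import re

str_with_simple_delimiters = '1,2,3.14, 4'

# Remove the blanks surrounding each field after splitting
fields = [field.strip() for field in str_with_simple_delimiters.split(',')]

# Convert the fields to numbers, if needed
values = [float(field) for field in fields]

# Split on ',' OR ';', ignoring the blanks around the delimiters
mixed = '1, 2;3 ;4'
mixed_fields = re.split(r'\s*[,;]\s*', mixed)
```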
==== Working with paths and filenames ====
[...]
>>> f_tmp.close()
>>> os.remove(f_tmp.name)</code>

===== Using command-line arguments =====

==== The extremely easy but non-flexible way: sys.argv ====
The name of the script, the number of arguments and the arguments themselves can be accessed through the ''sys.argv'' list of strings: ''sys.argv[0]'' is the name of the script, ''len(sys.argv)'' is the number of arguments (including the name of the script), and all the arguments are strings, even when they look like numbers
[...]
2 tas_tes.nc</code>
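A minimal sketch of what a script can do with ''sys.argv''. The ''describe_args'' function and the simulated argument list are hypothetical, so that the example can run without a real command line

```python
def describe_args(argv):
    """Return the script name, the number of items and the actual arguments"""
    script_name = argv[0]       # sys.argv[0] is the name of the script
    nb_items = len(argv)        # includes the name of the script!
    return script_name, nb_items, argv[1:]

# In a real script, you would use (after 'import sys'): describe_args(sys.argv)
# Here, we simulate the command line 'python my_script.py tas.nc 2'
name, nb_items, args = describe_args(['my_script.py', 'tas.nc', '2'])
nb_years = int(args[1])         # arguments are strings: convert them explicitly
```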

==== The C-style way: getopt ====
Use [[https://docs.python.org/3/library/getopt.html|getopt]] (//C-style parser for command line options//)
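A minimal ''getopt'' sketch, with hypothetical ''-h''/''--help'', ''-v'' and ''-o''/''--output'' options. Unknown options or missing values raise a ''getopt.GetoptError''

```python
import getopt

def parse_options(argv):
    """Parse hypothetical -h/--help, -v and -o/--output options"""
    # 'hvo:': -h and -v are flags, -o needs a value (hence the ':')
    # ['help', 'output=']: --help is a flag, --output needs a value ('=')
    opts, remaining = getopt.getopt(argv, 'hvo:', ['help', 'output='])
    settings = {'help': False, 'verbose': False, 'output': None}
    for opt, value in opts:
        if opt in ('-h', '--help'):
            settings['help'] = True
        elif opt == '-v':
            settings['verbose'] = True
        elif opt in ('-o', '--output'):
            settings['output'] = value
    return settings, remaining

# In a real script: parse_options(sys.argv[1:])
# Here, we simulate 'my_script.py -v --output result.nc input.nc'
settings, remaining = parse_options(['-v', '--output', 'result.nc', 'input.nc'])
```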

==== The deprecated Python way: optparse ====
[[https://docs.python.org/3/library/optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use ''argparse'' (check [[https://docs.python.org/3/library/argparse.html#upgrading-optparse-code|Upgrading optparse code]] for converting from ''optparse'' to ''argparse'')

==== The current Python way: argparse ====
[[https://docs.python.org/3/library/argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//) is available since Python version 3.2
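A minimal ''argparse'' sketch, with hypothetical arguments. ''argparse'' automatically generates the ''-h''/''--help'' option and the usage message, and converts values with ''type=...''

```python
import argparse

parser = argparse.ArgumentParser(description='Hypothetical example script')
parser.add_argument('input_file', help='input NetCDF file')
parser.add_argument('-o', '--output', default='out.nc', help='output file')
parser.add_argument('-n', '--nb-years', type=int, default=1,
                    help='number of years to process')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='print more information')

# In a real script: args = parser.parse_args()  (reads sys.argv[1:])
# Here, we simulate 'my_script.py tas.nc -n 3 --verbose'
args = parser.parse_args(['tas.nc', '-n', '3', '--verbose'])
# '--nb-years' is available as args.nb_years (dashes become underscores)
```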

===== Using ordered dictionaries =====
**Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also guarantees the order since version **3.7** (it was already the case in version 3.6, but only as a CPython implementation detail)
[...]
Check the [[https://docs.python.org/3/library/collections.html#collections.OrderedDict|OrderedDict class]] (''from collections import OrderedDict'') and the [[https://realpython.com/python-ordereddict/|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial
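A minimal sketch, with made-up keys, of a few things an ''OrderedDict'' still does differently from a plain ''dict'': ''move_to_end()'', ''popitem(last=False)'' and order-sensitive equality tests

```python
from collections import OrderedDict

# A plain dict keeps the insertion order (Python 3.7+)
plain = {'b': 1, 'a': 2}
plain['c'] = 3

ordered = OrderedDict(plain)
ordered.move_to_end('b')                     # move an existing key to the end
first_removed = ordered.popitem(last=False)  # remove the FIRST item (FIFO)

# Comparing two OrderedDicts takes the order into account...
same_order = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)  # False
# ...but comparing plain dicts does not
same_dict = dict(a=1, b=2) == dict(b=2, a=1)                 # True
```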

===== Using sets =====
[[https://docs.python.org/3/tutorial/datastructures.html#sets|Python sets]] are **groups of unique elements**. They can be used to quickly find all the unique elements of //something//, and to compute the **intersection**, **union** (and other similar operations) of several groups of elements.
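A minimal sketch of the usual set operations, with made-up model and variable names

```python
# Unique elements of a list (a set has NO particular order)
models = ['IPSL', 'CNRM', 'IPSL', 'MPI', 'CNRM']
unique_models = sorted(set(models))   # sort if you need a predictable order

available_a = {'tas', 'pr', 'psl'}
available_b = {'tas', 'pr', 'uas'}

in_both = available_a & available_b      # intersection
in_any = available_a | available_b       # union
only_in_a = available_a - available_b    # difference
in_only_one = available_a ^ available_b  # symmetric difference
```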

===== Printing a readable version of long lists or dictionaries =====
The [[https://docs.python.org/3/library/pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries, ...). It will wrap long lines in a meaningful way
[...]
</code>
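A minimal sketch with made-up data: ''pprint.pformat()'' returns the formatted string (handy for writing to a log file), and ''width'' controls where lines are wrapped

```python
import pprint

data = {'experiment': 'historical',
        'variables': ['tas', 'pr', 'psl', 'uas', 'vas', 'hurs', 'huss', 'rsds'],
        'years': list(range(1850, 1860))}

# pformat() returns the string that pprint() would print;
# 'width' is the maximum number of characters per line (default: 80)
text = pprint.pformat(data, width=40)
pprint.pprint(data, width=40)  # print it directly
```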
- | ==== Sorting ==== | + | |
+ | ===== Storing objects and data in a file (shelve and friends) ===== | ||
+ | |||
+ | The built-in [[other:python:jyp_steps#the_shelve_package|shelve]] module can be **easily** used for storing temporary/intermediate data | ||
+ | |||
+ | More options: | ||
+ | * Some [[other:python:jyp_steps#data_file_formats|non-NetCDF]] file formats | ||
+ | * Working with [[other:python:jyp_steps#netcdf_filesusing_cdms2_xarray_and_netcdf4|NetCDF]] files | ||
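A minimal ''shelve'' sketch, storing made-up data in a temporary directory. A shelf behaves (mostly) like a dictionary with string keys, and any object that can be pickled can be stored as a value

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, 'intermediate_results')

    # Store some objects
    with shelve.open(db_path) as shelf:
        shelf['monthly_means'] = [1.5, 2.0, 2.5]
        shelf['metadata'] = {'model': 'IPSL', 'years': (1850, 1859)}

    # Read them back later (e.g. in another script)
    with shelve.open(db_path) as shelf:
        restored_means = shelf['monthly_means']
        restored_keys = sorted(shelf.keys())
```

Modifying a stored mutable object in place (e.g. ''shelf['monthly_means'].append(3.0)'') is not saved unless the shelf was opened with ''writeback=True'': reassign the value instead. Since ''shelve'' relies on ''pickle'', do not open shelf files coming from untrusted sources.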


===== Using a configuration file =====

The built-in [[https://docs.python.org/3/library/configparser.html|configparser]] module can be easily used for reading (**and** writing!) text configuration files.

Note: a configuration file is also a way to easily store and exchange text data!
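A minimal ''configparser'' sketch, writing and then reading back a made-up configuration file in a temporary directory. All the values are stored as strings, but helpers such as ''getint()'' and ''getboolean()'' can convert them

```python
import configparser
import os
import tempfile

config = configparser.ConfigParser()
config['general'] = {'model': 'IPSL-CM6A-LR', 'verbose': 'yes'}
config['period'] = {'first_year': '1850', 'last_year': '1859'}

with tempfile.TemporaryDirectory() as tmp_dir:
    cfg_path = os.path.join(tmp_dir, 'settings.ini')

    # Write the configuration to a text (.ini style) file
    with open(cfg_path, 'w') as f_cfg:
        config.write(f_cfg)

    # Read it back: the values are strings...
    new_config = configparser.ConfigParser()
    new_config.read(cfg_path)
    model = new_config['general']['model']
    # ...but there are helpers for converting them
    verbose = new_config.getboolean('general', 'verbose')
    first_year = new_config.getint('period', 'first_year')
```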

===== Sorting =====
  * When dealing with **numerical values**, you should use the [[https://numpy.org/doc/stable/reference/routines.sort.html|numpy sorting, searching, and counting routines]]!
[...]
['c', 'd', 'b', 'a']</code>
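A minimal sketch of sorting Python lists with ''sorted()'' and its ''key'' and ''reverse'' parameters. The file names follow a hypothetical ''variable_year.nc'' naming convention

```python
files = ['tas_1860.nc', 'pr_1870.nc', 'uas_1850.nc']

# sorted() returns a NEW sorted list (files.sort() would sort in place)
alphabetical = sorted(files)

# 'key' is applied to each element before comparing: sort on the year
by_year = sorted(files, key=lambda name: int(name.split('_')[1][:4]))

# Sort the keys of a dictionary according to their values, largest first
counts = {'a': 1, 'b': 5, 'c': 3}
keys_by_count = sorted(counts, key=counts.get, reverse=True)
```

Python's sort is stable: elements with equal keys keep their original relative order.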
===== numpy related stuff =====
==== Using a numpy array to store arbitrary objects ====

Numpy arrays are usually used to store [[https://numpy.org/doc/stable/reference/arrays.scalars.html|scalars]] of the same type (see also the [[https://numpy.org/doc/stable/reference/arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values.

It is also possible to store **arbitrary** Python objects in an array (with ''dtype=object''), rather than using nested lists or dictionaries!

<code>>>> import numpy as np
>>> some_array = np.empty((2, 3), dtype=object)
>>> some_array
array([[None, None, None],
       [None, None, None]], dtype=object)
>>> some_array.shape
(2, 3)
>>> print(some_array[-1, -1])
None
>>> some_array[-1, 0] = filled_contour  # e.g. an existing cartopy filled contour object
>>> some_array
array([[None, None, None],
       [<cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>,
        None, None]], dtype=object)</code>
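A self-contained variant of the example above (no cartopy object needed), storing made-up Python objects of different types and then locating the cells that are still empty

```python
import numpy as np

# A 2x3 array of Python objects, initially filled with None
results = np.empty((2, 3), dtype=object)

# Made-up objects of different types
results[0, 0] = {'name': 'tas', 'units': 'K'}
results[0, 1] = [1, 2, 3]
results[1, 2] = 'some string'

# Find the cells that have not been filled yet (still None)
is_empty = np.array([cell is None for cell in results.flat]).reshape(results.shape)
empty_cells = np.argwhere(is_empty)  # one (row, column) pair per empty cell
```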


==== Dealing with a variable number of indices ====

[[https://numpy.org/doc/stable/user/basics.indexing.html#dealing-with-variable-indices|Official reference]]

<code>>>> i10 = np.identity(10)
>>> i10
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       ...
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
>>> i10.shape
(10, 10)

>>> i10[3:7, 4:6]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])

>>> s0 = slice(3, 7)
>>> s1 = slice(4, 6)
>>> i10[s0, s1]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])

>>> my_slices = (s0, s1)
>>> i10[my_slices]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])

>>> my_fancy_slices = (s0, Ellipsis)
>>> i10[my_fancy_slices]
array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
>>> i10[my_fancy_slices].shape
(4, 10)

>>> # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY
>>> # and that you can change the content of the original array by mistake
>>> # (use i10[my_slices].copy() if you need an independent array)
>>> my_view = i10[my_slices]
>>> my_view[:, :] = -1
>>> my_view
array([[-1., -1.],
       [-1., -1.],
       [-1., -1.],
       [-1., -1.]])
>>> i10
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</code>
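A minimal sketch building the index tuple programmatically, so that the same code works for arrays with any number of dimensions (the ''slices_for_axis'' helper is hypothetical). ''np.s_'' is a convenient way to write a tuple of slices with the usual syntax, and ''.copy()'' gives an independent array instead of a view

```python
import numpy as np

def slices_for_axis(ndim, axis, start, stop):
    """Build an index tuple selecting start:stop along 'axis', and everything else"""
    index = [slice(None)] * ndim       # slice(None) is the same as ':'
    index[axis] = slice(start, stop)
    return tuple(index)                # index with a tuple, not with a list!

data = np.arange(24).reshape(2, 3, 4)
subset = data[slices_for_axis(data.ndim, 2, 1, 3)]   # same as data[:, :, 1:3]

# np.s_ builds the same kind of tuple as (s0, s1) above
my_slices = np.s_[3:7, 4:6]

# 'subset' is a VIEW of 'data': use .copy() to get an independent array
safe_subset = subset.copy()
safe_subset[...] = -1                                # does NOT change 'data'
```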


==== Finding and counting unique values ====
Use ''np.unique'' (with ''return_counts=True'' to also get the number of occurrences of each value), do **not** try to use histogram related functions!
[...]
>>> vals
array([1. , 2. , 1. , 2. , 2. , 1.5, 1. , 1.5, 2. , 1.5])

>>> np.unique(vals)
array([1. , 1.5, 2. ])
>>> unique_vals, nb_unique = np.unique(vals, return_counts=True)
>>> unique_vals
array([1. , 1.5, 2. ])
>>> nb_unique
array([3, 3, 4])

>>> sorted_vals = np.sort(vals)  # Sorted copy, in order to check the result
>>> sorted_vals
array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</code>
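''np.unique'' can also return, for each element of the input, the index of its value in the array of unique values (''return_inverse=True''). A self-contained sketch, re-creating ''vals'' from the output shown in this section

```python
import numpy as np

vals = np.array([1., 2., 1., 2., 2., 1.5, 1., 1.5, 2., 1.5])

unique_vals, inverse, counts = np.unique(vals, return_inverse=True,
                                         return_counts=True)

# unique_vals[inverse] rebuilds the original (1D) array
rebuilt = unique_vals[inverse]

# Pair each unique value with its number of occurrences
value_counts = dict(zip(unique_vals.tolist(), counts.tolist()))
```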


==== Applying a ufunc over all the elements of an array ====

There are many kinds of //ufuncs// (universal functions). Below, we only use ''add'' from the [[https://numpy.org/doc/stable/reference/ufuncs.html#math-operations|math operations]], applied to the arrays defined in [[#finding_and_counting_unique_values|Finding and counting unique values]]

<code># Get the sum of all the elements of 'vals'
>>> np.add.reduce(vals)
15.5
>>> np.add.reduce(sorted_vals)
15.5
>>> vals.sum()  # The usual and easy way to do it
15.5

# Compute the sum of the elements of 'nb_unique'
# AND keep (accumulate) the intermediate results (same result as np.cumsum)
>>> nb_unique
array([3, 3, 4])
>>> np.add.accumulate(nb_unique)
array([ 3,  6, 10])

# The accumulated values can be used as indices to separate the different groups of sorted values!
>>> sorted_vals
array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
>>> sorted_vals[0:3]
array([1., 1., 1.])
>>> sorted_vals[3:6]
array([1.5, 1.5, 1.5])
>>> sorted_vals[6:10]
array([2., 2., 2., 2.])

# Compute the sum of each equal-value group
>>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum()
(3.0, 4.5, 8.0)</code>
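''reduce'' and ''accumulate'' also work along a given axis of multi-dimensional arrays, and with other ufuncs than ''add''. A minimal sketch with a made-up 2D array

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# 'reduce' works along a given axis (axis=0 by default)
col_sums = np.add.reduce(table)              # same as table.sum(axis=0)
row_sums = np.add.reduce(table, axis=1)      # same as table.sum(axis=1)
total = np.add.reduce(table, axis=None)      # all the elements

# Other ufuncs can be used the same way
row_max = np.maximum.reduce(table, axis=1)   # same as table.max(axis=1)
running_product = np.multiply.accumulate([1, 2, 3, 4])
```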


==== Applying a ufunc over specified sections of an array ====

The [[https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html#numpy.ufunc.reduceat|reduceat]] method of ufuncs can be used to avoid explicit Python loops, and to improve the speed (but not the readability...) of a script. The example below //improves// what has been shown above

<code># Define a list with the start index of each interval we want to apply the 'add' function to:
# we need to add the beginning index (0), AND remove the last index
# (reduceat automatically goes to the end of the input array)
>>> nb_unique
array([3, 3, 4])
>>> slices_indices = [0] + list(np.add.accumulate(nb_unique))
>>> slices_indices.pop()  # Remove last element
10
>>> slices_indices
[0, 3, 6]

# Compute the sums over the selected intervals with just one call
>>> np.add.reduceat(np.sort(vals), slices_indices)
array([3. , 4.5, 8. ])</code>
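A self-contained sketch of the same computation, building the start indices with ''np.cumsum'' instead of a Python list. Note that when an index is not smaller than the next one, ''reduceat'' simply returns the element at that index for that slot, so the start indices should be strictly increasing

```python
import numpy as np

vals = np.array([1., 2., 1., 2., 2., 1.5, 1., 1.5, 2., 1.5])
sorted_vals = np.sort(vals)
unique_vals, nb_unique = np.unique(sorted_vals, return_counts=True)

# Start index of each group: 0, then the cumulative counts without the last one
# (np.unique(sorted_vals, return_index=True) would also give these indices)
start_indices = np.concatenate(([0], np.cumsum(nb_unique)[:-1]))

# One call per reduction, no explicit Python loop over the groups
group_sums = np.add.reduceat(sorted_vals, start_indices)
group_maxs = np.maximum.reduceat(sorted_vals, start_indices)
```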
/*
===== Tip template =====
<code>Some code</code>