other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2021/08/27 11:56]
jypeter Added file name generation section
other:python:misc_by_jyp [2022/07/08 14:00]
jypeter [numpy related stuff] Added the arbitrary object array
Line 39: Line 39:
 True</code>
  
 +==== Playing with strings ====
 +
 +=== Filenames, etc... ===
 +
 +Check [[other:​python:​misc_by_jyp#​working_with_paths_and_filenames|Working with paths and filenames]] and [[other:​python:​misc_by_jyp#​generating_file_names|Generating file names]]
 +
 +=== Splitting strings ===
 +
 +It's easy to split a string on runs of blanks, or on a specific delimiter, but quoted sub-strings that contain blanks need more care (''shlex'' can handle them)
 +
 +<​code>>>>​ str_with_blanks = '​one ​   two\t3\t\tFOUR'​
 +>>>​ str_with_blanks.split()
 +['​one',​ '​two',​ '​3',​ '​FOUR'​]
 +
 +>>>​ str_with_simple_delimiters = '​1,​2,​3.14, ​ 4'
 +>>>​ str_with_simple_delimiters.split(','​)
 +['​1',​ '​2',​ '​3.14',​ ' ​ 4']
 +
 +>>>​ complex_string='​-o 1 --long "A string with accented chars: é è à ç"'​
 +>>>​ complex_string.split()
 +['​-o',​ '​1',​ '​--long',​ '"​A',​ '​string',​ '​with',​ '​accented',​ '​chars:',​ '​\xc3\xa9',​ '​\xc3\xa8',​ '​\xc3\xa0',​ '​\xc3\xa7"'​]
 +
 +>>>​ import shlex
 +>>>​ shlex.split(complex_string)
 +['​-o',​ '​1',​ '​--long',​ 'A string with accented chars: \xc3\xa9 \xc3\xa8 \xc3\xa0 \xc3\xa7'​]</​code>​
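The session above appears to come from Python 2, where accented characters show up as escaped bytes such as ''\xc3\xa9''. The sketch below repeats the comparison under Python 3, where strings are Unicode and the accented characters are displayed unchanged. The variable names are only illustrative.

```python
import shlex

command_line = '-o 1 --long "A string with accented chars: é è à ç"'

# A plain split() cuts the quoted sub-string at every blank
naive_tokens = command_line.split()

# shlex.split() follows shell-like quoting rules and keeps the
# quoted sub-string as a single token (without the quotes)
shell_tokens = shlex.split(command_line)

print(len(naive_tokens), len(shell_tokens))
print(shell_tokens)
```

''shlex.split'' is mostly useful when the string looks like a command line; for other quoting conventions (e.g. CSV), a dedicated parser is probably a better fit.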
 ==== Working with paths and filenames ====
  
Line 165: Line 190:
  
 Check the [[https://docs.python.org/3/library/collections.html#collections.OrderedDict|OrderedDict class]] (''from collections import OrderedDict'') and the [[https://realpython.com/python-ordereddict/|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial
 +
 +==== Using sets ====
 +
 +[[https://​docs.python.org/​3/​tutorial/​datastructures.html#​sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //​something//​ and you can easily determine the **intersection**,​ **union** (and other similar operations) of sets.
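A minimal sketch of these set operations, using made-up lists of model names (the names and variables are assumptions chosen for illustration only):

```python
# Two hypothetical lists, with a duplicate in the first one
models_a = ['IPSL', 'CESM2', 'AWI', 'IPSL']
models_b = ['CESM2', 'MPI', 'IPSL']

set_a = set(models_a)  # duplicates are removed
set_b = set(models_b)

common = set_a & set_b      # intersection
all_models = set_a | set_b  # union
only_in_a = set_a - set_b   # difference

# Sets are unordered: sort them when a reproducible order is needed
print(sorted(common))
print(sorted(all_models))
print(sorted(only_in_a))
```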
  
 ==== Printing a readable version of long lists or dictionaries ====
Line 170: Line 199:
 The [[https://docs.python.org/3/library/pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries, ...). It will wrap long lines in a meaningful way
  
-<code>>>> from collections import OrderedDict
 +<code>>>> import pprint
 
->>> test_dic = OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}), ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}), ('IPSL-CM6A-LR_IPSL', {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}})])
 +>>> test_dic = {'AWI-ESM-1-1-LR_AWI':{'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR':{'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL':{'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
 
 >>> print(test_dic)
-OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}), ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}), ('IPSL-CM6A-LR_IPSL', {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}})])
 +{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
 
 >>> pprint.pprint(test_dic)
-OrderedDict([('AWI-ESM-1-1-LR_AWI', {'r1i1p1f1': {'grid': 'gn'}}),
-             ('CESM2_NCAR', {'r1i1p1f1': {'grid': 'gn'}}),
-             ('IPSL-CM6A-LR_IPSL',
-              {'r1i1p1f1': {'grid': 'gr'},
-               'r1i1p1f2': {'grid': 'gr'},
-               'r1i1p1f3': {'grid': 'gr'},
-               'r1i1p1f4': {'grid': 'gr'}})])
 +{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}},
 + 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}},
 + 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'},
 +                       'r1i1p1f2': {'grid': 'gr'},
 +                       'r1i1p1f3': {'grid': 'gr'},
 +                       'r1i1p1f4': {'grid': 'gr'}}}
 +
 +>>>​ dir(test_dic) 
 +['​__class__',​ '​__contains__',​ '​__delattr__',​ [... lots of unreadable stuff removed...'​setdefault',​ '​update',​ '​values'​] 
 + 
 +>>> pprint.pprint(dir(test_dic))
 +['​__class__',​ 
 + '​__contains__',​ 
 + 
 +[... lots of lines removed in this example ] 
 + 
 + '​setdefault',​ 
 + '​update',​ 
 + '​values'​] 
 </code>
 +
 +==== Sorting ====
 +
 +  * When dealing with **numerical values**, you should use the [[https://​numpy.org/​doc/​stable/​reference/​routines.sort.html|numpy sorting, searching, and counting routines]]!
 +  * [[https://​docs.python.org/​3/​howto/​sorting.html|Sorting HOW TO]]
 +  * Example: sorting the keys and the values of a dictionary, and then using the ''key'' parameter to sort the keys according to their associated values
 +    * If we provide a ''key'' function, ''sorted'' orders the elements by the values returned by that function, instead of by the elements themselves. The key function used below is very simple, so we can write it as a //lambda// (i.e. anonymous, defined //in place//) function
 +    * <​code>>>>​ demo_dic = {'​a':​10,​ '​b':​5,​ '​c':​-1,​ '​d':​0}
 +
 +>>>​ sorted(demo_dic.keys())
 +['​a',​ '​b',​ '​c',​ '​d'​]
 +
 +>>>​ sorted(demo_dic.values())
 +[-1, 0, 5, 10]
 +
 +>>>​ sorted(demo_dic.keys(),​ key=lambda key_name:​demo_dic[key_name])
 +['​c',​ '​d',​ '​b',​ '​a'​]</​code>​
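Two common variations on the example above, sketched with the same dictionary: sorting the (key, value) pairs themselves, and sorting in descending order with ''reverse=True''. Using ''demo_dic.get'' as the key function is an alternative to the lambda, not a requirement.

```python
demo_dic = {'a': 10, 'b': 5, 'c': -1, 'd': 0}

# Sort the (key, value) pairs by value: item[1] is the value
items_by_value = sorted(demo_dic.items(), key=lambda item: item[1])

# Sort the keys by decreasing value; iterating over a dictionary
# yields its keys, and demo_dic.get(key) returns the matching value
keys_by_value_desc = sorted(demo_dic, key=demo_dic.get, reverse=True)

print(items_by_value)
print(keys_by_value_desc)
```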
 +
 +==== numpy related stuff ====
 +
 +=== Using a numpy array to store arbitrary objects ===
 +
 +The numpy arrays are usually used to store [[https://​numpy.org/​doc/​stable/​reference/​arrays.scalars.html|scalars]] of the same type (see also the [[https://​numpy.org/​doc/​stable/​reference/​arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values.
 +
 +It is also possible to store **arbitrary** Python objects in an array, rather than using nested lists or dictionaries!
 +
 +<code>>>> import numpy as np
 +>>> some_array = np.empty((2, 3), dtype=object)
 +>>>​ some_array
 +array([[None,​ None, None],
 +       ​[None,​ None, None]], dtype=object)
 +>>>​ some_array.shape
 +(2, 3)
 +>>>​ print(some_array[-1,​ -1])
 +None
 +>>>​ some_array[-1,​ 0] = filled_contour # e.g. save an existing cartopy filled contour object
 +>>>​ some_array
 +array([[None,​ None, None],
 +       ​[<​cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>,​
 +        None, None]], dtype=object)</​code>​
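The session above stores a cartopy object that only exists in the author's environment. The sketch below shows the same idea with plain Python objects, so that it can be run anywhere; the array name ''panels'' and its content are assumptions made up for illustration.

```python
import numpy as np

# A 2x3 grid of slots, all initialised to None
panels = np.empty((2, 3), dtype=object)

# Each element can hold a different kind of Python object
panels[0, 0] = {'title': 'top left'}
panels[0, 1] = 'any string'
panels[1, 2] = {3, 1, 2}

# np.ndenumerate yields (index, element) pairs, which makes it easy
# to find the slots that have been filled
filled = [(index, obj) for index, obj in np.ndenumerate(panels)
          if obj is not None]

for index, obj in filled:
    print(index, type(obj).__name__)
```

An object array keeps the usual 2D indexing and shape handling, but numerical operations on it are not vectorized the way they are for numerical dtypes.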
 +        ​
 +=== Dealing with a variable number of indices ===
 +
 +[[https://​numpy.org/​doc/​stable/​user/​basics.indexing.html#​dealing-with-variable-indices|Official reference]]
 +
 +<​code>>>>​ i10 = np.identity(10)
 +>>>​ i10
 +array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 +       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
 +...
 +       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
 +>>>​ i10.shape
 +(10, 10)
 +
 +>>>​ i10[3:7, 4:6]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ s0 = slice(3, 7)
 +>>>​ s1 = slice(4, 6)
 +>>>​ i10[s0, s1]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ my_slices = (s0, s1)
 +>>>​ i10[my_slices]
 +array([[0., 0.],
 +       [1., 0.],
 +       [0., 1.],
 +       [0., 0.]])
 +       
 +>>>​ my_fancy_slices = (s0, Ellipsis)
 +>>>​ i10[my_fancy_slices]
 +array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
 +       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
 +>>>​ i10[my_fancy_slices].shape
 +(4, 10)
 +
 +>>>​ # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY
 +>>>​ # and that you can change the content of the original array by mistake
 +>>>​ my_view = i10[my_slices]
 +>>>​ my_view[:, :] = -1
 +>>>​ my_view
 +array([[-1.,​ -1.],
 +       [-1., -1.],
 +       [-1., -1.],
 +       [-1., -1.]])
 +>>>​ i10
 +array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
 +       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</​code>​
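The tuple-of-slices technique shown above is mostly useful when the number of dimensions, or the axis to slice, is only known at run time. The helper below (''select_along_axis'' is a hypothetical name, not a numpy function) builds the index tuple programmatically, and the last lines show how ''.copy()'' protects the original array from accidental changes.

```python
import numpy as np

def select_along_axis(data, axis, start, stop):
    """Return data sliced with start:stop along 'axis' only."""
    # slice(None) is the programmatic equivalent of ':'
    index = [slice(None)] * data.ndim
    index[axis] = slice(start, stop)
    # The index must be a tuple (a list would mean fancy indexing)
    return data[tuple(index)]

cube = np.arange(24).reshape(2, 3, 4)

sub_view = select_along_axis(cube, 1, 0, 2)         # still a view
sub_copy = select_along_axis(cube, 1, 0, 2).copy()  # independent data

sub_copy[...] = -1  # Only changes the copy, 'cube' is untouched

print(sub_view.shape)
print(np.shares_memory(cube, sub_view), np.shares_memory(cube, sub_copy))
```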
 +
 +=== Finding and counting unique values ===
 +
 +Use ''​np.unique'',​ do **not** try to use histogram related functions!
 +
 +<code>>>> vals = np.random.randint(2, 5, (10,)) * 0.5 # Get 10 discrete float values
 +>>>​ vals
 +array([1. , 2. , 1. , 2. , 2. , 1.5, 1. , 1.5, 2. , 1.5])
 +
 +>>>​ np.unique(vals)
 +array([1. , 1.5, 2. ])
 +>>>​ unique_vals,​ nb_unique = np.unique(vals,​ return_counts=True)
 +>>>​ unique_vals
 +array([1. , 1.5, 2. ])
 +>>>​ nb_unique
 +array([3, 3, 4])
 +
 +>>>​ sorted_vals = np.sort(vals) # Sorted copy, in order to check the result
 +>>>​ sorted_vals
 +array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</​code>​
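''np.unique'' can also return, for each input element, the index of its value in the array of unique values (''return_inverse=True''). The sketch below uses the same ten values as the session above, typed in explicitly so that the result does not depend on the random generator.

```python
import numpy as np

vals = np.array([1., 2., 1., 2., 2., 1.5, 1., 1.5, 2., 1.5])

# The optional outputs come back in this order: inverse, then counts
unique_vals, inverse, counts = np.unique(
    vals, return_inverse=True, return_counts=True)

# 'inverse' maps each element of 'vals' to its unique value,
# so indexing with it rebuilds the original array
rebuilt = unique_vals[inverse]

print(unique_vals, counts)
print(inverse)
```

The inverse indices can be used as group labels, e.g. to color each point of a plot according to its value.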
 +
 +=== Applying a ufunc over all the elements of an array ===
 +
 +There are all sorts of //ufuncs// (Universal Functions). The examples below only use ''add'', from the [[https://numpy.org/doc/stable/reference/ufuncs.html#math-operations|math operations]], applied to the arrays defined in [[#finding_and_counting_unique_values|Finding and counting unique values]]
 +
 +<​code>#​ Get the sum of all the elements of '​vals'​
 +>>>​ np.add.reduce(vals)
 +15.5
 +>>>​ np.add.reduce(sorted_vals)
 +15.5
 +>>>​ vals.sum() # The usual and easy way to do it
 +15.5
 +
 +# Compute the sum of the elements of '​nb_unique'​
 +# AND keep (accumulate) the intermediate results
 +>>>​ nb_unique
 +array([3, 3, 4])
 +>>>​ np.add.accumulate(nb_unique)
 +array([ 3,  6, 10])
 +
 +# The accumulated values can be used as indices to separate the different groups of sorted values!
 +>>>​ sorted_vals
 +array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
 +>>>​ sorted_vals[0:​3]
 +array([1., 1., 1.])
 +>>>​ sorted_vals[3:​6]
 +array([1.5, 1.5, 1.5])
 +>>>​ sorted_vals[6:​10]
 +array([2., 2., 2., 2.])
 +
 +# Compute the sum of each equal-value group
 +>>>​ sorted_vals[0:​3].sum(),​ sorted_vals[3:​6].sum(),​ sorted_vals[6:​10].sum()
 +(3.0, 4.5, 8.0)</​code>​
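The hard-coded slices used above (''0:3'', ''3:6'', ''6:10'') can be generated from the accumulated counts. The sketch below does this with ''np.split'', which cuts an array at the given indices; it is one possible way of doing it, using the same values as the session above.

```python
import numpy as np

sorted_vals = np.array([1., 1., 1., 1.5, 1.5, 1.5, 2., 2., 2., 2.])
nb_unique = np.array([3, 3, 4])

# Accumulated counts are the END index of each group; the last one is
# the end of the array, so np.split does not need it
boundaries = np.add.accumulate(nb_unique)[:-1]

# One sub-array per group of equal values
groups = np.split(sorted_vals, boundaries)

group_sums = [float(group.sum()) for group in groups]
print(boundaries)
print(group_sums)
```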
 +
 +=== Applying a ufunc over specified sections of an array ===
 +
 +The [[https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html#numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit Python loops, and improve the speed (but not the readability...) of a script. The example below //improves// on the per-group sums computed above
 +
 +<code># Define a list with the start indices of the intervals we want to apply the 'add' function to
 +# We need to add the beginning index (0), AND remove the last index
 +# (reduceat will automatically go to the end of the input array)
 +>>>​ nb_unique
 +array([3, 3, 4])
 +>>>​ slices_indices = [0] + list(np.add.accumulate(nb_unique))
 +>>>​ slices_indices.pop() # Remove last element
 +10
 +>>>​ slices_indices
 +[0, 3, 6]
 +
 +# Compute the sums over the selected intervals with just one call
 +>>>​ np.add.reduceat(np.sort(vals),​ slices_indices)
 +array([3. , 4.5, 8. ])</​code>​
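The steps above can be wrapped in a small function. This is only a sketch: ''sums_by_value'' is a hypothetical helper name, and it builds the start indices with ''np.concatenate'' and ''np.cumsum'' instead of the list manipulation used in the session.

```python
import numpy as np

def sums_by_value(values):
    """Return the unique values and the sum of each equal-value group."""
    sorted_values = np.sort(values)
    unique_values, counts = np.unique(sorted_values, return_counts=True)
    # Start index of each group: 0, then the accumulated counts
    # without the last one (reduceat goes to the end by itself)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return unique_values, np.add.reduceat(sorted_values, starts)

vals = np.array([1., 2., 1., 2., 2., 1.5, 1., 1.5, 2., 1.5])
unique_vals, sums = sums_by_value(vals)

print(unique_vals)
print(sums)
```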
  
 /* /*
other/python/misc_by_jyp.txt · Last modified: 2023/12/08 15:51 by jypeter