Differences

This shows you the differences between two versions of the page.

--- other:python:misc_by_jyp [2022/02/21 12:47]
jypeter [Sorting] Added link to numpy routines
+++ other:python:misc_by_jyp [2023/04/26 15:50]
jypeter Started a Data representation section
@@ Line 5: / Line 5: @@
 </WRAP>
-==== Reading/setting environment variables ====
+===== Reading/setting environment variables =====
 <code>>>> os.environ['TMPDIR']
@@ Line 17: / Line 17: @@
 </code>
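A minimal sketch of reading a variable with a fallback value, and of setting a variable that child processes will inherit. ''MY_SETTING'' is a hypothetical variable name, used only for illustration.

```python
import os
import subprocess
import sys

# Read a variable, with a default value if it is not defined
# (os.environ['TMPDIR'] would raise a KeyError instead)
tmp_dir = os.environ.get('TMPDIR', '/tmp')

# Set a (hypothetical) variable: the new value is only visible to the
# current Python process and to the child processes it starts afterwards,
# NOT to the shell that launched the script
os.environ['MY_SETTING'] = 'some_value'

# A child process inherits the modified environment
child_value = subprocess.run(
    [sys.executable, '-c', "import os; print(os.environ['MY_SETTING'])"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(child_value)  # some_value

# Remove the variable again (pop does not fail if it is missing)
os.environ.pop('MY_SETTING', None)
```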
-==== Generating (aka raising) an error ====
+===== Generating (aka raising) an error =====
 This will stop the script, unless the error is raised inside a function and the code calling that function explicitly catches and handles the error
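A minimal sketch of both sides: a function that raises an error, and calling code that catches it so that the script keeps running. ''check_positive'' is a hypothetical helper.

```python
def check_positive(value):
    """Raise a ValueError if 'value' is not strictly positive (hypothetical helper)."""
    if value <= 0:
        raise ValueError(f'Expected a positive value, got {value}')
    return value

# The calling code explicitly catches the error, so the script does NOT stop
try:
    check_positive(-3)
except ValueError as err:
    message = str(err)
    print('Caught:', message)  # Caught: Expected a positive value, got -3

# Without the try/except, the same call would stop the script with a traceback
```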
@@ Line 25: / Line 26: @@
-==== Stopping a script ====
+===== Stopping a script =====
 A user can use ''CTRL-C'' or ''kill'' to stop a script, or ''CTRL-Z'' to suspend it temporarily (use ''fg'' to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error
@@ Line 31: / Line 32: @@
 <code>sys.exit('Some optional message about why we are stopping')</code>
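''sys.exit'' actually raises a ''SystemExit'' exception: a string argument is printed on the standard error and gives an exit status of 1, while an integer argument is used directly as the exit status (0 meaning success). A minimal sketch checking this with small child scripts:

```python
import subprocess
import sys

def run_snippet(code):
    """Run 'code' in a separate Python process and return the result."""
    return subprocess.run([sys.executable, '-c', code],
                          capture_output=True, text=True)

with_message = run_snippet("import sys; sys.exit('Stopping: missing input file')")
print(with_message.returncode, with_message.stderr.strip())  # 1 Stopping: missing input file

with_status = run_snippet("import sys; sys.exit(3)")
print(with_status.returncode)  # 3

# Since sys.exit raises SystemExit, it can also be caught (rarely useful)
try:
    sys.exit(0)
except SystemExit as exc:
    caught_code = exc.code
```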
+===== Data representation =====
-==== Checking if a file/directory is writable by the current user ====
+A few notes for a future section or page about data representation (bits and bytes) on disk and in memory, vs. data formats
+  * Binary data representation
+    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]
+    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]
+    * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
+      * Using [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for negative integers
+      * Range:
+        * 4-byte signed integers: −2,147,483,648 to 2,147,483,647
+        * 8-byte signed integers: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
+    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard)
+      * Range:
+        * 4-byte float: ~7 significant digits, magnitudes from ~1.2E-38 to ~3.4E+38
+          * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
+        * 8-byte float: ~15-16 significant digits, magnitudes from ~2.2E-308 to ~1.8E+308
+  * Array addressing
+  * Disk and RAM usage: how to check the usage (available RAM and disk space), best practices on multi-user systems (how much is allowed?)
+    * ''du'', ''df'', ''cat /proc/meminfo'', ''top''
+  * Understanding and reverse-engineering //binary// formats
+    * ''od'', ''strings''
+  * Binary vs. text formats: ASCII, UTF, raw
+    * Text-related functions in Python: ''str'', ''int'', ''float'', ''ord'', ...
+      * Converting lists with ''map'' and ''join''
+  * Misc: ''md5sum''
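A minimal sketch of some of the points listed above, using only the standard ''struct'' and ''sys'' modules: byte order, two's complement, integer ranges, floating point precision, and text vs. bytes.

```python
import struct
import sys

# Endianness: the same 4-byte integer, little-endian ('<') vs big-endian ('>')
little = struct.pack('<i', 1).hex()
big = struct.pack('>i', 1).hex()
print(little, big)  # 01000000 00000001

# Two's complement: -1 is stored with all the bits set
minus_one = struct.pack('<i', -1).hex()
print(minus_one)  # ffffffff

# Range of signed n-byte integers: -2**(8n-1) to 2**(8n-1) - 1
int4_min, int4_max = -2**31, 2**31 - 1
int8_min, int8_max = -2**63, 2**63 - 1
struct.pack('<i', int4_max)  # fits in 4 bytes
overflow_detected = False
try:
    struct.pack('<i', int4_max + 1)  # does NOT fit in 4 bytes
except struct.error:
    overflow_detected = True

# Precision: 0.1 stored as a 4-byte float, then read back as a Python (8-byte) float
as_float32 = struct.unpack('<f', struct.pack('<f', 0.1))[0]
print(as_float32)  # 0.10000000149011612

# Python floats are 8-byte IEEE 754 numbers
print(sys.float_info.dig, sys.float_info.max)  # 15 1.7976931348623157e+308

# Text vs binary: character, code point, UTF-8 bytes
print(ord('é'), 'é'.encode('utf-8'))  # 233 b'\xc3\xa9'

# Converting a list of numbers to one text line with map and join
joined = ' '.join(map(str, [1, 2, 3]))
print(joined)  # 1 2 3
```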
+===== Checking if a file/directory is writable by the current user =====
 <code>>>> os.access('/', os.W_OK)
@@ Line 38: / Line 69: @@
 >>> os.access('/home/jypmce/.bashrc', os.W_OK)
 True</code>
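Note that ''os.access'' returns ''False'' (rather than raising an error) for a path that does not exist. A minimal sketch, using a temporary directory so that the results do not depend on the machine:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # A directory we have just created is writable by us
    dir_writable = os.access(tmp_dir, os.W_OK)
    # A path that does not exist is reported as NOT writable (no exception)
    missing = os.path.join(tmp_dir, 'does_not_exist', 'file.txt')
    missing_writable = os.access(missing, os.W_OK)
    # To know if a NEW file can be created, check its parent directory instead
    new_file = os.path.join(tmp_dir, 'new.txt')
    can_create = os.access(os.path.dirname(new_file), os.W_OK)

print(dir_writable, missing_writable, can_create)  # True False True
```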
+===== Playing with strings =====
+==== Splitting (complex) strings ====
+It's easy to split a string on (multiple) blanks or on a specific delimiter, but it is harder to deal with quoted sub-strings. The ''shlex'' module can split a string the way a shell would
+<code>>>> str_with_blanks = 'one    two\t3\t\tFOUR'
+>>> str_with_blanks.split()
+['one', 'two', '3', 'FOUR']
+>>> str_with_simple_delimiters = '1,2,3.14,  4'
+>>> str_with_simple_delimiters.split(',')
+['1', '2', '3.14', '  4']
+>>> complex_string='-o 1 --long "A string with accented chars: é è à ç"'
+>>> complex_string.split()
+['-o', '1', '--long', '"A', 'string', 'with', 'accented', 'chars:', 'é', 'è', 'à', 'ç"']
+>>> import shlex
+>>> shlex.split(complex_string)
+['-o', '1', '--long', 'A string with accented chars: é è à ç']</code>
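''shlex'' also works in the other direction: ''shlex.join'' (Python 3.8+) quotes the arguments that need it, so that the resulting string can be split again safely. A minimal sketch:

```python
import shlex

args = ['-o', '1', '--long', 'A string with accented chars: é è à ç']
command_line = shlex.join(args)
print(command_line)  # -o 1 --long 'A string with accented chars: é è à ç'

# Splitting the joined string gives back the original list
round_trip_ok = shlex.split(command_line) == args
print(round_trip_ok)  # True

# Single quotes, double quotes and escaped blanks are all handled
mixed = shlex.split("a 'b c' \"d e\" f\\ g")
print(mixed)  # ['a', 'b c', 'd e', 'f g']
```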
 ==== Working with paths and filenames ====
@@ Line 124: / Line 180: @@
 >>> f_tmp.close()
 >>> os.remove(f_tmp.name)</code>
-==== Using command-line arguments ====
-=== The extremely easy but non-flexible way: sys.argv ===
+===== Using command-line arguments =====
+==== The extremely easy but non-flexible way: sys.argv ====
 The name of the script (''sys.argv[0]''), the arguments (as strings, in ''sys.argv[1:]'') and the number of arguments including the script name (''len(sys.argv)'') can all be obtained from the ''sys.argv'' list of strings
@@ Line 148: / Line 206: @@
 tas_tes.nc</code>
-=== The C-style way: getopt ===
+==== The C-style way: getopt ====
 Use [[https://docs.python.org/3/library/getopt.html|getopt]] (//C-style parser for command line options//)
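A minimal sketch with a hypothetical set of options (''-v''/''--verbose'' and ''-o FILE''/''--output=FILE''), using an explicit argument list instead of ''sys.argv[1:]'' so that it can be tested:

```python
import getopt

# In a real script, use: argv = sys.argv[1:]
argv = ['-v', '-o', 'out.nc', 'tas.nc', 'pr.nc']

# 'vo:' means: -v is a flag, -o requires a value (the ':')
# ['verbose', 'output='] are the equivalent long options
opts, remaining = getopt.getopt(argv, 'vo:', ['verbose', 'output='])
print(opts)       # [('-v', ''), ('-o', 'out.nc')]
print(remaining)  # ['tas.nc', 'pr.nc']

verbose = False
output_file = None
for opt, value in opts:
    if opt in ('-v', '--verbose'):
        verbose = True
    elif opt in ('-o', '--output'):
        output_file = value

# Unknown options raise getopt.GetoptError
try:
    getopt.getopt(['-x'], 'vo:')
except getopt.GetoptError as err:
    error_message = str(err)
```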
-=== The deprecated Python way: optparse ===
+==== The deprecated Python way: optparse ====
 [[https://docs.python.org/3/library/optparse.html|optparse]] (//parser for command line options//) is **deprecated since Python version 3.2**! You should now use argparse (check [[https://docs.python.org/3/library/argparse.html#upgrading-optparse-code|Upgrading optparse code]] for converting from ''optparse'' to ''argparse'')
-=== The current Python way: argparse ===
+==== The current Python way: argparse ====
 [[https://docs.python.org/3/library/argparse.html|argparse]] (//parser for command-line options, arguments and sub-commands//) is available since Python version 3.2
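A minimal sketch of a hypothetical command line with ''argparse'', which also generates the ''-h/--help'' message automatically. The list passed to ''parse_args'' is only there for testing: a real script would call ''parse_args()'' without arguments, in order to use ''sys.argv''.

```python
import argparse

parser = argparse.ArgumentParser(description='Process some (hypothetical) NetCDF files')
parser.add_argument('input_files', nargs='+', help='one or more input files')
parser.add_argument('-o', '--output', default='result.nc', help='output file name')
parser.add_argument('-v', '--verbose', action='store_true', help='print more information')
parser.add_argument('-n', '--nb-years', type=int, default=1, help='number of years')

# A real script would just use: args = parser.parse_args()
args = parser.parse_args(['-v', '--nb-years', '10', 'tas.nc', 'pr.nc'])
print(args.input_files, args.output, args.verbose, args.nb_years)
# ['tas.nc', 'pr.nc'] result.nc True 10
```

Note that ''--nb-years'' is stored as ''args.nb_years'' (dashes are converted to underscores), and that ''type=int'' does the string to integer conversion for you.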
-==== Using ordered dictionaries ====
+===== Using ordered dictionaries =====
**Dictionary order is guaranteed to be insertion order**! Note that the [[https://docs.python.org/3/library/stdtypes.html#dict|usual Python dictionary]] also officially guarantees the order since version **3.7** (this was already the case in CPython 3.6, as an implementation detail)
@@ Line 166: / Line 228: @@
 Check the [[https://docs.python.org/3/library/collections.html#collections.OrderedDict|OrderedDict class]] (''from collections import OrderedDict'') and the [[https://realpython.com/python-ordereddict/|OrderedDict vs dict in Python: The Right Tool for the Job]] tutorial
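A minimal sketch of the main remaining differences: the equality test of ''OrderedDict'' takes the order into account, and ''move_to_end'' can be used to reorder the keys.

```python
from collections import OrderedDict

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(list(d1), d1 == d2)  # ['a', 'b'] True  (order ignored when comparing dicts)

od1 = OrderedDict(d1)
od2 = OrderedDict(d2)
print(od1 == od2)  # False (order matters for OrderedDict)

od1.move_to_end('a')              # move 'a' to the end
print(list(od1))  # ['b', 'a']
od1.move_to_end('a', last=False)  # move 'a' back to the beginning
print(list(od1))  # ['a', 'b']
```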
-==== Using sets ====
+===== Using sets =====
 [[https://docs.python.org/3/tutorial/datastructures.html#sets|Python sets]] are **groups of unique elements**. They can be used to easily find all the unique elements of //something// and you can easily determine the **intersection**, **union** (and other similar operations) of sets.
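A minimal sketch with hypothetical variable names. The results are sorted before printing, because the order of the elements in a set is arbitrary.

```python
temperature_vars = ['tas', 'tos', 'tas', 'ts', 'tos']
unique_vars = sorted(set(temperature_vars))
print(unique_vars)  # ['tas', 'tos', 'ts']

model_a = {'tas', 'pr', 'psl'}
model_b = {'tas', 'pr', 'tos'}
common = sorted(model_a & model_b)      # intersection
all_vars = sorted(model_a | model_b)    # union
only_a = sorted(model_a - model_b)      # difference
only_one = sorted(model_a ^ model_b)    # symmetric difference
print(common, all_vars, only_a, only_one)
# ['pr', 'tas'] ['pr', 'psl', 'tas', 'tos'] ['psl'] ['psl', 'tos']

print('tas' in model_a)  # True (fast membership test)
```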
-==== Printing a readable version of long lists or dictionaries ====
+===== Printing a readable version of long lists or dictionaries =====
 The [[https://docs.python.org/3/library/pprint.html|pprint]] module can be used for //pretty printing// objects (lists, dictionaries, ...). It will wrap long lines in a meaningful way
@@ Line 204: / Line 268: @@
 </code>
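''pprint'' can also return the formatted text instead of printing it (e.g. for writing it to a log file), and its output can be tuned. A minimal sketch with a hypothetical settings dictionary:

```python
from pprint import pformat, pprint

settings = {'model': 'hypothetical_model',
            'variables': ['tas', 'pr', 'psl', 'tos', 'sic'],
            'years': list(range(1850, 1860)),
            'experiment': 'historical'}

# pformat returns a string; the keys are sorted by default
text = pformat(settings, width=40)
print(text)

# sort_dicts=False (Python 3.8+) keeps the insertion order of the keys
pprint(settings, width=60, sort_dicts=False)
```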
-==== Sorting ====
+===== Storing objects and data in a file (shelve and friends) =====
+The built-in [[other:python:jyp_steps#the_shelve_package|shelve]] module can be **easily** used for storing temporary/intermediate data
+More options:
+  * Some [[other:python:jyp_steps#data_file_formats|non-NetCDF]] file formats
+  * Working with [[other:python:jyp_steps#netcdf_filesusing_cdms2_xarray_and_netcdf4|NetCDF]] files
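A minimal sketch of storing intermediate results with ''shelve'' and reading them back. The file name is hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical base name: depending on the dbm backend available,
    # one or several files may actually be created on disk
    db_path = os.path.join(tmp_dir, 'intermediate_results')

    # Store arbitrary picklable Python objects under string keys
    with shelve.open(db_path) as db:
        db['years'] = list(range(1850, 1860))
        db['params'] = {'threshold': 0.5, 'method': 'mean'}

    # ... later (e.g. in another run of the script), read them back
    with shelve.open(db_path, flag='r') as db:
        loaded_keys = sorted(db.keys())
        loaded_params = db['params']

print(loaded_keys, loaded_params)
# ['params', 'years'] {'threshold': 0.5, 'method': 'mean'}
```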
+===== Using a configuration file =====
+The built-in [[https://docs.python.org/3/library/configparser.html|configparser]] module can be easily used for reading (**and** writing!) text configuration files.
+Note: a configuration file is also a way to easily store and exchange text data!
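A minimal sketch of writing and reading a configuration, with hypothetical section and option names. An in-memory text buffer is used here for testing; a real script would use a file opened with ''open(..., 'w')'' and ''read('my_config.ini')''.

```python
import configparser
import io

# Write a configuration
config = configparser.ConfigParser()
config['DEFAULT'] = {'output_dir': '/tmp/results'}
config['analysis'] = {'variable': 'tas', 'first_year': '1850', 'verbose': 'yes'}
buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)

# Read it back: all values are strings, use the typed getters when needed
new_config = configparser.ConfigParser()
new_config.read_string(text)
section = new_config['analysis']
print(section['variable'], section.getint('first_year'), section.getboolean('verbose'))
# tas 1850 True

# Values defined in [DEFAULT] are visible in all the sections
print(section['output_dir'])  # /tmp/results
```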
+===== Working with global variables =====
+There is a good chance you don't actually want/need a //global// variable. Be sure to use the ''global'' statement correctly if you want to avoid side-effects...
+  * [[https://docs.python.org/3/faq/programming.html?highlight=global#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value|Using (and changing) a global variable inside a script or module]]
+    * Simple module example\\ <code>_myvar = 10
+def set_myvar(new_val):
+    # Note: a module variable must be explicitly declared
+    # 'global' BEFORE changing its value in a function!
+    # Otherwise, the assignment creates a new LOCAL variable
+    # and the module-level value is left unchanged
+    global _myvar
+    _myvar = new_val
+def get_myvar():
+    return _myvar
+def myfunc(nb_repeat=10):
+    print(nb_repeat * _myvar)</code>
+  * [[https://docs.python.org/3/faq/programming.html?highlight=global#how-do-i-share-global-variables-across-modules|Sharing global variables across modules]]
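The ''UnboundLocalError'' explained in the first FAQ entry linked above can be reproduced with the minimal sketch below (''counter'' and the two functions are hypothetical): assigning to a name anywhere in a function makes that name local to the whole function, unless it is declared ''global''.

```python
counter = 0

def bad_increment():
    # 'counter' is assigned below, so Python treats it as LOCAL in the whole
    # function, and reading it before the assignment fails
    counter += 1

def good_increment():
    global counter  # refer to the module-level variable
    counter += 1

try:
    bad_increment()
except UnboundLocalError as err:
    error_type = type(err).__name__

good_increment()
good_increment()
print(error_type, counter)  # UnboundLocalError 2
```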
+===== Sorting =====
   * When dealing with **numerical values**, you should use the [[https://numpy.org/doc/stable/reference/routines.sort.html|numpy sorting, searching, and counting routines]]!
@@ Line 220: / Line 321: @@
 >>> sorted(demo_dic.keys(), key=lambda key_name:demo_dic[key_name])
 ['c', 'd', 'b', 'a']</code>
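A few more standard library sorting options, sketched on hypothetical data: ''dict.get'' or ''operator.itemgetter'' instead of a ''lambda'', ''reverse=True'', and sorting on several criteria with a tuple key.

```python
from operator import itemgetter

# Hypothetical values, for illustration only
demo_dic = {'a': 4, 'b': 3, 'c': 1, 'd': 2}

# Keys sorted by value, like the lambda example above
keys_by_value = sorted(demo_dic, key=demo_dic.get)
print(keys_by_value)  # ['c', 'd', 'b', 'a']

# (key, value) pairs sorted by value, largest first
items_desc = sorted(demo_dic.items(), key=itemgetter(1), reverse=True)
print(items_desc)  # [('a', 4), ('b', 3), ('d', 2), ('c', 1)]

# Several criteria: by (hypothetical) model name, then by decreasing year
runs = [('modelB', 1990), ('modelA', 2000), ('modelB', 2010), ('modelA', 1950)]
runs_sorted = sorted(runs, key=lambda run: (run[0], -run[1]))
print(runs_sorted)
# [('modelA', 2000), ('modelA', 1950), ('modelB', 2010), ('modelB', 1990)]

# list.sort() sorts IN PLACE (and returns None), sorted() returns a new list
runs.sort()
```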
+===== numpy related stuff =====
+==== Using a numpy array to store arbitrary objects ====
+NumPy arrays are usually used to store [[https://numpy.org/doc/stable/reference/arrays.scalars.html|scalars]] of the same type (see also the [[https://numpy.org/doc/stable/reference/arrays.dtypes.html|Data type objects (dtype)]]), very often numerical values.
+It is also possible to store **arbitrary** Python objects in an array, rather than using nested lists or dictionaries!
+<code>>>> import numpy as np
+>>> some_array = np.empty((2, 3), dtype=object)
+>>> some_array
+array([[None, None, None],
+       [None, None, None]], dtype=object)
+>>> some_array.shape
+(2, 3)
+>>> print(some_array[-1, -1])
+None
+>>> some_array[-1, 0] = filled_contour # e.g. save an existing cartopy filled contour object
+>>> some_array
+array([[None, None, None],
+       [<cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>,
+        None, None]], dtype=object)</code>
+==== Dealing with a variable number of indices ====
+[[https://numpy.org/doc/stable/user/basics.indexing.html#dealing-with-variable-indices|Official reference]]
+<code>>>> i10 = np.identity(10)
+>>> i10
+array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
+       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
+...
+       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
+>>> i10.shape
+(10, 10)
+>>> i10[3:7, 4:6]
+array([[0., 0.],
+       [1., 0.],
+       [0., 1.],
+       [0., 0.]])
+>>> s0 = slice(3, 7)
+>>> s1 = slice(4, 6)
+>>> i10[s0, s1]
+array([[0., 0.],
+       [1., 0.],
+       [0., 1.],
+       [0., 0.]])
+>>> my_slices = (s0, s1)
+>>> i10[my_slices]
+array([[0., 0.],
+       [1., 0.],
+       [0., 1.],
+       [0., 0.]])
+>>> my_fancy_slices = (s0, Ellipsis)
+>>> i10[my_fancy_slices]
+array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
+       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
+       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
+       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
+>>> i10[my_fancy_slices].shape
+(4, 10)
+>>> # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY
+>>> # and that you can change the content of the original array by mistake
+>>> my_view = i10[my_slices]
+>>> my_view[:, :] = -1
+>>> my_view
+array([[-1., -1.],
+       [-1., -1.],
+       [-1., -1.],
+       [-1., -1.]])
+>>> i10
+array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
+       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
+       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
+       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.],
+       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
+       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
+       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.],
+       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
+       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
+       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])</code>
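Two follow-ups, sketched below: building the index tuple programmatically, to select a range along an axis that is only known at run time (''select_along_axis'' is a hypothetical helper), and using ''.copy()'' when you do NOT want a view.

```python
import numpy as np

def select_along_axis(arr, axis, start, stop):
    """Return arr[..., start:stop, ...] with the slice applied on 'axis' (hypothetical helper)."""
    index = [slice(None)] * arr.ndim  # slice(None) is the same as ':'
    index[axis] = slice(start, stop)
    return arr[tuple(index)]          # the index MUST be a tuple, not a list

data = np.arange(2 * 3 * 4).reshape((2, 3, 4))
sub_last = select_along_axis(data, 2, 1, 3)   # same as data[:, :, 1:3]
sub_first = select_along_axis(data, 0, 1, 2)  # same as data[1:2, :, :]
print(sub_last.shape, sub_first.shape)  # (2, 3, 2) (1, 3, 4)

# A slice gives a VIEW, .copy() gives an independent array
i10 = np.identity(10)
my_view = i10[3:7, 4:6]
my_copy = i10[3:7, 4:6].copy()
my_copy[:, :] = -1
print(np.shares_memory(i10, my_view), np.shares_memory(i10, my_copy))  # True False
print(i10.min())  # 0.0: the original array was not modified through the copy
```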
+==== Finding and counting unique values ====
+Use ''np.unique'' and do **not** try to use histogram-related functions!
+<code>>>> vals = np.random.randint(2, 5, (10,)) * 0.5 # Get 10 discrete float values
+>>> vals
+array([1. , 2. , 1. , 2. , 2. , 1.5, 1. , 1.5, 2. , 1.5])
+>>> np.unique(vals)
+array([1. , 1.5, 2. ])
+>>> unique_vals, nb_unique = np.unique(vals, return_counts=True)
+>>> unique_vals
+array([1. , 1.5, 2. ])
+>>> nb_unique
+array([3, 3, 4])
+>>> sorted_vals = np.sort(vals) # Sorted copy, in order to check the result
+>>> sorted_vals
+array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])</code>
+==== Applying a ufunc over all the elements of an array ====
+There are many kinds of //ufuncs// (Universal Functions). Below, we only use ''add'', one of the [[https://numpy.org/doc/stable/reference/ufuncs.html#math-operations|math operations]], on the arrays defined in [[#finding_and_counting_unique_values|Finding and counting unique values]]
+<code># Get the sum of all the elements of 'vals'
+>>> np.add.reduce(vals)
+15.5
+>>> np.add.reduce(sorted_vals)
+15.5
+>>> vals.sum() # The usual and easy way to do it
+15.5
+# Compute the sum of the elements of 'nb_unique'
+# AND keep (accumulate) the intermediate results
+>>> nb_unique
+array([3, 3, 4])
+>>> np.add.accumulate(nb_unique)
+array([ 3,  6, 10])
+# The accumulated values can be used as indices to separate the different groups of sorted values!
+>>> sorted_vals
+array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
+>>> sorted_vals[0:3]
+array([1., 1., 1.])
+>>> sorted_vals[3:6]
+array([1.5, 1.5, 1.5])
+>>> sorted_vals[6:10]
+array([2., 2., 2., 2.])
+# Compute the sum of each equal-value group
+>>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum()
+(3.0, 4.5, 8.0)</code>
+==== Applying a ufunc over specified sections of an array ====
+The [[https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html#numpy.ufunc.reduceat|reduceat]] function can be used to avoid explicit Python loops, and improve the speed (but not the readability...) of a script. The example below //improves// what has been shown above
+<code># Define a list with the boundaries of the intervals we want to apply the 'add' function to
+# We need to add the beginning index (0), AND remove the last index
+# (reduceat will automatically go to the end of the input array)
+>>> nb_unique
+array([3, 3, 4])
+>>> slices_indices = [0] + list(np.add.accumulate(nb_unique))
+>>> slices_indices.pop() # Remove last element
+10
+>>> slices_indices
+[0, 3, 6]
+# Compute the sums over the selected intervals with just one call
+>>> np.add.reduceat(np.sort(vals), slices_indices)
+array([3. , 4.5, 8. ])</code>
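A possible shortcut, sketched below: ''np.unique'' with ''return_index=True'' directly gives the index of the first occurrence of each unique value. On a sorted array, these are the start indices of the groups, so the ''accumulate''/''pop'' steps are not needed.

```python
import numpy as np

vals = np.array([1., 2., 1., 2., 2., 1.5, 1., 1.5, 2., 1.5])  # same values as above
sorted_vals = np.sort(vals)

# Start index of each group of equal values in the sorted array
unique_vals, start_indices = np.unique(sorted_vals, return_index=True)
print(unique_vals, start_indices)  # [1.  1.5 2. ] [0 3 6]

# Sum of each group, with a single call
group_sums = np.add.reduceat(sorted_vals, start_indices)
print(group_sums)  # [3.  4.5 8. ]
```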
+===== matplotlib related stuff =====
+==== Working with time axes (and ticks) ====
+If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the:
+  * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
+  * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]
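A minimal sketch, assuming hypothetical daily data: the limits of a time axis are set with ''datetime'' objects, and the tick locations and labels are chosen with the locators and formatters of ''matplotlib.dates''.

```python
import datetime as dt

import matplotlib
matplotlib.use('Agg')  # no display needed, e.g. on a remote server
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical daily data for one year
days = [dt.datetime(2000, 1, 1) + dt.timedelta(days=i) for i in range(366)]
values = [i % 30 for i in range(366)]

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(days, values)

# Limits of a time axis: use datetime objects
ax.set_xlim(dt.datetime(2000, 3, 1), dt.datetime(2000, 9, 1))

# Major ticks on the 1st of each month, minor ticks every Monday
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
# Label style: abbreviated month name and year, e.g. 'Mar 2000'
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))

# fig.savefig('time_axis_demo.png')
```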
 /*
-==== Tip template ====
+===== Tip template =====
 <code>Some code</code>

PMIP3 wiki
