Differences

This shows you the differences between two versions of the page.

--- other:python:misc_by_jyp [2023/05/04 13:19] – [Base notions] jypeter
+++ other:python:misc_by_jyp [2025/08/01 14:52] (current) – [Extra tutorials] Added haversine jypeter
@@ Line 5: / Line 5: @@
 </WRAP>
+===== Extra tutorials =====
+Read these only **when you have already read all the content of this page several times** and you are looking for new ideas
+  * [[https://medium.com/data-science/calculating-distance-between-two-geolocations-in-python-26ad3afe287b|Calculating distance between two geo-locations in Python]]:
+    * ''[[https://github.com/mapado/haversine|haversine]]'', ''[[https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html|haversine_distances]] @ scikit-learn'' and [[https://en.wikipedia.org/wiki/Haversine_formula|Haversine formula]]
+  * Looking at table data with ''pandas''
+    * [[https://blog.devgenius.io/data-profiling-in-python-common-ways-to-explore-your-data-part-1-0efd0dedff75|Summary information]]
+    * [[https://blog.devgenius.io/data-profiling-in-python-common-ways-to-explore-your-data-part-2-396384522e91|More detailed information]]
+    * [[https://blog.devgenius.io/data-cleansing-in-python-common-ways-to-clean-your-data-3459a256dd85|Table data cleaning]]
+  * [[https://medium.com/pythons-gurus/clean-code-in-python-good-vs-bad-practices-examples-2df344bddacc|Clean Code in Python: Good vs. Bad Practices Examples]]
+  * [[https://peps.python.org/pep-0008/|PEP 8 – Style Guide for Python Code]]
+    * [[https://realpython.com/python-pep8/|How to Write Beautiful Python Code With PEP 8]]
+    * [[https://www.datacamp.com/tutorial/pep8-tutorial-python-code|PEP-8 Tutorial: Code Standards in Python]]
+    * Some checkers/linters: [[https://docs.astral.sh/ruff/|ruff]], [[https://flake8.pycqa.org/en/stable/|flake8]]
+  * [[https://medium.com/@yaduvanshineelam09/ultimate-python-cheat-sheet-practical-python-for-everyday-tasks-8a33abc0892f|Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks]]
+  * [[https://medium.com/pythoneers/16-hacks-that-will-take-your-python-skills-to-the-next-level-12e7a9b97421|16 Hacks That Will Take Your Python Skills to the Next Level]]
+  * [[https://levelup.gitconnected.com/modular-coding-in-python-finally-solve-your-import-errors-af2fd172fcf7|Modular Coding in Python: Finally Solve your Import Errors]] (understanding and fixing ModuleNotFoundError and ImportError)
+  * [[https://medium.com/@moraneus/understanding-multithreading-and-multiprocessing-in-python-1ed39bb078d5|Understanding Multithreading and Multiprocessing in Python]]
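The haversine formula referenced in the list above can be sketched with the standard ''math'' module only. This is a minimal illustration, not the code of the ''haversine'' package or of scikit-learn: the function name ''haversine_km'' and the constant ''EARTH_RADIUS_KM'' are hypothetical, and the Earth is assumed to be a perfect sphere.

```python
import math

# Assumption: spherical Earth, using the mean Earth radius in kilometers
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the equator: from (0N, 0E) to (0N, 90E)
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 1), 'km')
```

For real work, the packages listed above are probably a better choice, since they also handle arrays of points.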
 ===== Reading/setting environment variables =====
@@ Line 26: / Line 44: @@
+===== Using log files (aka logging) =====
+It is always possible to display information messages using the ''print()'' function, but it is more efficient to use //logging// tools when you want to **correctly display a lot of information about the progress of a script**
+  * [[https://loguru.readthedocs.io/|Loguru]] is a library which aims to bring enjoyable logging in Python
+    * See also [[https://betterstack.com/community/guides/logging/loguru/|A Complete Guide to Logging in Python with Loguru]]
+  * More on [[https://betterstack.com/community/guides/logging/#python|logging with python]]
+  * The default (but not easy to use) Python ''[[https://docs.python.org/3/library/logging.html|logging]]'' module
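As an illustration of the last item, here is a minimal sketch using only the standard ''logging'' module. The logger name ''my_script'', the message format and the messages themselves are hypothetical choices, not recommendations from this page.

```python
import logging

# Configure the root logger once, at the beginning of the script
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)-8s %(name)s: %(message)s',
)

# Hypothetical logger name, usually the name of the script or module
log = logging.getLogger('my_script')
log.setLevel(logging.INFO)

log.debug('Not displayed, because the logger level is INFO')
log.info('Processing file %s', 'input_01.nc')
log.warning('Missing values found in %d records', 3)
```

Changing the level to ''logging.DEBUG'' displays the debug messages as well, without touching the rest of the script, which is something that ''print()'' cannot do.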
 ===== Stopping a script =====
@@ Line 41: / Line 67: @@
 ===== Playing with strings =====
+==== String formatting ====
+  * Knowing how to display/print a string correctly is always useful for information and debugging purposes
+  * There are lots of different ways to display strings
+=== String formatting examples ===
+You will find below some examples of //quick printing//, as well as using //old style formatting//, //formatted string literals (f-strings)// and the //String ''format()'' Method//. More details in the next section.
+<code python>
+>>> # Basic (but quick and efficient) printing
+>>> year = 1984
+>>> print(year)
+1984
+>>> print('[', year, 'is a famous book ]')
+[ 1984 is a famous book ]
+>>> # Old style formatting
+>>> print('[ %i is a famous book ]' % (year,))
+[ 1984 is a famous book ]
+>>> print('[ %10i is a famous book ]' % (year,))
+[       1984 is a famous book ]
+>>> print('[ %-10i is a famous book ]' % (year,))
+[ 1984       is a famous book ]
+>>> print('[ %010i is a famous book ]' % (year,))
+[ 0000001984 is a famous book ]
+>>> # Formatted string literals (f-strings)
+>>> print(f'[ {year} is a famous book ]')
+[ 1984 is a famous book ]
+>>> print(f'[ {year=} is a famous book ]')
+[ year=1984 is a famous book ]
+>>> print(f'[ {year:10} is a famous book ]')
+[       1984 is a famous book ]
+>>> print(f'[ {year:<10} is a famous book ]')
+[ 1984       is a famous book ]
+>>> print(f'[ {year:010} is a famous book ]')
+[ 0000001984 is a famous book ]
+>>> print(f'[ {year:10.2f} is a famous book (yes, {year}!) ]')
+[    1984.00 is a famous book (yes, 1984!) ]
+>>> # The String format() Method
+>>> print('[ {} is a famous book ]'.format(year))
+[ 1984 is a famous book ]
+>>> print('[ {:10} is a famous book ]'.format(year))
+[       1984 is a famous book ]
+>>> print('[ {:<10} is a famous book ]'.format(year))
+[ 1984       is a famous book ]
+>>> print('[ {:010} is a famous book ]'.format(year))
+[ 0000001984 is a famous book ]
+>>> print('[ {:10.2f} is a famous book  (yes, {}!) ]'.format(year, year))
+[    1984.00 is a famous book  (yes, 1984!) ]
+>>> print('[ {title:10.2f} is a famous book  (yes, {title}!) ]'.format(title=year))
+[    1984.00 is a famous book  (yes, 1984!) ]
+>>> print('[ {title:10.2e} is a famous book ]'.format(title=year))
+[   1.98e+03 is a famous book ]</code>
+=== String formatting references ===
+  * [[https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals|Formatted String Literals]] (//f-strings//)
+    * Available in Python >= 3.6
+    * [[https://docs.python.org/3/reference/lexical_analysis.html#f-strings|More documentation]]
+    * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
+      * See also the [[https://pyformat.info/|PyFormat site]]
+  * [[https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method|The String format() Method]]
+    * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
+      * See also the [[https://pyformat.info/|PyFormat site]]
+  * [[https://pyformat.info/|PyFormat site]]: string formatting using the //old style// and the //String ''format()'' method//
+    * <wrap hi>Hint</wrap>: this can also be used as an **easy documentation for //f-strings// format**!
+  * [[https://docs.python.org/3/tutorial/inputoutput.html#old-string-formatting|Old string formatting]]
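The hint above works because //f-strings//, the ''format()'' method and the built-in ''format()'' function all share the same format specification. A short sketch (the variable names are only illustrative):

```python
value = 1984
width = 10

old_style = '%10.2f' % value               # old style formatting
format_method = '{:10.2f}'.format(value)   # String format() method
f_string = f'{value:10.2f}'                # f-string, same specification
built_in = format(value, '10.2f')          # built-in format() function
nested = f'{value:{width}.2f}'             # the width can come from a variable

print(repr(f_string))
```

All five variables hold the same 10-character string, so any documentation of the ''10.2f'' specification applies to every style except for small syntax differences in the //old style//.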
 ==== Splitting (complex) strings ====
@@ Line 65: / Line 167: @@
 ==== Working with paths and filenames ====
-If you are in a hurry, you can just use string functions to work with path and file names. But you will need some specific functions to check if a file exists, and similar operations. All these are available in 2 libraries that have similar functions. Both of these libraries can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers
+If you are in a hurry, you can just use string functions to work with paths and file names.
+You will need some specific objects and functions to check if a file exists, and for similar operations. Check the libraries listed below, which can automatically deal with Unix-type paths on Linux and macOS computers, and Windows-type paths on Windows computers
-  * [[https://docs.python.org/3/library/os.path.html|os.path]] //Common pathname manipulations//
+  * [[https://docs.python.org/3/library/os.path.html|os.path]]: //common pathname manipulations//
     * Available since... a long time! Use this if you want to avoid backward compatibility problems
     * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]]
-  * [[https://docs.python.org/3/library/pathlib.html|pathlib]] //Object-oriented filesystem paths//
+  * [[https://docs.python.org/3/library/pathlib.html|pathlib]]: a **more recent** //object-oriented// way to deal with //filesystem paths//
     * Available since Python version 3.4
     * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]
-  * [[https://docs.python.org/3/library/shutil.html|High-level file operations]]
+  * [[https://docs.python.org/3/library/shutil.html|shutil]]: High-level file operations, e.g. copy/move a file or directory tree
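The correspondence between ''os.path'' and ''pathlib'' can be sketched as follows. The path ''data/run_01/output.nc'' is hypothetical and does not need to exist for the example to run.

```python
import os.path
from pathlib import Path

# Hypothetical relative path, built in a system independent way
name_str = os.path.join('data', 'run_01', 'output.nc')
name_path = Path('data') / 'run_01' / 'output.nc'

# The same information, obtained with os.path functions and pathlib attributes
same_string = (name_str == str(name_path))
base_names = (os.path.basename(name_str), name_path.name)
extensions = (os.path.splitext(name_str)[1], name_path.suffix)
parents = (os.path.dirname(name_str), str(name_path.parent))
exists = (os.path.exists(name_str), name_path.exists())

print(base_names, extensions, parents, exists)
```

With ''os.path'' you work with plain strings, while ''pathlib'' gives you an object that carries its own methods. Both approaches return the same results here.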
-=== Example: getting the full path of the Python used ===
+=== Example: getting the full path of the Python executable used ===
 Note: the actual python may be different from the default python!
@@ Line 83: / Line 188: @@
 /usr/bin/python
-$ /modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python
+$ /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python
 >>> import sys, shutil
 >>> shutil.which('python')
 '/usr/bin/python'
 >>> sys.executable
-'/modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python'</code>
+'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python'</code>
@@ Line 104: / Line 209: @@
 </code>
+=== Example: system independent paths with pathlib ===
+Note: the following example was generated on a Linux server and uses a <wrap em>/</wrap> character as a path separator
+<code>>>> from pathlib import Path
+>>> my_home = Path.home()
+>>> my_home
+PosixPath('/home/users/my_login')
+>>> my_conf = my_home / '.config' / 'evince'
+>>> my_conf
+PosixPath('/home/users/my_login/.config/evince')
+>>> my_conf.is_dir()
+True
+>>> my_conf.is_file()
+False
+>>> list(my_conf.glob('*'))
+[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
+>>> [ ff.name for ff in my_conf.glob('*') ]
+['evince_toolbar.xml', 'accels']
+</code>
 === Example: getting the size(s) of all the files in a directory ===
@@ Line 290: / Line 415: @@
 ['c', 'd', 'b', 'a']</code>
+===== Efficient looping with numpy, map, itertools and list comprehension =====
+<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce the execution time of a script!
+  * **''numpy'' arrays** should be used when dealing with //numerical data//
+    * **Masked arrays** can be used to deal with //special cases// and remove tests from loops
+  * The built-in [[https://docs.python.org/3/library/functions.html?highlight=map#map|map]] function (and similar functions like [[https://docs.python.org/3/library/functions.html?highlight=zip#zip|zip]], [[https://docs.python.org/3/library/functions.html?highlight=filter#filter|filter]], ...) can be used to efficiently apply a function (possibly a //simple// [[https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions|lambda]] function) to all the elements of a list
+    * <code>>>> my_ints = [1, 2, 3]
+>>> list(map(str, my_ints))  # map() returns an iterator in Python 3
+['1', '2', '3']
+>>> list(map(lambda ii: str(10*ii + 5), my_ints))
+['15', '25', '35']</code>
+  * The [[https://docs.python.org/3/library/itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping
+    * Example: replacing nested loops with [[https://docs.python.org/3/library/itertools.html#itertools.product|product]]
+      * <code>>>> import itertools as it
+>>> it.product('AB', '01')
+<itertools.product object at 0x2b35a7b5f100>
+>>> list(it.product('AB', '01'))
+[('A', '0'), ('A', '1'), ('B', '0'), ('B', '1')]
+>>> for c1, c2 in it.product('AB', '01'):
+...   print(c1 + c2)
+...
+A0
+A1
+B0
+B1
+>>> for c1, c2 in it.product(['A', 'B'], ['0', '1']):
+...   print(c1 + c2)
+...
+A0
+A1
+B0
+B1
+>>> for c1, c2, c3 in it.product('AB', '01', '$!'):
+...   print(c1 + c2 + c3, end=', ')
+...
+A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</code>
+  * The [[https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists
+    * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''map'' function detailed above
+      * <code>>>> my_ints = [1, 2, 3]
+>>> [ str(ii) for ii in my_ints ]
+['1', '2', '3']</code>
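The advice about ''numpy'' masked arrays given at the top of this section can be sketched as follows. The data values and the ''-999.0'' missing-value flag are hypothetical. The example only shows how a test inside a loop can be replaced by a mask, and it requires ''numpy'' to be installed.

```python
import numpy as np

MISSING = -999.0  # hypothetical flag used for missing values
values = np.array([1.5, MISSING, 3.0, 4.5, MISSING])

# Explicit loop, with a test on each element
total = 0.0
count = 0
for val in values:
    if val != MISSING:
        total += val
        count += 1
loop_mean = total / count

# Same computation with a masked array: no loop and no test
masked = np.ma.masked_values(values, MISSING)
ma_mean = masked.mean()

print(loop_mean, ma_mean, masked.count())
```

On such a small array the difference in speed is not noticeable. The benefit is expected to appear with large arrays, where the loop is executed in compiled code instead of in the Python interpreter.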
 ===== numpy related stuff =====
@@ Line 452: / Line 629: @@
 array([3. , 4.5, 8. ])</code>
+==== Exercise your brain with numpy ====
+Have a look at [[https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb|100 numpy exercises]]
 ===== matplotlib related stuff =====
@@ Line 466: / Line 646: @@
 A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
-FIXME Add parts (pages 28 to 37) of this [[https://wiki.lsce.ipsl.fr/pmip3/doku.php/other:python:jyp_steps#part_2|old tutorial]] to this section
+FIXME Add parts (pages 28 to 37) of this [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|old tutorial]] to this section
 ==== Base notions ====
-  * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s), grouped in bytes!
+  * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s ...), grouped in bytes!
     * Some things can be stored exactly (integers, characters, ...)
     * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximation//**
@@ Line 479: / Line 659: @@
       * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
       * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'')
-      * ''11111101'' <=> ''1111 1101'' <=> ''FD'' in hexadecimal <=> ''253'' in decimal (''15 * 16 + 13'')
+      * ''11111101'' in //base 2// <=> ''1111 1101'' <=> ''FD'' in //hexadecimal// <=> ''253'' (''15 * 16 + 13'') in //decimal//
-  * Conversion with Python
+  * Base conversion with Python
     * <code>>>> hex(13) # Decimal to Hexadecimal conversion
 '0xd'
@@ Line 490: / Line 670: @@
 >>> int('0x100', 16) # Hexadecimal to Decimal conversion
->>> int('11', 2)
 >>> int('1111', 2) # Binary to Decimal conversion
->>> int('11111101', 2)
+>>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253
->>> 15 * 16 + 13
 >>> 013 # DANGER! Python considers an integer to be in OCTAL base if it starts with a 0
@@ Line 502: / Line 678: @@
 >>> int('13', 8) # 1*8 + 3
 </code>
+  * More technical topics
+    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit)
+    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]: the art of ordering bytes
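Bit numbering and endianness can be explored with plain Python integers. The values below are arbitrary, and the sketch only uses built-in features (''int.to_bytes'', ''int.from_bytes'' and ''sys.byteorder'').

```python
import sys

# Bit numbering: looking at individual bits of an 8-bit value
value = 0b10000010        # 130 in decimal, 0x82 in hexadecimal
msb = (value >> 7) & 1    # most significant bit
lsb = value & 1           # least significant bit

# Endianness: the same integer stored on 4 bytes, in both byte orders
number = 258              # 0x0102
big = number.to_bytes(4, byteorder='big')        # most significant byte first
little = number.to_bytes(4, byteorder='little')  # least significant byte first

print(msb, lsb)
print(big.hex(), little.hex())
print('Native byte order of this computer:', sys.byteorder)
```

Reading the bytes back with the wrong byte order gives a different integer, which is a classic source of errors when reading binary files written on another type of computer.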
 ==== Numerical values ====
-  * Binary data representation of some numbers (not everythin is listed here):
+  * Binary data representation of some numbers (only some common types are listed here):
+    * Languages and packages **references** used below:
+      * Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]]
+      * NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]]
+      * Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]]
     * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
       * Range:
-        * 4-byte integers: −2,147,483,648 to 2,147,483,647
+        * 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647''
           * Python: ''numpy.int32''
-          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ''int'', ''NC_INT64'', ''NF90_INT''
+          * NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT''
-          * Fortran:
+          * Fortran: ''INTEGER*4''
-        * 8-byte integers: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
+        * 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807''
           * Python: ''numpy.int64''
-          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]]: ''int64'', ''NC_INT64''
+          * NetCDF: ''int64'', ''NC_INT64''
-          * Fortran:
+          * Fortran: ''INTEGER*8''
       * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
     * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
       * Range:
-        * 4-byte float: ~8 significant digits * 10E±38
+        * 4-byte float: ''~8 significant digits * 10E±38''
           * Python: ''numpy.float32''
-          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]:
+          * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
-          * Fortran:
+          * Fortran: ''REAL*4''
           * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
-        * 8-byte float: ~15 significant digits * 10E±308
+        * 8-byte float: ''~15 significant digits * 10E±308''
           * Python: ''numpy.float64''
-          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]:
+          * NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE''
-          * Fortran:
+          * Fortran: ''REAL*8''
-      * Special values:
+      * **Special values**:
-        * [[https://en.wikipedia.org/wiki/NaN|NaN]] (''numpy.nan''): //Not a Number//
+        * [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number//
-        * Infinity (''-numpy.inf'' and ''numpy.inf'')
+          * Python: ''numpy.nan''
-        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs, when you have to deal with missing values !
+        * Infinity
-    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]
+          * Python: ''-numpy.inf'' and ''numpy.inf''
-    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]
+        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|NumPy masked arrays]]) rather than ''NaN''s, when you have to deal with missing values!
+      * <wrap hi>The RISKS of working with (the wrong) floats</wrap>:
+        * [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]]
+        * [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]]
+          * [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
     * A rather technical example: we //play// with a numpy 4-byte integer scalar
       * <code>>>> one_int32 = np.int32(1)