other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2023/04/28 14:16]
jypeter Moved the new Data Representation to the end
other:python:misc_by_jyp [2024/04/19 12:02] (current)
jypeter [Data representation] Corrected link to an old JYP tutorial
Line 5: Line 5:
 </​WRAP>​ </​WRAP>​
  
 +===== Extra tutorials =====
  
 +Only **when you have already read all the content of this page several times**, and you are looking for new ideas:
 +
 +  * [[https://​medium.com/​@yaduvanshineelam09/​ultimate-python-cheat-sheet-practical-python-for-everyday-tasks-8a33abc0892f|Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks]]
 ===== Reading/setting environment variables =====
  
Line 65: Line 69:
 ==== Working with paths and filenames ====
 
-If you are in a hurry, you can just use string functions to work with path and file names. But you will need some specific functions to check if a file exists, and similar operations. All these are available in 2 libraries that have similar functions. Both of these libraries can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers
 +If you are in a hurry, you can just use string functions to work with paths and file names.
 +
 +You will need some specific objects and functions to check if a file exists, and similar operations. Check the libraries listed below, which can automatically deal with Unix-type paths on Linux and macOS computers, and Windows-type paths on Windows computers
 
-  * [[https://docs.python.org/3/library/os.path.html|os.path]] //Common pathname manipulations//
 +  * [[https://docs.python.org/3/library/os.path.html|os.path]]: //common pathname manipulations//
     * Available since... a long time! Use this if you want to avoid backward compatibility problems
     * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]]
-  * [[https://docs.python.org/3/library/pathlib.html|pathlib]] //Object-oriented filesystem paths//
 +  * [[https://docs.python.org/3/library/pathlib.html|pathlib]]: a **more recent** //object-oriented// way to deal with //filesystem paths//
     * Available since Python version 3.4
     * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]
-  * [[https://docs.python.org/3/library/shutil.html|High-level file operations]]
 +  * [[https://docs.python.org/3/library/shutil.html|shutil]]: High-level file operations, e.g. copy/move a file or directory tree
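
The three libraries listed above can be compared side by side. The following is only a minimal sketch: the file names ''data.txt'' and ''copy_of_data.txt'' are made up for the illustration, and everything happens in a temporary directory so that nothing is left behind.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work in a throw-away directory so that the example is safe to run anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path style: paths are plain strings
    src_str = os.path.join(tmp_dir, 'data.txt')   # hypothetical file name
    with open(src_str, 'w') as f_out:
        f_out.write('hello')
    exists_os = os.path.exists(src_str)
    base_os = os.path.basename(src_str)

    # pathlib style: paths are objects with methods and attributes
    src_path = Path(tmp_dir) / 'data.txt'
    exists_pl = src_path.exists()
    base_pl = src_path.name
    suffix_pl = src_path.suffix

    # shutil: high-level operations, here copying a file
    dst_path = Path(tmp_dir) / 'copy_of_data.txt'  # hypothetical file name
    shutil.copy(src_path, dst_path)
    copied_text = dst_path.read_text()

print(exists_os, exists_pl)   # both checks agree
print(base_os, base_pl, suffix_pl)
print(copied_text)
```

Both styles give the same answers; ''pathlib'' mostly saves you from juggling strings and separators by hand.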
  
  
-=== Example: getting the full path of the Python used ===
 +=== Example: getting the full path of the Python executable used ===
  
 Note: the actual python may be different from the default python!
Line 83: Line 90:
 /​usr/​bin/​python /​usr/​bin/​python
  
-$ /modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python+$ /home/share/unix_files/cdat/​miniconda3_21-02/envs/cdatm_py3/bin/python
 >>>​ import sys, shutil >>>​ import sys, shutil
 >>>​ shutil.which('​python'​) >>>​ shutil.which('​python'​)
 '/​usr/​bin/​python'​ '/​usr/​bin/​python'​
 >>>​ sys.executable >>>​ sys.executable
-'/modfs/modtools/miniconda3//envs/analyse_3.6_test/​bin/​python'</​code>​+'/home/share/unix_files/cdat/​miniconda3_21-02/envs/cdatm_py3/​bin/​python'</​code>​
  
  
Line 104: Line 111:
 </​code>​ </​code>​
  
 +
 +=== Example: system independent paths with pathlib ===
 +
 +Note: the following example was generated on a Linux server and uses a <wrap em>/</​wrap>​ character as a path separator
 +
 +<code>>>> from pathlib import Path
 +>>> my_home = Path.home()
 +>>>​ my_home
 +PosixPath('/​home/​users/​my_login'​)
 +>>>​ my_conf = my_home / '​.config'​ / '​evince'​
 +>>>​ my_conf
 +PosixPath('/​home/​users/​my_login/​.config/​evince'​)
 +>>>​ my_conf.is_dir()
 +True
 +>>>​ my_conf.is_file()
 +False
 +>>>​ list(my_conf.glob('​*'​))
 +[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
 +>>>​ [ ff.name for ff in my_conf.glob('​*'​) ]
 +['​evince_toolbar.xml',​ '​accels'​]
 +</​code>​
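
The //pure// path classes of ''pathlib'' make it possible to see what the same code would give on another system, without touching the disk. This is a small sketch under assumptions: the Windows home directory ''C:/Users/my_login'' is a made-up counterpart of the Linux home used in the example above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same sub-directories, joined below two different (hypothetical) homes
parts = ('.config', 'evince')

posix_conf = PurePosixPath('/home/users/my_login').joinpath(*parts)
win_conf = PureWindowsPath('C:/Users/my_login').joinpath(*parts)

print(posix_conf)            # uses / as a separator
print(win_conf)              # uses \ as a separator
print(win_conf.as_posix())   # Windows path written with forward slashes
print(posix_conf.parent.name, posix_conf.name)
```

Pure paths only manipulate names: methods that need the file system (''is_dir()'', ''glob()'', ...) are available on ''Path'' objects only.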
  
 === Example: getting the size(s) of all the files in a directory ===
Line 290: Line 317:
 ['​c',​ '​d',​ '​b',​ '​a'​]</​code>​ ['​c',​ '​d',​ '​b',​ '​a'​]</​code>​
  
 +
 +===== Efficient looping with numpy, map, itertools and list comprehension =====
 +
 +<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce a script's execution time!
 +
 +  * **''​numpy''​ arrays** should be used when dealing with //numerical data//
 +    * **Masked arrays** can be used to deal with //special cases// and remove tests from loops
 +
 +  * The built-in [[https://​docs.python.org/​3/​library/​functions.html?​highlight=map#​map|map]] function (and similar functions like [[https://​docs.python.org/​3/​library/​functions.html?​highlight=zip#​zip|zip]],​ [[https://​docs.python.org/​3/​library/​functions.html?​highlight=filter#​filter|filter]],​ ...) can be used to efficiently apply a function (possibly a //simple// [[https://​docs.python.org/​3/​tutorial/​controlflow.html#​lambda-expressions|lambda]] function) to all the elements of a list
 +    * <code>>>> my_ints = [1, 2, 3]
 +
 +>>> list(map(str, my_ints))  # map() returns a lazy iterator in Python 3
 +['1', '2', '3']
 +
 +>>> list(map(lambda ii: str(10*ii + 5), my_ints))
 +['15', '25', '35']</code>
 +
 +  * The [[https://​docs.python.org/​3/​library/​itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping
 +    * Example: replacing nested loops with [[https://​docs.python.org/​3/​library/​itertools.html#​itertools.product|product]]
 +      * <code>>>> import itertools as it
 +>>> it.product('AB', '01')
 +<​itertools.product object at 0x2b35a7b5f100>​
 +
 +>>>​ list(it.product('​AB',​ '​01'​))
 +[('​A',​ '​0'​),​ ('​A',​ '​1'​),​ ('​B',​ '​0'​),​ ('​B',​ '​1'​)]
 +
 +>>>​ for c1, c2 in it.product('​AB',​ '​01'​):​
 +...   ​print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>>​ for c1, c2 in it.product(['​A',​ '​B'​],​ ['​0',​ '​1'​]):​
 +...   ​print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>>​ for c1, c2, c3 in it.product('​AB',​ '​01',​ '​$!'​):​
 +...   ​print(c1 + c2 + c3, end=', ')
 +...
 +A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</​code>​
 +
 +  * The [[https://​docs.python.org/​3/​tutorial/​datastructures.html?​highlight=comprehension#​list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists
 +    * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''​map''​ function detailed above
 +      * <​code>>>>​ my_ints = [1, 2, 3]
 +
 +>>>​ [ str(ii) for ii in my_ints ]
 +['​1',​ '​2',​ '​3'​]</​code>​
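
The tools listed in this section can be checked against an explicit loop. The sketch below uses the standard library only (no ''numpy'') and the variable names are chosen for the illustration. It also shows that in Python 3 ''map'', ''zip'' and ''filter'' return lazy iterators, which have to be wrapped in ''list()'' if you want to display their values.

```python
import itertools as it

letters = ['A', 'B']
digits = ['0', '1']

# Explicit nested loops
nested = []
for c1 in letters:
    for c2 in digits:
        nested.append(c1 + c2)

# Same result with a single implicit loop over itertools.product
with_product = [c1 + c2 for c1, c2 in it.product(letters, digits)]

# map returns a lazy iterator: nothing is computed until it is consumed
lazy = map(str, [1, 2, 3])
as_list = list(lazy)

# zip and filter are lazy as well
pairs = list(zip(letters, digits))
odd_only = list(filter(lambda ii: ii % 2 == 1, range(6)))

print(nested)
print(with_product)
print(as_list)
print(pairs)
print(odd_only)
```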
 ===== numpy related stuff =====
  
Line 452: Line 531:
 array([3. , 4.5, 8. ])</​code>​ array([3. , 4.5, 8. ])</​code>​
  
 +==== Exercise your brain with numpy ====
 +
 +Have a look at [[https://​github.com/​rougier/​numpy-100/​blob/​master/​100_Numpy_exercises.ipynb|100 numpy exercises]]
  
 ===== matplotlib related stuff =====
Line 466: Line 548:
 A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
  
 +FIXME Add parts (pages 28 to 37) of this [[http://​www.lsce.ipsl.fr/​Phocea/​file.php?​class=page&​file=5/​pythonCDAT_jyp_2sur2_070306.pdf|old tutorial]] to this section
 +
 +==== Base notions ====
 +
 +  * **Never forget** that all the bits and pieces of information we use are coded in [[https://​en.wikipedia.org/​wiki/​Binary_number#​Counting_in_binary|base 2]] (''​0''​s and ''​1''​s ...), grouped in bytes!
 +    * Some things can be stored exactly (integers, characters, ...)
 +    * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximations//**
 +
 +  * 1 byte <=> 8 bits
 +    * ''​REAL*4''​ <=> 4 bytes <=> 32 bits
 +    * For easier written/​displayed representation,​ 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://​en.wikipedia.org/​wiki/​Hexadecimal|hexadecimal representation]] (characters ''​0'',​ ''​1'',​ ..., ''​A'',​ ''​B'',​ ..., ''​F''​)
 +      * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
 +      * ''​1101''​ <=> ''​D''​ in hexadecimal <=> ''​13''​ in decimal (''​**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1''​)
 +      * ''​11111101''​ in //base 2// <=> ''​1111 1101''​ <=> ''​FD''​ in //​hexadecimal//​ <=> ''​253''​ (''​15 * 16 + 13''​) in //decimal//
 +
 +  * Base conversion with Python
 +    * <​code>>>>​ hex(13) # Decimal to Hexadecimal conversion
 +'​0xd'​
 +>>>​ hex(253)
 +'​0xfd'​
 +>>>​ hex(256)
 +'​0x100'​
 +>>>​ int('​0x100',​ 16) # Hexadecimal to Decimal conversion
 +256
 +>>>​ int('​1111',​ 2) # Binary to Decimal conversion
 +15
 +>>>​ int('​11111101',​ 2) # '​11111101'​ <=> '1111 1101' <=> '​FD'​ <=> 15 * 16 + 13 = 253
 +253
 +>>> 0o13 # DANGER! A 0o prefix means OCTAL base (Python 2 also read 013 as octal; Python 3 rejects 013 with a SyntaxError)
 +11
 +>>>​ int('​13',​ 8) # 1*8 + 3
 +11</​code>​
  
 +  * More technical topics
 +    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit)
 +    * [[https://​en.wikipedia.org/​wiki/​Endianness|Endianness]]:​ the art of ordering bytes
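
Byte ordering can be explored with plain Python integers. This is a minimal sketch: the value ''0xFD0D'' is simply built from the two bytes ''FD'' (253) and ''0D'' (13) used in the base notions, and the choice of 4 bytes mimics a 4-byte integer.

```python
import sys

# 0xFD0D: high byte FD (253), low byte 0D (13)
value = 253 * 256 + 13

# Binary and hexadecimal views of the same integer
as_bin = format(value, '016b')   # 16 bits, zero padded
as_hex = format(value, '04x')    # 4 hexadecimal digits

# The same value stored on 4 bytes, with both byte orders
big_endian = value.to_bytes(4, byteorder='big')
little_endian = value.to_bytes(4, byteorder='little')
round_trip = int.from_bytes(little_endian, byteorder='little')

print(as_bin)
print(as_hex)
print(big_endian.hex())      # most significant byte first
print(little_endian.hex())   # least significant byte first
print(round_trip == value)
print(sys.byteorder)         # byte order of the machine running the script
```

Reading bytes with the wrong byte order gives a valid but completely different number, which is why binary formats have to document their endianness.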
 ==== Numerical values ====
  
-  * Binary data representation of some numbers:
 +  * Binary data representation of some numbers (only some common types are listed here):
 +    * Language and package **references** used below:
 +      * Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]]
 +      * NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]]
 +      * Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]]
     * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
       * Range:
-        * 4-byte integers (''numpy.int32''): −2,147,483,648 to 2,147,483,647
-        * 8-byte integers (''numpy.int64''): −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
 +        * 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647''
 +          * Python: ''numpy.int32''
 +          * NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT''
 +          * Fortran: ''INTEGER*4''
 +        * 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807''
 +          * Python: ''numpy.int64''
 +          * NetCDF: ''int64'', ''NC_INT64''
 +          * Fortran: ''INTEGER*8''
       * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
     * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
       * Range:
-        * 4-byte float (''numpy.float32''): ~8 significant digits * 10E±38
 +        * 4-byte float: ''~8 significant digits * 10E±38''
 +          * Python: ''numpy.float32''
 +          * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
 +          * Fortran: ''REAL*4''
           * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
-        * 8-byte float (''numpy.float64''): ~15 significant digits * 10E±308
-      * Special values:
-        * [[https://en.wikipedia.org/wiki/NaN|NaN]] (''numpy.nan''): //Not a Number//
-        * Infinity (''-numpy.inf'' and ''numpy.inf'')
-        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs, when you have to deal with missing values !
-    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]
-    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]
 +        * 8-byte float: ''~15 significant digits * 10E±308''
 +          * Python: ''numpy.float64''
 +          * NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE''
 +          * Fortran: ''REAL*8''
 +      * **Special values**:
 +        * [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number//
 +          * Python: ''numpy.nan''
 +        * Infinity
 +          * Python: ''-numpy.inf'' and ''numpy.inf''
 +        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) rather than ''NaN''s, when you have to deal with missing values!
 +      * <wrap hi>The RISKS of working with (the wrong) floats</wrap>:
 +        * [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]]
 +        * [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]]
 +          * [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
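
The risks listed above can be reproduced in a few lines. This is only a sketch, with one assumption to keep in mind: it uses the standard ''struct'' module to round values to 4-byte IEEE 754 floats, so that it runs without ''numpy''. The helper ''to_float32'' is a name made up for this illustration; ''numpy.float32'' uses the same 4-byte format.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (8 bytes) to the nearest 4-byte IEEE 754 float."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Round-off error: 0.1 cannot be stored exactly in base 2
tenth_32 = to_float32(0.1)
tenth_error = abs(tenth_32 - 0.1)
sum_is_exact = (0.1 + 0.2 == 0.3)              # False, even with 8-byte floats
sum_is_close = math.isclose(0.1 + 0.2, 0.3)    # the safe way to compare floats

# Information lost to round-off shows up when subtracting nearly equal numbers
big = 1.0e8
diff_64 = (big + 1.0) - big                                 # 8-byte floats
diff_32 = to_float32(big + 1.0) - to_float32(big)           # 4-byte floats

# Special values
nan = float('nan')
inf = float('inf')

print(tenth_32, tenth_error)
print(sum_is_exact, sum_is_close)
print(diff_64, diff_32)
print(nan == nan, math.isnan(nan), math.isinf(-inf))
```

With 4-byte floats the ''+ 1.0'' is lost entirely, because consecutive representable values near ''1.0e8'' are several units apart.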
     * A rather technical example: we //play// with a numpy 4-byte integer scalar     * A rather technical example: we //play// with a numpy 4-byte integer scalar
       * <​code>>>>​ one_int32 = np.int32(1)       * <​code>>>>​ one_int32 = np.int32(1)
other/python/misc_by_jyp.1682691377.txt.gz · Last modified: 2023/04/28 14:16 by jypeter