other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2023/04/28 16:11] – [Numerical values] Added low level array addressing jypeter
other:python:misc_by_jyp [2024/11/04 15:01] (current) – [Extra tutorials] Added links to ruff and flake8 jypeter
Line 5: Line 5:
 </WRAP> </WRAP>
  
 +===== Extra tutorials =====
  
 +Only **when you have already read all the content of this page several times**, and you are looking for new ideas
 +
 +  * [[https://medium.com/pythons-gurus/clean-code-in-python-good-vs-bad-practices-examples-2df344bddacc|Clean Code in Python: Good vs. Bad Practices Examples]]
 +  * [[https://peps.python.org/pep-0008/|PEP 8 – Style Guide for Python Code]]
 +    * [[https://realpython.com/python-pep8/|How to Write Beautiful Python Code With PEP 8]]
 +    * [[https://www.datacamp.com/tutorial/pep8-tutorial-python-code|PEP-8 Tutorial: Code Standards in Python]]
 +    * Some checkers/linters: [[https://docs.astral.sh/ruff/|ruff]], [[https://flake8.pycqa.org/en/stable/|flake8]]
 +  * [[https://medium.com/@yaduvanshineelam09/ultimate-python-cheat-sheet-practical-python-for-everyday-tasks-8a33abc0892f|Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks]]
 +  * [[https://medium.com/pythoneers/16-hacks-that-will-take-your-python-skills-to-the-next-level-12e7a9b97421|16 Hacks That Will Take Your Python Skills to the Next Level]]
 +  * [[https://levelup.gitconnected.com/modular-coding-in-python-finally-solve-your-import-errors-af2fd172fcf7|Modular Coding in Python: Finally Solve your Import Errors]] (understanding and fixing ModuleNotFoundError and ImportError)
 +  * [[https://medium.com/@moraneus/understanding-multithreading-and-multiprocessing-in-python-1ed39bb078d5|Understanding Multithreading and Multiprocessing in Python]]
 ===== Reading/setting environments variables ===== ===== Reading/setting environments variables =====
  
Line 26: Line 38:
  
  
 +===== Using log files (aka logging) =====
 +
 +It is always possible to display information messages using the ''print()'' function, but it is more efficient to use //logging// tools when you want to **correctly display a lot of information about a script's progress**
 +
 +  * [[https://loguru.readthedocs.io/|Loguru]] is a library which aims to bring enjoyable logging in Python
 +    * See also [[https://betterstack.com/community/guides/logging/loguru/|A Complete Guide to Logging in Python with Loguru]]
 +  * More on [[https://betterstack.com/community/guides/logging/#python|logging with python]]
 +  * The default (but not easy to use) Python ''[[https://docs.python.org/3/library/logging.html|logging]]'' module
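As a minimal sketch of the standard ''logging'' module listed above, the following shows how log levels and a common message format can replace scattered ''print()'' calls. The logger name ''my_script'', the ''process_steps'' helper and the message texts are made up for this illustration, and the messages are sent to an in-memory stream only to keep the example self-contained.

```python
import io
import logging

# Hypothetical logger name, used only for this illustration
logger = logging.getLogger('my_script')
logger.setLevel(logging.DEBUG)

# Send the messages to an in-memory stream here; a real script would
# rather use logging.StreamHandler() or logging.FileHandler('some_file.log')
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setLevel(logging.INFO)  # DEBUG messages are filtered out by this handler
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
logger.addHandler(handler)
logger.propagate = False  # do not send the messages to the root logger as well


def process_steps(n_steps):
    """Hypothetical processing loop reporting its progress."""
    for step in range(1, n_steps + 1):
        logger.debug('Details of step %i', step)
        logger.info('Step %i/%i done', step, n_steps)
    logger.warning('All %i steps done, but this is only a demo', n_steps)


process_steps(2)
print(log_stream.getvalue(), end='')
```

Changing the level given to ''handler.setLevel()'' is enough to show or hide the debugging messages, without touching the rest of the script.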
 ===== Stopping a script ===== ===== Stopping a script =====
  
Line 31: Line 51:
  
 <code>sys.exit('Some optional message about why we are stopping')</code> <code>sys.exit('Some optional message about why we are stopping')</code>
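The short sketch below illustrates what happens when ''sys.exit()'' is called with a message: Python raises a ''SystemExit'' exception carrying that message. The ''check_input'' helper is hypothetical, and the exception is caught here only to make the behavior visible; a real script would simply let it stop the program.

```python
import sys


def check_input(value):
    """Hypothetical validation helper: stop the script on bad input."""
    if value < 0:
        # A string argument is printed on stderr, and the exit status is 1
        sys.exit('Stopping: negative values are not supported')
    return value


try:
    check_input(-1)
except SystemExit as exc:
    # Only for demonstration: a real script would just let Python exit
    exit_info = exc.code
    print('sys.exit() was called with:', exit_info)
```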
 +===== Checking if a file/directory is writable by the current user =====
  
 +<code>>>> import os
 +>>> os.access('/', os.W_OK)
 +False
 +>>> os.access('/home/jypmce/.bashrc', os.W_OK)
 +True</code>


 +===== Playing with strings =====

 +==== String formatting ====

 +  * Knowing how to display/print strings correctly is always useful for information and debugging purposes
 +  * There are lots of different ways to display strings

 +=== String formatting examples ===

 +You will find below some examples of //quick printing//, as well as using //old style formatting//, //formatted string literals (f-strings)// and the //String ''format()'' Method//. More details in the next section.

 +<code python>
 +>>> # Basic (but quick and efficient) printing
 +
 +>>> year = 1984
 +>>> print(year)
 +1984
 +>>> print('[', year, 'is a famous book ]')
 +[ 1984 is a famous book ]
 +
 +>>> # Old style formatting
 +
 +>>> print('[ %i is a famous book ]' % (year,))
 +[ 1984 is a famous book ]
 +>>> print('[ %10i is a famous book ]' % (year,))
 +[       1984 is a famous book ]
 +>>> print('[ %-10i is a famous book ]' % (year,))
 +[ 1984       is a famous book ]
 +>>> print('[ %010i is a famous book ]' % (year,))
 +[ 0000001984 is a famous book ]
 +
 +>>> # Formatted string literals (f-strings)
 +
 +>>> print(f'[ {year} is a famous book ]')
 +[ 1984 is a famous book ]
 +>>> print(f'[ {year=} is a famous book ]')
 +[ year=1984 is a famous book ]
 +>>> print(f'[ {year:10} is a famous book ]')
 +[       1984 is a famous book ]
 +>>> print(f'[ {year:<10} is a famous book ]')
 +[ 1984       is a famous book ]
 +>>> print(f'[ {year:010} is a famous book ]')
 +[ 0000001984 is a famous book ]
 +>>> print(f'[ {year:10.2f} is a famous book (yes, {year}!) ]')
 +[    1984.00 is a famous book (yes, 1984!) ]
 +
 +>>> # The String format() Method
 +
 +>>> print('[ {} is a famous book ]'.format(year))
 +[ 1984 is a famous book ]
 +>>> print('[ {:10} is a famous book ]'.format(year))
 +[       1984 is a famous book ]
 +>>> print('[ {:<10} is a famous book ]'.format(year))
 +[ 1984       is a famous book ]
 +>>> print('[ {:010} is a famous book ]'.format(year))
 +[ 0000001984 is a famous book ]
 +>>> print('[ {:10.2f} is a famous book  (yes, {}!) ]'.format(year, year))
 +[    1984.00 is a famous book  (yes, 1984!) ]
 +>>> print('[ {title:10.2f} is a famous book  (yes, {title}!) ]'.format(title=year))
 +[    1984.00 is a famous book  (yes, 1984!) ]
 +>>> print('[ {title:10.2e} is a famous book ]'.format(title=year))
 +[   1.98e+03 is a famous book ]</code>
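The format specifications used above become really useful when several values have to be aligned in columns. The following sketch builds a small text table with f-strings; the prices are invented for this illustration, and the column widths are arbitrary choices.

```python
# Hypothetical data (the prices are invented), only used to illustrate
# the format specifications
books = [('Nineteen Eighty-Four', 1949, 8.5),
         ('Brave New World', 1932, 12.0)]

lines = []
for title, year, price in books:
    # '<22' : left-aligned in 22 columns
    # '>6'  : right-aligned in 6 columns
    # '8.2f': float in 8 columns, with 2 decimals
    lines.append(f'|{title:<22}|{year:>6}|{price:8.2f}|')

print('\n'.join(lines))
```

The same specifications work unchanged with the ''format()'' method, e.g. '''|{:<22}|{:>6}|{:8.2f}|'.format(title, year, price)''.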
  
 +=== String formatting references ===

 +  * [[https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals|Formatted String Literals]] (//f-strings//)
 +    * Available in Python 3.6 and later
 +    * [[https://docs.python.org/3/reference/lexical_analysis.html#f-strings|More documentation]]
 +    * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
 +      * See also the [[https://pyformat.info/|PyFormat site]]

 +  * [[https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method|The String format() Method]]
 +    * [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
 +      * See also the [[https://pyformat.info/|PyFormat site]]

 +  * [[https://pyformat.info/|PyFormat site]]: string formatting using the //old style// and the //String ''format()'' method//
 +    * <wrap hi>Hint</wrap>: this can also be used as an **easy documentation for //f-strings// format**!

 +  * [[https://docs.python.org/3/tutorial/inputoutput.html#old-string-formatting|Old string formatting]]
 ==== Splitting (complex) strings ==== ==== Splitting (complex) strings ====
  
Line 192: Line 161:
 ==== Working with paths and filenames ==== ==== Working with paths and filenames ====
  
-If you are in a hurry, you can just use string functions to work with path and file names. But you will need some specific functions to check if a file exists, and similar operations. All these are available in 2 libraries that have similar functions. Both of these libraries can deal with Unix-type paths on Linux computers, and Windows-type paths on Windows computers
 +If you are in a hurry, you can just use string functions to work with paths and file names.
 +
-  * [[https://docs.python.org/3/library/os.path.html|os.path]] //Common pathname manipulations//
 +You will need some specific objects and functions to check if a file exists, and similar operations. Check the libraries listed below, which can automatically deal with Unix-type paths on Linux and macOS computers, and Windows-type paths on Windows computers.
 +
 +  * [[https://docs.python.org/3/library/os.path.html|os.path]]: //common pathname manipulations//
     * Available since... a long time! Use this if you want to avoid backward compatibility problems     * Available since... a long time! Use this if you want to avoid backward compatibility problems
     * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]]     * Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]]
-  * [[https://docs.python.org/3/library/pathlib.html|pathlib]] //Object-oriented filesystem paths//+  * [[https://docs.python.org/3/library/pathlib.html|pathlib]]: a **more recent** //object-oriented// way to deal with //filesystem paths//
     * Available since Python version 3.4     * Available since Python version 3.4
     * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]     * [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib, and os or os.path functions]]
-  * [[https://docs.python.org/3/library/shutil.html|High-level file operations]]+  * [[https://docs.python.org/3/library/shutil.html|shutil]]: High-level file operations, e.g. copy/move a file or directory tree
  
  
-=== Example: getting the full path of the Python used ===+=== Example: getting the full path of the Python executable used ===
  
 Note: the actual python may be different from the default python! Note: the actual python may be different from the default python!
Line 210: Line 182:
 /usr/bin/python /usr/bin/python
  
-$ /modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python+$ /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python
 >>> import sys, shutil >>> import sys, shutil
 >>> shutil.which('python') >>> shutil.which('python')
 '/usr/bin/python' '/usr/bin/python'
 >>> sys.executable >>> sys.executable
-'/modfs/modtools/miniconda3//envs/analyse_3.6_test/bin/python'</code>+'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python'</code>
  
  
Line 231: Line 203:
 </code> </code>
  
 +
 +=== Example: system independent paths with pathlib ===
 +
 +Note: the following example was generated on a Linux server and uses a <wrap em>/</wrap> character as a path separator
 +
 +<code>>>> from pathlib import Path
 +>>> my_home = Path.home()
 +>>> my_home
 +PosixPath('/home/users/my_login')
 +>>> my_conf = my_home / '.config' / 'evince'
 +>>> my_conf
 +PosixPath('/home/users/my_login/.config/evince')
 +>>> my_conf.is_dir()
 +True
 +>>> my_conf.is_file()
 +False
 +>>> list(my_conf.glob('*'))
 +[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
 +>>> [ ff.name for ff in my_conf.glob('*') ]
 +['evince_toolbar.xml', 'accels']
 +</code>
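The sketch below complements the session above with a few other ''pathlib'' attributes that replace manual string manipulations. It works in a temporary directory so that it does not depend on the files of a specific computer; the ''data'' directory and the ''tas_2024.nc.txt'' file name are hypothetical.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory, removed at the end of the 'with' block
with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / 'data'          # hypothetical directory name
    data_dir.mkdir()
    data_file = data_dir / 'tas_2024.nc.txt'   # hypothetical file name
    data_file.write_text('dummy content')

    file_name = data_file.name            # last component of the path
    file_suffix = data_file.suffix        # last extension only
    file_stem = data_file.stem            # name without the last extension
    parent_name = data_file.parent.name   # name of the enclosing directory
    file_exists = data_file.is_file()
    file_size = data_file.stat().st_size  # size in bytes

print(file_name, file_suffix, file_stem, parent_name, file_exists, file_size)
```

The same code should run unchanged on Windows, where the paths would be ''WindowsPath'' objects using ''\'' as a separator.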
  
 === Example: getting the size(s) of all the files in a directory === === Example: getting the size(s) of all the files in a directory ===
Line 417: Line 409:
 ['c', 'd', 'b', 'a']</code> ['c', 'd', 'b', 'a']</code>
  
 +
 +===== Efficient looping with numpy, map, itertools and list comprehension =====
 +
 +<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap>, in order to reduce a script's execution time!
 +
 +  * **''numpy'' arrays** should be used when dealing with //numerical data//
 +    * **Masked arrays** can be used to deal with //special cases// and remove tests from loops
 +
 +  * The built-in [[https://docs.python.org/3/library/functions.html?highlight=map#map|map]] function (and similar functions like [[https://docs.python.org/3/library/functions.html?highlight=zip#zip|zip]], [[https://docs.python.org/3/library/functions.html?highlight=filter#filter|filter]], ...) can be used to efficiently apply a function (possibly a //simple// [[https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions|lambda]] function) to all the elements of a list
 +    * <code>>>> my_ints = [1, 2, 3]
 +
 +>>> list(map(str, my_ints)) # In Python 3, map() returns an iterator: use list() to get the values
 +['1', '2', '3']
 +
 +>>> list(map(lambda ii: str(10*ii + 5), my_ints))
 +['15', '25', '35']</code>
 +
 +  * The [[https://docs.python.org/3/library/itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping
 +    * Example: replacing nested loops with [[https://docs.python.org/3/library/itertools.html#itertools.product|product]]
 +      * <code>>>> import itertools as it
 +>>> it.product('AB', '01')
 +<itertools.product object at 0x2b35a7b5f100>
 +
 +>>> list(it.product('AB', '01'))
 +[('A', '0'), ('A', '1'), ('B', '0'), ('B', '1')]
 +
 +>>> for c1, c2 in it.product('AB', '01'):
 +...   print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>> for c1, c2 in it.product(['A', 'B'], ['0', '1']):
 +...   print(c1 + c2)
 +...
 +A0
 +A1
 +B0
 +B1
 +
 +>>> for c1, c2, c3 in it.product('AB', '01', '$!'):
 +...   print(c1 + c2 + c3, end=', ')
 +...
 +A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</code>
 +
 +  * The [[https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions|list comprehension]] (aka //implicit loops//) can also be used to generate lists from lists
 +    * Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''map'' function detailed above
 +      * <code>>>> my_ints = [1, 2, 3]
 +
 +>>> [ str(ii) for ii in my_ints ]
 +['1', '2', '3']</code>
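The list above mentions ''numpy'' masked arrays as a way to remove tests from loops, without showing one. Here is a minimal sketch of that idea, assuming missing measurements are flagged with the (hypothetical) value ''-999.'': the explicit loop and the masked array compute the same mean.

```python
import numpy as np

# Hypothetical temperature values, where -999. marks a missing measurement
temperatures = np.array([12.5, -999., 14.0, 15.5, -999., 13.0])

# Explicit loop, with a test on each element
total, count = 0., 0
for value in temperatures:
    if value != -999.:
        total += value
        count += 1
loop_mean = total / count

# Same computation with a masked array: no explicit loop, and no test
masked_temperatures = np.ma.masked_values(temperatures, -999.)
masked_mean = masked_temperatures.mean()

print(loop_mean, masked_mean, masked_temperatures.count())
```

On a 6-element array the difference in speed is negligible, but the masked version scales to large multi-dimensional arrays without any change.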
 ===== numpy related stuff ===== ===== numpy related stuff =====
  
Line 579: Line 623:
 array([3. , 4.5, 8. ])</code> array([3. , 4.5, 8. ])</code>
  
 +==== Exercise your brain with numpy ====
 +
 +Have a look at [[https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb|100 numpy exercises]]
  
 ===== matplotlib related stuff ===== ===== matplotlib related stuff =====
Line 587: Line 634:
   * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]   * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
   * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]   * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]
 +
 +
 +===== Data representation =====
 +
 +A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
 +
 +FIXME Add parts (pages 28 to 37) of this [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|old tutorial]] to this section
 +
 +==== Base notions ====
 +
 +  * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s ...), grouped in bytes!
 +    * Some things can be stored exactly (integers, characters, ...)
 +    * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store a **//good enough approximation//**
 +
 +  * 1 byte <=> 8 bits
 +    * ''REAL*4'' <=> 4 bytes <=> 32 bits
 +    * For easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]] (characters ''0'', ''1'', ..., ''A'', ''B'', ..., ''F'')
 +      * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
 +      * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'')
 +      * ''11111101'' in //base 2// <=> ''1111 1101'' <=> ''FD'' in //hexadecimal// <=> ''253'' (''15 * 16 + 13'') in //decimal//
 +
 +  * Base conversion with Python
 +    * <code>>>> hex(13) # Decimal to Hexadecimal conversion
 +'0xd'
 +>>> hex(253)
 +'0xfd'
 +>>> hex(256)
 +'0x100'
 +>>> int('0x100', 16) # Hexadecimal to Decimal conversion
 +256
 +>>> int('1111', 2) # Binary to Decimal conversion
 +15
 +>>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253
 +253
 +>>> 0o13 # Python 3 octal literals use the 0o prefix (a bare leading 0, as in 013, meant octal in Python 2 and is now a SyntaxError)
 +11
 +>>> int('13', 8) # 1*8 + 3
 +11</code>
 +
 +  * More technical topics
 +    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit)
 +    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]: the art of ordering bytes
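The notions above (bits, hexadecimal display, byte ordering) can be explored with plain Python integers. The following sketch only uses built-in functions; the value ''253'' is the one used in the examples of this section, and storing it on 4 bytes is an arbitrary choice.

```python
value = 253

# Binary and hexadecimal representations, padded to a whole byte
as_bits = format(value, '08b')
as_hex = format(value, '02X')

# The same integer stored on 4 bytes, with both byte orders
little_endian = value.to_bytes(4, byteorder='little')
big_endian = value.to_bytes(4, byteorder='big')

# Reading the bytes back only gives the initial value
# when the byte order matches the one used for writing
same_value = int.from_bytes(little_endian, byteorder='little')
other_value = int.from_bytes(little_endian, byteorder='big')

print(as_bits, as_hex)
print(little_endian.hex(' '), '|', big_endian.hex(' '))
print(same_value, other_value)
```

Reading a binary file written on a machine with a different endianness gives exactly this kind of wrong (but valid looking) value.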
 +==== Numerical values ====
 +
 +  * Binary data representation of some numbers (only some common types are listed here):
 +    * Languages and packages **references** used below:
 +      * Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]]
 +      * NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]]
 +      * Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]]
 +    * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
 +      * Range:
 +        * 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647''
 +          * Python: ''numpy.int32''
 +          * NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT''
 +          * Fortran: ''INTEGER*4''
 +        * 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807''
 +          * Python: ''numpy.int64''
 +          * NetCDF: ''int64'', ''NC_INT64''
 +          * Fortran: ''INTEGER*8''
 +      * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
 +    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
 +      * Range:
 +        * 4-byte float: ''~7 significant digits * 10^±38''
 +          * Python: ''numpy.float32''
 +          * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
 +          * Fortran: ''REAL*4''
 +          * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
 +        * 8-byte float: ''~15 significant digits * 10^±308''
 +          * Python: ''numpy.float64''
 +          * NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE''
 +          * Fortran: ''REAL*8''
 +      * **Special values**:
 +        * [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number//
 +          * Python: ''numpy.nan''
 +        * Infinity
 +          * Python: ''-numpy.inf'' and ''numpy.inf''
 +        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) rather than ''NaN''s, when you have to deal with missing values!
 +      * <wrap hi>The RISKS of working with (the wrong) floats</wrap>:
 +        * [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]]
 +        * [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]]
 +          * [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
 +    * A rather technical example: we //play// with a numpy 4-byte integer scalar
 +      * <code>>>> import numpy as np
 +>>> one_int32 = np.int32(1)
 +>>> one_int32
 +1
 +>>> type(one_int32)
 +<class 'numpy.int32'>
 +>>> one_int32.dtype
 +dtype('int32')
 +>>> one_int32.shape # A numpy SCALAR, is an ARRAY WITH NO SHAPE !
 +()
 +>>> one_int32[0]
 +Traceback (most recent call last):
 +  File "<stdin>", line 1, in <module>
 +IndexError: invalid index to scalar variable.
 +>>> one_int32[()] # Note how to access the single element, when there is NO SHAPE
 +1
 +>>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
 +0
 +>>> one_int32.size
 +1
 +>>> one_int32.nbytes # The element requires 4 bytes of storage
 +4
 +>>> hex(one_int32) # We can print the hexadecimal representation for INTEGERS scalars and arrays
 +'0x1'
 +>>> hex(one_int32 * 15)
 +'0xf'
 +>>> hex(one_int32 * 16)
 +'0x10'
 +
 +# 'Serialize' the data (i.e. change the data to a series of bytes)
 +# Note: on a little-endian machine (e.g. x86), the least significant byte is stored first, so the bytes appear in the reverse order of 'hex(one_int32)'
 +>>> one_int32_serialized = one_int32.tobytes()
 +>>> type(one_int32_serialized)
 +<class 'bytes'>
 +>>> len(one_int32_serialized)
 +4
 +>>> one_int32_serialized 
 +b'\x01\x00\x00\x00'
 +>>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values
 +'01 00 00 00'
 +
 +# Use the following in the unlikely case where you need to change the endianness (bytes ordering)
 +>>> one_int32_reversed_endian = one_int32.byteswap()
 +>>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
 +16777216
 +>>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
 +'0x1000000'
 +>>> one_int32_reversed_endian.tobytes()
 +b'\x00\x00\x00\x01'</code>
 +    * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how bytes are swapped by groups of 4 bytes, because int32 use 4 bytes
 +      * <code>>>> array_example = np.asarray((3, 17), dtype=np.int32)
 +>>> array_example
 +array([ 3, 17], dtype=int32)
 +>>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes
 +((2,), 1, 2, 8)
 +>>> array_example.tobytes().hex(' ', 4)
 +'03000000 11000000'
 +>>> array_example.byteswap().tobytes().hex(' ', 4)
 +'00000003 00000011'
 +</code>
 +
 +  * Manipulating binary data with [[https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview|bytes, bytearray, memoryview]]
 +
 +  * Array addressing
 +    * [[https://www.geeksforgeeks.org/calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
 +      * In other words: //using indices to go from 1-D to n-dimensional data//
 +    * The [[https://en.wikipedia.org/wiki/Array_(data_structure)|array]] structure
 +    * python/C vs Fortran...
 +
 +  * disk and ram usage: how to check the usage (available ram and disk), best practice on multi-user systems (how much allowed?)
 +    * ''du'', ''df'', ''cat /proc/meminfo'', ''top''
 +
 +  * understanding and reverse-engineering //binary// format
 +    * ''od'', ''strings''
 +
 +  * binary vs text format: ascii, utf, raw
 +    * text related functions in python: ''str'', ''int'', ''float'', ''ord'', ...
 +      * lists conversion with ''map'' and ''join''
 +
 +  * Misc : ''md5sum''
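The risks of working with floats listed above (round-off error, catastrophic cancellation, and the limited precision of 4-byte floats) can be observed with a few lines of standard Python. This is only an illustrative sketch: the ''struct'' module is used here as a convenient way to round a value to 4-byte precision, and the values were picked to make the effects visible.

```python
import math
import struct

# Round-off error: 0.1, 0.2 and 0.3 have no exact binary representation
sum_is_exact = (0.1 + 0.2 == 0.3)
sum_is_close = math.isclose(0.1 + 0.2, 0.3)   # the recommended way to compare

# Storing a value on 4 bytes (float32) instead of 8 bytes (float64)
# loses a part of the significant digits
pi_float32 = struct.unpack('f', struct.pack('f', math.pi))[0]

# Catastrophic cancellation: subtracting two close numbers
# leaves mostly round-off noise (the exact answer would be 1e-15)
cancellation_result = (1.0 + 1.0e-15) - 1.0
relative_error = abs(cancellation_result - 1.0e-15) / 1.0e-15

print(sum_is_exact, sum_is_close)
print(math.pi, pi_float32)
print(cancellation_result, relative_error)
```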
 +
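The //array addressing// topic above (python/C vs Fortran) boils down to how n-D indices are converted to a position in a 1-D block of memory. The two helper functions below are hypothetical names written for this illustration, not functions of an existing library; they assume 0-based indices in both cases.

```python
def row_major_offset(indices, shape):
    """Offset in the flattened array, C/Python order:
    the LAST index varies fastest."""
    offset = 0
    for index, size in zip(indices, shape):
        offset = offset * size + index
    return offset


def column_major_offset(indices, shape):
    """Offset in the flattened array, Fortran order:
    the FIRST index varies fastest."""
    offset = 0
    for index, size in zip(reversed(indices), reversed(shape)):
        offset = offset * size + index
    return offset


shape = (2, 3)   # 2 rows, 3 columns
all_indices = [(i, j) for i in range(2) for j in range(3)]
c_offsets = [row_major_offset(ij, shape) for ij in all_indices]
f_offsets = [column_major_offset(ij, shape) for ij in all_indices]

print(c_offsets)
print(f_offsets)
```

Looping over the fastest varying index in the innermost loop accesses memory contiguously, which is why the best loop order differs between C/Python and Fortran.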
 +==== Strings ====
 +
 +  * Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ...
 +
 +  * Getting the binary representation of a string
 +    * <code>>>> test_string = 'A B 0 1 à µ'
 +>>> type(test_string)
 +<class 'str'>
 +>>> len(test_string)
 +11
 +>>> test_string_bin = test_string.encode('utf-8')
 +>>> test_string_bin
 +b'A B 0 1 \xc3\xa0 \xc2\xb5'
 +>>> type(test_string_bin)
 +<class 'bytes'>
 +>>> len(test_string_bin)
 +13
 +>>> test_string_bin.hex('-')
 +'41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'
 +</code>
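To go a bit further than the session above, the following sketch compares several encodings of the same test string, and shows what happens when bytes are decoded with the wrong encoding. The string is written with ''\u'' escapes only to make sure that the example does not depend on the encoding of the page; the choice of ''latin-1'' as a second encoding is an arbitrary one.

```python
# Same string as 'A B 0 1 à µ' in the example above
test_string = 'A B 0 1 \u00e0 \u00b5'

# Code points of the characters (ASCII characters are below 128)
code_points = [ord(char) for char in test_string if char != ' ']

# UTF-8 needs 2 bytes for each of the last 2 characters, Latin-1 only 1
utf8_bytes = test_string.encode('utf-8')
latin1_bytes = test_string.encode('latin-1')

# Decoding only gives back the initial string with the matching encoding
round_trip = utf8_bytes.decode('utf-8')
wrong_decoding = utf8_bytes.decode('latin-1')

# Pure ASCII can not represent the last 2 characters
ascii_bytes = test_string.encode('ascii', errors='replace')

print(code_points)
print(len(utf8_bytes), len(latin1_bytes), round_trip == test_string)
print(repr(wrong_decoding), ascii_bytes)
```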
  
  
other/python/misc_by_jyp.1682691109.txt.gz · Last modified: 2023/04/28 16:11 by jypeter
