other:python:misc_by_jyp [2023/04/28 16:11] – [Numerical values] Added low level array addressing jypeter | other:python:misc_by_jyp [2024/11/04 15:01] (current) – [Extra tutorials] Added links to ruff and flake8 jypeter |
</WRAP>

===== Extra tutorials =====

Read these **only when you have already read all the content of this page several times** and are looking for new ideas:

* [[https://medium.com/pythons-gurus/clean-code-in-python-good-vs-bad-practices-examples-2df344bddacc|Clean Code in Python: Good vs. Bad Practices Examples]]
* [[https://peps.python.org/pep-0008/|PEP 8 – Style Guide for Python Code]]
* [[https://realpython.com/python-pep8/|How to Write Beautiful Python Code With PEP 8]]
* [[https://www.datacamp.com/tutorial/pep8-tutorial-python-code|PEP-8 Tutorial: Code Standards in Python]]
* Some checkers/linters: [[https://docs.astral.sh/ruff/|ruff]], [[https://flake8.pycqa.org/en/stable/|flake8]]
* [[https://medium.com/@yaduvanshineelam09/ultimate-python-cheat-sheet-practical-python-for-everyday-tasks-8a33abc0892f|Ultimate Python Cheat Sheet: Practical Python For Everyday Tasks]]
* [[https://medium.com/pythoneers/16-hacks-that-will-take-your-python-skills-to-the-next-level-12e7a9b97421|16 Hacks That Will Take Your Python Skills to the Next Level]]
* [[https://levelup.gitconnected.com/modular-coding-in-python-finally-solve-your-import-errors-af2fd172fcf7|Modular Coding in Python: Finally Solve your Import Errors]] (understanding and fixing ''ModuleNotFoundError'' and ''ImportError'')
* [[https://medium.com/@moraneus/understanding-multithreading-and-multiprocessing-in-python-1ed39bb078d5|Understanding Multithreading and Multiprocessing in Python]]

===== Reading/setting environment variables =====

===== Using log files (aka logging) =====

You can always display information messages with the ''print()'' function. However, //logging// tools are more efficient when you need to **properly display a lot of information about a script's progress**:

* [[https://loguru.readthedocs.io/|Loguru]] is a library that aims to bring enjoyable logging to Python
* See also [[https://betterstack.com/community/guides/logging/loguru/|A Complete Guide to Logging in Python with Loguru]]
* More on [[https://betterstack.com/community/guides/logging/#python|logging with python]]
* The default (but not very easy to use) Python ''[[https://docs.python.org/3/library/logging.html|logging]]'' module
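Here is a minimal sketch with the standard ''logging'' module. It assumes you only want timestamped messages on the terminal. The format string is just one possible choice, and ''process()'' is a hypothetical function used for illustration. To write the messages to a log file instead, uncomment the ''filename='my_script.log''' argument of ''logging.basicConfig()''.

<code python>
import logging

# Configure the root logger once, at the beginning of the script
# (the level and the format below are only one possible choice)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)-8s %(name)s: %(message)s',
    # filename='my_script.log',  # uncomment to write to a log file
)
log = logging.getLogger(__name__)

def process(n_steps):
    """Hypothetical processing function, only used to emit some messages."""
    log.info('Starting the processing of %d steps', n_steps)
    for step in range(n_steps):
        log.debug('Step %d done', step)   # hidden, because DEBUG < INFO
    log.warning('Finished (this is just a demo)')
    return n_steps

process(3)
</code>

The level passed to ''basicConfig()'' controls the output. You can leave the ''log.debug()'' calls in the code and make them visible later by switching to ''level=logging.DEBUG''.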
===== Stopping a script =====

<code>import sys
sys.exit('Some optional message about why we are stopping')</code>
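When ''sys.exit()'' receives a string, Python prints it to the standard error and ends the script with exit status 1. An integer argument is used directly as the exit status. The sketch below checks this behavior by running a small child Python process. The error message is made up for the example.

<code python>
import subprocess
import sys

# Run a tiny child script that stops itself with a (made up) message
result = subprocess.run(
    [sys.executable, '-c', "import sys; sys.exit('Stopping: input file not found')"],
    capture_output=True,   # Python >= 3.7
    text=True,
)
print('Exit status:', result.returncode)          # 1 when sys.exit() gets a string
print('Standard error:', result.stderr.strip())   # the message goes to stderr

# An integer argument is used as the exit status (0 usually means success)
ok = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(0)'])
print('Exit status:', ok.returncode)
</code>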
===== Checking if a file/directory is writable by the current user =====

<code>>>> import os
>>> os.access('/', os.W_OK)
False
>>> os.access('/home/jypmce/.bashrc', os.W_OK)
True</code>
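The Python documentation of ''os.access()'' warns that checking permissions before actually opening a file is a security risk. The file can change between the check and the opening. The documentation recommends trying the operation directly and handling the error instead. Below is a minimal sketch of that approach for a directory. ''is_dir_writable()'' is a hypothetical helper name used only here.

<code python>
import tempfile

def is_dir_writable(dir_path):
    """Return True if we can actually create (and remove) a file in dir_path."""
    try:
        # Create a temporary file in dir_path; it is deleted when closed
        with tempfile.TemporaryFile(dir=dir_path):
            pass
    except OSError:   # PermissionError, FileNotFoundError, ...
        return False
    return True

print(is_dir_writable(tempfile.gettempdir()))    # the temp directory should be writable
print(is_dir_writable('/this/path/does/not/exist'))
</code>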

===== Playing with strings =====

==== String formatting ====

* Knowing how to display/print a string correctly is always useful for information and debugging purposes
* There are lots of different ways to display strings

=== String formatting examples ===

The examples below show //quick printing//, //old style formatting//, //formatted string literals (f-strings)// and the //String ''format()'' Method//. More details in the next section.

<code python>
>>> # Basic (but quick and efficient) printing

>>> year = 1984
>>> print(year)
1984
>>> print('[', year, 'is a famous book ]')
[ 1984 is a famous book ]

>>> # Old style formatting

>>> print('[ %i is a famous book ]' % (year,))
[ 1984 is a famous book ]
>>> print('[ %10i is a famous book ]' % (year,))
[       1984 is a famous book ]
>>> print('[ %-10i is a famous book ]' % (year,))
[ 1984       is a famous book ]
>>> print('[ %010i is a famous book ]' % (year,))
[ 0000001984 is a famous book ]

>>> # Formatted string literals (f-strings)

>>> print(f'[ {year} is a famous book ]')
[ 1984 is a famous book ]
>>> print(f'[ {year=} is a famous book ]')  # The '=' specifier requires Python >= 3.8
[ year=1984 is a famous book ]
>>> print(f'[ {year:10} is a famous book ]')
[       1984 is a famous book ]
>>> print(f'[ {year:<10} is a famous book ]')
[ 1984       is a famous book ]
>>> print(f'[ {year:010} is a famous book ]')
[ 0000001984 is a famous book ]
>>> print(f'[ {year:10.2f} is a famous book (yes, {year}!) ]')
[    1984.00 is a famous book (yes, 1984!) ]

>>> # The String format() Method

>>> print('[ {} is a famous book ]'.format(year))
[ 1984 is a famous book ]
>>> print('[ {:10} is a famous book ]'.format(year))
[       1984 is a famous book ]
>>> print('[ {:<10} is a famous book ]'.format(year))
[ 1984       is a famous book ]
>>> print('[ {:010} is a famous book ]'.format(year))
[ 0000001984 is a famous book ]
>>> print('[ {:10.2f} is a famous book (yes, {}!) ]'.format(year, year))
[    1984.00 is a famous book (yes, 1984!) ]
>>> print('[ {title:10.2f} is a famous book (yes, {title}!) ]'.format(title=year))
[    1984.00 is a famous book (yes, 1984!) ]
>>> print('[ {title:10.2e} is a famous book ]'.format(title=year))
[   1.98e+03 is a famous book ]</code>

=== String formatting references ===

* [[https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals|Formatted String Literals]] (//f-strings//)
* Available in Python >= 3.6
* [[https://docs.python.org/3/reference/lexical_analysis.html#f-strings|More documentation]]
* [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
* See also the [[https://pyformat.info/|PyFormat site]]

* [[https://docs.python.org/3/tutorial/inputoutput.html#the-string-format-method|The String format() Method]]
* [[https://docs.python.org/3/library/string.html#formatspec|Format Specification Mini-Language]]
* See also the [[https://pyformat.info/|PyFormat site]]

* [[https://pyformat.info/|PyFormat site]]: string formatting using the //old style// and the //String ''format()'' method//
* <wrap hi>Hint</wrap>: this site can also serve as **easy documentation for the //f-strings// format**!

* [[https://docs.python.org/3/tutorial/inputoutput.html#old-string-formatting|Old string formatting]]
==== Splitting (complex) strings ====

==== Working with paths and filenames ====

If you are in a hurry, you can just use string functions to work with paths and file names.

However, you will need specific objects and functions to check whether a file exists, and for similar operations. The libraries listed below automatically handle Unix-type paths on Linux and macOS computers, and Windows-type paths on Windows computers:

* [[https://docs.python.org/3/library/os.path.html|os.path]]: //common pathname manipulations//
* Available since... a long time! Use this if you want to avoid backward compatibility problems
* Some functions are directly in [[https://docs.python.org/3/library/os.html|os]] //Miscellaneous operating system interfaces//\\ e.g. [[https://docs.python.org/3/library/os.html#os.remove|os.remove]] and [[https://docs.python.org/3/library/os.html#os.rmdir|os.rmdir]]
* [[https://docs.python.org/3/library/pathlib.html|pathlib]]: a **more recent** //object-oriented// way to deal with //filesystem paths//
* Available since Python version 3.4
* [[https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module|Matching pathlib and os or os.path functions]]
* [[https://docs.python.org/3/library/shutil.html|shutil]]: high-level file operations, e.g. copying/moving a file or a directory tree
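Here is a small side-by-side sketch of a few common operations, written with both ''os.path'' and ''pathlib''. It runs inside a temporary directory, so you can try it anywhere. The file and directory names are only illustrative.

<code python>
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a path: os.path.join() vs the / operator of pathlib
    old_style = os.path.join(tmp_dir, 'data', 'results.txt')
    new_style = Path(tmp_dir) / 'data' / 'results.txt'
    print(old_style == str(new_style))                      # True: same path

    # Create the parent directory (~ os.makedirs(..., exist_ok=True)), then the file
    new_style.parent.mkdir(parents=True, exist_ok=True)
    new_style.write_text('some results\n')

    # Check existence: os.path.exists() vs Path.exists()
    print(os.path.exists(old_style), new_style.exists())    # True True

    # Split the name: os.path.basename()/splitext() vs Path attributes
    print(os.path.basename(old_style), new_style.name)      # results.txt results.txt
    print(os.path.splitext(old_style)[1], new_style.suffix) # .txt .txt
    print(new_style.stem)                                   # results

    # Remove the file: os.remove() vs Path.unlink()
    new_style.unlink()
    print(os.path.exists(old_style))                        # False
</code>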


=== Example: getting the full path of the Python executable used ===

Note: the Python interpreter that actually runs your script may be different from the default ''python'' found in your ''PATH''!
/usr/bin/python

$ /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python
>>> import sys, shutil
>>> shutil.which('python')
'/usr/bin/python'
>>> sys.executable
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python'</code>


</code>


=== Example: system-independent paths with pathlib ===

Note: the following example was generated on a Linux server, and therefore uses the <wrap em>/</wrap> character as a path separator

<code>>>> from pathlib import Path
>>> my_home = Path.home()
>>> my_home
PosixPath('/home/users/my_login')
>>> my_conf = my_home / '.config' / 'evince'
>>> my_conf
PosixPath('/home/users/my_login/.config/evince')
>>> my_conf.is_dir()
True
>>> my_conf.is_file()
False
>>> list(my_conf.glob('*'))
[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
>>> [ ff.name for ff in my_conf.glob('*') ]
['evince_toolbar.xml', 'accels']
</code>

=== Example: getting the size(s) of all the files in a directory ===
['c', 'd', 'b', 'a']</code>


===== Efficient looping with numpy, map, itertools and list comprehension =====

<wrap hi>Big, nested, explicit ''for'' loops should be avoided at all costs</wrap> to reduce a script's execution time!

* **''numpy'' arrays** should be used when dealing with //numerical data//
* **Masked arrays** can be used to deal with //special cases// and remove tests from loops
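This minimal sketch compares an explicit loop with the equivalent numpy masked array code. The masked array removes the ''if'' test that the loop uses to skip missing values. The ''-999.'' missing value marker is an assumption made only for this example. ''numpy.ma.masked_values()'' is used instead of an exact equality test because the data are floats.

<code python>
import numpy as np

# Some data where -999. is (hypothetically) used to flag missing values
data = np.array([1.5, -999., 3.0, 4.5, -999.])

# Explicit loop, with a test for each element: slow for big arrays
total = 0.0
count = 0
for value in data:
    if value != -999.:
        total += value
        count += 1
loop_mean = total / count

# Masked array: the masking test is done once, by numpy, on the whole array
masked_data = np.ma.masked_values(data, -999.)
ma_mean = masked_data.mean()    # masked values are ignored, no 'if' needed

# Vectorized arithmetic also works on masked arrays (and keeps the mask)
doubled = 2 * masked_data

print(loop_mean, ma_mean)       # 3.0 3.0
print(doubled)                  # masked elements are displayed as --
</code>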

* The built-in [[https://docs.python.org/3/library/functions.html?highlight=map#map|map]] function (and similar functions like [[https://docs.python.org/3/library/functions.html?highlight=zip#zip|zip]], [[https://docs.python.org/3/library/functions.html?highlight=filter#filter|filter]], ...) can be used to efficiently apply a function (possibly a //simple// [[https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions|lambda]] function) to all the elements of a list (or of any other iterable)
* <code>>>> my_ints = [1, 2, 3]

>>> list(map(str, my_ints))  # In Python 3, map() returns an iterator: list() gets all the values
['1', '2', '3']

>>> list(map(lambda ii: str(10*ii + 5), my_ints))
['15', '25', '35']</code>

* The [[https://docs.python.org/3/library/itertools.html|itertools]] module defines many more fancy iterators that can be used for efficient looping
* Example: replacing nested loops with [[https://docs.python.org/3/library/itertools.html#itertools.product|product]]
* <code>>>> import itertools as it
>>> it.product('AB', '01')
<itertools.product object at 0x2b35a7b5f100>

>>> list(it.product('AB', '01'))
[('A', '0'), ('A', '1'), ('B', '0'), ('B', '1')]

>>> for c1, c2 in it.product('AB', '01'):
...     print(c1 + c2)
...
A0
A1
B0
B1

>>> for c1, c2 in it.product(['A', 'B'], ['0', '1']):
...     print(c1 + c2)
...
A0
A1
B0
B1

>>> for c1, c2, c3 in it.product('AB', '01', '$!'):
...     print(c1 + c2 + c3, end=', ')
...
A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,</code>

* [[https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions|List comprehensions]] (aka //implicit loops//) can also be used to generate lists from lists
* Example: converting a list of integers to a list of strings\\ Note: in that case, you should rather use the ''map'' function detailed above
* <code>>>> my_ints = [1, 2, 3]

>>> [ str(ii) for ii in my_ints ]
['1', '2', '3']</code>
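List comprehensions can also include a test, which makes them a compact replacement for a combination of ''map'' and ''filter''. They can also replace simple nested loops. Here is a small sketch with arbitrary values:

<code python>
my_ints = [1, 2, 3, 4, 5, 6]

# Keep only the even values, and square them
squares_of_evens = [ii**2 for ii in my_ints if ii % 2 == 0]

# Same result with map() and filter()
with_map_filter = list(map(lambda ii: ii**2, filter(lambda ii: ii % 2 == 0, my_ints)))

print(squares_of_evens)                     # [4, 16, 36]
print(squares_of_evens == with_map_filter)  # True

# Nested loops written as a single comprehension (same order as it.product)
pairs = [c1 + c2 for c1 in 'AB' for c2 in '01']
print(pairs)                                # ['A0', 'A1', 'B0', 'B1']
</code>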
===== numpy related stuff =====

array([3. , 4.5, 8. ])</code>

==== Exercise your brain with numpy ====

Have a look at [[https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb|100 numpy exercises]]

===== matplotlib related stuff =====
* [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
* [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]


===== Data representation =====

A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//

FIXME Add parts (pages 28 to 37) of this [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|old tutorial]] to this section

==== Base notions ====

* **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s ...), grouped in bytes!
* Some things can be stored exactly (integers, characters, ...)
* In other cases (the **//real// numbers** that we work with all the time, compressed images/videos/music), we only store a **//good enough approximation//**

* 1 byte <=> 8 bits
* ''REAL*4'' <=> 4 bytes <=> 32 bits
* For an easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits. Each group is displayed as one base 16 digit, using the [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]] (characters ''0'', ''1'', ..., ''9'', ''A'', ''B'', ..., ''F'')
* ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F''
* ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'')
* ''11111101'' in //base 2// <=> ''1111 1101'' <=> ''FD'' in //hexadecimal// <=> ''253'' (''15 * 16 + 13'') in //decimal//

* Base conversion with Python
* <code>>>> hex(13) # Decimal to Hexadecimal conversion
'0xd'
>>> hex(253)
'0xfd'
>>> hex(256)
'0x100'
>>> int('0x100', 16) # Hexadecimal to Decimal conversion
256
>>> bin(253) # Decimal to Binary conversion
'0b11111101'
>>> int('1111', 2) # Binary to Decimal conversion
15
>>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253
253
>>> 0o13 # OCTAL integers use the 0o prefix (a plain leading 0 meant octal in Python 2; 013 is a SyntaxError in Python 3)
11
>>> int('13', 8) # 1*8 + 3
11</code>

* More technical topics
* [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about the MSB (Most Significant Bit) and the LSB (Least Significant Bit)
* [[https://en.wikipedia.org/wiki/Endianness|Endianness]]: the art of ordering bytes
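The built-in ''format()'' function, or the same format specifications in //f-strings//, gives more control than ''hex()'' and ''bin()''. You can drop the ''0x''/''0b'' prefix, use a fixed width with leading zeros, or display upper case digits. The short sketch below uses the same ''253'' example as above. It also splits the byte into its two groups of 4 bits with bit shifting and masking.

<code python>
value = 253

print(bin(value), hex(value))     # 0b11111101 0xfd
print(format(value, '08b'))       # 11111101: 8 binary digits, zero padded
print(format(value, '02X'))       # FD: 2 upper case hexadecimal digits
print(f'{13:04b} {13:X}')         # 1101 D

# Split the byte into its two groups of 4 bits (= its two hexadecimal digits)
high, low = value >> 4, value & 0xF
print(high, low)                  # 15 13  (253 = 15 * 16 + 13)

# Back to decimal, with an explicit base
print(int('11111101', 2), int('FD', 16))   # 253 253
</code>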
==== Numerical values ====

* Binary data representation of some numbers (only some common types are listed here):
* Languages and packages **references** used below:
* Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]]
* NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]]
* Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]]
* [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
* Range:
* 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647''
* Python: ''numpy.int32''
* NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT''
* Fortran: ''INTEGER*4''
* 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807''
* Python: ''numpy.int64''
* NetCDF: ''int64'', ''NC_INT64''
* Fortran: ''INTEGER*8''
* Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
* [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
* Range:
* 4-byte float: ''~7 significant digits * 10^±38''
* Python: ''numpy.float32''
* NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT''
* Fortran: ''REAL*4''
* See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
* 8-byte float: ''~15 significant digits * 10^±308''
* Python: ''numpy.float64''
* NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE''
* Fortran: ''REAL*8''
* **Special values**:
* [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number//
* Python: ''numpy.nan''
* Infinity
* Python: ''-numpy.inf'' and ''numpy.inf''
* Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) rather than ''NaN''s when you have to deal with missing values!
* <wrap hi>The RISKS of working with (the wrong) floats</wrap>:
* [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]]
* [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]]
* [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]]
* A rather technical example: we //play// with a numpy 4-byte integer scalar
* <code>>>> import numpy as np
>>> one_int32 = np.int32(1)
>>> one_int32
1
>>> type(one_int32)
<class 'numpy.int32'>
>>> one_int32.dtype
dtype('int32')
>>> one_int32.shape # A numpy SCALAR is an ARRAY WITH NO SHAPE!
()
>>> one_int32[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index to scalar variable.
>>> one_int32[()] # Note how to access the single element, when there is NO SHAPE
1
>>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
0
>>> one_int32.size
1
>>> one_int32.nbytes # The element requires 4 bytes of storage
4
>>> hex(one_int32) # We can print the hexadecimal representation of INTEGER scalars
'0x1'
>>> hex(one_int32 * 15)
'0xf'
>>> hex(one_int32 * 16)
'0x10'

# 'Serialize' the data (i.e. change the data to a series of bytes)
# Note: the serialized bytes appear in the reverse order of 'hex(one_int32)',
# because this example was run on a little-endian computer (least significant byte first)
>>> one_int32_serialized = one_int32.tobytes()
>>> type(one_int32_serialized)
<class 'bytes'>
>>> len(one_int32_serialized)
4
>>> one_int32_serialized
b'\x01\x00\x00\x00'
>>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values
'01 00 00 00'

# Use the following in the unlikely case where you need to change the endianness (byte ordering)
>>> one_int32_reversed_endian = one_int32.byteswap()
>>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
16777216
>>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
'0x1000000'
>>> one_int32_reversed_endian.tobytes()
b'\x00\x00\x00\x01'</code>
* Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how the bytes are swapped within each group of 4 bytes, because each int32 uses 4 bytes
* <code>>>> array_example = np.asarray((3, 17), dtype=np.int32)
>>> array_example
array([ 3, 17], dtype=int32)
>>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes
((2,), 1, 2, 8)
>>> array_example.tobytes().hex(' ', 4)
'03000000 11000000'
>>> array_example.byteswap().tobytes().hex(' ', 4)
'00000003 00000011'
</code>
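The following lines illustrate the float risks listed above: round-off errors, the limited precision of ''float32'' compared to ''float64'', and catastrophic cancellation when subtracting two close numbers. Note that ''numpy.finfo().precision'' reports a conservative number of decimal digits. This is a sketch, and the exact digits printed may depend on your numpy version.

<code python>
import math
import numpy as np

# Round-off error: 0.1, 0.2 and 0.3 have no exact binary representation
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True: compare floats with a tolerance

# Approximate (conservative) number of decimal digits of 4-byte and 8-byte floats
print(np.finfo(np.float32).precision, np.finfo(np.float64).precision)   # 6 15
print(np.finfo(np.float32).max)       # largest float32, about 3.4e+38

# A float32 cannot represent every integer above 2**24
big = np.float32(2**24)
print(big + np.float32(1) == big)     # True: adding 1 has no effect

# Catastrophic cancellation: subtracting two close numbers keeps few correct digits
diff32 = np.float32(1.0001) - np.float32(1.0)
diff64 = np.float64(1.0001) - np.float64(1.0)
print(diff32, diff64)                 # both "should" be 0.0001, float32 is much further off
</code>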

* Manipulating binary data with [[https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview|bytes, bytearray, memoryview]]
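Here is a minimal sketch of these three types. It also uses the standard ''struct'' module, which converts numbers to bytes and back with an explicit byte order: ''<'' for little-endian, ''>'' for big-endian. The values are arbitrary, and ''bytes.hex(sep)'' requires Python >= 3.8.

<code python>
import struct

# bytes: immutable sequence of bytes
data = struct.pack('<i', 1)              # 4-byte signed int, little-endian
print(data)                              # b'\x01\x00\x00\x00'
print(struct.pack('>i', 1))              # b'\x00\x00\x00\x01' (big-endian)

# Negative integers use two's complement
print(struct.pack('<i', -1).hex(' '))                    # ff ff ff ff
print((-2).to_bytes(4, 'little', signed=True).hex(' '))  # fe ff ff ff

# bytearray: mutable version of bytes
buffer = bytearray(struct.pack('<ii', 3, 17))   # two 4-byte ints
buffer[0] = 5                            # change the lowest byte of the first int
print(struct.unpack('<ii', buffer))      # (5, 17)

# memoryview: access (part of) a buffer without copying it
view = memoryview(buffer)
second_int = view[4:8]                   # no copy of the data
print(int.from_bytes(second_int, 'little', signed=True))   # 17
second_int[0] = 18                       # writes through to 'buffer'
print(struct.unpack('<ii', buffer))      # (5, 18)
</code>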

* Array addressing
* [[https://www.geeksforgeeks.org/calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
* In other words: //using indices to go from 1-D to n-dimensional data//
* The [[https://en.wikipedia.org/wiki/Array_(data_structure)|array]] structure
* Python/C (//row-major// order) vs Fortran (//column-major// order)...
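This small sketch shows the index arithmetic for a 2-D array. It compares a hand-written formula with ''numpy.ravel_multi_index()'' for the C order (row-major, numpy's default) and the Fortran order (column-major). ''flat_index_c()'' and ''flat_index_f()'' are hypothetical helper names used only for this illustration.

<code python>
import numpy as np

n_rows, n_cols = 3, 4
arr = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)   # C (row-major) layout

def flat_index_c(i, j):
    """Row-major order (C, numpy default): the LAST index varies fastest."""
    return i * n_cols + j

def flat_index_f(i, j):
    """Column-major order (Fortran): the FIRST index varies fastest."""
    return j * n_rows + i

i, j = 1, 2
print(flat_index_c(i, j), np.ravel_multi_index((i, j), arr.shape, order='C'))   # 6 6
print(flat_index_f(i, j), np.ravel_multi_index((i, j), arr.shape, order='F'))   # 7 7

# Same element, found in the flat (1-D) memory layout of each order
arr_f = np.asfortranarray(arr)   # same values, stored in Fortran order
print(arr[i, j])                                    # 6
print(arr.ravel(order='K')[flat_index_c(i, j)])     # 6 ('K' = memory order)
print(arr_f.ravel(order='K')[flat_index_f(i, j)])   # 6
</code>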

* Disk and RAM usage: how to check the usage (available RAM and disk space), best practices on multi-user systems (how much is allowed?)
* ''du'', ''df'', ''cat /proc/meminfo'', ''top''
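From Python, the standard library offers rough equivalents of some of these commands. ''shutil.disk_usage()'' gives the same kind of information as ''df''. Summing file sizes gives an approximation of ''du''. Note that ''dir_size()'' is a hypothetical helper, and it counts file sizes, not the disk blocks that ''du'' reports. Here is a sketch:

<code python>
import shutil
import tempfile
from pathlib import Path

def dir_size(dir_path):
    """Total size in bytes of the regular files below dir_path (a rough 'du')."""
    return sum(ff.stat().st_size for ff in Path(dir_path).rglob('*') if ff.is_file())

# Disk usage of the filesystem containing the temporary directory (like 'df')
usage = shutil.disk_usage(tempfile.gettempdir())
print(f'Total: {usage.total / 1e9:.1f} GB, free: {usage.free / 1e9:.1f} GB')

# Size of a small, freshly created directory tree (like 'du')
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / 'sub').mkdir()
    (Path(tmp_dir) / 'a.txt').write_bytes(b'x' * 100)
    (Path(tmp_dir) / 'sub' / 'b.txt').write_bytes(b'y' * 50)
    tree_size = dir_size(tmp_dir)

print(tree_size)   # 150
</code>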

* Understanding and reverse-engineering //binary// formats
* ''od'', ''strings''
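When ''od'' is not available, or when you need a dump inside a script, a few lines of Python give a similar hexadecimal dump. The same goes for a rough equivalent of ''strings'', which looks for runs of at least 4 printable ASCII characters. This is a minimal sketch: ''hex_dump()'' is a hypothetical helper, and its layout is only similar in spirit to ''od''.

<code python>
import re

def hex_dump(data, width=16):
    """Return the lines of a simple hex dump: offset, hex bytes, printable chars."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(' ')
        # Replace non-printable (and non-ASCII) bytes with '.'
        text_part = ''.join(chr(bb) if 32 <= bb < 127 else '.' for bb in chunk)
        lines.append(f'{offset:08x}  {hex_part:<{3 * width - 1}}  {text_part}')
    return lines

sample = 'A B 0 1 à µ'.encode('utf-8') + bytes([0, 1, 255])
for line in hex_dump(sample):
    print(line)

# A rough equivalent of 'strings': runs of at least 4 printable ASCII characters
blob = b'\x00\x01header\xff\xfeOK\x00data block\x00'
print(re.findall(rb'[\x20-\x7e]{4,}', blob))   # [b'header', b'data block']
</code>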

* Binary vs text formats: ASCII, UTF, raw
* Text-related functions in Python: ''str'', ''int'', ''float'', ''ord'', ...
* List conversion with ''map'' and ''join''
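This short sketch shows these conversions in a typical situation: reading or writing a line of numbers in a //text// file. The line content is made up.

<code python>
line = '3 4.5 8\n'          # e.g. a line read from a text file

# Text -> numbers: split the line, then convert each item with map()
values = list(map(float, line.split()))
print(values)                            # [3.0, 4.5, 8.0]

# Numbers -> text: convert each item to a string, then join them
out_line = ', '.join(map(str, values))
print(out_line)                          # 3.0, 4.5, 8.0

# Characters <-> code points, and strings <-> numbers
print(ord('A'), chr(65), ord('à'))       # 65 A 224
print(int('42') + 1, str(42) + '1')      # 43 421
</code>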

* Misc: ''md5sum''
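The standard ''hashlib'' module computes the same MD5 checksums as ''md5sum''. This is useful, for example, to check that a file was copied correctly. Do not rely on MD5 for security. The sketch below reads the file by blocks, so that big files do not have to fit in memory. The ''md5_of_file()'' name and the block size are arbitrary choices.

<code python>
import hashlib
import os
import tempfile

def md5_of_file(file_path, block_size=1024 * 1024):
    """Return the hexadecimal MD5 checksum of a file, like 'md5sum' does."""
    md5 = hashlib.md5()
    with open(file_path, 'rb') as ff:
        # iter() calls ff.read(block_size) until it returns b'' (end of file)
        for block in iter(lambda: ff.read(block_size), b''):
            md5.update(block)
    return md5.hexdigest()

# Create a small temporary file, just for this example
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')

checksum = md5_of_file(tmp.name)
os.remove(tmp.name)

print(checksum)                           # 5d41402abc4b2a76b9719d911017c592
print(hashlib.md5(b'hello').hexdigest())  # same checksum, computed in memory
</code>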

==== Strings ====

* Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|Unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ...

* Getting the binary representation of a string
* <code>>>> test_string = 'A B 0 1 à µ'
>>> type(test_string)
<class 'str'>
>>> len(test_string)
11
>>> test_string_bin = test_string.encode('utf-8')
>>> test_string_bin
b'A B 0 1 \xc3\xa0 \xc2\xb5'
>>> type(test_string_bin)
<class 'bytes'>
>>> len(test_string_bin) # 'à' and 'µ' each need 2 bytes in UTF-8
13
>>> test_string_bin.hex('-')
'41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'
</code>
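Going the other way, ''bytes.decode()'' turns bytes back into a string, but only with the right encoding. Decoding UTF-8 bytes as Latin-1 silently produces wrong characters. Decoding Latin-1 bytes as UTF-8 can raise an error. Here is a sketch using the same test string:

<code python>
test_string = 'A B 0 1 à µ'
utf8_bytes = test_string.encode('utf-8')
latin1_bytes = test_string.encode('latin-1')   # 1 byte per character for these characters

print(len(utf8_bytes), len(latin1_bytes))        # 13 11
print(utf8_bytes.decode('utf-8') == test_string) # True: right encoding

# Wrong encoding, no error, but garbled text (aka 'mojibake')
wrong = utf8_bytes.decode('latin-1')
print(wrong == test_string, len(wrong))          # False 13

# Wrong encoding, with an error
decode_failed = False
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as err:
    decode_failed = True
    print('Decoding failed:', err)
</code>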