other:python:misc_by_jyp

Differences

This shows you the differences between two versions of the page.

other:python:misc_by_jyp [2023/03/28 14:10]
jypeter [matplotlib related stuff] Added the time axes section
other:python:misc_by_jyp [2023/05/04 09:46]
jypeter [Data representation] Added the Base notions section
Line 31: Line 31:
  
 <code>sys.exit('Some optional message about why we are stopping')</code>
- 
- 
 ===== Checking if a file/directory is writable by the current user =====
  
Line 462: Line 460:
   * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
   * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]
 +
 +
 +===== Data representation =====
 +
 +A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
 +
 +FIXME Add parts (pages 28 to 37) of this [[https://​wiki.lsce.ipsl.fr/​pmip3/​doku.php/​other:​python:​jyp_steps#​part_2|old tutorial]] to this section
 +
 +==== Base notions ====
 +
 +  * **Never forget** that all the bits and pieces of information we use are coded in [[https://​en.wikipedia.org/​wiki/​Binary_number#​Counting_in_binary|base 2]] (''​0''​s and ''​1''​s),​ grouped in bytes!
 +    * Some things can be stored exactly (integers, characters, ...)
 +    * In other cases (the **//real// numbers** that we work with all the time, compressed images/videos/music) we only store a **//good enough approximation//**
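
The difference between exact and approximate storage can be checked directly in Python. This is only a minimal sketch using the standard library; the variable names are illustrative.

```python
import math
from decimal import Decimal

# Decimal(float) displays the exact value of the binary approximation stored for 0.1
stored_value = Decimal(0.1)
print(stored_value)

# Rounding errors surface in ordinary arithmetic
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)  # False

# Compare floats with a tolerance rather than with ==
sum_is_close = math.isclose(0.1 + 0.2, 0.3)
print(sum_is_close)  # True

# Small integers and ASCII characters, by contrast, are stored exactly
int_bytes = (13).to_bytes(1, 'big')
char_bytes = 'A'.encode('ascii')
print(int_bytes.hex(), char_bytes.hex())  # 0d 41
```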
 +
 +  * 1 byte <=> 8 bits
 +    * ''​REAL*4''​ <=> 4 bytes <=> 32 bits
 +    * For easier written/​displayed representation,​ 1 byte is usually split into 2 groups of 4 bits, using base 16 and [[https://​en.wikipedia.org/​wiki/​Hexadecimal|hexadecimal representation]]
 +      * ''0000'' <=> ''0'', ''0001'' <=> ''1'', ..., ''1111'' <=> ''F''
 +      * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'')
 +      * ''11111101'' <=> ''1111 1101'' <=> ''FD'' in hexadecimal <=> ''253'' in decimal (''15 * 16 + 13'')
 +
 +  * Conversion with Python
 +    * <​code>>>>​ hex(13) # Decimal to Hexadecimal conversion
 +'​0xd'​
 +>>>​ hex(255)
 +'​0xff'​
 +>>>​ hex(256)
 +'​0x100'​
 +>>>​ int('​0x100',​ 16) # Hexadecimal to Decimal conversion
 +256
 +>>>​ int('​11',​ 2)
 +3
 +>>>​ int('​1111',​ 2) # Binary to Decimal conversion
 +15
 +>>>​ int('​11111101',​ 2)
 +253
 +>>>​ 15 * 16 + 13
 +253
 +>>> 0o13 # DANGER! Octal literals start with 0o in Python 3 (a bare 013 was octal in Python 2 and is now a SyntaxError)
 +11
 +>>>​ int('​13',​ 8) # 1*8 + 3
 +11</​code>​
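
The split of a byte into two groups of 4 bits can also be done programmatically. The helper below is a hypothetical illustration (''byte_to_nibbles'' is not part of any library); it only relies on the built-in ''format'', ''int'', ''bin'', ''oct'' and ''hex'' functions.

```python
def byte_to_nibbles(value):
    """Return (8-bit string, high 4 bits, low 4 bits, hex digits) for 0 <= value <= 255."""
    if not 0 <= value <= 255:
        raise ValueError('value must fit in one byte')
    bits = format(value, '08b')          # zero-padded binary string
    high, low = bits[:4], bits[4:]       # two groups of 4 bits
    hex_digits = format(int(high, 2), 'X') + format(int(low, 2), 'X')
    return bits, high, low, hex_digits

print(byte_to_nibbles(253))  # ('11111101', '1111', '1101', 'FD')

# Decimal to binary / octal / hexadecimal strings
print(bin(13), oct(11), hex(253))  # 0b1101 0o13 0xfd

# Integer literals can be written directly in base 2, 8 or 16
print(0b1101, 0o13, 0xFD)  # 13 11 253
```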
 +==== Numerical values ====
 +
 +  * Binary data representation of some numbers (not everything is listed here):
 +    * [[https://​en.wikipedia.org/​wiki/​Integer_(computer_science)|Integers]]
 +      * Range:
 +        * 4-byte integers: −2,​147,​483,​648 to 2,​147,​483,​647
 +          * Python: ''​numpy.int32''​
 +          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ''int'', ''NC_INT'', ''NF90_INT''
 +          * Fortran:
 +        * 8-byte integers: −9,​223,​372,​036,​854,​775,​808 to 9,​223,​372,​036,​854,​775,​807
 +          * Python: ''​numpy.int64''​
 +          * [[https://​docs.unidata.ucar.edu/​nug/​current/​md_types.html|NetCDF]]:​ ''​int64'',​ ''​NC_INT64''​
 +          * Fortran:
 +      * Tech note: signed integers use [[https://​en.wikipedia.org/​wiki/​Two%27s_complement|two'​s complement]] for coding negative integers
 +    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
 +      * Range:
 +        * 4-byte float: ~7 significant decimal digits, magnitudes up to about 10^±38
 +          * Python: ''​numpy.float32''​
 +          * [[https://​docs.unidata.ucar.edu/​nug/​current/​md_types.html|NetCDF]],​ [[https://​docs.unidata.ucar.edu/​netcdf-fortran/​current/​f90-variables.html#​f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ​
 +          * Fortran:
 +          * See also [[https://​en.wikipedia.org/​wiki/​Single-precision_floating-point_format|Single-precision floating-point format]]
 +        * 8-byte float: ~15 significant decimal digits, magnitudes up to about 10^±308
 +          * Python: ''​numpy.float64''​
 +          * [[https://​docs.unidata.ucar.edu/​nug/​current/​md_types.html|NetCDF]],​ [[https://​docs.unidata.ucar.edu/​netcdf-fortran/​current/​f90-variables.html#​f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ​
 +          * Fortran:
 +      * Special values:
 +        * [[https://​en.wikipedia.org/​wiki/​NaN|NaN]] (''​numpy.nan''​):​ //Not a Number//
 +        * Infinity (''​-numpy.inf''​ and ''​numpy.inf''​)
 +        * Note: when you have to deal with missing values, it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs!
 +    * [[https://​en.wikipedia.org/​wiki/​Bit_numbering|Bit numbering]]
 +    * [[https://​en.wikipedia.org/​wiki/​Endianness|Endianness]]
 +    * A rather technical example: we //play// with a numpy 4-byte integer scalar
 +      * <​code>>>>​ one_int32 = np.int32(1)
 +>>>​ one_int32
 +1
 +>>>​ type(one_int32)
 +<class '​numpy.int32'>​
 +>>>​ one_int32.dtype
 +dtype('​int32'​)
 +>>>​ one_int32.shape # A numpy SCALAR, is an ARRAY WITH NO SHAPE !
 +()
 +>>>​ one_int32[0]
 +Traceback (most recent call last):
 +  File "<​stdin>",​ line 1, in <​module>​
 +IndexError: invalid index to scalar variable.
 +>>>​ one_int32[()] # Note how to access the single element, when there is NO SHAPE
 +1
 +>>>​ one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
 +0
 +>>>​ one_int32.size
 +1
 +>>>​ one_int32.nbytes # The element requires 4 bytes of storage
 +4
 +>>>​ hex(one_int32) # We can print the hexadecimal representation for INTEGERS scalars and arrays
 +'​0x1'​
 +>>>​ hex(one_int32 * 15)
 +'​0xf'​
 +>>>​ hex(one_int32 * 16)
 +'​0x10'​
 +
 +# '​Serialize'​ the data (i.e. change the data to a series of bytes)
 +# Note: on a little-endian machine (most current CPUs) the least significant byte is stored first,
 +# so the serialized bytes appear in the reverse order of 'hex(one_int32)'
 +>>>​ one_int32_serialized = one_int32.tobytes()
 +>>>​ type(one_int32_serialized)
 +<class '​bytes'>​
 +>>>​ len(one_int32_serialized)
 +4
 +>>>​ one_int32_serialized ​
 +b'​\x01\x00\x00\x00'​
 +>>>​ one_int32_serialized.hex('​ ') # Another way to print the hexadecimal values
 +'01 00 00 00'
 +
 +# Use the following in the unlikely case where you need to change the endianness (bytes ordering)
 +>>>​ one_int32_reversed_endian = one_int32.byteswap()
 +>>>​ one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
 +16777216
 +>>>​ hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
 +'​0x1000000'​
 +>>>​ one_int32_reversed_endian.tobytes()
 +b'​\x00\x00\x00\x01'</​code>​
 +    * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how bytes are swapped within each group of 4 bytes, because each ''int32'' value uses 4 bytes
 +      * <​code>>>>​ array_example = np.asarray((3,​ 17), dtype=np.int32)
 +>>>​ array_example
 +array([ 3, 17], dtype=int32)
 +>>>​ array_example.shape,​ array_example.ndim,​ array_example.size,​ array_example.nbytes
 +((2,), 1, 2, 8)
 +>>>​ array_example.tobytes().hex('​ ', 4)
 +'​03000000 11000000'​
 +>>>​ array_example.byteswap().tobytes().hex('​ ', 4)
 +'​00000003 00000011'​
 +</​code>​
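
The ranges, the two's complement coding and the special values listed above can be queried from numpy itself. This is a minimal sketch, assuming numpy is installed; ''np.iinfo'', ''np.finfo'' and ''np.ma.masked_invalid'' are existing numpy functions, the variable names are illustrative.

```python
import numpy as np

# Integer ranges, as reported by numpy
int32_info = np.iinfo(np.int32)
int64_info = np.iinfo(np.int64)
print(int32_info.min, int32_info.max)
print(int64_info.min, int64_info.max)

# Float limits: 'precision' is numpy's conservative count of reliable decimal digits
float32_info = np.finfo(np.float32)
float64_info = np.finfo(np.float64)
print(float32_info.precision, float32_info.max)
print(float64_info.precision, float64_info.max)

# Two's complement: -1 is coded with all the bits set to 1
minus_one_bytes = np.int32(-1).tobytes()
print(minus_one_bytes.hex(' '))  # ff ff ff ff

# NaN is not equal to anything, not even to itself
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)  # False

# A NaN contaminates a plain mean, but a masked array ignores the masked element
data = np.array([1.0, np.nan, 3.0])
masked_data = np.ma.masked_invalid(data)
print(data.mean(), masked_data.mean())  # nan 2.0
```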
 +
 +  * Manipulating binary data with [[https://​docs.python.org/​3/​library/​stdtypes.html#​binary-sequence-types-bytes-bytearray-memoryview|bytes,​ bytearray, memoryview]]
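
A short sketch of the three built-in binary types, reusing the 8 bytes of the ''(3, 17)'' array example above. The byte order is stated explicitly here (''little''), which is an assumption matching the output shown earlier.

```python
# bytes: an immutable sequence of bytes
raw = bytes([0x03, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00])

# Decode each group of 4 bytes as an integer, with an explicit byte order
first = int.from_bytes(raw[:4], byteorder='little')
second = int.from_bytes(raw[4:], byteorder='little')
print(first, second)  # 3 17

# The same 4 bytes read with the other byte order give a different number
first_big = int.from_bytes(raw[:4], byteorder='big')
print(first_big)  # 50331648

# bytearray: a mutable copy of the data
buffer = bytearray(raw)
buffer[0] = 0xFF

# memoryview: a window on the bytearray, WITHOUT copying the data
view = memoryview(buffer)
tail = view[4:]
tail[0] = 0x12  # modifies 'buffer' through the view

print(raw.hex(' ', 4))     # 03000000 11000000
print(buffer.hex(' ', 4))  # ff000000 12000000
```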
 +
 +  * Array addressing
 +    * [[https://​www.geeksforgeeks.org/​calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/​|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
 +      * In other words: //using indices to go from 1-D to n-dimensional data//
 +    * The [[https://​en.wikipedia.org/​wiki/​Array_(data_structure)|array]] structure
 +    * python/C vs Fortran...
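
The address calculation can be sketched as follows. ''flat_offset'' is a hypothetical helper written for this note; it is cross-checked against ''np.ravel_multi_index'' and the ''strides'' attribute, which do exist in numpy.

```python
import numpy as np

def flat_offset(indices, shape, order='C'):
    """Offset (in elements) of an element in the flattened 1-D storage of an n-D array.

    order='C': row-major, last index varies fastest (C, default numpy layout)
    order='F': column-major, first index varies fastest (Fortran)
    """
    pairs = list(zip(indices, shape))
    if order == 'F':
        pairs.reverse()
    offset = 0
    for index, dim_size in pairs:
        offset = offset * dim_size + index
    return offset

shape = (2, 3)
c_offset = flat_offset((1, 0), shape, order='C')  # 1 * 3 + 0
f_offset = flat_offset((1, 0), shape, order='F')  # 1 + 0 * 2
print(c_offset, f_offset)  # 3 1

# Cross-check with numpy
print(np.ravel_multi_index((1, 0), shape, order='C'),
      np.ravel_multi_index((1, 0), shape, order='F'))

# Strides: number of BYTES to skip to move by 1 along each dimension
c_array = np.zeros(shape, dtype=np.int32)
f_array = np.asfortranarray(c_array)
print(c_array.strides, f_array.strides)  # (12, 4) (4, 8)
```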
 +
 +  * disk and ram usage: how to check the usage (available ram and disk), best practice on multi-user systems (how much allowed?)
 +    * ''​du'',​ ''​df'',​ ''​cat /​proc/​meminfo'',​ ''​top''​
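
The same information can be obtained from Python. A minimal sketch: ''shutil.disk_usage'' is part of the standard library, while ''read_meminfo'' is a hypothetical helper that assumes a Linux system providing ''/proc/meminfo'' and returns an empty dictionary elsewhere.

```python
import os
import shutil

# Disk usage of the file system holding the current directory (similar to 'df .')
usage = shutil.disk_usage(os.getcwd())
print(f'total: {usage.total / 1e9:.1f} GB, '
      f'used: {usage.used / 1e9:.1f} GB, free: {usage.free / 1e9:.1f} GB')

def read_meminfo(path='/proc/meminfo'):
    """Return the numerical fields of /proc/meminfo (Linux only), or {} if unavailable."""
    info = {}
    try:
        with open(path) as meminfo_file:
            for line in meminfo_file:
                key, _, rest = line.partition(':')
                fields = rest.split()
                if fields and fields[0].isdigit():
                    info[key.strip()] = int(fields[0])  # most values are in kB
    except OSError:
        pass
    return info

meminfo = read_meminfo()
if 'MemAvailable' in meminfo:
    print(f"available RAM: {meminfo['MemAvailable'] / 1e6:.1f} GB")
```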
 +
 +  * understanding and reverse-engineering //binary// format
 +    * ''​od'',​ ''​strings''​
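
When ''od'' and ''strings'' are not at hand, a rough equivalent can be written in a few lines. Both ''hex_dump'' and the sample bytes below are made up for this illustration.

```python
import re

def hex_dump(data, width=8):
    """Return lines with: offset, hexadecimal bytes, printable ASCII characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(' ')
        # Non printable bytes are displayed as '.'
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append(f'{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}')
    return lines

# Made-up sample: 3 letters, 1 byte, a 4-byte big-endian integer, 4 letters
sample = b'CDF\x01' + (3).to_bytes(4, 'big') + b'time'
dump_lines = hex_dump(sample)
print('\n'.join(dump_lines))

# Like 'strings': find the runs of at least 4 printable ASCII characters
found_strings = re.findall(rb'[ -~]{4,}', sample)
print(found_strings)  # [b'time']
```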
 +
 +  * binary vs text format: ascii, utf, raw
 +    * text related functions in python: ''​str'',​ ''​int'',​ ''​float'',​ ''​ord'',​ ...
 +      * lists conversion with ''​map''​ and ''​join''​
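
A few of these text related functions in action. This is only a sketch; the values are arbitrary.

```python
# Text to numbers: split a line and convert each item
text_line = '3 17 253'
values = list(map(int, text_line.split()))
print(values)  # [3, 17, 253]

# Numbers to text: convert each item and join the resulting strings
hex_line = ' '.join(map(hex, values))
print(hex_line)  # 0x3 0x11 0xfd

# Characters to code points and back
code_points = list(map(ord, 'AB'))
back_to_text = ''.join(map(chr, code_points))
print(code_points, back_to_text)  # [65, 66] AB

# A number stored as text requires more (or less) room than its binary representation
as_float = float('1.5e3')
print(as_float, len(str(as_float)))  # 1500.0 6
```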
 +
 +  * Misc : ''​md5sum''​
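
The checksum printed by ''md5sum'' can also be computed with the standard ''hashlib'' module. ''file_md5'' is a hypothetical helper; it reads the file by chunks, so that big files do not have to fit in memory.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, 'rb') as input_file:
        for chunk in iter(lambda: input_file.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Usage example on a small temporary file
content = b'Some test content\n'
with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    tmp_file.write(content)
    tmp_path = tmp_file.name

checksum = file_md5(tmp_path)
print(checksum)  # Should match the first column of the 'md5sum' output for the same file
os.remove(tmp_path)
```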
 +
 +==== Strings ====
 +
 +  * Encoding, [[https://​en.wikipedia.org/​wiki/​ASCII|ASCII]],​ [[https://​en.wikipedia.org/​wiki/​Unicode|unicode]],​ [[https://​en.wikipedia.org/​wiki/​UTF-8|UTF-8]],​ ...
 +
 +  * Getting the binary representation of a string
 +    * <​code>>>>​ test_string = 'A B 0 1 à µ'
 +>>>​ type(test_string)
 +<class '​str'>​
 +>>>​ len(test_string)
 +11
 +>>>​ test_string_bin = test_string.encode('​utf-8'​)
 +>>>​ test_string_bin
 +b'A B 0 1 \xc3\xa0 \xc2\xb5'​
 +>>>​ type(test_string_bin)
 +<class '​bytes'>​
 +>>>​ len(test_string_bin)
 +13
 +>>>​ test_string_bin.hex('​-'​)
 +'​41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'​
 +</​code>​
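
The reverse operation (''decode'') only gives back the original string when the same encoding is used in both directions. A minimal sketch, where the non-ASCII characters of the test string above are written with escape sequences to avoid any ambiguity:

```python
# Same string as above: \u00e0 is 'a with grave accent', \u00b5 is the micro sign
test_string = 'A B 0 1 \u00e0 \u00b5'

utf8_bytes = test_string.encode('utf-8')      # 2 bytes for each non-ASCII character
latin1_bytes = test_string.encode('latin-1')  # 1 byte for each character
print(len(test_string), len(utf8_bytes), len(latin1_bytes))  # 11 13 11

# Decoding with the SAME encoding gives back the original string
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == test_string)  # True

# Decoding with ANOTHER encoding does not fail here, but gives garbled text
garbled = utf8_bytes.decode('latin-1')
print(len(garbled), garbled == test_string)  # 13 False

# ASCII can not represent the accented characters at all
try:
    test_string.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```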
  
  
other/python/misc_by_jyp.txt · Last modified: 2024/04/19 12:02 by jypeter