Differences

This shows you the differences between two versions of the page.

--- other:python:misc_by_jyp [2023/03/28 14:10]
jypeter [matplotlib related stuff] Added the time axes section
+++ other:python:misc_by_jyp [2023/05/04 09:46]
jypeter [Data representation] Added the Base notions section
@@ Line 31: / Line 31: @@
 <code>sys.exit('Some optional message about why we are stopping')</code>
 ===== Checking if a file/directory is writable by the current user =====
@@ Line 462: / Line 460: @@
   * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
   * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]
+===== Data representation =====
+A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//
+FIXME Add parts (pages 28 to 37) of this [[https://wiki.lsce.ipsl.fr/pmip3/doku.php/other:python:jyp_steps#part_2|old tutorial]] to this section
+==== Base notions ====
+  * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s), grouped in bytes!
+    * Some things can be stored exactly (integers, characters, ...)
+    * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximations//**
+  * 1 byte <=> 8 bits
+    * ''REAL*4'' <=> 4 bytes <=> 32 bits
+    * For easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits, using base 16 and [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]]
+      * ''0000'' <=> ''0'', ''0001'' <=> ''1'', ..., ''1111'' <=> ''F''
+      * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'')
+      * ''11111101'' <=> ''1111 1101'' <=> ''FD'' in hexadecimal <=> ''253'' in decimal (''15 * 16 + 13'')
+  * Conversion with Python
+    * <code>>>> hex(13) # Decimal to Hexadecimal conversion
+'0xd'
+>>> hex(255)
+'0xff'
+>>> hex(256)
+'0x100'
+>>> int('0x100', 16) # Hexadecimal to Decimal conversion
+256
+>>> int('11', 2) # Binary to Decimal conversion
+3
+>>> int('1111', 2)
+15
+>>> int('11111101', 2)
+253
+>>> 15 * 16 + 13
+253
+>>> 0o13 # Octal literal (1*8 + 3). DANGER! A bare leading 0 (013) meant OCTAL in Python 2 and is a SyntaxError in Python 3
+11
+>>> int('13', 8) # 1*8 + 3
+11
+</code>
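The split of one byte into two groups of 4 bits described above can be sketched in a few lines. The helper name ''byte_to_hex_pair'' is made up for this illustration (it is not part of Python or of the original notes); it only relies on the built-in ''divmod'' and ''format'' functions.

```python
def byte_to_hex_pair(value):
    """Split one byte (0..255) into its two 4-bit groups and its 2 hexadecimal digits.

    Hypothetical helper, written only to illustrate the notes above.
    """
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte (0..255)")
    high, low = divmod(value, 16)   # high = value // 16, low = value % 16
    bits = format(value, "08b")     # 8-character binary string, zero padded
    hex_digits = format(high, "X") + format(low, "X")
    return bits[:4], bits[4:], hex_digits


high_bits, low_bits, hex_digits = byte_to_hex_pair(0b11111101)
print(high_bits, low_bits, hex_digits)   # 1111 1101 FD
print(int(hex_digits, 16))               # 253
```

Each group of 4 bits maps to exactly one hexadecimal digit, which is why one byte is always written with 2 hexadecimal digits.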
+==== Numerical values ====
+  * Binary data representation of some numbers (not everything is listed here):
+    * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
+      * Range:
+        * 4-byte integers: −2,147,483,648 to 2,147,483,647
+          * Python: ''numpy.int32''
+          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ''int'', ''NC_INT'', ''NF90_INT''
+          * Fortran:
+        * 8-byte integers: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
+          * Python: ''numpy.int64''
+          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]]: ''int64'', ''NC_INT64''
+          * Fortran:
+      * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
+    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//)
+      * Range:
+        * 4-byte float: ~7 significant digits * 10<sup>±38</sup>
+          * Python: ''numpy.float32''
+          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]:
+          * Fortran:
+          * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
+        * 8-byte float: ~15 significant digits * 10<sup>±308</sup>
+          * Python: ''numpy.float64''
+          * [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]:
+          * Fortran:
+      * Special values:
+        * [[https://en.wikipedia.org/wiki/NaN|NaN]] (''numpy.nan''): //Not a Number//
+        * Infinity (''-numpy.inf'' and ''numpy.inf'')
+        * Note: when you have to deal with missing values, it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs!
+    * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]
+    * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]
+    * A rather technical example: we //play// with a numpy 4-byte integer scalar
+      * <code>>>> import numpy as np
+>>> one_int32 = np.int32(1)
+>>> one_int32
+1
+>>> type(one_int32)
+<class 'numpy.int32'>
+>>> one_int32.dtype
+dtype('int32')
+>>> one_int32.shape # A numpy SCALAR is an ARRAY WITH NO SHAPE!
+()
+>>> one_int32[0]
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IndexError: invalid index to scalar variable.
+>>> one_int32[()] # Note how to access the single element, when there is NO SHAPE
+1
+>>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
+0
+>>> one_int32.size
+1
+>>> one_int32.nbytes # The element requires 4 bytes of storage
+4
+>>> hex(one_int32) # We can print the hexadecimal representation of INTEGER scalars
+'0x1'
+>>> hex(one_int32 * 15)
+'0xf'
+>>> hex(one_int32 * 16)
+'0x10'
+# 'Serialize' the data (i.e. change the data to a series of bytes)
+# Note: tobytes() returns the bytes in memory order; on a little-endian machine (e.g. x86) the least significant byte comes first, hence the reverse order compared to 'hex(one_int32)'
+>>> one_int32_serialized = one_int32.tobytes()
+>>> type(one_int32_serialized)
+<class 'bytes'>
+>>> len(one_int32_serialized)
+4
+>>> one_int32_serialized
+b'\x01\x00\x00\x00'
+>>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values
+'01 00 00 00'
+# Use the following in the unlikely case where you need to change the endianness (byte ordering)
+>>> one_int32_reversed_endian = one_int32.byteswap()
+>>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
+16777216
+>>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
+'0x1000000'
+>>> one_int32_reversed_endian.tobytes()
+b'\x00\x00\x00\x01'</code>
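The byte order seen above, and the two's complement coding of negative integers mentioned earlier, can also be explored without numpy. This is only a minimal sketch using the standard ''int.to_bytes'' and ''int.from_bytes'' methods; the value ''-2'' is an arbitrary choice for the illustration.

```python
import sys

value = -2

# Two's complement on 32 bits: the bit pattern of a negative number is value + 2**32
pattern = value & 0xFFFFFFFF
print(hex(pattern))                     # 0xfffffffe

# Same 4 bytes, written in the two possible byte orders
little = value.to_bytes(4, byteorder="little", signed=True)
big = value.to_bytes(4, byteorder="big", signed=True)
print(little.hex(" "))                  # fe ff ff ff
print(big.hex(" "))                     # ff ff ff fe

# Reading the bytes back requires knowing the byte order AND the signedness
print(int.from_bytes(little, byteorder="little", signed=True))    # -2
print(int.from_bytes(little, byteorder="little", signed=False))   # 4294967294

# Native byte order of the machine running this code ('little' or 'big')
print(sys.byteorder)
```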
+    * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how bytes are swapped by groups of 4 bytes, because an int32 uses 4 bytes
+      * <code>>>> array_example = np.asarray((3, 17), dtype=np.int32)
+>>> array_example
+array([ 3, 17], dtype=int32)
+>>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes
+((2,), 1, 2, 8)
+>>> array_example.tobytes().hex(' ', 4)
+'03000000 11000000'
+>>> array_example.byteswap().tobytes().hex(' ', 4)
+'00000003 00000011'
+</code>
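Going back to the floating point notes earlier in this section (reals stored as //good enough// approximations, special values, masks rather than NaNs), here is a small sketch of what this means in practice. It assumes numpy is imported as ''np''; the sample values are arbitrary.

```python
import numpy as np

# 0.1 has no exact base 2 representation:
# float32 and float64 store two different approximations of it
x32 = np.float32(0.1)
x64 = np.float64(0.1)
print(x32.nbytes, x64.nbytes)          # 4 8
print(float(x32) == float(x64))        # False

# Approximate number of reliable decimal digits, and largest value, for each type
for float_type in (np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.precision, info.max)

# NaN propagates silently through computations, and is not even equal to itself
data = np.array([1.0, np.nan, 3.0])
print(np.nan == np.nan)                # False
print(np.mean(data))                   # nan

# A masked array keeps the missing value out of the computation
masked = np.ma.masked_invalid(data)
print(masked.mean())                   # 2.0
```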
+  * Manipulating binary data with [[https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview|bytes, bytearray, memoryview]]
+  * Array addressing
+    * [[https://www.geeksforgeeks.org/calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
+      * In other words: //using indices to go from 1-D to n-dimensional data//
+    * The [[https://en.wikipedia.org/wiki/Array_(data_structure)|array]] structure
+    * python/C vs Fortran...
+  * disk and ram usage: how to check the usage (available ram and disk), best practice on multi-user systems (how much allowed?)
+    * ''du'', ''df'', ''cat /proc/meminfo'', ''top''
+  * understanding and reverse-engineering //binary// format
+    * ''od'', ''strings''
+  * binary vs text format: ascii, utf, raw
+    * text related functions in python: ''str'', ''int'', ''float'', ''ord'', ...
+      * lists conversion with ''map'' and ''join''
+  * Misc : ''md5sum''
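A possible illustration for the //Array addressing// notes above: computing the position of an element in the flat (1-D) storage from its n-D indices, in row-major (Python/C) and column-major (Fortran) order. The function ''flat_offset'' is a hypothetical helper written for this sketch; numpy provides the equivalent ''np.ravel_multi_index''.

```python
import numpy as np


def flat_offset(index, shape, order="C"):
    """Return the offset (in elements) of an n-D index in flat storage.

    order="C": the LAST index varies fastest (row-major, Python/C)
    order="F": the FIRST index varies fastest (column-major, Fortran)
    Hypothetical helper, for illustration only.
    """
    pairs = list(zip(index, shape))
    if order == "C":
        pairs.reverse()          # start from the fastest varying dimension
    offset, stride = 0, 1
    for i, n in pairs:
        offset += i * stride
        stride *= n              # number of elements spanned by one step in the next dimension
    return offset


shape = (2, 3)
a = np.arange(6).reshape(shape)          # [[0 1 2], [3 4 5]]
print(flat_offset((1, 0), shape, "C"))   # 3
print(flat_offset((1, 0), shape, "F"))   # 1

# The same element is found at these offsets in the flattened arrays
print(a.ravel(order="C")[3], a.ravel(order="F")[1], a[1, 0])   # 3 3 3
```

To get the address in bytes, multiply the offset by the size of one element (''a.itemsize'').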
+==== Strings ====
+  * Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ...
+  * Getting the binary representation of a string
+    * <code>>>> test_string = 'A B 0 1 à µ'
+>>> type(test_string)
+<class 'str'>
+>>> len(test_string)
+11
+>>> test_string_bin = test_string.encode('utf-8')
+>>> test_string_bin
+b'A B 0 1 \xc3\xa0 \xc2\xb5'
+>>> type(test_string_bin)
+<class 'bytes'>
+>>> len(test_string_bin)
+13
+>>> test_string_bin.hex('-')
+'41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'
+</code>
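The example above can be pushed a bit further to show why the string has 11 characters but 13 bytes in UTF-8, and what happens when bytes are decoded with the wrong encoding. This is only a sketch; ''latin-1'' is picked here as an example of a 1-byte-per-character encoding that happens to contain ''à'' and ''µ''.

```python
test_string = 'A B 0 1 à µ'

# Code point (Unicode number) and UTF-8 bytes of a few characters
for char in 'Aàµ':
    utf8_bytes = char.encode('utf-8')
    print(char, hex(ord(char)), utf8_bytes.hex(' '), len(utf8_bytes))

# Encoding then decoding with the SAME encoding gives back the original string
encoded_utf8 = test_string.encode('utf-8')
print(encoded_utf8.decode('utf-8') == test_string)    # True

# The number of bytes depends on the encoding
encoded_latin1 = test_string.encode('latin-1')
print(len(test_string), len(encoded_utf8), len(encoded_latin1))   # 11 13 11

# Decoding with the WRONG encoding does not fail here, but gives garbled text
garbled = encoded_utf8.decode('latin-1')
print(len(garbled))                                   # 13
```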
