other:python:misc_by_jyp [2023/05/04 11:46] – [Data representation] Added the Base notions section jypeter | other:python:misc_by_jyp [2023/05/04 17:25] – [Numerical values] Lots of changes jypeter |
---|
==== Base notions ==== | ==== Base notions ==== |
| |
* **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s), grouped in bytes! | * **Never forget** that all the bits and pieces of information we use are coded in [[https://en.wikipedia.org/wiki/Binary_number#Counting_in_binary|base 2]] (''0''s and ''1''s ...), grouped in bytes! |
* Some things can be stored exactly (integers, characters, ...) | * Some things can be stored exactly (integers, characters, ...) |
* In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store **//good enough approximation//** | * In other cases (**//real// numbers** that we work with all the time, compressed images/videos/music) we only store a **//good enough approximation//** |
* 1 byte <=> 8 bits | * 1 byte <=> 8 bits |
* ''REAL*4'' <=> 4 bytes <=> 32 bits | * ''REAL*4'' <=> 4 bytes <=> 32 bits |
* For easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits, using base 16 and [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]] | * For easier written/displayed representation, 1 byte is usually split into 2 groups of 4 bits, and displayed using base 16 and [[https://en.wikipedia.org/wiki/Hexadecimal|hexadecimal representation]] (characters ''0'', ''1'', ..., ''A'', ''B'', ..., ''F'') |
* ''0000'' <=> ''0'', ''0010'' <=> ''1'', ..., ''1111'' <=> ''F'' | * ''0000'' <=> ''0'',\\ ''0001'' <=> ''1'', ...,\\ ''1111'' <=> ''F'' |
* ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'') | * ''1101'' <=> ''D'' in hexadecimal <=> ''13'' in decimal (''**1** * 8 + **1** * 4 + **0** * 2 + **1** * 1'') |
* ''11111101'' <=> ''1111 1101'' <=> ''FC'' in hexadecimal <=> ''253'' in decimal (''15 * 16 + 13'') | * ''11111101'' in //base 2// <=> ''1111 1101'' <=> ''FD'' in //hexadecimal// <=> ''253'' (''15 * 16 + 13'') in //decimal// |
| |
* Conversion with Python | * Base conversion with Python |
* <code>>>> hex(13) # Decimal to Hexadecimal conversion | * <code>>>> hex(13) # Decimal to Hexadecimal conversion |
'0xd' | '0xd' |
>>> hex(255) | >>> hex(253) |
'0xff' | '0xfd' |
>>> hex(256) | >>> hex(256) |
'0x100' | '0x100' |
>>> int('0x100', 16) # Hexadecimal to Decimal conversion | >>> int('0x100', 16) # Hexadecimal to Decimal conversion |
256 | 256 |
>>> int('11', 2) | |
3 | |
>>> int('1111', 2) # Binary to Decimal conversion | >>> int('1111', 2) # Binary to Decimal conversion |
15 | 15 |
>>> int('11111101', 2) | >>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253 |
253 | |
>>> 15 * 16 + 13 | |
253 | 253 |
>>> 013 # DANGER! Python considers an integer to be in OCTAL base if it starts with a 0 | >>> 0o13 # DANGER! Python 2 read a leading 0 (e.g. 013) as OCTAL; Python 3 requires the 0o prefix and rejects 013 with a SyntaxError |
| 11 |
>>> int('13', 8) # 1*8 + 3 | >>> int('13', 8) # 1*8 + 3 |
11</code> | 11</code> |
| |
| * More technical topics |
| * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]: the art of ordering bits, everything about MSB (Most Significant Bit) and LSB (Least Significant Bit) |
| * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]: the art of ordering bytes |
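The conversions above go from strings to integers. The following is a minimal sketch of the opposite direction (integer to fixed-width binary and hexadecimal strings), plus a small illustration of endianness. It only uses Python 3 built-ins; the variable names are chosen for this illustration and do not come from the original page.

```python
value = 253

# Decimal to binary and hexadecimal strings, padded to a full byte
bits = format(value, '08b')   # 8 binary digits
hexa = format(value, '02X')   # 2 hexadecimal digits

# Split one byte into its two groups of 4 bits
high, low = divmod(value, 16)

# Endianness: the same integer stored on 4 bytes, in both byte orders
big = (256).to_bytes(4, byteorder='big')
little = (256).to_bytes(4, byteorder='little')

print(bits, hexa, high, low)     # 11111101 FD 15 13
print(big.hex(), little.hex())   # 00000100 00010000
```

Reading the bytes back with the wrong byte order silently gives a different number, which is why binary file formats have to state their endianness.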
==== Numerical values ==== | ==== Numerical values ==== |
| |
* Binary data representation of some numbers (not everythin is listed here): | * Binary data representation of some numbers (only some common types are listed here): |
| * Languages and packages **references** used below: |
| * Python: [[https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases|NumPy Sized aliases]] |
| * NetCDF: [[https://docs.unidata.ucar.edu/nug/current/md_types.html|Data Types]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|Fortran related Data Types]], [[https://docs.unidata.ucar.edu/nug/current/_c_d_l.html#cdl_data_types|CDL Data Types]] |
| * Fortran: Intel Fortran Compiler [[https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/intrinsic-data-types.html|Intrinsic Data Types]] |
* [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]] | * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]] |
* Range: | * Range: |
* 4-byte integers: −2,147,483,648 to 2,147,483,647 | * 4-byte //signed// integers: ''−2,147,483,648'' to ''2,147,483,647'' |
* Python: ''numpy.int32'' | * Python: ''numpy.int32'' |
* [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: ''int'', ''NC_INT64'', ''NF90_INT'' | * NetCDF: ''int'', ''NC_INT'' or ''NC_LONG'', ''NF90_INT'' |
* Fortran: | * Fortran: ''INTEGER*4'' |
* 8-byte integers: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | * 8-byte //signed// integers: ''−9,223,372,036,854,775,808'' to ''9,223,372,036,854,775,807'' |
* Python: ''numpy.int64'' | * Python: ''numpy.int64'' |
* [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]]: ''int64'', ''NC_INT64'' | * NetCDF: ''int64'', ''NC_INT64'' |
* Fortran: | * Fortran: ''INTEGER*8'' |
* Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers | * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers |
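The ranges listed above follow directly from the number of bits available. As a hedged sketch (plain Python 3, no NumPy; ''signed_range'' is a helper name made up for this illustration), the limits can be recomputed and the two's complement coding of ''-1'' can be displayed:

```python
def signed_range(nbytes):
    """Return (min, max) of a two's complement signed integer stored on nbytes bytes."""
    nbits = 8 * nbytes
    return -2 ** (nbits - 1), 2 ** (nbits - 1) - 1

int32_min, int32_max = signed_range(4)
int64_min, int64_max = signed_range(8)

# Two's complement: -1 is coded with all the bits set to 1
minus_one = (-1).to_bytes(4, byteorder='big', signed=True)

# One more than the maximum no longer fits in 4 bytes
try:
    (int32_max + 1).to_bytes(4, byteorder='big', signed=True)
    fits = True
except OverflowError:
    fits = False

print(int32_min, int32_max)   # -2147483648 2147483647
print(minus_one.hex())        # ffffffff
print(fits)                   # False
```

Python's built-in ''int'' has no fixed size, so the overflow only shows up here when forcing the value into 4 bytes. Fixed-size types such as ''numpy.int32'' can behave differently (for example wrapping around in array arithmetic), so it is worth checking the range before choosing a type.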
* [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point for Arithmetic//) | * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Binary Floating-Point Arithmetic//) |
* Range: | * Range: |
* 4-byte float: ~8 significant digits * 10E±38 | * 4-byte float: ''~7 significant digits * 1E±38'' |
* Python: ''numpy.float32'' | * Python: ''numpy.float32'' |
* [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: | * NetCDF: ''float'', ''NC_FLOAT'', ''NF90_FLOAT'' |
* Fortran: | * Fortran: ''REAL*4'' |
* See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]] | * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]] |
* 8-byte float: ~15 significant digits * 10E±308 | * 8-byte float: ''~15 significant digits * 1E±308'' |
* Python: ''numpy.float64'' | * Python: ''numpy.float64'' |
* [[https://docs.unidata.ucar.edu/nug/current/md_types.html|NetCDF]], [[https://docs.unidata.ucar.edu/netcdf-fortran/current/f90-variables.html#f90-language-types-corresponding-to-netcdf-external-data-types|NetCDF-Fortran]]: | * NetCDF: ''double'', ''NC_DOUBLE'', ''NF90_DOUBLE'' |
* Fortran: | * Fortran: ''REAL*8'' |
* Special values: | * **Special values**: |
* [[https://en.wikipedia.org/wiki/NaN|NaN]] (''numpy.nan''): //Not a Number// | * [[https://en.wikipedia.org/wiki/NaN|NaN]]: //Not a Number// |
* Infinity (''-numpy.inf'' and ''numpy.inf'') | * Python: ''numpy.nan'' |
| * Infinity |
| * Python: ''-numpy.inf'' and ''numpy.inf'' |
* Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs, when you have to deal with missing values ! | * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|NumPy masked arrays]]) than NaNs when you have to deal with missing values! |
* [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]] | * <wrap hi>The RISKS of working with (the wrong) floats</wrap>: |
* [[https://en.wikipedia.org/wiki/Endianness|Endianness]] | * [[https://en.wikipedia.org/wiki/Round-off_error|Round-off error]] |
| * [[https://en.wikipedia.org/wiki/Catastrophic_cancellation|Catastrophic cancellation]] |
| * [[https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html|What Every Computer Scientist Should Know About Floating-Point Arithmetic]] |
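The risks linked above are easy to reproduce. The following is a small sketch using only the Python 3 standard library; ''as_float32'' is a hypothetical helper written for this illustration, which uses ''struct'' to round an 8-byte Python float to a 4-byte float and back.

```python
import math
import struct

def as_float32(x):
    """Round an 8-byte Python float to the nearest 4-byte float, then convert it back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Round-off error: 0.1, 0.2 and 0.3 have no exact base 2 representation
exactly_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)

# A 4-byte float keeps fewer digits of 0.1 than an 8-byte float
err32 = abs(as_float32(0.1) - 0.1)

# Catastrophic cancellation: subtracting two nearly equal numbers
diff = (1.0 + 1e-15) - 1.0          # mathematically 1e-15
rel_error = abs(diff - 1e-15) / 1e-15

print(exactly_equal, close_enough)  # False True
print(err32)                        # about 1.5e-09
print(rel_error)                    # about 0.11, i.e. an 11% error
```

Comparing floats with a tolerance (''math.isclose'', or ''numpy.isclose'' for arrays) is usually safer than ''=='', and the last result shows how a subtraction can lose most of the significant digits even with 8-byte floats.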
* A rather technical example: we //play// with a numpy 4-byte integer scalar | * A rather technical example: we //play// with a numpy 4-byte integer scalar |
* <code>>>> one_int32 = np.int32(1) | * <code>>>> one_int32 = np.int32(1) |