other:python:misc_by_jyp [2022/12/12 13:50] jypeter Improved by changing the sections' levels
other:python:misc_by_jyp [2023/04/28 14:59] jypeter [Data representation] Added link to old tutorial
Line 31: | Line 31:
<code>sys.exit('Some optional message about why we are stopping')</code>
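The sketch below is a quick check of what ''sys.exit()'' does with a string argument. It runs a tiny child script with the same interpreter and looks at its exit status and error output. The message text is only an example. The string is written on //stderr//, and the exit status is 1.

```python
import subprocess
import sys

# Hypothetical child script that stops early with an explanation
child_code = "import sys; sys.exit('Some optional message about why we are stopping')"

# Run it with the same Python interpreter and capture what it prints
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True)

# A string argument is written to stderr, and the exit status is 1
print("exit status:", result.returncode)
print("stderr:", result.stderr.strip())
```

Calling ''sys.exit()'' without an argument exits with status 0. An integer argument is used directly as the exit status.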
===== Checking if a file/directory is writable by the current user =====
Line 43: | Line 41:
===== Playing with strings =====
==== Filenames, etc... ====

Check [[other:python:misc_by_jyp#working_with_paths_and_filenames|Working with paths and filenames]] and [[other:python:misc_by_jyp#generating_file_names|Generating file names]]
==== Splitting (complex) strings ====
It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings
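Here is a sketch of a few standard-library options. The sample strings are made up, and this is only one possible approach. ''str.split()'' handles blanks or a given delimiter. ''shlex.split()'' handles shell-like quoted sub-strings. ''csv.reader()'' handles delimiter-separated records with quoted fields.

```python
import csv
import shlex

line = 'temp   23.5  "Paris, France"  2023-04-28'

# str.split() with no argument: any run of blanks is ONE delimiter,
# but the quoted sub-string is cut in two
blank_split = line.split()

# str.split(',') with a specific delimiter: consecutive delimiters give empty strings
comma_split = 'a,b,,c'.split(',')

# shlex.split() follows shell-like quoting rules and keeps "Paris, France" together
shell_split = shlex.split(line)

# csv.reader() does the same for delimiter-separated records with quoted fields
csv_split = next(csv.reader(['a,"b, c",d']))

print(blank_split)  # ['temp', '23.5', '"Paris,', 'France"', '2023-04-28']
print(comma_split)  # ['a', 'b', '', 'c']
print(shell_split)  # ['temp', '23.5', 'Paris, France', '2023-04-28']
print(csv_split)    # ['a', 'b, c', 'd']
```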
Line 257: | Line 252:
Note: a configuration file is also a way to easily store and exchange text data!
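For example, the standard ''configparser'' module can write a configuration as plain text and read it back. Below is a minimal sketch. The section and option names are hypothetical, and an in-memory buffer stands in for a real file.

```python
import configparser
import io

# Build a small configuration in memory (hypothetical section/option names)
config = configparser.ConfigParser()
config['run'] = {'experiment': 'my_experiment', 'nb_years': '100'}
config['output'] = {'directory': '/tmp/results'}

# Write it as plain text (to any text file object, here an in-memory buffer)
buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)

# Read the text back (e.g. in another script) and convert the values as needed
config_back = configparser.ConfigParser()
config_back.read_string(text)
nb_years = config_back.getint('run', 'nb_years')       # converted to an int
directory = config_back.get('output', 'directory')     # kept as a str
```

All values are stored as strings. ''getint()'', ''getfloat()'' and ''getboolean()'' convert them when reading.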

===== Working with global variables =====

There is a good chance you don't actually want/need a //global// variable. If you do, be sure to use the ''global'' statement correctly to avoid unexpected side-effects...

  * [[https://docs.python.org/3/faq/programming.html?highlight=global#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value|Using (and changing) a global variable inside a script or module]]
  * Simple module example\\ <code>_myvar = 10

def set_myvar(new_val):
    # Note: a module-level (global) variable must be declared 'global'
    # in a function BEFORE assigning a new value to it!
    # Otherwise, the assignment creates a new LOCAL variable,
    # and the module-level value is left unchanged
    global _myvar
    _myvar = new_val

def get_myvar():
    # Only READING a module-level variable does not require 'global'
    return _myvar

def myfunc(nb_repeat=10):
    print(nb_repeat * _myvar)</code>
  * [[https://docs.python.org/3/faq/programming.html?highlight=global#how-do-i-share-global-variables-across-modules|Sharing global variables across modules]]
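Here is a minimal sketch of the side-effects mentioned above, reusing the ''_myvar'' name from the module example. ''set_myvar_wrong'' and ''increment_wrong'' are hypothetical helpers, added only for this demonstration.

```python
# Module-level ("global") variable, same name as in the module example
_myvar = 10

def set_myvar_wrong(new_val):
    # No 'global' statement: this creates a new LOCAL _myvar,
    # which disappears when the function returns
    _myvar = new_val
    return _myvar

def set_myvar(new_val):
    # 'global' makes the assignment change the module-level _myvar
    global _myvar
    _myvar = new_val

def increment_wrong():
    # '+=' assigns to _myvar, so Python treats it as LOCAL in the whole
    # function, and reading it before any assignment raises UnboundLocalError
    try:
        _myvar += 1
    except UnboundLocalError as err:
        return type(err).__name__
    return 'no error'

set_myvar_wrong(20)
value_after_wrong = _myvar   # still 10
set_myvar(30)
value_after_right = _myvar   # now 30
error_name = increment_wrong()
print(value_after_wrong, value_after_right, error_name)   # 10 30 UnboundLocalError
```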
===== Sorting =====
Line 435: | Line 451:
>>> np.add.reduceat(np.sort(vals), slices_indices)
array([3. , 4.5, 8. ])</code>


===== matplotlib related stuff =====

==== Working with time axes (and ticks) ====

If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the following pages:
  * [[https://matplotlib.org/stable/gallery/index.html#ticks|Ticks examples' gallery]]
  * [[https://matplotlib.org/stable/gallery/text_labels_and_annotations/date.html|Date tick labels example]]
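This minimal sketch combines a few of the tools shown in these examples. It assumes a recent matplotlib, and the data is made up. The axis limits are given as ''datetime'' objects. A ''MonthLocator'' sets the major ticks, a ''DateFormatter'' styles their labels, and a ''WeekdayLocator'' sets the minor ticks.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")              # non-interactive backend, no window needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up daily time series
start = dt.datetime(2023, 1, 1)
times = [start + dt.timedelta(days=d) for d in range(120)]
values = [d % 30 for d in range(120)]

fig, ax = plt.subplots()
ax.plot(times, values)

# Time-axis limits can be given directly as datetime objects
ax.set_xlim(dt.datetime(2023, 1, 15), dt.datetime(2023, 4, 15))

# One major tick on the first day of each month, labelled like '2023-02'
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

# One minor tick every Monday
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.MO))

# Draw the figure (use fig.savefig('some_file.png') to write it to a file)
fig.canvas.draw()
print([label.get_text() for label in ax.get_xticklabels()])
```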


===== Data representation =====

A few notes for a future section or page about //data representation// (bits and bytes) on disk and in memory, vs //data format//.

FIXME Add parts (pages 28 to 37) of this [[https://wiki.lsce.ipsl.fr/pmip3/doku.php/other:python:jyp_steps#part_2|old tutorial]] to this section
==== Numerical values ====

  * Binary data representation of some numbers:
    * [[https://en.wikipedia.org/wiki/Integer_(computer_science)|Integers]]
      * Range:
        * 4-byte integers (''numpy.int32''): −2,147,483,648 to 2,147,483,647
        * 8-byte integers (''numpy.int64''): −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
      * Tech note: signed integers use [[https://en.wikipedia.org/wiki/Two%27s_complement|two's complement]] for coding negative integers
    * [[https://en.wikipedia.org/wiki/IEEE_754|Floating point numbers]] (//IEEE 754// standard aka //IEEE Standard for Floating-Point Arithmetic//)
      * Range:
        * 4-byte float (''numpy.float32''): ~7 significant digits, magnitudes from ~1.2e-38 to ~3.4e+38 (normal numbers)
          * See also [[https://en.wikipedia.org/wiki/Single-precision_floating-point_format|Single-precision floating-point format]]
        * 8-byte float (''numpy.float64''): ~15-16 significant digits, magnitudes from ~2.2e-308 to ~1.8e+308 (normal numbers)
      * Special values:
        * [[https://en.wikipedia.org/wiki/NaN|NaN]] (''numpy.nan''): //Not a Number//
        * Infinity (''-numpy.inf'' and ''numpy.inf'')
        * Note: it is cleaner to use masks (and [[https://numpy.org/doc/stable/reference/maskedarray.generic.html|Numpy masked arrays]]) than NaNs when you have to deal with missing values!
  * [[https://en.wikipedia.org/wiki/Bit_numbering|Bit numbering]]
  * [[https://en.wikipedia.org/wiki/Endianness|Endianness]]
  * A rather technical example: we //play// with a numpy 4-byte integer scalar
    * <code>>>> import numpy as np
>>> one_int32 = np.int32(1)
>>> one_int32
1
>>> type(one_int32)
<class 'numpy.int32'>
>>> one_int32.dtype
dtype('int32')
>>> one_int32.shape # A numpy SCALAR is like an ARRAY WITH NO SHAPE!
()
>>> one_int32[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index to scalar variable.
>>> one_int32[()] # Note how to access the single element, when there is NO SHAPE
1
>>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
0
>>> one_int32.size
1
>>> one_int32.nbytes # The element requires 4 bytes of storage
4
>>> hex(one_int32) # We can print the hexadecimal representation of INTEGER scalars (not of arrays)
'0x1'
>>> hex(one_int32 * 15)
'0xf'
>>> hex(one_int32 * 16)
'0x10'

# 'Serialize' the data (i.e. change the data to a series of bytes)
# Note: on a little-endian machine (e.g. x86), the least significant byte is stored FIRST,
# which is why the bytes appear in the reverse order of 'hex(one_int32)'
>>> one_int32_serialized = one_int32.tobytes()
>>> type(one_int32_serialized)
<class 'bytes'>
>>> len(one_int32_serialized)
4
>>> one_int32_serialized
b'\x01\x00\x00\x00'
>>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values
'01 00 00 00'

# Use the following in the unlikely case where you need to change the endianness (byte ordering)
>>> one_int32_reversed_endian = one_int32.byteswap()
>>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
16777216
>>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
'0x1000000'
>>> one_int32_reversed_endian.tobytes()
b'\x00\x00\x00\x01'</code>
  * Another technical example: we use an array of 2 integers\\ When using ''byteswap()'', notice how the bytes are swapped within each group of 4 bytes, because each int32 value uses 4 bytes
    * <code>>>> import numpy as np
>>> array_example = np.asarray((3, 17), dtype=np.int32)
>>> array_example
array([ 3, 17], dtype=int32)
>>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes
((2,), 1, 2, 8)
>>> array_example.tobytes().hex(' ', 4)
'03000000 11000000'
>>> array_example.byteswap().tobytes().hex(' ', 4)
'00000003 00000011'
</code>
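The integer ranges, float precision and special values listed above can be checked with NumPy itself. This short sketch assumes NumPy is installed; the test values are arbitrary.

```python
import numpy as np

# Integer ranges: same values as in the list above
i32 = np.iinfo(np.int32)
print(i32.min, i32.max)                 # -2147483648 2147483647
print(np.iinfo(np.int64).max)

# Two's complement: -1 has all its bits set, -2 all but the lowest one
print(np.array([-1], dtype=np.int32).tobytes().hex())   # ffffffff
print((-2).to_bytes(4, 'little', signed=True).hex())    # feffffff

# Going past the maximum silently wraps around to the minimum (array arithmetic)
wrapped = np.array([i32.max], dtype=np.int32) + 1
print(wrapped)                          # [-2147483648]

# Float precision: 2**24 + 1 cannot be stored in a float32 (24-bit significand),
# and 2**53 + 1 cannot be stored in a float64 (53-bit significand)
f32_lost = np.float32(2**24) + np.float32(1) == np.float32(2**24)
f64_lost = float(2**53) + 1.0 == float(2**53)
print(f32_lost, f64_lost)               # True True
print(np.finfo(np.float32).max, np.finfo(np.float64).max)

# NaN is not equal to anything, not even itself: use np.isnan()
print(np.nan == np.nan, np.isnan(np.nan))   # False True

# Missing values: a masked array ignores them in computations
data = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))
print(data.mean())                      # 2.0
```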

  * Manipulating binary data with [[https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview|bytes, bytearray, memoryview]]
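This short sketch uses the standard ''struct'' module, followed by ''bytearray'' and ''memoryview''. ''struct'' is not mentioned above. It is used here because its format strings let you choose the byte order explicitly, with ''<'' for little-endian and ''>'' for big-endian.

```python
import struct

# Pack the integer 1 as a 4-byte signed int, little-endian ('<') and big-endian ('>')
little = struct.pack('<i', 1)
big = struct.pack('>i', 1)
print(little.hex(' '), '|', big.hex(' '))   # 01 00 00 00 | 00 00 00 01

# bytes objects are immutable; a bytearray is a mutable copy
buf = bytearray(little)
buf[0] = 0x0f                        # change the least significant byte in place
first_value = struct.unpack('<i', buf)[0]
print(first_value)                   # 15

# A memoryview is a zero-copy view on the same buffer: writing to it changes buf
view = memoryview(buf)
view[1] = 0x01
second_value = struct.unpack('<i', buf)[0]
print(buf.hex(' '), second_value)    # 0f 01 00 00 271
```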

  * Array addressing
    * [[https://www.geeksforgeeks.org/calculation-of-address-of-element-of-1-d-2-d-and-3-d-using-row-major-and-column-major-order/|Calculation of address of element of 1-D, 2-D, and 3-D using row-major and column-major order]]
      * In other words: //using indices to go from 1-D to n-dimensional data//
    * The [[https://en.wikipedia.org/wiki/Array_(data_structure)|array]] structure
    * Python/C (row-major order) vs Fortran (column-major order)...
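The address (offset) calculation from the link above can be sketched in a few lines. ''offset_c'' and ''offset_f'' are hypothetical helper names. NumPy's ''np.ravel_multi_index'' is used only to cross-check the results.

```python
import numpy as np

def offset_c(index, shape):
    """Row-major order (C, NumPy default): the LAST index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

def offset_f(index, shape):
    """Column-major order (Fortran): the FIRST index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset

shape = (2, 3, 4)      # e.g. 2 levels, 3 latitudes, 4 longitudes
index = (1, 0, 2)

print(offset_c(index, shape))   # (1*3 + 0)*4 + 2 = 14
print(offset_f(index, shape))   # 1 + 2*(0 + 3*2) = 13

# Cross-check with NumPy
print(np.ravel_multi_index(index, shape),
      np.ravel_multi_index(index, shape, order='F'))   # 14 13

# The same element, read from the flat (1-D) data in both orders
a = np.arange(24).reshape(shape)
print(a[index], a.ravel(order='C')[14], a.ravel(order='F')[13])   # 14 14 14
```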

  * Disk and RAM usage: how to check the usage (available RAM and disk space), best practice on multi-user systems (how much are we allowed to use?)
    * ''du'', ''df'', ''cat /proc/meminfo'', ''top''
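If you need the same information from inside a Python script, here is a rough sketch. ''tree_size'' and ''mem_available_kb'' are hypothetical helpers, and ''/proc/meminfo'' only exists on Linux.

```python
import os
import shutil

# Disk usage of the file system containing a given directory (like 'df')
usage = shutil.disk_usage('.')
print(f"disk: {usage.free / 1e9:.1f} GB free out of {usage.total / 1e9:.1f} GB")

def tree_size(top):
    """Sum of the sizes of the regular files below 'top' (like 'du -s').

    Note: 'du' counts allocated disk blocks, so its numbers may differ a bit.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

def mem_available_kb(meminfo='/proc/meminfo'):
    """Available RAM in kB on Linux (like 'cat /proc/meminfo'), else None."""
    try:
        with open(meminfo) as f:
            for line in f:
                if line.startswith('MemAvailable:'):
                    return int(line.split()[1])   # the value is given in kB
    except FileNotFoundError:
        return None
    return None

available_kb = mem_available_kb()
print('available RAM (kB):', available_kb)
# tree_size('/path/to/my/data')   # hypothetical directory
```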

  * Understanding and reverse-engineering //binary// formats
    * ''od'', ''strings''
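Below is a rough Python counterpart of ''od'' and ''strings'', applied to a made-up binary record: a text header, a 4-byte integer and an 8-byte float. ''hexdump'' and ''printable_strings'' are hypothetical helper names. Once the layout has been understood, ''struct.unpack_from'' can decode it.

```python
import re
import struct

# Made-up binary record: 12-byte text header, then a 4-byte int and an 8-byte float
data = b'HEADER v1.0\x00' + struct.pack('<i', 17) + struct.pack('<d', 1.5)

def hexdump(raw, width=8):
    """Rough equivalent of 'od -A d -t x1': decimal offset + hexadecimal bytes."""
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        lines.append(f"{offset:07d} {chunk.hex(' ')}")
    return lines

def printable_strings(raw, min_len=4):
    """Rough equivalent of 'strings': runs of at least min_len printable ASCII characters."""
    pattern = rb'[\x20-\x7e]{%d,}' % min_len
    return [match.decode('ascii') for match in re.findall(pattern, raw)]

for line in hexdump(data):
    print(line)
print(printable_strings(data))                 # ['HEADER v1.0']

# Decode the numbers once the offsets (12 and 16) are known
int_value = struct.unpack_from('<i', data, 12)[0]
float_value = struct.unpack_from('<d', data, 16)[0]
print(int_value, float_value)                  # 17 1.5
```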

  * Binary vs text formats: ASCII, UTF, raw
    * Text-related functions in Python: ''str'', ''int'', ''float'', ''ord'', ...
    * Converting lists with ''map'' and ''join''
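Here are a few of these functions in action; the sample values are arbitrary. The last lines compare the size of the same number stored as text and as binary.

```python
import struct

# Characters <-> code points
print(ord('A'), chr(65), hex(ord('à')))    # 65 A 0xe0

# Text -> numbers (e.g. a line read from a text file)
line = '1.5 2 3.25'
values = list(map(float, line.split()))    # [1.5, 2.0, 3.25]
count = int('42')

# Numbers -> text (e.g. a line to write to a text file)
out_line = ' '.join(map(str, values))      # '1.5 2.0 3.25'
fixed_width = ' '.join(f'{v:8.3f}' for v in values)

# The same value needs 18 bytes as text, but 8 bytes as a binary float64
text_len = len(str(1 / 3))
bin_len = len(struct.pack('<d', 1 / 3))
print(text_len, bin_len)                   # 18 8
```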

  * Misc: ''md5sum''
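The same checksum can be computed in Python with the standard ''hashlib'' module. In this minimal sketch, ''md5sum'' is a hypothetical helper name. The file is read by chunks, so big files do not need to fit in memory.

```python
import hashlib
import os
import tempfile

def md5sum(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, like the 'md5sum' command."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # Read by chunks until f.read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

# Checksum of an empty input (a well-known value)
print(hashlib.md5(b'').hexdigest())   # d41d8cd98f00b204e9800998ecf8427e

# Checksum of a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello\n')
file_digest = md5sum(tmp.name)
os.remove(tmp.name)
print(file_digest)
```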

==== Strings ====

  * Encoding, [[https://en.wikipedia.org/wiki/ASCII|ASCII]], [[https://en.wikipedia.org/wiki/Unicode|unicode]], [[https://en.wikipedia.org/wiki/UTF-8|UTF-8]], ...

  * Getting the binary representation of a string
    * <code>>>> test_string = 'A B 0 1 à µ'
>>> type(test_string)
<class 'str'>
>>> len(test_string)
11
>>> test_string_bin = test_string.encode('utf-8')
>>> test_string_bin
b'A B 0 1 \xc3\xa0 \xc2\xb5'
>>> type(test_string_bin)
<class 'bytes'>
>>> len(test_string_bin)
13
>>> test_string_bin.hex('-')
'41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'
</code>
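Going the other way (bytes to string) requires knowing which encoding was used. Here is a short sketch reusing the test string above.

```python
test_string = 'A B 0 1 à µ'
test_string_bin = test_string.encode('utf-8')

# bytes -> str: decode with the SAME encoding that was used to encode
decoded = test_string_bin.decode('utf-8')

# One character may need several bytes in UTF-8, but only one in Latin-1
print('à'.encode('utf-8'), 'à'.encode('latin-1'))   # b'\xc3\xa0' b'\xe0'

# Decoding with the wrong encoding silently gives garbage (mojibake)
wrong = 'à'.encode('utf-8').decode('latin-1')
print(repr(wrong))                                  # 'Ã\xa0'

# Pure ASCII cannot represent 'à' or 'µ': handle the errors explicitly
try:
    test_string.encode('ascii')
except UnicodeEncodeError as err:
    print('Not ASCII:', err.reason)                 # Not ASCII: ordinal not in range(128)
ascii_replaced = test_string.encode('ascii', errors='replace')
print(ascii_replaced)                               # b'A B 0 1 ? ?'
```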