You will find on this page some useful, but unsorted, Python tips and tricks that don't fit in a section of the main JYP's recommended steps for learning python page
Only when you have already read all the content of this page several times, and you are looking for new ideas
>>> import os
>>> os.environ['TMPDIR']
'/data/jypmce/climafcache'
>>> os.environ.get('SCRATCHDIR', '/data/jypmce/some_scratch_stuff')
'/data/jypmce/some_scratch_stuff'
>>> os.environ['temporary_env_var_for_THIS_script'] = 'some value'
>>> os.environ['temporary_env_var_for_THIS_script']
'some value'
This will stop the script, unless it is called in a function, and the code calling the function explicitly catches and deals with errors
raise RuntimeError('\n\nOMG! An error! :-(\nAborting script...')
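The case where the calling code catches the error can be sketched as follows. This is only a minimal illustration: the `check_value` and `safe_check` names are hypothetical and do not come from an existing script.

```python
def check_value(val):
    """Hypothetical helper: raise an error for negative values."""
    if val < 0:
        raise RuntimeError('Negative value: %i' % (val,))
    return val


def safe_check(val):
    """Call check_value and deal with the error instead of stopping."""
    try:
        return check_value(val)
    except RuntimeError as err:
        # The error is caught here, so the script keeps running
        print('Caught an error:', err)
        return None


print(safe_check(3))
print(safe_check(-1))
```

Without the `try` / `except` block, the second call would stop the script and display the error message.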
It is always possible to display information messages using the print() function, but it is more efficient to use logging tools when you want to correctly display a lot of information about the progress of a script
logging module
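A minimal sketch of what using the logging module can look like is given below. The logger name (`my_script`), the message format and the `process_years` function are assumptions chosen for the example, not recommended settings.

```python
import logging

# Hypothetical configuration: show INFO messages and above, with a time stamp
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(name)s: %(message)s')
logger = logging.getLogger('my_script')


def process_years(years):
    """Hypothetical processing loop that reports its progress."""
    nb_done = 0
    for year in years:
        # DEBUG messages are hidden with the INFO level selected above
        logger.debug('Details about year %i', year)
        logger.info('Processing year %i', year)
        nb_done += 1
    if nb_done == 0:
        logger.warning('Nothing to process!')
    return nb_done


nb_processed = process_years([1984, 1985])
```

Changing `level=logging.INFO` to `level=logging.DEBUG` displays the detailed messages as well, without having to edit the rest of the script.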
A user can use CTRL-C or kill to stop a script, or CTRL-Z to suspend it temporarily (use fg to resume a suspended script). The code below can be used by the script itself to interrupt its execution, instead of raising an error
sys.exit('Some optional message about why we are stopping')
>>> os.access('/', os.W_OK)
False
>>> os.access('/home/jypmce/.bashrc', os.W_OK)
True
You will find below some examples of quick printing, as well as examples using old style formatting, formatted string literals (f-strings) and the string format() method. More details in the next section
>>> # Basic (but quick and efficient) printing
>>> year = 1984
>>> print(year)
1984
>>> print('[', year, 'is a famous book ]')
[ 1984 is a famous book ]
>>> # Old style formatting
>>> print('[ %i is a famous book ]' % (year,))
[ 1984 is a famous book ]
>>> print('[ %10i is a famous book ]' % (year,))
[       1984 is a famous book ]
>>> print('[ %-10i is a famous book ]' % (year,))
[ 1984       is a famous book ]
>>> print('[ %010i is a famous book ]' % (year,))
[ 0000001984 is a famous book ]
>>> # Formatted string literals (f-strings)
>>> print(f'[ {year} is a famous book ]')
[ 1984 is a famous book ]
>>> print(f'[ {year=} is a famous book ]')
[ year=1984 is a famous book ]
>>> print(f'[ {year:10} is a famous book ]')
[       1984 is a famous book ]
>>> print(f'[ {year:<10} is a famous book ]')
[ 1984       is a famous book ]
>>> print(f'[ {year:010} is a famous book ]')
[ 0000001984 is a famous book ]
>>> print(f'[ {year:10.2f} is a famous book (yes, {year}!) ]')
[    1984.00 is a famous book (yes, 1984!) ]
>>> # The String format() Method
>>> print('[ {} is a famous book ]'.format(year))
[ 1984 is a famous book ]
>>> print('[ {:10} is a famous book ]'.format(year))
[       1984 is a famous book ]
>>> print('[ {:<10} is a famous book ]'.format(year))
[ 1984       is a famous book ]
>>> print('[ {:010} is a famous book ]'.format(year))
[ 0000001984 is a famous book ]
>>> print('[ {:10.2f} is a famous book (yes, {}!) ]'.format(year, year))
[    1984.00 is a famous book (yes, 1984!) ]
>>> print('[ {title:10.2f} is a famous book (yes, {title}!) ]'.format(title=year))
[    1984.00 is a famous book (yes, 1984!) ]
>>> print('[ {title:10.2e} is a famous book ]'.format(title=year))
[   1.98e+03 is a famous book ]
format() method
It's easy to split a string with multiple blank delimiters, or a specific delimiter, but it can be harder to deal with sub-strings
>>> str_with_blanks = 'one two\t3\t\tFOUR'
>>> str_with_blanks.split()
['one', 'two', '3', 'FOUR']
>>> str_with_simple_delimiters = '1,2,3.14, 4'
>>> str_with_simple_delimiters.split(',')
['1', '2', '3.14', ' 4']
>>> complex_string='-o 1 --long "A string with accented chars: é è à ç"'
>>> complex_string.split()
['-o', '1', '--long', '"A', 'string', 'with', 'accented', 'chars:', 'é', 'è', 'à', 'ç"']
>>> import shlex
>>> shlex.split(complex_string)
['-o', '1', '--long', 'A string with accented chars: é è à ç']
If you are in a hurry, you can just use string functions to work with paths and file names.
You will need some specific objects and functions to check if a file exists, and similar operations. Check the libraries listed below, that can automatically deal with Unix-type paths on Linux and MacOS computers, and Windows-type paths on Windows computers
Note: the actual python may be different from the default python!
$ which python
/usr/bin/python
$ /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python
>>> import sys, shutil
>>> shutil.which('python')
'/usr/bin/python'
>>> sys.executable
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/bin/python'
>>> import os
>>> os.getcwd()
'/home/jypmce/PMIP4'
>>> os.path.exists('./argv_test.py')
True
>>> os.path.abspath('./argv_test.py')
'/home/jypmce/PMIP4/argv_test.py'
>>> os.path.exists('/home/jypmce/PMIP4/argv_test.py')
True
Note: the following example was generated on a Linux server and uses a / character as a path separator
>>> from pathlib import Path
>>> my_home = Path.home()
>>> my_home
PosixPath('/home/users/my_login')
>>> my_conf = my_home / '.config' / 'evince'
>>> my_conf
PosixPath('/home/users/my_login/.config/evince')
>>> my_conf.is_dir()
True
>>> my_conf.is_file()
False
>>> list(my_conf.glob('*'))
[PosixPath('/home/users/my_login/.config/evince/evince_toolbar.xml'), PosixPath('/home/users/my_login/.config/evince/accels')]
>>> [ ff.name for ff in my_conf.glob('*') ]
['evince_toolbar.xml', 'accels']
$ cd /data/jypmce/TestDir
$ ls -l
total 72
-rw-r--r-- 1 jypmce ipsl 18147 Jun 25 2012 get_TS_cmip5.py
-rw-r--r-- 1 jypmce ipsl 16152 Jun 21 2012 get_TS_cmip5.py~
-rw-r--r-- 1 jypmce ipsl 13954 Jul 3 2012 get_TS_cmip5_regular.py
-rw-r--r-- 1 jypmce ipsl 16539 Jun 22 2012 get_TS_cmip5_regular.py~
>>> os.chdir('/data/jypmce/TestDir')
>>> print(os.getcwd())
/data/jypmce/TestDir
>>> files_list = os.listdir()
>>> files_list
['get_TS_cmip5.py~', 'get_TS_cmip5_regular.py', 'get_TS_cmip5_regular.py~', 'get_TS_cmip5.py']
>>> files_sizes = list(map(os.path.getsize, files_list))
>>> files_sizes
[16152, 13954, 16539, 18147]
>>> sum(files_sizes)
64792
>>> import time
>>> plot_version = time.strftime('%Y%m%d_%H%M')
>>> f_name = 'test_%s.nc' % (plot_version,)
>>> f_name
'test_20210827_1334.nc'
>>> import tempfile, os
>>> f_tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.nc', delete=False)
>>> f_tmp
<tempfile._TemporaryFileWrapper object at 0x2b5614743820>
>>> f_tmp.name
'/tmp/tmpi6uk9hre.nc'
>>> f_tmp.close()
>>> os.remove(f_tmp.name)
The name of a script, the number of arguments (including the name of the script), and the arguments (as strings) can be accessed through the sys.argv list of strings
Simple argv_test.py test script:
#!/usr/bin/env python

import sys

nb_args = len(sys.argv)
print('Number of script arguments (including script name) =', nb_args)
for idx, val in enumerate(sys.argv):
    print(idx, val)
$ python argv_test.py
Number of script arguments (including script name) = 1
0 argv_test.py
$ python argv_test.py tas tas_tes.nc
Number of script arguments (including script name) = 3
0 argv_test.py
1 tas
2 tas_tes.nc
Use getopt (C-style parser for command line options)
optparse (parser for command line options) is deprecated since Python version 3.2! You should now use argparse (check Upgrading optparse code for converting from optparse to argparse)
argparse (parser for command-line options, arguments and sub-commands) is available since Python version 3.2
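As a minimal sketch of argparse usage, the example below parses two positional arguments and two options. The argument names (`variable`, `in_file`, `--nb-years`, `--verbose`) are hypothetical and only reuse the values of the argv_test.py example.

```python
import argparse


def parse_args(args_list=None):
    """Parse the arguments (sys.argv[1:] is used when args_list is None)."""
    parser = argparse.ArgumentParser(description='Hypothetical example script')
    # Positional (mandatory) arguments
    parser.add_argument('variable', help='name of the variable, e.g. tas')
    parser.add_argument('in_file', help='name of the input file')
    # Optional arguments
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='display more information')
    parser.add_argument('-n', '--nb-years', type=int, default=10,
                        help='number of years to process (default: 10)')
    return parser.parse_args(args_list)


# Simulate: python some_script.py tas tas_tes.nc -n 30
demo_args = parse_args(['tas', 'tas_tes.nc', '-n', '30'])
print(demo_args.variable, demo_args.in_file, demo_args.nb_years, demo_args.verbose)
```

In a real script, calling `parse_args()` without a list parses the actual command line, and the `-h` option displays a help message built from the `help` strings.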
Dictionary order is guaranteed to be insertion order! Note that the usual Python dictionary also guarantees the order since version 3.7 (in version 3.6, this was only an implementation detail of CPython)
Check the OrderedDict class (from collections import OrderedDict) and the OrderedDict vs dict in Python: The Right Tool for the Job tutorial
Python sets are groups of unique elements. They can be used to easily find all the unique elements of something and you can easily determine the intersection, union (and other similar operations) of sets.
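The set operations mentioned above can be sketched as follows. The two lists of model names are made up for this example.

```python
# Hypothetical lists of model names (note the duplicate in the first one)
models_a = ['IPSL-CM6A-LR', 'CESM2', 'IPSL-CM6A-LR', 'AWI-ESM-1-1-LR']
models_b = ['CESM2', 'MIROC-ES2L']

unique_a = set(models_a)            # Duplicates are removed
common = unique_a & set(models_b)   # Intersection
all_models = unique_a | set(models_b)  # Union
only_a = unique_a - set(models_b)   # Difference

# Sets are NOT ordered: sort them when a reproducible order is needed
print(sorted(unique_a))
print(sorted(common))
print(sorted(all_models))
print(sorted(only_a))
```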
The pprint module can be used for pretty printing objects (lists, dictionaries, …). It will wrap long lines in a meaningful way
>>> import pprint
>>> test_dic = {'AWI-ESM-1-1-LR_AWI':{'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR':{'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL':{'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
>>> print(test_dic)
{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}}, 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}}, 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'}, 'r1i1p1f2': {'grid': 'gr'}, 'r1i1p1f3': {'grid': 'gr'}, 'r1i1p1f4': {'grid': 'gr'}}}
>>> pprint.pprint(test_dic)
{'AWI-ESM-1-1-LR_AWI': {'r1i1p1f1': {'grid': 'gn'}},
 'CESM2_NCAR': {'r1i1p1f1': {'grid': 'gn'}},
 'IPSL-CM6A-LR_IPSL': {'r1i1p1f1': {'grid': 'gr'},
                       'r1i1p1f2': {'grid': 'gr'},
                       'r1i1p1f3': {'grid': 'gr'},
                       'r1i1p1f4': {'grid': 'gr'}}}
>>> dir(test_dic)
['__class__', '__contains__', '__delattr__', [... lots of unreadable stuff removed...] 'setdefault', 'update', 'values']
>>> pprint.pprint(dir(test_dic))
['__class__',
 '__contains__',
 [... lots of lines removed in this example ]
 'setdefault',
 'update',
 'values']
The built-in shelve module can be easily used for storing temporary/intermediate data
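A minimal sketch of storing and reading back intermediate data with shelve is shown below. The shelf name (`demo_shelf`), the keys and the stored values are hypothetical, and the shelf is created in a temporary directory only to keep the example self-contained.

```python
import os
import shelve
import tempfile

# Hypothetical location of the shelf file(s)
tmp_dir = tempfile.mkdtemp()
shelf_name = os.path.join(tmp_dir, 'demo_shelf')

# Store some intermediate results (any picklable Python object can be stored)
with shelve.open(shelf_name) as db:
    db['years'] = [1984, 1985]
    db['grid'] = {'model': 'CESM2_NCAR', 'grid': 'gn'}

# Later (possibly in another script): read the data back
with shelve.open(shelf_name) as db:
    saved_years = db['years']
    saved_keys = sorted(db.keys())

print(saved_years, saved_keys)
```

A shelf behaves like a dictionary whose keys are strings and whose content is kept on disk between two executions.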
More options:
The built-in configparser module can be easily used for reading (and writing!) text configuration files.
Note: a configuration file is also a way to easily store and exchange text data !
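The sketch below reads a small configuration with configparser. The section names, the option names and their values are made up for this example, and the configuration is read from a string (with `read_string`) rather than from a file, to keep the example self-contained.

```python
import configparser

# Hypothetical content of a text configuration file
config_text = """
[paths]
scratch_dir = /data/some_scratch_stuff

[plot]
nb_levels = 12
use_log_scale = no
"""

config = configparser.ConfigParser()
config.read_string(config_text)  # Use config.read('some_file.cfg') for a file

# All the values are stored as strings, but can be converted when read
scratch_dir = config['paths']['scratch_dir']
nb_levels = config.getint('plot', 'nb_levels')
use_log_scale = config.getboolean('plot', 'use_log_scale')

print(config.sections(), scratch_dir, nb_levels, use_log_scale)
```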
There is a good chance you don't actually want/need a global variable. Be sure to use the global statement correctly if you want to avoid side-effects…
_myvar = 10

def set_myvar(new_val):
    # Note: need to explicitly define a global variable (of a module)
    # as 'global' BEFORE changing its value in a function!
    # Otherwise, the value will not be REdefined outside the function
    global _myvar
    _myvar = new_val

def get_myvar():
    return _myvar

def myfunc(nb_repeat=10):
    print(nb_repeat * _myvar)
Use the key parameter to sort the keys of a dictionary according to the value associated with the key
When given a key function, the sort function will sort the elements by the values returned by the function, instead of sorting by the initial values. The function used for generating the key below is very simple and we can use a lambda (i.e. in place) function
>>> demo_dic = {'a':10, 'b':5, 'c':-1, 'd':0}
>>> sorted(demo_dic.keys())
['a', 'b', 'c', 'd']
>>> sorted(demo_dic.values())
[-1, 0, 5, 10]
>>> sorted(demo_dic.keys(), key=lambda key_name:demo_dic[key_name])
['c', 'd', 'b', 'a']
Big, nested, explicit for loops should be avoided at all costs, in order to reduce the execution time of a script!
numpy arrays should be used when dealing with numerical data
>>> my_ints = [1, 2, 3]
>>> list(map(str, my_ints)) # map returns an iterator in Python 3
['1', '2', '3']
>>> list(map(lambda ii: str(10*ii + 5), my_ints))
['15', '25', '35']
>>> import itertools as it
>>> it.product('AB', '01')
<itertools.product object at 0x2b35a7b5f100>
>>> list(it.product('AB', '01'))
[('A', '0'), ('A', '1'), ('B', '0'), ('B', '1')]
>>> for c1, c2 in it.product('AB', '01'):
...     print(c1 + c2)
...
A0
A1
B0
B1
>>> for c1, c2 in it.product(['A', 'B'], ['0', '1']):
...     print(c1 + c2)
...
A0
A1
B0
B1
>>> for c1, c2, c3 in it.product('AB', '01', '$!'):
...     print(c1 + c2 + c3, end=', ')
...
A0$, A0!, A1$, A1!, B0$, B0!, B1$, B1!,
map function detailed above
>>> my_ints = [1, 2, 3]
>>> [ str(ii) for ii in my_ints ]
['1', '2', '3']
The numpy arrays are usually used to store scalars of the same type (see also the Data type objects (dtype)), very often numerical values.
It is also possible to store arbitrary Python objects in an array, rather than using nested lists or dictionaries!
>>> import numpy as np
>>> some_array = np.empty((2, 3), dtype=object)
>>> some_array
array([[None, None, None],
       [None, None, None]], dtype=object)
>>> some_array.shape
(2, 3)
>>> print(some_array[-1, -1])
None
>>> some_array[-1, 0] = filled_contour # e.g. save an existing cartopy filled contour object
>>> some_array
array([[None, None, None],
       [<cartopy.mpl.contour.GeoContourSet object at 0x2ab679e8bf10>, None, None]], dtype=object)
>>> i10 = np.identity(10)
>>> i10
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       ...
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
>>> i10.shape
(10, 10)
>>> i10[3:7, 4:6]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])
>>> s0 = slice(3, 7)
>>> s1 = slice(4, 6)
>>> i10[s0, s1]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])
>>> my_slices = (s0, s1)
>>> i10[my_slices]
array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [0., 0.]])
>>> my_fancy_slices = (s0, Ellipsis)
>>> i10[my_fancy_slices]
array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
>>> i10[my_fancy_slices].shape
(4, 10)
>>> # WARNING! DANGERRRR! NEVER forget that a VIEW is NOT A COPY
>>> # and that you can change the content of the original array by mistake
>>> my_view = i10[my_slices]
>>> my_view[:, :] = -1
>>> my_view
array([[-1., -1.],
       [-1., -1.],
       [-1., -1.],
       [-1., -1.]])
>>> i10
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0., -1., -1.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])
Use np.unique, do not try to use histogram related functions!
>>> vals = np.random.randint(2, 5, (10,)) * 0.5 # Get 10 discrete float values
>>> vals
array([1. , 2. , 1. , 2. , 2. , 1.5, 1. , 1.5, 2. , 1.5])
>>> np.unique(vals)
array([1. , 1.5, 2. ])
>>> unique_vals, nb_unique = np.unique(vals, return_counts=True)
>>> unique_vals
array([1. , 1.5, 2. ])
>>> nb_unique
array([3, 3, 4])
>>> sorted_vals = np.sort(vals) # Sorted copy, in order to check the result
>>> sorted_vals
array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
There are all sorts of ufuncs (Universal Functions), and we will just use below add from the math operations, applied on the arrays defined in Finding and counting unique values
# Get the sum of all the elements of 'vals'
>>> np.add.reduce(vals)
15.5
>>> np.add.reduce(sorted_vals)
15.5
>>> vals.sum() # The usual and easy way to do it
15.5
# Compute the sum of the elements of 'nb_unique'
# AND keep (accumulate) the intermediate results
>>> nb_unique
array([3, 3, 4])
>>> np.add.accumulate(nb_unique)
array([ 3,  6, 10])
# The accumulated values can be used as indices to separate the different groups of sorted values!
>>> sorted_vals
array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ])
>>> sorted_vals[0:3]
array([1., 1., 1.])
>>> sorted_vals[3:6]
array([1.5, 1.5, 1.5])
>>> sorted_vals[6:10]
array([2., 2., 2., 2.])
# Compute the sum of each equal-value group
>>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum()
(3.0, 4.5, 8.0)
The reduceat function can be used to avoid explicit python loops, and improve the speed (but not the readability…) of a script. The example below improves what has been shown above
# Define a list with the boundaries of the intervals we want to apply the 'add' function to
# We need to add the beginning index (0), AND remove the last index
# (reduceat will automatically go to the end of the input array)
>>> nb_unique
array([3, 3, 4])
>>> slices_indices = [0] + list(np.add.accumulate(nb_unique))
>>> slices_indices.pop() # Remove last element
10
>>> slices_indices
[0, 3, 6]
# Compute the sums over the selected intervals with just one call
>>> np.add.reduceat(np.sort(vals), slices_indices)
array([3. , 4.5, 8. ])
Have a look at 100 numpy exercises
If you have problems setting the limits of a time axis, choosing the ticks' locations, or specifying the style of the labels, you should check the:
A few notes for a future section or page about data representation (bits and bytes) on disk and in memory, vs data format
Add parts (pages 28 to 37) of this old tutorial to this section
0s and 1s …), grouped in bytes!
REAL*4 ⇔ 4 bytes ⇔ 32 bits
0, 1, …, A, B, …, F)
0000 ⇔ 0, 0001 ⇔ 1, …, 1111 ⇔ F
1101 ⇔ D in hexadecimal ⇔ 13 in decimal (1 * 8 + 1 * 4 + 0 * 2 + 1 * 1)
11111101 in base 2 ⇔ 1111 1101 ⇔ FD in hexadecimal ⇔ 253 (15 * 16 + 13) in decimal
>>> hex(13) # Decimal to Hexadecimal conversion
'0xd'
>>> hex(253)
'0xfd'
>>> hex(256)
'0x100'
>>> int('0x100', 16) # Hexadecimal to Decimal conversion
256
>>> int('1111', 2) # Binary to Decimal conversion
15
>>> int('11111101', 2) # '11111101' <=> '1111 1101' <=> 'FD' <=> 15 * 16 + 13 = 253
253
>>> 0o13 # DANGER! An integer starting with 0o is in OCTAL base (Python 2 also accepted 013, which is a SyntaxError in Python 3)
11
>>> int('13', 8) # 1*8 + 3
11
−2,147,483,648 to 2,147,483,647 | numpy.int32 | int, NC_INT or NC_LONG, NF90_INT | INTEGER*4
−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | numpy.int64 | int64, NC_INT64 | INTEGER*8
~8 significant digits * 10E±38 | numpy.float32 | float, NC_FLOAT, NF90_FLOAT | REAL*4
~15 significant digits * 10E±308 | numpy.float64 | double, NC_DOUBLE, NF90_DOUBLE | REAL*8
numpy.nan
-numpy.inf and numpy.inf
NaNs, when you have to deal with missing values!
>>> one_int32 = np.int32(1)
>>> one_int32
1
>>> type(one_int32)
<class 'numpy.int32'>
>>> one_int32.dtype
dtype('int32')
>>> one_int32.shape # A numpy SCALAR, is an ARRAY WITH NO SHAPE !
()
>>> one_int32[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: invalid index to scalar variable.
>>> one_int32[()] # Note how to access the single element, when there is NO SHAPE
1
>>> one_int32.ndim # NO SHAPE means no dimensions, but there is ONE element
0
>>> one_int32.size
1
>>> one_int32.nbytes # The element requires 4 bytes of storage
4
>>> hex(one_int32) # We can print the hexadecimal representation for INTEGERS scalars and arrays
'0x1'
>>> hex(one_int32 * 15)
'0xf'
>>> hex(one_int32 * 16)
'0x10'
# 'Serialize' the data (i.e. change the data to a series of bytes)
# Note: the serialized data seems to be printed in the reverse order of 'hex(one_int32)'
>>> one_int32_serialized = one_int32.tobytes()
>>> type(one_int32_serialized)
<class 'bytes'>
>>> len(one_int32_serialized)
4
>>> one_int32_serialized
b'\x01\x00\x00\x00'
>>> one_int32_serialized.hex(' ') # Another way to print the hexadecimal values
'01 00 00 00'
# Use the following in the unlikely case where you need to change the endianness (bytes ordering)
>>> one_int32_reversed_endian = one_int32.byteswap()
>>> one_int32_reversed_endian # Same bytes in a different order represent a different number (of course)
16777216
>>> hex(one_int32_reversed_endian) # Compare to the output of hex(one_int32) above
'0x1000000'
>>> one_int32_reversed_endian.tobytes()
b'\x00\x00\x00\x01'
byteswap(), notice how bytes are swapped by groups of 4 bytes, because int32 values use 4 bytes
>>> array_example = np.asarray((3, 17), dtype=np.int32)
>>> array_example
array([ 3, 17], dtype=int32)
>>> array_example.shape, array_example.ndim, array_example.size, array_example.nbytes
((2,), 1, 2, 8)
>>> array_example.tobytes().hex(' ', 4)
'03000000 11000000'
>>> array_example.byteswap().tobytes().hex(' ', 4)
'00000003 00000011'
du, df, cat /proc/meminfo, top
od, strings
str, int, float, ord, …
map and join
md5sum
>>> test_string = 'A B 0 1 à µ'
>>> type(test_string)
<class 'str'>
>>> len(test_string)
11
>>> test_string_bin = test_string.encode('utf-8')
>>> test_string_bin
b'A B 0 1 \xc3\xa0 \xc2\xb5'
>>> type(test_string_bin)
<class 'bytes'>
>>> len(test_string_bin)
13
>>> test_string_bin.hex('-')
'41-20-42-20-30-20-31-20-c3-a0-20-c2-b5'