This shows you the differences between two versions of the page.
other:python:misc_by_jyp [2022/02/21 15:15] jypeter [numpy related stuff] |
other:python:misc_by_jyp [2022/03/08 16:40] jypeter [Playing with strings] |
||
---|---|---|---|
Line 39: | Line 39: | ||
True</code> | True</code> | ||
+ | ==== Playing with strings ==== | ||
+ | |||
+ | === Filenames, etc... === | ||
+ | |||
+ | See the [[other:python:misc_by_jyp#working_with_paths_and_filenames|Working with paths and filenames]] and [[other:python:misc_by_jyp#generating_file_names|Generating file names]] sections. | ||
+ | |||
+ | === Splitting strings === | ||
+ | |||
+ | Splitting a string on runs of blanks, or on a specific delimiter, is easy. It is harder to keep quoted sub-strings, which contain blanks or delimiters themselves, in one piece. | ||
+ | |||
+ | <code>>>> str_with_blanks = 'one two\t3\t\tFOUR' | ||
+ | >>> str_with_blanks.split() | ||
+ | ['one', 'two', '3', 'FOUR'] | ||
+ | |||
+ | >>> str_with_simple_delimiters = '1,2,3.14, 4' | ||
+ | >>> str_with_simple_delimiters.split(',') | ||
+ | ['1', '2', '3.14', ' 4'] | ||
+ | |||
+ | >>> complex_string='-o 1 --long "A string with accented chars: é è à ç"' | ||
+ | >>> complex_string.split() | ||
+ | ['-o', '1', '--long', '"A', 'string', 'with', 'accented', 'chars:', 'é', 'è', 'à', 'ç"'] | ||
+ | |||
+ | >>> import shlex | ||
+ | >>> shlex.split(complex_string) | ||
+ | ['-o', '1', '--long', 'A string with accented chars: é è à ç']</code> | ||
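The blank left in `' 4'` by a plain `split(',')` can be removed by stripping each piece. For delimiter-separated strings whose fields may be quoted (and contain the delimiter), the standard `csv` module handles the quoting. A minimal sketch; the sample strings `quoted_line` and `'1, "a, b", 3'` are made up for illustration:

```python
import csv

str_with_simple_delimiters = '1,2,3.14, 4'

# Strip the blanks around each field after a plain split
fields = [field.strip() for field in str_with_simple_delimiters.split(',')]
print(fields)  # ['1', '2', '3.14', '4']

# A comma-separated line where one (quoted) field contains the delimiter
quoted_line = 'one,"two, three",4'

# csv.reader expects an iterable of lines, so wrap the single line in a list
parsed = next(csv.reader([quoted_line]))
print(parsed)  # ['one', 'two, three', '4']

# skipinitialspace=True ignores the blanks that follow each delimiter,
# so that a quoted field preceded by a blank is still recognized
parsed_blanks = next(csv.reader(['1, "a, b", 3'], skipinitialspace=True))
print(parsed_blanks)  # ['1', 'a, b', '3']
```

Without `skipinitialspace=True`, the third example would treat ` "a` and ` b"` as two separate, unquoted fields.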
==== Working with paths and filenames ==== | ==== Working with paths and filenames ==== | ||
Line 253: | Line 278: | ||
15.5 | 15.5 | ||
>>> vals.sum() # The usual and easy way to do it | >>> vals.sum() # The usual and easy way to do it | ||
- | 15.5</code> | + | 15.5 |
+ | |||
+ | # Compute the sum of the elements of 'nb_unique' | ||
+ | # AND keep (accumulate) the intermediate results | ||
+ | >>> nb_unique | ||
+ | array([3, 3, 4]) | ||
+ | >>> np.add.accumulate(nb_unique) | ||
+ | array([ 3, 6, 10]) | ||
+ | |||
+ | # The accumulated values are the (excluded) end indices of the successive groups of equal sorted values, so they can be used as slice boundaries to separate the groups! | ||
+ | >>> sorted_vals | ||
+ | array([1. , 1. , 1. , 1.5, 1.5, 1.5, 2. , 2. , 2. , 2. ]) | ||
+ | >>> sorted_vals[0:3] | ||
+ | array([1., 1., 1.]) | ||
+ | >>> sorted_vals[3:6] | ||
+ | array([1.5, 1.5, 1.5]) | ||
+ | >>> sorted_vals[6:10] | ||
+ | array([2., 2., 2., 2.]) | ||
+ | |||
+ | # Compute the sum of each equal-value group | ||
+ | >>> sorted_vals[0:3].sum(), sorted_vals[3:6].sum(), sorted_vals[6:10].sum() | ||
+ | (3.0, 4.5, 8.0)</code> | ||
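Rather than typing each slice by hand, the start and end indices of the groups can be derived from the accumulated counts, and the group sums computed with an explicit Python loop. A minimal sketch, assuming `sorted_vals` and `nb_unique` hold the values shown above; the names `ends`, `starts` and `group_sums` are only illustrative:

```python
import numpy as np

# Values assumed to match the session above
sorted_vals = np.array([1., 1., 1., 1.5, 1.5, 1.5, 2., 2., 2., 2.])
nb_unique = np.array([3, 3, 4])

# End index (excluded) of each group of equal values: [3, 6, 10]
ends = np.add.accumulate(nb_unique)
# Start index of each group: 0, then the end of the previous group: [0, 3, 6]
starts = np.concatenate(([0], ends[:-1]))

# Explicit Python loop over the (start, end) pairs
group_sums = [sorted_vals[start:end].sum() for start, end in zip(starts, ends)]
print(group_sums)
```

This is the kind of Python-level loop that a single vectorized numpy call can replace when there are many groups.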
+ | |||
+ | === Applying a ufunc over specified sections of an array === | ||
+ | |||
+ | The ufunc [[https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html#numpy.ufunc.reduceat|reduceat]] method can be used to avoid explicit Python loops and improve the speed (but not the readability...) of a script. The example below //improves// what has been shown above. | ||
+ | |||
+ | <code># Define a list with the start index of each interval we want to apply the 'add' function to | ||
+ | # We need to add the beginning index (0), AND remove the last accumulated value | ||
+ | # (reduceat automatically goes to the end of the input array for the last interval) | ||
+ | >>> nb_unique | ||
+ | array([3, 3, 4]) | ||
+ | >>> slices_indices = [0] + list(np.add.accumulate(nb_unique)) | ||
+ | >>> slices_indices.pop() # Remove last element | ||
+ | 10 | ||
+ | >>> slices_indices | ||
+ | [0, 3, 6] | ||
+ | |||
+ | # Compute the sums over the selected intervals with just one call | ||
+ | >>> np.add.reduceat(np.sort(vals), slices_indices) | ||
+ | array([3. , 4.5, 8. ])</code> | ||
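Since the values are sorted, the start index of each group is also the index of the first occurrence of each distinct value, which `np.unique` can return directly with `return_index=True`. This avoids building a list and popping its last element. A sketch, assuming `vals` holds the same ten values as above; the unsorted order used here is made up:

```python
import numpy as np

# Made-up unsorted values with the same content as sorted_vals above
vals = np.array([2., 1., 1.5, 2., 1., 1.5, 2., 1., 1.5, 2.])
sorted_vals = np.sort(vals)

# For a sorted array, the index of the first occurrence of each
# distinct value is also the start index of its group
uniq, starts = np.unique(sorted_vals, return_index=True)
print(uniq)    # distinct values
print(starts)  # start index of each group

# One call sums sorted_vals[starts[i]:starts[i+1]] for each group,
# and everything from starts[-1] to the end of the array for the last one
group_sums = np.add.reduceat(sorted_vals, starts)
print(group_sums)
```

Using another ufunc in place of `np.add`, e.g. `np.maximum.reduceat`, applies that operation to each group instead.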
/* | /* |