As can be expected, there is a lot of online Python documentation available, and it's easy to get lost. You can always use Google to find an answer to your problem, and you will probably end up looking at lots of answers on Stack Overflow or a similar site. But it's always better to know where you can find some good documentation… and to spend some time reading it.
This page lists some Python resources for scientists, in a suggested reading order. Do not print anything (or at least not everything), but it's a good idea to download all the pdf files to the same place, so that you can easily open and search the documents.
You can start using python by reading the Bien démarrer avec python tutorial that was used during a 2013 IPSL python class:
Once you have done your first steps, you should read Plus loin avec Python (start at page 39, the previous pages are an old version of what was covered in Part 1 above)
Example: use os.remove(file_name) instead of rm $file_name
You do not need to read all the Python documentation at this step, but it is really well made and you should at least have a look at it. The Tutorial is very good, and you should have a look at the table of contents of the Python Standard Library. There is a lot in the standard library that can make your life easier.
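The os.remove(file_name) call mentioned above can be tried safely with the minimal sketch below. The scratch file created with tempfile is only an assumption made for the illustration, it is not part of the original text.

```python
import os
import tempfile

# Create a scratch file so that the example can run anywhere
# (the file name is generated by tempfile, it is only an illustration)
file_descriptor, file_name = tempfile.mkstemp(suffix='.txt')
os.close(file_descriptor)

file_was_there = os.path.exists(file_name)

# Python equivalent of the shell command: rm $file_name
os.remove(file_name)

file_still_there = os.path.exists(file_name)
print(file_was_there, file_still_there)
```

Staying inside Python for this kind of file operation avoids launching an external shell, and the same script then works on any operating system.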
Summary: Python provides ordered objects (e.g. lists, strings, basic arrays, …) and some math operators, but you can't do really heavy computation with these. Numpy makes it possible to work with multi-dimensional data arrays, and using array syntax and masks (instead of explicit nested loops and tests) and the appropriate numpy functions will allow you to get performance similar to what you would get with a compiled program! Scipy adds more scientific functions.
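The difference between explicit nested loops and the array syntax with masks can be sketched as follows. The temperature values and the variable names are hypothetical, and this is only a small illustration, not a benchmark.

```python
import numpy as np

# Hypothetical data: a 3 x 4 array of temperatures (degrees Celsius)
temperature = np.array([[-2.0, 0.5, 3.0, 7.5],
                        [10.0, -1.5, 4.0, 12.0],
                        [-5.0, 6.0, 8.5, -0.5]])

# Explicit nested loops and tests (slow on big arrays)
total_loop = 0.0
for row in temperature:
    for value in row:
        if value > 0:
            total_loop += value

# Same computation with a boolean mask and array syntax
positive_mask = temperature > 0
total_numpy = temperature[positive_mask].sum()

# Replace the negative values with 0, without any loop
clipped = np.where(positive_mask, temperature, 0.0)

print(total_loop, total_numpy)
```

Both approaches give the same sum, but the masked version delegates the loop to compiled numpy code.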
Where: html and pdf documentation
Remember that the first element of an array is at index 0 and that the last element of an array is at index -1
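The [start:stop:step] notation is a shortcut for a slice object: 'This document by JY is awesome!'[::-1] and 'This document by JY is awesome!'[slice(None, None, -1)] give the same result (the reversed string)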
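The indexing and slicing rules above can be checked on a plain Python string, with no extra module. This is a minimal sketch and the variable names are only illustrative.

```python
text = 'This document by JY is awesome!'

first_char = text[0]    # indices start at 0
last_char = text[-1]    # -1 is the index of the last element

# The [start:stop:step] notation is a shortcut for a slice object
reversed_with_colons = text[::-1]
reversed_with_slice = text[slice(None, None, -1)]

print(first_char, last_char)
print(reversed_with_colons)
```

The same index and slice rules apply to lists and to numpy arrays.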
Slicing a numpy array gives you a View of the original array, not a copy. That is not a problem when you only read the values, but if you change the values of the View, you change the values of the first array (and vice-versa)! If that is not what you want, do not forget to make a copy of the data before working on it!
Views are a good thing most of the time, so only make a copy of your data when needed, because otherwise copying a big array will just be a waste of CPU and computer memory. Anyway, it is always better to understand what you are doing…
Check the example below and the copies and views part of the quickstart tutorial.
>>> import numpy as np
>>> a = np.arange(30).reshape((3,10))
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> b = a[1, :]
>>> b
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> b[3:7] = 0
>>> b
array([10, 11, 12,  0,  0,  0,  0, 17, 18, 19])
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12,  0,  0,  0,  0, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> a[:, 2:4] = -1
>>> a
array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
>>> b
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
>>> c = a[1, :].copy()
>>> c
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
>>> c[:] = 9
>>> c
array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
>>> b
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
>>> a
array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
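When it is not obvious whether an array is a View or a copy, one possible check (a sketch, not the only way) is np.shares_memory, which tells whether two arrays use the same memory. The array names follow the example above.

```python
import numpy as np

a = np.arange(30).reshape((3, 10))
b = a[1, :]          # basic slicing: b is a View of a
c = a[1, :].copy()   # explicit copy: c owns its own data

b_is_view = np.shares_memory(a, b)
c_is_view = np.shares_memory(a, c)

print(b_is_view, c_is_view)
```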
There is a good chance that your input array data will come from a file in the NetCDF format.
Depending on which python distribution you are using, you can use the cdms2 or netCDF4 modules to read the data.
Summary: cdms2 can read/write netCDF files (and read grads dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the UV-CDAT distribution, and can theoretically be installed independently of UV-CDAT (e.g. it will be installed when you install CMOR in conda). When you can use cdms2, you also have access to cdtime, which is very useful for handling time axis data.
How to get started:
Summary: netCDF4 can read/write netCDF files and is available in most python distributions
Some links, in case they can't be found easily on the UV-CDAT web site…
Summary: there are lots of python libraries that you can use for plotting, but Matplotlib has become a de facto standard
Where: Matplotlib web site
Help on stack overflow: matplotlib help
The matplotlib documentation is good, but not always easy to use. A good way to start with matplotlib is to quickly read the following, practice, and read this section again
import matplotlib.pyplot as plt
import matplotlib as mpl
my_page = plt.figure()
win_1 = plt.figure()
win_2 = plt.figure()
In normalized figure coordinates, (0, 0) is the bottom left of the figure, and (1, 1) the top right
my_page = plt.figure(): the ratio of the default figure is landscape, because it is 33% wider than it is high. Creating a default figure will be OK most of the time!
my_page = plt.figure(figsize=(width, height)): create a figure with a custom ratio (sizes are considered to be in inches)
my_page = plt.figure(figsize=(8.3, 11.7)): create a figure that will theoretically fill an A4 size page in portrait mode (check Dimensions Of A Series Paper Sizes if you need more size details)
my_plot = my_page.add_subplot(1, 1, 1): syntax is add_subplot(nrows, ncols, index)
my_plot = my_page.subplots()
top_plot = my_page.add_subplot(3, 1, 1)
middle_plot = my_page.add_subplot(3, 1, 2)
bottom_plot = my_page.add_subplot(3, 1, 3)
plot_array = my_page.subplots(3, 1)
top_plot = plot_array[0]
middle_plot = plot_array[1]
bottom_plot = plot_array[2]
my_page, plot_array = plt.subplots(3, 1)
my_page.add_axes([left, bottom, width, height])
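The figure and subplot creation calls listed above can be combined as in the sketch below. The Agg backend and the position of the extra axes are assumptions chosen so that the example runs without a screen.

```python
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

# Portrait page (A4 size in inches) with 3 plots stacked vertically
my_page, plot_array = plt.subplots(3, 1, figsize=(8.3, 11.7))
top_plot, middle_plot, bottom_plot = plot_array

# Extra axes placed by hand: [left, bottom, width, height],
# in normalized figure coordinates (illustrative values)
small_plot = my_page.add_axes([0.7, 0.9, 0.2, 0.08])

nb_axes = len(my_page.axes)
page_width, page_height = my_page.get_size_inches()
print(nb_axes, page_width, page_height)
```

plt.subplots creates the figure and all the subplots in one call, which is usually shorter than calling add_subplot several times.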
my_page.clear() or my_page.clf() or plt.clf(): clear the (current) figure
my_plot.clear() or my_plot.cla(): clear the (current) axis
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
my_page.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
hspace/wspace is the amount of height/width between the subplots
hspace=0.1 is enough for just displaying the ticks and the labels, without the axis name
hspace=0 to stick the plots together vertically
my_plot.set_xticks([])
my_page.subplots_adjust(right=0.75) will leave 25% on the right of the page for adding a legend outside of a plot
pl_x_bottomleft, pl_y_bottomleft, pl_width, pl_height = my_plot.get_position().bounds
my_plot.set_position( (pl_x_bottomleft, pl_y_bottomleft, pl_width, pl_height * 0.5) )
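The layout calls above (subplots_adjust, get_position and set_position) can be tried together as in this sketch. The two-plot layout and the Agg backend are assumptions made for the illustration.

```python
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

my_page, (top_plot, bottom_plot) = plt.subplots(2, 1)

# Stick the 2 plots together vertically and keep 25% of the page
# free on the right (e.g. for a legend)
my_page.subplots_adjust(hspace=0, right=0.75)
top_plot.set_xticks([])

# Halve the height of the bottom plot, keeping its bottom left corner
pl_x, pl_y, pl_width, pl_height = bottom_plot.get_position().bounds
bottom_plot.set_position((pl_x, pl_y, pl_width, pl_height * 0.5))

new_bounds = bottom_plot.get_position().bounds
print(new_bounds)
```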
my_page.savefig('my_plot.pdf'): save the figure to a pdf file
my_page.savefig('my_plot.png', dpi=200, transparent=True, bbox_inches='tight'): save the figure to a png file at a higher resolution than the default (default is 100 dots per inch), with a transparent background and no extra space around the figure
plt.show()
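A possible way to save a figure to pdf and png files is sketched below. Writing to a temporary directory and selecting the Agg backend are assumptions made so that the example can run anywhere, including on a machine without a screen.

```python
import os
import tempfile

import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

my_page = plt.figure()
my_plot = my_page.add_subplot(1, 1, 1)
my_plot.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Write the files to a temporary directory (only for this example)
out_dir = tempfile.mkdtemp()
pdf_name = os.path.join(out_dir, 'my_plot.pdf')
png_name = os.path.join(out_dir, 'my_plot.png')

my_page.savefig(pdf_name)
my_page.savefig(png_name, dpi=200, transparent=True, bbox_inches='tight')
plt.close(my_page)

print(os.path.getsize(pdf_name), os.path.getsize(png_name))
```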
Reverse a colormap by adding _r to the name, e.g., viridis_r
Use the zorder=NN parameter when creating objects. NN is an integer where 0 is the lowest value (the farthest from the eye), and objects are plotted above objects with a lower zorder value
Use matplotlib_object.set_zorder(NN) to change the order after an object has been created
Use the alpha parameter, where 0.0 means that the object is completely transparent, and 1.0 means completely opaque
my_plot.scatter(…, alpha=0.7)
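The zorder and alpha parameters described above can be combined as in this minimal sketch. The data values are hypothetical, and the Agg backend is an assumption so that no screen is needed.

```python
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

my_page, my_plot = plt.subplots()
x_values = [0, 1, 2, 3]

# The line is created below the markers (lower zorder)
my_line, = my_plot.plot(x_values, [0, 1, 2, 3], color='k', zorder=1)
# Big, partly transparent markers drawn above the line
my_markers = my_plot.scatter(x_values, [3, 2, 1, 0], s=200,
                             alpha=0.7, zorder=2)

# Change the drawing order after creation: the line is now on top
my_line.set_zorder(3)

print(my_line.get_zorder(), my_markers.get_zorder(), my_markers.get_alpha())
```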
When you only save (and do not show()) a plot, remember that matplotlib expects to be able to display the figure on a screen by default.
The plot function will be faster for scatterplots where markers don't vary in size or color
my_plot.set_xlim(x_leftmost_value, x_rightmost_value)
my_plot.set_xlabel(x_label_string, fontsize=axis_label_fontsize)
my_plot.set_xlabel('A closer label', labelpad=-20)
my_plot.set_xticks(x_ticks_values, minor=False)
my_plot.set_xticks([])
my_plot.set_xticklabels(x_ticks_labels, minor=False, fontsize=ticklabels_fontsize)
x_ticks_labels is a list of strings that has the same length as x_ticks_values. Use an empty string in the positions where you don't want a label
mpl.rcParams['lines.markersize'] ** 2 ⇒ 36
mpl.rcParams['lines.linewidth'] ⇒ 1.5
With plot, all the markers have the same attributes, and for scatter the attributes can be the same, or specified for each marker
marker (marker type), c (color), s (size), linewidths (linewidth of the marker edges), edgecolors
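The difference between markers drawn with plot and with scatter can be sketched as follows. The data and the marker sizes are hypothetical, and the Agg backend is an assumption so that no screen is needed.

```python
import numpy as np
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

x_values = np.arange(5)
y_values = x_values ** 2

my_page, my_plot = plt.subplots()

# plot: all the markers share the same attributes
same_markers, = my_plot.plot(x_values, y_values, linestyle='none',
                             marker='o', markersize=6, color='b')

# scatter: the size (s, in points**2) and the color (c) can be
# specified for each marker
marker_sizes = mpl.rcParams['lines.markersize'] ** 2 * np.arange(1, 6)
varied_markers = my_plot.scatter(x_values, y_values + 5, marker='s',
                                 c=y_values, s=marker_sizes,
                                 linewidths=0.5, edgecolors='k')

print(same_markers.get_markersize(), varied_markers.get_sizes())
```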
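Reverse a colormap by adding _r at the end of the colormap name
my_cmap.N is the number of colors in the colormap
The colors can be accessed with an index going from 0 to my_cmap.N - 1. Note that the index will saturate below 0 and above my_cmap.N - 1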
>>> my_cmap.N
256
>>> my_cmap(-1) # Same as my_cmap(0)
(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0)
>>> my_cmap(0)
(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0)
>>> my_cmap(1)
(0.36186082276047676, 0.3185697808535179, 0.6394463667820068, 1.0)
>>> my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
>>> my_cmap(256) # Same as my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
>>> my_cmap(257) # Same as my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
my_cmap.set_bad(color='k'): color to be used for masked values
my_cmap.set_over(color='k'): color to be used for high out-of-range values if extend is specified and is 'both' or 'max'. Default color is my_cmap(my_cmap.N - 1)
my_cmap.set_under(color='k'): color to be used for low out-of-range values if extend is specified and is 'both' or 'min'. Default color is my_cmap(0)
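The set_bad, set_over and set_under calls can be tried with the sketch below. The choice of the viridis_r colormap, the data, the levels and the colors are all assumptions made for the illustration. The colormap is copied first, so that the colormap registered in matplotlib is not modified.

```python
import copy

import numpy as np
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

# Work on a copy of an existing colormap
my_cmap = copy.copy(plt.get_cmap('viridis_r'))
my_cmap.set_bad(color='k')    # masked values
my_cmap.set_under(color='w')  # values below the lowest level
my_cmap.set_over(color='r')   # values above the highest level

# Hypothetical data, with one masked value
data = np.arange(20.0).reshape((4, 5))
data = np.ma.masked_where(data == 7, data)

my_page, my_plot = plt.subplots()
# The levels only cover 5 to 15: other values are out-of-range
filled = my_plot.contourf(data, levels=[5, 7.5, 10, 12.5, 15],
                          cmap=my_cmap, extend='both')
my_page.colorbar(filled)

# Out-of-range integer indices now return the under and over colors
under_color = tuple(my_cmap(-1))
over_color = tuple(my_cmap(my_cmap.N))
print(under_color, over_color)
```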
my_figure.suptitle('Figure title', x=xloc_in_normalized_coordinates, y=yloc_in_normalized_coordinates, …)
my_plot.set_title('Plot title', …)
fontsize: size in points, or (better!) string specifying a relative size (xx-small, x-small, small, medium, large, x-large, xx-large)
Use the label= keyword when creating/updating a plot
To leave some elements out of the legend, do not use the label= keyword for these elements, or add a _ at the front of the label strings
Place the legend with the bbox_to_anchor parameter
The coordinates of bbox_to_anchor are in normalized coordinates of the current (sub)plot: (0, 0) is the lower left corner of the plot, and (1, 1) the upper right corner
legend(… bbox_to_anchor=(1.05, 1.), loc='upper left', …) will put the upper left corner of the legend slightly right ((1.05, 1.)) of the upper right corner ((1, 1)) of the plot
plt.subplots_adjust(right=0.75) will make all the plots use 75% on the left of the page, and leave 25% on the right for the legend
mpl.rcParams['figure.figsize']: default figure size ([6.4, 4.8])
mpl.matplotlib_fname(): path of the matplotlibrc configuration file in use
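A legend placed outside of a plot with bbox_to_anchor could look like the sketch below. The curves and their labels are hypothetical, and the Agg backend is an assumption so that no screen is needed.

```python
import matplotlib as mpl
mpl.use('Agg')  # assumption: non-interactive backend, no screen needed
import matplotlib.pyplot as plt

my_page, my_plot = plt.subplots()
x_values = [0, 1, 2, 3]

my_plot.plot(x_values, [0, 1, 2, 3], label='linear')
my_plot.plot(x_values, [0, 1, 4, 9], label='quadratic')
# The leading _ keeps this line out of the legend
my_plot.plot(x_values, [1, 1, 1, 1], label='_reference line')

# Leave 25% of the page on the right for the legend
my_page.subplots_adjust(right=0.75)

# Upper left corner of the legend slightly right of the
# upper right corner of the plot
my_legend = my_plot.legend(bbox_to_anchor=(1.05, 1.), loc='upper left',
                           fontsize='small')

legend_labels = [text.get_text() for text in my_legend.get_texts()]
print(legend_labels)
```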
# make the background dark gray (call this before the contourf)
plt.gca().patch.set_color('.25')
plt.contourf(d)
plt.show()
Summary: Basemap is an extension of Matplotlib that you can use for plotting maps, using different projections
Where: Basemap web site
Help on stack overflow: basemap help
How to use basemap?
Summary: Cartopy is a Python package for advanced map generation with a simple matplotlib interface and Iris is a Python package for analysing and visualising meteorological and oceanographic data sets
Where: Cartopy and Iris web sites
Examples:
Help on stack overflow: cartopy help
We list here some resources about non-NetCDF data formats that can be useful
More and more applications use json files as configuration files or as a means to exchange data in text files (through serialization/deserialization).
json files look basically like a list of (nested) python dictionaries that would have been dumped to a text file
/home/users/jypeter/CDAT/Progs/Devel/beaugendre/nc2json.py
cat file.json | python -m json.tool | less
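Dumping a (nested) python dictionary to json text and reading it back can be sketched with the standard json module. The content of the dictionary below is hypothetical.

```python
import json

# Hypothetical configuration stored as nested dictionaries and lists
config = {
    'model': 'my_model',
    'variables': ['tas', 'pr'],
    'grid': {'nlat': 96, 'nlon': 96},
}

# Serialization: python objects => json text
json_text = json.dumps(config, indent=2, sort_keys=True)
print(json_text)

# Deserialization: json text => python objects
restored_config = json.loads(json_text)
```

json.dump and json.load work the same way, but write to and read from an open file instead of a string.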
Resources for Linked PaleoData:
BagIt, a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content.
Summary: pandas is a library providing high-performance, easy-to-use data structures and data analysis tools
Where: Pandas web site
JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is very convenient for processing tables in xlsx files (or csv, etc…). You should at least have a quick look at:
Summary: One document to learn numerics, science, and data with Python
This is a really nice and useful document that is regularly updated and used for the EuroScipy tutorials. You will learn more things about python, numpy and matplotlib, debugging and optimizing scripts, and also learn about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc…
There is only so much you can do by staring at your code in your favorite text editor, and adding print lines in your code (or using logging instead of print). The next step is to use the python debugger!
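Before moving on to the debugger, the logging alternative to print mentioned above could look like the sketch below. The logger name, the message format and the mean function are hypothetical.

```python
import logging

# Display all the messages, starting at the DEBUG level
logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)s:%(name)s:%(message)s')
logger = logging.getLogger('my_script')


def mean(values):
    """Return the mean of a list of numbers, or None for an empty list."""
    logger.debug('mean() called with %d values', len(values))
    if not values:
        logger.warning('Empty list, returning None')
        return None
    return sum(values) / float(len(values))


result = mean([1, 2, 3, 4])
empty_result = mean([])
print(result, empty_result)
```

Unlike print lines, the debug messages can be silenced later by changing the level to logging.WARNING, without removing anything from the script.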
python -m pdb my_script.py
run (or restart) to go back to the first line of the script. Note that r is the shortcut for return, not for run
continue (or c) to execute the script to the end, or till the first breakpoint or error is reached
where (or w) to check the call stack that led to the current stop. Use up and down to navigate through the call stack and examine the values of the functions' parameters
break NNN to stop at line NNN
type(var) and p var (or print(var)) to check the type and values of variables. You can also change the variables' values on the fly!
run (or restart) to restart the script
next and step to execute some parts of the script line by line. If a code line calls a function:
next (or n) will execute a function and stop on the next line
step (or s) will stop at the first line inside the function
Type help in the debugger for using the built-in help
Depending on the distribution, the editor and the programming environment you use, you may have access to a graphical version of the debugger. UV-CDAT users can use pydebug my_script.py
IDE = Integrated Development Environment
There are lots of ways to use Python and develop scripts, from using a lightweight approach (your favorite text editor with builtin python syntax highlighting, e.g. emacs and python -i myscript.py) to a full-fledged IDE. You'll find below some IDE related links
You can already get a very efficient script by checking the following:
If your script is still not fast enough, there is a lot you can do to improve it, without resorting to parallelization (that may introduce extra bugs rather than extra performance). See the sections below
Hint: before optimizing your script, you should spend some time profiling it, in order to only spend time improving the slow parts of your script
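One possible way to profile a script is the cProfile module of the standard library, as in the sketch below. The slow_sum and fast_sum functions are hypothetical stand-ins for the slow and fast parts of a real script.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of the squares of 0 .. n-1, with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """Same result, with the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6


profiler = cProfile.Profile()
profiler.enable()
slow_result = slow_sum(200000)
fast_result = fast_sum(200000)
profiler.disable()

# Print the functions sorted by cumulative time
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(5)
report_text = report.getvalue()
print(report_text)
```

A whole script can also be profiled without changing it, with python -m cProfile -s cumulative my_script.py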
The official Porting Python 2 Code to Python 3 page gives the required information to make the transition from python 2 to python 3. Python 2.7 reached its end of life on January 1, 2020, so new scripts should be written in Python 3 and existing scripts should be ported.
You can do a lot more with python! But if you have read at least a part of this page, you should be able to find and use the modules you need. Make sure you do not reinvent the wheel! Use existing packages when possible, and make sure to report bugs or errors in the documentation when you find some