
JYP's recommended steps for learning python

If you don't know which python distribution to use or how to start the python interpreter, you should first read the Working with Python page.

As can be expected, there is a lot of online python documentation available, and it's easy to get lost. You can always use google to find an answer to your problem, and you will probably end up looking at lots of answers on Stack Overflow or a similar site. But it's always better to know where to find some good documentation… and to spend some time reading it.

This page lists some python resources for scientists, in a suggested reading order. Do not print anything (or at least not everything), but it's a good idea to download all the pdf files to the same place, so that you can easily open and search the documents.

JYP's introduction to python

Part 1

You can start using python by reading the Bien démarrer avec python tutorial that was used during a 2013 IPSL python class:

this tutorial is in French (my apologies for the lack of translation, but it should be easy to understand)
- If you have too much trouble understanding this French Tutorial, you can read the first 6 chapters of the Tutorial in the official Python documentation and chapters 1.2.1 to 1.2.5 in the Scipy Lecture Notes. Once you have read these, you can try to read the French tutorial again
it's an introduction to python (and programming) for the climate scientist: after reading this tutorial, you should be able to do most of the things you usually do in a shell script
- python types, tests, loops, reading a text file
- the tutorial is very detailed about string handling, because strings offer an easy way to practice working with indices (indexing and slicing), before indexing numpy arrays. And our usual pre/post-processing scripts often need to do a lot of string handling in order to generate the file/variable/experiment names
after reading this tutorial, you should practice with the following:

Part 2

Once you have done your first steps, you should read Plus loin avec Python (start at page 39, the previous pages are an old version of what was covered in Part 1 above)

this tutorial is in French (sorry again)
after reading this tutorial, you will be able to do more than you can do in a shell script, in an easier way
- advanced string formatting
- creating functions and using modules
- working with file paths and handling files without calling external Linux programs
  (e.g. using os.remove(file_name) instead of rm $file_name)
- using command-line options for scripts, or using configuration files
- calling external programs
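The Part 2 topics fit in a few lines of code. The script below is a minimal sketch and is not taken from the tutorial. The `make_file_name` naming scheme, the `--year` option and the `piControl`/`tas` names are hypothetical. The external program is just another python interpreter, so the example runs anywhere.

```python
import argparse
import os
import subprocess
import sys
import tempfile


def make_file_name(experiment, variable, year):
    """Build a file name with new style string formatting (hypothetical naming scheme)."""
    return '{0}_{1}_{2:04d}.nc'.format(experiment, variable, year)


def main(argv=None):
    """Toy pre-processing step; argv=None means 'read the real command line'."""
    # Command-line options: a required experiment name, and an optional year
    parser = argparse.ArgumentParser(description='Toy pre-processing script')
    parser.add_argument('experiment')
    parser.add_argument('--year', type=int, default=1850)
    args = parser.parse_args(argv)

    # Build a path in a temporary directory, without hard-coding '/'
    work_dir = tempfile.mkdtemp()
    file_name = os.path.join(work_dir, make_file_name(args.experiment, 'tas', args.year))

    # Create an empty file, then delete it with os.remove instead of calling 'rm'
    open(file_name, 'w').close()
    existed = os.path.exists(file_name)
    os.remove(file_name)
    os.rmdir(work_dir)

    # Call an external program (here another python interpreter) and capture its output
    output = subprocess.check_output([sys.executable, '-c', 'print("hello")'])
    return file_name, existed, output.decode().strip()
```

In a real script, you would call `main()` from an `if __name__ == '__main__':` block, so that argparse reads the options from the actual command line.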

The official python documentation

You do not need to read all the python documentation at this stage, but it is really well made and you should at least have a look at it. The Tutorial is very good, and you should have a look at the table of contents of the Python Standard Library. There is a lot in the standard library that can make your life easier

Python 2.7

html - pdf (in a zip file)

Python 3

html - pdf (in a zip file)

Numpy and Scipy

Summary: Python provides ordered objects (e.g. lists, strings, basic arrays, …) and some math operators, but you can't do really heavy computation with these. Numpy makes it possible to work with multi-dimensional data arrays. Using array syntax and masks (instead of explicit nested loops and tests), together with the appropriate numpy functions, will give you performance similar to what you would get with a compiled program! Scipy adds more scientific functions
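Here is a minimal sketch that compares an explicit loop with array syntax and masks. It assumes a hypothetical (time, lat, lon) temperature array in which missing values are stored as -999.0. Both functions count the same values, but in the vectorized one numpy runs the loops in compiled code.

```python
import numpy as np

FILL_VALUE = -999.0   # hypothetical missing value marker

# Hypothetical temperature field (time, lat, lon) in Kelvin, with a few missing values
rng = np.random.default_rng(0)
tas = 250.0 + 50.0 * rng.random((12, 18, 36))
tas[0, 0, :3] = FILL_VALUE


def count_warm_loops(data, threshold):
    """Explicit nested loops and tests: easy to write, but slow on big arrays."""
    count = 0
    for t in range(data.shape[0]):
        for j in range(data.shape[1]):
            for i in range(data.shape[2]):
                if data[t, j, i] != FILL_VALUE and data[t, j, i] > threshold:
                    count += 1
    return count


def count_warm_vectorized(data, threshold):
    """Array syntax: boolean masks replace the tests, numpy does the loops."""
    warm = (data != FILL_VALUE) & (data > threshold)
    return int(warm.sum())


# Masked arrays go one step further: missing values are ignored by the computations
tas_masked = np.ma.masked_equal(tas, FILL_VALUE)
time_mean = tas_masked.mean(axis=0)   # mean over time, shape (18, 36)
```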

Where: html and pdf documentation

Getting started

always remember that indices start at 0 and that the last element of an array is at index -1!
First learn about indexing and slicing by manipulating strings, as shown in Part 1 above (try 'This document by JY is awesome!'[::-1] and 'This document by JY is awesome!'[slice(None, None, -1)])
if you are a Matlab user (but the references are interesting for others as well), you can read the following:
1. Numpy for Matlab users
2. NumPy for MATLAB users (nice, but does not seem to be maintained any more)
read the really nice numpy Quickstart tutorial
have a quick look at the full documentation to know where things are
1. Numpy User Guide
2. Numpy Reference Guide
3. Scipy Reference Guide
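The same indexing and slicing rules apply to strings and to numpy arrays. Here is a short sketch, with arbitrary array values:

```python
import numpy as np

s = 'This document by JY is awesome!'
a = np.arange(10)              # array([0, 1, 2, ..., 9])

first, last = a[0], a[-1]      # indices start at 0, the last element is at index -1
reversed_s = s[::-1]           # same as s[slice(None, None, -1)]
reversed_a = a[::-1]           # the same slicing syntax works on arrays
every_other = a[1:8:2]         # start:stop:step, the stop index is excluded

m = np.arange(12).reshape(3, 4)
last_row = m[-1, :]            # one index (or slice) per dimension
last_column = m[:, -1]
corner = m[:2, -2:]            # first 2 rows, last 2 columns
```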

Beware of the array view side effects

When you take a slice of an array, you get a View: an array that has a new shape but that still shares its data with the first array.

That is not a problem when you only read the values, but if you change the values of the View, you change the values of the first array (and vice-versa)! If that is not what you want, do not forget to make a copy of the data before working on it!

Views are a good thing most of the time, so only make a copy of your data when needed, because otherwise copying a big array will just be a waste of CPU and computer memory. Anyway, it is always better to understand what you are doing…

Check the example below and the copies and views part of the quickstart tutorial.

>>> import numpy as np
>>> a = np.arange(30).reshape((3,10))
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
 
>>> b = a[1, :]
>>> b
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
 
>>> b[3:7] = 0
>>> b
array([10, 11, 12,  0,  0,  0,  0, 17, 18, 19])
 
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12,  0,  0,  0,  0, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
 
>>> a[:, 2:4] = -1
>>> a
array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
 
>>> b
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 
>>> c = a[1, :].copy()
>>> c
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 
>>> c[:] = 9
>>> c
array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
 
>>> b
array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 
>>> a
array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
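Basic slicing returns a view. Indexing with an array of indices or with a boolean mask (fancy indexing) always returns a copy. `np.shares_memory` is a quick way to check which one you have. Here is a minimal sketch that rebuilds the same `a` array as in the session above:

```python
import numpy as np

a = np.arange(30).reshape((3, 10))

view = a[1, :]               # basic slicing: a view, shares its data with a
copied = a[1, :].copy()      # explicit copy: independent data
fancy = a[[0, 2], :]         # index array: always a copy
selected = a[a > 25]         # boolean mask: always a copy

shares = {name: np.shares_memory(a, arr)
          for name, arr in [('view', view), ('copied', copied),
                            ('fancy', fancy), ('selected', selected)]}

# Writing through the view changes a, writing to a copy does not
view[0] = -100
fancy[:] = 0
```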

Extra numpy information

More information about array indexing:
- Examples:
  - indirect_indexing_2.py.txt: Take a vertical slice in a 3D zyx array, along a varying y 'path'
- Indexing (index arrays, boolean index arrays, np.newaxis, Ellipsis, variable numbers of indices, …)
- Fancy indexing and the ix_() function
- Indexing (in the numpy reference manual)
- Indexing routines
More information about arrays:
Dealing with special numerical values (NaN, inf)
- If you know that your data has missing values, it is cleaner and safer to handle them with masked arrays!
- Handling numerical exceptions
- Floating point error handling
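Here is a minimal sketch of some of the points above, using hypothetical data. It covers four things:
- a vertical section taken along a varying y 'path' in a zyx array (in the spirit of indirect_indexing_2.py.txt, but not copied from it)
- a sub-block selected with `np.ix_`
- missing values handled with a masked array, plus a NaN-based alternative
- `np.errstate` turning floating point warnings into exceptions

```python
import numpy as np

# --- Fancy indexing: vertical section along a varying y 'path' ---
nz, ny, nx = 4, 5, 6
data = np.arange(nz * ny * nx, dtype=float).reshape((nz, ny, nx))  # hypothetical zyx array
y_path = np.array([0, 1, 1, 2, 3, 4])   # hypothetical y index to follow, for each x

# The two index arrays are broadcast together: section[k, i] = data[k, y_path[i], i]
section = data[:, y_path, np.arange(nx)]          # shape (nz, nx)

# np.ix_ selects a sub-block (rows 0 and 2, columns 1, 3 and 5) of the first level
block = data[0][np.ix_([0, 2], [1, 3, 5])]        # shape (2, 3)

# --- Missing values: masked arrays ---
raw = np.array([1.0, 2.0, -999.0, 4.0])
masked = np.ma.masked_values(raw, -999.0)
masked_mean = masked.mean()                       # the fill value is ignored
nan_mean = np.nanmean(np.where(raw == -999.0, np.nan, raw))  # NaN-based alternative


# --- Floating point errors: raise an exception instead of a warning ---
def safe_ratio(x, y):
    """Return x / y for numpy arrays, or None if a division by zero occurs."""
    with np.errstate(divide='raise', invalid='raise'):
        try:
            return x / y
        except FloatingPointError:
            return None
```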

cdms2 and netCDF4

There is a good chance that your input array data will come from a file in the NetCDF format.

Depending on which python distribution you are using, you can use the cdms2 or netCDF4 modules to read the data.

cdms2

Summary: cdms2 can read/write netCDF files (and read grads dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the UV-CDAT distribution, and can theoretically be installed independently of UV-CDAT (e.g. it will be installed when you install CMOR in conda). When you can use cdms2, you also have access to cdtime, which is very useful for handling time axis data.

How to get started:

read JYP's cdms tutorial, starting at page 54
1. the tutorial is in French (soooorry!)
2. you have to replace cdms with cdms2, and MV with MV2 (sooorry about that, the tutorial was written when CDAT was based on Numeric instead of numpy to handle array data)
read the official cdms documentation (link may change)

netCDF4

Summary: netCDF4 can read/write netCDF files and is available in most python distributions

Where: http://unidata.github.io/netcdf4-python/

CDAT-related resources

Some links, in case they can't be found easily on the UV-CDAT web site…

Matplotlib

Summary: there are lots of python libraries that you can use for plotting, but Matplotlib has become a de facto standard

Where: Matplotlib web site

Help on stack overflow: matplotlib help

The matplotlib documentation is good, but not always easy to use. A good way to start with matplotlib is to quickly read the following, practice, and read this section again

Have a quick look at the matplotlib gallery to get an idea of all you can do with matplotlib. Later, when you need to plot something, go back to the gallery to find some examples that are close to what you need and click on them to view their source code
- some examples are more pythonic (i.e. object oriented) than others, and some examples mix different styles of coding, which can be quite confusing. Try to use an object oriented way of doing things!
Use the free hints provided by JY!
1. You will usually initialize matplotlib with: import matplotlib.pyplot as plt
  - in some cases you may also need: import matplotlib as mpl
  - later, you may need other matplotlib related modules, for advanced usage
2. You need to know some matplotlib specific vocabulary:
  - a Matplotlib Figure (or canvas) is a graphical window in which you create your plots…
    - example: my_page = plt.figure()
    - if you need several display windows at the same time, create several figures!
      
      win_1 = plt.figure()
      win_2 = plt.figure()
    - the parts of a figure are often positioned in normalized coordinates: (0, 0) is the bottom left of the figure, and (1, 1) the top right
    - You don't really specify the page orientation (portrait or landscape) of a plot. If you want a portrait plot, it's up to you to create a plot that will look taller than it is wide. The idea is not to worry about this and just check the final resulting plot: create a plot, save it, display the resulting png/pdf and then adjust the creation script
      - If you do have an idea of the layout of what you want to plot, it may be easier to explicitly specify the figure size/ratio at creation time, and then try to fill the normalized coordinates space of the figure
      - my_page = plt.figure(): the default figure has a landscape ratio, because it is 33% wider than it is tall (6.4 x 4.8 inches). Creating a default figure will be OK most of the time!
      - my_page = plt.figure(figsize=(width, height)): create a figure with a custom ratio (sizes are considered to be in inches)
        
        my_page = plt.figure(figsize=(8.3, 11.7)): create a figure that will theoretically fill an A4 size page in portrait mode (check Dimensions Of A Series Paper Sizes if you need more size details)
  - a Matplotlib Axes (note the final s, even for a single plot: an Axis is one of the x/y axes of a plot) is a plot inside a Figure… More details
    - reserve space for one plot that will use most of the available area of the figure/page:
      - my_plot = my_page.add_subplot(1, 1, 1): syntax is add_subplot(nrows, ncols, index)
      - my_plot = my_page.subplots()
    - create 3 plots on 1 column (each plot uses the full width of the figure):
      - top_plot = my_page.add_subplot(3, 1, 1)
        middle_plot = my_page.add_subplot(3, 1, 2)
        bottom_plot = my_page.add_subplot(3, 1, 3)
      - the following method is more convenient than add_subplot when there are lots of plots on a page
        plot_array = my_page.subplots(3, 1)
        top_plot = plot_array[0]
        middle_plot = plot_array[1]
        bottom_plot = plot_array[2]
      - creating a figure and axes with a single line: my_page, plot_array = plt.subplots(3, 1)
    - use my_page.add_axes(...) to add an Axes (plot) at an arbitrary location on the page
      my_page.add_axes([left, bottom, width, height])
  - a Matplotlib Artist is something (e.g. a line, a group of markers, a Patch such as a rectangle, text, the legend…) plotted on the Figure/Axes
  - clearing the page (or part of it): you probably won't need that…
    - my_page.clear() or my_page.clf() or plt.clf(): clear the (current) figure
    - my_plot.clear() or my_plot.cla(): clear the (current) Axes
3. some resources for having multiple plots on the same figure
  - Easily creating subplots
    - fig.add_subplot(...)
    - fig.add_axes(...)
    - plt.subplot(...)
    - plt.subplots(...) with an s at the end (demo)
    - subplots_adjust can be used to change the overall boundaries of the subplots on the figure, and the spacing between the subplots
      plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
      or my_page.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
      - hspace/wspace is the amount of height/width reserved between the subplots, as a fraction of the average subplot height/width
        
        hspace=0.1 is enough for just displaying the ticks and the labels, without the axis name
        
        use hspace=0 to stick the plots together vertically
        
        do not forget to disable the ticks where there is no space to plot them: my_plot.set_xticks([])
      - my_page.subplots_adjust(right=0.75) will leave 25% on the right of the page for adding a legend outside of a plot
    - You can also resize an existing (sub)plot the following way:
      1. Get the current size information: pl_x_bottomleft, pl_y_bottomleft, pl_width, pl_height = my_plot.get_position().bounds
      2. Set the new size: e.g. reduce the height with my_plot.set_position( (pl_x_bottomleft, pl_y_bottomleft, pl_width, pl_height * 0.5) )
  - Subplots, axes and figures gallery
  - Customizing Figure Layouts Using GridSpec and Other Functions, constrained layout and tight layout
4. use my_page.savefig(...) to save a figure
  - savefig(…) must be called before plt.show()!
  - my_page.savefig('my_plot.pdf'): save the figure to a pdf file
  - my_page.savefig('my_plot.png', dpi=200, transparent=True, bbox_inches='tight'): save the figure to a png file at a higher resolution than the default (default is 100 dots per inch), with a transparent background and no extra space around the figure
5. display the figure and its plots, and start interacting (zooming, panning…) with them:
  plt.show()
6. it may be hard to (remember how to) work with colors and colorbars. Some examples from the matplotlib Gallery can help you!
  Note: A reversed version of each colormap is available by appending _r to the name, e.g., viridis_r
  - leftventricle_bulleye.py: associating different types of colormaps to a plot and colorbar
  - colorbar_only.py: the different types of colorbars (or plotting only a colorbar)
  - colormaps_reference.py: pre-defined colormaps
  - named_colors.py: named colors
  - More details about colors and colorbars below, in the Useful matplotlib reference pages section and the Graphics related resources section
7. if you don't see a part of what you have plotted, maybe it's hidden behind other elements! Use the zorder parameter to explicitly specify the plotting order/layers/depth
  - things should automatically work as expected if zorder is not explicitly specified
  - Use the zorder=NN parameter when creating objects. NN is a number (it can be negative or a float), and objects with a higher zorder value are plotted above (closer to the eye than) objects with a lower zorder value
  - Use matplotlib_object.set_zorder(NN) to change the order after an object has been created
8. you can use transparency to partially show what is behind some markers or other objects. Many artists accept the alpha parameter where 0.0 means that the object is completely transparent, and 1.0 means completely opaque
  e.g. my_plot.scatter(…, alpha=0.7)
9. sometimes the results of the python/matplotlib commands are displayed immediately, sometimes not. It depends on whether you are in interactive or non-interactive mode
10. if your matplotlib script is executed in a batch job (with no screen available), it will generate an error when trying to create (show()) a plot, because by default matplotlib expects to be able to display the figure on a screen.
  - Check how you can generate images offline
11. the documentation may mention backends. What?? Basically, you use python commands to create a plot, and the backend is the thing that will render your plot on the screen or in a file (png, pdf, etc…)
Read the Matplotlib tutorial by Nicolas Rougier
Download the pdf version of the manual. Do not print the 2300+ pages of the manual! Read the beginner's guide (Chapter FIVE of Part II) and have a super quick look at the table of contents of the whole document.
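The hints above can be combined into a short object oriented script. This is a minimal sketch with made-up sine/cosine data. It selects the non-interactive Agg backend before importing pyplot, so that it also works in a batch job, and it saves the figure instead of calling plt.show(). The file name and the layout values are arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')   # non-interactive backend: no screen needed (batch jobs)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# One figure ("page") in portrait A4 size, with 3 plots on 1 column sharing the x axis
my_page, plot_array = plt.subplots(3, 1, figsize=(8.3, 11.7), sharex=True)
top_plot, middle_plot, bottom_plot = plot_array

top_plot.plot(x, np.sin(x), label='sin')
middle_plot.plot(x, np.cos(x), color='tab:orange', label='cos')
# Semi-transparent markers drawn above (higher zorder) a horizontal reference line
bottom_plot.axhline(0.0, color='gray', zorder=1)
bottom_plot.scatter(x[::5], np.sin(x[::5]) * np.cos(x[::5]), alpha=0.7, zorder=3)

# Stick the plots together vertically, and keep 25% on the right for the legends
my_page.subplots_adjust(hspace=0, right=0.75)
for a_plot in (top_plot, middle_plot):
    a_plot.legend(bbox_to_anchor=(1.05, 1.0), loc='upper left')

# Reduce the height of the bottom plot, keeping its bottom left corner in place
x0, y0, width, height = bottom_plot.get_position().bounds
bottom_plot.set_position((x0, y0, width, height * 0.5))

# Save the figure (in interactive sessions, call savefig before plt.show())
png_file = os.path.join(tempfile.mkdtemp(), 'my_plot.png')
my_page.savefig(png_file, dpi=200, bbox_inches='tight')
plt.close(my_page)
```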

Useful matplotlib reference pages

Some plot types:
- plot(...): Plot y versus x as lines and/or markers
- scatter(...): A scatter plot of y vs x with varying marker size and/or color
- The plot function will be faster for scatterplots where markers don't vary in size or color
- contour(...) and contourf(...): draw contour lines and filled contours
X and Y axes parameters
- Axis range: my_plot.set_xlim(x_leftmost_value, x_rightmost_value)
  - Use the leftmost and rightmost values to specify the orientation of the axis (i.e. the rightmost value can be smaller than the leftmost one)
- Axis label: my_plot.set_xlabel(x_label_string, fontsize=axis_label_fontsize)
  - Use the extra labelpad parameter to move the label closer to the axis (negative value) or farther from it (positive value): e.g. my_plot.set_xlabel('A closer label', labelpad=-20)
- Major (and minor) tick marks location: my_plot.set_xticks(x_ticks_values, minor=False)
  - Use an empty list if you don't want tick marks: my_plot.set_xticks([])
- Tick labels (if you don't want the default values): my_plot.set_xticklabels(x_ticks_labels, minor=False, fontsize=ticklabels_fontsize)
  - x_ticks_labels is a list of strings that has the same length as x_ticks_values. Use an empty string in the positions where you don't want a label
  - Many more options for ticks, labels, orientation, …
line parameters
- linestyle: solid, None, other (default styles example, custom styles example)
marker types
- Default scatter marker size (s) and marker edge width (linewidths):
  - mpl.rcParams['lines.markersize'] ** 2 ⇒ 36
  - mpl.rcParams['lines.linewidth'] ⇒ 1.5
- Other marker attributes. For plot, all the markers have the same attributes, and for scatter the attributes can be the same, or specified for each marker
  - plot(...): fmt (see documentation) or marker and markerfacecolor/mfc (and markerfacecoloralt/mfcalt for dual color markers), markersize, markeredgewidth/mew, markeredgecolor (use markeredgecolor='none' if you don't want to plot the edge of the markers), fillstyle (full, None, other)
  - scatter(...): marker (marker type), c (color), s (size), linewidths (linewidth of the marker edges), edgecolors
colors and colormaps
- color demo
- named colors
- Reversing the colors: add _r at the end of the colormap name
- Number of colors in the my_cmap colormap (usually 256): my_cmap.N
  - Accessing the RGBA color definition by index, from 0 to my_cmap.N - 1. Note that the index will saturate below 0 and above my_cmap.N - 1, as long as the under/over colors have not been changed with set_under/set_over
```
>>> my_cmap.N
256
>>> my_cmap(-1) # Same as my_cmap(0)
(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0)
>>> my_cmap(0)
(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0)
>>> my_cmap(1)
(0.36186082276047676, 0.3185697808535179, 0.6394463667820068, 1.0)
>>> my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
>>> my_cmap(256) # Same as my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
>>> my_cmap(257) # Same as my_cmap(255)
(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0)
```
- Special colormap colors
  - my_cmap.set_bad(color='k'): color to be used for masked values
  - my_cmap.set_over(color='k'): color to be used for high out-of-range values if extend is specified and is 'both' or 'max'. Default color is my_cmap(my_cmap.N - 1)
  - my_cmap.set_under(color='k'): color to be used for low out-of-range values if extend is specified and is 'both' or 'min'. Default color is my_cmap(0)
colorbar
- Placing colorbars demo
- contourf + colorbar demo
text(...) and annotations
- Some titles:
  - Figure title: my_figure.suptitle('Figure title', x=xloc_in_normalized_coordinates, y=yloc_in_normalized_coordinates, …)
  - Axis Labels, title, and legend: my_plot.set_title('Plot title', …)
- fontsize: size in points, or (better!) string specifying a relative size (xx-small, x-small, small, medium, large, x-large, xx-large)
- all the text properties
legend(...) (legend demo, advanced legend guide)
- The legend will show the lines (or other objects) that were associated with a label with the label= keyword when creating/updating a plot
  - If there are some elements of a plot that you do not want to associate with a legend (e.g. there are several lines with the same color and markers, but you want to plot the legend only once), do not specify a label= keyword for these elements, or add a _ at the front of the label strings
- The legend is positioned somewhere (that can be specified) inside the plot. In order to place a legend outside the plot, use the bbox_to_anchor parameter
  - the parameters of bbox_to_anchor are in normalized coordinates of the current (sub)plot:
    - (0, 0) is the lower left corner of the plot, and (1, 1) the upper right corner
    - legend(… bbox_to_anchor=(1.05, 1.), loc='upper left', …) will put the upper left corner of the legend slightly right ((1.05, 1.)) of the upper right corner ((1, 1)) of the plot
  - if the legend is outside of the plot, you have to explicitly provide enough space for the legend on the page
    - e.g. with subplots_adjust, plt.subplots_adjust(right=0.75) will make all the plots use 75% on the left of the page, and leave 25% on the right for the legend
The figure(...) and the associated methods
The axes and the associated methods
matplotlib default config/settings can be queried and updated
- example: the default figure size (inches) is mpl.rcParams['figure.figsize'] ([6.4, 4.8])
- current settings' file: mpl.matplotlib_fname()
Animations (demo)
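Here is a minimal sketch that gathers several of these settings on a hypothetical analytic field:
- colormap special colors
- contourf + colorbar with extend
- axis range, tick marks and tick labels
- titles

The colormap is copied with copy.copy before being modified, so that the registered colormap is not changed. viridis_r, the levels and the labels are arbitrary choices.

```python
import copy
import os
import tempfile

import matplotlib
matplotlib.use('Agg')   # non-interactive backend, no screen needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical field on a regular grid, with values out of the levels range and a masked disk
x = np.linspace(-3.0, 3.0, 61)
y = np.linspace(-2.0, 2.0, 41)
xx, yy = np.meshgrid(x, y)
r2 = xx**2 + yy**2
field = np.ma.masked_where(r2 < 0.25, 3.0 * np.exp(-r2) - 1.0)

# Copy a registered colormap before changing its special colors
my_cmap = copy.copy(plt.get_cmap('viridis_r'))
my_cmap.set_under('white')   # values below the first level
my_cmap.set_over('red')      # values above the last level
my_cmap.set_bad('gray')      # masked values (used e.g. by pcolormesh and imshow)

fig, my_plot = plt.subplots()
levels = np.linspace(0.0, 1.5, 7)
filled = my_plot.contourf(xx, yy, field, levels=levels, cmap=my_cmap, extend='both')
colorbar = fig.colorbar(filled, ax=my_plot)
colorbar.set_label('Hypothetical field')

# Axis range (reversed), label, tick marks and tick labels
my_plot.set_xlim(3.0, -3.0)
my_plot.set_xlabel('x', labelpad=-2)
my_plot.set_xticks([-2, 0, 2])
my_plot.set_xticklabels(['-2', '', '2'])   # empty string: no label at 0

# Plot and figure titles, with relative font sizes
my_plot.set_title('Plot title', fontsize='large')
fig.suptitle('Figure title', fontsize='x-large')

out_file = os.path.join(tempfile.mkdtemp(), 'colormap_demo.png')
fig.savefig(out_file)
plt.close(fig)
```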

Misc Matplotlib tricks

Specifying the background color of a plot (e.g. when plotting a masked variable and you don't want the masked areas to be white)
- # make the background dark gray (call this before the contourf)
  plt.gca().patch.set_color('.25')
  plt.contourf(d)
  plt.show()
- trick source
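Here is the same trick written in an object oriented way. It uses hypothetical random data in which the values below 0.3 are masked. Axes.set_facecolor sets the color that shows through wherever contourf draws nothing.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')   # non-interactive backend, no screen needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical field where the values below 0.3 are masked (e.g. land points)
rng = np.random.default_rng(1)
d = np.ma.masked_less(rng.random((20, 30)), 0.3)

fig, my_plot = plt.subplots()
my_plot.set_facecolor('0.25')   # dark gray, visible where contourf draws nothing
my_plot.contourf(d)
out_file = os.path.join(tempfile.mkdtemp(), 'masked_background.png')
fig.savefig(out_file)
plt.close(fig)
```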

Graphics related resources

Ten Simple Rules for Better Figures
Top 50 matplotlib Visualizations
Seaborn is a library for making attractive and informative statistical graphics in Python, built on top of matplotlib
- See also: Python Seaborn Tutorial For Beginners
Working with colors
- Choosing colormaps
- Beautiful colormaps for oceanography: cmocean
- ColorBrewer 2.0 is a tool that can help you understand, and experiment with sequential, diverging and qualitative colormaps

Basemap

Basemap is going to be slowly phased out, in favor of cartopy
More information in this:

Summary: Basemap is an extension of Matplotlib that you can use for plotting maps, using different projections

Where: Basemap web site

Help on stack overflow: basemap help

How to use basemap?

look at the examples
check the different projections
read some documentation!
1. the really nice basemap tutorial seems much better than the official documentation below
2. look at the detailed official documentation

Cartopy + Iris

Summary: Cartopy is a Python package for advanced map generation with a simple matplotlib interface and Iris is a Python package for analysing and visualising meteorological and oceanographic data sets

Where: Cartopy and Iris web sites

Examples:

Help on stack overflow: cartopy help

Maps and projections resources

About projections

Libraries

3D resources

Data file formats

We list here some resources about non-NetCDF data formats that can be useful

json files

More and more applications use json files as configuration files, or as a means of exchanging data through text files (serialization/deserialization).

json files look basically like a list of (nested) python dictionaries that would have been dumped to a text file

json module documentation
Working With JSON Data in Python tutorial
example script: /home/users/jypeter/CDAT/Progs/Devel/beaugendre/nc2json.py
A compact (not easy to read…) json file can be pretty-printed with
cat file.json | python -m json.tool | less
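Here is a minimal sketch of serialization/deserialization with the json module, using a hypothetical configuration dictionary. json.dumps/json.loads work with strings, and json.dump/json.load work with files:

```python
import json
import os
import tempfile

# A hypothetical configuration, as nested python dictionaries and lists
config = {
    'experiment': 'piControl',
    'variables': ['tas', 'pr'],
    'period': {'start': 1850, 'end': 1999},
}

# Strings: serialization (dumps) and deserialization (loads)
compact = json.dumps(config)
pretty = json.dumps(config, indent=4, sort_keys=True)   # easier to read
restored = json.loads(compact)

# Files: same thing with dump and load
path = os.path.join(tempfile.mkdtemp(), 'config.json')
with open(path, 'w') as json_file:
    json.dump(config, json_file, indent=4)
with open(path) as json_file:
    from_file = json.load(json_file)
```

JSON has no tuple type: a python tuple is written as a list, and you get a list back when you read the file.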

LiPD files

Resources for Linked PaleoData:

LiPD
Technical note: The Linked Paleo Data framework – a common tongue for paleoclimatology @ GMD
LiPD-utilities @ github

BagIt files

BagIt, a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content.

Pandas

Summary: pandas is a library providing high-performance, easy-to-use data structures and data analysis tools

Where: Pandas web site

JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is very convenient for processing tables in xlsx files (or csv, etc…). You should at least have a quick look at:

Some Cheat Sheets (in the following order):
1. Basics: Pandas basics (associated with the Pandas Cheat Sheet for Data Science in Python pandas introduction page)
2. Intermediate: github Pandas doc page
3. Advanced: the cheat sheet on the Enthought workshops advertising page
Some tutorials:
- Pandas Cheat Sheet for Data Science in Python pandas introduction page
- The Statistics in Python tutorial that combines Pandas, Statsmodels and Seaborn
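Here is a minimal sketch of the kind of time series processing that pandas makes easy. It assumes a tiny hypothetical csv table of daily values. read_csv parses the dates and uses them as the index, and resample computes monthly means without any custom code.

```python
import io

import pandas as pd

# Hypothetical daily values; with a real file, use pd.read_csv('file.csv', ...)
# (or pd.read_excel for xlsx files)
csv_text = """date,tas
2000-01-01,1.0
2000-01-02,3.0
2000-01-31,5.0
2000-02-01,2.0
2000-02-15,6.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

monthly_mean = df['tas'].resample('MS').mean()   # 'MS': one value per month, labelled by month start
summary = df['tas'].describe()                   # count, mean, std, min, quartiles, max
```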

Scipy Lecture Notes

Summary: One document to learn numerics, science, and data with Python

Where: pdf - html

This is a really nice and useful document that is regularly updated and used for the EuroScipy tutorials. You will learn more things about python, numpy and matplotlib, debugging and optimizing scripts, and also learn about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc…

Quick Reference and cheat sheets

The nice and convenient Python 2.7 Quick Reference: pdf - html
- A possibly more up-to-date version

Python 3 Quick reference and Cheat sheet

Jupyter Notebook Keyboard Shortcuts

Misc tutorials

PyFormat: With this site we try to show you the most common use-cases covered by the old and new style string formatting API with practical examples
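For example, the old (%) style, the new (str.format) style and, in Python 3.6 and later, f-strings can all produce the same string. The values below are arbitrary:

```python
experiment = 'piControl'
year = 5
value = 287.123456

old_style = '%s_%04d: %.2f K' % (experiment, year, value)
new_style = '{}_{:04d}: {:.2f} K'.format(experiment, year, value)
f_string = f'{experiment}_{year:04d}: {value:.2f} K'   # Python 3.6+ only

# Alignment in fixed-width columns: < left, > right
table_row = '{:<10}|{:>8.1f}'.format('tas', value)
```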

Some good coding tips

The official Style Guide for Python Code (aka PEP 0008)

A Pragmatic Quick Reference

Debugging your code

There is only so much you can do by staring at your code in your favorite text editor, and adding print lines in your code (or using logging instead of print). The next step is to use the python debugger!
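Here is a minimal sketch of using logging instead of print, with a hypothetical scale function. Each message has a level, so you can show or hide the debug messages by changing one line (level=logging.DEBUG) instead of editing all your print lines.

```python
import logging

# Configure once, at the beginning of the script: minimum level and message format
logging.basicConfig(level=logging.INFO, format='%(levelname)s %(name)s: %(message)s')
logger = logging.getLogger('my_script')   # hypothetical script name


def scale(values, factor):
    """Hypothetical processing step that logs what it does."""
    logger.debug('scale called with %d values', len(values))   # hidden at INFO level
    result = [v * factor for v in values]
    logger.info('max value after scaling: %s', max(result))
    return result


scaled = scale([1.0, 2.0, 3.0], 2.0)
```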

Debugging in text mode

Start the script with: python -m pdb my_script.py
The debugger stops before the first line of the script is executed
Type continue (or c) to execute the script to the end, or till the first breakpoint or error is reached
Use where (or w) to check the call stack that led to the current stop. Use up and down to navigate through the call stack and examine the values of the functions' parameters
Type break NNN to stop at line NNN
Use type(var) and p var (the debugger's print command) to check the type and values of variables. You can also change the variables' values on the fly!
Type run (or restart) to restart the script
Use next and step to execute some parts of the script line by line. If a code line calls a function:
- next (or n) will execute a function and stop on the next line
- step (or s) will stop at the first line inside the function
Check the debugger commands for details, or type help in the debugger for using the built-in help

Using pydebug

Depending on the distribution, the editor and the programming environment you use, you may have access to a graphical version of the debugger. UV-CDAT users can use pydebug my_script.py

Using a Python IDE

IDE = Integrated Development Environment

There are lots of ways to use Python and develop scripts, from using a lightweight approach (your favorite text editor with builtin python syntax highlighting, e.g. emacs and python -i myscript.py) to a full-fledged IDE. You'll find below some IDE related links

Spyder

Improving the performance of your code

You can already get a very efficient script by checking the following:

make sure that your script is not using too much memory (the amount depends on the computer you are using)! Your script should be scalable (i.e. it should keep on working even when your data gets bigger), so it's a good idea to load only the data you need in memory (e.g. not all the time steps), and learn how to load chunks of data

make sure that you are using array/vector syntax and masks, instead of using explicit loops and tests. The numpy documentation is big, because there are lots of optimized functions to help you! If you are stuck, ask JY or somebody else who is used to numpy.
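Here is a minimal sketch of loading data in chunks. It assumes a hypothetical (time, lat, lon) float32 array stored in a raw binary file and read through np.memmap. Only chunk_size time steps are accessed at once, and the sum is accumulated in double precision. With a NetCDF file, slicing a netCDF4 variable (var[start:stop]) plays the same role.

```python
import os
import tempfile

import numpy as np

# Hypothetical (time, lat, lon) float32 field stored in a raw binary file
# (small enough here to be created in one go)
nt, ny, nx = 120, 45, 90
path = os.path.join(tempfile.mkdtemp(), 'big_field.dat')
np.arange(nt * ny * nx, dtype='float32').reshape(nt, ny, nx).tofile(path)


def time_mean_in_chunks(file_name, shape, chunk_size=12):
    """Mean over the first (time) axis, accessing at most chunk_size time steps at once."""
    data = np.memmap(file_name, dtype='float32', mode='r', shape=shape)
    total = np.zeros(shape[1:], dtype='float64')      # accumulate in double precision
    for start in range(0, shape[0], chunk_size):
        chunk = data[start:start + chunk_size]         # read from disk only when used
        total += chunk.sum(axis=0, dtype='float64')
    return total / shape[0]


mean_field = time_mean_in_chunks(path, (nt, ny, nx))
```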

If your script is still not fast enough, there is a lot you can do to improve it without resorting to parallelization (which may introduce extra bugs rather than extra performance). See the sections below

Hint: before optimizing your script, you should spend some time profiling it, so that you only spend time improving the slow parts of your script
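Here is a minimal profiling sketch with the standard cProfile and pstats modules. slow_sum and fast_sum are hypothetical functions that compare a python loop with array syntax. For a whole script, python -m cProfile -s cumulative my_script.py gives a similar report.

```python
import cProfile
import io
import pstats

import numpy as np


def slow_sum(n):
    """Hypothetical slow part: an explicit python loop."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


def fast_sum(n):
    """Same result with array syntax."""
    return float((np.arange(n) * 0.5).sum())


def work():
    return slow_sum(200000), fast_sum(200000)


profiler = cProfile.Profile()
profiler.enable()
results = work()
profiler.disable()

# Sort by cumulative time and keep the 10 most expensive functions
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
```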

Useful packages

Numexpr: Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like “3*a+4*b”) are accelerated and use less memory than doing the same calculation in Python.
PyTables: PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data

Tutorials by Ian Osvald

Python 2.7 vs Python 3

The official Porting Python 2 Code to Python 3 page gives the required information to make the transition from python 2 to python 3. Python 2.7 reached its end of life on January 1, 2020 and no longer receives bug or security fixes, so new scripts should be written for Python 3.
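Here are a few of the differences you are most likely to meet when porting, shown with Python 3 semantics. This is a sketch, not a complete list. In Python 2.7, adding from __future__ import division, print_function at the top of a script gives the same division and print behavior as Python 3.

```python
# Python 3 semantics (a sketch of some common porting issues)
half = 7 / 2                     # true division: 3.5 (Python 2.7 gives 3 without the __future__ import)
floor_half = 7 // 2              # floor division: 3 in both versions
text = 'caf\u00e9'               # str is unicode in Python 3 (Python 2 needs u'caf\u00e9')
encoded = text.encode('utf-8')   # bytes, e.g. for binary files or network data
print('half =', half, 'floor =', floor_half)   # print is a function
keys = list({'a': 1}.keys())     # keys() returns a view in Python 3, not a list
```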

What now?

You can do a lot more with python! But if you have read at least a part of this page, you should be able to find and use the modules you need. Make sure you do not reinvent the wheel! Use existing packages when possible, and make sure to report bugs or errors in the documentation when you find any

