This is an old revision of the document!


Following JYP's recommended steps

As can be expected, there is a lot of online Python documentation available, and it's easy to get lost. You can always use Google to find an answer to your problem, and you will probably end up looking at lots of answers on Stack Overflow or a similar site. But it's always better to know where you can find some good documentation… and to spend some time reading it

This page lists some Python resources for scientists, in a suggested reading order. Do not print anything (or at least not everything), but it's a good idea to download all the pdf files to the same place, so that you can easily open and search the documents

The official python documentation

You do not need to read all the Python documentation at this step, but it is really well made and you should at least have a look at it. The Tutorial is a good place to start, and you should have a look at the table of contents of the Python Standard Library. There is a lot in the standard library that can make your life easier

Python 2.7

Python 3
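
As a small illustration of what the standard library already offers, here is a minimal sketch that counts words with collections.Counter and does date arithmetic with datetime. The sentence and the dates are made up for the example; they are not taken from any real data.

```python
from collections import Counter
from datetime import date, timedelta

# Count how often each word appears (the sentence is an arbitrary example)
words = "the model ran and the model stopped the run".split()
counts = Counter(words)
most_common_word, occurrences = counts.most_common(1)[0]

# Date arithmetic without any manual calendar handling
start = date(2016, 1, 22)
end = start + timedelta(days=30)

print(most_common_word, occurrences)
print(end.isoformat())
```

Both modules are listed in the table of contents of the Python Standard Library, which is why browsing that table is worth the time.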

Numpy and Scipy

Summary: Python provides ordered objects (e.g. lists, strings, basic arrays, …) and some math operators, but you can't do really heavy computation with these. Numpy makes it possible to work with multi-dimensional data arrays. Using array syntax and masks (instead of explicit nested loops and tests) together with the appropriate numpy functions will allow you to get performance similar to what you would get with a compiled program! Scipy adds more scientific functions
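
To make the "array syntax and masks" idea concrete, here is a minimal sketch that compares an explicit nested loop with the equivalent numpy expression. The temperature values and the variable names are hypothetical, chosen only for the illustration.

```python
import numpy as np

# Hypothetical temperatures in kelvin, on a small (time, lat) grid
temps = np.array([[271.0, 280.5, 290.2],
                  [268.3, 275.1, 301.7]])

# Loop version: explicit nested loops and a test
count_loop = 0
for row in temps:
    for value in row:
        if value > 273.15:
            count_loop += 1

# Array version: the comparison builds a boolean mask in one step
above_freezing = temps > 273.15
count_array = int(above_freezing.sum())

# The mask can also select or replace values without any loop
mean_above = temps[above_freezing].mean()
celsius = np.where(above_freezing, temps - 273.15, np.nan)

print(count_loop, count_array)
```

On an array this small the two versions take about the same time; the gain from the array version shows up on large arrays, where the loop runs inside numpy's compiled code instead of the Python interpreter.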

How to get started?

  1. always remember that indices start at 0 and that the last element of an array is at index -1! Learn about indexing and slicing by manipulating a string (try 'This document by JY is awesome!'[::-1])
  2. have a quick look at the full documentation to know where things are
    1. Numpy User Guide
    2. Numpy Reference Guide
    3. Scipy Reference Guide
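
The indexing and slicing advice in step 1 can be tried directly on the suggested string. This is only a small sketch; the same syntax applies to lists and to numpy arrays.

```python
text = 'This document by JY is awesome!'

first_char = text[0]        # indices start at 0
last_char = text[-1]        # -1 is the last element
first_word = text[:4]       # the stop index is excluded
last_word = text[-8:]       # negative indices count from the end
reversed_text = text[::-1]  # a negative step walks backwards

print(first_char, last_char, first_word, last_word)
print(reversed_text)
```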

cdms2 and netcdf4

There is a good chance that your input array data will come from a file in the NetCDF format. Depending on which python distribution you are using, you can use the cdms2 or netCDF4 modules to read the data.

Note: the NetCDF file format is self-documented, and the metadata of climate data files often follows the CF (Climate and Forecast) Metadata Conventions

cdms2

Summary: cdms2 can read/write netCDF files (and read grads dat+ctl files) and provides a higher level interface than netCDF4. Unfortunately, cdms2 is only available in the UV-CDAT distribution, and in distributions where somebody has installed some version of cdat-lite. When you can use cdms2, you also have access to cdtime, which is very useful for handling time axis data.

How to get started:

  1. read JYP's cdms tutorial, starting at page 54
    1. the tutorial is in French (soooorry!)
    2. you have to replace cdms with cdms2, and MV with MV2 (sooorry about that, the tutorial was written when CDAT was based on Numeric instead of numpy to handle array data)
  2. ask questions and get answers on the UV-CDAT askbot

netCDF4

Summary: netCDF4 can read/write netCDF files and is available in most python distributions

Where: http://unidata.github.io/netcdf4-python/

Matplotlib

Summary: there are lots of python libraries that you can use for plotting, but Matplotlib has become a de facto standard

Where: Matplotlib web site

The documentation is good, but not always easy to use. A good way to start with matplotlib is to:

  1. Look at the matplotlib gallery to get an idea of all you can do with matplotlib. Later, when you need to plot something, come back to the gallery to find some examples that are close to what you need and click on them to get the sources
  2. Use the free hints provided by JY!
    1. a Matplotlib Figure is a graphical window in which you make your plots…
    2. a Matplotlib Axes is a plot inside a Figure… More details
    3. some examples are more pythonic (i.e. object oriented) than others, and some examples mix different styles of coding. All this can be confusing. Try to use an object oriented way of doing things!
    4. sometimes the results of the python/matplotlib commands are shown directly, sometimes not. It depends on whether you are in interactive or non-interactive mode
    5. the documentation may mention backends. What?? Basically, you use python commands to create a plot, and the backend is the thing that will render your plot on the screen or in a file (png, pdf, etc…)
  3. Download the pdf version of the manual. Do not print the 2800+ pages of the manual! Read the beginner's guide (Chapter FIVE of Part II) and have a super quick look at the table of contents of the whole document.
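
The hints above (Figure, plot inside a Figure, object oriented style, backends) can be sketched in a few lines. This is only an illustration, assuming matplotlib is installed: the data, labels and output file name are made up, and the Agg backend is chosen here so that the script renders to a file without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

# Made-up data for the illustration
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()   # one Figure containing one Axes (the plot)
ax.plot(x, y, marker="o", label="y = x**2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object oriented Matplotlib sketch")
ax.legend()

# Hypothetical output file in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "sketch.png")
fig.savefig(out_path)      # the backend writes the png file
plt.close(fig)

print(out_path)
```

All the plotting calls go through the fig and ax objects, which is the object oriented style recommended above; the file extension given to savefig selects the output format.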

Basemap

Summary: Basemap is an extension of Matplotlib that you can use for plotting maps, using different projections

Where: Basemap web site

How to use basemap?

  1. look at the examples

Scipy Lecture Notes

Summary: One document to learn numerics, science, and data with Python

Where: pdf - html

This is a really nice document that is regularly updated and used for the EuroScipy tutorials

Quick Reference

  • The nice and convenient Python 2.7 Quick Reference: pdf - html

Some good coding tips

Improving the performance of your code

You can already get a very efficient script by checking the following:

  • make sure that your script is not using too much memory (the amount depends on the computer you are using)! Your script should be scalable (e.g. keep working even when your data gets bigger), so it's a good idea to load only the data you need in memory (e.g. not all the time steps), and learn how to load chunks of data
  • make sure that you are using array/vector syntax and masks, instead of using explicit loops and tests. The numpy documentation is big, because there are lots of optimized functions to help you! If you are stuck, ask JY or somebody else who is used to numpy.
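
The "load chunks of data" advice can be sketched as follows. This is an illustration under stated assumptions: the array named data stands in for a variable stored in a file, and the helper iter_time_chunks is hypothetical, not part of numpy or any other library. With a real file, only the slice inside the loop would be read from disk.

```python
import numpy as np

def iter_time_chunks(n_times, chunk_size):
    """Yield (start, stop) index pairs covering range(n_times)."""
    for start in range(0, n_times, chunk_size):
        yield start, min(start + chunk_size, n_times)

# Stand-in for a (time, lat, lon) variable stored in a file
data = np.arange(10 * 2 * 3, dtype=float).reshape(10, 2, 3)

# Accumulate the sum over time, a few time steps at a time
total = np.zeros(data.shape[1:])
for start, stop in iter_time_chunks(data.shape[0], chunk_size=4):
    chunk = data[start:stop]     # only this slice needs to be in memory
    total += chunk.sum(axis=0)   # array syntax, no loop over lat/lon

time_mean = total / data.shape[0]
print(time_mean)
```

The memory needed depends on chunk_size rather than on the total number of time steps, so the same script keeps working when the data gets bigger.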

If your script is still not fast enough, there is a lot you can do to improve it, without resorting to parallelization (that may introduce extra bugs rather than extra performance). See the sections below

Hint: before optimizing your script, you should spend some time profiling it, in order to only spend time improving the slow parts of your script

Tutorials by Ian Ozsvald
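
One way to follow this hint is the cProfile module of the standard library. The sketch below is written for Python 3, and the functions slow_sum_of_squares and main are hypothetical stand-ins for the real work of a script.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately slow: an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    return slow_sum_of_squares(200000)

# Record timings only while main() runs
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

A whole script can also be profiled without modifying it, with `python -m cProfile -s cumulative myscript.py` (myscript.py being a placeholder name).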

Python 2.7 vs Python 3

The official Porting Python 2 Code to Python 3 page gives the required information to make the transition from python 2 to python 3. It is still safe to use Python 2.7, so there is no rush to change to Python 3.
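
While both versions are in use, a script can be written so that it behaves the same on Python 2.7 and Python 3. Here is a minimal sketch covering two of the most visible differences (division and print); the helper names mean and half_index are hypothetical, and the official porting page remains the reference for everything else.

```python
import sys

def mean(values):
    """Arithmetic mean that behaves the same on Python 2.7 and Python 3."""
    # On Python 2, int / int truncates; float() forces true division
    return sum(values) / float(len(values))

def half_index(n_items):
    """Middle index of a sequence; // is integer division on both versions."""
    return n_items // 2

# print with a single parenthesised argument works on both versions
print("Running Python %d.%d" % sys.version_info[:2])
print("mean = %.1f" % mean([1, 2, 4]))
```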






other/python/jyp_steps.1453462808.txt.gz · Last modified: 2016/01/22 11:40 by jypeter