Differences

This shows you the differences between two versions of the page.

other:python:jyp_steps [2016/01/29 15:41]
jypeter Added the numpy for matlab users references
other:python:jyp_steps [2017/08/30 09:57]
jypeter [Graphics related resources] Added link to Seaborn tutorial
Line 1: Line 1:
 ====== JYP's recommended steps for learning python ======
 +
 +<note tip>If you don't know which python distribution to use or how to start the python interpreter, you should first read the [[starting|Working with Python]] page</note>
  
 As can be expected, there is **a lot** of online python documentation available, and it's easy to get lost. You can always use Google to find an answer to your problem, and you will probably end up looking at lots of answers on [[http://stackoverflow.com/questions/tagged/python|Stack Overflow]] or a similar site. But it's always better to know where you can find some good documentation... and to spend some time reading it
Line 11: Line 13:
 You can start using python by reading the {{:other:python:python_intro_ipsl_oct2013_v2.pdf|Bien démarrer avec python}} tutorial that was used during a 2013 IPSL python class:
   * this tutorial is in French (my apologies for the lack of translation, but it should be easy to understand)
 +    * If you have too much trouble understanding this French tutorial, you can read the first 6 chapters of the **Tutorial** in [[#the_official_python_documentation|the official Python documentation]] and chapters 1.2.1 to 1.2.5 in the [[#scipy_lecture_notes|Scipy Lecture Notes]]. Once you have read these, you can try to read the French tutorial again
   * it's an introduction to python (and programming) for the climate scientist: after reading this tutorial, you should be able to do most of the things you usually do in a shell script
     * python types, tests, loops, reading a text file
Line 49: Line 52:
 Where: [[http://docs.scipy.org/doc/|html and pdf documentation]]
  
-How to get started?
 +==== Getting started ====
   - always remember that indices start at ''0'' and that the last element of an array is at index ''-1''!\\ First learn about //indexing// and //slicing// by manipulating strings, as shown in [[#part1|Part 1]] above (try '''This document by JY is awesome!'[::-1]'' and '''This document by JY is awesome!'[slice(None, None, -1)]'') 8-)
   - if you are a Matlab user (but the references are interesting for others as well), you can read the following:
Line 59: Line 63:
     - Numpy Reference Guide
     - Scipy Reference Guide
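
The indexing and slicing rules mentioned in the //Getting started// list can be checked on a plain string before moving on to numpy arrays. This is only a minimal sketch: the variable names (''s'', ''word'', etc.) are illustrative and not part of the original page.

```python
# Indexing and slicing on a string (the same rules apply to lists and numpy arrays)
s = 'This document by JY is awesome!'

first = s[0]       # indices start at 0
last = s[-1]       # index -1 is the last element
word = s[5:13]     # from index 5 up to (but not including) index 13

backwards = s[::-1]                 # a step of -1 walks the string backwards
same = s[slice(None, None, -1)]     # explicit slice object, equivalent to [::-1]

print(first)
print(last)
print(word)
print(backwards)
print(backwards == same)
```

The ''slice(None, None, -1)'' form is handy when the slice has to be built in a script and stored in a variable before being used.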
 +
 +==== Beware of the array view side effects ====
 +
 +<note warning>When you take a slice of an array, you get a **//View//**: an array that has a new shape but that still shares its data with the first array.
 +
 +That is not a problem when you only read the values, but **if you change the values of the //View//, you change the values of the first array** (and vice-versa)! If that is not what you want, do not forget to **make a copy** of the data before working on it!
 +
 +//Views// are a good thing most of the time, so only make a copy of your data when needed: needlessly copying a big array is just a waste of CPU time and computer memory. Anyway, it is always better to understand what you are doing... :-P
 +
 +Check the example below and the [[https://docs.scipy.org/doc/numpy-dev/user/quickstart.html#copies-and-views|copies and views]] part of the quickstart tutorial.
 +
 +<code python>
 +>>> import numpy as np
 +>>> a = np.arange(30).reshape((3,10))
 +>>> a
 +array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
 +       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 +       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
 +
 +>>> b = a[1, :]
 +>>> b
 +array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
 +
 +>>> b[3:7] = 0
 +>>> b
 +array([10, 11, 12,  0,  0,  0,  0, 17, 18, 19])
 +
 +>>> a
 +array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
 +       [10, 11, 12,  0,  0,  0,  0, 17, 18, 19],
 +       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
 +
 +>>> a[:, 2:4] = -1
 +>>> a
 +array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
 +       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
 +       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
 +
 +>>> b
 +array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 +
 +>>> c = a[1, :].copy()
 +>>> c
 +array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 +
 +>>> c[:] = 9
 +>>> c
 +array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
 +
 +>>> b
 +array([10, 11, -1, -1,  0,  0,  0, 17, 18, 19])
 +
 +>>> a
 +array([[ 0,  1, -1, -1,  4,  5,  6,  7,  8,  9],
 +       [10, 11, -1, -1,  0,  0,  0, 17, 18, 19],
 +       [20, 21, -1, -1, 24, 25, 26, 27, 28, 29]])
 +</code></note>
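
If you are not sure whether an array is a view or a copy, numpy can tell you. The sketch below reuses the names ''a'', ''b'' and ''c'' from the example above and relies on ''np.shares_memory'' and on the ''base'' attribute of arrays. The array ''d'', which is built by indexing with a list, is an extra illustration that is not part of the original example.

```python
import numpy as np

a = np.arange(30).reshape((3, 10))

b = a[1, :]          # basic slicing returns a view
c = a[1, :].copy()   # explicit copy with its own data
d = a[[1], :]        # indexing with a list ("fancy" indexing) returns a copy

print(np.shares_memory(a, b))   # does b share its data with a?
print(np.shares_memory(a, c))
print(np.shares_memory(a, d))
print(b.base is not None)       # a view keeps a reference to the array that owns the data
print(c.base is None)           # a copy made with .copy() owns its data

# Changing the view changes the first array, changing the copies does not
b[0] = -5
c[1] = -6
d[0, 2] = -7
print(a[1, :3])
```

Checking ''np.shares_memory'' before modifying a large array in place is usually cheaper than making a defensive copy.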
  
 ===== cdms2 and netCDF4 =====
  
-There is a good chance that your input array data will come from a file in the [[http://www.unidata.ucar.edu/software/netcdf/|NetCDF]] format. Depending on which [[other:python:starting#some_python_distributions|python distribution]] you are using, you can use the //cdms2// or or //netCDF4// modules to read the data.
 +There is a good chance that your input array data will come from a file in the [[other:newppl:starting#netcdf_and_file_formats|NetCDF format]].
  
-Note: the NetCDF file format is self-documented, and the metadata of climate date files often follows the [[http://cfconventions.org/|CF (Climate and Forecast) Metadata Conventions]]
 +Depending on which [[other:python:starting#some_python_distributions|python distribution]] you are using, you can use the //cdms2// or //netCDF4// modules to read the data.
  
 ==== cdms2 ====
  
-Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. Unfortunately, cdms2 is only available in the UV-CDAT distribution, and distributions where somebody has installed some version of //cdat-lite//. When you can use cdms2, you also have access to //cdtime//, that is very useful for handling time axis data.
 +Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. Unfortunately, cdms2 is only available in the [[other:python:starting#uv-cdat|UV-CDAT distribution]], and in distributions where somebody has installed some version of //cdat-lite//. When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data.
  
 How to get started:
Line 80: Line 141:
 ==== netCDF4 ====
  
-Summary: netCDF4 can read/write netCDF files and is available in most python distributions
 +Summary: //netCDF4 can read/write netCDF files and is available in most python distributions//
  
 Where: [[http://unidata.github.io/netcdf4-python/]]
Line 91: Line 152:
 Where: [[http://matplotlib.org|Matplotlib web site]]
  
-The documentation is good, but not always easy to use. A good way to start with matplotlib is to:
 +Help on //stack overflow//: [[https://stackoverflow.com/questions/tagged/matplotlib|matplotlib help]]
 +
 +The documentation is good, but not always easy to use. <wrap hi>A good way to start with matplotlib</wrap> is to:
   - Look at the [[http://matplotlib.org/gallery.html|matplotlib gallery]] to get an idea of all you can do with matplotlib. Later, when you need to plot something, come back to the gallery to find some examples that are close to what you need and click on them to get the sources
   - Use the free hints provided by JY!
Line 101: Line 164:
   - Read the [[http://www.labri.fr/perso/nrougier/teaching/matplotlib/|Matplotlib tutorial by Nicolas Rougier]]
   - Download the [[http://matplotlib.org/contents.html|pdf version of the manual]]. **Do not print** the 2800+ pages of the manual! Read the beginner's guide (Chapter //FIVE// of //Part II//) and have a super quick look at the table of contents of the whole document.
 +
 +===== Graphics related resources =====
 +
 +  * [[http://​seaborn.pydata.org/​|Seaborn]] is a library for making attractive and informative statistical graphics in Python, built on top of matplotlib
 +    * See also: [[https://www.datacamp.com/community/tutorials/seaborn-python-tutorial|Python Seaborn Tutorial For Beginners]]
 +  * [[http://​colorbrewer2.org|ColorBrewer 2.0]] is a tool that can help you understand, and experiment with //​sequential//,​ //​diverging//​ and //​qualitative//​ colormaps
 +
  
 ===== Basemap =====
  
-Summary: Basemap is an extension of Matplotlib that you can use for plotting maps, using different projections
 +<note warning>Basemap is going to be slowly phased out, in favor of [[#cartopy]]\\ More information in these issues:
 +  * [[https://github.com/SciTools/cartopy/issues/920|cartopy github issue]]
 +  * [[https://github.com/matplotlib/basemap/issues/267|basemap github issue]]
 +</note>
 +
 +Summary: //Basemap is an extension of Matplotlib that you can use for plotting maps, using different projections//
  
 Where: [[http://matplotlib.org/basemap/|Basemap web site]]
 +
 +Help on //stack overflow//: [[https://​stackoverflow.com/​questions/​tagged/​matplotlib-basemap|basemap help]]
  
 How to use basemap?
   - look at the [[http://matplotlib.org/basemap/users/examples.html|examples]]
   - check the [[http://matplotlib.org/basemap/users/mapsetup.html|different projections]]
-  - look at the [[http://matplotlib.org/basemap/api/basemap_api.html#module-mpl_toolkits.basemap|detailed documentation]]
 +  - read some documentation!
 +    - the **really nice** [[http://basemaptutorial.readthedocs.io/en/latest/index.html|basemap tutorial]] seems much better than the official documentation below
 +    - look at the [[http://matplotlib.org/basemap/api/basemap_api.html#module-mpl_toolkits.basemap|detailed official documentation]]
 + 
 +===== Cartopy + Iris ===== 
 + 
 +Summary: //Cartopy is a Python package for advanced map generation with a simple matplotlib interface// and //Iris is a Python package for analysing and visualising meteorological and oceanographic data sets// 
 + 
 +Where: [[http://​scitools.org.uk/​cartopy/​docs/​latest/​|Cartopy]] and [[http://​scitools.org.uk/​iris/​index.html|Iris]] web sites 
 + 
 +Examples: 
 +  * [[http://​scitools.org.uk/​cartopy/​docs/​latest/​gallery.html|Gallery on the Cartopy web site]] 
 +  * [[http://​scitools.org.uk/​iris/​docs/​latest/​gallery.html|Gallery on the Iris web site]] 
 +  * [[http://​scitools.org.uk/​iris/​docs/​latest/​examples/​index.html|Examples on the Iris web site]] 
 + 
 +Help on //stack overflow//: [[https://​stackoverflow.com/​questions/​tagged/​cartopy|cartopy help]] 
 + 
 +===== Pandas ===== 
 + 
 +Summary: //pandas is a library providing high-performance,​ easy-to-use data structures and data analysis tools// 
 + 
 +Where: [[http://​pandas.pydata.org|Pandas web site]]
  
 +JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. You should at least have a quick look at:
 +  * The [[http://​www.scipy-lectures.org/​packages/​statistics/​index.html|Statistics in Python]] tutorial that combines Pandas, [[http://​statsmodels.sourceforge.net/​|Statsmodels]] and [[http://​seaborn.pydata.org/​|Seaborn]]
 +  * the cheat sheet on the [[https://​www.enthought.com/​services/​training/​pandas-mastery-workshop/​|Enthought workshops advertising page]]
 +  * the cheat sheet on the [[https://​github.com/​pandas-dev/​pandas/​tree/​master/​doc/​cheatsheet|github Pandas doc page]]
  
 ===== Scipy Lecture Notes =====
Line 120: Line 223:
 Where: [[http://www.scipy-lectures.org/_downloads/ScipyLectures-simple.pdf|pdf]] - [[http://www.scipy-lectures.org/|html]]
  
-This is a really nice document that is regularly updated and used for the [[https://www.euroscipy.org/|EuroScipy]] tutorials. You will learn more things about python, numpy and matplotlib, debugging and optimizing scripts, and also learn about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc...
 +This is **a really nice and useful document** that is regularly updated and used for the [[https://www.euroscipy.org/|EuroScipy]] tutorials. You will learn more things about python, numpy and matplotlib, debugging and optimizing scripts, and also learn about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc...
  
 ===== Quick Reference =====
Line 131: Line 234:
  
   * [[http://blog.codinghorror.com/a-pragmatic-quick-reference/|A Pragmatic Quick Reference]]
 +
 +===== Debugging your code =====
 +
 +There is only so much you can do by staring at your code in your favorite text editor and adding ''print'' lines to it (or using [[https://docs.python.org/2/howto/logging.html#logging-basic-tutorial|logging]] instead of ''print''). The next step is to **use the python debugger**!
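
As a minimal sketch of the ''logging'' alternative to ''print'' mentioned above, the standard library module can be used as follows. The function ''mean_of'', the logger name ''my_script'' and the messages are hypothetical and only serve as an illustration.

```python
import logging

# Show messages of level DEBUG and above. Raising the level to logging.WARNING
# silences the debugging messages without removing them from the code
logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)s:%(name)s:%(message)s')
log = logging.getLogger('my_script')


def mean_of(values):
    """Return the mean of a sequence of numbers (hypothetical example function)."""
    log.debug('mean_of called with %d values', len(values))
    if not values:
        log.warning('empty input, returning 0.0')
        return 0.0
    # float() keeps the division exact with both python 2 and python 3
    return float(sum(values)) / len(values)


print(mean_of([1, 2, 3, 4]))
print(mean_of([]))
```

Unlike ''print'' lines, the ''log.debug'' calls can stay in the script once the problem is solved, because they are switched off by changing a single level setting.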
 +
 +==== Debugging in text mode ====
 +
 +  - Start the script with: ''python -m pdb my_script.py''
 +  - The debugger stops before the first line of the script. Type ''run'' (or ''restart'') if you need to go back to the first line of the script
 +  - Type ''continue'' (or **c**) to execute the script to the end, or until the first breakpoint or error is reached
 +  - Use ''where'' (or **w**) to check the call stack that led to the current stop. Use ''up'' and ''down'' to navigate through the call stack and examine the values of the functions' parameters
 +  - Type ''break NNN'' (or **b NNN**) to stop at line NNN
 +  - Use ''type(var)'' and ''p var'' to check the type and values of variables. You can also change the variables' values on the fly!
 +  - Type ''run'' (or ''restart'') to restart the script. Beware: **r** is the short form of ''return'' (continue until the current function returns), not of ''run''
 +  - Use ''next'' and ''step'' to execute some parts of the script line by line. If a code line calls a function:
 +    * ''next'' (or **n**) will execute the function and stop on the next line
 +    * ''step'' (or **s**) will stop at the first line **inside the function**
 +  - Check the [[https://docs.python.org/2/library/pdb.html#debugger-commands|debugger commands]] for details, or type ''help'' in the debugger to use the built-in help
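
To practice the commands listed above, you can use a small script such as the one below. It is a hypothetical ''my_script.py'' written for this purpose (the functions ''scale'' and ''main'' are not part of any library), and the comments suggest where each debugger command is worth trying.

```python
# my_script.py: hypothetical script for practicing the debugger commands
# Start it with: python -m pdb my_script.py


def scale(values, factor):
    """Multiply each value by factor."""
    result = []
    for v in values:
        # Set a breakpoint on the next line with 'break', then use 'p v' and 'p result'
        result.append(v * factor)
    return result


def main():
    data = [1, 2, 3]
    # On the next line, 'step' goes inside scale() and 'next' steps over it
    scaled = scale(data, 10)
    total = sum(scaled)
    print(total)
    return total


if __name__ == '__main__':
    main()
```

Once stopped inside ''scale'', ''where'' shows that it was called from ''main'', and ''up'' gives access to the variables of ''main'' (e.g. ''p data'').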
 +
 +==== Using pydebug ====
 +
 +Depending on the distribution,​ the editor and the programming environment you use, you may have access to a graphical version of the debugger. UV-CDAT users can use ''​pydebug my_script.py''​
 +
 +===== Using a Python IDE =====
 +
 +**IDE** = //​Integrated Development Environment//​
 +
 +There are lots of ways to use Python and develop scripts, from a lightweight approach (your favorite text editor with built-in python syntax highlighting, e.g. **emacs**, and ''python -i myscript.py'') to a full-fledged IDE. Some IDE-related links are listed below:
 +
 +  * [[https://​www.datacamp.com/​community/​tutorials/​data-science-python-ide|Top 5 Python IDEs For Data Science]]
 +  * [[http://noeticforce.com/best-python-ide-for-programmers-windows-and-mac|Python IDE: The 10 Best IDEs for Python Programmers]]
 +  * [[https://​wiki.python.org/​moin/​IntegratedDevelopmentEnvironments]]
 +
 +==== Spyder ====
 +
 +  * [[https://​github.com/​spyder-ide/​spyder|Home page]]
 +  * [[http://​pythonhosted.org/​spyder/​|Documentation]]
 +
  
 ===== Improving the performance of your code =====
Line 143: Line 284:
  
 Hint: before optimizing your script, you should spend some time //profiling// it, in order to only spend time improving the slow parts of your script
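
A minimal profiling sketch, using the ''cProfile'' and ''pstats'' modules of the standard library, could look like the code below. The functions ''slow_sum'' and ''main'' are hypothetical stand-ins for the slow part of your own script.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop (hypothetical example of a slow function)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(20000) for _ in range(20)]


# Collect timing information while main() is running
profiler = cProfile.Profile()
profiler.enable()
results = main()
profiler.disable()

# Write the 5 entries with the largest cumulative time to a text report
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

A whole script can also be profiled without modifying it, with ''python -m cProfile -s cumulative my_script.py''.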
 +
 +==== Useful packages ====
 +
 +  * [[https://​github.com/​pydata/​numexpr|Numexpr]]:​ //Numexpr is a **fast numerical expression evaluator for NumPy**. With it, expressions that operate on arrays (like "​3*a+4*b"​) are accelerated and use less memory than doing the same calculation in Python.//
 +  * [[http://​www.pytables.org/​|PyTables]]:​ //PyTables is a package for managing hierarchical datasets and designed to efficiently and **easily cope with extremely large amounts of data**//
  
 ==== Tutorials by Ian Ozsvald ====
other/python/jyp_steps.txt · Last modified: 2024/03/07 10:15 by jypeter