other:python:jyp_steps

Differences

This shows you the differences between two versions of the page.

other:python:jyp_steps [2019/08/30 09:36]
jypeter [Useful matplotlib reference pages] suptitle parameters
other:python:jyp_steps [2024/03/07 10:15] (current)
jypeter Added a Protocol Buffers section to the file formats
Line 13: Line 13:
 You can start using python by reading the {{:​other:​python:​python_intro_ipsl_oct2013_v2.pdf|Bien démarrer avec python}} tutorial that was used during a 2013 IPSL python class: You can start using python by reading the {{:​other:​python:​python_intro_ipsl_oct2013_v2.pdf|Bien démarrer avec python}} tutorial that was used during a 2013 IPSL python class:
   * this tutorial is in French (my apologies for the lack of translation,​ but it should be easy to understand)   * this tutorial is in French (my apologies for the lack of translation,​ but it should be easy to understand)
-    * If you have too much trouble understanding this French Tutorial, you can read the first 6 chapters of the **Tutorial** in [[#​the_official_python_documentation|the official Python documentation]] and chapters 1.2.1 to 1.2.5 in the [[#scipy_lecture_notes|Scipy Lecture Notes]]. Once you have read these, you can try to read the French tutorial again+    * If you have too much trouble understanding this French Tutorial, you can read the first 6 chapters of the **Tutorial** in [[#​the_official_python_documentation|the official Python documentation]] and chapters 1.2.1 to 1.2.5 in the [[#scientific_python_lectures|Scientific Python Lectures]]. Once you have read these, you can try to read the French tutorial again
   * it's an introduction to python (and programming) for the climate scientist: after reading this tutorial, you should be able to do most of the things you usually do in a shell script   * it's an introduction to python (and programming) for the climate scientist: after reading this tutorial, you should be able to do most of the things you usually do in a shell script
     * python types, tests, loops, reading a text file     * python types, tests, loops, reading a text file
     * the tutorial is very detailed about string handling, because strings offer an easy way to practice working with indices (indexing and slicing), before indexing numpy arrays. And our usual pre/​post-processing scripts often need to do a lot of string handling in order to generate the file/​variable/​experiment names     * the tutorial is very detailed about string handling, because strings offer an easy way to practice working with indices (indexing and slicing), before indexing numpy arrays. And our usual pre/​post-processing scripts often need to do a lot of string handling in order to generate the file/​variable/​experiment names
   * after reading this tutorial, you should practice with the following:   * after reading this tutorial, you should practice with the following:
-    * [[https://files.lsce.ipsl.fr/​public.php?​service=files&​t=9731fdad4521ac5fa6e84b392d3a2e44|Basic python training test (ipython notebook version)]]+    * [[https://sharebox.lsce.ipsl.fr/​index.php/​s/​S3EO8cLrhVDeQWA|Basic python training test (ipython notebook version)]]
     * {{:​other:​python:​tp_intro_python_oct2013_no_solutions.pdf|Basic python training test (pdf version)}}     * {{:​other:​python:​tp_intro_python_oct2013_no_solutions.pdf|Basic python training test (pdf version)}}
     * {{:​other:​python:​tp_intro_python_oct2013_full.pdf|Basic python training test (pdf version, with answers)}}     * {{:​other:​python:​tp_intro_python_oct2013_full.pdf|Basic python training test (pdf version, with answers)}}
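
A minimal sketch of the kind of string handling mentioned above (building file names and taking them apart again). The variable, model and experiment names are made-up values used only for illustration:

```python
# Hypothetical names, only used for illustration
variable = "tas"
model = "MY-MODEL"
experiment = "historical"

# Build a file name from its parts
file_name = "_".join([variable, model, experiment]) + ".nc"

# Take the name apart again: drop the 3-character extension, then split
base_name = file_name[:-3]
parts = base_name.split("_")

# Generate one file name per year with an f-string
yearly_names = [f"{variable}_{model}_{year}.nc" for year in range(2000, 2003)]

print(file_name)
print(parts)
print(yearly_names)
```

The same indexing and slicing rules apply later to numpy arrays, which is why practicing on strings first is useful.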
Line 44: Line 44:
  
 [[https://​docs.python.org/​3/​|html]] - [[https://​docs.python.org/​3/​download.html|pdf (in a zip file)]] [[https://​docs.python.org/​3/​|html]] - [[https://​docs.python.org/​3/​download.html|pdf (in a zip file)]]
 +
 +
 +===== Scientific Python Lectures =====
 +
 +Summary: //One document to learn numerics, science, and data with Python//
 +
 +Note: this used to be called //Scipy Lecture Notes//
 +
 +Where: [[https://​lectures.scientific-python.org/​_downloads/​ScientificPythonLectures-simple.pdf|pdf]] - [[https://​lectures.scientific-python.org/​|html]]
 +
 +This is **a really nice and useful document** that is regularly updated and used for the [[https://​www.euroscipy.org/​|EuroScipy]] tutorials.
 +
 +This document will teach you lots of things about python, numpy and matplotlib, debugging and optimizing scripts, and about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc...
 +  * Example: the [[https://​lectures.scientific-python.org/​packages/​statistics/​index.html|Statistics in Python]] tutorial that combines [[other:​python:​jyp_steps#​pandas|Pandas]],​ [[http://​statsmodels.sourceforge.net/​|Statsmodels]] and [[http://​seaborn.pydata.org/​|Seaborn]]
  
  
Line 55: Line 69:
  
   - always remember that indices start at ''​0''​ and that the last element of an array is at index ''​-1''​!\\ First learn about //​indexing//​ and //slicing// by manipulating strings, as shown in [[#​part1|Part 1]] above (try '''​This document by JY is awesome!'​[::​-1]''​ and '''​This document by JY is awesome!'​[slice(None,​ None, -1)]''​) 8-)   - always remember that indices start at ''​0''​ and that the last element of an array is at index ''​-1''​!\\ First learn about //​indexing//​ and //slicing// by manipulating strings, as shown in [[#​part1|Part 1]] above (try '''​This document by JY is awesome!'​[::​-1]''​ and '''​This document by JY is awesome!'​[slice(None,​ None, -1)]''​) 8-)
-  - if you are a Matlab user (but the references are interesting for others as well), you can read the following:+  - if you are a **Matlab user** (but the references are interesting for others as well), you can read the following: 
 +    - [[https://​www.enthought.com/​wp-content/​uploads/​2019/​08/​Enthought-MATLAB-to-Python-White-Paper-1.pdf|Migrating from MATLAB to Python]] on the [[https://​www.enthought.com/​software-development/​|Enthought Software Development page]]
     - [[https://​docs.scipy.org/​doc/​numpy-dev/​user/​numpy-for-matlab-users.html|Numpy for Matlab users]]     - [[https://​docs.scipy.org/​doc/​numpy-dev/​user/​numpy-for-matlab-users.html|Numpy for Matlab users]]
     - [[http://​mathesaurus.sourceforge.net/​matlab-numpy.html|NumPy for MATLAB users]] (nice, but does not seem to be maintained any more)     - [[http://​mathesaurus.sourceforge.net/​matlab-numpy.html|NumPy for MATLAB users]] (nice, but does not seem to be maintained any more)
Line 63: Line 78:
     - Numpy Reference Guide     - Numpy Reference Guide
     - Scipy Reference Guide     - Scipy Reference Guide
 +  - read [[https://​github.com/​rougier/​numpy-100/​blob/​master/​100_Numpy_exercises.ipynb|100 numpy exercises]]
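
A short sketch of the indexing and slicing rules from the first item of the list above, first on a string and then on a numpy array. The array content is arbitrary test data:

```python
import numpy as np

text = "This document by JY is awesome!"

first_char = text[0]    # indices start at 0
last_char = text[-1]    # the last element is at index -1

# Two equivalent ways of reversing a sequence
reversed_text = text[::-1]
same_reversed = text[slice(None, None, -1)]

# The same syntax works on a 1D numpy array
values = np.arange(10)        # 0, 1, ..., 9
first_three = values[:3]      # start is included, stop is excluded
last_two = values[-2:]
every_other = values[::2]

print(first_char, last_char)
print(reversed_text)
print(first_three, last_two, every_other)
```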
  
 ==== Beware of the array view side effects ==== ==== Beware of the array view side effects ====
Line 123: Line 139:
 ==== Extra numpy information ==== ==== Extra numpy information ====
  
-  ​* More information about array indexing:+<WRAP center round tip 60%> 
 +You can also check the [[other:​python:​misc_by_jyp#​numpy_related_stuff|numpy section]] of the //Useful python stuff// page 
 +</​WRAP>​ 
 + 
 + 
 +  ​* More information about **array indexing**:\\ <wrap em>​Always check what you are doing on a simple test case, when you use advanced/​fancy indexing!</​wrap>​
     * Examples:     * Examples:
       * {{ :​other:​python:​indirect_indexing_2.py.txt |}}: Take a vertical slice in a 3D zyx array, along a varying y '​path'​       * {{ :​other:​python:​indirect_indexing_2.py.txt |}}: Take a vertical slice in a 3D zyx array, along a varying y '​path'​
-    * [[https://docs.scipy.org/doc/numpy/​user/​basics.indexing.html|Indexing]] (//index arrays//, //boolean index arrays//, //​np.newaxis//,​ //​Ellipsis//,​ //variable numbers of indices//, ...) +    * [[https://numpy.org/doc/stable/​user/​basics.indexing.html|Array indexing basics (user guide)]] (//index arrays//, //boolean index arrays//, //​np.newaxis//,​ //​Ellipsis//,​ //variable numbers of indices//, ...) 
-    * [[https://docs.scipy.org/​doc/​numpy/​user/​quickstart.html#​fancy-indexing-and-index-tricks|Fancy indexing]] and [[https://docs.scipy.org/doc/numpy/​user/​quickstart.html#​the-ix-function|the ix_() function]] +    * [[https://numpy.org/doc/​stable/​reference/​arrays.indexing.html|Indexing routines (reference manual)]] 
-    * [[https://​docs.scipy.org/​doc/​numpy/​reference/​arrays.indexing.html|Indexing (in the numpy reference manual)]] +    * [[https://numpy.org/​doc/​stable/​user/​quickstart.html#​advanced-indexing-and-index-tricks|Advanced ​indexing ​and index tricks]] and [[https://numpy.org/doc/stable/​user/​quickstart.html#​the-ix-function|the ix_() function]]
-    * [[https://​docs.scipy.org/​doc/​numpy/​reference/​routines.indexing.html#​routines-indexing|Indexing routines]] +
   * More information about arrays:   * More information about arrays:
-    * [[https://docs.scipy.org/doc/numpy/​reference/​routines.array-creation.html#​routines-array-creation|Array creation routines]] +    * [[https://numpy.org/doc/stable/​reference/​routines.array-creation.html|Array creation routines]] 
-    * [[https://docs.scipy.org/doc/numpy/​reference/​routines.array-manipulation.html|Array manipulation routines]] +    * [[https://numpy.org/doc/stable/​reference/​routines.array-manipulation.html|Array manipulation routines]] 
-    * [[https://docs.scipy.org/​doc/​numpy/​reference/​maskedarray.html|Masked arrays]] +    * [[https://numpy.org/doc/​stable/​reference/​routines.sort.html|Sorting,​ searching, and counting routines]] 
-      * [[https://docs.scipy.org/doc/numpy/​reference/​routines.ma.html|Masked array operations]] +    * [[https://numpy.org/​doc/​stable/​reference/​maskedarray.html|Masked arrays]] 
-  * [[https://docs.scipy.org/doc/numpy/​user/​misc.html#​ieee-754-floating-point-special-values|Dealing with special numerical values]] (//Nan//, //inf//) +      * [[https://numpy.org/doc/stable/​reference/​routines.ma.html|Masked array operations]] 
-    * If you know that your data has missing values, it is cleaner and safer to handle them with [[https://docs.scipy.org/doc/numpy/​reference/​maskedarray.html|masked arrays]]! +  * [[https://numpy.org/doc/stable/​user/​misc.html#​ieee-754-floating-point-special-values|Dealing with special numerical values]] (//Nan//, //inf//) 
-    * [[https://docs.scipy.org/doc/numpy/​user/​misc.html#​how-numpy-handles-numerical-exceptions|Handling numerical exceptions]] +    * If you know that your data has missing values, it is cleaner and safer to handle them with [[https://numpy.org/doc/stable/​reference/​maskedarray.html|masked arrays]]! 
-    * [[https://docs.scipy.org/doc/numpy/​reference/​routines.err.html|Floating point error handling]]+    * If you know that some of your data //may// have masked values, play safe by explicitly using ''​np.ma.some_function()''​ rather than just ''​np.some_function()''​ 
 +      * More details in the [[https://github.com/​numpy/​numpy/​issues/​18675|Why/​when does np.something remove the mask of a np.ma array ?]] discussion 
 +    * [[https://​numpy.org/doc/stable/​user/​misc.html#​how-numpy-handles-numerical-exceptions|Handling numerical exceptions]] 
 +    * [[https://numpy.org/doc/stable/​reference/​routines.err.html|Floating point error handling]]
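
A minimal sketch of the advice above about missing and special values, assuming that ''-999.0'' is the missing value marker of some made-up input data. It uses the explicit ''np.ma'' functions, and ''np.errstate'' to silence the division warnings locally:

```python
import numpy as np

# Hypothetical data where -999.0 marks a missing value
raw = np.array([1.0, 2.0, -999.0, 4.0])
data = np.ma.masked_values(raw, -999.0)

valid_count = data.count()          # number of non-masked values
data_mean = data.mean()             # computed on the non-masked values only
data_median = np.ma.median(data)    # explicit np.ma version of the function

# A division that produces inf and nan; warnings are ignored inside the block
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.array([1.0, 0.0, 3.0]) / np.array([0.0, 0.0, 2.0])

# Mask the special values (nan and inf) before doing anything else
clean_ratio = np.ma.masked_invalid(ratio)

print(valid_count, data_mean, data_median)
print(clean_ratio)
```

Checking the result on a tiny array like this one is a cheap way to make sure that the mask is handled the way you expect.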
  
-===== cdms2 and netCDF4 ​=====+===== Using NetCDF files with Python ​=====
  
-There is a good chance that your input array data will come from a file in the [[other:​newppl:​starting#​netcdf_and_file_formats|NetCDF format]]. 
  
-Depending on which [[other:python:starting#some_python_distributions|python distribution]] you are using, you can use the //cdms2// or //netCDF4// modules to read the data.+==== What is NetCDF? ====
  
-==== cdms2 ====+  * If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file!
  
-Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#uv-cdat|UV-CDAT distribution]], and can theoretically be installed independently of UV-CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data.+  * Read the [[other:newppl:starting#netcdf_and_related_conventions|NetCDF and related Conventions]] for more information
  
-How to get started: +  ​There may be different ways of dealing ​with NetCDF filesdepending ​on which [[other:python:​starting#​some_python_distributions|python distribution]] you have access to
-  - read [[http://​www.lsce.ipsl.fr/​Phocea/​file.php?​class=page&​file=5/​pythonCDAT_jyp_2sur2_070306.pdf|JYP'​s cdms tutorial]], starting at page 54 +
-    - the tutorial is in French (soooorry!) +
-    - you have to replace //cdms// with **cdms2**, and //​MV// ​with **MV2** (sooorry about thatthe tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) +
-  - read the [[http://​cdms.readthedocs.io/​en/​docstanya/​index.html|official cdms documentation]] (link may change)+
  
  
-==== netCDF4 ​====+==== CliMAF and C-ESM-EP ​====
  
-Summary: ​//netCDF4 can read/write netCDF files and is available in most python distributions//+People using **//CMIPn// and model data on the IPSL servers** can easily search and process NetCDF files using:
  
-Where: ​[[http://unidata.github.io/netcdf4-python/​]]+  * the [[https://climaf.readthedocs.io/|Climate Model Assessment Framework (CliMAF)]] environment
  
-===== CDAT-related resources =====+  * and the [[https://​github.com/​jservonnat/​C-ESM-EP/​wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]]
  
-Some links, in case they can't be found easily on the [[https://​uv-cdat.llnl.gov|UV-CDAT]] web site... 
  
-  * [[https://​uv-cdat.llnl.gov/​tutorials.html|Tutorials in ipython notebooks]] +==== xarray ====
-  * [[http://​cdat-vcs.readthedocs.io/​en/​latest/​|VCS:​ Visualization Control System]] +
-    * [[https://​github.com/​CDAT/​vcs/​issues/​238|Colormaps in vcs examples]] +
-  * [[https://​github.com/​CDAT/​cdat-site/​blob/​master/​eztemplate.md|EzTemplate Documentation]]+
  
-===== Matplotlib =====+[[https://​docs.xarray.dev/​|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files
  
-Summary: there are lots of python libraries that you can use for plotting, but Matplotlib has become a //de facto// standard+=== Some xarray related resources ===
  
-Where: [[http://matplotlib.org|Matplotlib web site]]+Note: more packages (than listed below) may be listed in the [[other:uvcdat:cdat_conda:cdat_8_2_1#extra_packages_list|Extra packages list]] page
  
-Help on //stack overflow//: ​[[https://stackoverflow.com/questions/tagged/matplotlib|matplotlib help]]+  * [[https://docs.xarray.dev/en/stable/generated/​xarray.tutorial.load_dataset.html|xarray test datasets]]
  
-The matplotlib documentation is good, but not always easy to use. <wrap hi>A good way to start with matplotlib</​wrap>​ is to quickly read the following, practice, and read this section again +  ​* **[[https://​xcdat.readthedocs.io/|xCDAT]]: ''​xarray'' ​extended ​with Climate Data Analysis Tools**
-  - Have a quick look at the [[https://​matplotlib.org/​gallery/​index.html|matplotlib gallery]] to get an idea of all you can do with matplotlib. Later, when you need to plot something, go back to the gallery to find some examples that are close to what you need and click on them to view their source code +
-    ​some examples are more //​pythonic//​ (ie object oriented) than others, and some examples mix different styles of coding, which can be quite confusing. Try to [[http://​matplotlib.org/​faq/​usage_faq.html#​coding-styles|use an object oriented way of doing things]]! +
-  - Use the free hints provided by JY! +
-    - You will usually ​**initialize matplotlib** with: ''​import matplotlib.pyplot as plt''​ +
-      * in some cases you may also need: ''​import matplotlib as mpl''​ +
-      * later, you may need other matplotlib related modules, for advanced usage +
-    - You need to know some **matplotlib specific vocabulary**:​ +
-      * a Matplotlib **//​Figure//​** (or //canvas//) is a **graphical window** in which you create your plots... +
-        * example: ''​my_page = plt.figure()''​ +
-        * if you need several display windows at the same time, create several figures!\\ <​code>​win_1 = plt.figure() +
-win_2 = plt.figure()</​code>​ +
-        * the [[http://​matplotlib.org/​faq/​usage_faq.html#​parts-of-a-figure|parts of a figure]] are often positioned in //​normalized coordinates//:​ ''​(0,​ 0)''​ is the bottom left of the figure, and ''​(1,​ 1)''​ the top right +
-        * You don't really specify the **page orientation** (//portrait// or //landscape//) of a plot. If you want a portrait plot, it's up to you to create a plot that is taller than it is wide. The idea is not to worry about this and just check the final resulting plot: create a plot, save it, display the resulting png/pdf and then adjust the creation script +
-          * If you do have an idea of the layout of what you want to plot, it may be easier to explicitly specify the figure size/ratio at creation time, and then try to //fill// the normalized coordinates space of the figure +
-          * ''my_page = plt.figure()'': the ratio of the default figure is ''landscape'', because it is 33% wider than it is tall. Creating a default figure will be OK most of the time! +
-          * ''​my_page = plt.figure(figsize=(width,​ height))'':​ create a figure with a custom ratio (sizes are considered to be in inches) +
-            * ''​my_page = plt.figure(figsize=(8.3,​ 11.7))'':​ create a figure that will theoretically fill an A4 size page in portrait mode (check ​[[https://www.papersizes.org/a-paper-sizes.htm|Dimensions Of A Series Paper Sizes]] if you need more size details) +
-      * a Matplotlib **//​Axis//​** is a **plot** inside a Figure... [[http://​matplotlib.org/​faq/​usage_faq.html#​parts-of-a-figure|More details]] +
-        * reserve space for **one plot** that will use most of the available area of the figure/​page:​ +
-          * ''​my_plot = my_page.add_subplot(1,​ 1, 1)''​: syntax is ''​add_subplot(nrows,​ ncols, index)''​ +
-          * ''​my_plot = my_page.subplot**s**()''​ +
-        * create **3 plots on 1 column** (each plot uses the full width of the figure): +
-          * <​code>​top_plot = my_page.add_subplot(3,​ 1, 1) +
-middle_plot = my_page.add_subplot(3,​ 1, 2) +
-bottom_plot = my_page.add_subplot(3,​ 1, 3)</​code>​ +
-          * the following method is more efficient than add_subplot when there are lots of plots on a page<​code>​plot_array = my_page.subplots(3,​ 1) +
-top_plot = plot_array[0] +
-middle_plot = plot_array[1] +
-bottom_plot = plot_array[2]</​code>​ +
-          * creating a figure and axes with a single line: ''​my_page,​ plot_array = **plt**.subplots(3,​ 1)''​ +
-        * use [[https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_axes|my_page.add_axes(...)]] to add an axis in an arbitrary location of the page\\ ''my_page.add_axes([left, bottom, width, height])'' +
-      * a Matplotlib **//Artist//** or //Patch// is //something// (e.g. a line, a group of markers, text, the legend...) plotted on the Figure/Axis +
-      * **clearing** the //page// (or part of it): you probably won't need that... +
-        * ''​my_page.clear()''​ or ''​my_page.clf()''​ or ''​plt.clf()'':​ clear the (current) figure +
-        * ''​my_plot.clear()''​ or ''​my_plot.cla()'':​ clear the (current) axis +
-    - some resources for having multiple plots on the same figure +
-      * [[https://​matplotlib.org/​gallery/​recipes/​create_subplots.html#​sphx-glr-gallery-recipes-create-subplots-py|Easily creating subplots]] +
-        * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.figure.Figure.html#​matplotlib.figure.Figure.add_subplot|fig.add_subplot(...)]] +
-        * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.figure.Figure.html#​matplotlib.figure.Figure.add_axes|fig.add_axes(...)]] +
-        * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.subplot.html|plt.subplot(...)]] +
-        * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.subplots.html|plt.subplots(...)]] with an **s** at the end ([[https://​matplotlib.org/​gallery/​subplots_axes_and_figures/​subplots_demo.html|demo]]) +
-        * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.subplots_adjust.html|subplots_adjust]] can be used to change the overall boundaries of the subplots on the figure, and the spacing between the subplots\\ ''​plt.subplots_adjust(left=None,​ bottom=None,​ right=None, top=None, wspace=None,​ hspace=None)''​\\ or ''​my_page.subplots_adjust(left=None,​ bottom=None,​ right=None, top=None, wspace=None,​ hspace=None)''​ +
-          * ''​hspace''/''​wspace''​ is the amount of height/​width between the subplots +
-            * ''​hspace=0.1''​ is enough for just displaying the ticks and the labels, without the axis name +
-            * use ''​hspace=0''​ to stick the plots together vertically +
-              * do not forget to disable the ticks where there is no space to plot them: ''​my_plot.set_xticks([])''​ +
-          * ''​my_page.subplots_adjust(right=0.75)''​ will leave 25% on the right of the page for adding a legend outside of a plot +
-        * You can also **resize an existing (sub)plot** the following way: +
-          - Get the current size information:​ ''​pl_x_bottomleft,​ pl_y_bottomleft,​ pl_width, pl_height = my_plot.get_position().bounds''​ +
-          - Set the new size: e.g reduce the height with ''​my_plot.set_position( (pl_x_bottomleft,​ pl_y_bottomleft,​ pl_width, pl_height ​ * 0.5) )''​ +
-      * [[https://​matplotlib.org/​gallery/​index.html#​subplots-axes-and-figures|Subplots,​ axes and figures]] gallery +
-      * [[https://​matplotlib.org/​tutorials/​intermediate/​gridspec.html#​sphx-glr-tutorials-intermediate-gridspec-py|Customizing Figure Layouts Using GridSpec and Other Functions]],​ [[https://​matplotlib.org/​tutorials/​intermediate/​constrainedlayout_guide.html|constrained layout]] and [[https://​matplotlib.org/​tutorials/​intermediate/​tight_layout_guide.html|tight layout]] +
-    - use [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.savefig.html|my_page.savefig(...)]] to save a figure +
-      *  <wrap hi>​savefig(...) must be called **before** plt.show()!</​wrap>​ +
-      * ''​my_page.savefig('​my_plot.pdf'​)'':​ save the figure to a pdf file +
-      * ''​my_page.savefig('​my_plot.png',​ dpi=200, transparent=True,​ bbox_inches='​tight'​)'':​ save the figure to a png file at a higher resolution than the default (default is 100 dots per inch), with a transparent background and no extra space around the figure +
-    - **display** the figure and its plots, and **start interacting** (zooming, panning...) with them:\\ ''​plt.show()''​ +
-    - it may be hard to (remember how to) **work with colors //and colorbars//​**. Some examples from the [[https://​matplotlib.org/​gallery/​index.html|matplotlib Gallery]] can help you!\\ Note: A **reversed version of each colormap** is available by appending ''​_r''​ to the name, e.g., ''​viridis_r''​ +
-      * [[https://​matplotlib.org/​gallery/​specialty_plots/​leftventricle_bulleye.html|leftventricle_bulleye.py]]:​ associating different types of colormaps to a plot and colorbar +
-      * [[https://​matplotlib.org/​examples/​api/​colorbar_only.html|colorbar_only.py]]:​ the different types of colorbars (or plotting only a colorbar) +
-      * [[https://​matplotlib.org/​gallery/​color/​colormap_reference.html|colormaps_reference.py]]:​ pre-defined colormaps +
-      * [[https://​matplotlib.org/​gallery/​color/​named_colors.html|named_colors.py]]:​ named colors +
-      * More details about colors and colorbars below, in the [[#​useful_matplotlib_reference_pages|Useful matplotlib reference pages]] section and the [[#​graphics_related_resources|Graphics related resources]] section +
-    - if you don't see a part of what you have plotted, maybe it's hidden behind other elements! Use the [[https://​matplotlib.org/​examples/​pylab_examples/​zorder_demo.html|zorder parameter]] to explicitly **specify the plotting order/​layers/​depth** +
-      * things should automatically work //as expected// if //zorder// is not explicitly specified +
-      * Use the ''​zorder=NN''​ parameter when creating objects. ''​NN''​ is an integer where 0 is the lowest value (the farthest from the eye), and objects are plotted above objects with a lower //zorder// value +
-      * Use ''matplotlib_object.set_zorder(NN)'' to change the order after an object has been created +
-    - you can use **transparency** to partially show what is behind some markers or other objects. Many //artists// accept the ''​alpha''​ parameter where ''​0.0''​ means that the object is completely transparent,​ and ''​1.0''​ means completely opaque\\ e.g. ''​my_plot.scatter(...,​ alpha=0.7)''​ +
-    - sometimes the results of the python/matplotlib commands are displayed immediately, sometimes not. It depends on whether you are in [[http://matplotlib.org/faq/usage_faq.html#what-is-interactive-mode|interactive or non-interactive]] mode +
-    - if your matplotlib is executed in a batch script, it will generate an error when trying to create (''​show()''​) a plot, because matplotlib expects to be able to display the figure on a screen by default. +
-      * Check how you can [[https://​matplotlib.org/​faq/​howto_faq.html?​highlight=web#​generate-images-without-having-a-window-appear|generate images offline]] +
-    - the documentation may mention [[http://​matplotlib.org/​faq/​usage_faq.html#​what-is-a-backend|backends]]. What?? Basically, you use python commands to create a plot, and the backend is the //thing// that will render your plot on the screen or in a file (png, pdf, etc...) +
-  - Read the [[https://​github.com/​rougier/​matplotlib-tutorial|Matplotlib tutorial by Nicolas Rougier]] +
-  - Download the [[http://​matplotlib.org/​contents.html|pdf version of the manual]]. **Do not print** the 2300+ pages of the manual! Read the beginner'​s guide (Chapter //FIVE// of //Part II//) and have a super quick look at the table of contents of the whole document.+
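
The hints above can be put together in a small script. This is only a sketch with made-up data: it selects the non-interactive ''Agg'' backend (an assumption, so that the script also runs in batch mode without a screen), creates a portrait figure with 3 plots on 1 column, uses ''zorder'' and ''alpha'', and saves the result to a temporary directory:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no screen is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 50)

# One figure (page) in A4 portrait ratio, with 3 plots (axes) on 1 column
my_page, plot_array = plt.subplots(3, 1, figsize=(8.3, 11.7))
top_plot, middle_plot, bottom_plot = plot_array

# zorder: the thin line is drawn above the thick one
thick_line, = top_plot.plot(x, np.sin(x), linewidth=6, color="lightgrey", zorder=1)
thin_line, = top_plot.plot(x, np.sin(x), linewidth=1, color="black", zorder=2)

# alpha: partially transparent markers
middle_plot.scatter(x, np.cos(x), alpha=0.7)
bottom_plot.plot(x, np.sin(2.0 * x))

# Reduce the vertical space between the plots, and hide the x ticks
# of the plots that are not at the bottom of the page
my_page.subplots_adjust(hspace=0.1)
top_plot.set_xticks([])
middle_plot.set_xticks([])

# Save the figure (savefig must be called before plt.show())
out_file = os.path.join(tempfile.gettempdir(), "my_plot.png")
my_page.savefig(out_file, dpi=200, bbox_inches="tight")
plt.close(my_page)
```

In an interactive session, you would skip the ''matplotlib.use("Agg")'' line and call ''plt.show()'' after ''savefig'' to display the figure.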
  
-==== Useful matplotlib reference pages ====+  * [[https://​xoa.readthedocs.io/​en/​latest/​|xoa]]:​ xarray-based ocean analysis library
  
-  ​* Some plot types: +  * [[https://uxarray.readthedocs.io/|uxarray]]: provide xarray styled functionality ​for unstructured grid datasets following ​[[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]]
-    ​* [[https://matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.plot.html|plot(...)]]:​ Plot y versus x as lines and/or markers +
-    * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.scatter.html|scatter(...)]]: A scatter plot of y vs x with varying marker size and/or color +
-    * The ''​plot''​ function will be faster ​for scatterplots where markers don't vary in size or color +
-    * [[https://matplotlib.org/​api/​_as_gen/​matplotlib.axes.Axes.contourf.html|contour(...) and contourf(...)]]:​ draw contour lines and filled contours +
-  * X and Y axes parameters +
-    * Axis range: ''​my_plot.set_xlim(x_leftmost_value,​ x_rightmost_value)''​ +
-      * Use the leftmost and rightmost values to specify the orientation of the axis (i.e the rightmost value can be smaller than the leftmost) +
-    * Axis label: ''​my_plot.set_xlabel(x_label_string,​ fontsize=axis_label_fontsize)''​ +
-      * Use the extra labelpad parameter to move the label closer (negative value) to the axis or farther (positive value): e.g. ''my_plot.set_xlabel('A closer label', labelpad=-20)'' +
-    * Major (and minor) tick marks location: ''​my_plot.set_xticks(x_ticks_values,​ minor=False)''​ +
-      * Use an empty list if you don't want tick marks: ''​my_plot.set_xticks([])''​ +
-    * Tick labels (if you don't want the default values): ''​my_plot.set_xticklabels(x_ticks_labels,​ minor=False,​ fontsize=ticklabels_fontsize)''​ +
-      * ''​x_ticks_labels''​ is a list of strings that has the same length as ''​x_ticks_values''​. Use an empty string in the positions where you don't want a label +
-      * Many more options for ticks, labels, orientation,​ ... +
-  * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.lines.Line2D.html|line]] parameters +
-    * ''​linestyle'':​ ''​solid'',​ ''​None'',​ [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.lines.Line2D.html#​matplotlib.lines.Line2D.set_linestyle|other]] ([[https://​matplotlib.org/​examples/​lines_bars_and_markers/​line_styles_reference.html|default styles example]], [[https://​matplotlib.org/​examples/​lines_bars_and_markers/​linestyles.html|custom styles example]]) +
-  * [[https://​matplotlib.org/​api/​markers_api.html|marker types]] +
-    * Default marker size and edge width: +
-      * ''​mpl.rcParams['​lines.markersize'​] %%**%% 2''​ => 36 +
-      * ''​mpl.rcParams['​lines.linewidth'​]''​ => 1.5 +
-    * Other marker attributes. For ''​plot'',​ all the markers have the same attributes, and for ''​scatter''​ the attributes can be the same, or specified for each marker +
-      * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.plot.html|plot(...)]]:​ //fmt// (see documentation) or ''​marker''​ and ''​markerfacecolor''/''​mfc''​ (and ''​markerfacecoloralt''/''​mfcalt''​ for dual color markers), ''​markersize'',​ ''​markeredgewidth''/''​mew'',​ ''​markeredgecolor'',​ ''​fillstyle''​ (''​full'',​ ''​None'',​ [[https://​matplotlib.org/​gallery/​lines_bars_and_markers/​marker_fillstyle_reference.html|other]]) +
-      * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.scatter.html|scatter(...)]]:​ ''​marker''​ (marker type), ''​c''​ (color), ''​s''​ (size), ''​linewidths''​ (linewidth of the marker edges), ''​edgecolors''​ +
-  * [[https://​matplotlib.org/​api/​colors_api.html|colors]] and colormaps +
-    * [[https://​matplotlib.org/​gallery/​color/​color_demo.html|color demo]] +
-    * [[https://​matplotlib.org/​examples/​color/​named_colors.html|named colors]] +
-    * Reverting the colors: add ''​_r''​ at the end of the colormap name +
-    * Number of colors in the //my_cmap// colormap (usually 256): ''​my_cmap.N''​ +
-      * Accessing the RGB color definition by index, from ''0'' to ''my_cmap.N - 1''. Note that the index will //saturate// below ''0'' and above ''my_cmap.N - 1''\\ <code>>>> my_cmap.N +
-256 +
->>> my_cmap(-1) # Same as my_cmap(0) +
-(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0) +
->>> my_cmap(0) +
-(0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0) +
->>> my_cmap(1) +
-(0.36186082276047676, 0.3185697808535179, 0.6394463667820068, 1.0) +
->>> my_cmap(255) +
-(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0) +
->>> my_cmap(256) # Same as my_cmap(255) +
-(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0) +
->>> my_cmap(257) # Same as my_cmap(255) +
-(0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0) +
-</code> +
-    * Special colormap colors +
-      * ''​my_cmap.set_bad(color='​k'​)'':​ color to be used for **masked** values +
-      * ''​my_cmap.set_over(color='​k'​)'':​ color to be used for //high out-of-range values// **if** ''​extend''​ is specified and is //'​both'//​ or  //'​max'//​. Default color is ''​my_cmap(my_cmap.N - 1)''​ +
-      * ''​my_cmap.set_under(color='​k'​)'':​ color to be used for //low out-of-range values// **if** ''​extend''​ is specified and is //'​both'//​ or  //'​min'//​. Default color is ''​my_cmap(0)''​ +
-  * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.figure.Figure.html#​matplotlib.figure.Figure.colorbar|colorbar]] +
-    * [[https://​matplotlib.org/​gallery/​subplots_axes_and_figures/​colorbar_placement.html|Placing colorbars demo]] +
-    * [[https://​matplotlib.org/​gallery/​images_contours_and_fields/​contourf_demo.html|contourf + colorbar demo]] +
-  * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.text.html|text(...)]] and [[https://​matplotlib.org/​tutorials/​text/​annotations.html|annotations]] +
-    * Some titles: +
-      * [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.figure.Figure.html#​matplotlib.figure.Figure.suptitle|Figure title]]: ''​my_figure.suptitle('​Figure title',​ x=xloc_in_normalized_coordinates,​ y=yloc_in_normalized_coordinates,​ ...)''​ +
-      * [[https://​matplotlib.org/​api/​axes_api.html#​axis-labels-title-and-legend|Axis Labels, title, and legend]]: ''​my_plot.set_title('​Plot title',​ ...)''​ +
-    * ''​fontsize'':​ size in points, or (better!) string specifying a relative size (''​xx-small'',​ ''​x-small'',​ ''​small'',​ ''​medium'',​ ''​large'',​ ''​x-large'',​ ''​xx-large''​) +
-    * [[https://​matplotlib.org/​api/​text_api.html#​matplotlib.text.Text|all the text properties]] +
-  * [[https://​matplotlib.org/​api/​pyplot_api.html#​matplotlib.pyplot.legend|legend(...)]] ([[https://​matplotlib.org/​examples/​pylab_examples/​legend_demo3.html|legend demo]], [[https://​matplotlib.org/​users/​legend_guide.html|advanced legend guide]]) +
-    * The legend will //show// the lines (or other objects) that were associated with a //label// with the ''​label=''​ keyword when creating/​updating a plot +
-      * If there are some elements of a plot that you do not want to associate with a legend (e.g. there are several lines with the same color and markers, but you want to plot the legend only once), do not specify a ''​label=''​ keyword for these elements, or add a ''​_''​ at the front of the label strings +
-    * The legend is positioned somewhere (that can be specified) **inside** the plot. In order to place a legend **outside** the plot, use the ''​bbox_to_anchor''​ parameter +
-      * the parameters of ''​bbox_to_anchor''​ are in normalized coordinates of the current (sub)plot:​ +
-        * ''​(0,​ 0)''​ is the lower left corner of the plot, and ''​(1,​ 1)''​ the upper right corner +
-        * ''​legend(... bbox_to_anchor=(1.05,​ 1.), loc='​upper left', ...)''​ will put the upper left corner of the legend slightly right (''​(1.05,​ 1.)''​) of the upper right corner (''​(1,​ 1)''​) of the plot +
-      * if the legend is outside of the plot, you have to **explicitly provide enough space for the legend on the page** +
-        * e.g. with [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.pyplot.subplots_adjust.html|subplots_adjust]],​ ''​plt.subplots_adjust(right=0.75)''​ will make all the plots use 75% on the left of the page, and leave 25% on the right for the legend +
-  * The [[https://​matplotlib.org/​api/​_as_gen/​matplotlib.figure.Figure.html|figure(...)]] and the associated methods +
-  * The [[https://​matplotlib.org/​api/​axes_api.html|axes]] and the associated methods +
-  * [[https://​matplotlib.org/​tutorials/​introductory/​customizing.html#​matplotlib-rcparams|matplotlib default config/​settings]] can be queried and updated +
-    * example: the default figure size (inches) is ''​mpl.rcParams['​figure.figsize'​]''​ (''​[6.4,​ 4.8]''​) +
-    * current settings'​ file:  ''​mpl.matplotlib_fname()''​ +
-  * [[https://​matplotlib.org/​api/​animation_api.html|Animations]] ([[https://​matplotlib.org/​gallery/​index.html#​animation|demo]])+
  
-==== Misc Matplotlib tricks ==== 
  
-  * Specifying the background color of a plot (e.g. when plotting a masked variable and you don't want the masked areas to be white) +==== netCDF4 ====
-    * ''#​ make the background dark gray (call this before ​the contourf)''​\\ ''​plt.gca().patch.set_color('​.25'​)''​\\ ''​plt.contourf(d)''​\\ ''​plt.show()''​ + 
-    ​* ​[[https://​stackoverflow.com/​questions/​9797520/masking-part-of-a-contourf-plot-in-matplotlib|trick source]]+[[http://​unidata.github.io/​netcdf4-python/​|netCDF4]] is a Python interface to the netCDF C library 
 + 
 + 
 +==== cdms2 ==== 
 + 
 +<note important>​ 
 +  * ''​cdms2''​ is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#​xarray|xarray]] and [[https://​xcdat.readthedocs.io/​|xCDAT]]** 
 + 
 +  * ''​cdms2''​ will [[https://​github.com/​CDAT/​cdms/​issues/​449|not be compatible with numpy after numpy 1.23.5]] :-( 
 +</​note>​ 
 + 
 +[[https://cdms.readthedocs.io/en/docstanya/|cdms2]] can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher-level interface than netCDF4. ''cdms2'' is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data.
 + 
 +How to get started: 
 +  - read [[http://​www.lsce.ipsl.fr/​Phocea/​file.php?​class=page&​file=5/​pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54 
 +    - the tutorial is in French (soooorry!)
 +    ​- you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) 
 +  - read the [[http://​cdms.readthedocs.io/​en/​docstanya/​index.html|official cdms documentation]] ​(link may change) 
 + 
 +===== Matplotlib ===== 
 + 
 +<note important>​ 
 +The full content of this //​matplotlib//​ section has been moved to\\ [[other:​python:​matplotlib_by_jyp|Working with matplotlib ​(JYP version)]]\\ after becoming too big to manage here 
 + 
 +\\ Note: [[other:python:maps_by_jyp|Plotting maps with matplotlib+cartopy]] (examples provided by JYP)
 +</​note>​ 
 + 
 +Summary: there are lots of python libraries that you can use for plotting, but Matplotlib has become a //de facto// standard 
 + 
 +Where: [[http://​matplotlib.org|Matplotlib web site]] 
 + 
 +Help on //stack overflow//: ​[[https://​stackoverflow.com/​questions/​tagged/​matplotlib|matplotlib help]]
  
 ===== Graphics related resources ===== ===== Graphics related resources =====
Line 344: Line 243:
     * See also: [[https://​www.datacamp.com/​community/​tutorials/​seaborn-python-tutorial|     * See also: [[https://​www.datacamp.com/​community/​tutorials/​seaborn-python-tutorial|
 Python Seaborn Tutorial For Beginners]] Python Seaborn Tutorial For Beginners]]
-  * Working with colors+  ​* Communicating/​displaying/​plotting your data (possibly for people not of your field): 
 +    * [[https://​uxknowledgebase.com/​introduction-to-designing-data-visualizations-part-1-31c056556133|Introduction to Designing Data Visualizations — Part 1]] 
 +    * [[https://​uxknowledgebase.com/​tables-other-charts-data-visualization-part-2-cfc582e4712c|Tables & Other Charts — Data Visualization Part 2]] 
 +    * [[https://​uxknowledgebase.com/​tables-other-charts-data-visualization-part-3-5bfab15ce525|Tables & Other Charts — Data Visualization Part 3]] 
 +  * **IPCC**-related //​stuff//​... 
 +    * [[https://​www.ipcc.ch/​site/​assets/​uploads/​2019/​04/​IPCC-visual-style-guide.pdf|IPCC Visual Style Guide for Authors]] 
 +    * [[https://​wg1.ipcc.ch/​sites/​default/​files/​documents/​ipcc_visual-identity_guidelines.pdf|A new assessment cycle,A new visual identity]] 
 +    * [[https://​link.springer.com/​article/​10.1007/​s10584-019-02537-z|Communication of IPCC visuals: IPCC authors’ views and assessments of visual complexity]] 
 +    * [[https://​www.carbonbrief.org/​guest-post-the-perils-of-counter-intuitive-design-in-ipcc-graphics|The perils of counter-intuitive design in IPCC graphics]] 
 +  ​* Working with **colors** 
 +    * Choosing specific colors: use [[https://​www.w3schools.com/​colors/​colors_names.asp|HTML color names]], the [[https://​www.w3schools.com/​colors/​colors_picker.asp|HTML color picker]], etc... 
 +    * **Do not use the outdated //rainbow// and //jet// colormaps!** 
 +      * [[https://​pjbartlein.github.io/​datagraphics/​index.html|The End of the Rainbow? ​ Color Schemes for Improved Data Graphics]] (Light and Bartlein, EOS 2004, including replies and comments) 
 +      * [[http://​colorspace.r-forge.r-project.org/​articles/​endrainbow.html|Somewhere over the Rainbow]] 
 +      * [[https://​www.nature.com/​articles/​s41467-020-19160-7|The misuse of colour in science communication]]
     * [[https://​matplotlib.org/​users/​colormaps.html|Choosing colormaps]]     * [[https://​matplotlib.org/​users/​colormaps.html|Choosing colormaps]]
-    * [[https://​matplotlib.org/​cmocean/​|Beautiful colormaps for oceanography: ​cmocean]]+    * [[https://​matplotlib.org/​cmocean/​|cmocean: ​Beautiful colormaps for oceanography]] 
 +    * [[https://​jiffyclub.github.io/​palettable/​|Palettable:​ Color palettes for Python]]
     * [[http://​colorbrewer2.org|ColorBrewer 2.0]] is a tool that can help you understand, and experiment with //​sequential//,​ //​diverging//​ and //​qualitative//​ colormaps     * [[http://​colorbrewer2.org|ColorBrewer 2.0]] is a tool that can help you understand, and experiment with //​sequential//,​ //​diverging//​ and //​qualitative//​ colormaps
 +    * The [[http://​hclwizard.org/​|hclwizard]] provides tools for manipulating and assessing colors and palettes based on the underlying ''​colorspace''​ software
 +    * NCL (NCAR Command Language) [[https://​www.ncl.ucar.edu/​Document/​Graphics/​color_table_gallery.shtml|Color table Gallery]]
 +    * JYP's favorite title: [[https://​www.researchgate.net/​publication/​220943662_The_Which_Blair_Project_A_Quick_Visual_Method_for_Evaluating_Perceptual_Color_Maps|The "Which Blair Project":​ A Quick Visual Method for Evaluating Perceptual Color Maps]]
  
  
 ===== Basemap ===== ===== Basemap =====
  
-<note warning>​Basemap is going to be slowly phased out, in favor of [[#​cartopy]]\\ More information in this:+<note warning>​Basemap is going to be slowly phased out, in favor of [[#cartopy_iris|cartopy]]\\ More information in this:
   * [[https://​github.com/​SciTools/​cartopy/​issues/​920|cartopy github issue]]   * [[https://​github.com/​SciTools/​cartopy/​issues/​920|cartopy github issue]]
   * [[https://​github.com/​matplotlib/​basemap/​issues/​267|basemap github issue]]   * [[https://​github.com/​matplotlib/​basemap/​issues/​267|basemap github issue]]
Line 372: Line 289:
 ===== Cartopy + Iris ===== ===== Cartopy + Iris =====
  
-Summary: ​//Cartopy is a Python package for advanced map generation with a simple matplotlib interface// ​and //Iris is a Python package for analysing and visualising ​meteorological and oceanographic ​data sets//+Summary: 
 +  * **Cartopy** is //a matplotlib-based Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses//
 +  * **Iris** is //a powerful, format-agnostic, community-driven Python package for analysing and visualising Earth science data//
  
-Where: [[http://​scitools.org.uk/​cartopy/​docs/​latest/​|Cartopy]] and [[http://​scitools.org.uk/iris/index.html|Iris]] web sites+Where: [[http://​scitools.org.uk/​cartopy/​docs/​latest/​|Cartopy]] and [[https://scitools-iris.readthedocs.io/en/stable/|Iris]] web sites
  
 Examples: Examples:
-  * [[http://​scitools.org.uk/​cartopy/​docs/​latest/​gallery.html|Gallery on the Cartopy web site]] +  * [[other:python:​maps_by_jyp|Examples provided by JYP]] 
-  * [[http://​scitools.org.uk/​iris/​docs/​latest/​gallery.html|Gallery on the Iris web site]] +  * Official gallery pages: ​[[https://​scitools.org.uk/​cartopy/​docs/​latest/​gallery/index.html|Cartopy]] [[https://scitools-iris.readthedocs.io/en/stable/generated/gallery/|Iris]]
-  * [[http://​scitools.org.uk/iris/docs/latest/examples/index.html|Examples on the Iris web site]]+
  
-Help on //stack overflow//: [[https://​stackoverflow.com/​questions/​tagged/​cartopy|cartopy ​help]]+Help on //stack overflow//: [[https://​stackoverflow.com/​questions/​tagged/​cartopy|Cartopy help]] - [[https://​stackoverflow.com/​questions/​tagged/​python-iris|Iris ​help]]
  
 ===== Maps and projections resources ===== ===== Maps and projections resources =====
Line 397: Line 315:
  
  
-===== 3D resources =====+===== 3D plots resources =====
  
   * [[https://​ipyvolume.readthedocs.io/​en/​latest/​|Ipyvolume]]   * [[https://​ipyvolume.readthedocs.io/​en/​latest/​|Ipyvolume]]
   * [[https://​zulko.wordpress.com/​2012/​09/​29/​animate-your-3d-plots-with-pythons-matplotlib/​|Animate your 3D plots with Python’s Matplotlib]]   * [[https://​zulko.wordpress.com/​2012/​09/​29/​animate-your-3d-plots-with-pythons-matplotlib/​|Animate your 3D plots with Python’s Matplotlib]]
   * [[https://​stackoverflow.com/​questions/​26796997/​how-to-get-vertical-z-axis-in-3d-surface-plot-of-matplotlib|How to get vertical Z axis in 3D surface plot of Matplotlib?​]]   * [[https://​stackoverflow.com/​questions/​26796997/​how-to-get-vertical-z-axis-in-3d-surface-plot-of-matplotlib|How to get vertical Z axis in 3D surface plot of Matplotlib?​]]
 +
 +===== Data analysis =====
 +
 +==== EDA (Exploratory Data Analysis) ? ====
 +
 +<note tip>
 +The //EDA concept// seems to apply to **time series** (and tabular data), which is not exactly the case of full climate model output data</​note>​
 +
 +  * [[https://​www.geeksforgeeks.org/​what-is-exploratory-data-analysis/​|What is Exploratory Data Analysis ?]]
 +    * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.//
 +
 +  * [[https://​medium.com/​codex/​automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#​ydata_profiling|YData Profiling]],​ [[#​d-tale|D-Tale]],​ [[#​sweetviz|sweetviz]],​ [[#​autoviz|AutoViz]])
 +
 +  * [[https://​www.geeksforgeeks.org/​exploratory-data-analysis-in-python/​|EDA in Python]]
 +
 +
 +==== Easy to use datasets ====
 +
 +If you need standard datasets for testing, examples, demos, ...
 +
 +  * [[https://​docs.xarray.dev/​en/​stable/​generated/​xarray.tutorial.load_dataset.html|Tutorial datasets]] from [[#​xarray|xarray]] (requires internet)
 +    * Example: [[https://​docs.xarray.dev/​en/​stable/​examples/​visualization_gallery.html|Using the 'air temperature'​ dataset]]
 +
 +  * [[https://​scikit-learn.org/​stable/​datasets.html|Toy,​ real-world and generated datasets]] from [[#​scikit-learn]]
 +    * Example: [[https://​lectures.scientific-python.org/​packages/​scikit-learn/​index.html#​a-simple-example-the-iris-dataset|using the '​iris'​ dataset]]
 +
 +  * [[https://​scikit-image.org/​docs/​stable/​api/​skimage.data.html|Test images and datasets]] from [[#​scikit-image]]
 +    * Example: [[https://​lectures.scientific-python.org/​packages/​scikit-image/​index.html#​data-types|Using the '​camera'​ dataset]]
 +
 +  * [[https://​esgf-node.ipsl.upmc.fr/​search/​cmip6-ipsl/​|CMIP6 data]] on ESGF
 +    * Example: ''orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc'':
 +      * [[http://​vesg.ipsl.upmc.fr/​thredds/​fileServer/​cmip6/​CMIP/​IPSL/​IPSL-CM6A-LR/​piControl/​r1i1p1f1/​fx/​orog/​gr/​v20200326/​orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc|HTTP]] download link
 +      * [[http://​vesg.ipsl.upmc.fr/​thredds/​dodsC/​cmip6/​CMIP/​IPSL/​IPSL-CM6A-LR/​piControl/​r1i1p1f1/​fx/​orog/​gr/​v20200326/​orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc.dods|OpenDAP]] download link
 +
 +  * [[https://​github.com/​xCDAT/​xcdat/​issues/​277|xCDAT test data GH discussion]]
 +
 +
 +==== Pandas ====
 +
 +Summary: //pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool//
 +
 +Where: [[http://​pandas.pydata.org|Pandas web site]]
 +
 +JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is **very convenient for processing tables in xlsx files** (or csv, etc...). You should at least have a quick look at:
 +
 +  * Some //Cheat Sheets//:
 +    - Basics: [[https://​github.com/​fralfaro/​DS-Cheat-Sheets/​blob/​main/​docs/​files/​pandas_cs.pdf|Pandas Basics Cheat Sheet]] (associated with the [[https://​www.datacamp.com/​cheat-sheet/​pandas-cheat-sheet-for-data-science-in-python#​python-for-data-science-cheat-sheet:​-pandas-basics-useth|Pandas basics]] //​datacamp//​ introduction page)
 +    - Intermediate:​ [[https://​github.com/​pandas-dev/​pandas/​blob/​main/​doc/​cheatsheet/​Pandas_Cheat_Sheet.pdf|Data Wrangling with pandas Cheat Sheet]]
 +  * Some tutorials:
 +    * [[http://​pandas.pydata.org/​docs/​user_guide/​10min.html|10 minutes to pandas]]
 +    * The [[https://​lectures.scientific-python.org/​packages/​statistics/​index.html|Statistics in Python]] tutorial that combines Pandas, [[#​statsmodels|statsmodels]] and [[http://​seaborn.pydata.org/​|Seaborn]]
 +    * More [[http://​pandas.pydata.org/​docs/​getting_started/​tutorials.html|Community tutorials]]...
 +
 +
 +==== statsmodels ====
 +
 +[[https://​www.statsmodels.org/​|statsmodels]] is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
 +
 +Note: check the example in the [[https://​lectures.scientific-python.org/​packages/​statistics/​index.html|Statistics in Python]] tutorial
 +
 +
 +==== scikit-learn ====
 +
 +[[http://​scikit-learn.org/​|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit–learn provides an easy-to-use,​ consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation
 +
 +Note: check the example in [[https://​lectures.scientific-python.org/​packages/​scikit-learn/​index.html|scikit-learn:​ machine learning in Python]]
 +
 +
 +==== scikit-image ====
 +
 +[[https://​scikit-image.org/​|scikit-image]] is a collection of algorithms for image processing in Python
 +
 +Note: check the example in [[https://​lectures.scientific-python.org/​packages/​scikit-image/​index.html|scikit-image:​ image processing]]
 +
 +
 +==== YData Profiling ====
 +
 +[[https://​docs.profiling.ydata.ai/​|YData Profiling]]:​ a leading package for data profiling, that automates and standardizes the generation of detailed reports, complete with statistics and visualizations.
 +
 +
 +==== D-Tale ====
 +
 +[[https://​github.com/​man-group/​dtale|D-Tale]] brings you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/​ipython terminals.
 +
 +
 +==== Sweetviz ====
 +
 +[[https://github.com/fbdesignpro/sweetviz|Sweetviz]] is a pandas-based Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.
 +
 +
 +==== AutoViz ====
 +
 +[[https://​github.com/​AutoViML/​AutoViz|AutoViz]]:​ the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code
 +
  
 =====  Data file formats =====  =====  Data file formats ===== 
  
-We list here some resources about non-NetCDF data formats that can be useful+  * We list below some resources about **non-NetCDF data formats** that can be useful 
 + 
 +  * Check the [[#​using_netcdf_files_with_python|Using NetCDF files with Python]] section otherwise 
 + 
 +==== The shelve package ==== 
 + 
 +The [[https://docs.python.org/3/library/shelve.html|built-in shelve package]] can easily be used for storing data (python objects like lists, dictionaries, numpy arrays that are not too big, ...) on disk and retrieving them later
 + 
 +Use case: 
 +  - Use a script to do the heavy data pre-processing and store the (intermediate) results in a file using ''shelve'', or update the results
 +  - Use another script for plotting the results stored with ''​shelve''​. This way you don't have to wait for the pre-processing step to finish each time you want to improve your plot(s)
  
 +Warning:
 +  * read the [[https://​docs.python.org/​3/​library/​shelve.html|documentation]] and the example carefully (it's quite small)
 +    * if you get the impression that the data is not saved correctly, re-read the parts about correctly updating the content of the shelve file
 +    * you should be able to store most python objects in a shelve file, but it is safer to make tests
 +  * do not forget to close the output file
 +  * if you are dealing with big arrays and want to avoid performance issues, you should use netCDF files for storing the intermediate results
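
The two-script use case above can be sketched as follows. This is only a minimal illustration: the file name ''results_demo'' and the keys ''mean_values'' and ''metadata'' are invented for the example, and the small list stands in for the real pre-processing results.

```python
import os
import shelve
import tempfile

# Hypothetical location of the shelve file (any writable path works)
shelf_path = os.path.join(tempfile.mkdtemp(), 'results_demo')

# --- Script 1: do the heavy pre-processing, then store the results ---
# The 'with' block closes the shelve file automatically
with shelve.open(shelf_path) as db:
    db['mean_values'] = [0.5 * i for i in range(5)]  # stand-in for real results
    db['metadata'] = {'units': 'K', 'source': 'demo'}

# --- Updating a stored object ---
# With the default writeback=False, changing db['mean_values'] in place
# is NOT saved: read the object, modify it, and assign it back to the key
with shelve.open(shelf_path) as db:
    values = db['mean_values']
    values.append(99.0)
    db['mean_values'] = values

# --- Script 2: read the results back (e.g. before plotting them) ---
with shelve.open(shelf_path, flag='r') as db:
    stored_values = db['mean_values']
    stored_metadata = db['metadata']

print(stored_values)
print(stored_metadata['units'])
```

In practice the two parts would live in separate scripts that share the same shelve file path, so the plotting script can be re-run as often as needed without repeating the pre-processing.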
 ==== json files ==== ==== json files ====
  
Line 413: Line 441:
 //json// files look basically like a **list of (nested) python dictionaries** that would have been dumped to a text file //json// files look basically like a **list of (nested) python dictionaries** that would have been dumped to a text file
  
-  * [[https://​docs.python.org/​2/​library/​json.html|json module]] documentation+  * [[https://​docs.python.org/​3/​library/​json.html|json module]] documentation
   * [[https://​realpython.com/​python-json/​|Working With JSON Data in Python]] tutorial   * [[https://​realpython.com/​python-json/​|Working With JSON Data in Python]] tutorial
   * example script: ''/​home/​users/​jypeter/​CDAT/​Progs/​Devel/​beaugendre/​nc2json.py''​   * example script: ''/​home/​users/​jypeter/​CDAT/​Progs/​Devel/​beaugendre/​nc2json.py''​
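
To make the //list of nested dictionaries// picture more concrete, here is a small round trip with the standard ''json'' module. The station records and the file name ''stations_demo.json'' are made up for this illustration and are not taken from the example script above.

```python
import json
import os
import tempfile

# Made-up records: a list of (nested) python dictionaries
stations = [
    {'name': 'site_A', 'position': {'lat': 48.7, 'lon': 2.2}, 'temps': [12.5, 13.1]},
    {'name': 'site_B', 'position': {'lat': -21.1, 'lon': 55.5}, 'temps': [24.0, 25.2]},
]

# Hypothetical output file in a temporary directory
json_path = os.path.join(tempfile.mkdtemp(), 'stations_demo.json')

# Dump the python objects to a text file (indent=2 keeps it human-readable)
with open(json_path, 'w') as f_out:
    json.dump(stations, f_out, indent=2)

# Load the file back: we get lists and dictionaries again
with open(json_path) as f_in:
    loaded = json.load(f_in)

print(loaded[1]['position']['lat'])
print(loaded == stations)
```

Keep in mind that json only knows about strings, numbers, booleans, ''null'', lists and dictionaries with string keys: tuples come back as lists, and numpy arrays have to be converted (e.g. to lists) before being dumped.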
Line 433: Line 461:
   * [[https://​github.com/​LibraryOfCongress/​bagger|Bagger]] (BagIt GUI)   * [[https://​github.com/​LibraryOfCongress/​bagger|Bagger]] (BagIt GUI)
   * [[https://​github.com/​LibraryOfCongress/​bagit-python|bagit-python]]   * [[https://​github.com/​LibraryOfCongress/​bagit-python|bagit-python]]
-===== Pandas ===== 
  
-Summary: //pandas is a library providing high-performance,​ easy-to-use data structures and data analysis tools//+==== Protocol Buffers ====
  
-Where: [[http://pandas.pydata.org|Pandas web site]]+//Protocol Buffers are (Google'​s) language-neutral,​ platform-neutral extensible mechanisms for serializing structured data//
  
-JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is **very convenient for processing tables in xlsx files** (or csv, etc...). You should at least have a quick look at: +  ​* https://protobuf.dev/ 
- +  * [[https://protobuf.dev/getting-started/pythontutorial/|Protocol Buffer Basics: Python]] 
-  * Some //Cheat Sheets// (in the following order): +    ''​mamba install protobuf''​
-    - Basics: [[http://​datacamp-community-prod.s3.amazonaws.com/​dbed353d-2757-4617-8206-8767ab379ab3|Pandas basics]] (associated with the [[https://www.datacamp.com/​community/​blog/​python-pandas-cheat-sheet|Pandas Cheat Sheet for Data Science in Python]] pandas introduction page) +
-    - Intermediate:​ [[https://​github.com/​pandas-dev/pandas/​tree/​master/​doc/​cheatsheet|github Pandas doc page]] +
-    - Advanced: the cheat sheet on the [[https://​www.enthought.com/​services/​training/​pandas-mastery-workshop/​|Enthought workshops advertising page]] +
-  ​* Some tutorials:​ +
-    ​* [[https://www.datacamp.com/community/​blog/​python-pandas-cheat-sheet|Pandas Cheat Sheet for Data Science in Python]] pandas introduction page +
-    * The [[http://www.scipy-lectures.org/​packages/​statistics/​index.html|Statistics in Python]] tutorial that combines Pandas, [[http://​statsmodels.sourceforge.net/​|Statsmodels]] and [[http://​seaborn.pydata.org/​|Seaborn]] +
- +
-===== Scipy Lecture Notes ===== +
- +
-Summary: //One document to learn numerics, science, and data with Python// +
- +
-Where: [[http://​www.scipy-lectures.org/​_downloads/​ScipyLectures-simple.pdf|pdf]] - [[http://​www.scipy-lectures.org/​|html]] +
- +
-This is **a really nice and useful document** that is regularly updated and used for the [[https://​www.euroscipy.org/​|EuroScipy]] tutorials. You will learn more things about python, numpy and matplotlib, debugging and optimizing scripts, and also learn about using python for statistics, image processing, machine learning, washing dishes (this is just to check if you have read this page), etc...+
  
 ===== Quick Reference and cheat sheets ===== ===== Quick Reference and cheat sheets =====
Line 465: Line 478:
  
   * [[https://​www.cheatography.com/​weidadeyue/​cheat-sheets/​jupyter-notebook/​pdf_bw/​|Jupyter Notebook Keyboard Shortcuts]]   * [[https://​www.cheatography.com/​weidadeyue/​cheat-sheets/​jupyter-notebook/​pdf_bw/​|Jupyter Notebook Keyboard Shortcuts]]
 +
 +===== Miscellaneous Python stuff =====
 +
 +Check the page about [[other:​python:​misc_by_jyp|useful python stuff that has not been sorted yet]]
  
 ===== Misc tutorials ===== ===== Misc tutorials =====
Line 496: Line 513:
  
 Depending on the distribution,​ the editor and the programming environment you use, you may have access to a graphical version of the debugger. UV-CDAT users can use ''​pydebug my_script.py''​ Depending on the distribution,​ the editor and the programming environment you use, you may have access to a graphical version of the debugger. UV-CDAT users can use ''​pydebug my_script.py''​
 +
 +===== jupyter and notebook stuff =====
 +
 +FIXME Misc notes, resources and links to organize later
 +
 +  * [[https://​beta.jupyterbook.org/​|jupyter {book}]]: Jupyter Book is an open source project for building beautiful, publication-quality books and documents from computational material.
  
 ===== Using a Python IDE ===== ===== Using a Python IDE =====
Line 538: Line 561:
 ===== Python 2.7 vs Python 3 ===== ===== Python 2.7 vs Python 3 =====
  
-The official [[https://​docs.python.org/​2.7/​howto/​pyporting.html|Porting Python 2 Code to Python 3]] page gives the required information to make the transition from python 2 to python ​3. It is still safe to use Python 2.7, so there is no rush to change to Python ​3.+It is still safe to use Python 2.7, but **you should consider upgrading to Python 3**, unless some key modules you need are not compatible (yet) with Python 3 
 + 
 +You should start writing code that will, when possible, work both in Python 2 and Python 3 
 + 
 +Some interesting reading: 
 + 
 +  * [[https://​docs.python.org/​3/​whatsnew/​3.0.html|What’s New In Python 3.0]].\\ Examples: 
 +    * ''​print''​ is now a function. Use ''​print('​Hello'​)''​ 
 +    * You cannot test a difference with ''<>''​ any longer! Use ''​!=''​ 
 + 
 +  * The official [[https://​docs.python.org/​2.7/​howto/​pyporting.html|Porting Python 2 Code to Python 3]] page gives the required information to make the transition from python 2 to python 3. 
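
As a rough sketch of code that works in both Python 2.7 and Python 3, the lines below only use constructs that behave the same way in both versions. The variable names are invented for the example.

```python
import sys

# print() with a single argument behaves the same in Python 2.7 and Python 3
print('Hello')

# Use '!=' to test a difference ('<>' is a syntax error in Python 3)
a, b = 1, 2
values_differ = (a != b)

# '//' is floor division in both versions
floor_result = 7 // 2

# Convert one operand to float to get a true division in both versions
# (7 / 2 gives 3 in Python 2.7 and 3.5 in Python 3)
true_result = 7 / float(2)

# Branch on the interpreter version only when it is really needed
running_python3 = sys.version_info[0] >= 3
print('Python %d, values differ: %s' % (sys.version_info[0], values_differ))
```

Another common approach is to put ''from %%__future__%% import print_function, division'' at the very top of a script, which makes Python 2.7 use the Python 3 behavior for ''print'' and ''/''.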
  
 ===== What now? ===== ===== What now? =====
  
 You can do a lot more with python! But if you have read at least a part of this page, you should be able to find and use the modules you need. Make sure you do not reinvent the wheel! Use existing packages when possible, and make sure to report bugs or errors in the documentations when you find some You can do a lot more with python! But if you have read at least a part of this page, you should be able to find and use the modules you need. Make sure you do not reinvent the wheel! Use existing packages when possible, and make sure to report bugs or errors in the documentations when you find some
 +
 +
 +===== Out-of-date stuff =====
 +
 +
 +==== CDAT-related resources ====
 +
 +Some links, in case they can't be found easily on the [[https://​cdat.llnl.gov|CDAT]] web site...
 +
 +  * [[https://​cdat.llnl.gov/​tutorials.html|Tutorials in ipython notebooks]]
 +  * [[http://​cdat-vcs.readthedocs.io/​en/​latest/​|VCS:​ Visualization Control System]]
 +    * [[https://​github.com/​CDAT/​vcs/​issues/​238|Colormaps in vcs examples]]
 +  * [[https://​github.com/​CDAT/​cdat-site/​blob/​master/​eztemplate.md|EzTemplate Documentation]]
 +
  
 /* standard page footer */ /* standard page footer */
other/python/jyp_steps.1567157785.txt.gz · Last modified: 2019/08/30 09:36 by jypeter