other:python:jyp_steps [2023/12/14 09:42] jypeter: Moved packages to the new Data analysis section, and updated a bit
other:python:jyp_steps [2023/12/15 15:56] jypeter: Reorganized the NetCDF section
===== Using NetCDF files with Python =====
==== What is NetCDF? ====

  * If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file!
  * Read the [[other:newppl:starting#netcdf_and_related_conventions|NetCDF and related Conventions]] page for more information
  * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to
==== CliMAF and C-ESM-EP ====

People using **//CMIPn// and model data on the IPSL servers** can easily search and process NetCDF files using:

  * the [[https://climaf.readthedocs.io/|Climate Model Assessment Framework (CliMAF)]] environment
  * the [[https://github.com/jservonnat/C-ESM-EP/wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]]
==== xarray ====
[[https://docs.xarray.dev/|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files.
=== Some xarray related resources ===
Note: more packages (than listed below) may be listed on the [[other:uvcdat:cdat_conda:cdat_8_2_1#extra_packages_list|Extra packages list]] page

  * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]]
  * **[[https://xcdat.readthedocs.io/|xCDAT]]: ''xarray'' extended with Climate Data Analysis Tools**
  * [[https://xoa.readthedocs.io/en/latest/|xoa]]: an xarray-based ocean analysis library
  * [[https://uxarray.readthedocs.io/|uxarray]]: provides xarray-style functionality for unstructured grid datasets following the [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]]
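To make the label-based workflow concrete, here is a minimal, self-contained sketch (the ''tas'' variable name, coordinates and values below are invented for the demo; with a real file you would start from ''xr.open_dataset("myfile.nc")'' instead of building the dataset in memory):

```python
import numpy as np
import xarray as xr

# Build a small synthetic dataset mimicking model output:
# a "tas" variable on (time, lat, lon) axes
times = np.arange(4)
lats = np.array([-45.0, 0.0, 45.0])
lons = np.array([0.0, 90.0, 180.0, 270.0])
tas = xr.DataArray(
    15.0 + np.random.default_rng(0).normal(size=(4, 3, 4)),
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
    name="tas",
    attrs={"units": "degC"},
)
ds = xr.Dataset({"tas": tas})

# Label-based selection and reduction, no manual index bookkeeping
equator = ds["tas"].sel(lat=0.0)   # pick a latitude by coordinate value
clim = ds["tas"].mean(dim="time")  # time mean, keeps the lat/lon axes
print(clim.dims, equator.shape)
```

The point of xarray is that selections and reductions are expressed with coordinate names and values (''sel'', ''mean(dim=...)'') rather than positional indices.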
- | |||
==== netCDF4 ====
[[http://unidata.github.io/netcdf4-python/|netCDF4]] is a Python interface to the netCDF C library, and is available in most Python distributions.
==== cdms2 ====

<note important>
  * ''cdms2'' is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#xarray|xarray]] and [[https://xcdat.readthedocs.io/|xCDAT]]**
  * ''cdms2'' will [[https://github.com/CDAT/cdms/issues/449|not be compatible with numpy after numpy 1.23.5]] :-(
</note>

[[https://cdms.readthedocs.io/en/docstanya/|cdms2]] can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher-level interface than netCDF4. ''cdms2'' is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use ''cdms2'', you also have access to //cdtime//, which is very useful for handling time axis data.
How to get started:
  - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54
    - the tutorial is in French (soooorry!)
    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data)
  - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change)
===== CDAT-related resources =====
===== Data analysis =====
+ | |||
+ | ==== EDA (Exploratory Data Analysis) ? ==== | ||
+ | |||
+ | <note tip> | ||
+ | The //EDA concept// seems to apply to **time series** (and tabular data), which is not exactly the case of full climate model output data</note> | ||
+ | |||
+ | * [[https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/|What is Exploratory Data Analysis ?]] | ||
+ | * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.// | ||
+ | |||
+ | * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#ydata_profiling|YData Profiling]], [[#d-tale|D-Tale]], [[#sweetviz|sweetviz]], [[#autoviz|AutoViz]]) | ||
+ | |||
+ | * [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]] | ||
+ | |||
+ | |||
==== Easy to use datasets ====

If you need standard datasets for testing, examples, demos, ...

  * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|Tutorial datasets]] from [[#xarray|xarray]] (requires internet access)
    * Example: [[https://docs.xarray.dev/en/stable/examples/visualization_gallery.html|Using the 'air temperature' dataset]]
  * [[https://scikit-learn.org/stable/datasets.html|Toy, real-world and generated datasets]] from [[#scikit-learn|scikit-learn]]
    * Example: [[https://lectures.scientific-python.org/packages/scikit-learn/index.html#a-simple-example-the-iris-dataset|Using the 'iris' dataset]]
  * [[https://scikit-image.org/docs/stable/api/skimage.data.html|Test images and datasets]] from [[#scikit-image|scikit-image]]
    * Example: [[https://lectures.scientific-python.org/packages/scikit-image/index.html#data-types|Using the 'camera' dataset]]
  * [[https://esgf-node.ipsl.upmc.fr/search/cmip6-ipsl/|CMIP6 data]] on ESGF
    * Example: ''orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc'':
      * [[http://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc|HTTP]] download link
      * [[http://vesg.ipsl.upmc.fr/thredds/dodsC/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc.dods|OpenDAP]] download link
  * [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GH discussion]]
JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is **very convenient for processing tables in xlsx files** (or csv, etc.). You should at least have a quick look at:
  * Some //Cheat Sheets//:
    - Basics: [[https://github.com/fralfaro/DS-Cheat-Sheets/blob/main/docs/files/pandas_cs.pdf|Pandas Basics Cheat Sheet]] (associated with the [[https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-for-data-science-in-python#python-for-data-science-cheat-sheet:-pandas-basics-useth|Pandas basics]] //datacamp// introduction page)
    - Intermediate: [[https://github.com/pandas-dev/pandas/blob/main/doc/cheatsheet/Pandas_Cheat_Sheet.pdf|Data Wrangling with pandas Cheat Sheet]]
  * Some tutorials:
    * [[http://pandas.pydata.org/docs/user_guide/10min.html|10 minutes to pandas]]
    * The [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial that combines pandas, [[#statsmodels|statsmodels]] and [[http://seaborn.pydata.org/|Seaborn]]
    * More [[http://pandas.pydata.org/docs/getting_started/tutorials.html|Community tutorials]]...
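As a small taste of the "tables without custom code" point above, here is a sketch of aggregating and reshaping a tiny table (the data is made up; in practice it would come from ''pd.read_excel()'' or ''pd.read_csv()''):

```python
import pandas as pd

# A tiny long-format table: temperature per station and year
df = pd.DataFrame(
    {
        "station": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "t2m": [14.2, 14.8, 9.1, 9.5],
    }
)

# Aggregate: mean temperature per station
mean_by_station = df.groupby("station")["t2m"].mean()

# Reshape: one column per station, one row per year
wide = df.pivot(index="year", columns="station", values="t2m")
print(mean_by_station)
print(wide)
```

''groupby'' and ''pivot'' replace the nested loops one would otherwise write by hand for this kind of table processing.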
==== statsmodels ====
[[https://www.statsmodels.org/|statsmodels]] is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.

Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial
==== scikit-learn ====
[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and one of the most widely used tools for supervised and unsupervised learning. scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation.

Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
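A minimal supervised-learning round trip on the bundled iris dataset (the choice of a k-nearest-neighbors classifier and a 25% test split are arbitrary here, just to show the consistent ''fit''/''score'' interface):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled iris dataset (150 flowers, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# Hold out a test set so the evaluation is honest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn estimator exposes the same fit/predict/score API
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
score = clf.score(X_test, y_test)  # accuracy on the held-out data
print(f"accuracy: {score:.2f}")
```

Swapping in another model (e.g. a decision tree or logistic regression) changes only the constructor line; the rest of the code stays identical, which is the main appeal of the library.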
[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python
+ | |||
+ | Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]] | ||
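A minimal scikit-image sketch (a synthetic test image is used here instead of the bundled sample images, so the example stays self-contained):

```python
import numpy as np
from skimage import filters

# Synthetic test image: a bright square on a black background
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0

# Sobel edge detection: the gradient magnitude is non-zero
# only along the edges of the square
edges = filters.sobel(img)
print(edges.shape, edges.max() > 0)
```

In scikit-image, images are plain numpy arrays, so its filters compose directly with numpy/scipy operations.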
+ | |||
+ | |||
+ | ==== YData Profiling ==== | ||
+ | |||
+ | [[https://docs.profiling.ydata.ai/|YData Profiling]]: a leading package for data profiling, that automates and standardizes the generation of detailed reports, complete with statistics and visualizations. | ||
+ | |||
+ | |||
+ | ==== D-Tale ==== | ||
+ | |||
+ | [[https://github.com/man-group/dtale|D-Tale]] brings you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. | ||
+ | |||
+ | |||
+ | ==== Sweetviz ==== | ||
+ | |||
+ | [[https://github.com/fbdesignpro/sweetviz|Sweetviz]] is pandas based Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. | ||
+ | |||
+ | |||
+ | ==== AutoViz ==== | ||
+ | |||
+ | [[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code | ||
===== Data file formats =====
  * We list below some resources about **non-NetCDF data formats** that can be useful
  * Otherwise, check the [[#using_netcdf_files_with_python|Using NetCDF files with Python]] section
==== The shelve package ====
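''shelve'' is part of the Python standard library and provides a persistent, dict-like store for arbitrary picklable objects. A minimal sketch (the file name and stored keys are placeholders for the demo):

```python
import os
import shelve
import tempfile

# Placeholder path; the backend may add its own extension (e.g. .db)
path = os.path.join(tempfile.mkdtemp(), "mydata")

# Store any picklable object under a string key
with shelve.open(path) as db:
    db["params"] = {"tas_offset": 273.15, "runs": [1, 2, 3]}

# Reopen later (possibly from another script) and read it back
with shelve.open(path) as db:
    params = db["params"]
print(params["runs"])
```

This is convenient for caching intermediate analysis results between script runs, without defining a file format yourself.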