Differences

--- other:python:jyp_steps [2023/12/15 15:16] – Added the EDA section jypeter
+++ other:python:jyp_steps [2023/12/15 16:40] – [Using NetCDF files with Python] Rewrote the beginning of the section jypeter
@@ Line 170: / Line 170: @@
 </note>
-  * There is a good chance that your input array data will be stored in a  [[other:newppl:starting#netcdf_and_related_conventions|NetCDF]] file.
-  * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to
+==== What is NetCDF? ====
-==== cdms2 ====
+  * If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file!
-Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda)]]. When you can use cdms2, you also have access to //cdtime//, that is very useful for handling time axis data.
+  * Read the [[other:newppl:starting#netcdf_and_related_conventions|NetCDF and related Conventions]] section for more information
-How to get started:
-  - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54
-    - the tutorial is in French (soooorry!)
-    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data)
-  - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change)
+  * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|Python distribution]] you have access to
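As an illustration of one of these ways, here is a minimal sketch that writes a small NetCDF file and reads it back with [[#xarray|xarray]]. It assumes that ''xarray'', ''pandas'' and a NetCDF backend (''netCDF4'' or ''scipy'') are installed; the variable name ''tas'', the file name and the data values are made up for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Made-up example data: a small (time, lat, lon) temperature field
times = pd.date_range("2000-01-01", periods=3, freq="D")
lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
tas = xr.DataArray(
    np.arange(3 * 3 * 4, dtype="float32").reshape(3, 3, 4),
    coords={"time": times, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
    attrs={"units": "K", "long_name": "near-surface air temperature"},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_tas.nc")  # hypothetical file name
    # Write the dataset to NetCDF (xarray uses whichever backend is available)
    tas.to_dataset().to_netcdf(path)

    # Read it back: the variable values are only loaded when needed
    with xr.open_dataset(path) as ds:
        var = ds["tas"]
        dims, units = var.dims, var.attrs["units"]
        # Select by coordinate values instead of integer indices
        point_values = var.sel(lat=0.0, lon=90.0).values
        # Reduce along a named dimension and load the result in memory
        time_mean = var.mean(dim="time").load()

print(dims, units)
print(point_values)
print(time_mean.sel(lat=0.0, lon=90.0).item())  # 17.0
```

Selecting and reducing with dimension //names// (''time'', ''lat'', ''lon'') rather than axis numbers is the main convenience compared to working on raw arrays.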
 ==== xarray ====
@@ Line 194: / Line 188: @@
   * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]]
-  * [[https://xcdat.readthedocs.io/|xcdat]]: xarray extended with Climate Data Analysis Tools
+  * **[[https://xcdat.readthedocs.io/|xCDAT]]: ''xarray'' extended with Climate Data Analysis Tools**
   * [[https://xoa.readthedocs.io/en/latest/|xoa]]: xarray-based ocean analysis library
@@ Line 200: / Line 194: @@
   * [[https://uxarray.readthedocs.io/|uxarray]]: provides xarray-styled functionality for unstructured grid datasets following [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]]
+==== cdms2 ====
+<note important>''cdms2'' is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#xarray|xarray]] and [[https://xcdat.readthedocs.io/|xCDAT]]**</note>
+Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher-level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data.
+How to get started:
+  - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54
+    - the tutorial is in French (soooorry!)
+    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data)
+  - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change)
@@ Line 326: / Line 332: @@
     * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.//
-  * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below (''ydata-profiling'', ''D-Tale'', ''sweetviz'', ''autoviz'')
+  * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#ydata_profiling|YData Profiling]], [[#d-tale|D-Tale]], [[#sweetviz|sweetviz]], [[#autoviz|AutoViz]])
   * [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]]
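Before trying the automated EDA libraries, a few [[#pandas|pandas]] calls already cover the basic steps mentioned in the definition above (summary statistics, missing values, outliers, relationships between variables). A minimal sketch, assuming ''pandas'' and ''numpy'' are installed; the station data is synthetic and the 3-standard-deviation outlier rule is just an arbitrary example threshold.

```python
import numpy as np
import pandas as pd

# Synthetic daily data for one year at a made-up station
rng = np.random.default_rng(42)
n = 365
df = pd.DataFrame(
    {
        "temp": 15 + 10 * np.sin(2 * np.pi * np.arange(n) / n) + rng.normal(0, 2, n),
        "precip": rng.gamma(shape=0.8, scale=3.0, size=n),
    },
    index=pd.date_range("2001-01-01", periods=n, freq="D"),
)
df.iloc[10, 1] = np.nan  # simulate one missing precipitation value

# Summary statistics: count, mean, std, min, quartiles, max per column
summary = df.describe()
print(summary)

# Number of missing values per column
missing = df.isna().sum()
print(missing)

# Relationships between variables (Pearson correlation, NaNs are skipped)
corr = df.corr()
print(corr)

# Crude outlier check: values more than 3 standard deviations from the mean
zscores = (df - df.mean()) / df.std()
outliers = (zscores.abs() > 3).sum()
print(outliers)
```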
 ==== Easy to use datasets ====
@@ Line 348: / Line 356: @@
   * [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GH discussion]]
 ==== Pandas ====
@@ Line 370: / Line 380: @@
 Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial
+==== scikit-learn ====
+[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation.
+Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
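The consistent interface of scikit-learn means that the estimators share the same ''fit'' / ''predict'' / ''score'' methods, so switching from one model to another is mostly a one-line change. A minimal sketch, assuming ''scikit-learn'' and ''numpy'' are installed; the linear synthetic data and the choice of the two models are arbitrary examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: y = 3 x + 2 plus some Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))  # scikit-learn expects a 2D (n_samples, n_features) array
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Keep a quarter of the samples aside to evaluate the models
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)):
    model.fit(X_train, y_train)  # same training call for every estimator
    # score() returns the coefficient of determination R^2 for regressors
    scores[type(model).__name__] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```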
+==== scikit-image ====
+[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python
+Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]
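A minimal sketch of a typical ''scikit-image'' workflow (smoothing, automatic thresholding, labeling the connected regions), assuming ''scikit-image'' and ''numpy'' are installed. The test image is synthetic (a bright square on a noisy background), and ''sigma=1'' is an arbitrary choice.

```python
import numpy as np
from skimage import filters, measure

# Synthetic test image: a bright 20x20 square on a dark, noisy background
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[20:40, 20:40] = 1.0
image += rng.normal(0, 0.1, image.shape)

# Reduce the noise with a Gaussian filter
smoothed = filters.gaussian(image, sigma=1)

# Choose a threshold automatically with Otsu's method and build a binary mask
threshold = filters.threshold_otsu(smoothed)
mask = smoothed > threshold

# Label the connected regions of the mask and measure them
labels = measure.label(mask)
n_regions = int(labels.max())
area = int(mask.sum())
print(f"threshold = {threshold:.2f}, regions = {n_regions}, area = {area} pixels")
```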
@@ Line 390: / Line 414: @@
 [[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically visualize any dataset, of any size, with a single line of code
-==== scikit-learn ====
-[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit–learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation
-Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
+=====  Data file formats =====
-==== scikit-image ====
-[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python
+  * We list below some resources about **non-NetCDF data formats** that can be useful
-Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]
-=====  Data file formats =====
-We list here some resources about non-NetCDF data formats that can be useful
+  * For NetCDF files, check the [[#using_netcdf_files_with_python|Using NetCDF files with Python]] section instead
 ==== The shelve package ====