other:python:jyp_steps

Differences

This shows you the differences between two versions of the page.

other:python:jyp_steps [2023/12/15 14:16]
jypeter Added the EDA section
other:python:jyp_steps [2023/12/15 15:40]
jypeter [Using NetCDF files with Python] Rewrote the beginning of the section
Line 170: Line 170:
 </note>

-  * There is a good chance that your input array data will be stored in a  [[other:newppl:starting#netcdf_and_related_conventions|NetCDF]] file.

-  * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to
+==== What is NetCDF? ====

-==== cdms2 ====
+  * If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file!

-Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda)]]. When you can use cdms2, you also have access to //cdtime//, that is very useful for handling time axis data.
+  * Read the [[other:newppl:starting#netcdf_and_related_conventions|NetCDF and related Conventions]] for more information
-
-How to get started:
-  - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54
-    - the tutorial is in French (soooorry!)
-    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data)
-  - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change)

+  * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to
 ==== xarray ====
  
Line 194: Line 188:
   * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]]

-  * [[https://xcdat.readthedocs.io/|xcdat]]: xarray extended with Climate Data Analysis Tools
+  * **[[https://xcdat.readthedocs.io/|xCDAT]]: ''xarray'' extended with Climate Data Analysis Tools**

   * [[https://xoa.readthedocs.io/en/latest/|xoa]]: xarray-based ocean analysis library
Line 200: Line 194:
   * [[https://uxarray.readthedocs.io/|uxarray]]: provide xarray styled functionality for unstructured grid datasets following [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]]

+
+==== cdms2 ====
+
+<note important>''cdms2'' is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#xarray|xarray]] and [[https://xcdat.readthedocs.io/|xCDAT]]**</note>
+
+Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda)]]. When you can use cdms2, you also have access to //cdtime//, that is very useful for handling time axis data.
+
+How to get started:
+  - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54
+    - the tutorial is in French (soooorry!)
+    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data)
+  - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change)
  
  
Line 326: Line 332:
     * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.//

-  * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below (''ydata-profiling'', ''D-Tale'', ''sweetviz'', ''autoviz'')
+  * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#ydata_profiling|YData Profiling]], [[#d-tale|D-Tale]], [[#sweetviz|sweetviz]], [[#autoviz|AutoViz]])

   * [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]]
+
+
 ==== Easy to use datasets ====
  
Line 348: Line 356:
  
   * [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GH discussion]]
+
+
 ==== Pandas ====
  
Line 370: Line 380:
  
 Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial
+
+
+==== scikit-learn ====
+
+[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit–learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation
+
+Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
+
+
+==== scikit-image ====
+
+[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python
+
+Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]
  
  
Line 390: Line 414:
  
 [[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code
-==== scikit-learn ====

-[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. Scikit–learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation

-Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
+=====  Data file formats =====
-==== scikit-image ====

-[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python
+  * We list below some resources about **non-NetCDF data formats** that can be useful
-
-Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]
-
-
-=====  Data file formats =====

-We list here some resources about non-NetCDF data formats that can be useful
+  * Check the [[#using_netcdf_files_with_python|Using NetCDF files with Python]] section otherwise

 ==== The shelve package ====
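
For readers unfamiliar with the ''shelve'' package named in the last context heading: it is part of the Python standard library and stores pickled Python objects under string keys in a small on-disk database. The following is a minimal sketch that is not taken from the wiki page; the function names (''save_results'', ''load_result''), the file name ''demo_shelf'' and the values are hypothetical and only serve as an illustration.

```python
import os
import shelve
import tempfile


def save_results(db_path, results):
    """Store each item of a dict under its own string key in a shelf file."""
    with shelve.open(db_path) as db:
        for key, value in results.items():
            db[key] = value


def load_result(db_path, key):
    """Read back a single stored object by its key."""
    with shelve.open(db_path) as db:
        return db[key]


# Hypothetical data, invented for this sketch
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_shelf")

save_results(demo_path, {"station": "demo", "temps": [12.5, 13.1, 11.8]})
print(load_result(demo_path, "temps"))
```

A shelf suits small, Python-only intermediate results. Because values are pickled, the file is not a portable exchange format, and it should only be opened when it comes from a trusted source.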
other/python/jyp_steps.txt · Last modified: 2024/03/07 10:15 by jypeter