other:python:jyp_steps [2023/12/15 15:16] – Added the EDA section jypeter | other:python:jyp_steps [2023/12/15 16:56] – Reorganized the NetCDF section jypeter |
---|
===== Using NetCDF files with Python ===== | ===== Using NetCDF files with Python ===== |
| |
<note tip>People using CMIPn and model data on the IPSL servers can easily search and process NetCDF files using: | |
* the [[https://climaf.readthedocs.io/|Climate Model Assessment Framework (CliMAF)]] environment | |
* and the [[https://github.com/jservonnat/C-ESM-EP/wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]] | |
</note> | |
| |
* There is a good chance that your input array data will be stored in a [[other:newppl:starting#netcdf_and_related_conventions|NetCDF]] file. | ==== What is NetCDF? ==== |
| |
| * If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file! |
| |
| * Read the [[other:newppl:starting#netcdf_and_related_conventions|NetCDF and related Conventions]] for more information |
| |
* There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to | * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to |
| |
==== cdms2 ==== | |
| |
Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data. | ==== CliMAF and C-ESM-EP ==== |
| |
| People using **//CMIPn// and model data on the IPSL servers** can easily search and process NetCDF files using: |
| |
| * the [[https://climaf.readthedocs.io/|Climate Model Assessment Framework (CliMAF)]] environment |
| |
| * and the [[https://github.com/jservonnat/C-ESM-EP/wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]] |
| |
How to get started: | |
- read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54 | |
- the tutorial is in French (soooorry!) | |
- you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) | |
- read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change) | |
| |
==== xarray ==== | ==== xarray ==== |
| |
Summary: [[https://docs.xarray.dev/|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files | [[https://docs.xarray.dev/|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files |
| |
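As a rough illustration of what "labelled multi-dimensional arrays" means in practice, here is a minimal sketch that builds a small array in memory and selects data by coordinate value rather than by integer index. The variable name ''tas'', the grid and the values are made up for the example; with a real NetCDF file you would typically start from ''xr.open_dataset()'' instead.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D field on a tiny lat/lon grid (made-up values)
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
values = np.arange(12, dtype=float).reshape(3, 4)

tas = xr.DataArray(
    values,
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="tas",
    attrs={"units": "K"},
)

# Select by coordinate label instead of integer position
equator = tas.sel(lat=0.0)

# Reduce along a named dimension: the result keeps only the 'lat' dimension
zonal_mean = tas.mean(dim="lon")

print(equator.values)
print(zonal_mean.values)
```

Working with dimension names (''lat'', ''lon'') rather than axis numbers is the main design choice here: the code stays readable even when the dimension order of the input data changes.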
=== Some xarray related resources === | === Some xarray related resources === |
* [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]] | * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]] |
| |
* [[https://xcdat.readthedocs.io/|xcdat]]: xarray extended with Climate Data Analysis Tools | * **[[https://xcdat.readthedocs.io/|xCDAT]]: ''xarray'' extended with Climate Data Analysis Tools** |
| |
* [[https://xoa.readthedocs.io/en/latest/|xoa]]: xarray-based ocean analysis library | * [[https://xoa.readthedocs.io/en/latest/|xoa]]: xarray-based ocean analysis library |
| |
* [[https://uxarray.readthedocs.io/|uxarray]]: provides xarray-styled functionality for unstructured grid datasets following [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]] | * [[https://uxarray.readthedocs.io/|uxarray]]: provides xarray-styled functionality for unstructured grid datasets following [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]] |
| |
| |
| |
==== netCDF4 ==== | ==== netCDF4 ==== |
| |
Summary: //netCDF4 can read/write netCDF files and is available in most python distributions// | [[http://unidata.github.io/netcdf4-python/|netCDF4]] is a Python interface to the netCDF C library |
| |
| |
| ==== cdms2 ==== |
| |
| <note important> |
| * ''cdms2'' is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#xarray|xarray]] and [[https://xcdat.readthedocs.io/|xCDAT]]** |
| |
| * ''cdms2'' will [[https://github.com/CDAT/cdms/issues/449|not be compatible with numpy after numpy 1.23.5]] :-( |
| </note> |
| |
| [[https://cdms.readthedocs.io/en/docstanya/|cdms2]] can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. ''cdms2'' is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use cdms2, you also have access to //cdtime//, which is very useful for handling time axis data. |
| |
| How to get started: |
| - read [[http://www.lsce.ipsl.fr/Phocea/file.php?class=page&file=5/pythonCDAT_jyp_2sur2_070306.pdf|JYP's cdms tutorial]], starting at page 54 |
| - the tutorial is in French (soooorry!) |
| - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) |
| - read the [[http://cdms.readthedocs.io/en/docstanya/index.html|official cdms documentation]] (link may change) |
| |
Where: [[http://unidata.github.io/netcdf4-python/]] | |
| |
===== CDAT-related resources ===== | ===== CDAT-related resources ===== |
* //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.// | * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.// |
| |
* [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below (''ydata-profiling'', ''D-Tale'', ''sweetviz'', ''autoviz'') | * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#ydata_profiling|YData Profiling]], [[#d-tale|D-Tale]], [[#sweetviz|sweetviz]], [[#autoviz|AutoViz]]) |
| |
* [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]] | * [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]] |
| |
| |
==== Easy to use datasets ==== | ==== Easy to use datasets ==== |
| |
| |
* [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GH discussion]] | * [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GH discussion]] |
| |
| |
==== Pandas ==== | ==== Pandas ==== |
| |
| |
Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial | Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial |
| |
| |
| ==== scikit-learn ==== |
| |
| [[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation |
| |
| Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]] |
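The "consistent interface" mentioned above essentially means that models share the same ''fit'' / ''predict'' / ''score'' methods. A minimal sketch, assuming made-up synthetic data (a noise-free straight line) and an arbitrary train/test split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (made up for this sketch): y = 2*x + 1, without noise
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Data preparation: keep 25% of the samples aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training samples
model = LinearRegression()
model.fit(X_train, y_train)

# Model evaluation: coefficient of determination on the held-out samples
score = model.score(X_test, y_test)

print(model.coef_, model.intercept_, score)
```

Swapping ''LinearRegression'' for another estimator usually leaves the rest of the code unchanged.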
| |
| |
| ==== scikit-image ==== |
| |
| [[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python |
| |
| Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]] |
| |
| |
| |
[[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code | [[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code |
==== scikit-learn ==== | |
| |
[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation | |
| |
Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]] | ===== Data file formats ===== |
==== scikit-image ==== | |
| |
[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python | * We list below some resources about **non-NetCDF data formats** that can be useful |
| |
Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]] | |
| |
| |
===== Data file formats ===== | |
| |
We list here some resources about non-NetCDF data formats that can be useful | * Check the [[#using_netcdf_files_with_python|Using NetCDF files with Python]] section otherwise |
| |
==== The shelve package ==== | ==== The shelve package ==== |