other:python:jyp_steps

Differences

This shows you the differences between two versions of the page.

other:python:jyp_steps [2023/12/14 13:52]
jypeter [scikit-learn]
other:python:jyp_steps [2023/12/15 15:56]
jypeter Reorganized the NetCDF section
Line 165: Line 165:
 ===== Using NetCDF files with Python =====
  
-<note tip>​People using CMIPn and model data on the IPSL servers can easily search and process NetCDF files using: 
-  * the [[https://​climaf.readthedocs.io/​|Climate Model Assessment Framework (CliMAF)]] environment 
-  * and the [[https://​github.com/​jservonnat/​C-ESM-EP/​wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]] 
-</​note>​ 
  
-  There is a good chance that your input array data will be stored in a [[other:newppl:starting#netcdf_and_related_conventions|NetCDF]] file.
 +==== What is NetCDF? ====
 + 
 +  ​If you are working with climate model output data, there is a good chance that your input array data will be stored in a NetCDF file! 
 + 
 +  * Read the [[other:​newppl:​starting#​netcdf_and_related_conventions|NetCDF ​and related Conventions]] for more information
  
   * There may be different ways of dealing with NetCDF files, depending on which [[other:python:starting#some_python_distributions|python distribution]] you have access to
  
-==== cdms2 ==== 
  
-Summary: cdms2 can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher level interface than netCDF4. cdms2 is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda)]]. When you can use cdms2, you also have access to //cdtime//, that is very useful for handling time axis data.
 +==== CliMAF and C-ESM-EP ====
 + 
 +People using **//CMIPn// and model data on the IPSL servers** can easily search and process NetCDF files using: 
 + 
 +  * the [[https://​climaf.readthedocs.io/​|Climate Model Assessment Framework (CliMAF)]] environment 
 + 
 +  * and the [[https://github.com/​jservonnat/C-ESM-EP/wiki|CliMAF Earth System Evaluation Platform (C-ESM-EP)]]
  
-How to get started: 
-  - read [[http://​www.lsce.ipsl.fr/​Phocea/​file.php?​class=page&​file=5/​pythonCDAT_jyp_2sur2_070306.pdf|JYP'​s cdms tutorial]], starting at page 54 
-    - the tutorial is in French (soooorry!) 
-    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) 
-  - read the [[http://​cdms.readthedocs.io/​en/​docstanya/​index.html|official cdms documentation]] (link may change) 
  
 ==== xarray ====
  
-Summary: [[http://xarray.pydata.org/en/stable/|xarray]] is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files
 +[[https://docs.xarray.dev/|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files
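
As a rough illustration of what "labelled multi-dimensional arrays" means in practice, the sketch below builds a small synthetic ''tas'' array in memory and selects data by coordinate value instead of by integer index. The variable name, grid and values are made up for this example and do not come from this page. For a real NetCDF file, the usual entry point would be ''xarray.open_dataset''.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse grid and synthetic values, for illustration only
time = np.arange(3)
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
values = np.arange(time.size * lat.size * lon.size, dtype="float64")
values = values.reshape(time.size, lat.size, lon.size)

tas = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="tas",
    attrs={"units": "K"},
)

# Select by coordinate value (label) rather than by position
equator = tas.sel(lat=0.0)

# Pick the grid point closest to an arbitrary location
nearest = tas.sel(lat=40.0, lon=100.0, method="nearest")

# Reduce along a named dimension
time_mean = tas.mean(dim="time")

print(equator.dims, float(nearest.lat), float(nearest.lon), time_mean.shape)
```

Working with dimension names (''dim="time"'') instead of axis numbers is the main point here: the same code keeps working if the dimensions of the input file are stored in a different order.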
  
 === Some xarray related resources ===
  
-Note: more packages (than listed below) may be listed in the [[other:uvcdat:cdat_conda:cdat_8_2_1#extra_packages_list|Extra packages list]]
 +Note: more packages (than listed below) may be listed in the [[other:uvcdat:cdat_conda:cdat_8_2_1#extra_packages_list|Extra packages list]] page
  
-  * [[https://xcdat.readthedocs.io/|xcdat]]: xarray extended with Climate Data Analysis Tools
 +  * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]]
 + 
 +  * **[[https://​xcdat.readthedocs.io/​|xCDAT]]: ''​xarray'' ​extended with Climate Data Analysis Tools**
  
   * [[https://xoa.readthedocs.io/en/latest/|xoa]]: xarray-based ocean analysis library
  
   * [[https://uxarray.readthedocs.io/|uxarray]]: provides xarray-styled functionality for unstructured grid datasets following [[https://ugrid-conventions.github.io/ugrid-conventions/|UGRID Conventions]]
- 
  
  
 ==== netCDF4 ====
  
-Summary: //netCDF4 can read/write netCDF files and is available in most python distributions//
 +[[http://unidata.github.io/netcdf4-python/|netCDF4]] is a Python interface to the netCDF C library
 + 
 + 
 +==== cdms2 ==== 
 + 
 +<note important>​ 
 +  * ''​cdms2''​ is unfortunately not maintained anymore and is slowly being **phased out in favor of a combination of [[#​xarray|xarray]] and [[https://​xcdat.readthedocs.io/​|xCDAT]]** 
 + 
 +  * ''​cdms2''​ will [[https://​github.com/​CDAT/​cdms/​issues/​449|not be compatible with numpy after numpy 1.23.5]] :-( 
 +</​note>​ 
 + 
 +[[https://cdms.readthedocs.io/en/docstanya/|cdms2]] can read/write netCDF files (and read //grads// dat+ctl files) and provides a higher-level interface than netCDF4. ''cdms2'' is available in the [[other:python:starting#cdat|CDAT distribution]], and can theoretically be installed independently of CDAT (e.g. it will be installed when you install [[https://cmor.llnl.gov/mydoc_cmor3_conda/|CMOR in conda]]). When you can use ''cdms2'', you also have access to //cdtime//, which is very useful for handling time axis data.
 + 
 +How to get started: 
 +  - read [[http://​www.lsce.ipsl.fr/​Phocea/​file.php?​class=page&​file=5/​pythonCDAT_jyp_2sur2_070306.pdf|JYP'​s cdms tutorial]], starting at page 54 
 +    - the tutorial is in French (soooorry!) 
 +    - you have to replace //cdms// with **cdms2**, and //MV// with **MV2** (sooorry about that, the tutorial was written when CDAT was based on //Numeric// instead of //numpy// to handle array data) 
 +  - read the [[http://​cdms.readthedocs.io/​en/docstanya/index.html|official cdms documentation]] (link may change)
  
-Where: [[http://​unidata.github.io/​netcdf4-python/​]] 
  
 ===== CDAT-related resources =====
Line 315: Line 332:
  
 ===== Data analysis =====
 +
 +==== EDA (Exploratory Data Analysis) ? ====
 +
 +<note tip>
 +The //EDA concept// seems to apply mostly to **time series** (and tabular data), which is not exactly what full climate model output data looks like</note>
 +
 +  * [[https://​www.geeksforgeeks.org/​what-is-exploratory-data-analysis/​|What is Exploratory Data Analysis ?]]
 +    * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.//
 +
 +  * [[https://​medium.com/​codex/​automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some Python libraries listed below ([[#​ydata_profiling|YData Profiling]],​ [[#​d-tale|D-Tale]],​ [[#​sweetviz|sweetviz]],​ [[#​autoviz|AutoViz]])
 +
 +  * [[https://​www.geeksforgeeks.org/​exploratory-data-analysis-in-python/​|EDA in Python]]
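
To make the idea of EDA on tabular data more concrete, here is a minimal sketch using plain ''pandas'' rather than any of the automated libraries mentioned above. The table is synthetic (the ''temperature'' and ''humidity'' columns and the injected outlier are invented for this example), and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Synthetic table: humidity is built to decrease with temperature
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(15.0, 5.0, n)
humidity = 80.0 - 1.5 * temperature + rng.normal(0.0, 2.0, n)
df = pd.DataFrame({"temperature": temperature, "humidity": humidity})

# Inject one obviously wrong value so there is something to find
df.loc[10, "temperature"] = 90.0

# Step 1: summary statistics (count, mean, std, quartiles, ...)
summary = df.describe()

# Step 2: relationships between variables
correlation = df.corr()

# Step 3: flag outliers with the 1.5 * IQR rule
q1, q3 = df["temperature"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["temperature"] < q1 - 1.5 * iqr) | (df["temperature"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(summary)
print(correlation)
print(outliers)
```

The automated EDA libraries essentially wrap this kind of step (plus plots) into a single report.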
 +
 +
 +==== Easy to use datasets ====
 +
 +If you need standard datasets for testing, examples, demos, ...
 +
 +  * [[https://​docs.xarray.dev/​en/​stable/​generated/​xarray.tutorial.load_dataset.html|Tutorial datasets]] from [[#​xarray|xarray]] (requires internet)
 +    * Example: [[https://​docs.xarray.dev/​en/​stable/​examples/​visualization_gallery.html|Using the 'air temperature'​ dataset]]
 +
 +  * [[https://​scikit-learn.org/​stable/​datasets.html|Toy,​ real-world and generated datasets]] from [[#​scikit-learn]]
 +    * Example: [[https://​lectures.scientific-python.org/​packages/​scikit-learn/​index.html#​a-simple-example-the-iris-dataset|using the '​iris'​ dataset]]
 +
 +  * [[https://​scikit-image.org/​docs/​stable/​api/​skimage.data.html|Test images and datasets]] from [[#​scikit-image]]
 +    * Example: [[https://​lectures.scientific-python.org/​packages/​scikit-image/​index.html#​data-types|Using the '​camera'​ dataset]]
 +
 +  * [[https://​esgf-node.ipsl.upmc.fr/​search/​cmip6-ipsl/​|CMIP6 data]] on ESGF
 +    * Example: ''orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc'':
 +      * [[http://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc|HTTP]] download link
 +      * [[http://vesg.ipsl.upmc.fr/thredds/dodsC/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc.dods|OPeNDAP]] download link
 +
 +  * [[https://​github.com/​xCDAT/​xcdat/​issues/​277|xCDAT test data GH discussion]]
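
Among the sources listed above, the scikit-learn toy datasets ship with the package and need no network access. A minimal sketch, assuming ''scikit-learn'' is installed, that loads the 'iris' dataset and inspects its layout:

```python
from sklearn.datasets import load_iris

# The iris data is bundled with scikit-learn, so this works offline
iris = load_iris()

X = iris.data                  # feature matrix, one row per flower
y = iris.target                # integer class label for each row
feature_names = iris.feature_names
class_names = list(iris.target_names)

print(X.shape, y.shape)
print(feature_names)
print(class_names)
```

The xarray tutorial datasets and the ESGF files are fetched over the network, so they are less convenient for quick offline tests.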
  
  
Line 346: Line 396:
  
 Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]
 +
 +
 ==== scikit-image ====
  
Line 351: Line 403:
  
 Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]
 +
 +
 +==== YData Profiling ====
 +
 +[[https://​docs.profiling.ydata.ai/​|YData Profiling]]:​ a leading package for data profiling, that automates and standardizes the generation of detailed reports, complete with statistics and visualizations.
 +
 +
 +==== D-Tale ====
 +
 +[[https://​github.com/​man-group/​dtale|D-Tale]] brings you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/​ipython terminals.
 +
 +
 +==== Sweetviz ====
 +
 +[[https://github.com/fbdesignpro/sweetviz|Sweetviz]] is a pandas-based Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.
 +
 +
 +==== AutoViz ====
 +
 +[[https://​github.com/​AutoViML/​AutoViz|AutoViz]]:​ the One-Line Automatic Data Visualization Library. Automatically Visualize any dataset, any size with a single line of code
  
  
 =====  Data file formats =====
  
-We list here some resources about non-NetCDF data formats that can be useful
 +  * We list below some resources about **non-NetCDF data formats** that can be useful
 + 
 +  * Check the [[#​using_netcdf_files_with_python|Using NetCDF files with Python]] section otherwise
  
 ==== The shelve package ====
other/python/jyp_steps.txt · Last modified: 2024/03/07 10:15 by jypeter