other:python:jyp_steps, revision [2023/12/15 15:37] by jypeter: Moved scikit-learn and scikit-image in front of lesser known libraries (compared with revision [2023/12/13 15:49] by jypeter: Moved and renamed Scipy Lecture Notes)
You can start using python by reading the {{:other:python:python_intro_ipsl_oct2013_v2.pdf|Bien démarrer avec python}} (//Getting started with python//) tutorial that was used during a 2013 IPSL python class:
  * this tutorial is in French (my apologies for the lack of translation, but it should be easy to understand)
  * If you have too much trouble understanding this French tutorial, you can read the first 6 chapters of the **Tutorial** in [[#the_official_python_documentation|the official Python documentation]] and chapters 1.2.1 to 1.2.5 in the [[#scientific_python_lectures|Scientific Python Lectures]]. Once you have read these, you can try to read the French tutorial again
  * it's an introduction to python (and programming) for the climate scientist: after reading this tutorial, you should be able to do most of the things you usually do in a shell script
    * python types, tests, loops, reading a text file
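As a taste of what the tutorial covers (python types, tests, loops, reading a text file), here is a minimal sketch of a typical shell-script-like task written in Python. The file name ''hypothetical_stations.txt'' and its layout (one ''station value'' pair per line, ''#'' for comments) are assumptions made for this example only.

```python
from pathlib import Path


def mean_temperatures(path):
    """Return {station: mean value} from a text file with 'station value' lines.

    Blank lines and lines starting with '#' are skipped (assumed comment syntax).
    """
    sums = {}    # station name (str) -> running total (float)
    counts = {}  # station name (str) -> number of values (int)
    with open(path) as f:
        for line in f:                      # loop over the lines of the file
            line = line.strip()
            if not line or line.startswith("#"):
                continue                    # test: skip blank and comment lines
            station, value = line.split()   # expects exactly two fields
            sums[station] = sums.get(station, 0.0) + float(value)
            counts[station] = counts.get(station, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}


if __name__ == "__main__":
    # Create a small hypothetical input file so the example is self-contained
    demo = Path("hypothetical_stations.txt")
    demo.write_text("# station temperature\nparis 12.5\nbrest 11.0\nparis 13.5\n")
    for name, mean in sorted(mean_temperatures(demo).items()):
        print(f"{name}: {mean:.2f}")
```

Run as a script, this prints ''brest: 11.00'' then ''paris: 13.00''.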
==== xarray ====

Summary: [[https://docs.xarray.dev/|xarray]] makes working with labelled multi-dimensional arrays in Python simple, efficient, and fun! [...] It is particularly tailored to working with netCDF files.

=== Some xarray related resources ===

Note: more packages than those listed below may be found on the [[other:uvcdat:cdat_conda:cdat_8_2_1#extra_packages_list|Extra packages list]] page

  * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|xarray test datasets]]

  * [[https://xcdat.readthedocs.io/|xcdat]]: xarray extended with Climate Data Analysis Tools

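A minimal sketch of what //labelled// arrays mean in practice: dimensions and coordinates are referred to by name instead of by axis number. The variable name ''tas'', the tiny grid and the values are made up for the example; with internet access, ''xr.tutorial.load_dataset("air_temperature")'' (see the //xarray test datasets// link) returns a real dataset that can be explored the same way.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly temperature on a tiny 2 x 3 lat/lon grid
time = pd.date_range("2000-01-01", periods=12, freq="MS")  # month starts
lat = [-45.0, 45.0]
lon = [0.0, 120.0, 240.0]
values = 280.0 + np.arange(12 * 2 * 3, dtype=float).reshape(12, 2, 3)

ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), values, {"units": "K"})},
    coords={"time": time, "lat": lat, "lon": lon},
)
# Real data instead (requires internet access):
# ds = xr.tutorial.load_dataset("air_temperature")

north = ds["tas"].sel(lat=45.0)                   # select by coordinate value
zonal_mean = ds["tas"].mean(dim="lon")            # reduce over a named dimension
time_mean = ds["tas"].mean(dim="time")
seasonal = ds["tas"].groupby("time.season").mean()  # DJF, MAM, JJA, SON means

print(north.dims)
print(zonal_mean.shape)
print(time_mean.sel(lat=-45.0, lon=120.0).item())
print(seasonal.sizes)

# netCDF round trip (needs the netCDF4 or scipy package):
# ds.to_netcdf("hypothetical_tas.nc")
# ds2 = xr.open_dataset("hypothetical_tas.nc")
```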
===== 3D plots resources =====

  * [[https://ipyvolume.readthedocs.io/en/latest/|Ipyvolume]]
  * [[https://zulko.wordpress.com/2012/09/29/animate-your-3d-plots-with-pythons-matplotlib/|Animate your 3D plots with Python’s Matplotlib]]
  * [[https://stackoverflow.com/questions/26796997/how-to-get-vertical-z-axis-in-3d-surface-plot-of-matplotlib|How to get vertical Z axis in 3D surface plot of Matplotlib?]]

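The Matplotlib links above rely on Matplotlib's 3D axes. Here is a minimal sketch of a surface plot saved to a file with the non-interactive ''Agg'' backend; the plotted function, the output file name ''surface_3d.png'' and the viewing angles are arbitrary choices for the example. ''view_init(elev=..., azim=...)'' sets the camera elevation and azimuth in degrees.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

# Arbitrary example surface z = sin(r) / r on a regular grid
x = np.linspace(-8.0, 8.0, 81)
y = np.linspace(-8.0, 8.0, 81)
X, Y = np.meshgrid(x, y)
R = np.hypot(X, Y)
Z = np.sinc(R / np.pi)  # np.sinc(t) = sin(pi*t)/(pi*t), i.e. sin(R)/R, equal to 1 at R = 0

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")  # no mpl_toolkits import needed in Matplotlib >= 3.2
surf = ax.plot_surface(X, Y, Z, cmap="viridis", linewidth=0)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.view_init(elev=30, azim=-60)  # camera elevation and azimuth, in degrees
fig.colorbar(surf, shrink=0.6)
fig.savefig("surface_3d.png", dpi=100)
plt.close(fig)
```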
===== Data analysis =====

==== EDA (Exploratory Data Analysis)? ====

<note tip>
The //EDA concept// seems to apply to **time series** (and tabular data), which does not quite match full climate model output data</note>

  * [[https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/|What is Exploratory Data Analysis?]]
    * //The method of studying and exploring record sets to apprehend their predominant traits, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking extra formal statistical analyses or modeling.//

  * [[https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed|Automate the exploratory data analysis (EDA) to understand the data faster and easier]]: a nice comparison of some of the Python libraries listed below (''ydata-profiling'', ''D-Tale'', ''sweetviz'', ''autoviz'')

  * [[https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/|EDA in Python]]


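The first EDA steps quoted above (summarizing the data, finding missing values and outliers, looking at relationships between variables) can be done with plain pandas before trying the automated libraries listed below. A minimal sketch on a small made-up table; the column names and values are hypothetical, and the 1.5 x IQR outlier rule is one common convention among others.

```python
import pandas as pd

# Hypothetical station records, with one missing value and one suspicious value
df = pd.DataFrame(
    {
        "station": ["A", "A", "B", "B", "C", "C"],
        "temp": [12.1, 12.9, 10.4, None, 11.8, 45.0],
        "pressure": [1012.0, 1010.5, 1015.2, 1014.8, 1011.1, 1009.9],
    }
)

print(df.describe())                          # count, mean, std, min, quartiles, max
print(df.isna().sum())                        # missing values per column
print(df.groupby("station")["temp"].mean())   # per-group summary

# Flag outliers with the 1.5 * IQR rule (NaN values are ignored by quantile)
q1, q3 = df["temp"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["temp"] < q1 - 1.5 * iqr) | (df["temp"] > q3 + 1.5 * iqr)]
print(outliers)

# Linear (Pearson) correlation between the numeric variables
print(df[["temp", "pressure"]].corr())
```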
==== Easy to use datasets ====

If you need standard datasets for tests, examples, demos, etc.:

  * [[https://docs.xarray.dev/en/stable/generated/xarray.tutorial.load_dataset.html|Tutorial datasets]] from [[#xarray|xarray]] (requires internet access)
    * Example: [[https://docs.xarray.dev/en/stable/examples/visualization_gallery.html|Using the 'air temperature' dataset]]

  * [[https://scikit-learn.org/stable/datasets.html|Toy, real-world and generated datasets]] from [[#scikit-learn]]
    * Example: [[https://lectures.scientific-python.org/packages/scikit-learn/index.html#a-simple-example-the-iris-dataset|Using the 'iris' dataset]]

  * [[https://scikit-image.org/docs/stable/api/skimage.data.html|Test images and datasets]] from [[#scikit-image]]
    * Example: [[https://lectures.scientific-python.org/packages/scikit-image/index.html#data-types|Using the 'camera' dataset]]

  * [[https://esgf-node.ipsl.upmc.fr/search/cmip6-ipsl/|CMIP6 data]] on ESGF
    * Example: ''orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc'':
      * [[http://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc|HTTP]] download link
      * [[http://vesg.ipsl.upmc.fr/thredds/dodsC/cmip6/CMIP/IPSL/IPSL-CM6A-LR/piControl/r1i1p1f1/fx/orog/gr/v20200326/orog_fx_IPSL-CM6A-LR_piControl_r1i1p1f1_gr.nc.dods|OpenDAP]] download link

  * [[https://github.com/xCDAT/xcdat/issues/277|xCDAT test data GitHub discussion]]


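The scikit-learn //toy// and //generated// datasets listed above work offline, unlike the xarray tutorial datasets. A minimal sketch, assuming a reasonably recent scikit-learn (the ''as_frame'' option of ''load_iris'' does not exist in old versions); the ''make_blobs'' parameters are arbitrary.

```python
from sklearn.datasets import load_iris, make_blobs

# Toy dataset shipped with scikit-learn (no download needed)
iris = load_iris(as_frame=True)
df = iris.frame                  # pandas DataFrame: 4 measurement columns + 'target'
print(df.shape)
print(iris.target_names)

# Generated dataset: 3 Gaussian clusters in 2D, reproducible thanks to random_state
X, y = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
print(X.shape, y.shape)
```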
==== Pandas ====

Summary: //pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool//

Where: [[http://pandas.pydata.org|Pandas web site]]

JYP's comment: pandas is supposed to be quite good for loading, processing and plotting time series, without writing custom code. It is **very convenient for processing tables in xlsx files** (or csv, etc.). You should at least have a quick look at:

  * Some //Cheat Sheets//:
    - Basics: [[https://github.com/fralfaro/DS-Cheat-Sheets/blob/main/docs/files/pandas_cs.pdf|Pandas Basics Cheat Sheet]] (associated with the [[https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-for-data-science-in-python#python-for-data-science-cheat-sheet:-pandas-basics-useth|Pandas basics]] //datacamp// introduction page)
    - Intermediate: [[https://github.com/pandas-dev/pandas/blob/main/doc/cheatsheet/Pandas_Cheat_Sheet.pdf|Data Wrangling with pandas Cheat Sheet]]
  * Some tutorials:
    * [[http://pandas.pydata.org/docs/user_guide/10min.html|10 minutes to pandas]]
    * The [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial that combines Pandas, [[#statsmodels|statsmodels]] and [[http://seaborn.pydata.org/|Seaborn]]
    * More [[http://pandas.pydata.org/docs/getting_started/tutorials.html|Community tutorials]]...


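A minimal sketch of the kind of time series processing mentioned above: read a table with a date column, index it by time, and compute daily statistics. The CSV content is made up and embedded as a string so the example runs as is; for an xlsx file, ''pd.read_excel("file.xlsx")'' plays the same role as ''pd.read_csv'' (it needs the ''openpyxl'' package).

```python
import io

import pandas as pd

# Hypothetical 12-hourly measurements (real case: pd.read_csv("my_file.csv", ...))
csv_text = """date,temp,rain
2023-01-01 00:00,3.0,0.0
2023-01-01 12:00,7.0,1.5
2023-01-02 00:00,2.0,0.0
2023-01-02 12:00,6.0,0.5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Daily aggregation: mean temperature, total rain
daily = df.resample("D").agg({"temp": "mean", "rain": "sum"})
print(daily)

# Quick plot (needs matplotlib):
# daily["temp"].plot()
# Writing to Excel (needs openpyxl):
# daily.to_excel("daily.xlsx")
```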
==== statsmodels ====

[[https://www.statsmodels.org/|statsmodels]] is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.

Note: check the example in the [[https://lectures.scientific-python.org/packages/statistics/index.html|Statistics in Python]] tutorial


==== scikit-learn ====

[[http://scikit-learn.org/|scikit-learn]] is a Python library for machine learning, and is one of the most widely used tools for supervised and unsupervised machine learning. scikit-learn provides an easy-to-use, consistent interface to a large collection of machine learning models, as well as tools for model evaluation and data preparation.

Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-learn/index.html|scikit-learn: machine learning in Python]]


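The //consistent interface// mentioned above is the ''fit''/''predict''/''score'' pattern: most scikit-learn estimators could replace ''LogisticRegression'' below with the same calls. A minimal sketch on the bundled iris dataset; the classifier, the scaling step and the split parameters are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Keep 25% of the samples aside to evaluate the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data preparation (scaling) and model chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of correctly classified test samples
print(f"test accuracy: {accuracy:.2f}")
print(model.predict(X_test[:5]))
```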
==== scikit-image ====

[[https://scikit-image.org/|scikit-image]] is a collection of algorithms for image processing in Python.

Note: check the example in [[https://lectures.scientific-python.org/packages/scikit-image/index.html|scikit-image: image processing]]


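A minimal sketch of two classic scikit-image operations: edge detection with a Sobel filter and global thresholding with Otsu's method. To avoid depending on any data file, it uses a synthetic image (a bright square on a dark background); the ''camera'' test image used in the example linked above could be loaded with ''skimage.data.camera()'' instead.

```python
import numpy as np
from skimage import filters

# Synthetic 64x64 grayscale image: a bright square on a dark background
image = np.zeros((64, 64), dtype=float)
image[16:48, 16:48] = 1.0
# Alternative: a test image bundled with scikit-image
# from skimage import data; image = data.camera()

edges = filters.sobel(image)               # gradient magnitude, large on the square's border
threshold = filters.threshold_otsu(image)  # automatic global threshold (Otsu's method)
mask = image > threshold                   # boolean segmentation of the bright region

print(f"Otsu threshold: {threshold:.3f}")
print("pixels in mask:", int(mask.sum()))
```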
==== YData Profiling ====

[[https://docs.profiling.ydata.ai/|YData Profiling]]: a leading package for data profiling that automates and standardizes the generation of detailed reports, complete with statistics and visualizations.


==== D-Tale ====

[[https://github.com/man-group/dtale|D-Tale]] brings you an easy way to view and analyze Pandas data structures. It integrates seamlessly with IPython notebooks and Python/IPython terminals.


==== Sweetviz ====

[[https://github.com/fbdesignpro/sweetviz|Sweetviz]] is a pandas-based Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.


==== AutoViz ====

[[https://github.com/AutoViML/AutoViz|AutoViz]]: the One-Line Automatic Data Visualization Library. Automatically visualize any dataset, of any size, with a single line of code.


===== Data file formats =====
  * [[https://github.com/LibraryOfCongress/bagger|Bagger]] (BagIt GUI)
  * [[https://github.com/LibraryOfCongress/bagit-python|bagit-python]]
===== Quick Reference and cheat sheets =====