Data Preparation
This page documents the full API for data preparation (hidrokit.prep).
Excel module
Module for reading Excel files.
- prep.excel._cell_index(dataframe, template='phderi')
Return the cell index (column, row) of the first value in the pivot table.
- Parameters
- dataframe : DataFrame
Raw dataframe imported from Excel
- template : str, optional
Template name, by default 'phderi'
- Returns
- list
[column index, row index]
- Raises
- Exception
If there is no match with the template.
- prep.excel._dataframe_data(pivot, year)
Transform a pivot table into a list.
- Parameters
- pivot : DataFrame
Pivot table
- year : int
Year
- Returns
- list
List of data
- prep.excel._dataframe_table(pivot, year, name='ch')
Transform a pivot table into a single-column dataframe.
- Parameters
- pivot : DataFrame
Pivot table
- year : int
Year
- name : str, optional
Column name, by default 'ch'
- Returns
- DataFrame
Single-column dataframe
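The two pivot helpers above are private and their exact behaviour is not spelled out on this page. As a rough sketch of the idea only, and assuming the pivot table has one row per day of the month (31 rows) and one column per month (12 columns), the conversion to a list and then to a single-column dataframe might look like this. The names pivot_to_list and pivot_to_table, the pivot layout, and the daily date index are all assumptions made for illustration, not hidrokit's implementation.

```python
import calendar

import numpy as np
import pandas as pd


def pivot_to_list(pivot, year):
    """Flatten a day-by-month pivot into a chronological list (hypothetical helper)."""
    values = []
    for month in range(1, 13):
        # Keep only the days that exist in this month of this year.
        n_days = calendar.monthrange(year, month)[1]
        values.extend(pivot.iloc[:n_days, month - 1].tolist())
    return values


def pivot_to_table(pivot, year, name="ch"):
    """Wrap the flattened values in a single-column dataframe (hypothetical helper)."""
    # Assumption: the result is indexed by one date per day of the year.
    index = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    return pd.DataFrame({name: pivot_to_list(pivot, year)}, index=index)


# Dummy pivot: 31 day rows x 12 month columns filled with running numbers.
pivot = pd.DataFrame(np.arange(31 * 12).reshape(31, 12))
values = pivot_to_list(pivot, 2019)
table = pivot_to_table(pivot, 2019)
```

Passing the year matters because it decides how many cells of each month column are real dates: February contributes 28 values in 2019 and 29 in a leap year.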
Read module
Module for reading data.
- prep.read.missing_row(dataframe, date_index=True, date_format='%Y/%m/%d')
Return a dictionary of the missing values in a dataframe.
The dictionary maps each column name to the list of index labels where values are missing.
- Parameters
- dataframe : DataFrame
Dataframe
- date_index : bool, optional
If True, format the index using date_format, by default True
- date_format : str, optional
strftime() format string, by default '%Y/%m/%d'
- Returns
- dict
Dictionary of column names and the index labels of their missing values.
Examples
Example with a non-date index:
>>> import numpy as np
>>> import pandas as pd
>>> from hidrokit.prep.read import missing_row
>>> A = pd.DataFrame(data=[[1, 3, 4, np.nan, 2, np.nan],
...                        [np.nan, 2, 3, np.nan, 1, 4],
...                        [2, np.nan, 1, 3, 4, np.nan]],
...                  columns=['A', 'B', 'C', 'D', 'E', 'F'])
>>> A
     A    B  C    D  E    F
0  1.0  3.0  4  NaN  2  NaN
1  NaN  2.0  3  NaN  1  4.0
2  2.0  NaN  1  3.0  4  NaN
>>> missing_row(A, date_index=False)
{'A': [1], 'B': [2], 'C': [], 'D': [0, 1], 'E': [], 'F': [0, 2]}
Example with a timestamp index:
>>> date_index = pd.date_range("20190617", "20190619")
>>> A.set_index(date_index, inplace=True)
>>> A
              A    B  C    D  E    F
2019-06-17  1.0  3.0  4  NaN  2  NaN
2019-06-18  NaN  2.0  3  NaN  1  4.0
2019-06-19  2.0  NaN  1  3.0  4  NaN
>>> missing_row(A, date_format="%m%d")
{'A': ['0618'], 'B': ['0619'], 'C': [], 'D': ['0617', '0618'], 'E': [], 'F': ['0617', '0619']}
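For readers who want to see how such a dictionary can be built, here is a minimal sketch using plain pandas. It is not hidrokit's implementation: the function name missing_row_sketch is hypothetical, and its behaviour is inferred only from the parameter descriptions and examples above.

```python
import numpy as np
import pandas as pd


def missing_row_sketch(dataframe, date_index=True, date_format="%Y/%m/%d"):
    """Map each column name to the index labels of its missing values."""
    if date_index:
        # Assumes a DatetimeIndex; labels become formatted strings.
        labels = dataframe.index.strftime(date_format)
    else:
        labels = dataframe.index
    return {
        # Boolean mask of missing cells selects the matching labels.
        column: labels[dataframe[column].isna().to_numpy()].tolist()
        for column in dataframe.columns
    }


frame = pd.DataFrame(
    data=[[1, 3, 4, np.nan, 2, np.nan],
          [np.nan, 2, 3, np.nan, 1, 4],
          [2, np.nan, 1, 3, 4, np.nan]],
    columns=["A", "B", "C", "D", "E", "F"],
)
by_position = missing_row_sketch(frame, date_index=False)

# Same data, now indexed by three consecutive dates.
frame.index = pd.date_range("20190617", "20190619")
by_date = missing_row_sketch(frame, date_format="%m%d")
```

Columns without gaps still appear in the result with an empty list, which keeps the dictionary keys aligned with the dataframe columns.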
Time Series module
Module for adding timesteps to dataframes.
- prep.timeseries._timestep_multi(array, index=None, timesteps=2, keep_first=True)
Add timestep columns to a multi-column array.
- Parameters
- array : array
Two-dimensional array with multiple numeric columns.
- index : list of int, optional
List of column indices, by default None
- timesteps : int, optional
Number of timesteps, by default 2
- keep_first : bool, optional
Include the original column if set True, by default True
- Returns
- array
2D array with timesteps.
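The multi-column case can be sketched with NumPy slicing. This is only an illustration under stated assumptions, not hidrokit's code: it assumes a timestep is a lagged copy of a column (the value i rows earlier, in line with the '{column}_tmin{i}' naming used by timestep_table), that the first `timesteps` rows are dropped because they lack a full history, that index=None means every column, and that columns not listed in index are passed through unchanged apart from the row trimming. The name timestep_multi_sketch is hypothetical.

```python
import numpy as np


def timestep_multi_sketch(array, index=None, timesteps=2, keep_first=True):
    """Add lagged copies of the selected columns of a 2D array."""
    n_rows, n_cols = array.shape
    # Assumption: index=None selects every column.
    selected = range(n_cols) if index is None else index
    first_lag = 0 if keep_first else 1

    blocks = []
    for col in range(n_cols):
        if col in selected:
            # Lag i pairs row t with the value observed at row t - i.
            for i in range(first_lag, timesteps + 1):
                blocks.append(array[timesteps - i:n_rows - i, col])
        else:
            # Assumption: untouched columns are only trimmed to the same rows.
            blocks.append(array[timesteps:, col])
    return np.column_stack(blocks)


data = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
lagged = timestep_multi_sketch(data, index=[0], timesteps=1)
```

In this example `lagged` has three columns: column 0 at lag 0, column 0 at lag 1, and column 1 trimmed to the same three rows.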
- prep.timeseries._timestep_single(array, index=0, timesteps=2, keep_first=True)
Add timestep columns for a single column of an array.
- Parameters
- array : array
Two-dimensional array with a single column.
- index : int, optional
Column index, by default 0
- timesteps : int, optional
Number of timesteps, by default 2
- keep_first : bool, optional
Include the original array if set True, by default True
- Returns
- array
2D array with timesteps.
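A minimal sketch of the single-column case follows. It assumes that each timestep is a lagged copy of the column (the value i rows earlier) and that the first `timesteps` rows are dropped because they have no complete history; neither detail is confirmed by this page. The name timestep_single_sketch is hypothetical and the code is not taken from hidrokit.

```python
import numpy as np


def timestep_single_sketch(array, index=0, timesteps=2, keep_first=True):
    """Return lag 0..timesteps of one column as a 2D array."""
    n_rows = array.shape[0]
    # keep_first=False skips lag 0, i.e. the original column.
    first_lag = 0 if keep_first else 1
    columns = [
        array[timesteps - i:n_rows - i, index]
        for i in range(first_lag, timesteps + 1)
    ]
    return np.column_stack(columns)


series = np.arange(1, 7).reshape(-1, 1)  # single column holding 1..6
with_first = timestep_single_sketch(series, timesteps=2)
without_first = timestep_single_sketch(series, timesteps=2, keep_first=False)
```

Each output row reads left to right as the current value followed by progressively older values, so six input rows with two timesteps yield four output rows.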
- prep.timeseries.timestep_table(dataframe, columns=None, timesteps=2, keep_first=True, template='{column}_tmin{i}')
Generate timesteps directly from a DataFrame.
- Parameters
- dataframe : DataFrame
Dataframe consisting of numeric columns only
- columns : list of str, optional
List of column names to generate timesteps for, by default None
- timesteps : int, optional
Number of timesteps, by default 2
- keep_first : bool, optional
Column _tmin0 will be included if set True, by default True
- template : str, optional
Format of the generated column names, by default '{column}_tmin{i}'
- Returns
- DataFrame
DataFrame with additional timestep columns.
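To make the column naming concrete, the sketch below builds a comparable table with pandas' `shift`. It is an approximation under assumptions rather than hidrokit's implementation: it assumes `_tmin{i}` means the value i rows earlier, that columns=None selects every column, that unselected columns are kept as they are, and that the first `timesteps` rows are dropped. The function name timestep_table_sketch and the sample column names are hypothetical.

```python
import pandas as pd


def timestep_table_sketch(dataframe, columns=None, timesteps=2,
                          keep_first=True, template="{column}_tmin{i}"):
    """Add lagged columns named with `template` to a numeric dataframe."""
    selected = list(dataframe.columns) if columns is None else columns
    first_lag = 0 if keep_first else 1

    parts = {}
    for column in dataframe.columns:
        if column in selected:
            for i in range(first_lag, timesteps + 1):
                # shift(i) moves values down i rows, giving the value at t - i.
                parts[template.format(column=column, i=i)] = dataframe[column].shift(i)
        else:
            parts[column] = dataframe[column]
    # Assumption: rows without a complete history are dropped.
    return pd.DataFrame(parts).iloc[timesteps:]


df = pd.DataFrame({"rain": [1, 2, 3, 4, 5], "flow": [10, 20, 30, 40, 50]})
table = timestep_table_sketch(df, columns=["rain"], timesteps=2)
```

Here `table` keeps the last three rows and has the columns rain_tmin0, rain_tmin1, rain_tmin2 and flow, in that order.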