# Data Repository — Impedance-Based Estimation of Process Parameters via Circuit-Embedded Neural Networks (CENN)

This repository contains all data supporting the paper *"Impedance-Based Estimation of Process Parameters in Electrolytic Systems via Circuit-Embedded Neural Networks (CENN)"*. The measurements were made on nanoporous gold (NPG) electrodes in aqueous H₂SO₄. The data span raw electrochemical impedance spectroscopy (EIS) measurements, physics-aware equivalent-circuit fits, CENN forward and inverse results, sensitivity analysis, and classical machine learning benchmarks.

---

## Table of Contents

1. [Overview](#overview)
2. [Experimental Domain](#experimental-domain)
3. [Folder Structure](#folder-structure)
4. [Raw_Experimental_Data](#raw_experimental_data)
5. [Data_From_Modeling](#data_from_modeling)
6. [File Formats & Column Conventions](#file-formats--column-conventions)
7. [Data Flow & Pipeline](#data-flow--pipeline)
8. [Key Results Summary](#key-results-summary)
9. [Citation](#citation)
10. [Related Resources](#related-resources)

---

## Overview

The dataset supports a physics-aware framework for real-time estimation of electrolyte **concentration** (C) and **temperature** (T) from EIS measurements. It includes:

- **260 EIS spectra** across 20 concentrations (1–20 mM) and 13 temperatures (26–50 °C)
- **ZARC + transmission-line (TL)** equivalent-circuit fits for each spectrum
- **CENN forward model** outputs (impedance predictions, θ(C,T) heatmaps)
- **CENN inverse model** results (C,T estimation from full and reduced-frequency sets)
- **Jacobian-based sensitivity analysis** (band-wise information, frequency anchors)
- **Classical ML regression** benchmarks (Ridge, SVR, GPR, MLP, etc.)

---

## Experimental Domain

| Parameter | Range |
|-----------|-------|
| Concentration | 1–20 mM H₂SO₄ |
| Temperature | 26–50 °C |
| Frequency | 0.1 Hz – 100 kHz (upper limit varies by measurement; see the `*kHz` field in each file name) |
| Electrodes | Nanoporous gold (NPG) WE & CE, Pt pseudo-reference |
| DC bias | 0.1 V vs Pt |
| AC amplitude | 10 mV |

---

## Folder Structure

```
Data/
├── Raw_Experimental_Data/          # Raw EIS measurements + fits
│   └── {1–20}mM/
│       └── {26–50}C/
│           ├── EIS_whole_spectrum_*.csv
│           ├── single_frequency_summary.csv
│           └── fitting_TL/
│               └── EIS_whole_spectrum_*/
│                   ├── params_TL.csv
│                   ├── raw_vs_fit_TL.csv
│                   └── nyquist_TL.png
│
└── Data_From_Modeling/             # All modeling outputs
    ├── Fitted_Data_ZARC_TL/        # Synthetic spectra from fitted EC
    ├── PINN_forward/               # CENN forward model
    ├── PINN_inverse/               # Inverse C,T estimation
    │   ├── pipline_reports/        # Per-band frequency selection
    │   └── operando_piplines/      # Operando/config variants
    ├── sensitivity_analysis/       # Jacobian sensitivity
    ├── ML_regression/              # Classical ML outputs
    └── model_scores/               # Comparative metrics
```

---

## Raw_Experimental_Data

Raw EIS measurements and their equivalent-circuit fits, organized by concentration and temperature.

### Directory Layout

```
Raw_Experimental_Data/
├── 1mM/
│   ├── 26C/
│   │   ├── EIS_whole_spectrum_1mM_H2SO4_26C_10kHz_0.1Hz_pH=3.17_2.csv
│   │   ├── single_frequency_summary.csv
│   │   └── fitting_TL/
│   │       └── EIS_whole_spectrum_1mM_H2SO4_26C_10kHz_0.1Hz_pH=3.17_2/
│   │           ├── params_TL.csv
│   │           ├── raw_vs_fit_TL.csv
│   │           └── nyquist_TL.png
│   ├── 28C/
│   └── ...
├── 2mM/
├── ...
└── 20mM/
```

### File Descriptions

| File | Description |
|------|-------------|
| `EIS_whole_spectrum_*mM_H2SO4_*C_*kHz_0.1Hz_pH=*.csv` | Raw EIS spectrum: frequency, Z′, −Z″, \|Z\|, phase |
| `single_frequency_summary.csv` | Spectrum data after extraction, cutting, and interpolation (e.g., the 1–5 kHz band), used as ML input |
| `fitting_TL/*/params_TL.csv` | Fitted ZARC+TL parameters (Rs, Rp, Y0, n0, r, y0, n1, L, RMSE) |
| `fitting_TL/*/raw_vs_fit_TL.csv` | Raw vs. fitted Z for Nyquist comparison |
| `fitting_TL/*/nyquist_TL.png` | Nyquist plot (data vs. fit) |
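
Concentration, temperature, sweep limits, and pH are encoded in each raw file name, so they can be recovered without opening the file. The following is a minimal sketch, assuming every file follows the pattern of the example name shown in the directory layout. `parse_eis_filename` is a hypothetical helper, not part of the repository scripts, and the meaning of the trailing `_2` is not documented here, so it is returned as an opaque `suffix`.

```python
import re

# Pattern inferred from the single example file name in this README;
# other files in the repository may deviate from it.
_NUM = r"\d+(?:\.\d+)?"
EIS_NAME = re.compile(
    rf"EIS_whole_spectrum_(?P<c_mM>{_NUM})mM_H2SO4_(?P<t_C>{_NUM})C_"
    rf"(?P<fmax_kHz>{_NUM})kHz_(?P<fmin_Hz>{_NUM})Hz_pH=(?P<pH>{_NUM})"
    r"(?:_(?P<suffix>\w+))?\.csv$"
)


def parse_eis_filename(name):
    """Return the metadata encoded in a raw EIS file name (hypothetical helper)."""
    m = EIS_NAME.search(name)
    if m is None:
        raise ValueError(f"unrecognised EIS file name: {name}")
    return {
        "concentration_mM": float(m["c_mM"]),
        "temperature_C": float(m["t_C"]),
        "f_max_Hz": float(m["fmax_kHz"]) * 1e3,  # file name gives kHz
        "f_min_Hz": float(m["fmin_Hz"]),
        "pH": float(m["pH"]),
        "suffix": m["suffix"],  # meaning undocumented; None if absent
    }


info = parse_eis_filename(
    "EIS_whole_spectrum_1mM_H2SO4_26C_10kHz_0.1Hz_pH=3.17_2.csv"
)
```

Because the pattern uses `search` and is anchored only at the end, the same helper also accepts a full path ending in the file name.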

### Equivalent Circuit

Z(ω) = Rₛ + Zarc(Rp, Y0, n0) + Z_TL(r, y0, n1, L)

- **Rₛ**: Solution resistance (Ω)
- **Zarc**: ZARC (Rp‖CPE) — interfacial polarization
- **Z_TL**: Finite-length transmission line — porous transport
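
The circuit above can be evaluated numerically from a row of `params_TL.csv`. This README does not give the closed-form expressions, so the sketch below assumes the common textbook forms: a ZARC as Rp in parallel with a CPE of admittance Y0·(jω)^n0, and a finite-length transmission line with series resistance r per length, CPE shunt admittance y0·(jω)^n1 per length, and a blocking (open) terminal boundary. The fitting scripts may use a different boundary condition or parameterisation, and the function names are hypothetical.

```python
import cmath
import math


def z_zarc(omega, Rp, Y0, n0):
    """Rp in parallel with a CPE (assumed form): Rp / (1 + Rp*Y0*(j*omega)**n0)."""
    return Rp / (1 + Rp * Y0 * (1j * omega) ** n0)


def z_tl(omega, r, y0, n1, L):
    """Finite-length transmission line with an assumed blocking terminal."""
    y = y0 * (1j * omega) ** n1      # shunt admittance per unit length
    gamma = cmath.sqrt(r * y)        # propagation constant (1/m)
    z_char = cmath.sqrt(r / y)       # characteristic impedance (ohm)
    return z_char / cmath.tanh(gamma * L)  # z_char * coth(gamma * L)


def z_total(f_hz, Rs, Rp, Y0, n0, r, y0, n1, L):
    """Z = Rs + ZARC + TL at frequency f_hz (must be > 0)."""
    omega = 2 * math.pi * f_hz
    return Rs + z_zarc(omega, Rp, Y0, n0) + z_tl(omega, r, y0, n1, L)


# Illustrative parameter values only; they are not taken from the fits.
params = dict(Rs=10.0, Rp=5.0, Y0=1e-3, n0=0.9, r=100.0, y0=1e-2, n1=0.9, L=1e-2)
z_1hz = z_total(1.0, **params)
z_hf = z_total(1e9, **params)
```

Under these assumptions the impedance approaches Rs at very high frequency, and the imaginary part is negative (capacitive) across the spectrum, which is why the files store −Z″.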

---

## Data_From_Modeling

All outputs of the modeling pipeline: synthetic spectra from the circuit fits, CENN forward and inverse results, sensitivity analysis, and classical ML benchmarks.

### 1. Fitted_Data_ZARC_TL

**Purpose:** Synthetic impedance spectra generated from the fitted ZARC+TL parameters.

**Files:** `singlefreq_{C}mM_{T}C_500pts_0.2-100000Hz.csv` (260 files)

**Format:** 500 logarithmically spaced frequencies (0.2 Hz – 100 kHz) with Z′, −Z″, \|Z\|, −Phase.

**Use:** Training and validation of CENN and classical ML models; consistent frequency grid across (C,T).
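
The shared frequency grid and the 260 expected file names can be reproduced as follows. This is a sketch under stated assumptions: the grid is taken to be evenly spaced in log10(f) and ordered from low to high frequency, and the 13 temperatures are taken to be 26, 28, ..., 50 °C in 2 °C steps (consistent with the `26C/` and `28C/` folders shown earlier, but not stated explicitly). `log_frequency_grid` is a hypothetical helper.

```python
import math


def log_frequency_grid(f_min=0.2, f_max=100_000.0, n=500):
    """n frequencies evenly spaced in log10(f) from f_min to f_max (Hz)."""
    lo, hi = math.log10(f_min), math.log10(f_max)
    return [10 ** (lo + (hi - lo) * i / (n - 1)) for i in range(n)]


concentrations_mM = range(1, 21)     # 1..20 mM
temperatures_C = range(26, 51, 2)    # assumed 2 °C steps: 26, 28, ..., 50

file_names = [
    f"singlefreq_{c}mM_{t}C_500pts_0.2-100000Hz.csv"
    for c in concentrations_mM
    for t in temperatures_C
]
freqs = log_frequency_grid()
```

Comparing `file_names` against the folder contents is a quick way to confirm that no (C,T) combination is missing.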

---

### 2. PINN_forward

**Purpose:** Circuit-Embedded Neural Network (CENN) forward model outputs. Folder and file names in this repository use the `PINN` prefix for CENN results.

**Key files:**

| File | Description |
|------|-------------|
| `compiled_dataset.csv` | Full dataset: frequency_Hz, Z_real, Z_imag_neg, concentration_mM, temperature_C |
| `metrics_test.csv` | Test-set metrics (R², MAE, RMSE for Z′, −Z″) |
| `test_predictions_pinn.csv` | Test predictions (true vs. predicted Z′, −Z″) |
| `config_used.json` | Training configuration |
| `theta_heatmaps_csv/theta_grid_long.csv` | θ(C,T) grid: Rs, Rp, Y0, n0, r, y0, n1, L |
| `plots/` | Parity, residual, error histogram, training loss |
| `plots_data/` | CSV data underlying the plots |

**Note:** The trained model (`pinn_model.pt`) may not be included in this folder; see `Modeling_ML_Scripts` for the training code.

---

### 3. PINN_inverse

**Purpose:** Inverse estimation of concentration and temperature from EIS spectra.

#### 3a. pipline_reports

Per-band frequency-selection reports (anchors restricted to a single band):

| Subfolder | Frequency band |
|-----------|----------------|
| `low_f_0-1Hz` | 0–1 Hz |
| `midlow_f_1-100Hz` | 1–100 Hz |
| `mid_f_100-1000Hz` | 100 Hz–1 kHz |
| `high_f_1000-10000Hz` | 1–10 kHz |

**Typical files per band:**
- `test_inverse_metrics_summary_*.csv` — MAE for C and T vs. K (number of anchors)
- `selection_debug_top{K}.json` — Selected frequencies and metadata

#### 3b. operando_piplines

Inverse results for different frequency-selection strategies:

| Subfolder | Description |
|-----------|-------------|
| `whole_spectrtum_100data` | Full spectrum (100 frequencies) — baseline |
| `no_restrictions` | No band restrictions; global ranking |
| `operando_report_mixed` | Mixed bands (e.g., sub-Hz + mid-range) |
| `operando_report_midrange` | Mid-range bands |
| `(0_1Hz)-(1-100Hz)` | Two-band: sub-Hz + 1–100 Hz |
| `(0_1Hz)-(100-1000Hz)` | Sub-Hz + 100 Hz–1 kHz |
| `(0_1Hz)-(1000-10000Hz)` | Sub-Hz + 1–10 kHz |
| `(1-100Hz)-(100-1000Hz)` | 1–100 Hz + 100 Hz–1 kHz |
| `(1-100Hz)-(1000-10000Hz)` | 1–100 Hz + 1–10 kHz |
| `(100-1000Hz)-(1000-10000Hz)` | 100 Hz–1 kHz + 1–10 kHz |
| `low_f_0-1Hz`, `midlow_f_1-100Hz`, etc. | Single-band operando configs |

**Typical files per operando config:**
- `chosen_frequency_set.json` — K, chosen frequencies (Hz), calibration metrics
- `global_frequency_ranking.csv` — Jacobian-based frequency ranking
- `calibration_inverse_top{K}.csv` — Calibration C,T predictions
- `test_inverse_predictions.csv` — Test-set C,T predictions
- `test_inverse_metrics.json` — C_MAE (mM), T_MAE (°C)
- `ct_inverse_runtime.py` — Standalone inverse estimator for deployment
- `ct_inverse_runtime_config.json` — Config (model path, freqs, bounds)
- `_meta.json` — Metadata (n_calib, n_test, n_freqs)

**whole_spectrtum_100data:**
- `metrics_inverse_on_test.json` — Full-spectrum baseline (MAE ≈ 0.10 mM, 1.0 °C)
- `inverse_predictions_on_test.csv` — Per-spectrum C,T predictions

---

### 4. sensitivity_analysis

**Purpose:** Jacobian-based sensitivity of impedance to concentration and temperature.

| File | Description |
|------|-------------|
| `jacobian_band_information.csv` | Per-band JᵀJ: I_CC, I_TT, I_CT, det_I, trace_I, I_CC_over_I_TT |
| `jacobian_band_sensitivity_Zr_Zim.csv` | Squared sensitivities: S_Zr_C, S_Zr_T, S_Zim_C, S_Zim_T per band |
| `frequency_band_shares.csv` | Human-readable: Z′ and −Z″ shares, C vs. T contributions per band |
| `band_sensitivity_decomposed.csv` | Band shares, within-band C/T fractions, global C/T contributions |
| `band_dimensionless_sensitivity.csv` | Dimensionless sensitivities (tilde_S) normalized by ΔC, ΔT, Z_rms |

**Frequency bands:**
- 0–1 Hz (0_1)
- 1–100 Hz (1_100)
- 100–1000 Hz (100_1k)
- 1000–10000 Hz (1k_10k)
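
The per-band quantities in `jacobian_band_information.csv` follow from the 2×2 matrix JᵀJ, where J stacks the derivatives of Z′ and −Z″ with respect to C and T at every frequency in the band. A minimal sketch of that arithmetic is shown below. It assumes unweighted sums over the band; the actual analysis may apply weighting or normalisation that is not described in this README. `band_information` is a hypothetical helper, and the input numbers are made up.

```python
def band_information(jac_rows):
    """Accumulate J^T J over one band.

    jac_rows: iterable of (dZr_dC, dZr_dT, dZim_dC, dZim_dT),
    one tuple per frequency in the band.
    """
    I_CC = I_TT = I_CT = 0.0
    for dZr_dC, dZr_dT, dZim_dC, dZim_dT in jac_rows:
        I_CC += dZr_dC ** 2 + dZim_dC ** 2
        I_TT += dZr_dT ** 2 + dZim_dT ** 2
        I_CT += dZr_dC * dZr_dT + dZim_dC * dZim_dT
    return {
        "I_CC": I_CC,
        "I_TT": I_TT,
        "I_CT": I_CT,
        "det_I": I_CC * I_TT - I_CT ** 2,   # joint identifiability of (C, T)
        "trace_I": I_CC + I_TT,             # total sensitivity in the band
        "I_CC_over_I_TT": I_CC / I_TT,
    }


# Two made-up frequencies, for illustration only.
info = band_information([(3.0, 1.0, 0.0, 0.0), (0.0, 0.0, 4.0, 2.0)])
```

A band with large `trace_I` but near-zero `det_I` is sensitive to the parameters yet cannot separate C from T, because the two sensitivity directions are nearly collinear.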

---

### 5. ML_regression

**Purpose:** Classical machine learning regression outputs (Ridge, SVR, GPR, MLP, etc.).

**Structure:**
- `models/` — Saved models (e.g., `.joblib`)
- `plots/` — Parity, residual, histogram, learning curves
- `plots_data/` — CSV data for plots
- `model_report.csv` — Per-model metrics
- `best_model.joblib` — Best-performing model
- `compiled_dataset.csv` — Input dataset
- `test_predictions_*.csv` — Predictions per model

---

### 6. model_scores

**Purpose:** Comparative metrics across all models (CENN vs. classical ML).

| File | Description |
|------|-------------|
| `metrics_scores.csv` | R², MAE, RMSE for Z′ and −Z″; model names; best hyperparameters |
| `metrics_scores_2.csv` | Alternative or extended metrics |

**Reported in paper:** CENN achieves RMSE ≈ 0.06 Ω; best classical ML (MLP) ≈ 0.22 Ω.

---

## File Formats & Column Conventions

### EIS CSV (raw / single_frequency_summary)

| Column | Description |
|--------|-------------|
| `Frequency (Hz)` or `frequency_Hz` | Excitation frequency (Hz) |
| `Z' (Ω)` or `Z_real` | Real part of impedance (Ω) |
| `-Z'' (Ω)` or `Z_imag_neg` | Negative imaginary part (Ω), Nyquist convention |
| `\|Z\| (Ω)` | Magnitude (optional) |
| `-Phase (deg)` | Negative phase (optional) |
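
Because two header conventions appear and the magnitude and phase columns are optional, it can help to map rows to one set of names and recompute the optional columns from Z′ and −Z″. The sketch below assumes the header strings are exactly those listed in the table (ASCII apostrophes and hyphen); the real files may use different characters. The names `CANONICAL`, `normalise_row`, `Z_abs`, and `neg_phase_deg` are hypothetical.

```python
import math

# Map both header variants from the table above to one canonical name.
CANONICAL = {
    "Frequency (Hz)": "frequency_Hz",
    "frequency_Hz": "frequency_Hz",
    "Z' (Ω)": "Z_real",
    "Z_real": "Z_real",
    "-Z'' (Ω)": "Z_imag_neg",
    "Z_imag_neg": "Z_imag_neg",
}


def normalise_row(row):
    """Rename known columns and derive |Z| and -phase from Z' and -Z''."""
    out = {CANONICAL[k]: float(v) for k, v in row.items() if k in CANONICAL}
    out["Z_abs"] = math.hypot(out["Z_real"], out["Z_imag_neg"])
    # -phase = atan2(-Z'', Z'), reported in degrees
    out["neg_phase_deg"] = math.degrees(
        math.atan2(out["Z_imag_neg"], out["Z_real"])
    )
    return out


# Made-up row as csv.DictReader would return it (all values are strings).
row = normalise_row({"Frequency (Hz)": "1000", "Z' (Ω)": "3", "-Z'' (Ω)": "4"})
```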

### params_TL.csv

Fitted parameters for Z(ω) = Rₛ + Zarc + Z_TL:

| Parameter | Unit | Description |
|-----------|------|-------------|
| Rs | Ω | Solution resistance |
| Rp | Ω | Polarization resistance |
| Y0_ZARC / Y0 | Ω⁻¹ sⁿ⁰ | CPE magnitude |
| n0 | — | CPE exponent (0–1) |
| r_line / r | Ω/m | TL series resistance per length |
| y0_line / y0 | Ω⁻¹ sⁿ¹/m | TL shunt admittance per length |
| n1 | — | TL CPE exponent |
| L | m | Effective pore length |
| RMSE (Ω) | Ω | Fit residual |

### Inverse predictions CSV

| Column | Description |
|--------|-------------|
| `spectrum_id` or `id` | Identifier (e.g., C5_T34) |
| `C_true_mM` / `C_true` | True concentration (mM) |
| `T_true_C` / `T_true` | True temperature (°C) |
| `C_pred_mM` / `C_pred` | Predicted concentration (mM) |
| `T_pred_C` / `T_pred` | Predicted temperature (°C) |
| `C_err_mM`, `T_err_C` | Prediction errors |
| `n_freq` | Number of frequencies used |
| `loss` | Inverse optimization loss |
| `se_C`, `se_T` | Approximate standard errors (if computed) |
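
The MAE values reported for the inverse model can be recomputed from any predictions file with these columns. A minimal sketch, assuming the column names listed in the table and accepting either alias; `inverse_mae` and `_pick` are hypothetical helpers, and the two sample rows are made up rather than taken from the repository.

```python
import csv
import io

C_TRUE = ("C_true_mM", "C_true")
C_PRED = ("C_pred_mM", "C_pred")
T_TRUE = ("T_true_C", "T_true")
T_PRED = ("T_pred_C", "T_pred")


def _pick(row, names):
    """Return the first alias present in the row, as a float."""
    for name in names:
        if name in row:
            return float(row[name])
    raise KeyError(f"none of {names} found in columns {list(row)}")


def inverse_mae(rows):
    """Mean absolute error of C (mM) and T (°C) over an iterable of CSV rows."""
    rows = list(rows)
    c_err = [abs(_pick(r, C_PRED) - _pick(r, C_TRUE)) for r in rows]
    t_err = [abs(_pick(r, T_PRED) - _pick(r, T_TRUE)) for r in rows]
    return sum(c_err) / len(c_err), sum(t_err) / len(t_err)


# In-memory stand-in for a predictions CSV; values are illustrative only.
sample = io.StringIO(
    "spectrum_id,C_true_mM,T_true_C,C_pred_mM,T_pred_C\n"
    "C5_T34,5,34,5.1,33\n"
    "C10_T40,10,40,9.7,41\n"
)
c_mae, t_mae = inverse_mae(csv.DictReader(sample))
```

For a file on disk, pass `csv.DictReader(open(path, newline=""))` in place of the in-memory sample.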

---

## Data Flow & Pipeline

```
Raw EIS (Raw_Experimental_Data)
    │
    ├─► Physics-aware batch fitting (ZARC+TL)
    │       → params_TL.csv, raw_vs_fit_TL.csv
    │
    ├─► Synthetic spectra (Fitted_Data_ZARC_TL)
    │       → singlefreq_*mM_*C_500pts_*.csv
    │
    └─► Compiled dataset (compiled_dataset.csv)
            │
            ├─► CENN forward training
            │       → PINN_forward/ (metrics, predictions, θ heatmaps)
            │
            ├─► CENN inverse (frequency selection + optimization)
            │       → PINN_inverse/ (operando configs, chosen freqs)
            │
            ├─► Sensitivity analysis
            │       → sensitivity_analysis/ (Jacobian, band shares)
            │
            └─► Classical ML regression
                    → ML_regression/, model_scores/
```

---

## Key Results Summary

| Metric | Value |
|--------|-------|
| **CENN forward RMSE** | ~0.06 Ω (Z′, −Z″) |
| **Full-spectrum inverse** | C MAE ≈ 0.10 mM, T MAE ≈ 1.0 °C |
| **3-anchor inverse (optimized)** | C MAE ≈ 0.12 mM, T MAE ≈ 1.1 °C |
| **Acquisition time (3 anchors)** | ~6 s (vs. ~3 min full spectrum) |
| **Number of EIS spectra** | 260 |
| **Concentration range** | 1–20 mM |
| **Temperature range** | 26–50 °C |

---

## Citation

If you use this data, please cite:

> Ostovar, H., Bossert, M., Sharafian, Z., Korup, O., & Horn, R. Impedance-Based Estimation of Process Parameters in Electrolytic Systems via Circuit-Embedded Neural Networks (CENN).

---

## Related Resources

- **Modeling_ML_Scripts/** — Jupyter notebooks and Python scripts for data extraction, fitting, CENN training, inverse estimation, and sensitivity analysis.
- **Paper** — Full methodology and interpretation in the accompanying manuscript.
