스킬 파일

USGS National Produced Waters Geochemical Database

Name: USGS National Produced Waters Geochemical Database
Author: jpfielding

Query and analyze the USGS National Produced Waters Geochemical Database v3.0 for produced water and oilfield brine chemistry from U.S. oil and gas wells. Use this skill whenever the user asks about produced water composition, brine geochemistry, lithium in oilfield brines, magnesium in formation waters, TDS or salinity of produced water, mineral recovery potential from co-produced fluids, or critical mineral concentrations in subsurface brines. Trigger phrases include: produced water chemistry, brine lithium concentration, Marcellus water quality, Smackover lithium, formation water analysis, Li in produced water, Mg in brine, what formations have high lithium, DLE feedstock, produced water TDS by basin, Appalachian brine chemistry. Covers 115,000+ samples with 150+ analytes across all major U.S. basins. Outputs summary statistics tables and narrative analysis with formation-level context.

jpfielding0 스타2026. 4. 14.

직업
카테고리: 교육

스킬 내용

Fetches and analyzes geochemical data from the USGS National Produced Waters Geochemical Database v3.0 — the most comprehensive public dataset of produced water and oilfield brine chemistry in the United States.

Credential

None required. All data is publicly accessible through USGS ScienceBase.

# No API key needed — data is open access
# Template for consistency with other skills:
# KEY=$(grep '^api_key=' ~/.config/usgs/credentials 2>/dev/null | cut -d= -f2)

Data Source

Field	Value
Dataset	USGS National Produced Waters Geochemical Database (ver. 3.0, December 2023)
DOI	https://doi.org/10.5066/P9DSRCZJ

USGS National Produced Waters Geochemical Database

jpfielding0 스타2026. 4. 14.

직업
카테고리: 교육

스킬 내용

Credential

None required. All data is publicly accessible through USGS ScienceBase.

# No API key needed — data is open access
# Template for consistency with other skills:
# KEY=$(grep '^api_key=' ~/.config/usgs/credentials 2>/dev/null | cut -d= -f2)

Data Source

Field	Value
Dataset	USGS National Produced Waters Geochemical Database (ver. 3.0, December 2023)
DOI	https://doi.org/10.5066/P9DSRCZJ

관련 스킬

# List files for v3.0 (Excel + shapefile)
curl -s "https://www.sciencebase.gov/catalog/item/64fa1e71d34ed30c2054ea11?format=json&fields=files,title" \
  | jq '.files[] | {name, size, url: .downloadUri}'

File	Size	Format
`USGS_NPWGDv3_excel.xlsx`	80 MB	Excel workbook (main data)
`USGS_NPWGDv3_data_dictionary.xlsx`	30 KB	Column definitions
`USGS_NPWGDv3_shape.zip`	24 MB	Shapefile (points with attributes)
`version_history.txt`	6 KB	Version change log

# List files for v2.3 (CSV available)
curl -s "https://www.sciencebase.gov/catalog/item/59d25d63e4b05fe04cc235f9?format=json&fields=files" \
  | jq '.files[] | {name, size, url: .downloadUri}'

File	Size	Format
`USGSPWDBv2.3c.csv`	66 MB	CSV — compiled database (preferred)
`USGSPWDBv2.3n.csv`	63 MB	CSV — numeric-only version
`USGSPWDBv2.3c.xlsx`	29 MB	Excel — compiled database
`USGSPWDBv2.3 Data Dictionary.csv`	7 KB	Column definitions
`USGSPWDBv2.3 Data Sources.csv`	2 KB	Source dataset list

# Download v2.3 compiled CSV (~66 MB)
curl -L -o /tmp/USGSPWDBv2.3c.csv \
  "https://www.sciencebase.gov/catalog/file/get/59d25d63e4b05fe04cc235f9?f=__disk__11%2F1e%2F8e%2F111e8e3952b1637289254fd398af95a81f937389"

# Download v3.0 Excel workbook (~80 MB)
curl -L -o /tmp/USGS_NPWGDv3_excel.xlsx \
  "https://www.sciencebase.gov/catalog/file/get/64fa1e71d34ed30c2054ea11?f=__disk__e7%2Fef%2F17%2Fe7ef17fcb71c49e2241da4139ed775f8e328bdab"

# Search for other produced water datasets
curl -s "https://www.sciencebase.gov/catalog/items?q=produced+water+geochemical&format=json&max=10" \
  | jq '.items[] | {id, title}'

Formation	Basin	States	Li (mg/L)	Mg (mg/L)	Economic Potential
Smackover	Gulf Coast	AR, TX, LA	50-477	1,000-5,000+	Highest U.S. brine Li; active DLE projects
Marcellus	Appalachian	WV, PA, OH	10-200+	500-3,000	Large water volumes; Li correlates with TDS
Utica / Pt. Pleasant	Appalachian	OH, WV, PA	20-150	500-2,500	Co-produced with Marcellus
Bakken	Williston	ND, MT	10-70	500-4,000	High volumes; moderate Li
Wolfcamp	Permian	TX, NM	5-50	300-3,000	Massive volumes; lower Li

User asks about...	Filter by...
A specific formation	`FORMSIMPLE` or `FORMATION`
A state or region	`STATE`, `COUNTY`, or `BASIN`
Lithium potential	`Li` column with minimum threshold
A type of play	`PLAYTYPE` (Shale, Sedimentary, Coal, etc.)
Specific wells	`API` number or `WELLNAME`
High-TDS brines	`TDS` with minimum threshold
Specific analyte	Any chemistry column (see `references/schema.md`)

Check if a local copy exists (cached from prior use):

ls -la /tmp/USGSPWDBv2.3c.csv 2>/dev/null || ls -la /tmp/USGS_NPWGDv3_excel.xlsx 2>/dev/null

If not cached, download from ScienceBase:

# Prefer v3.0 — has FORMSIMPLE, PLAYTYPE, 23 new Li-brine datasets, standardized units
# v3.0 is Excel-only; use openpyxl or convert to CSV after download
curl -L -o /tmp/USGS_NPWGDv3_excel.xlsx \
  "https://www.sciencebase.gov/catalog/file/get/64fa1e71d34ed30c2054ea11?f=__disk__e7%2Fef%2F17%2Fe7ef17fcb71c49e2241da4139ed775f8e328bdab"
# Fallback: v2.3 CSV if openpyxl unavailable
# curl -L -o /tmp/USGSPWDBv2.3c.csv \
#   "https://www.sciencebase.gov/catalog/file/get/59d25d63e4b05fe04cc235f9?f=__disk__11%2F1e%2F8e%2F111e8e3952b1637289254fd398af95a81f937389"

Warn the user about file size (~66 MB for v2.3 CSV, ~80 MB for v3.0 Excel).

import csv, statistics

# Load CSV
with open("/tmp/USGSPWDBv2.3c.csv") as f:
    reader = csv.DictReader(f)
    rows = [r for r in reader if "marcellus" in r.get("FORMATION", "").lower()]

# Extract Li values
li_vals = [float(r["Li"]) for r in rows if r.get("Li") and r["Li"] not in ("", "-9999")]

# Statistics
print(f"Marcellus Li: n={len(li_vals)}, "
      f"median={statistics.median(li_vals):.1f}, "
      f"mean={statistics.mean(li_vals):.1f}, "
      f"max={max(li_vals):.1f} mg/L")

# Count samples by state (bash + awk on CSV)
awk -F',' 'NR==1{for(i=1;i<=NF;i++) if($i=="STATE") col=i}
           NR>1{count[$col]++}
           END{for(s in count) print count[s], s}' \
  /tmp/USGSPWDBv2.3c.csv | sort -rn | head -20

# Extract high-Li samples (Python one-liner)
python3 -c "
import csv
with open('/tmp/USGSPWDBv2.3c.csv') as f:
    r = csv.DictReader(f)
    hits = [row for row in r if row.get('Li') and float(row['Li'] or 0) > 100]
print(f'{len(hits)} samples with Li > 100 mg/L')
for h in sorted(hits, key=lambda x: float(x.get('Li',0)), reverse=True)[:10]:
    print(f\"  Li={h['Li']} mg/L  State={h.get('STATE','')}  Fm={h.get('FORMATION','')}\")
"

## Marcellus Shale — Produced Water Lithium Analysis

| Statistic | Li (mg/L) | Mg (mg/L) | TDS (mg/L) |
|-----------|-----------|-----------|------------|
| N         | 847       | 1,023     | 1,156      |
| Min       | 0.1       | 12        | 1,200      |
| P25       | 28        | 580       | 85,000     |
| Median    | 62        | 1,250     | 145,000    |
| Mean      | 75        | 1,380     | 155,000    |
| P75       | 105       | 2,100     | 210,000    |
| Max       | 235       | 3,200     | 345,000    |

**Summary:** The Marcellus Shale produced water database contains 1,156
samples across WV, PA, OH, and NY. Lithium concentrations (n=847) have a
median of 62 mg/L with 25% of samples exceeding 105 mg/L — above or near
the DLE economic threshold of 100-150 mg/L. The highest Li value (235 mg/L)
was reported from a well in WV. Mg averages 1,380 mg/L, making co-recovery
viable in high-TDS wells. Note: Li data coverage is incomplete (73% of
samples have Li measured). Sampling is biased toward PA and WV where
Marcellus production is highest.

## West Virginia — Produced Water Chemistry by Formation

| Formation | N | Li Median (mg/L) | Li Max (mg/L) | TDS Median (mg/L) |
|-----------|---|-------------------|----------------|---------------------|
| Marcellus | 312 | 78 | 235 | 165,000 |
| Utica | 45 | 92 | 148 | 195,000 |
| Oriskany | 28 | 35 | 88 | 120,000 |
| Clinton | 19 | 22 | 41 | 95,000 |
| Devonian Shale | 67 | 18 | 52 | 68,000 |

**Summary:** West Virginia produced waters span multiple Appalachian Basin
formations. The Marcellus and Utica formations show the highest lithium
potential, with median Li of 78 and 92 mg/L respectively. The top Utica
samples approach DLE-economic concentrations (148 mg/L max). TDS correlates
with Li across all formations. Data availability is strongest for the
Marcellus (n=312) reflecting modern shale gas development.

Issue	Cause	Action
ScienceBase item 404 / "not found"	Item ID changed or item secured	Search by title: `q=National+Produced+Waters+Geochemical+Database`
Download timeout	Large file (~66-80 MB)	Increase timeout; suggest download to local file first
CSV parsing errors	Encoding issues, embedded commas	Use `errors="replace"` encoding; check for quoted fields
Column not found	v2.3 vs v3.0 schema difference	v3.0 has FORMSIMPLE, PLAYTYPE; v2.3 uses FORMATION, WELLTYPE
Empty results for filter	Misspelled formation or state	Use FORMSIMPLE values from `references/formations.md`; STATE uses full names not abbreviations
-9999 values	Missing data sentinel	Exclude -9999 from all numeric calculations
Li/Mg column all blank	Analyte not measured for those samples	Report coverage (N with data / N total); note sparse coverage for trace elements
Excel file (v3.0) cannot be parsed	No openpyxl installed	Fall back to v2.3 CSV; or `pip install openpyxl`

Sampling bias. The database is a compilation from published sources, not a statistically designed survey. Some formations and states are heavily overrepresented (Appalachian Basin, Texas) while others have few samples. Sample density does not reflect production volumes.
Temporal coverage. Samples span from the 1930s to 2020s. Older samples may use different analytical methods with lower precision. Comparison across eras requires caution.
Incomplete analyte coverage. Not all samples have all analytes measured. Lithium coverage is notably sparse in older datasets. Always report the N (number of non-null values) alongside statistics.
Approximate locations. Many samples have approximate coordinates (county centroid, field centroid, or state centroid). Check the COORDAPX column before using lat/lon for spatial analysis.
Units. All chemistry in v3.0 is mg/L (v3.0 converted all ppm using specific gravity). In v2.3, the UNITSORIG column indicates whether the original was mg/L or ppm.
TDS calculation. The TDS column is a "best available" value using measured TDS first, then reported-calculated TDS, then ion-sum-calculated TDS. Check TDSDESC for the method.
Formation name inconsistency. Use FORMSIMPLE (v3.0) for reliable formation-level grouping. The FORMATION column has many spelling variants and qualifiers that make grouping unreliable without normalization.
v2.3 vs v3.0 differences. v3.0 added 23 new datasets (especially high-Li brines), fixed location errors, and standardized formation names. For lithium research, v3.0 is strongly preferred. However, v3.0 is only available as Excel (not CSV), requiring openpyxl or similar to parse programmatically.
Self-reported industry data. Some source datasets are from industry reports. Quality control varies by source. The CHARGEBAL column (charge balance) can be used as a data quality indicator — values far from 0% suggest analytical errors.
Not a reserves estimate. Concentration data alone does not determine economic viability. Recovery depends on water volume, flow rates, brine chemistry complexity, infrastructure, and DLE technology efficiency.

USGS National Produced Waters Geochemical Database

Credential

Data Source

USGS National Produced Waters Geochemical Database

Credential

Data Source

API Structure

ScienceBase File Discovery

Download the CSV (v2.3)

Download the Excel file (v3.0)

Key Columns

Identification and Location

Geology

Chemistry (all in mg/L)

Physical Properties

Key Formations

Workflow

Step 1 — Resolve Intent

Step 2 — Discover and Download Data

Step 3 — Filter and Analyze

Step 4 — Produce Output

Output Format

Example 1: Formation Query

Example 2: State Query

Error Handling

Caveats

Implementation Notes

Update Skills

Eval Harness

Ecc Tools Cost Audit

Code Tour

Rules Distill

Design System

USGS National Produced Waters Geochemical Database

Credential

Data Source

USGS National Produced Waters Geochemical Database

Credential

Data Source

API Structure

ScienceBase File Discovery

Download the CSV (v2.3)

Download the Excel file (v3.0)

Search ScienceBase for Related Datasets

Key Columns

Identification and Location

Geology

Chemistry (all in mg/L)

Physical Properties

Key Formations

Workflow

Step 1 — Resolve Intent

Step 2 — Discover and Download Data

Step 3 — Filter and Analyze

Step 4 — Produce Output

Output Format

Example 1: Formation Query

Example 2: State Query

Error Handling

Caveats

Implementation Notes

Update Skills

Eval Harness

Ecc Tools Cost Audit

Code Tour

Rules Distill

Design System