LSMS Library

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different variable names, food classification systems, questionnaire structures, and file formats.

Researchers typically either spend weeks learning each new dataset's idiosyncrasies or fall back on pre-harmonized datasets that sacrifice detail and comparability.

The Solution

LSMS Library provides an abstraction layer that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the way you access the data:

import lsms_library as ll

uga = ll.Country('Uganda')
uga.waves          # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme    # ['food_acquired', 'household_roster', ...]

food = uga.food_expenditures()   # Standardized DataFrame, all waves
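
As a rough sketch of how the attributes above might be used together, the helper below collects a country's waves and table names into a plain dictionary. The `describe` helper and the stand-in object are hypothetical and not part of the library; the sketch assumes only that a `Country` exposes `waves` and `data_scheme` as iterables of strings, as the comments above suggest.

```python
from types import SimpleNamespace


def describe(country):
    """Collect the waves and table names a Country-like object reports.

    Hypothetical helper: it relies only on the `waves` and `data_scheme`
    attributes shown above, so any object exposing them will work.
    """
    return {
        "waves": list(country.waves),
        "tables": sorted(country.data_scheme),
    }


# Stand-in object with made-up values so the sketch runs without survey data.
# With the library installed, ll.Country('Uganda') would be passed instead.
demo = SimpleNamespace(
    waves=["2005-06", "2009-10"],
    data_scheme=["household_roster", "food_acquired"],
)
summary = describe(demo)
print(summary)
```

Because the helper is duck-typed, the same call should work unchanged for any supported country object, which is the point of harmonizing access rather than data.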

Cross-Country Analysis

The Feature class makes it easy to assemble a single harmonized DataFrame across every country that provides a given table:

roster = ll.Feature('household_roster')
roster.countries   # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster()      # Load all countries into one DataFrame
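
Conceptually, `roster.countries` answers the question of which countries provide a given table. The following is a minimal sketch of that idea only, not the library's implementation: the `countries_providing` helper, the placeholder country names, and the `name` attribute are all hypothetical, and the sketch assumes each country object exposes a `data_scheme` list like the one shown earlier.

```python
from types import SimpleNamespace


def countries_providing(table, countries):
    """Return the names of the countries whose data_scheme lists `table`.

    Hypothetical helper for illustration; `name` is an invented attribute.
    """
    return [c.name for c in countries if table in c.data_scheme]


# Placeholder countries with made-up table lists (not real survey metadata).
catalog = [
    SimpleNamespace(name="Country_A", data_scheme=["food_acquired", "household_roster"]),
    SimpleNamespace(name="Country_B", data_scheme=["household_roster"]),
    SimpleNamespace(name="Country_C", data_scheme=["food_acquired"]),
]

providers = countries_providing("household_roster", catalog)
print(providers)
```

Countries that lack the requested table are simply left out, so the combined result covers only the surveys that actually collected it.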

Key Features

Uniform Interface -- consistent method names across countries (e.g. food_expenditures(), household_characteristics())
Multi-Wave Panel Support -- household IDs harmonized across waves automatically
Zero Data Loss -- original survey detail preserved
Cross-Country Analysis -- Feature class concatenates harmonized data across countries
DVC Integration -- stream data from remote storage
Three-Tier Cache -- DVC blob cache (L1, populated lazily on first read of any tracked file), per-wave harmonized parquet (L2-wave, written by Wave.grab_data on first call), and per-country aggregated parquet (L2-country, the v0.7.0 fast path that serves warm reads in ~0.5 s). All three tiers live under data_root() and are controlled together by LSMS_DATA_DIR. See the Caching guide.
Extensible -- add new surveys via YAML configuration files
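
Since all three cache tiers are described as living under a single root controlled by LSMS_DATA_DIR, one way to relocate them is to set that variable before the library is imported. The sketch below uses only the standard library. The directory name is arbitrary, and setting the variable before the import (rather than after) is a cautious assumption, since this page does not state when the library reads it.

```python
import os
import tempfile
from pathlib import Path

# Arbitrary location chosen for illustration; any writable directory would do.
cache_root = Path(tempfile.gettempdir()) / "lsms_cache"
cache_root.mkdir(parents=True, exist_ok=True)

# Assumption: the variable is read when the library resolves its data root,
# so it is set here before any import of lsms_library.
os.environ["LSMS_DATA_DIR"] = str(cache_root)

# import lsms_library as ll   # uncomment once the library is installed
print(os.environ["LSMS_DATA_DIR"])
```

Setting the variable in the shell or in a job script before starting Python would achieve the same effect without touching the analysis code.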

Next Steps

Getting Started -- installation and first steps
Country Guide -- single-country workflows
Feature Guide -- cross-country analysis
API Reference -- complete class documentation