Cross-Country Analysis with Feature
The Feature class answers "what do we know about a
given table across the developing world?" It assembles a single harmonized
DataFrame for a table across all countries that provide it.
Basic Usage
import lsms_library as ll
roster = ll.Feature('household_roster')
# Which countries provide this table?
roster.countries
# ['Burkina_Faso', 'China', 'Ethiopia', 'GhanaLSS', 'India', 'Mali', ...]
# What are the guaranteed columns?
roster.columns
# ['Sex', 'Age', 'Generation', 'Distance', 'Affinity']
Loading Data
Feature is callable. Invoke it to load and concatenate data:
# Load all available countries
df = roster()
# Load specific countries
df = roster(['Mali', 'Uganda'])
The returned DataFrame has a country index level prepended:
                           Sex  Age  Generation  Distance  Affinity
country t       i    pid
Mali    2014-15 1003 1       M   45           0         0  consanguineal
                     2       F   38           0         0  affinal
Uganda  2019-20 4001 1       M   52           0         0  consanguineal
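The prepended level matches what `pandas.concat` produces when it is given a mapping from country name to frame. The sketch below uses made-up rows and a hypothetical helper, `toy_roster`. It illustrates the resulting index shape only and is not taken from the library's implementation.

```python
import pandas as pd

def toy_roster(rows):
    """Build a small roster frame indexed by (t, i, pid) from made-up rows."""
    frame = pd.DataFrame(rows, columns=["t", "i", "pid", "Sex", "Age"])
    return frame.set_index(["t", "i", "pid"])

# Made-up per-country frames standing in for each country's roster table.
per_country = {
    "Mali": toy_roster([("2014-15", 1003, 1, "M", 45), ("2014-15", 1003, 2, "F", 38)]),
    "Uganda": toy_roster([("2019-20", 4001, 1, "M", 52)]),
}

# Dict keys become a new outermost index level; `names` labels that level,
# and the existing level names (t, i, pid) follow it.
combined = pd.concat(per_country, names=["country"])

# Recover one country's rows by selecting on (and dropping) the country level.
mali_only = combined.xs("Mali", level="country")
```

With the country level in place, `xs` or `groupby(level="country")` can slice or summarize per country without any extra bookkeeping column.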
Design Decisions
Lazy Loading
Construction is cheap: Feature('household_roster') discovers which
countries declare the table but loads nothing from disk. Data is fetched on
demand when you call the instance.
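The pattern can be sketched with a small callable class. `LazyFeature`, `make_loader`, and the loader mapping are hypothetical names invented for this illustration. The real Feature class discovers countries from its own configuration, which is not shown here.

```python
class LazyFeature:
    """Hypothetical stand-in for Feature: stores loaders, runs none at construction."""

    def __init__(self, table, loaders):
        self.table = table
        # Mapping of country name -> zero-argument function that reads the data.
        self._loaders = dict(loaders)

    @property
    def countries(self):
        # Discovery needs only the names, not the data.
        return sorted(self._loaders)

    def __call__(self, countries=None):
        # Data is read only here, and only for the countries requested.
        selected = self.countries if countries is None else list(countries)
        return {name: self._loaders[name]() for name in selected}


calls = []  # records which loaders actually ran

def make_loader(name):
    def load():
        calls.append(name)
        return [f"{name} rows"]
    return load

roster = LazyFeature("household_roster", {c: make_loader(c) for c in ["Mali", "Uganda"]})
after_construction = list(calls)  # still empty: nothing has been loaded
data = roster(["Mali"])           # triggers exactly one load
```

Deferring the read keeps inspection of `countries` fast even when the underlying files are large or remote.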
Union of Columns
The returned DataFrame takes the union of all columns across countries, not
the intersection. Required columns from data_info.yml guarantee a common
core; country-specific extras appear as NA where absent.
This makes gaps visible (e.g. District present for Ethiopia but NA for
Mali) rather than silently dropping information.
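In pandas terms, this is the difference between an outer and an inner join on the column axis. The frames and values below are made up for illustration, and the final step is one possible way to audit the gaps, not something Feature is documented to do.

```python
import pandas as pd

# Made-up frames: Ethiopia carries an extra District column, Mali does not.
ethiopia = pd.DataFrame(
    {"Sex": ["F"], "Age": [30], "District": ["D1"]},
    index=pd.Index([1], name="i"),
)
mali = pd.DataFrame(
    {"Sex": ["M"], "Age": [45]},
    index=pd.Index([1003], name="i"),
)
frames = {"Ethiopia": ethiopia, "Mali": mali}

# join="outer" (the pandas default) keeps the union of columns;
# join="inner" keeps only the columns shared by every frame.
union = pd.concat(frames, names=["country"], join="outer")
shared = pd.concat(frames, names=["country"], join="inner")

# Share of missing values per country and column, to make gaps easy to spot.
missing_share = union.isna().groupby(level="country").mean()
```

Here `missing_share` reports District as fully missing for Mali and fully present for Ethiopia, while `shared` has lost the column altogether.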
Harmonization Flows Through
All per-country harmonization (kinship decomposition, canonical spellings,
categorical mappings, dtype coercion) is applied by each Country before
concatenation. Feature delegates to each country's existing table method.
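The ordering matters: labels are reconciled per country first, and the combining step only stacks frames that already agree. The sketch below shows that ordering with hypothetical raw codings and a hypothetical `harmonize` helper. The library's actual mappings and method names are not shown in this section.

```python
import pandas as pd

# Hypothetical raw codings; real surveys and the library's mappings will differ.
raw = {
    "Mali": pd.DataFrame({"Sex": ["Masculin", "Féminin"]}),
    "Uganda": pd.DataFrame({"Sex": ["MALE", "FEMALE"]}),
}
sex_maps = {
    "Mali": {"Masculin": "M", "Féminin": "F"},
    "Uganda": {"MALE": "M", "FEMALE": "F"},
}

def harmonize(country, frame):
    """Map one country's raw labels to the shared spellings."""
    out = frame.copy()
    out["Sex"] = out["Sex"].map(sex_maps[country])
    return out

# Harmonize per country first, then stack; the combining step never edits labels.
harmonized = {name: harmonize(name, frame) for name, frame in raw.items()}
combined = pd.concat(harmonized, names=["country"])
```

Keeping the mapping logic with each country means the cross-country step stays a plain concatenation.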
Error Handling
If a country's data fails to load, Feature emits a warning and continues
with the countries that succeeded:
import warnings

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")  # record every warning, including repeats
    df = roster()
# Inspect w for any per-country failures
for caught in w:
    print(caught.message)
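A warn-and-continue loader can be sketched with the standard library alone. `load_all`, `broken_loader`, the use of `RuntimeWarning`, and the message format are assumptions made for this illustration. The warning category and text that Feature actually emits are not specified here.

```python
import warnings

def load_all(loaders):
    """Run each country's loader; warn and skip on failure instead of raising."""
    loaded = {}
    for country, load in loaders.items():
        try:
            loaded[country] = load()
        except Exception as exc:  # broad on purpose: any failure skips the country
            warnings.warn(f"{country}: failed to load ({exc})", RuntimeWarning)
    return loaded

def broken_loader():
    raise FileNotFoundError("missing source file")

# Hypothetical loaders: one succeeds, one fails.
loaders = {"Mali": lambda: ["Mali rows"], "Uganda": broken_loader}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_all(loaders)

# One message per country that was skipped.
failed = [str(w.message) for w in caught]
```

Collecting the messages into `failed` gives a checklist of countries to investigate, while `result` still holds everything that loaded.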
Available Tables
Any table declared in a country's data_scheme.yml can be used with Feature.
Common ones include:
| Table | Description |
|---|---|
| household_roster | Demographics, kinship decomposition |
| cluster_features | Region, rural/urban classification |
| food_acquired | Food acquisition with units |
| food_expenditures | Derived food spending |
| shocks | Household shocks and coping |
| individual_education | Educational attainment |
| panel_ids | Cross-wave household ID linkage |