Country¶
Country ¶
Country(country_name: str, preload_panel_ids: bool = False, verbose: bool = False, assume_cache_fresh: bool = False, trust_cache: bool = False)
Primary interface to a single country's LSMS survey data.
Provides access to all survey waves, standardized tables, and panel data.
Tables listed in data_scheme are available as callable attributes
(e.g. country.food_expenditures()).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
country_name
|
str
|
Directory name under |
required |
preload_panel_ids
|
bool
|
If True, compute panel ID mappings eagerly at construction time. Default is False (lazy). |
False
|
verbose
|
bool
|
Enable verbose logging. |
False
|
assume_cache_fresh
|
bool
|
If True, read existing cached Parquet files directly, bypassing DVC
and the normal build pipeline. Use this when you know the cache is
up-to-date and want to skip all existence / staleness checks.
|
False
|
trust_cache
|
bool
|
Deprecated alias for |
False
|
Examples:
>>> import lsms_library as ll
>>> uga = ll.Country('Uganda')
>>> uga.waves
['2005-06', '2009-10', ...]
>>> uga.data_scheme
['food_acquired', 'household_roster', ...]
>>> food = uga.food_expenditures()
data_scheme
property
¶
List of data objects available for country.
Includes derived tables (e.g. food_expenditures, household_characteristics) when their source table is present, even if they are not explicitly registered in data_scheme.yml.
categorical_mapping
property
¶
Get the categorical mapping for the country. Searches current directory, then parent directory. Also merges global .org files from lsms_library/categorical_mapping/ (GH #168). Global tables are loaded first; per-country tables override on name collision.
panel_ids
property
¶
Raw panel-ID tables keyed by wave. Computed lazily on first access.
Gated on _panel_ids_attempted rather than the cache value so a
legitimate negative result (country without panel design) is
cached as None without re-running _compute_panel_ids.
updated_ids
property
¶
Mapping {old_id: new_id} per wave for ID harmonization. Computed lazily.
cached_datasets ¶
List dataset names currently cached for this country.
Discovers caches at three locations under data_root()
(DVC blob cache L1 lives under dvc-cache/ and is not enumerated
by this method):
- L2-country:
data_root(country)/var/*.parquet. - Country-level companion:
data_root(country)/_/*.{parquet,json}. - L2-wave:
data_root(country)/{wave}/_/*.parquetfor every wave subdirectory (excluding the special_/andvar/directories above). This catches script-path tables (Nigeria's PP/PHhousehold_roster, Tanzania's multi-round tables) whose only cache lives at the wave level.
test_all_data_schemes ¶
Test whether all method_names in obj.data_scheme can be successfully built. Falls back to Makefile if not in data_scheme.