Country

Country(country_name: str, preload_panel_ids: bool = False, verbose: bool = False, assume_cache_fresh: bool = False, trust_cache: bool = False)

Primary interface to a single country's LSMS survey data.

Provides access to all survey waves, standardized tables, and panel data. Tables listed in data_scheme are available as callable attributes (e.g. country.food_expenditures()).

Parameters:

  • country_name (str, required): Directory name under lsms_library/countries/ (e.g. 'Uganda', 'Tanzania').
  • preload_panel_ids (bool, default False): If True, compute panel ID mappings eagerly at construction time. Default is False (lazy).
  • verbose (bool, default False): Enable verbose logging.
  • assume_cache_fresh (bool, default False): If True, read existing cached Parquet files directly, bypassing DVC and the normal build pipeline. Use this when you know the cache is up-to-date and want to skip all existence / staleness checks. _finalize_result (kinship expansion, canonical spelling, dtype coercion, _join_v_from_sample) still runs on every read; only the cache-lookup / DVC layer is bypassed. Useful on clusters where the parquet cache has been pre-built. Ignores LSMS_NO_CACHE.
  • trust_cache (bool, default False): Deprecated alias for assume_cache_fresh. Will be removed in v0.8.0.
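
Because trust_cache is a deprecated alias for assume_cache_fresh, callers may want to see how two such flags can be folded into one. The following is a minimal sketch, not the library's code: resolve_cache_flag is a hypothetical helper, and the warning text is an assumption.

```python
import warnings


def resolve_cache_flag(assume_cache_fresh: bool = False,
                       trust_cache: bool = False) -> bool:
    """Hypothetical helper: fold the deprecated alias into the current flag."""
    if trust_cache:
        # Assumed behaviour: warn the caller, then honour the old flag.
        warnings.warn(
            "trust_cache is deprecated; use assume_cache_fresh instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Either flag being True means "read the cache without checks".
    return assume_cache_fresh or trust_cache
```

New code should pass assume_cache_fresh=True directly, which avoids the warning.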

Examples:

>>> import lsms_library as ll
>>> uga = ll.Country('Uganda')
>>> uga.waves
['2005-06', '2009-10', ...]
>>> uga.data_scheme
['food_acquired', 'household_roster', ...]
>>> food = uga.food_expenditures()

waves property

waves: list[str]

Names of the survey waves available for this country.

data_scheme property

data_scheme: list[str]

Names of the data objects (tables) available for this country.

Includes derived tables (e.g. food_expenditures, household_characteristics) when their source table is present, even if they are not explicitly registered in data_scheme.yml.
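
The rule for derived tables can be sketched as follows. This is an illustration only: DERIVED_FROM and effective_data_scheme are hypothetical names, and the pairing of each derived table with a source table is an assumption made for the example, not taken from the library.

```python
# Hypothetical mapping: derived table -> source table it is assumed to need.
DERIVED_FROM = {
    "food_expenditures": "food_acquired",
    "household_characteristics": "household_roster",
}


def effective_data_scheme(registered: list[str]) -> list[str]:
    """Return the registered tables plus any derived table whose source exists."""
    scheme = list(registered)
    for derived, source in DERIVED_FROM.items():
        # Add a derived table only when its source is present
        # and the table is not already registered explicitly.
        if source in registered and derived not in scheme:
            scheme.append(derived)
    return scheme
```

Under these assumptions, a country that registers only food_acquired would also report food_expenditures, but not household_characteristics.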

resources property

resources: dict[str, Any]

formatting_functions property

formatting_functions: dict[str, Callable[..., Any]]

categorical_mapping property

categorical_mapping: dict[str, DataFrame]

Categorical mapping tables for the country. The lookup searches the current directory, then the parent directory, and also merges the global .org files from lsms_library/categorical_mapping/ (GH #168). Global tables are loaded first; per-country tables override them on a name collision.
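
The override order (global first, per-country wins on a name collision) behaves like a plain dictionary update. A minimal sketch, where merge_mappings is a hypothetical helper and strings stand in for the DataFrame values:

```python
def merge_mappings(global_tables: dict, country_tables: dict) -> dict:
    """Hypothetical helper: combine global and per-country mapping tables."""
    merged = dict(global_tables)   # global tables are loaded first
    merged.update(country_tables)  # per-country tables replace same-named ones
    return merged


# Illustrative table names and values only.
merged = merge_mappings(
    {"unit": "global unit table", "region": "global region table"},
    {"unit": "country unit table"},
)
```

Here merged["unit"] is the per-country table, while merged["region"] still comes from the global set.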

mapping property

mapping: dict[str, Any]

panel_ids property

panel_ids: dict[str, Any] | None

Raw panel-ID tables keyed by wave. Computed lazily on first access.

The computation is gated on _panel_ids_attempted rather than on the cached value, so a legitimate negative result (a country without a panel design) is cached as None and _compute_panel_ids is not re-run.
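
The gating pattern can be illustrated in isolation. This is a sketch of the general technique, not the library's implementation: LazyPanelIds and its compute argument are hypothetical, and only the attribute name _panel_ids_attempted comes from the description above.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class LazyPanelIds:
    """Hypothetical holder that computes a value once, even if it is None."""

    def __init__(self, compute: Callable[[], Optional[dict[str, Any]]]) -> None:
        self._compute = compute
        self._panel_ids: Optional[dict[str, Any]] = None
        self._panel_ids_attempted = False

    @property
    def panel_ids(self) -> Optional[dict[str, Any]]:
        # Checking the flag, not the cached value, lets None count as a
        # valid cached answer instead of triggering another computation.
        if not self._panel_ids_attempted:
            self._panel_ids = self._compute()
            self._panel_ids_attempted = True
        return self._panel_ids
```

A check such as `if self._panel_ids is None` would instead recompute on every access for a country that has no panel design.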

updated_ids property

updated_ids: dict[str, dict[str, str]] | None

Mapping {old_id: new_id} per wave for ID harmonization. Computed lazily.
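
To show how a per-wave {old_id: new_id} mapping of this shape might be applied, here is a small sketch. harmonize_ids is a hypothetical helper and the IDs are made up; leaving IDs without an entry unchanged is an assumption.

```python
from __future__ import annotations


def harmonize_ids(ids_by_wave: dict[str, list[str]],
                  updated_ids: dict[str, dict[str, str]] | None) -> dict[str, list[str]]:
    """Hypothetical helper: rewrite IDs wave by wave using {old_id: new_id}."""
    if updated_ids is None:
        # No mapping available: return the IDs unchanged.
        updated_ids = {}
    return {
        wave: [updated_ids.get(wave, {}).get(i, i) for i in ids]
        for wave, ids in ids_by_wave.items()
    }


# Illustrative IDs only.
harmonized = harmonize_ids(
    {"2005-06": ["A1", "A2"], "2009-10": ["B7"]},
    {"2009-10": {"B7": "A1"}},
)
```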

cached_datasets

cached_datasets() -> list[str]

List dataset names currently cached for this country.

Discovers caches at the three locations under data_root() listed below. The L1 DVC blob cache lives under dvc-cache/ and is not enumerated by this method.

  • L2-country: data_root(country)/var/*.parquet.
  • Country-level companion: data_root(country)/_/*.{parquet,json}.
  • L2-wave: data_root(country)/{wave}/_/*.parquet for every wave subdirectory (excluding the special _/ and var/ directories above). This catches script-path tables (Nigeria's PP/PH household_roster, Tanzania's multi-round tables) whose only cache lives at the wave level.
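
The three locations above can be scanned with pathlib globbing. The sketch below is an assumption-laden illustration, not the method's source: discover_cached and demo are hypothetical names, the country directory is passed in directly instead of calling data_root(), and taking each file's stem as the dataset name and returning the names sorted are both assumptions.

```python
import tempfile
from pathlib import Path


def discover_cached(country_root: Path) -> list[str]:
    """Hypothetical scan of the three cache locations under one country root."""
    if not country_root.is_dir():
        return []
    names: set[str] = set()
    # L2-country: var/*.parquet
    names.update(p.stem for p in (country_root / "var").glob("*.parquet"))
    # Country-level companion: _/*.{parquet,json}
    for ext in ("parquet", "json"):
        names.update(p.stem for p in (country_root / "_").glob(f"*.{ext}"))
    # L2-wave: {wave}/_/*.parquet, skipping the special _/ and var/ directories
    for wave_dir in country_root.iterdir():
        if wave_dir.is_dir() and wave_dir.name not in {"_", "var"}:
            names.update(p.stem for p in (wave_dir / "_").glob("*.parquet"))
    return sorted(names)


def demo() -> list[str]:
    """Build a throwaway directory tree and scan it (file names are made up)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "var").mkdir()
        (root / "var" / "food_acquired.parquet").touch()
        (root / "_").mkdir()
        (root / "_" / "panel_ids.json").touch()
        (root / "2009-10" / "_").mkdir(parents=True)
        (root / "2009-10" / "_" / "household_roster.parquet").touch()
        return discover_cached(root)
```

The wave-level pass is what picks up a table whose only cache lives under a wave directory, such as household_roster in the demo tree.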

test_all_data_schemes

test_all_data_schemes(waves: list[str] | None = None) -> dict[str, str]

Test whether every method name in obj.data_scheme can be built successfully. Falls back to the Makefile for a name that is not in data_scheme.