First Stage Units in the 2017 Periodic Labour Force Survey use a simple substitution cipher

by Nils Enevoldsen
February 6, 2023

Author’s note: Thanks to Jyotirmoy Bhattacharya for discovering the concordance.

Given China’s just-announced population decline, India is now the world’s most populous country. Enumerating its residents is surely tricky, especially during a pandemic, but the government has yet to schedule the next “decennial” census; the last one was completed in 2011. The census is one of many long-running official surveys that produce valuable data on Indian households.

India’s National Sample Survey Office has been releasing high-quality survey data since 1950, but has become more tight-lipped of late. The microdata from the crucial Household Consumer Expenditure Survey (HCES, alias “Schedule 1.0”) of the National Sample Survey (NSS), on which poverty estimates have historically been based, were collected in 2017-18 as part of the 75th NSS round but have not been released. The last released round of the HCES is now over a decade old, having been collected in 2011-12 in the 68th round. While the HCES used to be run approximately every five years, its next round is not yet even scheduled.

Another survey, the Employment and Unemployment Survey (EUS), is also no longer found within the NSS, but this one has a brighter story. Forked from the NSS after 2012, it lives on, with some changes, as the Periodic Labour Force Survey (PLFS). Unfortunately the PLFS has a serious flaw, but fortunately the flaw has a remedy.

Can the Periodic Labour Force Survey provide a household panel? 

The Periodic Labour Force Survey has, at the time of this writing, released four years of data. PLFS 1 covers mid-2017 to mid-2018, PLFS 2 covers 2018-19, PLFS 3 covers 2019-20, and PLFS 4 covers 2020-21. The PLFS surveys two sectors, rural and urban. New urban and rural panels are added each quarter.

Rural households are not revisited. Urban households are revisited quarterly for a total of four quarters, whereupon they drop out. Can an urban PLFS panel dataset be generated?

PLFS does not include household identifiers, but according to survey documentation they can be reconstructed as unique combinations of the panel identifier, first stage unit (FSU), sample subgroup/subblock number, second stage stratum number, and sample household number. In turn, PLFS does not include panel identifiers, but they can be reconstructed by using appropriate combinations of sector, quarter¹, and visit number as described in the sampling documentation. The problem is that, while this procedure matches households correctly within each PLFS year, there appear to be no FSUs in common between PLFS 1 and PLFS 2. This mismatch suggests that, for some purposes, exploiting the panel nature of the PLFS is impossible.
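The reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, not the official procedure: the field names are hypothetical, and the panel rule (sector plus the quarter of the first visit, with quarters numbered consecutively) is an assumption standing in for the rules in the sampling documentation.

```python
from collections import defaultdict

# Hypothetical field names; the real PLFS column names differ by file.
HOUSEHOLD_KEY_FIELDS = ("panel", "fsu", "subblock", "stratum2", "household_no")


def panel_id(record):
    """Assumed rule: a panel is the sector plus the quarter of the first visit."""
    first_visit_quarter = record["quarter"] - (record["visit"] - 1)
    return (record["sector"], first_visit_quarter)


def household_key(record):
    """Build the composite household identifier described in the text."""
    enriched = dict(record, panel=panel_id(record))
    return tuple(enriched[field] for field in HOUSEHOLD_KEY_FIELDS)


def group_visits(records):
    """Collect the visit numbers observed for each reconstructed household."""
    visits = defaultdict(list)
    for record in records:
        visits[household_key(record)].append(record["visit"])
    return {key: sorted(v) for key, v in visits.items()}


# Illustrative values only, not real PLFS records.
base = {"sector": 2, "fsu": "20001", "subblock": 1, "stratum2": 1, "household_no": 3}
records = [
    dict(base, quarter=1, visit=1),  # first visit of a household in panel (2, 1)
    dict(base, quarter=2, visit=2),  # revisit of the same household
    dict(base, quarter=2, visit=1),  # same numbers, but a different panel
]
panels = group_visits(records)
```

Here the first two records collapse into one household with visits 1 and 2, while the third is kept separate because its panel differs. Within a single PLFS year this kind of key links revisits; across PLFS 1 and PLFS 2 it fails because the `fsu` component never agrees.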

Bhattacharya² suggested the PLFS 1 FSUs may simply be a one-to-one renumbering rather than a reorganization. He used a set of time-invariant household-level variables to generate the most plausible FSU concordance out of all possible concordances.
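To make the idea concrete, here is one simple way such a concordance could be scored. It is not Bhattacharya's actual procedure, which is described in his paper: this sketch assumes each household is reduced to a hypothetical "signature" tuple of time-invariant codes and pairs each PLFS 1 FSU with the PLFS 2 FSU that shares the most signatures. It is greedy and does not enforce a one-to-one match.

```python
from collections import Counter


def signatures_by_fsu(households):
    """Map each FSU to a multiset of household signatures."""
    result = {}
    for fsu, signature in households:
        result.setdefault(fsu, Counter())[signature] += 1
    return result


def best_concordance(plfs1_households, plfs2_households):
    """Pair each PLFS 1 FSU with the PLFS 2 FSU sharing the most signatures."""
    old = signatures_by_fsu(plfs1_households)
    new = signatures_by_fsu(plfs2_households)
    concordance = {}
    for old_fsu, old_sigs in old.items():
        # Counter & Counter keeps the minimum count of each shared signature.
        scores = {
            new_fsu: sum((old_sigs & new_sigs).values())
            for new_fsu, new_sigs in new.items()
        }
        concordance[old_fsu] = max(scores, key=scores.get)
    return concordance


# Made-up households: (FSU, signature of hypothetical time-invariant codes).
plfs1 = [("76341", (4, 1)), ("76341", (6, 2)), ("72985", (2, 1))]
plfs2 = [("28074", (4, 1)), ("28074", (6, 2)), ("25391", (2, 1))]
concordance = best_concordance(plfs1, plfs2)
```

Nothing in this scoring knows about digits, which is what makes the regularity found in the resulting concordance informative.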

FSUs in PLFS 1 are ciphered

Although he did not realize it, the optimal concordance found by Bhattacharya is a simple substitution cipher. That the unconstrained optimization found a concordance satisfying an extremely tight constraint is proof that it is the correct concordance. 

PLFS 1 digit   0 1 2 3 4 5 6 7 8 9
PLFS 2 digit   6 4 5 0 7 1 8 2 9 3

Specifically, to change a PLFS 1 FSU to a PLFS 2 FSU, perform the digit substitutions in the provided table. For example, 53335 in PLFS 1 corresponds to 10001 in PLFS 2.
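The substitution is easy to apply in code. The sketch below uses Python's built-in `str.translate`; the function name is my own, and FSUs are kept as strings on the assumption that leading zeros may matter (a PLFS 1 FSU beginning with "3" translates to one beginning with "0").

```python
# Digit substitution from the table: PLFS 1 digit -> PLFS 2 digit.
PLFS1_TO_PLFS2 = str.maketrans("0123456789", "6450718293")


def plfs1_fsu_to_plfs2(fsu):
    """Translate a PLFS 1 FSU (string or int) to its PLFS 2 form, as a string."""
    return str(fsu).translate(PLFS1_TO_PLFS2)


print(plfs1_fsu_to_plfs2("53335"))  # 10001
```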

The existence of this cipher also means that PLFS 1 rural FSUs can be corrected, even though we cannot perform the household variable matching algorithm on rural households due to the lack of rural household revisits.
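Because the cipher acts digit by digit, it can be inferred from matched urban pairs alone and then applied to rural FSUs. The following sketch checks that a set of pairs is consistent with a single one-to-one digit substitution; the function names are hypothetical and the urban pairs are made up for illustration (generated from the table, not taken from the data).

```python
def infer_digit_map(pairs):
    """Infer a digit-level mapping from (PLFS 1 FSU, PLFS 2 FSU) string pairs.

    Raises ValueError if the pairs are not consistent with a single
    one-to-one digit substitution.
    """
    forward, backward = {}, {}
    for old, new in pairs:
        if len(old) != len(new):
            raise ValueError(f"length mismatch: {old!r} vs {new!r}")
        for a, b in zip(old, new):
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                raise ValueError(f"inconsistent substitution in {old!r} -> {new!r}")
    return forward


def apply_digit_map(fsu, digit_map):
    """Substitute every digit of an FSU using the inferred mapping."""
    return "".join(digit_map[d] for d in str(fsu))


# Made-up urban pairs that together cover all ten digits.
urban_pairs = [("76341", "28074"), ("72985", "25391"), ("70000", "26666")]
digit_map = infer_digit_map(urban_pairs)

# A rural FSU never appears in the urban matching, yet can still be translated.
rural_fsu = apply_digit_map("53335", digit_map)
```

A concordance that was merely a good statistical guess would almost certainly fail the consistency check, since every matched pair constrains the same ten-entry mapping.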

FSUs in PLFS 2 are correct

It is clear that the PLFS 2 FSUs are the intended FSUs, for three reasons:

  • Only the PLFS 1 FSUs are different, suggesting an error was corrected in subsequent years.
  • The last two digits of the PLFS 2 FSUs [almost] never have breaks in sequence except when the first three digits also increment.
  • Sector 1 (rural) PLFS 2 FSUs begin with “1”, and sector 2 (urban) PLFS 2 FSUs begin with “2”.
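The last two observations are simple to test against any list of FSUs. This sketch assumes five-digit FSUs and reads the sequence claim as "gaps in the last two digits occur only where the first three digits change"; both function names are hypothetical.

```python
def sector_prefix_ok(fsu, sector):
    """True if the FSU's first digit equals the sector code (1 rural, 2 urban)."""
    return str(fsu)[0] == str(sector)


def count_sequence_breaks(fsus):
    """Count gaps between consecutive FSUs that share the same first three digits."""
    ordered = sorted({int(f) for f in fsus})
    breaks = 0
    for prev, curr in zip(ordered, ordered[1:]):
        same_prefix = prev // 100 == curr // 100
        if same_prefix and curr - prev != 1:
            breaks += 1
    return breaks


# Illustrative values only, not real PLFS FSUs.
clean = ["10001", "10002", "10003", "10101", "10102"]
gappy = ["10001", "10003"]
```

On the illustrative lists, `count_sequence_breaks(clean)` is 0 because the only jump coincides with a change in the first three digits, while `count_sequence_breaks(gappy)` is 1.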

¹ Note that the quarter numbers are off-by-one in the PLFS 1 person revisit file (RVP1). Also note that the PLFS 2 household first-visit file (FVH2) has the wrong internal file name (“FVH1”).

² Jyotirmoy Bhattacharya. Indian urban workers’ labour market transitions, 2021. URL https://arxiv.org/abs/2110.05482.