Data
We offer open access to several large-scale surveys conducted by EGC researchers. Several external datasets are also available for use exclusively by EGC affiliates, visitors, and students at Yale.
Open-Access Datasets
ISSER-Northwestern-Yale Long Term Ghana Socioeconomic Panel Survey (GSPS)
The ISSER-Northwestern-Yale Long Term Ghana Socioeconomic Panel Survey (GSPS) is a collaboration between the Economic Growth Center (EGC) at Yale University, the Global Poverty Research Lab at Northwestern University and the Institute of Statistical, Social, and Economic Research (ISSER) at the University of Ghana, Legon. The survey is principally funded and designed by the EGC, Northwestern and ISSER, and carried out and supervised by ISSER.
The main objective is to provide a scientific framework for a wide range of potential studies of the medium- and long-term changes that take place during the process of development. The survey is meant to remedy a major constraint on the understanding of development in low-income countries: the absence of detailed, multi-level, long-term scientific data that follow households over time and describe both the natural and built environments in which individuals reside. Most data collection efforts are short-term (carried out at a single point in time) and limited in scope (collecting information on only a few aspects of the lives of the persons in the study); and when there are multiple rounds of data collection, households who leave the study area are dropped. This means that the most mobile people are not included in existing surveys and studies, which may substantially bias inferences about who benefits from and who bears the costs of the development process. The goal of this project is to follow all households, or a random subset, over time using a comprehensive set of survey instruments to shed new light on long-run processes of economic development. Our strategy is designed to permit the investigation of unexpected connections among the multiple transformations that occur during the process of economic development.
To do so, we have implemented a large-scale, nationwide panel survey in Ghana, starting in 2009, which is currently in its fourth wave. The first three waves, conducted in 2009/2010, 2013/2014, and 2017/2018, have been completed, and the datasets for these three waves are uploaded on the Dataverse project page linked below to serve as a resource for the research community. In 2019, the Ghana Socioeconomic Panel Survey was also administered to a sample of participants in Ghana’s rural north who had been classified as “extremely poor” through community-level focus groups. As of 2022, the fourth wave of the GSPS is being implemented. For further details, please refer to the documentation uploaded in the Dataverse dataset linked below. (2022-11-03)
Indian female migrants face greater barriers to post-Covid recovery than males: evidence from a panel study
On March 24, 2020, the Indian Government announced a nationwide lockdown to curb the spread of Covid-19, effective with only a few hours' notice. For an estimated 40 million migrant workers in the country, this resulted in loss of income, food shortages, and uncertainty about the future. Over 10 million returned to their rural homes in one of the largest internal migrations in the country's history. Once home, they faced stays in government-run quarantine centers, stigma, and uncertain labor prospects. Over the next year, migrants navigated shifting mobility restrictions aimed at mitigating the spread of the pandemic, widespread outbreaks, and a patchwork of social protection schemes in order to make ends meet. These data were collected across four rounds of phone surveys with a random sample of 8,265 migrants who had returned from worksites to Bihar and Chhattisgarh shortly after the nationwide lockdowns, in order to understand the long-term labor and well-being effects of the pandemic on this population. The sampling frame was constructed from government records that attempted to catalogue all entrants during a given time period. The phone surveys included a repeated set of questions on employment and earnings, migration, access to social protections, and coping strategies, as well as single-wave modules on quarantine experiences, health behaviors and beliefs, household composition, migration networks, and discrimination. These data were collected with support from J-PAL's Jobs and Opportunity Initiative; the Institute of Labour Economics (IZA)/UK Aid (FCDO) Gender, Growth and Labour Markets in Low Income Countries Programme; and the Evidence-based Measures of Empowerment for Research on Gender Equality (EMERGE) program at the University of California San Diego. (2024-04-25)
Increasing Women’s Engagement with Mobile Technology
This dataset comes from the final follow-up survey that was conducted as part of a study that aims to understand the sources of India’s digital gender gap and identify potential solutions. Please note that this dataset can be used to conduct descriptive analysis but does not contain treatment indicators. (2023-12-24)
Calling for Health: Can Mobile Phones Improve Awareness and Takeup of Maternity Benefits?
These data were collected with support from J-PAL's Cash Transfers for Child Health (CaTCH) initiative, with the aim of understanding whether mobile phones can improve women's awareness and take-up of maternity benefits. The data are also part of a larger study focused on understanding constraints on women's mobile phone use and how to close India’s digital gender gap. Under the CaTCH research, women were called and given information about how to access public conditional cash transfers (CCTs) focused on maternal health; phone and in-person surveys were used to measure changes in knowledge. This dataset includes three waves of phone surveys and a final in-person follow-up survey. Please note that this dataset can be used to conduct descriptive analysis but does not contain treatment indicators. (2023-12-24)
Gujarat Pollution Audit Intervention
In many regulated markets, private, third-party auditors are chosen and paid by the firms that they audit, potentially creating a conflict of interest. In collaboration with the Gujarat Pollution Control Board (GPCB), researchers affiliated with the EGC (Rohini Pande and Nicholas Ryan) and J-PAL (Esther Duflo and Michael Greenstone) conducted a two-year field experiment in the Indian state of Gujarat that sought to curb such a conflict by altering the market structure for environmental audits of industrial plants to incentivize accurate reporting. The researchers designed and evaluated a modified audit system that sought to improve the accuracy of auditor reporting on pollution. The sample consisted of the population of audit-eligible plants in the two largest cities of Gujarat, India.
The researchers obtained from GPCB a list of all red-category (i.e., high pollution potential) small- or medium-scale plants. Just before the 2009 audit season, the researchers selected a provisional sample of audit-eligible plants and then randomly assigned half of the plants within this provisional sample, stratified by region, to the audit treatment group. Treatment plants were formally notified by a letter from GPCB of the changes in the audit regulation that would apply to them. Relative to the status quo, the treatment altered three components of the audit system during year 1: an auditor was randomly assigned to the plant, the auditor was paid from a central pool at a fixed rate, and the auditor's reports were backchecked for accuracy. In year 2 only, direct incentive pay for auditor accuracy was added.
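The assignment step above (half of the provisional sample assigned to treatment, stratified by region) can be illustrated with a minimal Python sketch. It assumes a hypothetical list of plant records with `id` and `region` fields; the function name `assign_stratified`, the seed, the plant IDs, and the rule that an odd-sized stratum gives its extra plant to control are illustrative assumptions, not the researchers' actual procedure.

```python
import random
from collections import defaultdict

def assign_stratified(plants, strata_key, seed=0):
    """Randomly assign half of the plants in each stratum to treatment.

    plants: list of dicts with an "id" field (hypothetical record layout).
    strata_key: function mapping a plant record to its stratum label.
    Returns {plant_id: True if treatment, False if control}.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for p in plants:
        by_stratum[strata_key(p)].append(p["id"])

    assignment = {}
    for stratum, ids in sorted(by_stratum.items()):
        ids = sorted(ids)   # fixed starting order so results depend only on the seed
        rng.shuffle(ids)    # random order within the stratum
        n_treat = len(ids) // 2  # assumption: odd strata put the extra plant in control
        for i, pid in enumerate(ids):
            assignment[pid] = i < n_treat
    return assignment

# Hypothetical provisional sample: 10 plants in one region, 7 in the other
plants = [{"id": f"A{i:02d}", "region": "Ahmedabad"} for i in range(10)] + [
    {"id": f"S{i:02d}", "region": "Surat"} for i in range(7)
]
assignment = assign_stratified(plants, lambda p: p["region"], seed=2009)
for region in ("Ahmedabad", "Surat"):
    ids = [p["id"] for p in plants if p["region"] == region]
    print(region, sum(assignment[i] for i in ids), "of", len(ids), "treated")
```

Shuffling within each stratum balances treatment and control across regions by construction rather than only in expectation.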
The researchers collected data from several sources to evaluate the intervention, especially on the basis of the accuracy of auditor reporting and the pollution response of plants. Two data sources are used to measure accuracy. First, audit reports were filed with GPCB in 2009 and 2010. These reports cover a mandated set of water pollutants and air pollutants. The second source of data for auditor accuracy is the backchecks, which were conducted in a sample of treatment plants throughout 2009 and 2010. The third source of data, on actual plant pollution emissions, is an endline survey conducted from April through July 2011, approximately six months after the last audit visits in the treatment group. The fourth source of data comes from the GPCB administrative records. These data cover GPCB’s plant inspections for plants in the audit sample between 2008 and 2011. These datasets are available here.
Gujarat Environmental Inspection Trial
High pollution persists in many developing countries despite strict environmental rules. To study how plant emission standards are enforced, the researchers, again in collaboration with GPCB, experimentally doubled the rate of inspection for treatment plants and required that the extra inspections be assigned randomly. The goal of the experiment was to estimate the impact of moving from the status quo, infrequent inspections allocated with discretion, to regular inspections of all plants at prescribed inspection rates. Such a reform would bring the GPCB into compliance with its own prescribed inspection rates and the Central Pollution Control Board’s (CPCB) inspection rules.
To this end, between August 2009 and May 2011 EGC affiliates Rohini Pande and Nicholas Ryan and co-authors worked with GPCB to increase inspection frequency for a random subset of highly polluting plants. By CPCB rules, these plants are supposed to be inspected once per year if they are small-scale or once every three months if they are medium-scale. From this population, the sample of 960 plants was drawn in two batches. The researchers selected all 473 audit-eligible plants in Ahmedabad and Surat and then randomly selected 488 plants from the remaining audit-ineligible population. The inspection treatment was randomly assigned within strata defined by region and audit-treatment status. It was thus cross-randomized with, and implemented concurrently with, the pollution audit reform treatment.
Plants assigned to the inspection treatment received at least one initial (routine) inspection per year and up to four inspections per year. In the first quarter, the plant was assigned one initial inspection, after which it was randomly assigned on a quarterly basis to be inspected again with probability 0.66. After four quarters, this cycle started over. Regional GPCB teams, each consisting of an environmental engineer and a scientist, conducted the treatment inspections. Each morning in each region, the designated inspection team was randomly assigned a list of plants from the treatment group at which to conduct initial “routine” inspections that day. This mimicked GPCB’s practice of assigning teams to plants, except that the plant assignment was random rather than based on an official’s discretion.
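The quarterly assignment rule can be checked with a short simulation. This Python sketch assumes that quarter 1 of each four-quarter cycle always holds the initial inspection and that each of the remaining three quarters gets an independent draw with probability 0.66, which is one reading of the description above. Under that reading, a treatment plant receives between one and four inspections per cycle, with an expected 1 + 3 × 0.66 = 2.98. The constant and function names are hypothetical.

```python
import random

P_REPEAT = 0.66          # quarterly probability of a further inspection (from the text)
QUARTERS_PER_CYCLE = 4   # the assignment cycle restarts after four quarters

def inspections_in_cycle(rng):
    """Return one 0/1 flag per quarter for a single four-quarter cycle.

    Assumption: quarter 1 always holds the initial (routine) inspection, and
    quarters 2-4 each get an independent draw with probability P_REPEAT.
    """
    flags = [1]  # initial inspection in the first quarter
    for _ in range(QUARTERS_PER_CYCLE - 1):
        flags.append(1 if rng.random() < P_REPEAT else 0)
    return flags

# Analytical expectation per plant-cycle: 1 + 3 * 0.66 = 2.98
expected = 1 + (QUARTERS_PER_CYCLE - 1) * P_REPEAT

rng = random.Random(42)
n_plant_cycles = 100_000
totals = [sum(inspections_in_cycle(rng)) for _ in range(n_plant_cycles)]
mean = sum(totals) / n_plant_cycles
print(f"expected {expected:.2f}, simulated {mean:.3f}, range {min(totals)}-{max(totals)}")
```

The simulated mean should sit close to 2.98, roughly triple the once-per-year minimum for a small-scale plant.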
For program evaluation, the researchers collected data from two sources: an end-line plant survey and GPCB administrative records. The end-line survey was conducted between April and July 2011 by independent agencies. The survey collected pollution readings, expenditures for abatement equipment investment and maintenance, and data on other aspects of plant operations. The second source of data comprises GPCB documents on its interactions with plants; these documents were categorized by (a) whether they record an action of the regulator or a plant and (b) the type of action they record. These datasets are available here.
Kolkata Flexible Credit Intervention
Village Financial Services (VFS) is a microfinance institution operating in peri-urban neighborhoods of Kolkata, India. Most of the loans VFS offers resemble traditional micro-credit contracts, made to groups of women and repaid weekly. Access to credit or savings, both formal and informal, is limited in these neighborhoods, and VFS faces almost no competition from other lenders. VFS works exclusively with women, most of whom have a household income of less than two dollars a day. There is a high rate of business ownership, and selling and tailoring saris are common occupations.
EGC Director Rohini Pande and co-authors examined variations in microfinance contract design in partnership with VFS. They compared weekly and monthly repayments in one evaluation, tested a two-month grace period before initiating repayment in another, and expanded the repayment frequency experiment to evaluate the effect on financial stress in a third.
Monthly repayments: Researchers examined how repayment frequency affected default and late payment rates. VFS offered loans of Rs. 4000 (about 100 USD) with a fixed Rs. 400 interest payment to 1026 first-time borrowers in 100 groups. The groups were randomly assigned to one of three repayment schedules: (1) standard weekly repayment, i.e., 30 groups repaid Rs. 100 every week for 44 weeks; (2) monthly repayment, i.e., 38 groups repaid Rs. 400 every month for 11 months; and (3) monthly repayment with weekly meetings, i.e., 32 groups repaid monthly but met with a loan officer every week for the first three months.
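The three schedules are designed so that every arm repays the same total. The short Python check below uses only the figures given above, plus one assumption: the text does not state the installment for the third arm, so it is taken to be the same Rs. 400 per month as the second arm.

```python
PRINCIPAL = 4000  # Rs., loan size
INTEREST = 400    # Rs., fixed interest payment
TOTAL_DUE = PRINCIPAL + INTEREST  # Rs. 4400

# arm: (installment in Rs., number of installments, number of groups)
schedules = {
    "weekly": (100, 44, 30),
    "monthly": (400, 11, 38),
    # Assumption: same installment as the monthly arm; only meeting frequency differs.
    "monthly + weekly meetings": (400, 11, 32),
}

for arm, (installment, n_payments, n_groups) in schedules.items():
    repaid = installment * n_payments
    print(f"{arm:<26} {n_groups:>3} groups  {n_payments} x Rs. {installment} = Rs. {repaid}")
    assert repaid == TOTAL_DUE  # every schedule repays principal plus interest

total_groups = sum(n_groups for _, _, n_groups in schedules.values())
print("total groups:", total_groups)                      # 100
print("mean borrowers per group:", 1026 / total_groups)   # 10.26
```

Because the total owed is identical across arms, differences in default or late payment can be attributed to the timing of installments rather than to the amount repaid.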
Two-month grace period: Researchers examined how delaying the first payment until two months after disbursing the loan affected investment in businesses and loan repayment. Eight hundred and forty-five clients in 169 loan groups received loans ranging from Rs. 4000 (about 90 USD) to Rs. 10,000 (about 225 USD). The groups paid the same amount in interest but were assigned to two different repayment schedules: (1) standard schedule such that 85 groups began repayment two weeks after receiving the loan; and (2) grace period such that 84 groups began repayment two months after receiving the loan.
Monthly repayments with a focus on financial stress: Researchers replicated the repayment frequency experiment and included additional questions on levels of financial stress. Seven hundred and forty clients in 148 groups were assigned to weekly or monthly repayment frequencies. A subgroup of 213 clients was surveyed by cell phone every 48 hours for seven weeks, and they were asked questions about their confidence in their ability to repay the loan, their anxiety about loan repayment, arguments with their spouse about finances, and the amount of time they spent thinking about loan repayment.
Data related to this intervention can be found here.
Rural Banks Can Reduce Poverty
Across the world, the rural poor struggle to access formal financial services, such as bank accounts, loans, and insurance, that wealthier, more urban populations can access. Can innovative financial service delivery models reduce poverty in rural areas? The team at Inclusion Economics conducted the first large-scale experiment to understand how improving formal financial services affects economic development measures, including poverty, entrepreneurship, and agricultural investments.
EGC Director and Inclusion Economics co-Director Rohini Pande and co-authors Giorgia Barboni and Erica Field collected and harmonized household-level data from government labor force surveys for their working paper “Rural Banks Can Reduce Poverty: Experimental Evidence from 870 Indian Villages.” The dataset is now open for use by other researchers.
You can access daily consumption data and household data via the links below.
Cross-country Rotating Panel Labor Force Surveys to Examine Labor Market Flows
EGC affiliate Kevin Donovan and coauthors Will Lu and Todd Schoellman collected and harmonized individual-level data from government labor force surveys for their recent working paper “Labor Market Dynamics and Development.” The full set includes data on 75 million people in 45 countries over two consecutive quarters.
Now the team has made the harmonization code openly available. The data allow researchers to observe how individuals move between employment states – self-employment, wage work, unemployment, inactivity – and how those flows vary by gender, education level, occupation, and other variables.
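Flows of this kind are usually summarized as a transition matrix: among people observed in two consecutive quarters, the share starting in each employment state who are found in each state one quarter later. The Python sketch below is a generic illustration, not the team's harmonization code; the state labels, the optional survey-weight argument, and the toy records are all hypothetical.

```python
from collections import Counter

STATES = ("self-employed", "wage", "unemployed", "inactive")  # hypothetical labels

def transition_matrix(pairs, weights=None):
    """Row-normalized quarter-to-quarter flows.

    pairs: sequence of (state in quarter t, state in quarter t+1), one per person.
    weights: optional survey weights, one per pair (defaults to 1.0 each).
    Returns {from_state: {to_state: share}}; rows with no observations are all 0.0.
    """
    pairs = list(pairs)
    if weights is None:
        weights = [1.0] * len(pairs)
    weighted = Counter()
    for (s0, s1), w in zip(pairs, weights):
        weighted[(s0, s1)] += w
    matrix = {}
    for s0 in STATES:
        row_total = sum(weighted[(s0, s1)] for s1 in STATES)
        matrix[s0] = {
            s1: weighted[(s0, s1)] / row_total if row_total else 0.0
            for s1 in STATES
        }
    return matrix

# Toy panel: 10 wage workers and 4 unemployed people observed in both quarters
pairs = (
    [("wage", "wage")] * 6
    + [("wage", "self-employed")] * 2
    + [("wage", "unemployed")] * 2
    + [("unemployed", "wage")] * 1
    + [("unemployed", "unemployed")] * 3
)
m = transition_matrix(pairs)
print(m["wage"])        # 0.6 stay in wage work, 0.2 to self-employment, 0.2 to unemployment
print(m["unemployed"])  # 0.25 find wage work, 0.75 remain unemployed
```

Computing the matrix separately for subsets of the data (for example, by gender or education level) yields the kind of group comparison described above.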
This is an ongoing project, and the dataset will be updated annually as governments release further data. Collections covering the Covid-19 crisis will be added in the near future. Eventually, the dataset may shed light on the effects of Covid-19 on labor market flows, allowing researchers to observe how labor market dynamics vary and how the economic recovery progresses across countries.