NIGERIA YOUWIN! IMPACT EVALUATION PRE-ANALYSIS PLAN
David McKenzie
Version 1: November 14, 2012 (filed at J-PAL Hypothesis Registry)
Version 2: Updated September 23, 2013 (using track changes)
1. Introduction
The Youth Enterprise With Innovation in Nigeria (YouWiN!) programme is a business plan
competition for young entrepreneurs in Nigeria. It is a collaboration between the Ministry of Finance,
the Ministry of Communication Technology, and the Ministry of Youth Development with support
from DFID and the World Bank, and has the stated objective of encouraging innovation and job
creation through the creation of new businesses and expansion of existing businesses.
The YouWin impact evaluation is intended to measure the impact of participating in this
program on Nigerian firms. The purpose of this document is to set out in advance how this impact
will be measured and estimated, to serve as a roadmap for analysis once data is collected. A
baseline report was prepared by the author and finalized on May 11, 2012 and provides details of
the selection process, randomization, and baseline data collection, and so will not be repeated here.
Since this plan is being compiled before any follow-up data are collected and analyzed, this pre-
specification acts as a guard against selective reporting and data-mining.
2. Key aspects of the intervention
The program provides a four-day training course on preparing a business plan to applicants who
make it through a first stage, and then grants to the 1,200 winning submissions. Each winner is
eligible for up to 5 million Naira (approximately US$32,000) for new businesses and 10 million
Naira (approximately US$64,000) for existing businesses; the amount any winner receives varies
between 1 and 10 million Naira, depending on the funding needs identified in their business plan
and the independent consultants' assessment of their actual needs. Winners then receive ongoing
monitoring, coupled with some mentoring and group training events.
3. Description of Data Sources and Anticipated Sample Sizes
a. Pre-intervention data
There are two sources of pre-intervention data.
i) Application data (November 2011): data from the original application forms are
available for 23,844 individuals, corresponding to 20,230 new business applications
and 3,614 existing business applications. This first stage application form has a
relatively limited number of variables, but includes: age, gender, region, new or
existing business, highest education level, application score, proposed industry, and
for existing businesses: number of years in business, turnover, number of
employees.
ii) Business plan data (January 2012): somewhat more detailed data are available for
the 4,510 individuals who submitted business plans. This includes marital status,
whether they have lived abroad, whether or not they would choose a risky gamble,
some asset ownership indicators, whether their business had had a loan before, and
the business plan score.
b. Administrative program data: Data from the program administration unit will provide
information on which firms received payments under the YouWin program and the dates
when they received these payments.
c. November 2012 Follow-up Survey: A first follow-up survey is taking place in November and
December 2012, and is being conducted by TNSRMS. This survey timing corresponds to 10-
11 months after the business plan training; 7-8 months after the winners were announced
(announcement took place March 20, 2012); and 4-5 months after the first disbursements
were made to winners (first disbursements were made June 28, 2012). The follow-up survey
collects data on personal characteristics, details of business operations for existing
businesses and business plans for new businesses, and details of participation in enterprise
programs like YouWin.
The sample for the follow-up survey consists of 3139 firms from the following three groups:
Experimental sample: 1841 firms (729 treated, 1112 control) which were all semi-
finalists in the business plan competition and selected by random number generator
as to whether they were ordinary winners or not. This will form the basis of our
main estimations.
Non-experimental winner sample: 475 firms that were selected as national merit
winners or zonal merit winners in the business plan competition. They will be used
in supplementary analysis and in descriptive analysis of whether business plan
scores predict business growth.
Regression-discontinuity booster sample: Firms in the North-Central, South-Eastern,
and South-Western regions that are within 5 points either side of the cutoff. 770 of
these firms (329 existing, 441 new) are already included in the 2316 firms noted
above. This leaves up to 3890 firms that we could potentially add to the survey. We
take all 323 existing firms, and then a random sample of 500 of the new firms. This
gives a total booster RD sample of 823 firms.
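The sample counts above (together with the RD window counts given in Section 5) can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the numbers are copied from the text and the variable names are ours.

```python
# Cross-check of the first follow-up survey sample, using counts stated in the plan.
experimental = {"treated": 729, "control": 1112}
merit_winners = 475  # national and zonal merit winners (non-experimental)

# Applicants within 5 points of the training cutoff in the three RD regions (Section 5).
rd_window = {"existing": 652, "new": 4008}
rd_already_sampled = {"existing": 329, "new": 441}

n_experimental = sum(experimental.values())
n_core = n_experimental + merit_winners
n_rd_overlap = sum(rd_already_sampled.values())
n_rd_available = sum(rd_window[k] - rd_already_sampled[k] for k in rd_window)

# Booster: every remaining existing firm plus a random draw of 500 new firms.
booster = (rd_window["existing"] - rd_already_sampled["existing"]) + 500
total_sample = n_core + booster

print(n_experimental, n_core, n_rd_overlap, n_rd_available, booster, total_sample)
```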
In practice this survey took place between November 2012 and May 2013, with a response
rate of 79% for the experimental sample. The difficulty and prolonged length of time taken to
do this survey pushed back the next round.
d. October 2013 Follow-up Survey: This survey will take place on the same samples and with a
similar survey design as the November first-round survey. The timing of the survey depends
in part on whether there are further delays in the disbursement of program payments; if
so, we may choose to delay this survey further. Additional rounds of follow-up surveys may
take place depending on the budget available.
4. Theory of Change/Model for Interpreting Outcomes
The main objective of the YouWIN! program is to generate jobs by encouraging and supporting
aspiring entrepreneurial youth in Nigeria to develop and execute business ideas that will lead to
job creation. The announcement of the program claimed that it would generate 80,000 to
110,000 new jobs for currently unemployed Nigerian youth over the three years during which
the three cycles will be implemented. This corresponds to 20-30 new jobs per winning firm.
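The per-firm figure follows from dividing the jobs target by the number of winners. The sketch below assumes, as the text implies but does not state outright, that each of the three cycles has 1,200 winners like the cycle described in Section 2.

```python
# Jobs target implied per winning firm.
# Assumption: each of the three cycles has 1,200 winners, as in Section 2.
winners_per_cycle = 1200
cycles = 3
total_winners = winners_per_cycle * cycles

jobs_low, jobs_high = 80_000, 110_000
per_firm_low = jobs_low / total_winners
per_firm_high = jobs_high / total_winners
print(f"{per_firm_low:.1f} to {per_firm_high:.1f} jobs per winner")
```

This gives roughly 22 to 31 jobs per winner, in line with the 20-30 range quoted above.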
How might participating in the YouWIN program lead to more jobs in the winning firms?
Consider a simple model where a firm’s production Y is a function f(.) of their productivity A,
their capital stock K, the owner’s entrepreneurial skill E, and outside labor L. The firm owner’s
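problem is to choose K and L given A and E:
$$\max_{K,L}\; f(A, K, E, L) - rK - wL \qquad (1)$$
where r is the market interest rate and w the market wage, with the price of output normalized to one.
With complete markets the firm production decision will be separable from the household
consumption decision and firms will choose capital and labor such that their marginal products
are equal to the market interest rate and market wage rate respectively:
$$\frac{\partial f(A, K, E, L)}{\partial K} = r \qquad (2)$$
$$\frac{\partial f(A, K, E, L)}{\partial L} = w \qquad (3)$$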
Case 1: Perfect markets, YouWIN program is just a grant
If firms are not credit-constrained and the program just changes the resources firm owners have
available to them, then there is no change in the first-order conditions (2) and (3), and so no
change in employment or output. The grant will merely make the owner richer, but not change
their production decisions.
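A minimal numerical check of Case 1, assuming a hypothetical Cobb-Douglas technology f = A·E·K^alpha·L^beta with alpha + beta < 1 (an illustrative functional form, not one the plan commits to). The optimal inputs solve (2) and (3) without reference to the owner's wealth, so a grant of any size leaves them unchanged. Function names and parameter values are ours.

```python
import math

# Hypothetical technology: f(A, K, E, L) = A * E * K**alpha * L**beta with
# alpha + beta < 1, so problem (1) has an interior maximum. Illustrative values only.
ALPHA, BETA = 0.3, 0.4


def optimal_inputs(A, E, r, w, alpha=ALPHA, beta=BETA):
    """Closed-form (K*, L*) solving first-order conditions (2) and (3).

    The owner's wealth is deliberately not an argument: with complete markets,
    a grant changes neither condition and hence neither input choice.
    """
    z = A * E
    power = 1.0 / (1.0 - alpha - beta)
    capital = (z * (alpha / r) ** (1 - beta) * (beta / w) ** beta) ** power
    labor = (z * (alpha / r) ** alpha * (beta / w) ** (1 - alpha)) ** power
    return capital, labor


def marginal_products(A, E, capital, labor, alpha=ALPHA, beta=BETA):
    """Marginal products of capital and labor at the given input levels."""
    mpk = alpha * A * E * capital ** (alpha - 1) * labor ** beta
    mpl = beta * A * E * capital ** alpha * labor ** (beta - 1)
    return mpk, mpl


k_star, l_star = optimal_inputs(A=1.0, E=1.0, r=0.10, w=1.0)
mpk, mpl = marginal_products(1.0, 1.0, k_star, l_star)
print(math.isclose(mpk, 0.10), math.isclose(mpl, 1.0))  # → True True
```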
Case 2: Perfect markets, YouWIN program is a conditional grant
The YouWIN program does not make a single lump-sum grant to firm owners. Instead, the grant is
payable in tranches, conditional on the firm owner taking certain actions: the first and
second tranches typically pay for more working capital and investment, and the third and
fourth tranches are triggered by reaching jobs and turnover targets. This conditionality does
not fundamentally change the equilibrium first-order conditions, but can be viewed as causing a
temporary increase in the returns to capital and labor in the firm; therefore we would predict a
short-term increase in capital and labor, which would then dissipate once all the tranche
payments have been received.
Case 3: YouWIN program is more than just a grant
It is possible that participating in, and especially winning, the YouWIN program may also have
other impacts on the productivity of the firm (A), and the skills of the owner (E). Potential
channels for this include:
(i) Training increasing skills: the 4-day business plan training, and the short “school for
start-ups” and online materials provided may increase the entrepreneurial skills of the
owner. Assuming these are complementary with other inputs, we should have dK*/dE > 0
and dL*/dE > 0, so that both capital and labor increase.
(ii) Networks increase productivity or entrepreneurial skills: participating in the program
may cause the firm owners to meet other successful business owners. This could
increase their own productivity and skills if they learn from these owners, or can use
these networks to obtain better business deals.
(iii) Improvements in confidence and attitudes: entrepreneurial self-confidence could
directly impact on productivity, or the program, by declaring the owner a winner, may
spur their self-belief in the business and cause them to work harder. In addition, the
signal provided by winning the competition could cause firm owners who are uncertain
about their entrepreneurial type to update their priors and thus change their output
levels if they underinvest because they are unsure of whether they have the skills to
make it at a larger scale.
(iv) Mentoring increases A and E. The YouWIN program in principle provides some basic
mentoring services, which could increase A and E.
(v) Reputation effects: A could increase if winning the competition increases the
business’s reputation, signaling quality to customers and therefore allowing it to gain
more customers.
(vi) Formalization effects: the program requires firms to be formal to receive the award. If
firms are rationally informal, formalizing will have no effect, but if information barriers
or other constraints stop firms from formalizing, formalizing may make the firms more
productive.
(vii) Change in the interaction with government: winning the competition could give the firm
some protection against government officials asking for bribes or otherwise inhibiting
firm productivity, since now the firm is seen as a favored firm which shouldn’t be
touched; or conversely winning the competition may make the firm be targeted by rent-
seeking officials therefore reducing productivity.
(viii) Changes in family demands: winning the competition may cause the firm owner to be
targeted with more requests for money and/or free goods by extended family members.
This could lower firm productivity, or conversely cause the firm owner to invest more in
the firm if money in the firm is less subject to capture than money held at home.
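For channel (i), and for any other channel that raises A or E, the sign of the comparative statics can be made explicit under a hypothetical Cobb-Douglas technology f = A E K^α L^β with α + β < 1 (an illustrative assumption, not a specification used in this plan). Solving (2) and (3) gives:

$$K^* = \left[AE\left(\frac{\alpha}{r}\right)^{1-\beta}\left(\frac{\beta}{w}\right)^{\beta}\right]^{\frac{1}{1-\alpha-\beta}}, \qquad L^* = \left[AE\left(\frac{\alpha}{r}\right)^{\alpha}\left(\frac{\beta}{w}\right)^{1-\alpha}\right]^{\frac{1}{1-\alpha-\beta}}$$

$$\frac{\partial \ln K^*}{\partial \ln E} = \frac{\partial \ln L^*}{\partial \ln E} = \frac{\partial \ln K^*}{\partial \ln A} = \frac{\partial \ln L^*}{\partial \ln A} = \frac{1}{1-\alpha-\beta} > 0$$

so in this case an increase in either A or E raises both capital and labor, by the same proportion.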
Case 4: YouWIN program with capital and labor market constraints
If firms are credit-constrained, then they invest less in their firms than is optimal according to (2).
Winning the YouWIN program could reduce credit constraints in three ways:
(i) Directly by providing a grant to the firm
(ii) Indirectly, through providing a signal of quality that leads to more bank lending
(iii) Indirectly, through providing co-financing and a signal of quality that leads to more
outside investments from partners.
The impact of these channels will be to increase the capital stock. This may increase or decrease
labor depending on the shape of the production function: a heavily credit-constrained firm may
have previously substituted labor for capital, and so reduce workers once it can buy machines to
replace them. Conversely, if capital and labor are complements in production, more capital will
enable the firm to hire more workers.
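The complements case can be illustrated with a small sketch under stated assumptions: a hypothetical Cobb-Douglas technology (so capital and labor are complements), capital financed only from the owner's funds plus any grant, and labor hired freely at wage w. For a constrained firm the grant raises both capital and labor; for an unconstrained firm it changes nothing, as in Case 1. Names and parameter values are ours.

```python
# Hypothetical illustration of Case 4: f = A*E*K**alpha*L**beta (complements),
# capital limited to the owner's funds, labor hired at wage w. Illustrative values.
ALPHA, BETA = 0.3, 0.4


def unconstrained_capital(A, E, r, w, alpha=ALPHA, beta=BETA):
    """K* from first-order conditions (2) and (3)."""
    z = A * E
    power = 1.0 / (1.0 - alpha - beta)
    return (z * (alpha / r) ** (1 - beta) * (beta / w) ** beta) ** power


def labor_given_capital(A, E, capital, w, alpha=ALPHA, beta=BETA):
    """Labor demand from condition (3), holding capital fixed."""
    return (beta * A * E * capital ** alpha / w) ** (1.0 / (1.0 - beta))


def choose_inputs(A, E, r, w, funds):
    # The firm cannot invest more than it has, so K = min(K*, funds).
    capital = min(unconstrained_capital(A, E, r, w), funds)
    return capital, labor_given_capital(A, E, capital, w)


params = dict(A=1.0, E=1.0, r=0.10, w=1.0)
for funds, grant in [(1.0, 1.0), (10.0, 1.0)]:
    before = choose_inputs(funds=funds, **params)
    after = choose_inputs(funds=funds + grant, **params)
    print(f"funds={funds}: K {before[0]:.2f}->{after[0]:.2f}, L {before[1]:.3f}->{after[1]:.3f}")
```

With a technology in which capital substitutes for labor, relaxing the constraint could instead lower L, as noted above; this sketch covers only the complements case.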
Firms may also face constraints in the labor market: if workers are reluctant to work for
unknown firms, winning the competition may make it easier for the firm to match with willing
workers, increasing employment.
Finally, if firm owners are risk averse and insurance markets are incomplete, the grant may
induce firm owners to undertake riskier investments through subsidizing these investments.
This simple model suggests we should therefore aim to measure the following channels of
influence:
Participation in YOUWIN → Changes in A and E, and changes in use of and access to K and L inputs → Firm growth
5. Estimation methodology
The core challenge for any impact evaluation is to derive an estimate of the counterfactual: what would
have happened in the absence of the program. Simple comparisons of the winners to the losers of the
business plan competition will overstate the impact if the judging has succeeded in selecting businesses
with greater chances of growth. The approach used for impact evaluation will instead be to use a
Randomized Controlled Trial (RCT) based on the random selection of ordinary winners, and to
supplement this with matching analysis to estimate impacts on the national and zonal merit winners,
and regression-discontinuity analysis to estimate impacts of the 4-day business plan training.
RCT Analysis
Estimation of the effect of being selected to receive a grant through the YouWin! project on the YouWIN
ordinary winner pool will take place through estimating a regression of the form:
$$Y_i = a + b\,\text{AssignTreat}_i + \lambda'\,\text{Region*Gender}_i + \varepsilon_i \qquad (4)$$
where this is estimated separately for existing and new businesses, and Region*Gender_i controls for the
randomization strata. Huber-White standard errors will be used. The coefficient b then gives the average
effect of being assigned to receive a grant amongst the group of ordinary finalists. This is not the same
as the effect of actually receiving the grant, since a small number of winners were later disqualified and
replaced with 9 individuals non-randomly selected from the control group. We can therefore also
estimate a local average treatment effect of actually getting the money by replacing AssignTreat with
GetMoney in (4), and then instrumenting getting the money with treatment assignment.
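A minimal pure-Python sketch of the point estimates, assuming outcomes, assignment, receipt of money and strata labels are already in lists. It relies on the Frisch-Waugh-Lovell result that demeaning within Region*Gender strata is equivalent to including strata dummies. The standard error is the HC0 form of the Huber-White estimator; Stata's `robust` option additionally applies a small-sample factor, so the two differ slightly. The LATE is the ratio of the two intent-to-treat coefficients, which equals just-identified 2SLS with the same strata controls. Function names and example data are hypothetical.

```python
import math
from collections import defaultdict


def demean_within(values, strata):
    """Subtract stratum means; equivalent to partialling out stratum dummies."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, strata):
        totals[s] += v
        counts[s] += 1
    return [v - totals[s] / counts[s] for v, s in zip(values, strata)]


def itt_with_strata(y, assign, strata):
    """Coefficient b in (4) with strata fixed effects, and its HC0 robust SE."""
    y_t = demean_within(y, strata)
    z_t = demean_within(assign, strata)
    szz = sum(z * z for z in z_t)
    b = sum(z * v for z, v in zip(z_t, y_t)) / szz
    resid = [v - b * z for z, v in zip(z_t, y_t)]
    se = math.sqrt(sum((z * e) ** 2 for z, e in zip(z_t, resid))) / szz
    return b, se


def late(y, get_money, assign, strata):
    """IV (Wald) estimate: ITT on the outcome divided by ITT on getting the money."""
    b_y, _ = itt_with_strata(y, assign, strata)
    b_d, _ = itt_with_strata(get_money, assign, strata)
    return b_y / b_d


# Hypothetical data: one assigned winner is disqualified, one control is a replacement.
strata = ["A"] * 4 + ["B"] * 4
assign = [1, 1, 0, 0, 1, 1, 0, 0]
get_money = [1, 0, 0, 0, 1, 1, 0, 1]
y = [1 + 3 * d + (5 if s == "B" else 0) for d, s in zip(get_money, strata)]
print(itt_with_strata(y, assign, strata)[0], late(y, get_money, assign, strata))
```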
Since the impacts on the outcomes are likely to vary with time and surveys are planned to be at least 6
months apart, we currently plan to estimate equation (4) wave by wave. When baseline data on the
outcome of interest are available, we will control for it as an additional regressor in an ANCOVA
specification, thereby increasing power:
$$Y_{i,t} = a + b\,\text{AssignTreat}_i + \gamma\,Y_{i,0} + \lambda'\,\text{Region*Gender}_i + \varepsilon_{i,t} \qquad (5)$$
where Y_{i,0} is the baseline value of the outcome and t indexes the follow-up wave.
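To see why the ANCOVA specification raises power, consider a stylized case (an assumption for illustration only): one baseline and one follow-up round, equal outcome variance in both rounds, and correlation ρ between baseline and follow-up outcomes. Writing POST for equation (4) without the baseline control and DiD for a difference-in-differences estimator, the large-sample variances of the treatment estimate compare as:

$$\operatorname{Var}(\hat b_{ANCOVA}) = (1-\rho^2)\,\operatorname{Var}(\hat b_{POST}), \qquad \operatorname{Var}(\hat b_{DiD}) = 2(1-\rho)\,\operatorname{Var}(\hat b_{POST})$$

Since 1 - ρ² ≤ 2(1 - ρ) for ρ < 1, ANCOVA is at least as efficient as either alternative in this setting; for example, with ρ = 0.5 it reduces the variance by 25 percent relative to POST.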
Non-experimental analysis with other winners
This sample will be used for two main forms of analysis.
First, testing whether the business plan scores are predictive of firm outcomes among the 1200 winners.
I will run the following regression on the winner sample:
$$Y_i = a + b\,\text{BusinessPlanScore}_i + d\,\text{ApplicationScore}_i + \lambda'\,\text{Region*Gender}_i + \varepsilon_i$$
and test b=0, d=0, and b=d=0, i.e. the null hypotheses that among the winner group the business plan
score and application score are not significant predictors of success conditional on region and gender.
The outcomes to be tested here will include: Firm start-up (for new businesses), Firm survival (for
existing businesses), Current wage and salary workers, Firm sales and firm profits (existing businesses),
and employment index (all defined in Section 6 below).
I will also test further whether these scores have any additional predictive power beyond basic
characteristics of the business also measured in the baseline:
$$Y_i = a + b\,\text{BusinessPlanScore}_i + d\,\text{ApplicationScore}_i + \theta' X_i + \lambda'\,\text{Region*Gender}_i + \varepsilon_i$$
Where X is a vector of the following baseline control variables:
New businesses: Owner age, Owner is married, Owner has university education, Owner has
postgraduate education, Owner has worked or lived abroad, Owner would choose a risky gamble,
Owner has internet access at home, proposed sector is crop or animal, proposed sector is
manufacturing, proposed sector is trade, proposed sector is IT.
Existing businesses: The same characteristics as for new businesses, plus years in business, number of
workers at baseline, ever had a formal loan.
These same two regressions will also be run separately on the control group sample.
Second, propensity-score matching will be used to estimate the impact of the program on the national
and zonal winners. This will proceed by the following steps:
1) Estimate propensity score following these specifications (pre-determined and coded before
seeing the follow-up data):
New businesses:
#delimit ;
pscore psmtreat female age married highschooledn university postgraduate lived_worked_abro
chooserisk have_internet owncomputer ownsatelitedish ownfreezer cropandanimal manufacturing
trade IT if new==1, pscore(pscore1) comsup;
Existing businesses: We will use the variables as above, plus baseline workers and whether they had had
a loan before at baseline. These latter two variables have missing values, which will be dummied out.
(These are the variables in the test of randomization except for the application and business plan scores
since the point is to find firms that are similar among the semi-finalists, without paying heed to their
business plan score).
2) Estimate the treatment effect using this propensity score within the common support, using the
following two specifications:
a) attr outcome psmtreat if new==1, pscore(pscore1) comsup
b) attk outcome psmtreat if new==1, pscore(pscore1) comsup
And likewise for the existing firms setting new==0.
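The matching step can be illustrated with a pure-Python sketch of radius matching, the idea behind `attr`, taking already-estimated propensity scores as given. The radius of 0.1 and the simple common-support rule (treated firms must lie between the smallest and largest control propensity score) are illustrative assumptions and need not match the defaults of the Stata commands. Kernel matching (`attk`) would instead weight all controls by a kernel of the score distance. Names and data are hypothetical.

```python
def radius_matching_att(pscore, treated, outcome, radius=0.1):
    """ATT from radius matching on the propensity score (illustrative sketch).

    Each treated unit on the common support is compared with the mean outcome
    of all controls whose score lies within `radius`; treated units with no
    such control are dropped. Returns (ATT, number of matched treated units).
    """
    controls = [(p, y) for p, d, y in zip(pscore, treated, outcome) if d == 0]
    lo = min(p for p, _ in controls)
    hi = max(p for p, _ in controls)

    effects = []
    for p, d, y in zip(pscore, treated, outcome):
        if d != 1 or not lo <= p <= hi:
            continue  # skip controls and treated units off the common support
        matches = [yc for pc, yc in controls if abs(pc - p) <= radius]
        if matches:
            effects.append(y - sum(matches) / len(matches))
    if not effects:
        raise ValueError("no treated unit has a control within the radius")
    return sum(effects) / len(effects), len(effects)


att, n_matched = radius_matching_att(
    pscore=[0.20, 0.25, 0.50, 0.90, 0.22, 0.50, 0.52],
    treated=[1, 1, 1, 1, 0, 0, 0],
    outcome=[12, 15, 25, 40, 10, 20, 22],
)
print(att, n_matched)  # → 4.5 2
```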
Regression-discontinuity analysis
Regression-discontinuity analysis will be used to test for an effect of participating in the 4-day business
training program on firm outcomes. This will be done by surveying individuals who had first-round
application scores just above or just below the thresholds for being invited to training. The graphs below
show how the likelihood of being invited to training jumps from 0 to 100% around a score of 52 for new
business applicants, and a score of 50 for existing business applicants. Surveying will restrict this sample
to firms in the North-Central, South-Eastern, and South-Western regions, since the other regions have
few firms close to the cutoffs.
In total there are 4008 new enterprises and 652 existing enterprises that are within 5 points either side
of the cutoff. 770 of these firms (329 existing, 441 new) are already included in the 2316 firms in our
experimental plus winner sample. This leaves up to 3890 firms that we could add to the survey. Given
budget constraints, we chose to add all 323 existing firms, and then a random sample of 500 of the new
firms with scores around the threshold.
Here is the distribution of the proportion invited to training by application score among new enterprises,
for scores in the range 47 to 57. We see that someone with a score of 49 has 0% chance of being invited
to training, while someone with score 53 has 100% chance. The fuzzy cut-off varies by region, being
either 51 or 52. So firms with very similar scores have different chances of getting training, which
provides the scope for a (fuzzy) regression-discontinuity design.
The same is true for existing firms. There are 477 existing enterprises with scores in the range 45 to 55,
with 50 the fuzzy cutoff in two regions, and 49 the fuzzy cutoff in one region.
[Figure: Discontinuity for New Firms. Proportion invited to training (meaninvite) and fitted values against total mark given in first-round scoring, shown separately for the North-Central, South-East and South-West regions.]
Having a score just above the threshold has two impacts on firms: they are invited to training, and they
have a chance of having their business plan selected as a winner. We will therefore conduct two sets of
analysis:
A) The overall effect of being selected for the business plan training, which incorporates both these
effects. This will use all firms in our sample that have application scores within 5 points of the
cutoffs. The RD design is valid under the usual assumptions for this estimate.
B) The effect of getting the business plan training, but not being selected as a winner. This will
exclude the winners from the sample. In order for the RD design to be valid here, we require a
further assumption that being selected as a winner from among these marginal applicants is
random conditional on observables.
For existing firms, we run the following regressions:
First-stage:
Linear control:
xi: reg invite abovethreshold totalmark i.region_n if existing==1 & totalmark>=45 &
totalmark<=55, robust
[Figure: Discontinuity for Existing Firms. Proportion invited to training (meaninvite) and fitted values against total mark given in first-round scoring, shown separately for the North-Central, South-East and South-West regions.]
Fourth-order polynomial control:
xi: reg invite abovethreshold totalmark totalmark2 totalmark3 totalmark4 i.region_n if
existing==1 & totalmark>=45 & totalmark<=55, robust
Impact of being above threshold on outcome:
Above regressions with outcome instead of invite.
2SLS:
xi: ivreg2 outcome (invite=abovethreshold) totalmark i.region_n if existing==1 & totalmark>=45
& totalmark<=55, robust
and similarly with the fourth-order polynomial control.
The new firms will be estimated similarly.
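The logic of the first stage, reduced form and 2SLS estimates can be sketched in pure Python. Simplifying assumptions, beyond the Stata code above: region dummies and region-specific cutoffs are omitted, and the score is centered at the cutoff (which does not change the coefficient on the threshold dummy). With a single instrument and the same controls in both stages, the 2SLS estimate equals the reduced-form coefficient divided by the first-stage coefficient. Standard errors are not computed. Names and data are hypothetical.

```python
def ols(y, X):
    """OLS coefficients from the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    m = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= factor * m[col][j]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))) / m[i][i]
    return beta


def fuzzy_rd(outcome, invite, mark, cutoff, bandwidth=5, degree=1):
    """First stage, reduced form and Wald (2SLS) effect of being invited to training."""
    X, y, d = [], [], []
    for yi, di, mi in zip(outcome, invite, mark):
        x = mi - cutoff
        if abs(x) > bandwidth:
            continue
        above = 1.0 if mi >= cutoff else 0.0
        X.append([1.0, above] + [x ** p for p in range(1, degree + 1)])
        y.append(yi)
        d.append(di)
    first_stage = ols(d, X)[1]
    reduced_form = ols(y, X)[1]
    return first_stage, reduced_form, reduced_form / first_stage


# Hypothetical data: two applicants per mark; at the cutoff only one is invited.
marks = [m for m in range(45, 56) for _ in range(2)]
invite = []
for m in range(45, 56):
    invite += [1, 0] if m == 50 else ([1, 1] if m > 50 else [0, 0])
outcome = [1 + 0.5 * d + 0.1 * m for d, m in zip(invite, marks)]
first, reduced, effect = fuzzy_rd(outcome, invite, marks, cutoff=50)
print(round(first, 3), round(effect, 3))
```

Passing `degree=4` mirrors the fourth-order polynomial control.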
6. Hypotheses to be tested and families of outcomes
We begin with setting out hypotheses and families of outcomes to examine for existing
businesses, and then discuss the case of new businesses.
EXISTING BUSINESSES
The classification of existing business will be based on status at the time of application.
Individuals who are verified as no longer running a business will have their business outcomes
coded as zero (e.g. they are not formal, they have zero profits, and zero workers).
FAMILY A: CHANGES IN A, E, AND ACCESS TO K AND L
HYPOTHESIS A: Winning the YouWIN competition leads to positive increases in A, E, and access
to K and L.
We will consider the following outcomes in this family:
1. Entrepreneurial self-efficacy: measured as the number of business activities that the
owner rates themselves as “very confident” in their ability to do (P12a-P12i). This is coded
as 1 for each item if the owner answers 4 = very confident, and 0 if they answer 1 through 3,
or 9 (not applicable or refuse).
2. Formality: we classify a firm as formal if it has a registered business name (B6a=1),
municipal license (B6b=1), and income tax registration (B6c=1), and as not formal if it does not
answer yes to all three questions.
3. Mentoring: firm has a business mentor (IN21=1).
4. Network: number of other business owners the owner discusses business matters with
(IN22, top-coded at the 99th percentile of the overall distribution).
5. Participated in a training program: IN23=1, or, for individuals no longer running a business,
YE2=1 for any category.
6. Participated in a training program other than one provided by YouWIN (IN24_1=1 or IN24_2=1 or
IN24_4=1 or IN24_5=1), or, for individuals no longer running a business, YE2=1 for
categories other than YouWIN.
7. Business has taken a formal loan in 2012 (FB5a=1 or FB5b=1 or FB5d=1 (loan from bank,
microfinance or NGO)).
8. Received a new investment from equity holder in 2012 (FB9=1).
9. Standardized z-score index of these measures.
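Several of these outcomes are simple recodes of survey items. The sketch below shows the coding rules for self-efficacy, formality and formal loans, assuming answers are stored as integers with yes coded as 1, as in the definitions above; a missing answer therefore counts as "not yes". Function names are ours.

```python
def self_efficacy(p12_answers):
    """Count of P12a-P12i items answered 4 (very confident); 1-3 and 9 count as 0."""
    return sum(1 for answer in p12_answers if answer == 4)


def is_formal(b6a, b6b, b6c):
    """1 only if the firm has a registered name, municipal license and tax registration."""
    return int(b6a == 1 and b6b == 1 and b6c == 1)


def took_formal_loan(fb5a, fb5b, fb5d):
    """1 if the firm took a loan from a bank, microfinance institution or NGO in 2012."""
    return int(1 in (fb5a, fb5b, fb5d))


print(self_efficacy([4, 3, 4, 9, 1, 4, 2, 4, 4]), is_formal(1, 1, 2), took_formal_loan(2, 1, 2))  # → 5 0 1
```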
FAMILY B: CHANGES IN BUSINESS INPUTS
HYPOTHESIS B: Treatment leads to increased use of business inputs.
We will consider the following outcomes in this family:
1. Owner’s labor hours: measured as hours worked in the last week (EF1). This will be top-coded at the
99th percentile of the overall distribution.
2. Consulting services: number of hours of consulting services used in 2012 (IN20, top-coded at the
99th percentile of the overall distribution); coded as 0 for those who don’t use consulting
services (IN18=2).
3. Value of inventories and raw materials (BF2, top-coded at the 99th percentile of the overall
distribution).
4. Purchase of capital worth more than 100,000 Naira in 2012 (BF3=1).
5. Total purchases of capital over 100,000 Naira (BF4a + BF4b, top-coded at the 99th percentile,
coded as 0 for those not making new capital investments).
6. Standardized z-score index of these measures
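Top-coding at the 99th percentile of the overall distribution can be sketched with the standard library. The percentile convention here (linear interpolation between order statistics, `statistics.quantiles` with `method="inclusive"`) is an assumption and may differ slightly from the one used by the statistical package in the actual analysis. The data are hypothetical.

```python
import statistics


def top_code(values, pct=99):
    """Cap values at the pct-th percentile of the overall distribution.

    Returns the capped list and the cap. Only the upper tail is changed.
    """
    cap = statistics.quantiles(values, n=100, method="inclusive")[pct - 1]
    return [min(v, cap) for v in values], cap


hours = list(range(1, 101))  # hypothetical hours worked last week
capped, cap = top_code(hours)
print(cap, capped[-3:])
```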
FAMILY C: BUSINESS PRACTICES AND INNOVATION
HYPOTHESIS C1: Winning the YouWIN competition leads to improvements in business practices
1. An index of business practices formed from section 8, aggregating the number of practices
firms carry out, using the same coding as used by de Mel et al. with these questions in Sri
Lanka.
HYPOTHESIS C2: Treatment leads firms to undertake more innovation
This will be measured by the following set of outcomes:
1. Firm introduced a new product or service in 2012 (IN1=1)
2. Firm significantly improved an existing product or service in 2012 (IN8=1)
3. Firm introduced a new or improved process in 2012 (IN9=1)
4. Firm introduced a new design or packaging (IN12a=1)
5. Firm introduced a new channel for selling goods (IN12b=1)
6. Firm introduced a new method for pricing goods (IN12c=1)
7. Firm introduced a new way of advertising (IN12d=1)
8. Firm changed way work is organized in the firm (IN12f=1)
9. Firm introduced new quality control standards (IN12g=1)
10. Firm licensed a new technology (IN13c=1)
11. Firm obtained a new quality accreditation (IN13f=1)
12. Firm uses the internet (IN15=1)
13. Innovation index: the average of 1-12.
FAMILY D: CHANGES IN BUSINESS SALES AND PROFITABILITY
HYPOTHESIS D1: Treatment leads to greater sales and profits in the medium term, but likely has no
discernible impact in the first follow-up survey.
This will be measured as the following set of outcomes:
1. Number of customers in a typical week (B12). This will be top-coded at the 99th percentile of
the overall distribution to account for outliers.
2. Total sales in the last month with no truncation: BF5. For businesses that do not give an exact
answer but do answer the range question, the midpoint of the range will be used. For firms in the
top range, a value equal to the median of firms with sales in this top range will be used.
3. Total sales in the last month truncated at the 99th percentile: as in 2, except truncated at the
99th percentile.
4. Total sales in 2012 to date, truncated at the 99th percentile: BF6, measured as per 3.
5. Sales are higher than one year ago: BF7=3.
6. Total profits in the last month with no truncation: BF9. For businesses that do not give an exact
answer but do answer the range question, the midpoint of the range will be used. For firms in the
top range, a value equal to the median of firms with profits in this top range will be used.
7. Total profits in the last month truncated at the 99th percentile: as in 6, except truncated at the
99th percentile.
8. Total profits in the best month of the year, truncated at the 99th percentile: BF10, measured
as per 7.
9. The inverse hyperbolic sine transformation of total business profits in the past month (BF9),
$\log\left(y + \sqrt{y^2 + 1}\right)$, which is similar to the log transformation but can deal with zero profits.
For businesses that do not give an exact answer but do answer the range question, the midpoint of
the range will be used. For firms in the top range, a value equal to the median of firms with profits
in this top range will be used.
10. Sales of main product in past month. BF12b*BF12d
11. Mark-up profit on main product in past month: (BF12b-BF12c)*BF12d
12. A standardized profits and sales impact will be obtained by aggregating these different effects
as described below in our methods section as a standardized z-score.
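The range-answer imputation and the inverse hyperbolic sine transformation can be sketched as follows. Assumptions: the bracket bounds are hypothetical, and "the median of firms in the top range" is interpreted as the median of exact answers at or above the lower bound of the open-ended top bracket. `math.asinh` computes log(y + sqrt(y² + 1)) in a numerically stable way.

```python
import math
from statistics import median


def impute_from_ranges(records, ranges):
    """Fill in a value per firm from either an exact answer or a range answer.

    records: list of (exact_value, range_code) pairs; either element may be None.
    ranges:  list of (low, high) bounds indexed by range_code; the last range is
             open-ended and has high=None.
    """
    top_low = ranges[-1][0]
    # Median of exact answers falling in the open-ended top range.
    top_exact = [v for v, _ in records if v is not None and v >= top_low]
    top_value = median(top_exact) if top_exact else None

    filled = []
    for exact, code in records:
        if exact is not None:
            filled.append(exact)
        elif code is None:
            filled.append(None)  # no usable answer
        else:
            low, high = ranges[code]
            filled.append(top_value if high is None else (low + high) / 2)
    return filled


def ihs(y):
    """Inverse hyperbolic sine, log(y + sqrt(y**2 + 1)); defined at zero and below."""
    return math.asinh(y)


ranges = [(0, 10_000), (10_000, 50_000), (50_000, None)]  # hypothetical brackets
records = [(12_000, None), (None, 0), (None, 1), (60_000, None),
           (80_000, None), (None, 2), (None, None)]
profits = impute_from_ranges(records, ranges)
print(profits)  # → [12000, 5000.0, 30000.0, 60000, 80000, 70000.0, None]
```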
HYPOTHESIS D2: Winning YOUWIN does not affect reporting errors.
A concern with any program involving business training or improvements in record-keeping is that it
may lead to changes in the accuracy of the information being reported, even if the underlying business
financial position does not change. If businesses systematically under- or over-state sales and profits,
this will lead to a bias in the measured treatment effect.
We will test whether the treatment has affected the reporting of existing firms through estimating the
treatment impact on the number of reporting errors made. The following will be deemed a reporting
error, and our measure will be the total number of such errors:
a) Total sales in last month exceed total sales so far in 2012 (BF5>BF6)
b) Profits in last month exceed sales in last month (BF9>BF5)
c) Profits in best month are less than profits in last month (BF10<BF9)
d) Revenues in last month from main product exceed total revenues in last month
(BF12b*BF12d >BF5)
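The error count can be computed firm by firm as in the sketch below. Treating a check that involves a missing value as "no error" is our assumption; the plan does not say how missing values enter this measure.

```python
def reporting_errors(bf5, bf6, bf9, bf10, bf12b, bf12d):
    """Number of inconsistencies (a)-(d) in a firm's reported figures.

    Any argument may be None (missing); a check involving a missing value
    is not counted as an error (an assumption).
    """
    def exceeds(x, y):
        return x is not None and y is not None and x > y

    main_product_sales = bf12b * bf12d if bf12b is not None and bf12d is not None else None
    checks = [
        exceeds(bf5, bf6),                 # (a) last month's sales > sales so far in 2012
        exceeds(bf9, bf5),                 # (b) last month's profits > last month's sales
        exceeds(bf9, bf10),                # (c) best-month profits < last month's profits
        exceeds(main_product_sales, bf5),  # (d) main-product revenue > total revenue
    ]
    return sum(checks)


print(reporting_errors(100, 80, 120, 50, 10, 20), reporting_errors(100, 500, 30, 40, 5, 10))  # → 4 0
```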
FAMILY E: CHANGES IN EMPLOYMENT
HYPOTHESIS E: The YouWIN program leads to an increase in employment in existing businesses
We will measure the impact of the YouWIN program on employment as measured by the following set
of outcomes:
1. Business owner is employed (in business or not) measured by either SC = 1 (owns a business),
or NB1<=5 (worked for pay in the last month).
2. Business has survived (SC=1)
3. Current number of wage or salaried workers (EF3_1a)
4. Current number of casual or daily workers (EF3_2a)
5. Current number of unpaid workers (EF3_4a)
6. Workers hired in 2012 (EF4)
7. Workers fired in 2012 (EF6)
8. Employment index: a standardized z-score average of items 1-6.
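The plan does not spell out the standardization used for the z-score indices. A common choice, assumed in this sketch, is to standardize each component by the control-group mean and standard deviation and then average the z-scores across components for each firm. Components are assumed to be already oriented in a consistent direction, so any sign reversal (for example for workers fired) would need to be applied beforehand. Names and data are hypothetical.

```python
from statistics import mean, pstdev


def zscore_index(components, treated):
    """Average of component z-scores, each standardized by control-group mean and SD.

    components: dict mapping an outcome name to a list of values, one per firm,
    in the same order as `treated` (1 = treatment, 0 = control).
    """
    standardized = []
    for values in components.values():
        control = [v for v, t in zip(values, treated) if t == 0]
        mu, sd = mean(control), pstdev(control)
        standardized.append([(v - mu) / sd for v in values])
    return [mean(z[i] for z in standardized) for i in range(len(treated))]


index = zscore_index(
    {"wage_workers": [1, 3, 5, 7], "casual_workers": [10, 20, 30, 40]},
    treated=[0, 0, 1, 1],
)
print(index)  # → [-1.0, 1.0, 3.0, 5.0]
```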
NEW BUSINESSES
Firms are classified as new businesses if they applied as a new business to the YouWIN program.
FAMILY A: CHANGES IN A AND E
HYPOTHESIS A: Winning the YouWIN competition leads to positive increases in A, E, and access to K and
L.
We will consider the following outcomes in this family:
1. Entrepreneurial self-efficacy: measured as the number of business activities that the owner
rates themselves as “very confident” in their ability to do (P12a-P12i). This is coded as 1 for each
item if the owner answers 4 = very confident, and 0 if they answer 1 through 3, or 9 (not
applicable or refuse).
2. Participated in a training program: YE2=1 for any category or PE2=1 for any category or YE15=1
or PE15A=1
3. Participated in a training program other than YOUWIN: YE2=1 for any category except YOUWIN
or PE2=1 for any category except YOUWIN, or YE15=1 or PE15A=1
FAMILY B: STEPS TOWARDS OPENING A BUSINESS
HYPOTHESIS B: Treatment leads to individuals taking more steps towards starting a business.
We will consider the following outcomes (coded automatically as 1 if the owner has started a business):
1. Interested in starting a business in next 12 months. PN1=1
2. Knows type of business they would like to start. PN3 has an answer. Coded as zero if they don’t
want to start a business, coded as 1 if already started.
3. Has identified specific location where they expect to start the business. PN4=1. Coded as zero
if they don’t want to start a business, coded as 1 if already started.
4. Has identified the costs of starting the business. PN6 has an answer. Coded as zero if they don’t
want to start a business, coded as 1 if already started.
5. Has gauged demand for new business. PN7a=1.
6. Has worked out money needed to start new business. PN7b=1.
7. Has visited competitors to see how they operate. PN7d=1
8. Has taken training course to get skills for new line of business. PN7e=1
9. Has identified sources of financing to pay for costs of new business. PN7f=1.
10. Standardized Index: Number of Steps taken towards opening the business (sum of 1-9).
11. Standardized Index of Steps for Non-Business Owners: number of steps taken by individuals who
are not business owners. This codes as missing those who have started a business. This
is for robustness only, since if treatment affects the likelihood of opening a business, or the
selection as to who opens a business, this will involve conditioning on a selected sample.
FAMILY C: CHANGES IN EMPLOYMENT IN NEW BUSINESSES
HYPOTHESIS C: The YouWIN program leads to an increase in employment in new businesses
We will measure the impact of the YouWIN program on employment as measured by the following set
of outcomes:
1. Business owner is employed (in business or not) measured by either SC = 1 (owns a business),
or NB1<=5 (worked for pay in the last month).
2. Owner has started a business (SC=1)
3. Current number of wage or salaried workers (EF3_1a). Coded as zero for those without a
business.
4. Current number of casual or daily workers (EF3_2a). Coded as zero for those without a
business.
5. Current number of unpaid workers (EF3_4a). Coded as zero for those without a business.
6. Workers hired in 2012 (EF4). Coded as zero for those without a business.
7. Workers fired in 2012 (EF6). Coded as zero for those without a business.
8. Employment Index: a standardized z-score average of outcomes 1-6.
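A minimal sketch of a standardized z-score average index. It assumes, following the Kling, Katz and Liebman convention, that each component is standardized with the control-group mean and standard deviation, and that components are equally weighted; the optional `signs` argument (a hypothetical parameter) flips outcomes where a higher value means a worse result. Missing values are not handled here.

```python
from statistics import mean, stdev


def zscore_index(components, treated, signs=None):
    """Equal-weighted average of z-scores, standardized against the control group.

    components: one list of values per outcome, each aligned with `treated`.
    treated:    0/1 treatment indicators.
    signs:      optional +1/-1 per outcome so that higher always means "better".
    """
    signs = signs or [1] * len(components)
    zscores = []
    for values, sign in zip(components, signs):
        control = [v for v, t in zip(values, treated) if t == 0]
        mu, sd = mean(control), stdev(control)
        zscores.append([sign * (v - mu) / sd for v in values])
    # Average the standardized components respondent by respondent
    return [mean(z[i] for z in zscores) for i in range(len(treated))]
```

A component with no variation in the control group would give a zero standard deviation; the limited-variation rule set out later in this plan would normally exclude such a variable first.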
FAMILY D: BUSINESS OUTCOMES FOR NEW BUSINESSES
HYPOTHESIS D: Treatment leads to greater sales and profits for the businesses started by new business
applicants.
This will be measured as the following set of outcomes. Our pure experimental estimates will be
unconditional, using zeros for individuals not operating businesses. For exploration purposes we will also
examine conditional outcomes, conditional on operating a business. These will only be valid
experimental estimates if the treatment does not affect the selection of who operates a firm.
1. Number of customers in a typical week (B12). This will be top-coded at the 99th percentile of
the overall distribution to account for outliers.
2. Total sales in the last month with no truncation: BF5. For businesses that do not give an exact
answer but do answer the range question, the midpoint of the range will be used. For firms in the
top range, a value equal to the median of firms with sales in this top range will be used.
3. Total sales in the last month truncated at the 99th percentile. As in 2, except truncated at the
99th percentile.
4. Total sales in 2012 to date, truncated at the 99th percentile. BF6, measured as per 3.
5. Sales are higher than one year ago. BF7=3
6. Total profits in the last month with no truncation: BF9. For businesses that do not give an exact
answer but do answer the range question, the midpoint of the range will be used. For firms in the
top range, a value equal to the median of firms with profits in this top range will be used.
7. Total profits in the last month truncated at the 99th percentile. As in 6, except truncated at the
99th percentile.
8. Total profits in the best month of the year, truncated at the 99th percentile. BF10, measured
as per 7.
9. The inverse hyperbolic sine transformation of total business profits in the past month (BF9),
$\log\left(y + (y^2+1)^{1/2}\right)$, which is similar to the log transformation but can deal with zero
profits. For businesses that do not give an exact answer but do answer the range question, the
midpoint of the range will be used. For firms in the top range, a value equal to the median of firms
with profits in this top range will be used.
10. Sales of main product in past month. BF12b*BF12d
11. Mark-up profit on main product in past month: (BF12b-BF12c)*BF12d
12. A standardized profits and sales impact will be obtained by aggregating these different effects
as described below in our methods section as a standardized z-score.
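The constructions used in items 2-9 can be sketched as three small helpers. This is a minimal sketch under stated assumptions: the top-range median is taken over firms whose exact answers fall in the open-ended top bracket, "truncated at the 99th percentile" is read as top-coding (values above the 99th percentile set equal to it, using the nearest-rank percentile), and the function names and bracket layout are hypothetical.

```python
import math
from statistics import median


def impute_from_ranges(exact, ranges, top_floor):
    """Fill in amounts for firms that answered only the range question.

    exact:  exact amounts reported (None if not given).
    ranges: (low, high) bracket chosen (None if not given); the open-ended
            top bracket is written as (top_floor, None).
    """
    # Median of exact answers falling in the open-ended top bracket
    top_median = median(v for v in exact if v is not None and v >= top_floor)
    filled = []
    for value, bracket in zip(exact, ranges):
        if value is not None:
            filled.append(value)                          # exact answer used as-is
        elif bracket is None:
            filled.append(None)                           # no answer at all: stays missing
        elif bracket[1] is None:
            filled.append(top_median)                     # open-ended top bracket
        else:
            filled.append((bracket[0] + bracket[1]) / 2)  # bracket midpoint
    return filled


def topcode_p99(values):
    """Set non-missing values above the 99th percentile (nearest-rank) equal to it."""
    data = sorted(v for v in values if v is not None)
    cap = data[math.ceil(0.99 * len(data)) - 1]
    return [None if v is None else min(v, cap) for v in values]


def ihs(y):
    """Inverse hyperbolic sine, log(y + sqrt(y^2 + 1)); equals 0 at y = 0."""
    return math.asinh(y)
```

For instance, a firm that picked a 0-200 bracket is assigned 100, while a firm in the top bracket is assigned the median exact answer among top-bracket firms.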
7. Heterogeneity of Treatment Effect Outcomes
The analysis will separate the new and existing business applicants since some of the outcome measures
will differ for these two groups, and the impacts of the program are likely different.
Within these two groups we will examine the heterogeneity in treatment effects with respect to the
following key variables of policy interest:
1) Business plan score: The key question of interest here is whether the treatment effect is larger
or smaller for those with higher business plan scores. This is important for thinking about
targeting of the program. I will examine this in two different specifications:
a) Linear specification: I will include a linear control for the business plan score, and then
interact it with the treatment indicator.
b) Quartiles: I will include dummies for having a business plan score in the 2nd, 3rd and 4th (top)
quartiles among the experimental sample, and interactions of each of these business plan score
quartiles with the treatment.
2) Female: Power may be low for this interaction given that only 16-17% of the sample is female,
but we will examine whether there are significant differences by gender. This is one of the
variables the randomization was stratified on.
3) Region: The randomization was stratified on six regions; we will include dummies for 5 of
these regions and the interactions between region and treatment.
Given the sample size I will examine these interactions one at a time.
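The quartile regressors in 1(b) could be built as in the sketch below. It assumes the cut-offs are the 25th, 50th and 75th percentiles of the score within the experimental sample (computed with Python's default `statistics.quantiles` method), and that ties at a cut-off fall in the lower quartile; the dictionary keys are hypothetical names. The regression would also include the treatment dummy itself and the strata dummies.

```python
from statistics import quantiles


def quartile_interactions(scores, treated):
    """Quartile dummies (2nd-4th) of the business plan score and their
    interactions with treatment, computed within the sample passed in."""
    cuts = quantiles(scores, n=4)  # three cut points: 25th, 50th, 75th percentiles
    rows = []
    for score, t in zip(scores, treated):
        quartile = 1 + sum(score > c for c in cuts)  # 1 (bottom) .. 4 (top)
        d2, d3, d4 = (int(quartile == k) for k in (2, 3, 4))
        rows.append({
            "q2": d2, "q3": d3, "q4": d4,
            "T_x_q2": t * d2, "T_x_q3": t * d3, "T_x_q4": t * d4,
        })
    return rows
```

The bottom quartile is the omitted category, so each interaction coefficient measures how the treatment effect in that quartile differs from the effect in the bottom quartile.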
Treatment heterogeneity will be examined with respect to the following outcomes:
- Standardized profit and sales index
- Owner has started a business (new businesses), or owner has a surviving business (existing
businesses)
- Employment index
- Current number of wage and salaried workers
For the first survey follow-up analysis, if there is no significant overall impact on any of these 4
measures, then I will also examine heterogeneity in response to intermediate measure indices to see
whether steps are being made towards ultimate outcomes.
8. Additional information regarding use of testing and data
Dealing with Testing for Multiple Outcomes through Standardized Treatment Effects and Adjustments
for Multiple Inference
To deal with multiple hypothesis testing we employ two approaches. First, we group our outcome
measures into domains or families, based on the idea that items within a domain are measuring an
underlying common factor. Then we sign the outcomes within each family so that the hypothesized
effects go in the same direction, and take a standardized treatment effect within that domain. We follow
Kling, Katz and Liebman in constructing this standardized treatment effect.
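As we understand the Kling, Katz and Liebman approach, the standardized treatment effect for a family of $K$ signed outcomes averages each outcome's treatment effect scaled by that outcome's control-group standard deviation. In notation introduced here, with $\pi_k$ the estimated treatment effect on outcome $k$ and $\sigma_k$ the standard deviation of outcome $k$ in the control group:

$$\tau = \frac{1}{K}\sum_{k=1}^{K} \frac{\pi_k}{\sigma_k}$$

The standard error of $\tau$ needs to account for the covariance between the $\pi_k$, for example by estimating the $K$ outcome equations jointly as seemingly unrelated regressions.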
Secondly, to account for multiple inference within a domain we will compute and report the family-wise
error rate adjusted p-values using the Dubey/Armitage-Parmar adjustment to the Bonferroni procedure
which provides a correction for correlation among outcomes, as set out here:
http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-a-quick-adjustment-for-multiple-hypothesis-testing
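A minimal sketch of the Dubey/Armitage-Parmar adjustment, following our reading of the linked post: for each of the $M$ outcomes in a family, compute the mean correlation $r_k$ of outcome $k$ with the other $M-1$ outcomes, set $g_k = M^{1-r_k}$, and report $1-(1-p_k)^{g_k}$. The function name is hypothetical, and the correlation matrix is assumed to be supplied by the caller.

```python
def dap_adjusted_pvalues(pvalues, corr):
    """Dubey/Armitage-Parmar adjusted p-values.

    pvalues: unadjusted p-values for the M outcomes in a family (M >= 2).
    corr:    M x M correlation matrix of those outcomes (list of lists).
    """
    m = len(pvalues)
    adjusted = []
    for k, p in enumerate(pvalues):
        # Mean correlation of outcome k with the other M-1 outcomes
        r_k = sum(corr[k][j] for j in range(m) if j != k) / (m - 1)
        g_k = m ** (1 - r_k)
        adjusted.append(1 - (1 - p) ** g_k)
    return adjusted
```

With uncorrelated outcomes this reduces to the Sidak correction $1-(1-p)^M$, and with perfectly correlated outcomes no adjustment is made; intermediate correlations give an adjustment in between.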
Inflation
Once multiple rounds of survey data are available, nominal values will be converted to real Naira using
the Nigeria consumer price index from the Central Bank of Nigeria.
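Converting to real Naira amounts to scaling each nominal value by the ratio of the base-period CPI to the CPI of the period it refers to. A minimal sketch, where the round labels, the choice of base round and any CPI values passed in are hypothetical:

```python
def deflate(amounts, rounds, cpi_by_round, base_round):
    """Express nominal Naira amounts in base-period Naira.

    amounts:      nominal values (None if missing).
    rounds:       survey round each amount refers to.
    cpi_by_round: CPI level for each round (hypothetical keys and values).
    base_round:   round whose prices serve as the base.
    """
    base = cpi_by_round[base_round]
    return [None if a is None else a * base / cpi_by_round[r]
            for a, r in zip(amounts, rounds)]
```

For example, with a CPI of 100 at baseline and 110 at follow-up, 110 nominal Naira at follow-up is worth 100 baseline Naira.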
Procedures for Addressing Missing Data and Questions with Limited Variation
The following sections detail the procedures for addressing the cases of survey attrition, item non-
response, and questions with limited variation.
Survey attrition
Depending on response rates and budget, the follow-up survey may use more expensive methods to try
to get a subsample of the individuals who cannot be reached through the standard survey to respond.
If this is done, all data will be probability-reweighted to reflect this.
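One way the reweighting could work, sketched under the assumption that a random share of standard-survey non-respondents is selected for the more expensive follow-up: each person interviewed through that follow-up stands in for 1/share non-respondents, while standard respondents keep a weight of 1. The status labels and function name are hypothetical.

```python
def tracking_weights(status, tracked_share):
    """Probability weights when only a random share of standard-survey
    non-respondents is followed up with more expensive methods.

    status: "standard" (interviewed in the standard survey), "intensive"
            (interviewed through the intensive follow-up) or "missing".
    tracked_share: fraction of standard-survey non-respondents selected
            at random for intensive follow-up.
    """
    weight_for = {"standard": 1.0, "intensive": 1.0 / tracked_share, "missing": 0.0}
    return [weight_for[s] for s in status]
```

If one in four non-respondents is tracked, each one reached that way receives a weight of 4.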
Let $A_i$ be an indicator of whether individual $i$ attrits from the study by not responding to, or not
being able to be contacted for, a follow-up survey. We will first estimate whether attrition is related to
treatment status by means of the following regression:

$$A_i = \beta\, Treat_i + \sum_s \delta_s X_{s,i} + \varepsilon_i$$

where $Treat_i$ indicates assignment to treatment and $X_{s,i}$ are dummy variables for each
randomization stratum $s$ (consisting of region*gender). Since randomization is at the individual level,
conditional on these strata, Huber-White standard errors will be used. We will test $\beta = 0$ to
determine whether attrition from the survey is related to treatment status or not.
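The attrition test described above (attrition regressed on treatment and a full set of strata dummies, with Huber-White standard errors) can be sketched with NumPy. This is a sketch, not the estimation code: it uses the HC1 small-sample scaling and a normal approximation for the two-sided p-value, both of which are assumptions, and the function name is hypothetical.

```python
import math

import numpy as np


def attrition_test(attrited, treated, strata):
    """OLS of attrition on treatment plus one dummy per stratum (no intercept).

    Returns the treatment coefficient, its Huber-White (HC1) standard error,
    and a two-sided normal-approximation p-value for a zero coefficient.
    """
    y = np.asarray(attrited, dtype=float)
    labels = sorted(set(strata))
    dummies = np.array([[1.0 if s == lab else 0.0 for lab in labels] for s in strata])
    X = np.column_stack([np.asarray(treated, dtype=float), dummies])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)          # sum of e_i^2 x_i x_i'
    vcov = xtx_inv @ meat @ xtx_inv * n / (n - k)   # HC1 sandwich estimator
    se = math.sqrt(vcov[0, 0])
    p = math.erfc(abs(beta[0] / se) / math.sqrt(2))  # two-sided normal p-value
    return beta[0], se, p
```

In a single stratum where 1 of 4 treated and 3 of 4 control individuals attrit, the treatment coefficient is -0.5.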
If treatment status is found not to significantly affect attrition at the 5 percent significance level, then all
estimation will proceed without any adjustment for attrition. If attrition is found to be related to
treatment status, we postulate that attrition will be higher for the control group (although it is possible
that the repeated requests for information that the treatment group has received may instead lower
their willingness to participate). We will then employ two bounding approaches to test robustness to
attrition:
(i) Lee bounds: the group with lower attrition will have either the top or the bottom tail of
responses trimmed following the Lee method. For continuous outcomes, we will also examine robustness
to assuming that the attrited observations were at the 95th, 90th, and 75th percentiles (for the lower
bound) and at the 5th, 10th, and 25th percentiles (for the upper bound).
(ii) Behaghel et al. bounds: we will use the number of attempts it took to contact respondents to
form bounds following the approach set out in their paper.
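The Lee trimming step in (i) can be sketched as follows: the trimming share is the difference in response rates divided by the higher response rate, and that share is removed from the top or bottom of the higher-response group to obtain the lower and upper bounds on the difference in means. This is a minimal sketch assuming simple differences in means with no strata or covariates; the function name and arguments are hypothetical.

```python
from statistics import mean


def lee_bounds(y_treat, y_ctrl, n_treat, n_ctrl):
    """Lee-style trimming bounds on the treatment-control difference in means.

    y_treat, y_ctrl: outcomes of respondents (non-attriters) in each arm.
    n_treat, n_ctrl: number of individuals originally assigned to each arm.
    """
    q_t, q_c = len(y_treat) / n_treat, len(y_ctrl) / n_ctrl   # response rates
    if q_t >= q_c:
        share, trimmed, other = (q_t - q_c) / q_t, sorted(y_treat), y_ctrl
    else:
        share, trimmed, other = (q_c - q_t) / q_c, sorted(y_ctrl), y_treat
    k = round(share * len(trimmed))              # observations to trim
    drop_top = mean(trimmed[:len(trimmed) - k])  # mean after trimming the top tail
    drop_bottom = mean(trimmed[k:])              # mean after trimming the bottom tail
    if q_t >= q_c:
        return drop_top - mean(other), drop_bottom - mean(other)
    return mean(other) - drop_bottom, mean(other) - drop_top
```

For example, if all 5 treated but only 4 of 5 control individuals respond, one treated observation is trimmed from each tail in turn to form the two bounds.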
Missing data from item non-response
No imputation for missing data from item non-response at follow-up will be performed. Missing data on
baseline variables will be dummied out of the ANCOVA specifications. We will check whether item non-
response is correlated with treatment status following the same procedures as for survey attrition, and
if it is, construct bounds for our treatment estimates that are robust to this.
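Dummying out a missing baseline control can be sketched as replacing the missing value with a constant and adding an indicator for missingness as an extra regressor, so the observation stays in the ANCOVA regression. The fill value of 0 is an assumption (a sample mean is another common choice), and the function name is hypothetical.

```python
def dummy_out_missing(values, fill=0.0):
    """Replace missing baseline values with `fill` and return a 0/1 missing indicator.

    Both returned lists enter the regression: the filled control and the indicator.
    """
    filled = [fill if v is None else v for v in values]
    missing = [int(v is None) for v in values]
    return filled, missing
```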
Questions with Limited Variation
In order to limit noise caused by variables with minimal variation, questions for which 95 percent of
observations have the same value within the relevant sample will be omitted from the analysis and will
not be included in any indicators or hypothesis tests. In the event that omission decisions result in the
exclusion of all constituent variables for an indicator, the indicator will not be calculated.
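The 95 percent rule can be checked with a short helper. This sketch assumes "95 percent of observations" means at least 95 percent of the non-missing observations in the relevant sample share the modal value; the function name is hypothetical.

```python
from collections import Counter


def has_limited_variation(values, threshold=0.95):
    """True if at least `threshold` of the non-missing observations share one value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return True  # nothing observed: no usable variation
    modal_count = Counter(observed).most_common(1)[0][1]
    return modal_count / len(observed) >= threshold
```

A variable with 95 zeros and 5 ones would be omitted; one with 94 zeros and 6 ones would be kept.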