Forecasting Pre-K Enrollment In Georgia Counties


Table of Contents

Executive Summary
I. Introduction
II. Data
    Population of Four Year Olds
    CDC Data on Births
    Geolytics Forecast
    Pre-K Enrollment
III. Methodology
    Forecasting Population of Four Year Olds
    Estimating Enrollment Ratio
    Pre-K Enrollment
IV. “How To” / Manual
    Re-Estimating Four Year Old Population with Additional Data
    Adjustment of Number of Lags in ARMA(p,q) Model
    Reproducing the Forecast of Population of Four Year Olds
    Adjusting Enrollment Ratios
Appendices

Executive Summary

To forecast pre-kindergarten (Pre-K) enrollment in Georgia by county for 2007 through 2011, we use data on actual Pre-K enrollment and data on the population of four year olds. Due to data limitations, we rely on forecasting the population of four year olds from 2005 forward, and we forecast Pre-K enrollment based on past relationships between population and Pre-K enrollment.

Data for the four year old population come from the Surveillance, Epidemiology, and End Results (SEER) database at the National Cancer Institute. We perform a number of consistency checks on the database by comparing our forecast with forecasts obtained from the Centers for Disease Control and Prevention (CDC) data on births and from Geolytics population forecasts. Geolytics is a private company that develops population forecasts. Data for Pre-K enrollment come from Bright from the Start (BFTS).

We use a county-specific time-series autoregressive moving average econometric model, an ARMA(p,q) model, to forecast population. Statistical tests and in-sample forecasts show that the model explains the data very well and that it captures the trends observed so far. Although there are variations across counties, aggregate in-sample forecasts for Georgia are nearly identical to observed numbers. Nevertheless, one needs to check purely statistical forecasts for errors and assess them in relation to external information.

This document provides a manual on the forecasting methodology to guide anyone interested in replicating the results or re-estimating the model when new data become available.

I. Introduction

Bright from the Start (BFTS), Georgia Department of Early Care and Learning, administers Georgia’s pre-kindergarten program (Pre-K) in addition to other administrative and policy oversight related to early learning and child care. As part of its annual planning and budget activities, BFTS forecasts the Pre-K population by county for the state. The Pre-K forecasts by county can be used for program budgeting, for analysis of the coverage of Pre-K, and for long-term planning related to expansion of the Pre-K program.

The Andrew Young School of Policy Studies, Georgia State University, was contracted to provide a methodology for forecasting the Pre-K population. This report provides a manual that documents the methodology and provides the actual forecast by county for 2007-2011. This methodology is to be “handed off” to BFTS along with training so that future forecasting can be done in-house at BFTS.

II. Data

Two key components of the data used to forecast Pre-K enrollment are data on actual Pre-K enrollment and data on the population of four year olds. As data on actual Pre-K enrollment are available only from 2001 to 2006, it is nearly impossible to use a statistical model to accurately forecast five periods ahead from only six actual values of enrollment. Therefore, we rely on forecasting the population of all four year olds. Our forecasting capabilities are much greater here because population data for four year olds are available from 1969 to 2004. We first forecast the population of four year olds, and then we estimate Pre-K enrollment based on the past relationship between population and Pre-K enrollment. We apply this relationship to the forecast of four year olds to derive the forecast of Pre-K enrollment.

The following section describes the data used to forecast the population of four year olds. It also describes the data on actual Pre-K enrollment that are used to determine the relationship between population and enrollment.

Population of Four Year Olds

To forecast the population of four year olds, our model relies on the previous population of four year olds. The data we use come from the Surveillance, Epidemiology, and End Results (SEER) database at the National Cancer Institute. The database has estimates of four year olds by county for each year from 1969 to 2004. This relatively long time series allows us to use a purely statistical approach in forecasting population from 2007 to 2011. Nevertheless, we have used several different data sources to forecast the population of four year olds as a robustness check. These sources are briefly described below.

CDC Data on Births

The Centers for Disease Control and Prevention (CDC) provides data on births by county by year from 1968 to 2004. The unit of observation is the birth, not the county, and therefore one first needs to aggregate births by county. These data can be very useful in estimating the population of four year olds, as they contain many explanatory variables for each newborn. However, using these data to forecast the population of four year olds requires making two strong assumptions:

1. Newborns did not move from their county of birth until they were over four years old (or that movement is purely random so that on average population stays the same);
2. The county where the birth certificate was issued is the same as the county of residence.

Making such strong assumptions, which can easily be violated, may lead to inconsistent estimates of the population of four year olds at the county level and in turn to inconsistent estimates of Pre-K enrollment at the county level.

Geolytics Forecast

A previous AYSPS forecast used population estimates and projections from Geolytics for 2006 and 2011 for the “less than 5” age group as the base of the forecast of four year olds. Then, several calculations were done in order to derive the population of four year olds from the base group. These calculations involve using Census data on “3-4 year old” and data on first grade enrollment to derive a final population forecast. However, by using these data we are making the following assumptions:

1. There will be no migration between counties in the period from 2006 to 2011;
2. All children enrolled in public schools (i.e. there were no children enrolled in home schooling programs and there were no children enrolled in private schools);
3. All children enrolled in public schools in the county where they attended Pre-K;
4. In the Census population estimate of “3-4 year old,” exactly half are four year olds and half are three year olds.

Again, making such strong assumptions may lead to inconsistent estimates of the population of four year olds at the county level and in turn to inconsistent estimates of Pre-K enrollment at the county level. We therefore rely on SEER data for our final forecast. However, we compare our final forecast with the forecast based on Geolytics data, and we describe this in more detail in the next section.

Pre-K Enrollment

Data on actual Pre-K enrollment are collected from Bright from the Start. These data are available by county by year from 2001 to 2006. We do not use these data alone to forecast future Pre-K enrollment. Nevertheless, we use these data to derive the relationship between four year olds and Pre-K enrollment in each county. In turn, our final forecast of Pre-K enrollment will be influenced by these data, as they determine the relationship between the population of four year olds and Pre-K enrollment.

In almost every county in Georgia we observe an increase in enrollment over time. This increase in enrollment is much greater (in relative terms) than the increase in population. Therefore, it appears that enrollment has been growing faster than the population of four year olds, so the enrollment rate has been rising. However, given only six observations, we can neither estimate nor impose any relationship that would suggest how enrollment is to increase in years 2007 through 2011.

In addition, there may have been exogenous factors that caused the increase in enrollment. One such factor is a change in policy or targeted focus, and we know that such changes have occurred. Therefore, given policy changes and relatively few observations of actual enrollment, one would need to rely on “expert opinion” in order to forecast future Pre-K enrollment, as previous realizations are not very good predictors of future ones. In the next two sections, we will discuss in more detail how such “expert opinion” can be incorporated to adjust estimates of enrollment ratios and forecasts of Pre-K enrollment.
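The relationship between the population of four year olds and Pre-K enrollment discussed in this section can be illustrated with a small calculation. The sketch below is only an illustration, not the procedure used in this report: the county figures are invented, the function names `enrollment_ratios` and `forecast_enrollment` are hypothetical, and taking the most recent observed ratio as the ratio for future years is an assumption made here for simplicity. In practice, that ratio is where “expert opinion” could enter.

```python
from typing import Dict

# NOTE: every number below is made up for illustration; none comes from
# BFTS, SEER, or the forecasts in this report.

def enrollment_ratios(enrollment: Dict[int, float],
                      population: Dict[int, float]) -> Dict[int, float]:
    """Return enrollment / population for each year present in both series."""
    return {
        year: enrollment[year] / population[year]
        for year in sorted(enrollment)
        if year in population and population[year] > 0
    }

def forecast_enrollment(population_forecast: Dict[int, float],
                        ratio: float) -> Dict[int, float]:
    """Apply one assumed enrollment ratio to a forecast of four year olds."""
    return {year: pop * ratio for year, pop in population_forecast.items()}

# Hypothetical county: observed four year olds and Pre-K enrollment.
population = {2001: 1000.0, 2002: 1010.0, 2003: 1020.0, 2004: 1030.0}
enrollment = {2001: 500.0, 2002: 525.2, 2003: 561.0, 2004: 597.4}

ratios = enrollment_ratios(enrollment, population)

# Assumption: carry the latest observed ratio forward. An analyst could
# replace this value with an expert-adjusted ratio instead.
assumed_ratio = ratios[max(ratios)]

# Hypothetical population forecast for two future years.
population_forecast = {2007: 1060.0, 2008: 1070.0}
forecast = forecast_enrollment(population_forecast, assumed_ratio)

for year, value in forecast.items():
    print(year, round(value, 1))
```

In this invented example the population grows by about 1 percent per year while enrollment grows faster, so the ratio rises each year. Holding the ratio fixed at its last observed value is therefore a conservative choice, and a different assumed ratio would produce a different enrollment forecast from the same population forecast.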

III. Methodology

This section briefly describes the methodology used to forecast Pre-K enrollment by county. Our forecasting has two components: forecasting the population of four year olds and calculating enrollment from the forecasted population. The population forecasts are based on a purely statistical time-series model. To derive our final Pre-K enrollment forecast, we multiply our population forecast of four year olds by the estimated enrollment ratio. Depending on the estimated enrollment ratio, we can have more than one forecast of Pre-K enrollment, despite having only a single forecast of the population of four year olds.

First we describe the methodology used to forecast the population of four year olds. Next, we describe how we calculate the enrollment ratio, and finally we describe how these two are combined to obtain the forecast of Pre-K enrollment.

Forecasting Population of Four Year Olds

To forecast the population of four year olds by county we can rely on purely statistical models (where we use only previous population to forecast the future population) or on structural models (“more economic” models where we impose a structure and use other variables such as per capita income and mother’s labor participation, as well as population itself, to forecast future population). Although structural models have many advantages, they require us to impose a relationship between the four year old population and some other variables (such as income, female unemployment, etc.). Therefore, these models are very data intensive. In addition, using a structural model raises the following issues:

1. What variables should be included in the model? Although some variables, such as births four years ago, have a strong statistical relationship to the population of four year olds, one can argue that the population of four year olds is influenced by various other factors, and there is no clear-cut rule for which variables should and should not be included in the model.
2. Aside from selecting relevant variables, one also needs forecasts of such variables in order to derive a forecast of the four year old population. Forecasts of some variables, such as per capita income, are relatively easy to obtain. However, variables for which forecasts are not available would require additional forecasting, which would increase forecast error.

Therefore, we rely on purely statistical models to forecast the population of four year olds. Although these models are less intuitive, they are very powerful if the trends are stable, as they are in this case.

The population forecast literature has some doubts about using standard forecast models in the very long run. However, forecasting seven periods ahead should not raise any serious concerns under the assumption that no drastic scenarios occur. Nevertheless, as Lee and Tuljapurkar point out, “one should not rely on mechanical time series forecasts in any case; they should be assessed in relation to external information.”¹

We have used several methods, and our final forecasts are based on an autoregressive moving average model, ARMA(p,q), where p denotes the maximum autoregressive order included in the model and q denotes the maximum moving average order included in the model. Our model is outlined in the following equation:

$$pop_{i,t} = \phi_{i1}\,pop_{i,t-1} + \dots + \phi_{ip}\,pop_{i,t-p} + \varepsilon_{i,t} + \theta_{i1}\,\varepsilon_{i,t-1} + \dots + \theta_{iq}\,\varepsilon_{i,t-q} \tag{1}$$

where pop denotes population, i denotes county, and t denotes year. We let p and q vary by county, and the choice of p and q is data driven (i.e. we let the data tell us what lags are significant and should be included). Therefore, we do not impose a model where we assume each county is the same; rathe