Data sgp is an aggregation of student assessment data used to calculate student growth percentiles and projections. It is an important resource for researchers studying educational equity and improvement. Although the data sgp dataset is large, it is not "big data", a term for datasets too large to be handled by traditional analysis tools and methodologies such as spreadsheets or relational databases. The data sgp dataset can be managed with a standard database application such as MySQL or PostgreSQL.
The sgpData data set is an anonymized panel data set containing 5 years of vertically scaled assessment information in the WIDE format used by the lower-level SGP functions studentGrowthPercentiles and studentGrowthProjections. It serves as a model for the format of the data needed to run these analyses and is available for download from the SGP website.
To use data sgp, you need a computer running the R software environment. R is free, open-source software that can be downloaded and installed on Windows, macOS, or Linux. SGP analyses require substantial computing resources, so it is recommended that you run them on a computer with a quad-core processor and at least 4 GB of memory.
The higher-level SGP functions, such as analyzeSGP, work with an object of class SGP, which holds long-formatted data in its @Data slot (as prepared by prepareSGP). The SGP object also contains state-associated metadata that these higher-level functions can access.
A key to understanding how the SGP calculations work is that each student is compared with a group of academic peers: students with similar score histories on prior MCAS administrations in the same subject. The student's current assessment score is compared with the current scores of those peers, and the resulting relative standing is expressed as a student growth percentile.
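The peer comparison described above can be illustrated with a deliberately simplified Python sketch. It is not how the SGP package computes its results (the package is written in R and estimates percentiles with quantile regression); it only shows the idea of ranking a student's current score among students with similar prior scores. The function name, the `band_width` parameter, and the sample scores are all hypothetical.

```python
from bisect import bisect_left


def peer_growth_percentile(student, cohort, band_width=10):
    """Rank a student's current score among academic peers.

    `student` is a (prior_score, current_score) pair and `cohort` is a list
    of such pairs. Peers are defined here (an assumption made for this
    sketch) as cohort members whose prior score lies within `band_width`
    points of the student's prior score.
    """
    prior, current = student
    peer_scores = sorted(
        cur for (pri, cur) in cohort if abs(pri - prior) <= band_width
    )
    if not peer_scores:
        return None  # no comparable peers, so no percentile can be computed

    # Number of peers whose current score is strictly lower.
    below = bisect_left(peer_scores, current)
    pct = round(100 * below / len(peer_scores))

    # Assumption: clamp to the 1..99 range in which growth percentiles
    # are conventionally reported.
    return min(max(pct, 1), 99)


# Hypothetical (prior, current) scores for a small cohort.
cohort = [(200, 210), (205, 220), (195, 230), (202, 240), (300, 310)]

# Four peers have prior scores near 200; two of them scored below 225.
print(peer_growth_percentile((200, 225), cohort))  # 50
```

A fixed score band is the crudest possible definition of a peer group. It is used here only because it makes the comparison easy to follow by hand.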
The higher-level SGP functions require the data to be in the LONG format to take advantage of their capabilities. In particular, studentGrowthPercentiles and studentGrowthProjections require the data object to contain a vector indicating the year(s) for which to produce student growth percentiles and projections. If the object does not contain this vector, these functions assume that all of the data belong to the current assessment year.
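To make the difference between the WIDE and LONG layouts concrete, here is a minimal Python sketch of the reshaping step. It is a conceptual illustration only; in practice the reshaping is done in R. The column names (`ID`, `GRADE_<year>`, `SS_<year>`, `YEAR`, `GRADE`, `SCALE_SCORE`) and the sample records are assumptions chosen for the example.

```python
def wide_to_long(wide_rows, years):
    """Turn one-row-per-student records into one-row-per-student-per-year.

    Each wide row is a dict with an "ID" key plus hypothetical
    "GRADE_<year>" and "SS_<year>" (scale score) keys for each year.
    Years with no recorded score are skipped.
    """
    long_rows = []
    for row in wide_rows:
        for year in years:
            score = row.get(f"SS_{year}")
            if score is None:
                continue  # student was not assessed in this year
            long_rows.append(
                {
                    "ID": row["ID"],
                    "YEAR": year,
                    "GRADE": row.get(f"GRADE_{year}"),
                    "SCALE_SCORE": score,
                }
            )
    return long_rows


# Hypothetical wide-format records: one row per student.
wide = [
    {"ID": 1, "GRADE_2020": 3, "SS_2020": 410, "GRADE_2021": 4, "SS_2021": 435},
    {"ID": 2, "GRADE_2020": 3, "SS_2020": 390, "GRADE_2021": 4, "SS_2021": None},
]

long_rows = wide_to_long(wide, [2020, 2021])
for record in long_rows:
    print(record)
```

In the long layout the year is an explicit column, which is what lets a function select the year(s) to analyze instead of relying on column positions.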
The data in the @Data slot must also include each student's unique identifier. In addition, the studentGrowthPercentiles and studentGrowthProjections functions accept a list of the students in the cohort sample selected for analysis. Finally, studentGrowthPercentiles accepts a Boolean value indicating whether the results of the cohort-sample subset analysis should be returned for inspection. The default, NULL, applies no restriction and returns results for the entire cohort of students; this is the recommended setting for all analyses. The higher-level SGP wrapper functions offer other options for limiting the number of students that are evaluated.