Descriptive statistics.
Summarize your interpretation of the frequency data provided in the output for respondents' age, highest school grade completed, and family income from the prior month.
Respondent’s age.
The first table in part one presents the descriptive statistics for the participants' age. From the table, we can tell that age data are available for 1,000 participants. The sample size is usually denoted as n, and N represents the total population; however, in SPSS, N also represents the number of participants in a sample (Statistics Solutions, 2021). The respondents' ages have a range of 30.052 years: the youngest participant is 19.378 years old (approximately 19 years), and the eldest is 49.430 years old (approximately 49 years).
There are three measures of central tendency: the mean, the median, and the mode. Measures of central tendency describe the central or typical value in a sample's distribution. The mode is the value that appears most often; the median is the middle value (or the average of the two middle values) when the data are ordered; and the mean is the average age of all participants, calculated as the sum of the ages in the sample divided by the number of respondents (Statistical Consulting Group, n.d.). In our output, the mean equals 36.63733, so the average participant is approximately 37 years old. Therefore, the typical respondent is in their mid-to-late thirties. Standard deviation and variance are measures of dispersion. Our standard deviation is 6.1987, which means that participants' ages typically fall about 6.2 years above or below the mean.
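To make these definitions concrete, here is a minimal Python sketch using the standard-library `statistics` module. The eight ages below are hypothetical values chosen for illustration only; they are not the survey data, so the results will not match our SPSS output.

```python
import statistics

# Hypothetical ages for illustration only; NOT the 1,000 survey respondents.
ages = [19.4, 31.0, 35.5, 36.0, 36.0, 38.2, 41.7, 49.4]

mean_age = statistics.mean(ages)        # sum of ages divided by the number of respondents
median_age = statistics.median(ages)    # middle value; average of the two middle values when n is even
mode_age = statistics.mode(ages)        # most frequently occurring value
sd_age = statistics.stdev(ages)         # sample standard deviation (n - 1 in the denominator)
var_age = statistics.variance(ages)     # sample variance, the square of the standard deviation
age_range = max(ages) - min(ages)       # largest value minus smallest value

print(f"mean={mean_age:.2f} median={median_age} mode={mode_age}")
print(f"sd={sd_age:.2f} variance={var_age:.2f} range={age_range:.1f}")
```

With an even number of values, the median is the average of the two middle ages, which here happen to be equal.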
Our skewness statistic is -.374. Because the value is negative, the age data are slightly negatively skewed; however, because it is close to zero, the distribution is roughly normal, and the age data will most likely have no problem meeting the normality assumption. Two of the assumptions that must be met when carrying out an independent t-test are homogeneity of variance and normality; our age data would likely satisfy the normality assumption.
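The link between the sign of the skewness statistic and the direction of the tail can be shown with a short Python sketch. It assumes the adjusted Fisher-Pearson formula, which we believe matches the sample skewness SPSS reports, and it uses small hypothetical lists rather than the survey data.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
left_tail = [1, 8, 9, 9, 10]    # one unusually low value stretches the left tail
right_tail = [1, 1, 2, 2, 10]   # one unusually high value stretches the right tail

print(sample_skewness(symmetric))   # essentially zero
print(sample_skewness(left_tail))   # negative
print(sample_skewness(right_tail))  # positive
```

A single extreme value on one side is enough to push the statistic away from zero, which is why a value like -.374 is read as only a mild departure from symmetry.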
From the histogram, we can tell that there is only one outlier, which is most likely the youngest participant. We can also see that the age data are roughly normally distributed: nearly all participants are between 20 and 50 years old, and most are clustered around 35 to 42 years.
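One way to check the histogram's impression of a single outlier is Tukey's fences, a common rule of thumb (not something reported in our output) that flags values more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below uses hypothetical ages; quartile conventions differ slightly between software packages, so SPSS may produce marginally different cut-offs.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # statistics.quantiles uses the 'exclusive' method by default.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical ages for illustration only.
ages = [19.4, 33.0, 34.5, 35.0, 36.0, 36.5, 37.0, 38.0, 39.5, 41.0, 42.0]
print(tukey_outliers(ages))
```

Raising `k` to 3 is a common way to flag only extreme values.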
Highest School Grade Completed.
The second table presents descriptive statistics for the highest school grade completed by each participant. In SPSS, Valid N is the number of recorded values that are not missing (Statistical Consulting Group, n.d.); therefore, data on the highest grade completed were available for only 989 participants (this figure excludes missing data). If the output comes from a single sample, we can conclude that 11 values are missing for this variable, because the valid N for age was 1,000 participants. The range was 15 grades: the lowest grade completed was grade 1, and the highest was grade 16. On average, participants completed about 11.28 grades, so the typical participant completed roughly 11th grade, and there is a possibility that the highest and lowest values are outliers. The standard deviation was 1.561, which means that participants' highest grade completed typically falls about 1.56 grades above or below the mean. The skewness is -.727, which indicates a moderate negative skew; however, because it falls within the commonly used range of -1 to +1, the data can be treated as roughly normally distributed.
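The difference between the total number of cases and Valid N can be sketched in Python. The `grades` list is hypothetical, with `None` standing in for a missing response; the last calculation only repeats the subtraction implied by our output, assuming both variables come from the same 1,000-case file.

```python
# Hypothetical answers to "highest school grade completed"; None marks a missing response.
grades = [12, 11, None, 16, 10, 12, None, 1, 11, 12]

valid = [g for g in grades if g is not None]
n_total = len(grades)           # every case in the file, missing or not
n_valid = len(valid)            # SPSS-style Valid N: non-missing values only
n_missing = n_total - n_valid   # cases excluded from the descriptive statistics
grade_range = max(valid) - min(valid)

print(f"N={n_total} valid N={n_valid} missing={n_missing} range={grade_range}")

# Same subtraction with the figures from our output, assuming one 1,000-case file.
survey_missing_grades = 1000 - 989
print(f"missing grade responses in the survey: {survey_missing_grades}")
```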
The histogram confirms this prediction: the highest and lowest values are outliers. We can also confirm that the data are roughly normally distributed and centered near grade 11. If we draw a vertical line from the peak of the histogram down to the x-axis, it meets the axis at around grade 11, which is close to the mean of 11.28.
Family Income.
The first table in part III presents descriptive statistics for family income in the previous month from all sources. Data were available for 895 participants; if all the variables in the file come from the same survey, data are probably missing for some participants, because the variables analyzed above have more valid cases. The range of income was $6,593: the participant with the lowest family income reported no income at all, and the participant with the highest reported a family income of $6,593. The mean family income was $1,172.59; because the highest income is more than five times the mean, that participant is most likely an outlier. The standard error of the mean is $26.345, which estimates how far the sample mean is likely to fall from the true population mean (Ilola, 2020). The standard deviation is $788.153, which means that family incomes typically fall about $788 above or below the mean. The skewness is 2.030, so the data are positively skewed, which may be problematic for analyses that assume normality; most values should therefore be concentrated on the left side of the histogram, with a long tail to the right.
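The standard error quoted above is consistent with the usual formula, the standard deviation divided by the square root of the sample size. A minimal Python check that uses only the figures reported in our output:

```python
import math

# Figures reported in the SPSS output for family income.
sd_income = 788.153      # standard deviation
n_income = 895           # valid N
mean_income = 1172.59    # mean
max_income = 6593        # highest family income

# Standard error of the mean: SD divided by the square root of the sample size.
se_income = sd_income / math.sqrt(n_income)
# How many times larger the highest income is than the mean.
max_to_mean = max_income / mean_income

print(f"Standard error of the mean: {se_income:.3f}")  # prints "Standard error of the mean: 26.345"
print(f"Highest income / mean: {max_to_mean:.2f}")
```

Because 788.153 divided by the square root of 895 rounds to 26.345, the standard error in the output appears to have been computed this way.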
The histogram confirms this prediction: the highest income is an outlier, and most of the values lie on the left side of the graph. The graph also shows that most participants' family incomes fall between $800 and $1,200, and most are below $2,000.
References
Australian Bureau of Statistics. (n.d.). Statistical Language – Measures of Central Tendency. https://www.abs.gov.au/websitedbs/D3310114.nsf/Home/Statistical+Language+-+measures+of+central+tendency#:%7E:text=There%20are%20three%20main%20measures,central%20value%20in%20the%20distribution.
Ilola, E. (2020, October 6). A beginner’s guide to standard deviation and standard error. Students 4 Best Evidence. https://s4be.cochrane.org/blog/2018/09/26/a-beginners-guide-to-standard-deviation-and-standard-error/
Statistical Consulting Group. (n.d.). Descriptive statistics | SPSS Annotated Output. UCLA: Statistical Consulting Group. https://stats.oarc.ucla.edu/spss/output/descriptive-statistics/
Statistics Solutions. (2021, August 3). Common Statistical Formulas. https://www.statisticssolutions.com/dissertation-resources/common-statistical-formulas/