Body fat percentage is an essential indicator of whether someone is at risk of developing obesity-related diseases, which range from high blood pressure to heart disease. Weight alone is not a strong indicator of the risk of obesity-related conditions such as hypertension, early atherosclerosis, and hyperlipidemia, as noted by Chatterjee, Chatterjee, and Bandyopadhyay (2006). Body fat percentage, by contrast, can help identify such risks regardless of an individual's weight, so addressing the risk presented by obesity-related diseases requires a way to determine it. Body fat percentage is related to several measurable factors, including age, weight, height, and body measurements such as neck circumference.
The model proposed in this project is based on data collected from 252 study participants. The data set includes 19 attributes, two of which are calculated body fat percentages: percent body fat using the Brozek equation and percent body fat using the Siri equation. The Brozek value is preferred as the target for the model, based on the comparison of the Brozek and Siri equations made by Guerra et al. (2010). The variables considered when developing the model are age, density, weight, height, adiposity index, fat-free weight, and the circumferences of the neck, chest, abdomen, hip, thigh, knee, ankle, extended biceps, forearm, and wrist. The project aims to develop a model that estimates body fat percentage from some or all of these variables.
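The Brozek and Siri equations both convert body density into percent body fat. The sketch below uses the commonly published forms of the two equations (Brozek: 457/D - 414.2; Siri: 495/D - 450, with D in g/cm^3). The function names and the example density are illustrative assumptions and are not taken from the project's data.

```python
def brozek_percent_fat(density):
    """Percent body fat from body density (g/cm^3), Brozek equation."""
    return 457.0 / density - 414.2


def siri_percent_fat(density):
    """Percent body fat from body density (g/cm^3), Siri equation."""
    return 495.0 / density - 450.0


# Hypothetical density value, used only to show how the two estimates differ.
example_density = 1.05
print(round(brozek_percent_fat(example_density), 1))
print(round(siri_percent_fat(example_density), 1))
```

For the same density the two equations give slightly different estimates, which is why a choice between them has to be made before modelling.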
Preview of Data and Data Analysis Method
The data set used in this project includes 252 instances and 19 attributes. All attributes are numeric, and none has missing values. Data preparation is an essential stage of the data mining process. Activities at this stage include recovering incomplete data by filling in missing values, purifying the data by correcting errors in the data set, and resolving data conflicts (Zhang, Zhang, & Yang, 2003). Inspection of the provided data set did not reveal a need for these preparation activities, so the next stage is data analysis. The model is developed using multiple linear regression and decision trees in the Statistical Package for the Social Sciences (SPSS).
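The completeness check described above can be sketched outside SPSS as follows. This is a minimal illustration using only the Python standard library. The column names and the three sample rows are hypothetical, and one value is deliberately left blank to show what a problem would look like.

```python
import csv
import io


def check_numeric_complete(rows, columns):
    """Count, per column, the cells that are missing or not parseable as numbers."""
    problems = {name: 0 for name in columns}
    for row in rows:
        for name in columns:
            value = (row.get(name) or "").strip()
            try:
                float(value)
            except ValueError:
                problems[name] += 1
    return problems


# Hypothetical sample; the third row has a blank height on purpose.
sample_csv = """age,weight,height,neck
23,154.25,67.75,36.2
22,173.25,72.25,38.5
22,154.00,,34.0
"""

reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)
report = check_numeric_complete(rows, reader.fieldnames)
print(report)
```

A report with zero in every column corresponds to the situation described for the project's data set, where no cleaning step is needed.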
Multiple linear regression was chosen because the data set contains several variables that can be used to determine an individual's body fat percentage. Unlike simple linear regression, multiple linear regression uses more than one independent variable to predict the outcome of a dependent variable. Previous studies on body fat percentage have also used multiple regression. For example, Weiler et al. (2000) apply multiple linear regression after multiple correlation analysis to predict bone mineral content and density from factors such as weight, age, height, and fat. For this study, the assumption underlying the use of multiple regression is that body fat percentage is directly related to a linear combination of the noted attributes (Tranmer & Elliot, 2008). Like multiple regression, decision trees describe the relationships between the independent attributes and the dependent attribute. These relationships can then be used to predict the dependent attribute. Uçar et al. (2021) include decision trees among the methods used to develop a body fat percentage prediction model.
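To make the linear assumption concrete, the sketch below fits a model of the form y = b0 + b1*x1 + ... + bk*xk by ordinary least squares, solving the normal equations with Gaussian elimination. It is an illustration in plain Python and not the SPSS procedure used in the project. The feature values and the coefficients used to generate the targets are synthetic assumptions, chosen so that the fit can be checked against known values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_linear_regression(features, targets):
    """Return [intercept, b1, ..., bk] that minimise the squared error."""
    design = [[1.0] + list(row) for row in features]  # column of ones for b0
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Evaluate the fitted linear model for one observation."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Synthetic predictors (hypothetical abdomen circumference and weight values).
features = [[85, 150], [90, 170], [100, 200], [95, 160], [110, 220]]
true_coefficients = [-30.0, 0.6, -0.05]  # assumed, for illustration only
targets = [predict(true_coefficients, row) for row in features]

coefficients = fit_multiple_linear_regression(features, targets)
print([round(c, 4) for c in coefficients])
```

Because the synthetic targets are exactly linear in the predictors, the fitted coefficients recover the assumed ones up to rounding. Real measurements would leave residuals.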
Analysis and Code
Linear Regression
Correlation Coefficients
Residual Statistics
Decision Tree
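As a minimal sketch of how a regression tree relates a predictor to the dependent attribute, the code below searches one numeric feature for the threshold that minimises the summed squared error of the two resulting groups. This is a CART-style split, which is not necessarily the growing method applied in SPSS. The abdomen and body fat values are synthetic and the function names are illustrative.

```python
def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Return (threshold, total_sse) for the best split 'feature <= threshold'."""
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if best is None or sse < best[1]:
            best = (threshold, sse)
    return best


# Synthetic values: abdomen circumference (cm) and percent body fat.
abdomen = [80, 85, 88, 100, 105, 110]
body_fat = [10, 12, 11, 25, 27, 26]

threshold, sse = best_split(abdomen, body_fat)
print(threshold, sse)
```

A full tree repeats this search over every predictor and then recursively within each resulting group. The prediction for a new individual is the mean body fat of the leaf it falls into.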
References
Chatterjee, S., Chatterjee, P., & Bandyopadhyay, A. (2006). Skinfold thickness, body fat percentage and body mass index in obese and non-obese Indian boys. Asia Pacific Journal of Clinical Nutrition, 15(2).
Guerra, R. S., Amaral, T. F., Marques, E., Mota, J., & Restivo, M. T. (2010). Accuracy of Siri and Brozek equations in the percent body fat estimation in older adults. The Journal of Nutrition, Health & Aging, 14(9), 744-748.
Tranmer, M., & Elliot, M. (2008). Multiple linear regression. The Cathie Marsh Centre for Census and Survey Research (CCSR), 5(5), 1-5.
Uçar, M. K., Uçar, Z., Uçar, K., Akman, M., & Bozkurt, M. R. (2021). Determination of body fat percentage by electrocardiography signal with gender based artificial intelligence. Biomedical Signal Processing and Control, 68, 102650.
Weiler, H. A., Janzen, L., Green, K., Grabowski, J., Seshia, M. M., & Yuen, K. C. (2000). Percent body fat and bone mass in healthy Canadian females 10 to 19 years of age. Bone, 27(2), 203-207.
Zhang, S., Zhang, C., & Yang, Q. (2003). Data preparation for data mining. Applied Artificial Intelligence, 17(5-6), 375-381.