The research project brings together everything we study during the course and gives you hands-on experience applying those concepts.
I. Data Analysis Project.
Identify the problem(s) to be solved or opportunities to be realized by mining the selected data set.
Consider the following data preparation questions and explain your answers. When appropriate, cite resources that support your answers. Explain how your answers, and the data preparation itself, differed when you chose a different data mining method.
Should instances with missing values be deleted?
Should missing values be specially coded and then retained in the data set?
Should numeric values be assigned predetermined ranges or left for the algorithm to split?
Should categorical variables be grouped or coded to reflect a hierarchy?
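The four questions above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the records, the field names (age, income, state), the median fill, the "Unknown" code, and the range edges are all hypothetical choices, not requirements of the assignment.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "state": "NY"},
    {"age": None, "income": 61000, "state": "TX"},
    {"age": 45, "income": None, "state": "MA"},
    {"age": 29, "income": 38000, "state": None},
]

# Hypothetical hierarchy: states roll up into regions.
STATE_TO_REGION = {"NY": "Northeast", "MA": "Northeast", "TX": "South"}


def drop_incomplete(rows):
    """Option 1: delete every instance that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def code_missing(rows, numeric_fields, category_fields):
    """Option 2: keep every instance and code missing values explicitly."""
    medians = {
        f: median(r[f] for r in rows if r[f] is not None)
        for f in numeric_fields
    }
    coded = []
    for r in rows:
        new = dict(r)
        for f in numeric_fields:
            new[f + "_missing"] = r[f] is None  # indicator flag
            if r[f] is None:
                new[f] = medians[f]  # assumed fill value: the median
        for f in category_fields:
            if r[f] is None:
                new[f] = "Unknown"  # special code for missing category
        coded.append(new)
    return coded


def bin_value(value, edges, labels):
    """Assign a numeric value to a predetermined range."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def group_category(value, hierarchy):
    """Map a category to its parent level in a hierarchy."""
    return hierarchy.get(value, "Unknown")
```

Comparing the size of drop_incomplete(records) with code_missing(records, ["age", "income"], ["state"]) shows how much data each choice keeps, which is one point to discuss when explaining your answers.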
To explore the problem or opportunity, use two or more of the following data mining methods covered by this course:
regression: linear regression, discriminant analysis or logistic regression,
hierarchical or k-means clustering,
Describe the algorithms chosen, and indicate why you chose them. Exploring a method of interest is a satisfactory reason for this course paper.
Explain how and why you used specific pruning parameters or other adjustments to create a sparser model.
Compare the alternative solutions using methods found in comparative studies in the literature. For example, see “Data mining for network intrusion detection: A comparison of alternative methods,” Dan Zhu, G. Premkumar, Xiaoning Zhang, Chao-Hsien Chu. Decision Sciences, Atlanta: Fall 2001, Vol. 32. http://www.findarticles.com/p/articles/mi_qa3713/is_200110/ai_n8954240 Report the results of the accuracy measures available in the software. If the software does not have built-in accuracy reporting, manually test the model’s accuracy on a small hold-out test sample of the data. The hold-out method creates separate training and test sets. It is particularly useful when you test the model on data from a later time period than the training data.
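A manual hold-out test of the kind described above might look like the following sketch. It assumes each record carries a date and a 0/1 label (both hypothetical field names), and it uses a majority-class predictor purely as a stand-in for whatever model you actually build.

```python
from collections import Counter
from datetime import date

# Hypothetical labelled records with a date field.
rows = [
    {"date": date(2023, 3, 1), "label": 0},
    {"date": date(2023, 6, 1), "label": 0},
    {"date": date(2023, 9, 1), "label": 1},
    {"date": date(2023, 11, 1), "label": 0},
    {"date": date(2024, 2, 1), "label": 0},
    {"date": date(2024, 5, 1), "label": 1},
]


def holdout_by_date(records, cutoff):
    """Train on records before the cutoff, test on records from it onward."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def majority_label(train):
    """Stand-in 'model': always predict the most common training label."""
    return Counter(r["label"] for r in train).most_common(1)[0][0]


def accuracy(actual, predicted):
    """Fraction of test cases classified correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


train, test = holdout_by_date(rows, date(2024, 1, 1))
prediction = majority_label(train)
actual = [r["label"] for r in test]
predicted = [prediction] * len(test)
holdout_accuracy = accuracy(actual, predicted)
```

Splitting by date rather than at random keeps later records out of training, which matches the case where the model is tested on data from a later time period.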
Create a table showing the number of cases correctly identified and the number of Type I and Type II errors. In addition, a ROC curve is appropriate with discriminant analysis and logistic regression. For these methods, changing the parameters of the line separating the classes changes the percentages of Type I and Type II errors. Medical practitioners like ROC curves because they show the tradeoff between false positives and false negatives.
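The error table and the ROC tradeoff can be sketched as follows. The labels and scores are made-up values, and the code treats a false positive as a Type I error and a false negative as a Type II error, with class 1 taken as the positive class. It also assumes both classes appear in the test data.

```python
def confusion_counts(actual, scores, threshold):
    """Count correct cases and both error types at one cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1  # Type I error (false positive)
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1  # Type II error (false negative)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def roc_points(actual, scores):
    """(false positive rate, true positive rate) for each distinct cutoff."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        c = confusion_counts(actual, scores, threshold)
        points.append((c["fp"] / negatives, c["tp"] / positives))
    return points


# Hypothetical test-set labels and model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

strict = confusion_counts(actual, scores, 0.75)
lenient = confusion_counts(actual, scores, 0.5)
curve = roc_points(actual, scores)
```

Here the stricter cutoff of 0.75 gives no Type I errors but one Type II error, while the cutoff of 0.5 gives one Type I error and no Type II errors. Plotting the pairs in curve traces that tradeoff across all cutoffs.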
Which data mining method(s) seem superior for the chosen data set? Did the method that performed best in your study also dominate in similar comparative studies?
Compare the results or recommendations that would result from the use of the different methods.
Based on your analysis, justify a conclusion or recommendation.
Organize the paper into the sections of a formal research paper: Introduction, Methods, Results, etc. Use the resources provided by the Effective Writing Center: http://www.umgc.edu/writingcenter/
II. Writing Skills Research Paper
Writing skills are critical to success in the Graduate School (TGS) and in your future career. If conducting research is new to you, consider using the library’s helpful resources. To learn about effective Internet research, consult the Information and Library Services (ILS) online guide: http://www.umgc.edu/writingcenter/onlineguide/index.cfm
Before locating a full-text journal article in a database, read the guidance at http://www.umgc.edu/library/libhow/. Then go to the Computer/Information Science topic area in the Resources by Topic section of the Library Databases and E-Journals page and select a database to search. The ACM Digital Library is a good place to start.
Use the APA style guide for your citations. After each title, add a short note evaluating its quality as a research tool (refer back to the evaluation criteria) and its quality relative to the other sources cited in your bibliography.
Please select a topic for your research project and post a one- to two-paragraph summary (abstract) of your intended topic as a New topic in this Conference. Please change the title of your post to the title of your project.