DATA MINING TECHNIQUES IN ANALYSIS OF STUDENT COURSE OF STUDY
ABSTRACT:
Data mining has received a great deal of attention in the information industry in recent years, owing to the wide availability of large amounts of data and the need to turn that data into useful information and knowledge. This project applies data mining techniques to the analysis of students' course of study; that is, it predicts a course of study for a student who does not meet the school cutoff point for Post UTME. A classification algorithm was used to analyze the data, together with a relational database management system. In conclusion, we have developed software that generates a course of study for students in the Faculty of Science and Engineering.
TABLE OF CONTENTS
Title Page
Certification
Dedication
Acknowledgment
Table of Contents
Abstract
CHAPTER ONE: INTRODUCTION
1.1 General Overview
1.2 Statement of the Problem
1.3 Aim and Objectives of the Project
1.3.1 Aim
1.3.2 Objectives
1.4 Research Methodology
1.5 Significance of the Study
1.6 Scope of the Study
1.7 Limitations of the Study
1.8 Data Mining Review
CHAPTER TWO: LITERATURE REVIEW
2.1 The Database
2.2 Database Management System
2.2.1 Structured Query Language (SQL)
2.3 Data Warehouse
2.4 Data Mining
2.4.1 The Scope of Data Mining
2.4.2 Data Mining Tasks
2.5 Other Approaches to Data Mining
2.6 Knowledge Discovery in Databases
CHAPTER THREE: METHODOLOGY
3.1 Data Mining Technique
3.2 Data Sampling
3.3 Business Understanding
3.4 Data Understanding
3.5 Data Preparation
3.6 Modeling
3.6.1 Descriptive Tool
3.6.2 Predictive Tool
3.6.3 Classification Model
3.6.4 Types of Classification Algorithm
3.7 Naïve Bayesian Algorithm
3.7.1 Data Required for Naïve Bayesian Models
3.7.2 Technical Notes
CHAPTER FOUR: SYSTEM DESIGN AND IMPLEMENTATION
4.1 Organization of Database Tables and Fields
4.2 Problem Definition
4.3 Stages Involved in Solving the Problem
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary and Conclusion
5.2 Recommendations
References
Appendix I
Appendix II
CHAPTER ONE
1.0 INTRODUCTION
1.1 General Overview
In recent years, database technology has advanced to the point where large amounts of data can be stored in databases. Data mining has therefore attracted more attention as a way to extract valuable information from raw data that an institution can use in its decision-making process. It applies modern statistical and computational techniques to expose useful information hidden within large databases. To remain competitive in the educational field, an institution needs deep and sufficient knowledge for better assessment, evaluation, planning and decision-making. Data mining helps institutions use their current reporting capabilities to discover and identify hidden patterns in their databases, which can then be used to predict the performance of students.
Data mining can be viewed as a result of the natural evolution of information technology. Before 1960, when database and information technology had not yet evolved, data analysis was essentially primitive file processing, which did not yield appropriately useful information despite the large amount of time it consumed. The evolutionary path of data mining can be seen in the database industry in the development of the following database and information technologies:
1. Data collection and data creation
2. Data management (including data warehouse and data preparation)
3. Data analysis and understanding (involving data mining and data interpretation)
Moreover, data mining is also known as knowledge discovery in databases (KDD). Data mining therefore consists of more than collecting and managing data; it also includes analysis and prediction. Important decisions are often made based not on the information-rich data stored in databases but on the decision maker's intuition, simply because the decision maker does not have the tools to extract the valuable knowledge embedded in the vast amount of data.
1.2 Statement of the Problem
It is not feasible for people to analyze large amounts of data without the assistance of appropriate computational tools. The development of automatic and intelligent tools is therefore essential for analyzing, interpreting, and correlating data in order to develop and select strategies in the context of each application. To serve this need, the area of Knowledge Discovery in Databases (KDD) came into existence and attracted great interest within the scientific, industrial, and commercial communities. The popular expression “Data Mining” actually names one of the stages of Knowledge Discovery in Databases. The term “KDD” was formally recognized in 1989 in reference to the broad concept of procuring knowledge from databases. One of the most popular definitions was proposed in 1996 by a group of researchers. According to Fayyad, et al. (1996): “KDD is a process with many stages, non-trivial, interactive, and iterative, for the identification of comprehensible, valid, and potentially useful patterns from large data sets”. Extracting valuable information from large databases is therefore highly desirable.
This research work therefore addresses the intelligent prediction of students’ course of study in higher institutions based on historical student academic data. This will facilitate better performance of students in higher institutions.
1.3 Aim and Objectives of the Project
1.3.1 Aim
The aim of the research work is to develop computer application software that will be able to predict a student's course of study in a higher institution using a classification algorithm.
1.3.2 Objectives
The following are the set of objectives addressed by the project work:
1. To develop and populate a student academic database.
2. To develop a computer application program that will be able to mine knowledge from the students’ academic database using a classification algorithm.
3. To predict students' course of study according to their Post UTME cutoff.
4. To reduce the rate at which student admission is forfeited.
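As an illustration of objectives 2 and 3, the following is a minimal sketch of how a classification algorithm can suggest a course of study from historical records. It assumes a Naïve Bayesian classifier over categorical attributes with add-one smoothing. The attribute names (a Post UTME score band and a strongest O'level subject), the course names, and every record in RECORDS are hypothetical and invented for this example; they are not taken from the project database.

```python
import math
from collections import Counter, defaultdict

# Hypothetical historical records, invented for illustration only:
# ((Post UTME score band, strongest O'level subject), admitted course)
RECORDS = [
    (("high", "mathematics"), "Computer Science"),
    (("high", "physics"), "Electrical Engineering"),
    (("medium", "mathematics"), "Statistics"),
    (("medium", "chemistry"), "Science Laboratory Technology"),
    (("low", "chemistry"), "Science Laboratory Technology"),
    (("medium", "mathematics"), "Statistics"),
    (("high", "mathematics"), "Computer Science"),
    (("low", "mathematics"), "Statistics"),
]


def train(records):
    """Count how often each class and each (class, attribute value) pair occurs."""
    class_counts = Counter(label for _, label in records)
    # feature_counts[label][attribute index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen per attribute
    for features, label in records:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, features):
    """Return the course with the highest Naive Bayes score for the attributes."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log of the prior probability of the course.
        score = math.log(count / total)
        for i, value in enumerate(features):
            # Add-one (Laplace) smoothing avoids zero probabilities
            # for attribute values never seen with this course.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(RECORDS)
    print(predict(model, ("low", "chemistry")))
    print(predict(model, ("high", "mathematics")))
```

In a real system the records would be read from the student academic database rather than written into the program, and the attributes would be chosen from the fields that the database actually holds.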
1.4 Research Methodology
The execution of the research work includes the following:
1. Analysis of some data mining techniques. These techniques bring the benefit of automation to existing software and hardware platforms and can be implemented on new systems as existing platforms are upgraded and new products are developed.
2. Consideration of the sources of data records, i.e. the admission office student database and the departmental student database.
3. Consultation with some database developers or technologists.
4. Browsing the internet for websites with relevant information.
5. Consultation with some professional statistical analysts.
1.5 Significance of the Study
The use of data mining techniques in predicting students' course of study is significant and relevant in any academic institution where the record of each student has been collected and stored in a database system. The need for knowledge discovery in an academic environment may arise at admission level or faculty level. The institution may want to know which mode of admission produces students with better results. The institution may want to know how students perform in general courses and the reasons for that performance. The institution may also want to predict the number of students to be admitted to specific departments and faculties, so as to allocate a reasonable amount of resources to the various departments for the session.
1.6 Scope of the Study
The research work is centered only on ‘O’ level, pre-degree or UTME science courses.
1.7 Limitations of the Study
The research work is limited to the Faculty of Science and Engineering of Osun State Polytechnic, IREE.
1.8 Data Mining Review
Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. It is commonly used in a wide range of profiling practices. While data mining can be used to uncover patterns in data samples, it is important to be aware that the use of a non-representative sample of data may produce results that are not indicative of the domain.
Similarly, data mining will not find patterns that may be present in the domain if those patterns are not present in the sample being mined. There is also a tendency for insufficiently knowledgeable consumers of the results to attribute magical abilities to data mining, treating the technique as a sort of all-seeing crystal ball.
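The effect of a non-representative sample can be sketched with a small example. The figures in POPULATION are hypothetical and chosen only to make the point visible: a sample that contains UTME students alone gives a different pass rate from the whole domain, and it cannot reveal anything about pre-degree students because none of them are in it.

```python
from collections import Counter

# Hypothetical domain, invented for illustration only:
# each record is (admission mode, result).
POPULATION = (
    [("UTME", "pass")] * 60
    + [("UTME", "fail")] * 20
    + [("pre-degree", "pass")] * 5
    + [("pre-degree", "fail")] * 15
)


def pass_rate(records):
    """Fraction of records whose result is 'pass'."""
    counts = Counter(result for _, result in records)
    return counts["pass"] / len(records)


def modes_seen(records):
    """Set of admission modes that appear in the records."""
    return {mode for mode, _ in records}


# A non-representative sample: only UTME students were collected.
biased_sample = [record for record in POPULATION if record[0] == "UTME"]

if __name__ == "__main__":
    print("pass rate in the whole domain:", pass_rate(POPULATION))
    print("pass rate in the biased sample:", pass_rate(biased_sample))
    print("admission modes in the biased sample:", modes_seen(biased_sample))
```

Any pattern that links admission mode to results exists in the domain above but is invisible in the biased sample, so no mining technique applied to that sample could recover it.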