Guías Docentes Electrónicas
1. General information
Course:
DATA MINING
Code:
42348
Type:
ELECTIVE
ECTS credits:
6
Degree:
406 - UNDERGRADUATE DEGREE IN COMPUTER SCIENCE AND ENGINEERING (AB)
Academic year:
2023-24
Center:
604 - SCHOOL OF COMPUTER SCIENCE AND ENGINEERING (AB)
Group(s):
15 
Year:
4
Duration:
First semester
Main language:
Spanish
Second language:
English
Use of additional languages:
The theory and practical classes will be taught in Spanish; however, the majority of the materials such as slides, practice exercises, and problem statements will be in English since it is an English-Friendly subject.
English Friendly:
Y
Web site:
Bilingual:
N
Lecturer: JOSE ANTONIO GAMEZ MARTIN - Group(s): 15 
Building/Office
Department
Phone number
Email
Office hours
ESII/1.C.13
SISTEMAS INFORMÁTICOS
2473
jose.gamez@uclm.es
Monday 10:15-13:15 and Tuesday 17:00-20:00

2. Pre-Requisites

We recommend that students are familiar with Computer Science concepts, like those in the previous courses of the Degree Program. In addition, this subject is based on the skills and knowledge acquired in the following ones:
- Logic
- Statistics
- Algorithm Design
- Intelligent Systems
- Knowledge Based Systems

3. Justification in the curriculum, relation to other subjects and to the profession

Data mining and machine learning are closely linked to statistics and computer algorithms. They are based on techniques for extracting knowledge from data sets. In recent years, these disciplines have gained importance due to the increase in data production (driven by phenomena such as the rise of the Internet and social networks) and the development of new techniques for obtaining genetic information. From a professional point of view, there is a rising demand for data scientists in fields as diverse as marketing, market analysis, security, and biology.


4. Degree competences achieved in this course
Course competences
Code Description
CM05 Ability to acquire, formalise, and represent human knowledge in a computable form for solving problems by means of a computer system in any application domain, particularly those related to computation, perception, and behaviour in intelligent environments.
CM07 Ability to know and develop computational learning techniques, and to design and implement applications and systems that use them, including those for the automatic extraction of information and knowledge from large volumes of data.
INS05 Argumentative skills to logically justify and explain decisions and opinions.
UCLM03 Accurate speaking and writing skills.
5. Objectives or Learning Outcomes
Course learning outcomes
Description
Development and implementation of a small to medium-sized information retrieval system.
Knowledge and development of computational learning techniques, both supervised and unsupervised, and the design and implementation of applications and systems that use them.
Description and application of the different phases of the process of knowledge discovery from large volumes of data.
Additional outcomes
Description
To obtain conclusive results from the knowledge extraction process and to be able to present and justify them.
6. Units / Contents
  • Unit 1: Introduction to data mining.
  • Unit 2: The process of knowledge discovery from data.
  • Unit 3: Model validation and evaluation.
  • Unit 4: Nearest neighbour (kNN) methods.
  • Unit 5: Numerical prediction: regression.
  • Unit 6: Classification and regression trees.
  • Unit 7: Probabilistic classifiers.
  • Unit 8: Dimensionality reduction (feature subset selection).
  • Unit 9: Ensembles (multiclassifiers).
  • Unit 10: Neural networks.
  • Unit 11: Clustering.
  • Unit 12: Association rules.
7. Activities, Units/Modules and Methodology
Training Activity Methodology Related Competences (only degrees before RD 822/2021) ECTS Hours As Com Description
Class Attendance (theory) [ON-SITE] Lectures CM05 CM07 INS05 1.26 31.5 N N It will be used by the lecturer to introduce the main concepts of each topic.
Final test [ON-SITE] Assessment tests CM05 CM07 INS05 UCLM03 0.1 2.5 Y Y Written exam. Individual. The official examination for the subject will be conducted individually. The same test applies to both continuous assessment and non-continuous assessment. In the extraordinary examination period, it can be retaken through an exam specifically scheduled by the ESII.
Workshops or seminars [ON-SITE] Lectures CM05 CM07 INS05 0.08 2 N N Seminar to introduce the tools to be used in the lab tasks.
Class Attendance (practical) [ON-SITE] Lectures CM05 CM07 INS05 0.06 1.5 N N The first part of each practical assignment is devoted to introducing and explaining it.
Computer room practice [ON-SITE] Guided or supervised work CM05 CM07 INS05 0.66 16.5 N N Student practical work under lecturer supervision at the computer laboratory.
Problem solving and/or case studies [ON-SITE] Problem solving and exercises CM05 CM07 INS05 0.24 6 N N Classroom problem solving related to the different subjects studied.
Study and Exam Preparation [OFF-SITE] Self-study CM05 CM07 INS05 1.56 39 N N Self-study by the student (tests and exam preparation).
Other off-site activity [OFF-SITE] Practical or hands-on activities CM05 CM07 INS05 0.84 21 N N Self-study and programming hours to complete the laboratory assignments.
Practicum and practical activities report writing or preparation [OFF-SITE] Self-study CM05 CM07 INS05 UCLM03 0.42 10.5 Y Y Writing of the report explaining the main details of the programming assignments and presenting the experimental results and conclusions. If the practical assignments are not passed during continuous assessment, they can be resubmitted at the corresponding hand-in in the regular examination period, although the assignment instructions may vary slightly. The extraordinary and final examination periods follow the conditions of the non-continuous assessment.
Writing of reports or projects [OFF-SITE] Project/Problem Based Learning (PBL) CM05 CM07 INS05 UCLM03 0.78 19.5 Y N Autonomous work to solve handwritten exercises and case studies provided by the lecturer. Discussion in forums of these exercises and other case studies.
Total: 6 150
Total credits of in-class work: 2.4 Total class time hours: 60
Total credits of out of class work: 3.6 Total hours of out of class work: 90

As: Assessable training activity
Com: Compulsory training activity that must be passed (passing it is required under both continuous and non-continuous assessment).
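
The workload figures in the table above can be cross-checked with a short script. This is only a sketch: the factor of 25 hours per ECTS credit is an assumption inferred from the table rows (for example, 1.26 ECTS corresponds to 31.5 hours), and the names `ACTIVITIES` and `split_hours` are illustrative.

```python
import math

# Assumption inferred from the table: 1 ECTS credit = 25 hours of student work.
HOURS_PER_ECTS = 25

# (ECTS, hours, on_site) for each activity row, in table order.
ACTIVITIES = [
    (1.26, 31.5, True),   # Class attendance (theory)
    (0.10, 2.5, True),    # Final test
    (0.08, 2.0, True),    # Workshops or seminars
    (0.06, 1.5, True),    # Class attendance (practical)
    (0.66, 16.5, True),   # Computer room practice
    (0.24, 6.0, True),    # Problem solving and/or case studies
    (1.56, 39.0, False),  # Study and exam preparation
    (0.84, 21.0, False),  # Other off-site activity
    (0.42, 10.5, False),  # Report writing or preparation
    (0.78, 19.5, False),  # Writing of reports or projects
]


def split_hours(rows):
    """Return (on-site hours, off-site hours) summed over the rows."""
    on_site = sum(hours for _, hours, is_on_site in rows if is_on_site)
    off_site = sum(hours for _, hours, is_on_site in rows if not is_on_site)
    return on_site, off_site


on_site_hours, off_site_hours = split_hours(ACTIVITIES)
total_ects = sum(ects for ects, _, _ in ACTIVITIES)
```

Under that assumption, the on-site and off-site sums reproduce the 60 and 90 hours stated in the totals, and the credits add up to 6.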

8. Evaluation criteria and Grading System
Evaluation System Continuous assessment Non-continuous evaluation * Description
Assessment of active participation 10.00% 0.00% Participation of the student in forums and in class, mainly (but not only) related to solving exercises and discussing different strategies.

Individual, non-compulsory.
Practicum and practical activities reports assessment 15.00% 15.00% A report must be written for each laboratory assignment. A minimum average mark of 4 (out of 10) is required.

The assignment can change slightly for the non-continuous evaluation.
Laboratory sessions 15.00% 15.00% The code for each laboratory assignment will be assessed (efficiency, completeness, etc.). A minimum average mark of 4 (out of 10) is required.

The assignment can change slightly for the non-continuous evaluation.
Other methods of assessment 15.00% 15.00% The code and results must be defended in an oral presentation to the lecturer. Questions will be addressed to both members of the pair. Different marks can be assigned depending on the answers. A minimum average mark of 4 (out of 10) is required.

The assignment can change slightly for the non-continuous evaluation.
Final test 45.00% 55.00% Written exam of the subject. A minimum mark of 4 (out of 10) is required.
Total: 100.00% 100.00%  
According to art. 4 of the UCLM Student Evaluation Regulations, students who cannot regularly attend face-to-face training activities must be given the means to pass the subject. They have the right (art. 12.2) to be globally graded in two annual calls per subject, an ordinary one and an extraordinary one (each evaluating 100% of the competences).

Evaluation criteria for the final exam:
  • Continuous assessment:
    - Participation in class and through the forums is expected to contribute novel solutions and critical discussion of those already presented. Occasionally, exercises carried out during class will be handed in. However, this activity is not compulsory, so it is possible to pass without completing it.

    - Programming assignments must be handed in and defended on the dates assigned to each assignment. It is required to hand in the three assignments and to obtain an average grade >=4.

    - A minimum of 4 is also necessary for the written exam.

    - If the minimum of 4 is achieved in the exam and the lab assignments, then the grade for the course is:
    0.45*theory + 0.45*lab assignment + 0.1*participation.

    Otherwise, the grade will be:
    minimum(4.0, exam) if the theory exam is taken, or "Not Presented" if the theory exam is not taken.

    Originality: The submission of any exercise (exam, practical report, code, problems, etc.) implies a declaration of originality by its authors. If plagiarism, copying, or similar misconduct is detected, the appropriate disciplinary measures will be taken.

    NOTE: By default, the student will be assessed by continuous assessment. If you wish to change to non-continuous assessment, you must indicate this via the following link www.esiiab.uclm.es/alumnos/evaluacion.php before the end of the term.
  • Non-continuous evaluation:
    The student can obtain 100% of the mark, assessed through the exam and the lab assignment evaluation. A specific deadline will be set for the lab assignment hand-in. The lab assignments can differ slightly from those used in the continuous assessment.

    The same rules as in the continuous assessment also apply here, except for the weights used to compute the final grade when all the minimums are achieved:

    0.55*theory + 0.45*lab assignment
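
The grading rules above can be summarised as a short function. This is a minimal sketch rather than an official calculator: the function name `final_grade` and its parameters are illustrative, all marks are assumed to be on a 0-10 scale, and the lab mark is treated as a single average, so the separate minimums for reports, code, and oral defence are not modelled.

```python
from typing import Optional, Union

NOT_PRESENTED = "Not Presented"
MIN_MARK = 4.0  # minimum required in both the exam and the lab average


def final_grade(
    exam: Optional[float],
    lab: float,
    participation: float = 0.0,
    continuous: bool = True,
) -> Union[float, str]:
    """Return the course grade (0-10), or "Not Presented".

    exam: theory exam mark, or None if the exam was not taken.
    lab: average lab assignment mark (assumed to be a single average).
    participation: participation mark, used only in continuous assessment.
    """
    if exam is None:
        return NOT_PRESENTED
    # If either minimum is missed, the grade is capped at 4.0.
    if exam < MIN_MARK or lab < MIN_MARK:
        return min(4.0, exam)
    if continuous:
        return 0.45 * exam + 0.45 * lab + 0.10 * participation
    return 0.55 * exam + 0.45 * lab
```

For instance, with an exam mark of 6, a lab average of 8, and a participation mark of 10, continuous assessment gives approximately 7.3, while non-continuous assessment (which ignores participation) gives approximately 6.9.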

Specifications for the resit/retake exam:
The rules of non-continuous evaluation also apply here.
Specifications for the second resit / retake exam:
The rules of non-continuous evaluation also apply here.
9. Assignments, course calendar and important dates
Not related to the syllabus/contents
Activity Hours
Class Attendance (theory) [ON-SITE][Lectures] 31.5
Final test [ON-SITE][Assessment tests] 2.5
Workshops or seminars [ON-SITE][Lectures] 2
Class Attendance (practical) [ON-SITE][Lectures] 1.5
Computer room practice [ON-SITE][Guided or supervised work] 16.5
Problem solving and/or case studies [ON-SITE][Problem solving and exercises] 6
Study and Exam Preparation [OFF-SITE][Self-study] 39
Other off-site activity [OFF-SITE][Practical or hands-on activities] 21
Practicum and practical activities report writing or preparation [OFF-SITE][Self-study] 10.5
Writing of reports or projects [OFF-SITE][Project/Problem Based Learning (PBL)] 19.5

Global activity
Activities hours
General comments about the planning: The planning is indicative and may vary throughout the teaching period depending on teaching needs, holidays, or any other unforeseen cause. The weekly planning of the course can be found on the Virtual Campus platform (Moodle). The in-person activities are organized in three 1.5-hour classes per week. The specific classes used to cover the 6 credits (60 in-person hours) will be announced on the Virtual Campus in due course.
10. Bibliography and Sources
Author(s) Title Book/Journal City Publishing house ISBN Year Description Link Library catalogue
 
 
Python manuals.  
Aurélien Géron Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition Book O'Reilly Media, Inc. 9781492032649 2019 https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/  
García, Salvador, Luengo, Julián, Herrera, Francisco Data Preprocessing in Data Mining Springer 978-3-319-10246-7 2015 Library record
Joel Grus Data Science from Scratch: First Principles with Python Book O'Reilly UK Ltd 978-1492041139 2019  
José Hernández Orallo, M.José Ramírez Quintana, Cèsar Ferri Ramírez INTRODUCCIÓN A LA MINERÍA DE DATOS Pearson 84 205 4091 9 2004  
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar Introduction to Data Mining Addison-Wesley Longman Publishing Co 0321321367 2005  
Witten, Frank & Hall Data Mining: Practical Machine Learning Tools and Techniques Morgan Kaufmann 978-0-12-374856-0 2011 Library record
Xindong Wu, Vipin Kumar The Top Ten Algorithms in Data Mining Chapman and Hall/CRC 9781420089646 2009  


