General information


Subject type: Optional

Coordinator: Vladimir Bellavista Parent

Trimester: Second term

Credits: 6

Teaching staff: Sandra Obiol Madrid

Academic year: 2025

Year of study: 4

Languages of instruction


  • Catalan

The subject will be taught in Catalan. Students may address the teacher in whichever language they are most comfortable with. Some content, slides and bibliography will be in English.

Competencies / Learning Outcomes


Basic skills
  • B2_Students can apply their knowledge to their work or vocation in a professional manner and have the competencies usually demonstrated by developing and defending arguments and solving problems within their field of study

  • B3_Students can gather and interpret relevant data (usually within their field of study) to make judgments that include reflection on relevant social, scientific or ethical issues

  • B4_Students can communicate information, ideas, problems and solutions to both specialist and non-specialist audiences

  • B5_Students have developed the learning skills needed to undertake further study with a high degree of autonomy

Presentation of the subject


This course introduces the basic methods of Classification (supervised learning) and Clustering (unsupervised learning) in the context of Big Data. With the teacher's help, students will work through a case study for each learning method. Students will then develop a project that consists of analyzing data with the tools covered in the course and explaining the information they extract from it. The project must be presented orally in class.

Contents


PART I

1 History of data science. From Business Intelligence to Big Data

2 Data quality and visualization. Reports and dashboards

3 Classification

3.1 GLM

3.2 Trees

3.3 Other methods

PART II

4 Clustering methods

4.1 Distance measurements

4.2 K-means

4.3 Hierarchical clustering

4.4 Gaussian Mixture Models

4.5 OPTICS

5 Association Rules

6 Text analysis

7 Recommender systems and reinforcement learning

8 Model evaluation

9 Project

Activities and evaluation system


The final grade will be calculated as the weighted average of the different activities:

  • 20% Classification Test (Exam)
  • 20% Clustering Test (Exam)
  • 45% Final project (with oral presentation)
  • 15% Participation in practical classes.
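
As an illustration of the weighted average, the following minimal Python sketch combines the four weights listed above. The key names, the example scores and the 0-10 grading scale are assumptions made for the example; they are not specified in this guide.

```python
# Weights taken from the list above; together they sum to 1.
WEIGHTS = {
    "first_test": 0.20,       # the first 20% exam in the list
    "clustering_test": 0.20,  # the clustering exam
    "final_project": 0.45,    # project with oral presentation
    "participation": 0.15,    # participation in practical classes
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of the four activity scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four activities")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 0-10 scale.
example = {
    "first_test": 6.0,
    "clustering_test": 7.0,
    "final_project": 8.0,
    "participation": 9.0,
}
print(round(final_grade(example), 2))
```

With these hypothetical scores the result is 0.20·6 + 0.20·7 + 0.45·8 + 0.15·9 = 7.55.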

Students will be assessed in the subject only if their attendance exceeds 80%.

Resit

The final project component can be retaken.

Rules for carrying out the activities

For each activity, the teaching staff will communicate the specific rules and conditions that apply. Individual activities presuppose the student's commitment to carry them out individually. Any activity in which a student breaks this commitment will be graded as failed for all students involved, whether they provided the work or received it. Likewise, group activities presuppose the commitment of the group's members to carry them out within the group. Any activity in which a group breaks this commitment will be graded as failed for all those involved, whether they provided the work or received it. In group activities, the teacher may, based on the information available, assign a different grade to each member of the group.

The teacher decides whether to accept submissions after the stated deadlines. If late submissions are accepted, the teacher also decides whether to apply a penalty and how large it is.

Use of Generative Artificial Intelligence

The use of generative artificial intelligence (GenAI) tools must be limited to aspects that are not fundamental to the subject. They may be used, critically, to resolve doubts about the subject, to improve the writing of deliverable documents, or to help generate auxiliary code that falls outside the scope of the subject's topics. When GenAI is used to improve writing, its contribution must be stated explicitly in the document. When it is used to generate code, the code must be labeled as “generated by GenAI”, stating the model used and the prompt supplied, even if the code has subsequently been adapted or modified. GenAI tools may not be used to generate programming code, not even fragments, when that code falls within the scope of the subject's topics or is assessed. This prohibition applies even if the code is subsequently adapted or modified. If you have any doubt about whether a given use of GenAI is legitimate, contact the subject's teacher beforehand.

Bibliography


Basic

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2017), An Introduction to Statistical Learning: with Applications in R. Springer.