General information


Subject type: Optional

Coordinator: Rosa Herrero Antón

Trimester: Second term

Credits: 6

Teaching staff: 

Sandra Obiol Madrid

Teaching languages


  • Spanish

The subject will be taught in Spanish. Students may address the teacher in whichever language they are most comfortable with. Some content, slides and bibliography will be in English.

Skills


Basic skills
  • B2_Students know how to apply their knowledge to their job or vocation in a professional way and have the skills that are usually demonstrated by developing and defending arguments and solving problems within their area of study

  • B3_Students have the ability to gather and interpret relevant data (usually within their area of study) in order to make judgments that include reflection on relevant social, scientific or ethical issues

  • B4_Students can convey information, ideas, problems and solutions to both specialized and non-specialized audiences

  • B5_Students have developed the learning skills necessary to undertake further studies with a high degree of autonomy

Description


This course introduces the basic methods of classification (supervised learning) and clustering (unsupervised learning) in the context of Big Data. Students will work through a case study for each learning method with the help of the teachers. They will complete four practical assignments and, at the end of the course, develop a project that consists of analyzing a dataset with the tools seen during the course and explaining the information they were able to extract from it. The project must be presented orally to an evaluation panel.

Contents


PART I

1 History of data science. From Business Intelligence to Big Data

2 Data quality and visualization. Reports and dashboards

3 Classification

3.1 GLM

3.2 Trees

3.3 Other methods

PART II

4 Clustering methods

4.1 Distance measurements

4.2 K-means

4.3 Hierarchical clustering

4.4 Gaussian Mixture Models

4.5 OPTICS

5 Association Rules

6 Text analysis

7 Recommender Systems and Reinforcement Learning

8 Model evaluation

9 Project

Evaluation system


The final grade will be calculated as the weighted average of the different activities:

20% Classification Test (Exam)

20% Clustering Test (Exam)

45% Final project (with oral presentation)

15% Participation in practice class.
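As an illustration of the weighted average above, here is a minimal sketch in Python. The weights come from the breakdown above; the activity names, the 0 to 10 marking scale and the example marks are assumptions made up for the illustration, not part of the official rules.

```python
# Weights taken from the evaluation breakdown; the key names are hypothetical.
WEIGHTS = {
    "first_exam": 0.20,       # first 20% exam
    "clustering_exam": 0.20,  # clustering exam
    "final_project": 0.45,    # project with oral presentation
    "participation": 0.15,    # participation in practice class
}


def final_grade(marks):
    """Return the weighted average of the activity marks."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale.
example_marks = {
    "first_exam": 7.0,
    "clustering_exam": 6.0,
    "final_project": 8.0,
    "participation": 9.0,
}

print(round(final_grade(example_marks), 2))  # prints 7.55
```

Because the project carries 45% of the weight, a one-point change in the project mark moves the final grade by 0.45 points, more than twice the effect of either exam.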

The subject will only be evaluated if the student has attended at least 80% of the classes.

Resit

The final project component can be retaken.

Rules for carrying out the activities

For each activity, teachers will inform students of the particular rules and conditions that govern it.

Individual activities presuppose the student's commitment to carry them out individually. Any activity in which a student does not honor this commitment will be graded as failed, regardless of the student's role (sender or receiver). Likewise, group activities presuppose the commitment of the group's members to carry them out within the group. Any activity in which a group has not respected this commitment will be graded as failed, regardless of the group's role (sender or receiver).

In group activities, the teacher may, based on the information available, assign a different grade to each member of the group.

Teachers may choose whether or not to accept submissions after the stated deadlines. If a late submission is accepted, the teacher decides whether to apply a penalty and how large it is.

References


Basic

James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2017), An Introduction to Statistical Learning: with Applications in R. Springer