Statistical Learning and Data Mining

Prof. Alfonso Iodice D’Enza

Contact information: iodicede@unicas.it

Term: Second Semester

Credits (ECTS): 6

Prerequisites: basic statistics knowledge

Language of Instruction: English

Class hours: 42



LEARNING OBJECTIVES:

Cognitive / Knowledge skills

  • Develop an understanding of the statistical learning framework, with general concepts for model building, selection and evaluation.
  • Understand the trade-offs related to the aim of the analysis and to the nature and amount of the available data.
  • Study the theoretical foundation of the basic (linear) methods for regression and classification.
  • Study the computational approaches that support the effective application of the studied methods.

Analytical / Critical Thinking Skills

  • Learn the basic programming skills to implement linear methods for regression and classification.
  • Interpret the results and identify the most effective way to analyze the available data.
  • Learn to present the results rigorously while still allowing a non-technical audience to understand the main findings of an analysis.

 

COURSE DESCRIPTION:

The first part of the course covers general concepts applicable to both regression and classification problems: the definition of statistical learning, training and test errors, and the trade-offs in choosing the right model. Linear models for regression are then described: from simple to multiple regression, qualitative predictors, interactions and common issues in the application of such models. Afterwards, classification problems are described, from linear methods, e.g. logistic regression and linear discriminant analysis, to nonlinear ones, e.g. quadratic discriminant analysis.
The last part of the course is an introduction to model selection and regularization and to resampling methods for estimating the test error (cross-validation) and for assessing the accuracy of an estimator (bootstrap). All the methods will be implemented and applied in R.
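
As a rough illustration of the two resampling ideas mentioned above, the sketch below estimates the test error of a simple linear regression by 5-fold cross-validation and the standard error of its slope by the bootstrap. It is only a sketch under stated assumptions: the course itself uses R and Python appears here purely for illustration; the simulated data, the function names (fit_line, kfold_mse, bootstrap_se_slope) and the settings (5 folds, 200 resamples) are hypothetical and not course material.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_mse(xs, ys, k, rng):
    """Estimate the test MSE by k-fold cross-validation."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    # Split the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return statistics.fmean(sq_errors)


def bootstrap_se_slope(xs, ys, n_boot, rng):
    """Bootstrap standard error of the slope.

    Each resample draws n observations with replacement and refits the line.
    Assumes a resample never consists of a single repeated x value.
    """
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        _, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        slopes.append(b)
    return statistics.stdev(slopes)


# Simulated data (an assumption for illustration): y = 1 + 2x + Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_mse = kfold_mse(xs, ys, k=5, rng=rng)
se_slope = bootstrap_se_slope(xs, ys, n_boot=200, rng=rng)
print(f"5-fold CV estimate of test MSE: {cv_mse:.3f}")
print(f"Bootstrap SE of the slope:      {se_slope:.3f}")
```

With noise of standard deviation 0.5, the cross-validated MSE should land near the noise variance of 0.25; the exact figures depend on the random seed.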

 

INSTRUCTIONAL FORMAT:

The class will meet for 2 hours (including the break), twice a week, for a total of 21 sessions. After an introduction aimed at providing the needed background, participants are required to do both conceptual and applied homework. Classes will consist of lectures by the instructor until a topic is completely covered. After each topic, a class will be devoted to the participants' presentations of the assigned homework.

 

TENTATIVE COURSE SCHEDULE:

Week 1 (Textbook, chapters 1 and 2)
Introduction to the course
Presentation of the available materials
Clear statement of mutual expectations
Regression and classification problems
Trade-offs in statistical learning
Parametric and nonparametric methods
Supervised vs. unsupervised learning

Week 2 (Textbook, chapters 1 and 2)
Introduction to the R language
Measuring errors in regression and classification models

Week 3 (Textbook, chapter 3)
Introduction to linear models for regression
Model fit and inference
Variance estimator

Week 4 (Textbook, chapter 3)
Confidence and prediction intervals
Algebraic formalization of multiple regression
Global test and block-based test
Qualitative predictors and interaction effects

Week 5 (Textbook, chapter 3)
Polynomial regression
Violations of model assumptions
Correlated errors
Heteroscedasticity
Multicollinearity

Week 6 (Textbook, chapter 3)
Practical examples of linear regression in R
Implementation and interpretation
Model diagnostics

Week 7 (Textbook, chapter 4)
Classification methods
Logistic regression
Link function and model fit
Linear discriminant analysis
Bayes' rule and differences from the logistic approach

Week 8 (Textbook, chapter 4)
Multiple LDA and Logit
Class-specific errors
ROC curve
Quadratic discriminant analysis
Comparison of classification methods

Week 9 (Textbook, chapter 4)
Practical examples of classification in R
Implementation and interpretation
Model diagnostics

Week 10 (Textbook, chapters 5 and 6)
Resampling methods
Validation approaches
Bootstrap
Model selection

Week 11 (Textbook, chapter 6)
Shrinkage methods
Ridge regression
Lasso regression
Model selection via regularization

 

WORKLOAD EXPECTATIONS:

All students are expected to spend at least 2.5 hours on academic study outside of class for each hour of class time.

 

FORMS OF ASSESSMENT:

The instructor will use several different forms of assessment to calculate the final grade you receive for this course. For the record, these are listed and weighted below. The content, criteria and specific requirements for each assessment category will be explained in greater detail in class. Any questions about the requirements should be discussed directly with your instructor well in advance of the due date for each assignment.

 

FORM OF ASSESSMENT            VALUE
Class participation           10%
Homework                      30%
Homework discussion           15%
Final project                 30%
Final project presentation    15%

 

ASSESSMENT OVERVIEW:

Class participation: This grade reflects your participation in class discussions and your ability to grasp the rationale of the subjects presented in class.

Homework: Each homework is due one week after it is assigned. Different homework is assigned to different participants, covering both conceptual and applied aspects of the topic. The applied part of the homework will require R implementations of the methods.

Homework discussion: The homework is presented in class, and each participant is required to present the problem and the corresponding findings to the instructor as well as to the classmates.

Final project: In the final project the participant will carry out a thorough analysis of a real data set, from the data pre-processing phase through the analysis to the presentation and interpretation of the results.

Final project presentation: The final project will be presented in a one-day workshop in the presence of the other participants, PhDs and other scholars interested in the topic.

 

CLASS/INSTRUCTOR POLICIES:

Professionalism and communications: As a student, you are expected to maintain a professional, respectful and conscientious manner in the classroom with your instructors and peers.
You are expected to take your academic work seriously and engage actively in your classes. Advance preparation, completed assignments and a focused, respectful attitude are expected of all students. Simply showing up for class or meeting the minimum outlined criteria will not earn you a good grade in this course. Communicating appropriately, properly addressing your faculty and staff, asking questions and expressing your views respectfully demonstrate your professionalism and cultural sensitivity.

Attendance and Classroom behavior: Although attendance is not compulsory, it is highly recommended. All students must have a respectful attitude towards the professor as well as the classmates.

Arriving late / departing early from class: Once they have decided to attend, students must behave consistently. Arriving late or leaving class early is disruptive and shows a lack of respect for the instructor and fellow students.

Make-up classes: The instructor reserves the right to schedule make-up classes in the event of an unforeseen or unavoidable schedule change. Make-up classes may be scheduled outside of typical class hours, as necessary. 

Missing Examinations: Examinations will not be rescheduled. Pre-arranged travel or anticipated absence does not constitute an emergency and requests for missing or rescheduling exams will not be granted.

Use of Cell Phones, Laptops and Other Electronic Devices: Always check with your instructor about acceptable usage of electronic devices in class. Inappropriate usage of your electronic devices will result in a warning and may lead to a deduction in participation grades. Use of a cell phone for phone calls, text messages, emails, or any other purposes during class is impolite, inappropriate and prohibited. The instructor determines whether laptops will be allowed in class.

 

REQUIRED READINGS:

Listed below are the required course textbooks and additional readings. These are required materials for the course and you are expected to have constant access to them from the very beginning of the course for reading, highlighting and note-taking. It is required that you have unrestricted access to each. Access to additional sources required for certain class sessions may be provided in paper or electronic format consistent with applicable copyright legislation.

Required text: An Introduction to Statistical Learning, with Applications in R. James, G., Witten, D., Hastie, T. and Tibshirani, R. Springer, 2013.

Online Reference & Research Tools:

CRAN (https://cran.r-project.org): download site for the R installer. R is free and available for Windows, macOS and Linux.

[Last modified: Wednesday, 13 September 2017]