DECISION-MAKING STATISTICAL METHODS
Learning outcomes of the course unit
Part one is a thorough, advanced revision of the topics presented in the basic Statistics course. We introduce new classes of tests to deal with more complex and realistic settings. Part two is a not-too-quick look at today's data analysis and management techniques that belong to statistics and exploratory data analysis.
Prerequisites: Statistica, Analisi AB, Analisi C.
Course contents summary
PART ONE: BASIC MULTIVARIATE TOOLS.
Revision of random variables and statistical inference.
Classical Z, T and F tests for comparing the parameters of two normal populations.
Goodness-of-fit and independence tests (Fisher-Irwin, chi-square, contingency tables).
Regression: estimation of coefficients (simple and multiple linear models, linearization; coefficient of determination, analysis of residuals, weighted least squares); inference on coefficients (T and F tests).
Analysis of variance (one-way, two-way and with interactions).
PART TWO: EXPLORATORY DATA ANALYSIS.
Graphical representation of very large and/or high-dimensional data sets (multivariate Gaussian distribution, correlation matrix, eigenvalues and eigenvectors).
Model fitting (kernel functions, chi-squared test, Kolmogorov-Smirnov test).
Cluster analysis (distances; hierarchical tree clustering, linkage; k-means algorithms; EM algorithms, mixtures of measures, Bayesian classification).
Factor analysis (principal component analysis, common factor analysis, variable reduction, factor interpretation, factor rotation).
Discriminant function analysis (Fisher's linear methods, variable reduction).
Neural networks (multilayer perceptron).
Overfitting and overlearning: when the model fits the sample rather than the population.
Non-parametric tests (sign test, signed-rank test, Wilcoxon's test, tests for the independence of the sample).
Bayesian parametric tests (overview).
S. Ross - Introduction to Probability and Statistics for Engineers and Scientists.
Hand, Mannila, Smyth - Principles of Data Mining.
Theory lessons are supported by computer-based practice sessions on using a spreadsheet to solve statistical problems.
The exam is in two parts.
The first part, at the computer, consists of a) multiple-choice questions on the fundamental concepts of statistics; and b) some problems on the first half of the course, to be solved with MS Excel.
The second part is a written examination, in the form of an essay, on how the advanced techniques (from both the first and second parts of the course) are used on given datasets.