Databases and exposure scenarios

Academic year 2018/2019
6
Maximum credits
48
Total hours
SSD
INF/01 SECS-S/01
Language
English
Learning objectives
Not defined
Expected learning outcomes
Not defined
Single course

This course cannot be taken as a single course. You can find the available courses in the single-course catalogue.

Course syllabus and organization

Single edition

Course coordinator
Period
First semester

Prerequisites
Regarding Informatics and Databases:
no prerequisites. The exam is written (approximately 1 hour and 30 minutes), covers all the topics presented in the lectures, and consists of multiple-choice questions and exercises. It aims to verify that the course objectives have been achieved, namely that students have learned the basic concepts of the relational data model and of relational query languages, including their application to biological databases.
Regarding Statistics applied to epidemiology:
basic concepts of mathematics and logic are needed for this module. The examination consists of a written test (approximately 1 hour and 30 minutes) with questions and problems on all the topics covered in the course. Students may consult their own material during the examination, and a portable calculator is recommended (devices connected to the internet are prohibited).
Informatics and Database
Syllabus
Introduction to databases. Information systems, information and data. Database and Database Management System (DBMS). Data models. Schemas and instances. Abstraction levels in DBMSs. Database languages and users.
Relational databases. The relational model. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Definitions and properties of keys. Primary key and foreign key constraints.
Query languages for relational databases. Relational algebra. Union, intersection, difference, selection, projection, join. Queries in relational algebra.
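As an illustration of the relational algebra operators listed above, here is a minimal Python sketch. It is not taken from the course material: the relations `genes` and `transcripts`, their attributes, and the helper functions `select`, `project` and `natural_join` are all hypothetical, and relations are simply modelled as lists of dictionaries.

```python
# Relations modelled as lists of dicts (one dict per tuple).
# All data and names here are invented for illustration.
genes = [
    {"gene_id": 1, "symbol": "BRCA1", "chrom": "17"},
    {"gene_id": 2, "symbol": "TP53", "chrom": "17"},
    {"gene_id": 3, "symbol": "CFTR", "chrom": "7"},
]
transcripts = [
    {"transcript_id": 10, "gene_id": 1},
    {"transcript_id": 11, "gene_id": 1},
    {"transcript_id": 12, "gene_id": 3},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the given attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[a] == right_row[a] for a in shared):
                result.append({**left_row, **right_row})
    return result


# Query: symbols of the genes on chromosome 17 that have at least one transcript.
on_chr17 = select(genes, lambda row: row["chrom"] == "17")
answer = project(natural_join(on_chr17, transcripts), ["symbol"])
print(answer)
```

The projection removes the duplicate produced by the two transcripts of the same gene, which reflects the set semantics of relational algebra.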
Query languages for relational databases. SQL. The declarative nature of SQL. Simple SQL queries. Aggregate queries. Group by queries. Set queries. Nested queries.
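The kinds of SQL queries named above can be tried out with Python's built-in `sqlite3` module. The sketch below is only an assumption-laden example: the tables `gene` and `transcript` and their contents are invented and do not correspond to any schema used in the course.

```python
import sqlite3

# In-memory database with two small, invented tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL,
    chrom   TEXT NOT NULL
);
CREATE TABLE transcript (
    transcript_id INTEGER PRIMARY KEY,
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id)
);
INSERT INTO gene VALUES (1, 'BRCA1', '17'), (2, 'TP53', '17'), (3, 'CFTR', '7');
INSERT INTO transcript VALUES (10, 1), (11, 1), (12, 3);
""")

# Group by query with an aggregate: number of transcripts per gene.
per_gene = conn.execute("""
    SELECT g.symbol, COUNT(t.transcript_id) AS n_transcripts
    FROM gene AS g
    LEFT JOIN transcript AS t ON t.gene_id = g.gene_id
    GROUP BY g.gene_id, g.symbol
    ORDER BY g.symbol
""").fetchall()

# Nested query: genes that have no transcript at all.
without_transcripts = conn.execute("""
    SELECT symbol
    FROM gene
    WHERE gene_id NOT IN (SELECT gene_id FROM transcript)
""").fetchall()

print(per_gene)
print(without_transcripts)
```

Both queries state what result is wanted rather than how to compute it, which is the sense in which SQL is declarative.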
Conceptual database modeling. Data conceptualization and aggregation mechanisms. The Entity-relationship model. Basic constructs of the model: entity, relationship, attribute. Generalization hierarchies. Identifiers. Simple mapping rules from ER to relational tables.
Biological databases. Direct access to relational biological databases. The Ensembl database and its structure (db schema). Use of SQL to query biological data. Application to the Ensembl database for the extraction of genomic annotations.
Reference material
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures - McGraw-Hill. Available online at http://dbbook.dia.uniroma3.it/
Chapters: 1 (whole), 2 (whole), 3 (up to and including §3.1.6), 4 (only §4.2 and its subsections), 5 (only §5.2 and its subsections)
- Teaching material (lecture slides) downloadable from the course website (enrolled students only).
Statistics applied to Epidemiology
Syllabus
Main criteria for the evaluation of scientific studies. Definitions: descriptive statistics, inferential statistics. Collecting data sets: populations and samples. Frequency tables, line graphs, bar graphs, frequency polygons, relative frequency graphs, pie charts, grouped data and histograms, the problem of bin size selection. Sample mean, geometric sample mean, sample mode, sample deviations, sample absolute deviations, mean absolute deviation, sample variance, alternative expression for the sample variance, sample standard deviation. Accuracy and precision.
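A short Python sketch of the descriptive statistics listed above, using an invented sample. It assumes the usual convention of dividing by n - 1 for the sample variance; the course may present the formulas differently.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # invented sample
n = len(data)

mean = statistics.mean(data)
mode = statistics.mode(data)
geometric_mean = statistics.geometric_mean(data)

# Sample variance from its definition: squared deviations divided by n - 1.
variance_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Alternative expression: (sum of squares - n * mean^2) / (n - 1).
variance_alternative = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

standard_deviation = math.sqrt(variance_definition)

# Mean absolute deviation: average of the absolute deviations from the mean.
mean_absolute_deviation = sum(abs(x - mean) for x in data) / n

print(mean, mode, variance_definition, variance_alternative,
      standard_deviation, mean_absolute_deviation)
```

The two variance expressions agree; the alternative one is convenient for hand calculation because it needs only the sum and the sum of squares of the data.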
Sets of paired data, scatter diagram, best fitting line 'by eye'. Least squares regression line (vertical offsets, horizontal offsets): slope (linear regression coefficient), intercept, centre of the distribution. Qualitative and quantitative evaluation of linear regression. The correlation coefficient: definition, sign convention, covariance, range, alternative expressions. The coefficient of determination: definition, range, significance (for linear regression), geometric interpretation (proportion of variation explained by the linear regression). Interpreting correlation: Evans' guide (1996). Hints of non-linear regression (exponential, logarithmic, trigonometric, power, etc.). Odds ratio.
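The least squares line (vertical offsets), the correlation coefficient and the coefficient of determination can be computed directly from their definitions. The following is a minimal sketch on invented paired data; the variable names are illustrative only.

```python
import math

# Invented paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares regression line y = intercept + slope * x (vertical offsets).
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Correlation coefficient (between -1 and 1) and coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(slope, intercept, r, r_squared)
```

By construction the fitted line passes through the point (x_bar, y_bar), and r has the same sign as the slope.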
Necessity and sufficiency in logic. 'Correlation does not imply causation'. Spurious relationships. True positives, true negatives, false positives, false negatives. Graphical representation. Sensitivity, specificity.
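Sensitivity and specificity follow directly from the four counts of a diagnostic test. The counts in this small sketch are invented and the variable names are only illustrative.

```python
# Invented results of a diagnostic test on 1000 people.
true_positives = 90    # diseased, test positive
false_negatives = 10   # diseased, test negative
false_positives = 30   # healthy, test positive
true_negatives = 870   # healthy, test negative

# Sensitivity: fraction of diseased people correctly identified by the test.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: fraction of healthy people correctly identified by the test.
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)
```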
Probability. Experiment and outcomes, sample space, events, union, intersection, Venn diagram, null event, disjoint events, complement event, extension to more than two events. Properties of probability (for disjoint and non-disjoint events), experiments having equally likely outcomes, conditional probability and independence, Bayes' theorem, the Monty Hall problem.
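As a worked example of Bayes' theorem, the sketch below computes the probability of disease given a positive test. The prevalence, sensitivity and specificity values are invented, and the function name `posterior` is hypothetical; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) computed with Bayes' theorem."""
    # P(positive and disease) = P(positive | disease) * P(disease)
    true_positive = sensitivity * prior
    # P(positive and no disease) = P(positive | no disease) * P(no disease)
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Invented values: 1% prevalence, 90% sensitivity, 95% specificity.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(19, 20))
print(p, float(p))
```

With a rare condition, most positive results come from the large healthy group, so the posterior probability stays well below the sensitivity of the test.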
Discrete random variables: probability distribution, expected value, properties of the expected value, variance, alternative expression for the variance, properties of the variance, standard deviation. Continuous random variables, probability density function. Normal continuous random variables, normal probability density function (Gaussian distribution). Standard normal continuous random variables, standard normal probability density function. Properties of the density functions, approximation rule, standardizing normally distributed random variables.
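Standardizing a normally distributed random variable can be illustrated with `statistics.NormalDist` from the Python standard library. The mean and standard deviation below are invented, and the sketch assumes that the "approximation rule" refers to the familiar 68-95-99.7 rule.

```python
from statistics import NormalDist

# Invented example: a normal variable with mean 170 and standard deviation 10.
height = NormalDist(mu=170.0, sigma=10.0)
standard = NormalDist()  # standard normal: mean 0, standard deviation 1

# Standardizing: z = (x - mean) / standard deviation.
x = 185.0
z = (x - height.mean) / height.stdev

# P(X <= x) computed directly and through the standard normal variable.
p_direct = height.cdf(x)
p_standardized = standard.cdf(z)

# Probability of falling within 1, 2 and 3 standard deviations of the mean.
within = [standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)]

print(z, p_direct, p_standardized, within)
```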
Population and sample: population mean and population variance, sample mean, expected value of the sample mean, variance of the sample mean, standard deviation of the sample mean, central limit theorem. Applications to measurement errors and to biological data sets.
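The expected value and variance of the sample mean can be checked exactly on a tiny population by enumerating every possible sample. This sketch assumes independent draws with replacement from an invented population (the faces of a fair die).

```python
from fractions import Fraction
from itertools import product

# Invented population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
size = len(population)

# Population mean and population variance.
mu = Fraction(sum(population), size)
sigma_squared = sum((Fraction(v) - mu) ** 2 for v in population) / size

# Enumerate all equally likely samples of n independent draws.
n = 3
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Expected value and variance of the sample mean over all samples.
expected_mean = sum(sample_means) / len(sample_means)
variance_of_mean = sum((m - expected_mean) ** 2 for m in sample_means) / len(sample_means)

print(expected_mean, variance_of_mean)
```

The enumeration reproduces the standard results: the expected value of the sample mean equals the population mean, and its variance equals the population variance divided by n.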
Confidence intervals. P-value, null hypothesis, hypothesis testing and statistical significance, statistical power, sample size issues, dependence of the statistical power on the sample size, rules for the determination of the sample size.
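A confidence interval for a population mean, and a simple sample size rule, can be sketched as follows. The example assumes a normal population with known standard deviation, which may differ from the cases treated in the course; the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the mean, assuming a known standard deviation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def required_sample_size(sigma, half_width, level=0.95):
    """Smallest n whose interval half-width does not exceed the target."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / half_width) ** 2)


# Invented example: sample mean 120, known sigma 15, sample of 36 observations.
low, high = mean_confidence_interval(120.0, 15.0, 36)
n_needed = required_sample_size(15.0, 2.0)
print(low, high, n_needed)
```

Because the half-width shrinks with the square root of n, halving the width of the interval requires roughly four times as many observations.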
Types of epidemiological studies. Observational studies: ecological, cross-sectional, case-control, cohort. Experimental studies: randomized controlled trials, field trials, community trials. Potential errors in epidemiological studies.
Reference material
- Introductory Statistics - Sheldon M. Ross - Elsevier AP (Third Edition)
- Basic epidemiology - R. Bonita, R. Beaglehole, T. Kjellström - World Health Organization (2nd edition)
Modules or teaching units
Informatics and Database
INF/01 - INFORMATICS
SECS-S/01 - STATISTICS
Lectures: 24 hours
Lecturer: Castano Silvana

Statistics applied to Epidemiology
INF/01 - INFORMATICS
SECS-S/01 - STATISTICS
Lectures: 24 hours
Lecturer: Di Domizio Alessandro

Professor(s)
Office hours:
By appointment via email
Online or Via Celoria 18 - Room 7012