Course details
Data Storage and Preparation
UPA Acad. year 2025/2026 Winter semester 5 credits
The course focuses on modern database systems as typical data sources for knowledge discovery, and on the preparation of data for knowledge discovery. It covers extended relational (object-relational, with support for XML and JSON documents), spatial, and NoSQL database systems. For each, the corresponding database model, the way of working with data, and selected indexing methods are explained. In the context of the knowledge discovery process, attention is paid to descriptive characteristics of data and to visualization techniques used for data understanding. Approaches to typical data pre-processing tasks for knowledge discovery, such as data cleaning, integration, transformation, and reduction, are also explained. Finally, approaches to information extraction from the web and several real case studies are presented.
Guarantor
Course coordinator
Language of instruction
Completion
Time span
- 26 hrs lectures
- 6 hrs seminar
- 6 hrs PC labs
- 14 hrs projects
Assessment points
- 56 pts final exam (written part)
- 20 pts mid-term test (written part)
- 24 pts projects
Department
Lecturer
Burgetová Ivana, Ing., Ph.D. (DIFS)
Kolář Dušan, doc. Dr. Ing. (DIFS)
Rychlý Marek, RNDr., Ph.D. (DIFS)
Instructor
Burgetová Ivana, Ing., Ph.D. (DIFS)
Rychlý Marek, RNDr., Ph.D. (DIFS)
Learning objectives
The aim of the course is to explain the historical development of database technologies, the motivation for knowledge discovery from data, and the basic steps of the knowledge discovery process; to explain the essence, properties, and use of extended relational and NoSQL databases as data sources for knowledge discovery; and to explain the approaches and methods used for data understanding and data pre-processing for knowledge discovery.
Students will be able to store and manipulate data in suitable database systems, and to explore data and prepare it for modelling within the knowledge discovery process.
- Students are better able to work with data in various situations.
- Students improve at solving small projects in a small team.
Prerequisite knowledge and skills
- Fundamentals of relational databases and SQL.
- Object-oriented paradigm.
- Fundamentals of XML.
- Fundamentals of computational geometry.
- Fundamentals of statistics and probability.
Study literature
- Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3
Fundamental literature
- Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press. 2018, 780 p.
- Kim, W. (ed.): Modern Database Systems, ACM Press, 1995, ISBN 0-201-59098-0
- Melton, J.: Advanced SQL:1999 - Understanding Object-Relational and Other Advanced Features. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7
- Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1
- Skiena, S.S.: The Data Science Design Manual. Springer, 2017, 445 p., ISBN 978-3-319-55443-3
- Shekhar, S., Chawla, S.: Spatial Databases: A Tour, Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7
- Gaede, V., Günther, O.: Multidimensional Access Methods, ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 170-231.
Syllabus of lectures
- Introduction, object-oriented approach in databases.
- NoSQL databases I - introduction to NoSQL, CAP theorem and BASE, key-value databases, data partitioning and distribution.
- NoSQL databases II - data models in NoSQL databases (column, document, and graph databases), querying and data aggregation, NewSQL databases.
- Data preparation - data understanding: descriptive characteristics, visualization techniques, correlation analysis.
- Data preparation - data pre-processing I: data cleaning and integration.
- Data preparation - data pre-processing II: data reduction, imbalanced data, data transformation, other data pre-processing tasks.
- Midterm exam.
- Web scraping.
- Semantic web and linked data.
- Languages and systems for knowledge discovery, real case studies.
- Support for working with XML and JSON documents in databases.
- Spatial databases.
- Indexing of multidimensional data.
Syllabus of seminars
- Objects and documents in databases
- NoSQL databases
- Knowledge discovery from data - data preprocessing
Syllabus of computer exercises
- Objects and documents in databases
- NoSQL databases
- Knowledge discovery from data - data preprocessing
Syllabus - others, projects and individual work of students
Creating an application for processing large structured and unstructured data, which includes, among other things, obtaining and retrieving data, preparing it for further use (e.g., knowledge discovery in databases), and creating descriptive characteristics for selected data.
Progress assessment
- Mid-term written exam; there is no resit; excused absences are handled by the guarantor's deputy.
- Implementation and submission of the project results by the prescribed deadlines; excused absences are handled by the assistant.
- A student must earn at least 20 points from activities during the semester (everything except the final exam) to receive the credit and thus be admitted to the final exam.
- Final exam; the minimum number of points that must be obtained from the final exam is 20 (otherwise, no points will be assigned to the student); excused absences are handled by the guarantor's deputy.
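The point rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not an official calculator: the function and constant names are hypothetical, and it assumes that the overall result is the plain sum of mid-term, project, and final exam points. The overall pass mark is not stated here, so it is not modelled.

```python
from typing import Optional

# Thresholds taken from the rules above; the names are hypothetical.
MIN_SEMESTER_POINTS = 20  # mid-term + projects needed for the credit
MIN_EXAM_POINTS = 20      # below this, the final exam counts as 0 points


def has_credit(midterm: int, projects: int) -> bool:
    """Credit (and admission to the final exam) needs at least 20 semester points."""
    return midterm + projects >= MIN_SEMESTER_POINTS


def total_points(midterm: int, projects: int, exam: Optional[int] = None) -> int:
    """Sum the points, assuming the overall result is a plain sum (an assumption)."""
    semester = midterm + projects
    if exam is None or not has_credit(midterm, projects):
        # Without the credit the student cannot sit the final exam.
        return semester
    # An exam result under the minimum is assigned no points.
    counted_exam = exam if exam >= MIN_EXAM_POINTS else 0
    return semester + counted_exam
```

For example, under these assumptions a student with 15 mid-term points, 20 project points, and 19 exam points would end up with 35 points, because the exam result falls below the 20-point minimum.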
Course inclusion in study plans