


Data Science is the emerging interdisciplinary field that lies at the intersection of computer science, statistics, visualization and the social sciences. Scientific and economic progress is increasingly powered by our capabilities to explore big data sets. Data scientists dig for value in data by analyzing, for instance, texts, application usage logs, and sensor data. They are the driving force behind the successful innovation of Internet companies like Google, Twitter, and Yahoo.

Job advertisements show an increasing need for data scientists and big data engineers. The need for data scientists and big data analysts is apparent in almost every aspect of our society, including computer science, medicine, physics, and the humanities. The goal of the course Data Science is to teach several data science skills needed in various phases of data analysis projects. The course concept is geared towards self study in an assignment- and project-driven manner, i.e., it is designed to offer a rich environment for flexible, effective, and efficient self study with ample guidance and supervision. The course is assessed with a project that takes about half of the course. There are several projects offered from which the student can choose. A project is composed of a real-world data set and a challenge, i.e., what knowledge can potentially be extracted from the data or what the project owner wants to do with the data. The data science skills are offered as technical topics from which the student has to choose two.

Each topic consists of one lecture and a practical for learning the basic skills. The projects indicate which technical topics provide the necessary skills for doing the project, so the choice of project and technical topics should be coherent. The practical and project are done in pairs. Supervision is provided during practical sessions twice per week, shared with all topics and projects. The project is assessed by the project owner and a topic teacher. The project grade is the grade for the course. The list of projects and topics will be revised every year. Projects come from a variety of domains: health, logistics, business intelligence, transport, security, social media, etc.

Topic DPV: Data Preparation and Visualization. The skills for Data Preparation and Data Visualization taught are in essence drawn from technologies developed for Business Intelligence. They are, however, also effective for data science. The topic teaches (a) data warehousing techniques for extracting and transforming data (ETL), (b) modeling data for analytic purposes using the multidimensional modeling approach of OLAP, and (c) data visualization techniques.

Data mining is about discovering patterns in large data sets involving methods from artificial intelligence, machine learning, statistics, and database systems. The topic teaches (a) classification, (b) clustering, and (c) association rule mining.

There exist several data exchange and knowledge representation standards. This topic teaches the most important standards and the skills to manipulate data in these standards: (a) XML and its associated standards SQL/XML, XPath and XQuery for publishing and manipulation with both relational and XML databases, (b) JSON storage and manipulation in relational databases, and (c) the Semantic Web standard RDF with its associated standard SPARQL for remote querying, also known as "Linked Open Data".

Topic IENLP: Information Extraction Using Natural Language Processing. Most information is available in a form rather unsuitable for processing by computers, namely natural language text. This topic teaches (a) text mining (analysing text directly), (b) rule-based techniques for information extraction, and (c) statistical techniques for information extraction and natural language processing. This topic is preferably done in combination with "Data Mining".

Topic PDBDQ: Probabilistic Databases and Data Quality. Much effort in data preparation is devoted to dealing with data quality problems. Probabilistic database technology has the potential of representing data quality problems as uncertainty in the data, and of storing and querying it. The topic teaches the most important skills for (a) using probabilistic database technology, and (b) representing several kinds of data quality problems as uncertainty in the data.

Topic TS: Feature Extraction from Time Series data. Sensors and other measurements increasingly produce massive amounts of data with space and time dimensions. The analysis of spatio-temporal data has many applications. The topic focuses on key techniques for preparing time series data for analysis, such as peak detection, filtering, Fourier analysis (FFT), dynamic time warping (DTW), and prediction models.
