BIG DATA ENGINEERING FOR ANALYTICS
2015/2016, Semester 2
Non-Faculty-Based Departments (Institute Of Systems Science)
Modular Credits: 4
Understand the growth of big data and the need for scalable processing frameworks. Understand the fundamental characteristics of big data, its storage and analysis techniques, and the relevant distributions.
Gain expertise with fault-tolerant big data computing frameworks (e.g. HDFS, MapReduce and YARN) by setting up Hadoop in both single-node and cluster configurations for processing big data. Craft configurable, executable parallel jobs on top of distributed and shared memory architectures (e.g. HDFS, the Hadoop Distributed File System).
Perform data manipulation and querying (including updates, transactions, and indexes) for high-volume big data applications using NoSQL. Organize and store the collected data, and manipulate it by crafting queries, for example using Hive and other related data tools.
Understand the fundamentals of big data query manipulation, the various data storage options and the types of aggregated data modeling. Choose an appropriate storage model based on the application requirements.
Understand the essentials of distributed computing, the storage needs, and the relevant architectural mechanisms for processing large amounts of structured and unstructured data.
Understand various machine learning techniques in a big data context and how to implement them using the Apache Hadoop ecosystem. Perform collaborative filtering, clustering and categorization.
There are no hard prerequisites in terms of existing courses, but it would be desirable for students to have some familiarity with distributed computing, business intelligence and business analytics.
Lectures, discussions, case studies, workshops and projects.
This course equips students with the in-depth data engineering and data analytics skills required to engineer big data solutions to real-world business problems. The first half of the course delivers in-depth knowledge of the engineering aspects involved in the storage, processing and visualization of big data sets. It examines state-of-the-art distributed architectures and platforms (both cloud-hosted and traditional) and their programming frameworks and libraries. The second half of the course focuses on the data analytics techniques, technologies and tools that combine with these architectures and frameworks to solve real-world big data problems.
Workload Components: A-B-C-D-E
A: no. of lecture hours per week
B: no. of tutorial hours per week
C: no. of lab hours per week
D: no. of hours for projects, assignments, fieldwork, etc. per week
E: no. of hours for preparatory work by a student per week