PROGRAMMING FOR BUSINESS ANALYTICS
2018/2019, Semester 1
School of Business (Analytics & Operations)
Modular Credits: 4
Students are expected to gain a good understanding of fundamental Statistics and become proficient in using Python for Data Analytics. The module also covers a range of useful Python packages to equip students with the technical skills needed to conduct real business analytics projects.
Lecture + Tutorial. Tutorials start in Week 3.
Setup of Python environment
Python Basics I
— Data Structures: Data types, List, Tuple and Dictionary.
Python Basics II
— Conditionals and Flow Control: if-else & if-elif-else
Python Basics III
— For/while loops. Functions and Packages.
Python Basics IV
Basic Object-Oriented Programming (OOP)
Visualizing data. Histogram, Scatterplot, Boxplot, line plot of time series.
Python Package I
— Visualization with matplotlib & seaborn. Basic visualization and plotting tools.
Sampling & summarizing data with summary statistics. Mean/Median, Min/Max, Stdev/Variance, Quantile.
Python Package II:
Scientific Calculations with NumPy: NumPy multi-dimensional array, standard scientific functions such as logarithm.
Parameter estimation & confidence intervals.
Simulating data I: random number generator using NumPy.
Hypothesis testing and the power function of a test.
Simulating data II
Advanced Python I:
Organizing data with pandas. Data frame structures and data manipulation tools.
Advanced Python II:
Obtaining data from the Internet: retrieving data from APIs (JSON/XML) and scraping data from HTML pages.
Advanced Python III:
Data cleaning I: tidying data (subset, transformation).
Advanced Python IV:
Data cleaning II: merging data.
Linear Regression I:
Predictive modelling, interpretation of regression outputs (coefficient, confidence interval, p-value, R^2)
Linear regression with statsmodels I.
Linear Regression II:
Categorical variables, modeling nonlinearity (transformation, interaction).
Linear regression with statsmodels II.
Linear Regression III:
Advanced topics: model selection, missing data, outliers.
Linear regression with statsmodels III.
Analytics requires both a theoretical foundation in Statistics and the practical ability to implement it through programming. The two are typically covered in separate courses, in an isolated and uncoordinated way. This module, as an introductory course in Data Analytics, aims to bring theory and practice together and offer a holistic, connected view of both sides.
The module starts with basic Python programming. It then walks through Statistics topics, from visualizing and summarizing data, to estimating model parameters and hypothesis testing, to linear regression analysis. For each topic, Python illustrations and experiments are interwoven so that students better appreciate the statistical theory and understand how it works in practice. The module finishes with practical issues in acquiring, cleaning, and organizing data using Python. This completes the data analysis cycle, so that students are able to independently execute a basic Data Analytics project.
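As a flavour of the kind of Python illustration meant here, the following is a minimal sketch that simulates a sample, summarizes it, and estimates a confidence interval for the mean. It is not taken from the module materials. It uses only the Python standard library so that it runs without any installation, whereas the module itself works with NumPy. The population mean (50), standard deviation (10), sample size (200) and random seed are all made-up values chosen for illustration.

```python
import math
import random
import statistics

# Simulate a sample: 200 draws from a normal distribution.
# The population mean (50) and standard deviation (10) are illustrative assumptions.
random.seed(42)
sample = [random.gauss(50, 10) for _ in range(200)]

# Summarize the sample.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Approximate 95% confidence interval for the population mean,
# using the normal critical value 1.96 (a reasonable approximation for n = 200).
half_width = 1.96 * stdev / math.sqrt(len(sample))
ci_low, ci_high = mean - half_width, mean + half_width

print(f"mean = {mean:.2f}, stdev = {stdev:.2f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

Re-running the sketch with a different seed or sample size shows how the interval moves and narrows, which is the kind of experimentation that links the theory to practice.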
Basic Python Programming
Functions and Packages
Understanding Data and Visualization
Summary Statistics and Empirical Cumulative Distribution Function
Histogram, Scatterplot, Boxplot, Line plot
Python Implementation: Matplotlib, Seaborn and NumPy packages
Statistical Concepts and Inference Techniques
Sampling and Population
Linear Regression Analysis
Model Assumptions and Interpretations
Categorical Variables and Interaction Effects
Obtaining data from the Internet
Several ethical issues will also be discussed throughout the semester. The specific topics are as follows:
Ethics for Data Visualization
Ethics for Data Collection and Analysis
Ethics for Making Generalization based on Sample Data
Class Participation 10%
Group Project 20%
Take-home Quiz 10%
Workload Components: A-B-C-D-E
A: no. of lecture hours per week
B: no. of tutorial hours per week
C: no. of lab hours per week
D: no. of hours for projects, assignments, fieldwork, etc. per week
E: no. of hours for preparatory work by a student per week