BUSINESS ANALYTICS CAPSTONE PROJECT
2015/2016, Semester 1
School of Computing (Information Systems & Analytics)
Modular Credits: 4
Students are required to work in teams through a complete, large-scale business analytics project. Related techniques and tools will also be taught in BT3101. Students will also sharpen their communication skills through close team interactions, consultations, and formal presentations. Emphasis will be placed on analytics problem formulation, data collection from the Web, data preprocessing with R, data integration with Pentaho Spoon, statistical analysis with R, visualization with Tableau, and simple decision analysis modeling with Excel. Students will be assessed on their understanding of business analytics knowledge and tools, and on their ability to apply them to several real-life datasets to answer questions of significant practical value.
Prerequisites: Completion of 64 MCs of core modules, ST3131 Regression Analysis, and DSC3215 Stochastic Models in Management
Lectures, Tutorials, and Project
Tentative Lesson Plan:
• Week 1 August 10: NO CLASS. National Day Holiday.
• Week 2 August 17: by Dr. Huang
o First class: overview of the module and syllabus.
o Overview of the two main datasets.
o Identifying business analytics problems and examples related to our datasets.
o What other datasets are publicly available on the Internet?
• Week 3 August 24: Web scraping tutorial by TA.
o Form groups before the Week 3 class. I will use a lottery to assign students and finalize group formation in Week 3.
• Week 4 August 31: Data processing with an ETL tool (Pentaho Spoon) by Dr. Huang.
o LinkedIn dataset as an example.
• Week 5 September 7: Data cleaning in R by TA.
o Consultation about group topics.
• Week 6 September 14: Regression in R by TA.
o Consultation about group topics.
o Decide your topics before the end of Week 5.
o Proposal due around the end of recess week. More details to be announced.
• Week 7 September 28: Simple data mining and text mining in R by Dr. Huang.
• Week 8 October 5: Consultation
• Week 9 October 12: Visualizing data and reporting with Tableau by Dr. Huang or a guest lecturer from Tableau.
Tableau's data visualization software is provided through the Tableau for Teaching program.
• Week 10 October 19: Consultation
• Week 11 October 26: Review of each group's decision models and statistical models by Dr. Huang.
o The purpose of this session is to help everyone learn the background of the statistical and decision models used in OTHER GROUPS’ projects. This will help students better appreciate the Week 13 final presentations and learn about additional decision and statistical models.
• Week 12: Consultation
• Week 13: Final Presentation
Project Requirements:
1. Identify and propose one large analytics problem that has real-world commercial or social value.
• Under this large problem, you need to analyze as many interesting smaller questions as possible.
• You may search the Internet for creative ideas.
• One approach is to find an academic paper from the Financial Times Top 45 journals and try to replicate more than 50% of the study. Here, "50%" means:
At least one research question and/or one hypothesis is the same as in the paper.
The datasets are of the same nature. For example, if the paper studies China stock market prices, you may use stock market prices from any country in the world. Further dataset requirements are given below.
Preferably, the analysis has real-world (commercial) value and managerial implications.
2. Dataset requirements
• You must use information from at least one of the following two datasets: (1) Compustat/CRSP or (2) LinkedIn.
You may also use other large public datasets, but Dr. Huang's approval is required first.
• You must find a second data source and use at least 3 variables from it. This dataset may be downloaded directly, without programming.
• You must crawl a third dataset from the Internet and use at least 3 variables from it. In other words, you need to join 3 datasets. It is also fine to scrape two of your data sources from the Internet.
• You must include at least one variable derived from text analysis; that is, at least one of your variables must be created from unstructured textual data. Neither of the two main datasets contains such a variable, but closely related data sources do contain textual information.
This textual variable can be scraped from the Internet or downloaded from a database.
If you use Compustat/CRSP, abundant textual data is available from annual reports (Form 10-K) on SEC EDGAR. However, their HTML format varies and is not easy to process, so in this case it counts as scraping from the Internet.
If you use LinkedIn, the HTML files have already been downloaded but are not stored in a database. Unlike annual reports, these HTML files all follow the same template, so in this case it does not count as scraping from the Internet unless you write your own program to scrape new LinkedIn data.
3. At least one underlying decision model, for example:
Inventory models or other decision models learned in OR/MS modules.
Portfolio management, real options, or project valuation (NPV, IRR, etc.) in finance and accounting.
Consumer choice in marketing.
Pricing, advertising, product line decision, or any other marketing decision models.
Other simple optimization models that can be solved by Excel.
4. Statistical analysis: in addition to basic regression analysis, you have two options: (1) one statistical analysis that is more complicated than OLS/FE/RE/ARIMA, or (2) one analysis using data mining algorithms.
5. Learn to use a visualization tool: Tableau.
Deliverables:
1. Write a proposal for your BA project plan as if addressing corporate executives to seek approval for your project.
Visualizing and Exploring Data, Descriptive Statistical Measures
If you choose to work on an academic paper, your problem is likely to have public policy impact and social benefits. In this case, write as if to government officials to seek approval and funding for your project.
2. Two peer reviews: one due in Week 7, along with the proposal submission, and the other due in Week 13, along with the final report submission.
3. Submit your code and datasets, along with your final report, for the TA to verify. Details will be announced.
4. Presentation of results in Week 13.
5. Write a final report of up to 50 pages (1.5 line spacing, 12-point Times New Roman font, and 1-inch margins), including an executive summary. The 50-page limit includes the executive summary, tables, and figures but excludes references. The final report is due before the Week 13 presentation.
Assessment:
1. Proposal 10%
2. Peer Review 20%
3. Final Presentation 20%
4. Programs and Codes Checking 20%
5. Final Report 30%
Workload Components: A-B-C-D-E
A: no. of lecture hours per week
B: no. of tutorial hours per week
C: no. of lab hours per week
D: no. of hours for projects, assignments, fieldwork, etc. per week
E: no. of hours for preparatory work by a student per week