Data Science Presentation

Eric Wolff, Mats Biermann, Laurine Fleck

Our Research Question

To what extent do technology, tourism, pharmaceutical, and energy stocks differ in terms of their price volatility and recovery dynamics during and after the COVID-19 pandemic?

Data Extraction and Cleaning

Stock Market Data

Source Yahoo Finance –> accessed via quantmod package in R
Type of data: Daily Stock Prices from January 2019 until December 2023
Stocks used:

Apple Inc. (AAPL) – Technology
BioNTech SE (BNTX) – Pharmaceutical
Deutsche Lufthansa AG (LHA.DE) – Tourism/Aviation
Shell plc (SHEL) – Energy

Example Stock

Limit to closing values and extracting them
Combining all Stocks closing Value in one table –> difficulties due to differing in number of rows
Solution for that was instead of combining vectors directly, all time series were first aligned by date using merge() on xts objects. Missing trading days were handled via NA values.

Combined Stock List

# Merging all closing-price time series by Date (keeps all dates; missing values become NA)
merged <- merge(LHA_cl, BNTX_cl, AAPL_cl, SHEL_cl, all = TRUE)

COVID-19 Dataset (John Hopkins University)

Data type: Daily confirmed COVID cases, continuously updated and well known for academic research and policy analysis

Cleaning of COVID Data

Changing from Wide Format –> long Format, so every row corresponds
Provincial –> country aggregation
Cumulative –> daily values
Global aggregation

Final COVID Dataset: Daily global confirmed cases, daily global deaths & time index

Visualization of COVID Data

#COVID Confirmed World Plot
ggplot(world, aes(x = date, y = confirmed)) +
  geom_col(width = 0.8) +
  theme_classic() +
  labs(
    title = "Covid-19 Global Confirmed Cases",
    x = "Date",
    y = "Daily confirmed cases"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

Time Series Plots Examples: Shell and BioNtech

The time series plots can be used to identify initial structural breaks, which become particularly important later in the analysis.

Stationarity Tests

ADF test: H₀: time series is not stationary (unit root), H₁: time series is stationary
Stationary if tau value is smaller than critical value (s=5%)

Value of test-statistic is: -1.2465 2.6347

Critical Values (ADF Test):

Test Statistic	1%	5%	10%
τ₂	-3.43	-2.86	-2.57
φ₁	6.43	4.59	3.78

–> Not stationary

-1.2465 > -2.86
H₀ cannot be rejected

Time Series Becomes Stationary

First differences: make the time series stationary
remove trend
changes in prices with stable variance around the mean

–> provides reliable basis for future calculations

Forecasting (Pre-COVID Benchmark)

Our Objective: To generate a forecast for each stock indicating the range within which it would have traded in the absence of the pandemic
We decided for an ARIMA-model and therefore used the Website https://otexts.com/fpp3/arima-reading.html
No continuous timeplot: clean missing dates
Missing days: NA dates –> forward filling

Continuous Forecast

Dark blue: 80% forecast interval
Light blue: 95% forecast interval

–> Range where values are expected

–>Limitation: small data set

–> gross overview, but limitations

new solution: monthly forecasts

Forecast as a basis for determining recovery

New idea: stock recovered when reaching 80% interval

Limitations so far

Stock Data set cleaning because of unalligment in rows
Stocks have different currencies –> not as important at the moment, but could be a problem in the further process
COVID-Data not optional in original form as well –> break down of data
Generally filtering the enormous amount of data according to our “needs”
Only considering stock prices in relation to confirmed Covid cases, also due to the limited time frame
Should we use the stationary time series for the further procedure?