Do Stock Returns Really Follow a Normal Distribution?

hedge_hara_hedge 2020. 7. 27. 01:24

Many financial models assume that stock returns follow a normal distribution, but I had my doubts about whether that is really true.

So I checked it empirically using Python's scipy package.

import numpy as np
import pandas as pd
import FinanceDataReader as fdr

from scipy.stats import norm, laplace, t

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (12, 9)
plt.rcParams['figure.dpi'] = 300
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['lines.linewidth'] = 1
plt.rcParams['lines.color'] = 'r'
plt.rcParams['axes.grid'] = True

First, import the packages and set the global matplotlib.pyplot options.

fig1 = plt.figure(1)

# SPY daily return
ax1 = plt.subplot(511)
ax1.set_title('Historical S&P500(SPY) daily log returns')

# Normal distribution
ax2 = plt.subplot(512, sharex=ax1)
ax2.set_title('Normal dist.')

# Laplace distribution
ax3 = plt.subplot(513, sharex=ax1)
ax3.set_title('Laplace dist.')

# Student's T distribution
ax4 = plt.subplot(514, sharex=ax1)
ax4.set_title("Student's t dist.")

# Total
ax5 = plt.subplot(515, sharex=ax1)
ax5.set_title('Comparison')

Then set up the subplots.

df_SPY = fdr.DataReader('SPY')

bday = df_SPY.index
count_bday = len(bday)
bin_div10 = int(count_bday / 10)
print(f'counted business days: {count_bday}')

counted business days: 6934

Next, load the SPY price data and count the number of trading days.

SPY_close = df_SPY['Close'].to_numpy()
log_ret_SPY_close = np.diff(np.log(SPY_close))
ax1.hist(log_ret_SPY_close, density=True, histtype='stepfilled', alpha=0.5, bins=bin_div10)

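Now compute the log returns. Compared with simple returns, log returns have the following advantages:

Symmetric around 0: a rise and the fall that undoes it have equal size and opposite sign (convenient for probability models)
Additive over time: the log return of a whole period is simply the sum of the log returns of its sub-periods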
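Both properties are easy to check numerically. The snippet below is a minimal sketch on a made-up price series (the prices are hypothetical, not SPY data):

```python
import numpy as np

# Hypothetical closing prices, not real SPY data.
prices = np.array([100.0, 110.0, 99.0, 104.0])

simple_ret = prices[1:] / prices[:-1] - 1   # ordinary (simple) returns
log_ret = np.diff(np.log(prices))           # log returns, computed as in the post

# Additivity: daily log returns sum to the log return of the whole period.
total_log_ret = np.log(prices[-1] / prices[0])
print(np.isclose(log_ret.sum(), total_log_ret))

# Symmetry: a move up and the move back down cancel exactly in log terms,
# while the matching simple returns (+10% and about -9.09%) do not.
up, down = np.log(110 / 100), np.log(100 / 110)
print(np.isclose(up + down, 0.0))
```

Simple returns do not add up this way: summing them over the three days gives a different number from the simple return of the whole period.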

For more details on log returns, see:
http://tedware.kr/posts/212
https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return

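Then plot the distribution of the returns as a histogram. With density=True the bars are normalized so that their total area is 1 (this is a density, not a cumulative distribution).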
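To see what density=True does to the histogram, here is a small sketch with np.histogram, which uses the same normalization as ax.hist. The sample is synthetic and only stands in for the log returns:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=0.01, size=5000)  # stand-in for log returns

# density=True rescales the bar heights so that the total bar area is 1.
heights, edges = np.histogram(sample, bins=500, density=True)
area = np.sum(heights * np.diff(edges))
print(area)

# A cumulative view would instead be the running sum of those bar areas.
cumulative = np.cumsum(heights * np.diff(edges))
```

Because every panel is normalized the same way, histograms of the real data and of the simulated samples can be compared directly.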

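Figure: Daily log returns of the S&P500 (SPY ETF) compared with several probability distributions

Let me show you the result first.

The top panel is the actual historical daily returns of the S&P500.

The next three are, in order, the normal, Laplace, and Student's t distributions fitted to the data.

The last panel compares the actual return distribution with each of the simulated distributions. The result is... interesting! Let's get back to the code.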

# imaginary daily price return distribution by Normal distribution
norm_fit_param = norm.fit(log_ret_SPY_close)
print(f'Norm dist fit param: {norm_fit_param}')

img_log_ret0 = norm.rvs(*norm_fit_param, size=count_bday, random_state=None)
print(f'Norm rvs: {img_log_ret0}')

ax2.hist(img_log_ret0, density=True, histtype='stepfilled', alpha=0.5, bins=bin_div10)
# ax2.legend(loc='best', frameon=False)

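Now let's fit the probability distributions to the data.

Start with the normal distribution.

Pass the return data to the fit method built into scipy.stats.norm and store the result in norm_fit_param. For the normal distribution, fit returns a tuple of loc (the mean) and scale (the standard deviation, not the variance), in that order.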
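A quick way to confirm what norm.fit returns is to compare it with the sample statistics. This is a sketch on synthetic data (the loc and scale used to generate it are made up):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0005, scale=0.012, size=5000)  # synthetic, not SPY

loc_hat, scale_hat = norm.fit(sample)

# The maximum-likelihood estimates are the sample mean and the
# population-style standard deviation (ddof=0), not the variance.
print(np.isclose(loc_hat, sample.mean()))
print(np.isclose(scale_hat, sample.std(ddof=0)))
```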

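Next, generate random numbers that follow the fitted distribution. norm.rvs draws random variates from the distribution with the given parameters and returns a NumPy array with size elements. Here we generate as many random numbers as there are SPY trading days!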
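With random_state=None the simulated sample changes on every run. If a reproducible figure is wanted, one option is to pass a fixed seed. A small sketch, with made-up parameters in place of the fitted ones:

```python
import numpy as np
from scipy.stats import norm

params = (0.0005, 0.012)  # hypothetical (loc, scale), not fitted to SPY

a = norm.rvs(*params, size=5, random_state=123)
b = norm.rvs(*params, size=5, random_state=123)
c = norm.rvs(*params, size=5, random_state=None)  # different on every run

print(type(a).__name__, a.shape)   # the result is a NumPy array
print(np.array_equal(a, b))        # same seed, same draws
```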

Plot the imaginary returns produced by rvs as a histogram.

See the figure above for the result.

# imaginary daily price return distribution by Laplace distribution

laplace_fit_param = laplace.fit(log_ret_SPY_close)
print(f'Laplace dist fit param: {laplace_fit_param}')

img_log_ret1 = laplace.rvs(*laplace_fit_param, size=count_bday, random_state=None)
print(f'Laplace rvs: {img_log_ret1}')

ax3.hist(img_log_ret1, density=True, histtype='stepfilled', alpha=0.5, bins=bin_div10)
# ax3.legend(loc='best', frameon=False)

Do exactly the same for the Laplace distribution. It also takes two parameters, loc and scale, so fit returns them as a tuple.
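For the Laplace distribution the maximum-likelihood estimates have a simple closed form: loc is the sample median and scale is the mean absolute deviation from that median. The sketch below compares them with laplace.fit on synthetic data (the generating parameters are made up):

```python
import numpy as np
from scipy.stats import laplace

# Synthetic sample standing in for the log returns.
sample = laplace.rvs(loc=0.0005, scale=0.008, size=5001, random_state=7)

loc_hat, scale_hat = laplace.fit(sample)

# Closed-form maximum-likelihood estimates for comparison.
median = np.median(sample)
mean_abs_dev = np.mean(np.abs(sample - median))

print(loc_hat, median)
print(scale_hat, mean_abs_dev)
```

Note that scale here is not the standard deviation: a Laplace distribution with scale b has standard deviation b times the square root of 2.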

# imaginary daily price return distribution by Student's T distribution

t_fit_param = t.fit(log_ret_SPY_close)
print(f'T dist fit param: {t_fit_param}')

img_log_ret2 = t.rvs(*t_fit_param, size=count_bday, random_state=None)
print(f'T rvs: {img_log_ret2}')

ax4.hist(img_log_ret2, density=True, histtype='stepfilled', alpha=0.5, bins=bin_div10)
# ax4.legend(loc='best', frameon=False)

Finally, Student's t distribution. It is the distribution most often compared with the normal, and it has noticeably fat tails.
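One difference from the two earlier fits is worth spelling out: scipy.stats.t has a shape parameter (the degrees of freedom), so t.fit returns three values instead of two. A sketch on synthetic data, where the df, loc, and scale used to generate the sample are made up:

```python
from scipy.stats import t

# Synthetic heavy-tailed sample standing in for the SPY log returns.
sample = t.rvs(3, loc=0.0005, scale=0.008, size=5000, random_state=1)

# Unlike norm and laplace, t has a shape parameter (degrees of freedom),
# so fit returns three values: (df, loc, scale).
df_hat, loc_hat, scale_hat = t.fit(sample)
print(df_hat, loc_hat, scale_hat)
```

The smaller the fitted degrees of freedom, the fatter the tails. This is also why unpacking with *t_fit_param works in t.rvs: the shape parameter comes first, then loc and scale.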

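In fact, the normal distribution assigns a very low probability to events in the tails. It does a poor job of explaining fat-tail events (a crash or a surge within a single day), yet these happen surprisingly often in the stock market. Examples include the 1929 Great Depression, Black Monday in 1987, the dot-com bubble in 2000, the 2008 global financial crisis, and the 2020 COVID crash.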
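To put a number on "very low", here is a rough sketch that compares the probability of a move larger than 5 standard deviations under the three distributions, each scaled to the same standard deviation. The choice of 5 sigma and df=3 is mine, purely for illustration:

```python
import numpy as np
from scipy.stats import norm, laplace, t

sigma = 1.0   # work in units of one standard deviation
k = 5         # size of the move, in standard deviations

# Each distribution is scaled so that its standard deviation equals sigma.
p_norm = 2 * norm.sf(k * sigma, scale=sigma)
p_laplace = 2 * laplace.sf(k * sigma, scale=sigma / np.sqrt(2))
df = 3  # hypothetical degrees of freedom, chosen for illustration only
p_t = 2 * t.sf(k * sigma, df, scale=sigma / np.sqrt(df / (df - 2)))

print(f'normal : {p_norm:.2e}')
print(f'laplace: {p_laplace:.2e}')
print(f't(df=3): {p_t:.2e}')
```

Under the normal distribution such a day should show up less than once in a million trading days, while the two fat-tailed distributions make it orders of magnitude more likely.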

And on ordinary days, returns stayed close to 0 far more often.

Just from the figure, you can see that the orange line (the normal distribution) does not fit the actual returns well.

Fitting a Student's t or Laplace distribution instead should give a much more convincing probability model.
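Eyeballing histograms is one way to judge the fit. A rough quantitative check, which is not part of the original experiment, is to compare the log-likelihood of each fitted distribution, for example via AIC. The sketch below runs on a synthetic sample drawn from a t distribution, so the t fit winning is by construction; the point is only the procedure, which could be applied to log_ret_SPY_close in the same way:

```python
import numpy as np
from scipy.stats import norm, laplace, t

# Synthetic heavy-tailed sample standing in for the SPY log returns.
sample = t.rvs(3, loc=0.0, scale=0.008, size=5000, random_state=0)

aic = {}
for name, dist in [('norm', norm), ('laplace', laplace), ('t', t)]:
    params = dist.fit(sample)
    log_lik = np.sum(dist.logpdf(sample, *params))
    aic[name] = 2 * len(params) - 2 * log_lik   # lower is better

for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f'{name:8s} AIC = {value:.1f}')
```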

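Student's t distribution explains the fat tails reasonably well by its nature. It's a bit of a shame, though, that it falls slightly short on the moves near 0.

The Laplace distribution has a sharp peak at the center and captures the fat tails well.

So the Laplace distribution looks like a good choice when building a random walk model for stock prices.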
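As a minimal sketch of what such a random walk could look like: draw daily log returns from a Laplace distribution and accumulate them into a price path. The loc, scale, starting price, and number of days below are hypothetical; in practice loc and scale would come from laplace.fit on the real returns:

```python
import numpy as np
from scipy.stats import laplace

# Hypothetical parameters; in practice they would come from laplace.fit.
loc, scale = 0.0003, 0.007
n_days, start_price = 252, 100.0

log_ret = laplace.rvs(loc, scale, size=n_days, random_state=2020)

# Log returns add up, so the price path is the exponential of their running sum.
path = start_price * np.exp(np.cumsum(log_ret))
print(path[:5])
```

This treats each day as an independent draw, which is a simplification: it ignores effects such as volatility clustering.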

Article I referred to today: The True Probability Distribution of Stock Market Returns
https://tradeoptionswithme.com/probability-dsitribution-of-stock-market-returns/

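By the way, using the Cauchy distribution might even let you model a true black swan. It is tricky to work with, though, because it is an unusual distribution whose mean and variance do not exist. To make use of it, the parameter fitting would have to work well.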
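Since the mean and variance are undefined, moment-based estimates are of no use for the Cauchy distribution. One simple alternative, shown here only as a sketch, is to estimate the parameters from quantiles: the median estimates loc and half the interquartile range estimates scale. The true parameters below are hypothetical:

```python
import numpy as np
from scipy.stats import cauchy

true_loc, true_scale = 0.0, 0.01   # hypothetical values
sample = cauchy.rvs(true_loc, true_scale, size=100_001, random_state=0)

# The sample mean is not a reliable estimate, because the mean is undefined.
for n in (1_000, 10_000, 100_001):
    print(n, sample[:n].mean())

# Quantiles are well defined: the median estimates loc, and half the
# interquartile range estimates scale.
q25, q50, q75 = np.percentile(sample, [25, 50, 75])
loc_hat, scale_hat = q50, (q75 - q25) / 2
print(loc_hat, scale_hat)
```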

