Primary share¶
PRIROW - Primary share flag
GVKEY |
IID |
Exchange |
PRIROW |
---|---|---|---|
100131 |
01W |
London Stock Exchange |
01W |
100131 |
02W |
Mexican Stock Exchange |
01W |
100131 |
03W |
Bats Chi-X Europe |
01W |
01W is the primary share.
Data
Security Daily: stock price
Fundamentals Annual (Quarterly): accounting data
Coverage: global
Downlaod data from Compustat Global Security Daily via WRDS web access.
Dataset: Compustat Global Security Daily
Time period: 01 Jul 2020 to 31 Dec 2020
Search the entire database
Variables
iid
ajexdi
prccd
trfd
isin
tpci
fic
monthend
prirow
Output format
Tab-delimited text
Uncompressed
Date format: YYMMDDn8
GVKEY |
IID |
Exchange |
---|---|---|
100131 |
01W |
London Stock Exchange |
100131 |
02W |
Mexican Stock Exchange |
100131 |
03W |
Bats Chi-X Europe |
GVKEY + IID will identify security, e.g. 10013101W.
$\text{adjusted price} = \frac{prccd}{cfacpr} \times trfd$
Import packages
import pandas as pd
import numpy as np
Import Compustat Global data
url = 'https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1'
uk = pd.read_csv(url, sep='\t', low_memory=False, dtype={'gvkey': str})
Clean data
# Keep common shares
uk = uk[uk['tpci']=='0'].copy()
# Make sure fic is GBR
uk = uk[uk['fic']=='GBR'].copy()
# Keep primary share
uk = uk[uk['iid']==uk['prirow']].copy()
# Check duplicates
uk = uk.drop_duplicates(['gvkey', 'iid', 'datadate']).copy()
# Security level ID
uk['stkcd'] = uk['gvkey'].astype(str) + uk['iid']
# Adjusted price
uk['p_adj'] = (uk['prccd']/uk['ajexdi']) * uk['trfd']
Generate date index
date_index = uk.drop_duplicates('datadate')[['datadate']].copy()
date_index = date_index.sort_values('datadate', ignore_index=True)
date_index['date_idx'] = date_index.index + 1
Calculate daily return
uk1 = uk.merge(date_index, how='inner', on='datadate')
uk1 = uk1.sort_values(['stkcd', 'datadate'], ignore_index=True)
uk1['ldate_idx'] = uk1.groupby('stkcd')['date_idx'].shift(1)
uk1['lp_adj'] = uk1.groupby('stkcd')['p_adj'].shift(1)
uk1['date_diff'] = uk1['date_idx'] - uk1['ldate_idx']
uk1['ret'] = uk1['p_adj'] / uk1['lp_adj'] - 1
uk1['date_diff'].value_counts()
1.0 192606
2.0 5
3.0 4
8.0 1
Name: date_diff, dtype: int64
Treat daily return as missing if there is long gap between two dates
uk1['ret'] = np.where(uk1['date_diff']<=3, uk1['ret'], np.nan)
Generate month index
uk_month = uk1.query('monthend==1')[['stkcd', 'datadate', 'p_adj']].copy()
uk_month['yyyymm'] = (uk_month['datadate']/100).astype(int)
uk_month['year'] = (uk_month['yyyymm']/100).astype(int)
uk_month['month'] = uk_month['yyyymm'] % 100
uk_month['month_idx'] = (uk_month['year']-2020)*12 + uk_month['month'] - 6
Monthly return
uk_month = uk_month.sort_values(['stkcd', 'yyyymm'], ignore_index=True)
uk_month['lmonth_idx'] = uk_month.groupby('stkcd')['month_idx'].shift(1)
uk_month['lp_adj'] = uk_month.groupby('stkcd')['p_adj'].shift(1)
uk_month['month_diff'] = uk_month['month_idx'] - uk_month['lmonth_idx']
uk_month['ret'] = uk_month['p_adj'] / uk_month['lp_adj'] - 1
# Monthly return is missing if month gap is not 1 month
uk_month['ret'] = np.where(uk_month['month_diff']==1, uk_month['ret'], np.nan)
Import data
local data_url "https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1"
import delimited "`data_url'", stringcols(1) clear
Clean data
/* Keep common shares */
keep if tpci == "0"
/* Keep primary issue */
keep if iid == prirow
/* fic = GBR */
keep if fic == "GBR"
/* Check duplicates */
duplicates drop gvkey iid datadate, force
/* Adjusted price */
gen p_adj = prccd / ajexdi * trfd
/* Security level id */
gen stkcd = gvkey + iid
order stkcd datadate
Generate date index
preserve
duplicates drop datadate, force
sort datadate
gen date_idx = _n
keep datadate date_idx
save uk_date_idx, replace
restore
Calculate daily return
merge m:1 datadate using uk_date_idx
sort stkcd datadate
bysort stkcd: gen ldate_idx = date_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen date_diff = date_idx - ldate_idx
gen ret = p_adj / lp_adj - 1
tab date_diff
date_diff | Freq. Percent Cum.
------------+-----------------------------------
1 | 192,606 99.99 99.99
2 | 5 0.00 100.00
3 | 4 0.00 100.00
8 | 1 0.00 100.00
------------+-----------------------------------
Total | 192,616 100.00
Treat daily return as missing if there is long gap between two dates
replace ret = . if date_diff>3
rm uk_date_idx.dta
Generate month index
keep if monthend==1
keep stkcd datadate p_adj
gen yyyymm = int(datadate/100)
gen year = int(yyyymm/100)
gen month = mod(yyyymm, 100)
gen month_idx = (year-2020)*12 + month - 6
Monthly return
sort stkcd yyyymm
bysort stkcd: gen lmonth_idx = month_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen month_diff = month_idx - lmonth_idx
gen ret = p_adj / lp_adj - 1
replace ret = . if month_diff!=1