# Class06
## Compustat Global overview
- Data
- Security Daily: stock price
- Fundamentals Annual (Quarterly): accounting data
- Coverage: global
### Web access
Download data from Compustat Global Security Daily via WRDS web access.
- Dataset: Compustat Global Security Daily
- Time period: 01 Jul 2020 to 31 Dec 2020
- Search the entire database
- Variables
- iid
- ajexdi
- prccd
- trfd
- isin
- tpci
- fic
- monthend
- prirow
- Output format
- Tab-delimited text
- Uncompressed
- Date format: YYMMDDn8
### Identifiers
- GVKEY
- Permanent firm-level identifier
- ISIN
- Security-level identifier
- IID
- Issue ID
|GVKEY |IID |Exchange |
|-------|----|-----------------------|
|100131 |01W |London Stock Exchange |
|100131 |02W |Mexican Stock Exchange |
|100131 |03W |Bats Chi-X Europe |
**GVKEY** + **IID** together identify a unique security, e.g. 10013101W.
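The concatenation can be sketched in Python, using the GVKEY and IIDs from the table above:

```{jupyter-execute}
# Security-level ID = GVKEY (as a string) + IID
gvkey = '100131'
iids = ['01W', '02W', '03W']
stkcds = [gvkey + iid for iid in iids]
print(stkcds)
```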
### Primary share
- PRIROW
- Primary share flag
|GVKEY |IID |Exchange |PRIROW |
|-------|----|-----------------------|-------|
|100131 |01W |London Stock Exchange |01W |
|100131 |02W |Mexican Stock Exchange |01W |
|100131 |03W |Bats Chi-X Europe |01W |
**01W** is the primary share.
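Keeping only primary shares amounts to filtering rows where IID equals PRIROW. A minimal sketch with a toy frame mirroring the table above:

```{jupyter-execute}
import pandas as pd

# Toy frame: three issues of one firm, all pointing to 01W as primary
df = pd.DataFrame({
    'gvkey': ['100131'] * 3,
    'iid': ['01W', '02W', '03W'],
    'prirow': ['01W'] * 3,
})
primary = df[df['iid'] == df['prirow']]
print(primary)
```

Only the 01W row survives the filter.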
### Issue type code
TPCI
- 0
- Common (ordinary) share
### Country code
- FIC
- 3-letter country code
Check the full list of country codes.
### Stock price
- PRCCD
- Stock price (unadjusted)
- Must be adjusted for stock splits and dividends before calculating stock returns
- AJEXDI
- Adjustment factor
- TRFD
- Total return factor
$\text{adjusted price} = \frac{prccd}{ajexdi} \times trfd$
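A quick numerical check with made-up values: suppose a 2-for-1 split between the two days, so the raw price roughly halves while the adjustment factor halves as well (illustrative values only); dividing by `ajexdi` makes the series continuous and the return comes out as a sensible 2%.

```{jupyter-execute}
import pandas as pd

# Hypothetical two-day series around a 2-for-1 split (no dividend, so trfd = 1)
df = pd.DataFrame({
    'prccd':  [100.0, 51.0],   # raw price halves at the split
    'ajexdi': [1.0, 0.5],      # adjustment factor reflects the split
    'trfd':   [1.0, 1.0],
})
df['p_adj'] = df['prccd'] / df['ajexdi'] * df['trfd']
df['ret'] = df['p_adj'].pct_change()
print(df)
```

Without the adjustment, the raw price series would show a spurious -49% "return" on the split day.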
## Python
- Import packages
```{jupyter-execute}
import pandas as pd
import numpy as np
```
- Import Compustat Global data
```{jupyter-execute}
url = 'https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1'
uk = pd.read_csv(url, sep='\t', low_memory=False, dtype={'gvkey': str})
```
- Clean data
```{jupyter-execute}
# Keep common shares
uk = uk[uk['tpci']=='0'].copy()
# Make sure fic is GBR
uk = uk[uk['fic']=='GBR'].copy()
# Keep primary share
uk = uk[uk['iid']==uk['prirow']].copy()
# Check duplicates
uk = uk.drop_duplicates(['gvkey', 'iid', 'datadate']).copy()
# Security level ID
uk['stkcd'] = uk['gvkey'].astype(str) + uk['iid']
# Adjusted price
uk['p_adj'] = (uk['prccd']/uk['ajexdi']) * uk['trfd']
```
- Generate date index
```{jupyter-execute}
date_index = uk.drop_duplicates('datadate')[['datadate']].copy()
date_index = date_index.sort_values('datadate', ignore_index=True)
date_index['date_idx'] = date_index.index + 1
```
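The point of the date index is to map trading dates to consecutive integers, so that weekends and holidays do not look like gaps. A toy version with made-up dates spanning a weekend:

```{jupyter-execute}
import pandas as pd

# Three trading dates: Fri 3 Jul, Mon 6 Jul, Tue 7 Jul 2020
df = pd.DataFrame({'datadate': [20200703, 20200706, 20200707]})
date_index = df.drop_duplicates('datadate')[['datadate']].copy()
date_index = date_index.sort_values('datadate', ignore_index=True)
date_index['date_idx'] = date_index.index + 1
print(date_index)
```

Fri, Mon, and Tue get indices 1, 2, 3: consecutive despite the calendar gap.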
- Calculate daily return
```{jupyter-execute}
uk1 = uk.merge(date_index, how='inner', on='datadate')
uk1 = uk1.sort_values(['stkcd', 'datadate'], ignore_index=True)
uk1['ldate_idx'] = uk1.groupby('stkcd')['date_idx'].shift(1)
uk1['lp_adj'] = uk1.groupby('stkcd')['p_adj'].shift(1)
uk1['date_diff'] = uk1['date_idx'] - uk1['ldate_idx']
uk1['ret'] = uk1['p_adj'] / uk1['lp_adj'] - 1
uk1['date_diff'].value_counts()
```
- Treat the daily return as missing if there is a long gap between two trading dates
```{jupyter-execute}
uk1['ret'] = np.where(uk1['date_diff']<=3, uk1['ret'], np.nan)
```
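A small check of the masking logic, with a made-up single-stock series containing one long gap:

```{jupyter-execute}
import numpy as np
import pandas as pd

# Toy series: trading resumes after a 4-step gap in the date index
df = pd.DataFrame({'date_idx': [1, 2, 6], 'p_adj': [10.0, 11.0, 12.0]})
df['date_diff'] = df['date_idx'].diff()
df['ret'] = df['p_adj'].pct_change()
# Drop the return computed across the long gap (date_diff > 3)
df['ret'] = np.where(df['date_diff'] <= 3, df['ret'], np.nan)
print(df)
```

The 10% return over the one-step gap is kept; the return spanning the four-step gap is set to missing.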
- Generate month index
```{jupyter-execute}
uk_month = uk1.query('monthend==1')[['stkcd', 'datadate', 'p_adj']].copy()
uk_month['yyyymm'] = (uk_month['datadate']/100).astype(int)
uk_month['year'] = (uk_month['yyyymm']/100).astype(int)
uk_month['month'] = uk_month['yyyymm'] % 100
uk_month['month_idx'] = (uk_month['year']-2020)*12 + uk_month['month'] - 6
```
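The formula maps Jul 2020 to 1 and counts forward, so consecutive calendar months always differ by exactly 1, even across a year end. A quick arithmetic check:

```{jupyter-execute}
def month_idx(yyyymm: int) -> int:
    year, month = divmod(yyyymm, 100)
    return (year - 2020) * 12 + month - 6

# Jul 2020 -> 1, Dec 2020 -> 6, Jan 2021 -> 7
print(month_idx(202007), month_idx(202012), month_idx(202101))
```

Dec 2020 and Jan 2021 differ by 1, whereas naive `yyyymm` differences would jump by 89.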
- Monthly return
```{jupyter-execute}
uk_month = uk_month.sort_values(['stkcd', 'yyyymm'], ignore_index=True)
uk_month['lmonth_idx'] = uk_month.groupby('stkcd')['month_idx'].shift(1)
uk_month['lp_adj'] = uk_month.groupby('stkcd')['p_adj'].shift(1)
uk_month['month_diff'] = uk_month['month_idx'] - uk_month['lmonth_idx']
uk_month['ret'] = uk_month['p_adj'] / uk_month['lp_adj'] - 1
# Monthly return is missing if month gap is not 1 month
uk_month['ret'] = np.where(uk_month['month_diff']==1, uk_month['ret'], np.nan)
```
## Stata
- Import data
```{code-block} stata
local data_url "https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1"
import delimited "`data_url'", stringcols(1) clear
```
- Clean data
```{code-block} stata
/* Keep common shares */
keep if tpci == "0"
/* Keep primary issue */
keep if iid == prirow
/* fic = GBR */
keep if fic == "GBR"
/* Check duplicates */
duplicates drop gvkey iid datadate, force
/* Adjusted price */
gen p_adj = prccd / ajexdi * trfd
/* Security level id */
gen stkcd = gvkey + iid
order stkcd datadate
```
- Generate date index
```{code-block} stata
preserve
duplicates drop datadate, force
sort datadate
gen date_idx = _n
keep datadate date_idx
save uk_date_idx, replace
restore
```
- Calculate daily return
```{code-block} stata
merge m:1 datadate using uk_date_idx, keep(match) nogen
sort stkcd datadate
bysort stkcd: gen ldate_idx = date_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen date_diff = date_idx - ldate_idx
gen ret = p_adj / lp_adj - 1
tab date_diff
```
```{code-block} stata-output
date_diff | Freq. Percent Cum.
------------+-----------------------------------
1 | 192,606 99.99 99.99
2 | 5 0.00 100.00
3 | 4 0.00 100.00
8 | 1 0.00 100.00
------------+-----------------------------------
Total | 192,616 100.00
```
- Treat the daily return as missing if there is a long gap between two trading dates
```{code-block} stata
replace ret = . if date_diff>3
rm uk_date_idx.dta
```
- Generate month index
```{code-block} stata
keep if monthend==1
keep stkcd datadate p_adj
gen yyyymm = int(datadate/100)
gen year = int(yyyymm/100)
gen month = mod(yyyymm, 100)
gen month_idx = (year-2020)*12 + month - 6
```
- Monthly return
```{code-block} stata
sort stkcd yyyymm
bysort stkcd: gen lmonth_idx = month_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen month_diff = month_idx - lmonth_idx
gen ret = p_adj / lp_adj - 1
replace ret = . if month_diff!=1
```