Class06

Compustat Global overview

  • Data

    • Security Daily: stock price

    • Fundamentals Annual (Quarterly): accounting data

  • Coverage: global

Web access

Downlaod data from Compustat Global Security Daily via WRDS web access.

  • Dataset: Compustat Global Security Daily

  • Time period: 01 Jul 2020 to 31 Dec 2020

  • Search the entire database

  • Variables

    • iid

    • ajexdi

    • prccd

    • trfd

    • isin

    • tpci

    • fic

    • monthend

    • prirow

  • Output format

    • Tab-delimited text

    • Uncompressed

    • Date format: YYMMDDn8

Identifiers

GVKEY
Permanent firm level identifier
ISIN
Security level identifier
IID
Issue ID

GVKEY

IID

Exchange

100131

01W

London Stock Exchange

100131

02W

Mexican Stock Exchange

100131

03W

Bats Chi-X Europe

GVKEY + IID will identify security, e.g. 10013101W.

Primary share

PRIROW
Primary share flag

GVKEY

IID

Exchange

PRIROW

100131

01W

London Stock Exchange

01W

100131

02W

Mexican Stock Exchange

01W

100131

03W

Bats Chi-X Europe

01W

01W is the primary share.

Issue type code

TPCI
0
Common (ordinary) share

Country code

FIC
3-letter country code

Check full list of country code

Stock price

PRCCD
Stock price (unadjusted)
Need to adjust stock splits and dividends before calculating stock return
AJEXDI
Adjustment factor
TRFD
Total return factor

$\text{adjusted price} = \frac{prccd}{cfacpr} \times trfd$

Python

  • Import packages

import pandas as pd
import numpy as np
  • Import Compustat Global data

url = 'https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1'
uk = pd.read_csv(url, sep='\t', low_memory=False, dtype={'gvkey': str})
  • Clean data

# Keep common shares
uk = uk[uk['tpci']=='0'].copy()

# Make sure fic is GBR
uk = uk[uk['fic']=='GBR'].copy()

# Keep primary share
uk = uk[uk['iid']==uk['prirow']].copy()

# Check duplicates
uk = uk.drop_duplicates(['gvkey', 'iid', 'datadate']).copy()

# Security level ID
uk['stkcd'] = uk['gvkey'].astype(str) + uk['iid']

# Adjusted price
uk['p_adj'] = (uk['prccd']/uk['ajexdi']) * uk['trfd']
  • Generate date index

date_index = uk.drop_duplicates('datadate')[['datadate']].copy()
date_index = date_index.sort_values('datadate', ignore_index=True)
date_index['date_idx'] = date_index.index + 1
  • Calculate daily return

uk1 = uk.merge(date_index, how='inner', on='datadate')
uk1 = uk1.sort_values(['stkcd', 'datadate'], ignore_index=True)
uk1['ldate_idx'] = uk1.groupby('stkcd')['date_idx'].shift(1)
uk1['lp_adj'] = uk1.groupby('stkcd')['p_adj'].shift(1)
uk1['date_diff'] = uk1['date_idx'] - uk1['ldate_idx']
uk1['ret'] = uk1['p_adj'] / uk1['lp_adj'] - 1

uk1['date_diff'].value_counts()
1.0    192606
2.0         5
3.0         4
8.0         1
Name: date_diff, dtype: int64
  • Treat daily return as missing if there is long gap between two dates

uk1['ret'] = np.where(uk1['date_diff']<=3, uk1['ret'], np.nan)
  • Generate month index

uk_month = uk1.query('monthend==1')[['stkcd', 'datadate', 'p_adj']].copy()

uk_month['yyyymm'] = (uk_month['datadate']/100).astype(int)
uk_month['year'] = (uk_month['yyyymm']/100).astype(int)
uk_month['month'] = uk_month['yyyymm'] % 100

uk_month['month_idx'] = (uk_month['year']-2020)*12 + uk_month['month'] - 6
  • Monthly return

uk_month = uk_month.sort_values(['stkcd', 'yyyymm'], ignore_index=True)
uk_month['lmonth_idx'] = uk_month.groupby('stkcd')['month_idx'].shift(1)
uk_month['lp_adj'] = uk_month.groupby('stkcd')['p_adj'].shift(1)
uk_month['month_diff'] = uk_month['month_idx'] - uk_month['lmonth_idx']
uk_month['ret'] = uk_month['p_adj'] / uk_month['lp_adj'] - 1

# Monthly return is missing if month gap is not 1 month
uk_month['ret'] = np.where(uk_month['month_diff']==1, uk_month['ret'], np.nan)

Stata

  • Import data

local data_url "https://www.dropbox.com/s/484ynn8mzihgj9q/uk.txt?dl=1"
import delimited "`data_url'", stringcols(1) clear
  • Clean data

/* Keep common shares */
keep if tpci == "0"

/* Keep primary issue */
keep if iid == prirow

/* fic = GBR */
keep if fic == "GBR"

/* Check duplicates */
duplicates drop gvkey iid datadate, force

/* Adjusted price */
gen p_adj = prccd / ajexdi * trfd

/* Security level id */
gen stkcd = gvkey + iid
order stkcd datadate
  • Generate date index

preserve
duplicates drop datadate, force
sort datadate
gen date_idx = _n
keep datadate date_idx
save uk_date_idx, replace
restore
  • Calculate daily return

merge m:1 datadate using uk_date_idx
sort stkcd datadate
bysort stkcd: gen ldate_idx = date_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen date_diff = date_idx - ldate_idx
gen ret = p_adj / lp_adj - 1

tab date_diff
  date_diff |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    192,606       99.99       99.99
          2 |          5        0.00      100.00
          3 |          4        0.00      100.00
          8 |          1        0.00      100.00
------------+-----------------------------------
      Total |    192,616      100.00
  • Treat daily return as missing if there is long gap between two dates

replace ret = . if date_diff>3

rm uk_date_idx.dta
  • Generate month index

keep if monthend==1
keep stkcd datadate p_adj

gen yyyymm = int(datadate/100)
gen year = int(yyyymm/100)
gen month = mod(yyyymm, 100)

gen month_idx = (year-2020)*12 + month - 6
  • Monthly return

sort stkcd yyyymm
bysort stkcd: gen lmonth_idx = month_idx[_n-1]
bysort stkcd: gen lp_adj = p_adj[_n-1]
gen month_diff = month_idx - lmonth_idx
gen ret = p_adj / lp_adj - 1

replace ret = . if month_diff!=1