Class05¶
Execucomp overview¶
Data source: company’s annual proxy (DEF14A SEC form)
The dataset starts from 1992
Universe
US firms
S&P500 from 1992 and S&P1500 from 1994
There is no ADR firms (ADRs do not need to submit proxy to SEC)
Web access¶
WRDS –> Compustat - Capital IQ –> Execucomp
Identifiers¶
GVKEY CUSIP
Important variables¶
YEAR - This is the fiscal year in Compustat Fundamentals Annual (Quarterly)
CEOANN - Historical flag to indicate CEO in a particular year
TITLEANN - Historical flag to describe the title
- It is useful to determin if the executive officer serves as chairman of the board
EXECID - Permanent ID for each executive
- This ID does not change even the person moved to another company
EXECDIR - Flag to indicate whether executive officer is also a director
GENDER - It indicates female or male.
Understand CEOANN vs PCEO and AGE vs PAGE
EXEC_FULLNAME |
YEAR |
CEOANN |
PCEO |
AGE |
PAGE |
---|---|---|---|---|---|
Timothy D. Cook |
2010 |
CEO |
49 |
59 |
|
Timothy D. Cook |
2011 |
CEO |
50 |
59 |
|
Timothy D. Cook |
2012 |
CEO |
CEO |
51 |
59 |
Timothy D. Cook |
2013 |
CEO |
CEO |
52 |
59 |
Timothy D. Cook |
2014 |
CEO |
CEO |
53 |
59 |
Timothy D. Cook |
2015 |
CEO |
CEO |
54 |
59 |
Timothy D. Cook |
2016 |
CEO |
CEO |
55 |
59 |
Timothy D. Cook |
2017 |
CEO |
CEO |
56 |
59 |
Timothy D. Cook |
2018 |
CEO |
CEO |
57 |
59 |
Timothy D. Cook |
2019 |
CEO |
CEO |
58 |
59 |
Timothy D. Cook |
2020 |
CEO |
CEO |
59 |
59 |
Python¶
Import packages
import pandas as pd
import numpy as np
Import Execucomp data
url = 'https://www.dropbox.com/s/u1safsdjh4n7pt3/dataucomp.txt?dl=1'
data = pd.read_csv(url, sep='\t', encoding='ISO-8859-1')
len(data)
297233
Attention
You will get UnicodeDecodeError if you do not use encoding argument when importing data. This is caused by special charcater, e.g. é. See the link to learn about encoding: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
Clean the data
data.columns = data.columns.str.lower()
data = data.drop_duplicates(['gvkey', 'year', 'execid'])
len(data)
297229
CEO duality
data['chair_id'] = np.where(data['titleann'].str.contains('chmn|chairman'), 1, 0)
data['duality'] = np.where((data['ceoann']=='CEO') & (data['chair_id']==1), 1, 0)
Percentage of female executives or CEOs
# Count number of executives
pct_female = (data.groupby(['gvkey', 'year', 'gender'])
['execid'].count().unstack())
# Fill missing values
pct_female = pct_female.fillna(0)
# Total number of executives
pct_female['total'] = pct_female['FEMALE'] + pct_female['MALE']
# Count number of female CEO
female_ceo = (data[(data['ceoann']=='CEO') & (data['gender']=='FEMALE')]
.groupby(['gvkey', 'year'])['execid'].count().to_frame('n_female_ceo'))
# Merge datasets
pct_female = pct_female.join(female_ceo, how='left')
# Fill missing values
pct_female = pct_female.fillna(0)
# Percentage of female executives
pct_female['pct_female'] = pct_female['FEMALE'] / pct_female['total']
# Percentage of female CEOs
pct_female['pct_female_ceo'] = pct_female['n_female_ceo'] / pct_female['total']
# Calculate averge percentage by year
pct_female = (pct_female.groupby(level=1)
[['pct_female', 'pct_female_ceo']].mean())
# Average during the sample period
pct_female.mean()
pct_female 0.068210
pct_female_ceo 0.004514
dtype: float64
Stata¶
Import Execucomp data
local data_url "https://www.dropbox.com/s/u1safsdjh4n7pt3/dataucomp.txt?dl=1"
import delimited "`data_url'", clear
Clean the data
duplicates drop gvkey year execid, force
CEO duality
gen chair_id = 1 if strpos(titleann, "chmn")>0 | strpos(titleann, "chairman")>0
gen duality = 1 if chair_id==1 & ceoann=="CEO"
replace duality = 0 if duality==.
Percentage of female executives or CEOs
/* Count total number of executives */
bysort gvkey year: egen n_total = count(execid)
/* Count number of female executives */
gen gender_id = 1 if gender=="FEMALE"
replace gender_id = 0 if gender=="MALE"
bysort gvkey year: egen n_female = sum(gender_id)
/* Count number of female CEO */
gen gender_ceo = 1 if gender=="FEMALE" & ceoann=="CEO"
replace gender_ceo = 0 if gender_ceo == .
bysort gvkey year: egen n_female_ceo = sum(gender_ceo)
/* Keep firm level data */
duplicates drop gvkey year, force
/* Percentage of female executives */
gen pct_female = n_female / n_total
/* Percentage of female CEO */
gen pct_female_ceo = n_female_ceo / n_total
/* Calculate average percentage by year */
collapse (mean) pct_female=pct_female pct_female_ceo=pct_female_ceo, by(year)
/* Average during the sample period */
tabstat pct_female pct_female_ceo, s(mean)