# Class05
## Execucomp overview
- Data source: company’s annual proxy (DEF14A SEC form)
- The dataset starts from 1992
- Universe
- US firms
- S&P500 from 1992 and S&P1500 from 1994
- There is no ADR firms (ADRs do not need to submit proxy to SEC)
### Web access
WRDS --> Compustat - Capital IQ --> Execucomp
### Identifiers
- GVKEY
- CUSIP
### Important variables
- YEAR
- This is the fiscal year in Compustat Fundamentals Annual (Quarterly)
- CEOANN
- Historical flag to indicate CEO in a particular year
- TITLEANN
- Historical flag to describe the title
- It is useful to determin if the executive officer serves as chairman of the board
- EXECID
- Permanent ID for each executive
- This ID does not change even the person moved to another company
- EXECDIR
- Flag to indicate whether executive officer is also a director
- GENDER
- It indicates female or male.
- Understand CEOANN vs PCEO and AGE vs PAGE
|EXEC_FULLNAME |YEAR |CEOANN |PCEO |AGE |PAGE |
|----------------|-----|-------|-----|----|-----|
|Timothy D. Cook |2010 | |CEO |49 |59 |
|Timothy D. Cook |2011 | |CEO |50 |59 |
|Timothy D. Cook |2012 |CEO |CEO |51 |59 |
|Timothy D. Cook |2013 |CEO |CEO |52 |59 |
|Timothy D. Cook |2014 |CEO |CEO |53 |59 |
|Timothy D. Cook |2015 |CEO |CEO |54 |59 |
|Timothy D. Cook |2016 |CEO |CEO |55 |59 |
|Timothy D. Cook |2017 |CEO |CEO |56 |59 |
|Timothy D. Cook |2018 |CEO |CEO |57 |59 |
|Timothy D. Cook |2019 |CEO |CEO |58 |59 |
|Timothy D. Cook |2020 |CEO |CEO |59 |59 |
## Python
- Import packages
```{jupyter-execute}
import pandas as pd
import numpy as np
```
- Import Execucomp data
```{jupyter-execute}
url = 'https://www.dropbox.com/s/u1safsdjh4n7pt3/dataucomp.txt?dl=1'
data = pd.read_csv(url, sep='\t', encoding='ISO-8859-1')
len(data)
```
```{attention}
You will get UnicodeDecodeError if you do not use encoding argument when importing data.
This is caused by special charcater, e.g. é.
See the link to learn about encoding: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
```
- Clean the data
```{jupyter-execute}
data.columns = data.columns.str.lower()
data = data.drop_duplicates(['gvkey', 'year', 'execid'])
len(data)
```
- CEO duality
```{jupyter-execute}
data['chair_id'] = np.where(data['titleann'].str.contains('chmn|chairman'), 1, 0)
data['duality'] = np.where((data['ceoann']=='CEO') & (data['chair_id']==1), 1, 0)
```
- Percentage of female executives or CEOs
```{jupyter-execute}
# Count number of executives
pct_female = (data.groupby(['gvkey', 'year', 'gender'])
['execid'].count().unstack())
# Fill missing values
pct_female = pct_female.fillna(0)
# Total number of executives
pct_female['total'] = pct_female['FEMALE'] + pct_female['MALE']
# Count number of female CEO
female_ceo = (data[(data['ceoann']=='CEO') & (data['gender']=='FEMALE')]
.groupby(['gvkey', 'year'])['execid'].count().to_frame('n_female_ceo'))
# Merge datasets
pct_female = pct_female.join(female_ceo, how='left')
# Fill missing values
pct_female = pct_female.fillna(0)
# Percentage of female executives
pct_female['pct_female'] = pct_female['FEMALE'] / pct_female['total']
# Percentage of female CEOs
pct_female['pct_female_ceo'] = pct_female['n_female_ceo'] / pct_female['total']
# Calculate averge percentage by year
pct_female = (pct_female.groupby(level=1)
[['pct_female', 'pct_female_ceo']].mean())
# Average during the sample period
pct_female.mean()
```
## Stata
- Import Execucomp data
```{code-block} stata
local data_url "https://www.dropbox.com/s/u1safsdjh4n7pt3/dataucomp.txt?dl=1"
import delimited "`data_url'", clear
```
- Clean the data
```{code-block} stata
duplicates drop gvkey year execid, force
```
- CEO duality
```{code-block} stata
gen chair_id = 1 if strpos(titleann, "chmn")>0 | strpos(titleann, "chairman")>0
gen duality = 1 if chair_id==1 & ceoann=="CEO"
replace duality = 0 if duality==.
```
- Percentage of female executives or CEOs
```{code-block} stata
/* Count total number of executives */
bysort gvkey year: egen n_total = count(execid)
/* Count number of female executives */
gen gender_id = 1 if gender=="FEMALE"
replace gender_id = 0 if gender=="MALE"
bysort gvkey year: egen n_female = sum(gender_id)
/* Count number of female CEO */
gen gender_ceo = 1 if gender=="FEMALE" & ceoann=="CEO"
replace gender_ceo = 0 if gender_ceo == .
bysort gvkey year: egen n_female_ceo = sum(gender_ceo)
/* Keep firm level data */
duplicates drop gvkey year, force
/* Percentage of female executives */
gen pct_female = n_female / n_total
/* Percentage of female CEO */
gen pct_female_ceo = n_female_ceo / n_total
/* Calculate average percentage by year */
collapse (mean) pct_female=pct_female pct_female_ceo=pct_female_ceo, by(year)
/* Average during the sample period */
tabstat pct_female pct_female_ceo, s(mean)
```