# --- Python Setup ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Suppress FutureWarning and UserWarning messages
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

# Display settings
sns.set(style="whitegrid")
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
plt.rcParams["figure.figsize"] = (8, 5)

# --- Load and Prepare Data ---
file_path = "/Users/jasoncherubini/Dropbox (Personal)/7 - Wasteland/2025.03/DATA_For_JASP.xlsx"
data = pd.read_excel(file_path, sheet_name=0, skiprows=1)

# Rename columns
data.rename(columns={
    "What is your age?": "Age",
    "What is your gender?": "Gender",
    "What is your highest level of education?": "Education",
    data.columns[4]: "Accredited",
    data.columns[5]: "Professional_Investor",
    data.columns[6]: "Entrepreneur",
    data.columns[7]: "Raised_CF",
    data.columns[8]: "Invested_Outside_CF",
    data.columns[9]: "Investing_Experience"
}, inplace=True)

# Convert to numeric
vars_to_use = ["Age", "Gender", "Education", "Accredited", "Professional_Investor",
               "Entrepreneur", "Raised_CF", "Invested_Outside_CF", "Investing_Experience"]
data[vars_to_use] = data[vars_to_use].apply(pd.to_numeric, errors='coerce')
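
Because errors='coerce' silently turns any unparseable entry into NaN, it can be worth counting how many values were lost in the conversion. The following is a minimal sketch on a toy frame; the frame `raw`, its values, and the name `coerced_to_nan` are illustrative assumptions, not part of the survey data.

```python
import pandas as pd

# Toy stand-in for the survey sheet (hypothetical values, not real responses).
raw = pd.DataFrame({
    "Age": ["2", "3", "n/a", "4"],
    "Gender": ["1", "0", "1", "Prefer not to say"],
})

# Same conversion as in the report: unparseable entries become NaN.
converted = raw.apply(pd.to_numeric, errors="coerce")

# Count entries that were present in the raw data but are NaN after conversion.
coerced_to_nan = (raw.notna() & converted.isna()).sum()
print(coerced_to_nan)
```

A nonzero count for a column would be a cue to inspect the raw responses before interpreting its statistics.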

Introduction

This report analyzes investor demographics and behavior in the context of equity-based crowdfunding. We explore age, gender, education, and experience to uncover patterns in how people engage with crowdfunding platforms.


Descriptive Statistics

data[vars_to_use].describe()
##          Age  Gender  ...  Invested_Outside_CF  Investing_Experience
## count 320.00  320.00  ...               320.00                320.00
## mean    2.98    0.54  ...                 0.75                  3.58
## std     0.93    0.50  ...                 0.43                  0.97
## min     1.00    0.00  ...                 0.00                  1.00
## 25%     2.00    0.00  ...                 0.75                  3.00
## 50%     3.00    1.00  ...                 1.00                  4.00
## 75%     3.00    1.00  ...                 1.00                  4.00
## max     6.00    1.00  ...                 1.00                  5.00
## 
## [8 rows x 9 columns]

Narrative:
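All 320 respondents answered the two ordinal items shown here. Age centers on the middle codes (median 3, observed range 1 to 6), and self-rated investing experience is moderate to high (median 4 on the 1–5 scale), which suggests a largely mid-career sample. About 54% are coded male, and 75% report investing outside crowdfunding. Standard deviations close to 1 on the ordinal items indicate meaningful spread across respondents, which gives segmentation strategies something to work with.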


Age Distribution

# discrete=True centers one bar on each integer age code, so the two highest codes are not merged into one bin
sns.histplot(data['Age'].dropna(), discrete=True, color='skyblue', edgecolor='white')
plt.title("Participant Age Distribution")
plt.xlabel("Age Group (Ordinal Code)")
plt.ylabel("Number of Participants")
plt.show()

Narrative:
Participants cluster in the middle age groups, suggesting crowdfunding activity is concentrated among professionals in their 30s and 40s. Younger and older groups are less represented.
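
How the bars look depends on where the bin edges fall relative to the integer age codes. The sketch below uses NumPy's histogram on made-up codes to show that integer edges from the minimum to the maximum put the two highest codes in one bin, while half-integer edges keep one bin per code. It assumes the plotting library follows the same edge convention; the array `codes` is illustrative only.

```python
import numpy as np

# Illustrative ordinal codes (hypothetical, not the survey responses).
codes = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6])

# Integer edges 1..6 give five bins; the last bin [5, 6] is closed on both
# sides, so codes 5 and 6 are counted together.
merged_counts, _ = np.histogram(codes, bins=np.arange(1, 7))

# Half-integer edges 0.5..6.5 give six bins, one per code.
separate_counts, _ = np.histogram(codes, bins=np.arange(1, 8) - 0.5)

print(merged_counts)
print(separate_counts)
```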


Gender Distribution

sns.countplot(x=data['Gender'].dropna().astype(int), palette=sns.color_palette("pastel"))
plt.title("Gender Distribution of Respondents")
plt.xlabel("Gender (0 = Female, 1 = Male)")
plt.ylabel("Count")
plt.show()

Narrative:
The sample skews slightly male (about 54% of respondents, per the mean of the 0/1 gender code), aligning with broader early-stage investment trends. However, the sizeable share of female participants (about 46%) indicates inclusive access and an opportunity for targeted outreach.


Investing Experience by Gender

sns.boxplot(x=data['Gender'], y=data['Investing_Experience'], palette=sns.color_palette("pastel"))
plt.title("Investing Experience by Gender")
plt.xlabel("Gender (0 = Female, 1 = Male)")
plt.ylabel("Self-Rated Investing Experience (1–5)")
plt.show()

Narrative:
Men report marginally higher investing experience. The ranges overlap substantially, so the difference is small, but it may reflect confidence or access gaps that could be addressed through inclusive financial education.
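
A boxplot comparison like this one can be backed by a small per-group table. The following is a hedged sketch on a toy frame: the values are invented for illustration and only the column names mirror the report.

```python
import pandas as pd

# Hypothetical responses; 0 = Female, 1 = Male as in the plot labels.
toy = pd.DataFrame({
    "Gender": [0, 0, 0, 1, 1, 1, 1],
    "Investing_Experience": [3, 4, 3, 4, 4, 5, 3],
})

# Group size, median, and mean of the 1-5 rating for each gender code.
summary = toy.groupby("Gender")["Investing_Experience"].agg(["count", "median", "mean"])
print(summary)
```

For a 1–5 rating, the median is often the more robust of the two centers, and the count shows how much data sits behind each box.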


Raised vs. Invested

sns.stripplot(x='Raised_CF', y='Invested_Outside_CF', data=data, jitter=True, alpha=0.6, color="#7FC97F")
sns.regplot(x='Raised_CF', y='Invested_Outside_CF', data=data, scatter=False, color="darkred")
plt.title("Raised vs Invested in Startups")
plt.xlabel("Raised via Crowdfunding (0 = No, 1+ = Yes/Multiple)")
plt.ylabel("Invested Outside Crowdfunding (0 = No, 1 = Yes)")
plt.tight_layout()
plt.show()

Narrative:
Participants who raised funds via crowdfunding were slightly more likely to have invested in startups outside crowdfunding. Although the correlation is weak, this points to some overlap between entrepreneurs and investors.
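
When both variables are yes/no codes, a row-normalized cross-tabulation states the "more likely" comparison directly as shares. This is a minimal sketch on invented data, assuming both columns are coded 0/1; it is not a reproduction of the survey result.

```python
import pandas as pd

# Hypothetical 0/1 responses (not the survey data).
toy = pd.DataFrame({
    "Raised_CF": [0, 0, 0, 0, 1, 1, 1, 1],
    "Invested_Outside_CF": [0, 1, 1, 0, 1, 1, 1, 0],
})

# Each row sums to 1: the share of non-investors and investors
# within each Raised_CF group.
shares = pd.crosstab(toy["Raised_CF"], toy["Invested_Outside_CF"], normalize="index")
print(shares)
```

Here the column labelled 1 gives the share of each group that invested outside crowdfunding.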


Correlation Heatmap

corr_data = data[vars_to_use].dropna()
corr_matrix = corr_data.corr()

# Draw heatmap, suppress printed array outputs
_ = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", square=True,
                cbar_kws={"shrink": 0.8}, linewidths=.5, mask=np.triu(np.ones_like(corr_matrix, dtype=bool)))
plt.title("Correlation Heatmap of Survey Variables")
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

Narrative:
Education, accreditation, and investing experience are moderately correlated. This suggests experienced investors tend to be better educated and formally qualified, a segment that may benefit from tailored offerings.
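
Calling dropna() before corr() removes every respondent with any missing answer (listwise deletion), whereas corr() on its own uses all rows available for each pair of variables. The sketch below, on a hypothetical frame with one missing Education value, shows that the two approaches can give different coefficients for pairs that do not involve the incomplete column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Education value in the last row.
toy = pd.DataFrame({
    "Education": [1, 2, 3, 4, 5, np.nan],
    "Accredited": [0, 0, 1, 1, 1, 1],
    "Investing_Experience": [1, 2, 2, 4, 5, 5],
})

# Listwise deletion: the last row is dropped for every pair.
listwise = toy.dropna().corr()

# Pairwise deletion: the last row still counts for Accredited vs. experience.
pairwise = toy.corr()

print(listwise.round(3))
print(pairwise.round(3))
```

Listwise deletion keeps every coefficient on the same set of respondents, at the cost of discarding partial responses; which trade-off fits depends on how much data is missing.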


Summary of Findings