This page describes the Exploratory Data Analysis (EDA) carried out for this project and illustrates its technical details. It gives an overview of the data, helps identify patterns and potential issues, and prepares for subsequent modeling and analysis.
Univariate Analysis
Numerical Variables: Averaging the daily and weekly returns across all cryptocurrencies gives an overall market trend. For daily data, the mean return is 0.000346, and the distribution is approximately normal. The mean daily percent change in the foreign exchange market is 0.001834, with a right-skewed distribution. For weekly data, the mean return is 0.002627, with a left-skewed distribution, and the mean weekly FX percent change is -0.000531, with a right-skewed distribution. Weekly FX changes span a narrower range and are more stable than daily FX changes (standard deviation 0.008738 versus 0.041633), whereas weekly crypto returns are more dispersed than daily ones (0.071939 versus 0.027158). The weekly distributions also exhibit more pronounced skewness.
# Import necessary packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load cleaned daily datasets
crypto_returns = pd.read_csv('../../data/processed-data/crypto_returns_cleaned.csv')
fx_rates = pd.read_csv('../../data/processed-data/fx_rates_cleaned.csv')

# Add average rate columns (row-wise mean over every column after the first)
crypto_returns['Average_Crypto_Return'] = crypto_returns.iloc[:, 1:].mean(axis=1, skipna=True)
fx_rates['Average_FX_Change'] = fx_rates.iloc[:, 1:].mean(axis=1, skipna=True)

# Function for univariate numerical variable summary and visualization
def univariate_analysis(df, column_name, title):
    # Summary statistics
    summary_stats = df[column_name].describe()
    print(f"Summary Statistics for {column_name}:\n", summary_stats, "\n")

    # Visualization: histogram with a kernel density estimate
    plt.figure(figsize=(10, 6))
    sns.histplot(df[column_name], kde=True, bins=200, color='skyblue')
    plt.title(f"Distribution of {title}")
    plt.xlabel(title)
    plt.ylabel("Frequency")
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

univariate_analysis(crypto_returns, 'Average_Crypto_Return', "Average Crypto Return Rate")
univariate_analysis(fx_rates, 'Average_FX_Change', "Average FX Percent Change")
Summary Statistics for Average_Crypto_Return:
count 639.000000
mean 0.000346
std 0.027158
min -0.152892
25% -0.010888
50% 0.001190
75% 0.013321
max 0.142093
Name: Average_Crypto_Return, dtype: float64
Summary Statistics for Average_FX_Change:
count 455.000000
mean 0.001834
std 0.041633
min -0.055514
25% -0.002640
50% -0.000204
75% 0.002224
max 0.878334
Name: Average_FX_Change, dtype: float64
# Load cleaned weekly datasets
crypto_returns_w = pd.read_csv('../../data/processed-data/weekly_crypto_returns.csv')
fx_rates_w = pd.read_csv('../../data/processed-data/weekly_fx_rates.csv')

# Add average rate columns (row-wise mean over every column after the first)
crypto_returns_w['Average_Crypto_Return'] = crypto_returns_w.iloc[:, 1:].mean(axis=1, skipna=True)
fx_rates_w['Average_FX_Change'] = fx_rates_w.iloc[:, 1:].mean(axis=1, skipna=True)

# Function for univariate numerical variable summary and visualization
# (same as the daily version, but with 30 bins for the smaller weekly sample)
def univariate_analysis(df, column_name, title):
    # Summary statistics
    summary_stats = df[column_name].describe()
    print(f"Summary Statistics for {column_name}:\n", summary_stats, "\n")

    # Visualization: histogram with a kernel density estimate
    plt.figure(figsize=(10, 6))
    sns.histplot(df[column_name], kde=True, bins=30, color='skyblue')
    plt.title(f"Distribution of {title}")
    plt.xlabel(title)
    plt.ylabel("Frequency")
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

univariate_analysis(crypto_returns_w, 'Average_Crypto_Return', "Average Crypto Return Rate")
univariate_analysis(fx_rates_w, 'Average_FX_Change', "Average FX Percent Change")
Summary Statistics for Average_Crypto_Return:
count 91.000000
mean 0.002627
std 0.071939
min -0.228304
25% -0.034751
50% 0.004806
75% 0.043309
max 0.162050
Name: Average_Crypto_Return, dtype: float64
Summary Statistics for Average_FX_Change:
count 91.000000
mean -0.000531
std 0.008738
min -0.018085
25% -0.005952
50% -0.001914
75% 0.004410
max 0.032417
Name: Average_FX_Change, dtype: float64
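The skewness described for the numerical variables can also be quantified rather than read off the histograms. The following is a minimal sketch, not part of the original analysis: `skew_summary` is a hypothetical helper, and the data is synthetic stand-in data rather than the project's returns.

```python
import numpy as np
import pandas as pd

def skew_summary(series: pd.Series) -> dict:
    """Return mean, median, and sample skewness for one numeric column."""
    clean = series.dropna()
    return {
        "mean": clean.mean(),
        "median": clean.median(),
        "skewness": clean.skew(),  # > 0: right-skewed, < 0: left-skewed
    }

# Synthetic stand-in data (NOT the project's data): a symmetric series
# plus one large positive outlier, which pushes the skewness above zero.
rng = np.random.default_rng(0)
demo = pd.Series(np.append(rng.normal(0.0, 0.01, 500), 0.9))
result = skew_summary(demo)
print(result)
```

Under these assumptions, the same helper could be applied to columns such as `Average_Crypto_Return` or `Average_FX_Change` to check the direction of skew reported above.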
Categorical Variables: The mean daily and weekly returns and percent changes are classified into positive, negative, and neutral market conditions, and pie charts illustrate how often each market rises and falls. According to the charts, both the cryptocurrency market and the forex market show a slight tendency to appreciate, and on a weekly basis the share of rises in both markets is slightly higher.
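The classification step is not shown on this page, so the following is only a sketch of one way it could be done. It assumes the `Rate_Category` column holds the sign of the average value (1 for a rise, -1 for a fall, 0 for neutral); the helper name `add_rate_category` and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd

def add_rate_category(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Label each row 1 (rise), -1 (fall), or 0 (neutral) by the sign of value_col."""
    out = df.copy()
    # Assumption: missing values are treated as neutral (0)
    out["Rate_Category"] = np.sign(out[value_col].fillna(0)).astype(int)
    return out

# Hypothetical demo values, not taken from the project's data
demo = pd.DataFrame({"Average_Crypto_Return": [0.01, -0.02, 0.0, 0.003]})
demo_labeled = add_rate_category(demo, "Average_Crypto_Return")

# Share of each category, i.e. the quantity a pie chart would display
shares = demo_labeled["Rate_Category"].value_counts(normalize=True)
print(demo_labeled)
print(shares)
```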
The daily (and weekly) series of 15 cryptocurrencies and 16 foreign exchange instruments were merged into a single data frame, and a correlation analysis was conducted. As the heatmap shows, most cryptocurrencies are positively correlated with one another in returns, and the same holds among the FX series, with the ruble being the exception. Correlations between individual cryptocurrencies and individual foreign currencies may be positive or negative, but they are relatively weak. In the weekly data, correlations both within each market and between the two markets become more pronounced, and the overall color of the heatmap deepens. This may indicate that weekly data will be better suited to detecting patterns in the subsequent analysis.
# Combine the two daily data frames on Date
full_combined_data = pd.merge(
    crypto_returns.drop(columns=['Average_Crypto_Return', 'Rate_Category']),
    fx_rates.drop(columns=['Average_FX_Change', 'Rate_Category']),
    on='Date'
)

# Calculate correlation
full_correlation_matrix = full_combined_data.drop(columns=['Date']).corr()

# Visualization
plt.figure(figsize=(16, 12))
sns.heatmap(full_correlation_matrix, annot=True, cmap='coolwarm', center=0,
            vmin=-1, vmax=1, annot_kws={"size": 6})
plt.title("Correlation Matrix of All Cryptos and FX Pairs")
plt.show()
# Combine the two weekly data frames on Date
full_combined_data_w = pd.merge(
    crypto_returns_w.drop(columns=['Average_Crypto_Return', 'Rate_Category']),
    fx_rates_w.drop(columns=['Average_FX_Change', 'Rate_Category']),
    on='Date'
)

# Calculate correlation
full_correlation_matrix_w = full_combined_data_w.drop(columns=['Date']).corr()

# Visualization
plt.figure(figsize=(16, 12))
sns.heatmap(full_correlation_matrix_w, annot=True, cmap='coolwarm', center=0,
            vmin=-1, vmax=1, annot_kws={"size": 6})
plt.title("Correlation Matrix of All Cryptos and FX Pairs")
plt.show()
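With 31 series the full heatmap is dense, so it can help to isolate the crypto-versus-FX block of the correlation matrix and rank the pairs by absolute correlation. The sketch below is illustrative only: `cross_market_corr` is a hypothetical helper, the column names `BTC`, `ETH`, `EUR`, and `JPY` are placeholders, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def cross_market_corr(df, crypto_cols, fx_cols):
    """Return only the crypto-by-FX block of the full correlation matrix."""
    corr = df[crypto_cols + fx_cols].corr()
    return corr.loc[crypto_cols, fx_cols]

# Synthetic stand-in data: two "crypto" series share a common factor,
# two "FX" series are independent noise.
rng = np.random.default_rng(1)
n = 200
common = rng.normal(size=n)
demo = pd.DataFrame({
    "BTC": common + rng.normal(scale=0.5, size=n),
    "ETH": common + rng.normal(scale=0.5, size=n),
    "EUR": rng.normal(size=n),
    "JPY": rng.normal(size=n),
})

block = cross_market_corr(demo, ["BTC", "ETH"], ["EUR", "JPY"])

# Rank cross-market pairs from strongest to weakest absolute correlation
strongest = block.abs().unstack().sort_values(ascending=False)
print(block)
print(strongest)
```

Applied to the merged daily and weekly frames, the same slicing would make it easier to compare cross-market correlations between the two frequencies.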
Chi-square Test
A cross-tabulation is created from the daily (and weekly) rate categories of cryptocurrencies and foreign exchange. A chi-square test of independence is then performed to determine whether the up and down movements of the two markets are significantly associated under a simple binary classification.
For daily data, the chi-square test yielded a statistic of 0.022 with a p-value of 0.882, so there is no statistically significant association between the two markets with regard to daily up and down events. Further investigation is still required to determine whether a relationship exists between the two along more complex feature dimensions.
For weekly data, the chi-square test yielded a statistic of 9.536 with a p-value of 0.002, so we reject the null hypothesis that the ups and downs of the cryptocurrency market are independent of those of the forex market. Because a chi-square test only shows an overall association, any causal relationship or effect size requires further analysis and modeling.
# Import needed library
from scipy.stats import chi2_contingency

# Create crosstab of daily rate categories
# Note: pd.crosstab pairs the two Series by row index, not by Date
category_crosstab = pd.crosstab(
    crypto_returns['Rate_Category'],
    fx_rates['Rate_Category'],
    rownames=['Crypto Returns'],
    colnames=['FX Changes'],
    dropna=False
)

# Chi-square test (store and print the result so it is not silently discarded)
print("\nObserved Frequency Table:")
print(category_crosstab)
chi2, p_value, dof, expected = chi2_contingency(category_crosstab)
print(f"Chi-square: {chi2:.3f}, p-value: {p_value:.3f}, dof: {dof}")

# Visualization
plt.figure(figsize=(8, 6))
sns.heatmap(category_crosstab, annot=True, cmap="Blues", fmt='d', cbar=False)
plt.title("Observed Frequency Table: Crypto vs. FX Market Movements")
plt.xlabel("FX Market Changes")
plt.ylabel("Crypto Market Changes")
plt.show()
Observed Frequency Table:
FX Changes       -1    1
Crypto Returns
-1              113  101
 1              130  111
# Create crosstab of weekly rate categories
# Note: pd.crosstab pairs the two Series by row index, not by Date
category_crosstab_w = pd.crosstab(
    crypto_returns_w['Rate_Category'],
    fx_rates_w['Rate_Category'],
    rownames=['Crypto Returns'],
    colnames=['FX Changes'],
    dropna=False
)

# Chi-square test (store and print the result so it is not silently discarded)
print("\nObserved Frequency Table:")
print(category_crosstab_w)
chi2_w, p_value_w, dof_w, expected_w = chi2_contingency(category_crosstab_w)
print(f"Chi-square: {chi2_w:.3f}, p-value: {p_value_w:.3f}, dof: {dof_w}")

# Visualization
plt.figure(figsize=(8, 6))
sns.heatmap(category_crosstab_w, annot=True, cmap="Blues", fmt='d', cbar=False)
plt.title("Observed Frequency Table: Crypto vs. FX Market Movements")
plt.xlabel("FX Market Changes")
plt.ylabel("Crypto Market Changes")
plt.show()
Observed Frequency Table:
FX Changes      -1   1
Crypto Returns
-1              30   9
 1              22  30
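The reported test statistics can be checked directly from the two observed frequency tables. The sketch below re-enters the counts by hand and relies on the default behavior of `scipy.stats.chi2_contingency`, which applies Yates' continuity correction to 2x2 tables; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the frequency tables above
# rows: crypto -1, 1; columns: FX -1, 1
tables = {
    "daily": np.array([[113, 101], [130, 111]]),
    "weekly": np.array([[30, 9], [22, 30]]),
}

results = {}
for label, table in tables.items():
    # Default correction=True applies Yates' correction when dof == 1
    chi2, p_value, dof, expected = chi2_contingency(table)
    results[label] = (chi2, p_value, dof)
    print(f"{label}: chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

Run as is, this should give statistics close to the values quoted in the text (about 0.022 for daily and 9.536 for weekly data), each with one degree of freedom.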
Feature Pairing
To provide further support for the conclusion of the chi-square test, we compare the average daily cryptocurrency return with the average daily FX percent change, using a scatterplot with a trendline to illustrate the relationship between the two. Both markets are concentrated around 0. The forex market is more stable, while the cryptocurrency market is more volatile. The two markets do not exhibit a discernible linear relationship.
We applied the same processing to the weekly data. Although it shows a somewhat stronger linear relationship than the daily data, the relationship is still not clear-cut and needs to be tested further in the subsequent analysis.
# Feature Pairing (daily)
plt.figure(figsize=(10, 6))
combined_data = pd.merge(crypto_returns, fx_rates, on='Date')
sns.regplot(
    data=combined_data,
    x='Average_Crypto_Return',
    y='Average_FX_Change',
    scatter_kws={'alpha': 0.5},
    line_kws={"color": "red"}
)
# Restrict the y-axis; points outside this range are not displayed
plt.ylim(-0.075, 0.075)
plt.title("Relationship Between Average Crypto Return and FX Change")
plt.xlabel("Average Crypto Return Rate")
plt.ylabel("Average FX Percent Change")
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()
# Feature Pairing (weekly)
plt.figure(figsize=(10, 6))
combined_data_w = pd.merge(crypto_returns_w, fx_rates_w, on='Date')
sns.regplot(
    data=combined_data_w,
    x='Average_Crypto_Return',
    y='Average_FX_Change',
    scatter_kws={'alpha': 0.5},
    line_kws={"color": "red"}
)
# Restrict the y-axis; points outside this range are not displayed
plt.ylim(-0.075, 0.075)
plt.title("Relationship Between Average Crypto Return and FX Change")
plt.xlabel("Average Crypto Return Rate")
plt.ylabel("Average FX Percent Change")
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()
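The visual impression from the scatterplots can be complemented with a Pearson correlation test, which gives a coefficient and a p-value for the linear relationship. This is a sketch on synthetic data, not a result from the project; `linear_association` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import pearsonr

def linear_association(x, y):
    """Pearson r and two-sided p-value, ignoring pairs with missing values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = ~(np.isnan(x) | np.isnan(y))
    r, p_value = pearsonr(x[mask], y[mask])
    return float(r), float(p_value)

# Synthetic stand-in data (NOT the project's data)
rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.03, 300)
y_unrelated = rng.normal(0.0, 0.005, 300)         # generated independently of x
y_linear = 2.0 * x + rng.normal(0.0, 0.005, 300)  # strongly linear in x

r_unrelated, p_unrelated = linear_association(x, y_unrelated)
r_linear, p_linear = linear_association(x, y_linear)
print(f"unrelated: r={r_unrelated:.3f}, p={p_unrelated:.3f}")
print(f"linear:    r={r_linear:.3f}, p={p_linear:.3g}")
```

On the merged frames, the inputs would be the `Average_Crypto_Return` and `Average_FX_Change` columns. Pearson's r is sensitive to extreme values, so the large FX outlier could influence the daily result.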
Summary
Key Findings
The forex market is, on the whole, more stable than the cryptocurrency market, which displays greater volatility.
There is a statistically significant association between the weekly ups and downs of the two markets (chi-square p-value of 0.002), but not between the daily ones.
A significant linear relationship between average daily or weekly cryptocurrency returns and average FX percent changes is not evident.
Individual outliers (e.g., a data point with a percent change in excess of 0.8 in the daily foreign exchange data) are retained for subsequent in-depth analysis.
Next Steps
Feature Selection: Ascertain how to generate more informative features, such as those pertaining to volatility classification.
Preparation for Modelling: Clustering or dimensionality reduction methods (e.g., PCA, t-SNE) can be employed to facilitate the exploration of patterns across markets.
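As an illustration of the dimensionality reduction step mentioned above, the following sketch computes principal components with NumPy's SVD on standardized columns. It is a minimal example under stated assumptions: `pca_svd` is a hypothetical helper, the input is synthetic data driven by one common factor, and a library implementation could be used instead.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA via SVD on standardized columns.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # center each column
    Xs = Xc / Xc.std(axis=0, ddof=1)   # scale each column to unit variance
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic stand-in data: six series sharing one common factor plus noise
rng = np.random.default_rng(3)
factor = rng.normal(size=(300, 1))
X_demo = factor + rng.normal(scale=0.5, size=(300, 6))

scores, components, explained = pca_svd(X_demo, n_components=2)
print("explained variance ratio:", np.round(explained, 3))
```

If the weekly return columns were passed in place of `X_demo`, a large first component would suggest a shared market factor, which is one pattern the correlation heatmaps hint at.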