This page describes the technical side of data collection and illustrates it with code. To investigate the impact of cryptocurrency market volatility on the traditional currency market, we need price and volume data for major cryptocurrencies and foreign currencies, each quoted against the U.S. dollar, over a fixed time period.
Methods
The data collection process for this project employed a combination of programmatic crawling and open data interfaces. The historical cryptocurrency and FX data were obtained from Yahoo Finance.
Data Sources
Yahoo Finance: A widely used global financial data platform, Yahoo Finance offers an extensive repository of historical price data covering both the cryptocurrency and foreign exchange markets. Website: https://finance.yahoo.com/
Cryptocurrency Data Collection Methodology
The 15 cryptocurrencies with the highest market cap were selected for analysis. The historical data for these cryptocurrencies against the US dollar (e.g., “BTC-USD”) is accessible directly through the yfinance library.
The yfinance.download() function retrieves daily data from 2022-04-01 to 2024-01-01 for each cryptocurrency. The downloaded data set comprises the opening price (Open), the high price (High), the low price (Low), the closing price (Close), the adjusted closing price (Adj Close), and the trading volume (Volume).
Subsequently, the data for each cryptocurrency is stored separately as comma-separated values (CSV) files in the “data/raw-data/” folder.
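The download step described above can be sketched as a loop that also records which symbols came back with no rows, since a failed or unavailable ticker may yield an empty table rather than an error. This is a minimal sketch and not the project's code: `fetch_all` and its `download` parameter are hypothetical names introduced here, and the downloader is passed in as an argument so that the loop can be tried without network access.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_all(
    symbols: Iterable[str],
    start: str,
    end: str,
    download: Callable[..., object],
) -> Tuple[Dict[str, object], List[str]]:
    """Download each symbol and return (data_by_symbol, symbols_with_no_rows).

    `download` is any callable accepting (symbol, start=..., end=...);
    this helper is hypothetical and only illustrates the loop.
    """
    data: Dict[str, object] = {}
    empty: List[str] = []
    for symbol in symbols:
        table = download(symbol, start=start, end=end)
        # len() works for pandas DataFrames and for plain sequences alike.
        if table is None or len(table) == 0:
            empty.append(symbol)
        else:
            data[symbol] = table
    return data, empty
```

With yfinance installed, a call might look like `data, empty = fetch_all(crypto_list, "2022-04-01", "2024-01-01", download=yf.download)`; any symbols listed in `empty` would then need a closer look before the data cleaning stage.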
Forex Data Collection Methodology
For foreign exchange, we focus on exchange rate pairs between the US dollar and major global currencies, together with a selection of emerging market currencies. A total of 16 currency pairs were selected so that the study covers the major economies and a diverse set of regions.
Daily exchange rate data was obtained from 2022-04-01 to 2024-01-01 through the use of the yfinance tool. The retrieval of data is consistent with that of cryptocurrencies, and the data for each currency pair also contains the following information: open, high, low, close, and adjusted close.
Subsequently, the data for each currency pair is stored separately as CSV files in the “data/raw-data/” folder.
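One detail worth checking when working with these pairs is the quote direction. As far as we understand Yahoo Finance's symbol convention, a six-letter symbol such as "EURUSD=X" gives the price of the first currency in units of the second, while a bare three-letter symbol such as "JPY=X" gives the price of one US dollar in that currency. The sketch below encodes that assumption; `parse_fx_symbol` and `usd_is_base` are hypothetical helper names, not part of yfinance.

```python
from typing import Tuple


def parse_fx_symbol(symbol: str) -> Tuple[str, str]:
    """Return (base, quote) for a Yahoo-style FX symbol such as 'EURUSD=X'."""
    if not symbol.endswith("=X"):
        raise ValueError(f"not an FX symbol: {symbol!r}")
    code = symbol[:-2]
    if len(code) == 6:
        # Six letters: base currency followed by quote currency.
        return code[:3], code[3:]
    if len(code) == 3:
        # Assumption: a bare three-letter code is quoted per one US dollar.
        return "USD", code
    raise ValueError(f"unexpected FX symbol format: {symbol!r}")


def usd_is_base(symbol: str) -> bool:
    """True when the series is expressed as foreign currency per US dollar."""
    base, _ = parse_fx_symbol(symbol)
    return base == "USD"
```

Under this reading, four of the 16 pairs (EUR, GBP, AUD, NZD) are expressed as US dollars per unit of foreign currency and the remaining twelve as foreign currency per US dollar, so one group may need to be inverted before the series are compared directly.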
Tools and Reproducibility
The data was collected and previewed using Python and its data science tools, namely the data analysis library pandas and the plotting library matplotlib. The core tool for this data collection phase was the Python library yfinance.
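Because the output of a data library can change between releases, it may help reproducibility to record the installed versions of these tools alongside the raw data. The following is a minimal sketch using only the standard library; `package_versions` is a hypothetical helper name, and the package list simply mirrors the tools named above.

```python
from importlib import metadata
from typing import Dict, Iterable


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    versions: Dict[str, str] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one line per tool used in the data collection phase.
    for name, version in package_versions(["yfinance", "pandas", "matplotlib"]).items():
        print(f"{name}: {version}")
```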
Code
In the following code, we first use the yfinance library to download historical price data for both cryptocurrencies and foreign exchange pairs directly from Yahoo Finance. A simple loop fetches daily data for each specified symbol over the given time range. Once the data is retrieved, each dataset is stored in a separate CSV file.
# Import packages
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import os

# Define the function of getting crypto data
def get_crypto_data(crypto_symbol, start_date, end_date):
    crypto_data = yf.download(crypto_symbol, start=start_date, end=end_date)
    return crypto_data

# Params
crypto_list = ["BTC-USD", "ETH-USD", "XRP-USD", "USDT-USD", "SOL-USD",
               "BNB-USD", "DOGE-USD", "ADA-USD", "USDC-USD", "STETH-USD",
               "WTRX-USD", "TRX-USD", "AVAX-USD", "WSTETH-USD", "TON11419-USD"]
start_date = "2022-04-01"
end_date = "2024-01-01"

# Get data
crypto_data_dict = {}
for symbol in crypto_list:
    crypto_data = get_crypto_data(symbol, start_date, end_date)
    crypto_data_dict[symbol] = crypto_data

    # Preview data
    print(f"----- {symbol} -----")
    print(crypto_data.head())
    print("---------------------\n")

# Save to csv
for symbol, df in crypto_data_dict.items():
    filename = f"{symbol}.csv"
    filepath = os.path.join("../../data/raw-data/", filename)
    df.to_csv(filepath, index=True)
[*********************100%***********************] 1 of 1 completed
# Define function of getting fx data
def get_fx_data(symbol, start_date, end_date):
    fx_data = yf.download(symbol, start=start_date, end=end_date)
    return fx_data

# Params
currency_pair_list = ["EURUSD=X", "JPY=X", "GBPUSD=X", "AUDUSD=X",
                      "NZDUSD=X", "CNY=X", "HKD=X", "SGD=X",
                      "INR=X", "MXN=X", "PHP=X", "IDR=X",
                      "THB=X", "MYR=X", "ZAR=X", "RUB=X"]
start_date = "2022-04-01"
end_date = "2024-01-01"

# Get data
fx_data_dict = {}
for symbol in currency_pair_list:
    fx_data = get_fx_data(symbol, start_date, end_date)
    fx_data_dict[symbol] = fx_data

    # Preview data
    print(f"----- {symbol} -----")
    print(fx_data.head())
    print("---------------------\n")

# Save to csv
for symbol, df in fx_data_dict.items():
    filename = f"{symbol}.csv"
    filepath = os.path.join("../../data/raw-data/", filename)
    df.to_csv(filepath, index=True)
[*********************100%***********************] 1 of 1 completed
In this phase of the data collection process, we used the yfinance Python library to download historical price data for a range of cryptocurrencies and forex pairs. We aimed to keep the methodology and process transparent and repeatable.
Technical Challenges
It is essential to keep the time span and data frequency uniform across the cryptocurrency and forex data. Moreover, some cryptocurrencies and currency pairs may exhibit gaps or unstable data. For example, some cryptocurrencies with a high market cap may have launched after April 1, 2022, so their series start later than the others. This requires particular attention during the subsequent data cleaning process.
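A quick way to surface these issues before cleaning is to summarise the date coverage of each series and to find the dates that all series share. Cryptocurrencies trade every day, whereas forex series generally have no weekend rows, so row counts are expected to differ. The sketch below assumes each dataset is a pandas DataFrame indexed by date, as in the dictionaries built in the code above; `coverage_report` and `common_dates` are hypothetical helper names.

```python
import pandas as pd


def coverage_report(frames: dict, start: str) -> pd.DataFrame:
    """Summarise first date, last date and row count for each symbol."""
    rows = []
    for symbol, df in frames.items():
        index = pd.DatetimeIndex(df.index)
        rows.append({
            "symbol": symbol,
            "first_date": index.min(),
            "last_date": index.max(),
            "rows": len(index),
            # Flags series whose first observation is after the study start.
            "starts_late": index.min() > pd.Timestamp(start),
        })
    return pd.DataFrame(rows).set_index("symbol")


def common_dates(frames: dict) -> pd.DatetimeIndex:
    """Return the dates that are present in every series."""
    indexes = [pd.DatetimeIndex(df.index) for df in frames.values()]
    common = indexes[0]
    for index in indexes[1:]:
        common = common.intersection(index)
    return common
```

For example, `coverage_report(crypto_data_dict, "2022-04-01")` would list any cryptocurrency whose history begins after the start of the study window, and `common_dates` applied to the combined crypto and forex dictionaries would give one possible shared calendar for later comparison. Whether to restrict to shared dates or to fill gaps is a modelling decision left to the cleaning stage.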
Conclusions and Future Work
This phase establishes the groundwork for subsequent data cleaning, EDA, modeling, and machine learning analysis. The next step, data cleaning, must produce a clean and consistent data structure and format. Analysis strategies can then be adapted, and new data science questions formulated, based on the initial findings.