An Empirical Risk and Return Analysis of Select Stocks in NASDAQ 100

Stock market indices are considered a powerful economic indicator. These indices can be classified by the methodology used to allocate a weight to each stock and by the rules governing the entry, retention and exit of stocks in the index. This paper presents a descriptive and exploratory analysis of the daily returns of the NASDAQ 100 (^NDX) index and a shortlist of 20 stocks in the index. Random sampling was conducted within sector-level strata of all stocks that make up the index. This approach was followed to avoid selection bias and to ensure that stocks from the various sectors are represented equally in the analysis. R-squared values and correlation coefficients were used to determine, respectively, the explainability of and the relationship between the stock returns and the index returns. The paper applied descriptive univariate analysis to daily returns at the individual stock level and at an aggregated sector level. The inter-relationship between stock and index returns was examined by computing Pearson's correlation coefficient across the different combinations of stock and index return values. Linear regression was carried out to identify how much of the variance in returns is explained between the index and the stocks. All analysis was carried out using Python and the statsmodels library. As anticipated, the returns of the 20 randomly picked stocks explained ~85% of the variance of the index returns. One primary focus of the paper was to explore whether the NASDAQ-100 index explains the variability of technology stocks relatively more than that of stocks from other sectors in its portfolio, owing to the nature of most stocks that make up the index.


Introduction
The National Association of Securities Dealers Automated Quotations (NASDAQ) is a stock exchange based in New York, USA, founded in 1971 (Nasdaq, 2021). Based on the market capitalization of shares traded, it is the world's second largest stock exchange, after the New York Stock Exchange (NYSE). The Nasdaq 100 index (^NDX) is a basket of the 100 largest, most liquid stocks listed on the Nasdaq stock exchange. This index features companies from a variety of industries but, unlike the Nasdaq Composite, excludes financial institutions. The industries represented in the Nasdaq 100 include retail, biotechnology, industrial, technology and healthcare, among others. The index is built using a modified capitalization-weighting approach: the weights of the stocks are based on their market capitalizations, with checks in place to cap the influence of the largest components. This is accomplished through quarterly reviews and weight re-adjustments if the distribution requirements are not met. A large portion of the index covers the technology sector, which accounts for up to 56% of the index's weight, followed by consumer services, healthcare, telecommunications and other industries. In this study, a sample of 20 stocks was selected. In the US markets, stocks are less correlated with the overall market than they are elsewhere, and an ideal number of stocks in a portfolio is around 20 to 30. Existing research revealed that, prior to the emergence of online investing, the number hovered in the range of 20 to 30 (Zaimovic, 2021). The study is relevant given the time frame considered and provides avenues for further research in a post-COVID scenario.
ISSN 1927-5986 E-ISSN 1927-5994

Literature Survey
Despite considerable research in the area, very little of it covers the time frame considered in this paper for NASDAQ-listed stocks. Gautami (2018) studied the fluctuations in share prices of selected Indian companies. Trading stocks provides free float of shares coupled with transparent assessment through stock market transactions. The study explored the risk and return of chosen stocks in India. Risk can be described as the variance in real return, and return can be defined as the increase in the worth of a stock. The profit from an investment portfolio helps an investor assess the monetary performance of the investment. Sushma & Vikas (2019) evaluated the risk and return of eight NSE-listed companies and, as a secondary objective, studied their volatility before and after demonetization. The techniques used were mean, standard deviation, beta, correlation, covariance and the t-test. The analysis used the monthly closing prices of each selected company.
According to Sonia & Ganesh (2021), the Indian financial services industry is diverse. The development of the financial services sector led many investors to redirect their investments towards the stock market. To build an attractive portfolio, the individual investor has to carry out a risk and return analysis well in advance. Balaji (2018) studied the risk and return of selected company stocks in the auto industry and analyzed the performance of five Indian auto giants. The data analyzed for the review span 5 years, i.e., January 1, 2012 to March 31, 2017. The techniques used for the review are risk, return, positioning strategy and graphical strategy.
Modelling daily returns using linear regression requires some basic assumptions about the data to be validated to ensure the reliability of the results. A thorough literature review was carried out to understand previous work on validating the distributional assumptions of daily stock return data. The model proposed by Bachelier (1900) assumes that stock price movements are normally distributed. Osborne (1959) showed that the logarithms of changes in stock prices are mutually independent with a common probability distribution. It was then further suggested that stock prices would need to follow a normal distribution. Mandelbrot (1967) proposed that stock returns follow stable Paretian distributions because of their fatter tails. Fama (1965) further supported this claim by demonstrating the fat tails of stock prices and showing that they have higher peaks than the normal distribution. Praetz (1972) examined weekly data from the Sydney stock exchange over a period of 8 years and concluded that the Student-t distribution can be used as an alternative to explain stock price behavior. Blattberg & Gonedes (1974) used daily and weekly returns of stocks in the Dow Jones Industrial (DJI) and found that the Student-t distribution performs better than the normal distribution, but that normality cannot be rejected for monthly return data. Hagerman (1978) proposed that a mixture of the normal and Student-t distributions can be an alternative for representing the characteristics of stock return data. Borowski (2018), in a study of 65 stock indices, concluded that the distribution of daily returns can be normal only over short time intervals.
A large body of research has sought to understand the distribution of stock returns, which makes it easier to carry out parametric modelling on the underlying data. The time frame, the nature of the economy, the market scenario (bear vs bull) and the interval over which returns are calculated determine the distributional assumptions that can be taken forward. These assumptions are crucial for deciding pricing strategies and for understanding the behavior of stocks and their indices at various levels, which aids decision making on stock entry and exit. A critical understanding of the distribution of stock returns and their correlations with market indices can be used collectively to create a reasonable framework, driven by statistical analysis, for scientific investment.
Patel and Surti (2020) Srivastava (2020) took 10 years of monthly data to relate the directional movement of the FMCG sector and the Nifty 50 index using correlation, regression, and ANOVA. They observed a strong positive correlation between the Nifty 50 and the Nifty FMCG sector, which means that any change in the Nifty 50 index would result in a similar, proportionate change in Nifty FMCG and related FMCG companies. The selected FMCG companies were Procter & Gamble (P&G), ITC, Hindustan Unilever Limited (HUL), and Godrej Consumer Products. The coefficient of correlation between Nifty 50 and Nifty FMCG was found to be 0.94, confirming the strong correlation. A regression with Nifty 50 as the dependent variable and Nifty FMCG as the independent variable was then performed to cross-verify statistical significance. The R-squared value was 0.89 (rounded to 2 digits), which means 89% of the variation in the Nifty FMCG index can be explained by the Nifty 50 index. As a next step, to evaluate the relation between both indexes and each FMCG stock, regression analysis was performed with these companies as dependent variables. The R-squared value for Nifty 50 and P&G was 0.94, which shows a strong relation and indicates that 94% of the variation in the P&G closing price can be explained by the Nifty 50 index. R-squared values for the rest of the FMCG stocks were above 0.8; hence it was concluded that FMCG stocks are strongly correlated with both indexes and that the relationships are statistically significant.
The study by Rane and Gupta (2021) presented the outcome of a regression analysis to predict the relationships between stock prices and various financial ratios for the Nifty Bank index, a sub-index of Nifty 50. The data covered 2010-2019. Financial ratios related to the banking sector, namely the net NPA ratio, capital adequacy ratio (CAR), net interest margin (NIM), return on equity (ROE), earnings per share (EPS), dividend payout ratio, and net profit margin (NPM), were chosen so that they can predict the company's performance. The aim of the study was to develop a panel data regression model rather than a simple linear regression model, owing to the multi-dimensional nature of the data. At a 95% confidence level, four of the seven selected ratios (NPA, NPM, EPS, and ROE) were significant. The adjusted R-squared value of 74.3% indicates that the four significant ratios explain 74.3% of the variation in stock prices. The regression equation showed a negative coefficient for NPA, which means that if the NPA ratio increases (the higher the ratio, the lower the bank's credibility), the stock price goes down.
Srivastava (2020) studied theoretical and empirical relationships between different sectoral indices listed on the National Stock Exchange (NSE). The selected indexes were NIFTY IT, NIFTY Bank, NIFTY Media, NIFTY FMCG, NIFTY Auto, NIFTY Realty, NIFTY Metal, NIFTY Financial Services, and NIFTY Pharma. Weekly data from January 2012 to April 2018 were used for the correlation and regression analysis. Based on the R-squared value, the study categorized the correlation strength among the indices into three levels: weakly, moderately, and strongly correlated. It was found that the various indices are moderately or strongly correlated with each other, except for a few such as NIFTY Metal with NIFTY IT and NIFTY Media, NIFTY Bank with NIFTY Media, and NIFTY Pharma.
Biswas (2018) performed multiple regression analysis, dropping-variable analysis, volatility analysis, and the Granger causality test using the ARCH (autoregressive conditionally heteroscedastic) model. The data covered 2008 to 2018. The objective of the study was to find out how IT (Information Technology) stock prices affect the Nifty 50 index. The multiple regression analysis showed that all IT stocks except Wipro are statistically significant. Even after dropping Wipro from the regression, all remaining IT stocks were highly significant. The Granger causality test, which is used to check the relation between two time series, indicated that IT stocks such as Wipro, TCS, and Infosys do not independently affect the Nifty 50; rather, these stocks are jointly significant for the Nifty 50. The study also found that the Nifty 50 Granger-causes the Tech Mahindra and Wipro stock prices, which means that if the Nifty 50 is affected, both stocks are affected. The third model tested, the ARCH model, indicated that Infosys serves as one of the external causes of the volatility of the Nifty 50, in addition to internal causes.

Methodology - Sampling
Ten years of daily adjusted closing prices (2010-2020) for the selected tickers were queried from the Yahoo Finance API via Python. A shortlist of 20 tickers was drawn independently using a stratified random sampling approach. The sampling was stratified at the sector level to ensure equal representation across the industries represented in the index. The sample was picked from companies listed on the NASDAQ stock exchange that are part of the NASDAQ 100 (^NDX) index. Daily returns were then calculated and analyzed.
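The return calculation can be sketched as follows. This is a minimal illustration that assumes simple (arithmetic) daily returns computed from consecutive adjusted closing prices; the paper does not state whether simple or log returns were used, and both the function name `daily_returns` and the prices below are hypothetical, not data from the study.

```python
def daily_returns(adj_close):
    """Simple daily returns r_t = P_t / P_(t-1) - 1 from a list of adjusted closes."""
    if len(adj_close) < 2:
        return []
    return [curr / prev - 1.0 for prev, curr in zip(adj_close, adj_close[1:])]


# Illustrative prices only (hypothetical, not taken from the study).
prices = [100.0, 102.0, 99.96, 99.96]
returns = daily_returns(prices)
```

A series of n prices yields n - 1 returns, so the first trading day of the window has no return.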
The table below lists the stocks that were selected using stratified random sampling. The stratification was carried out at the sector level. A cut-off was also applied requiring the IPO year to be on or before 2010, to ensure that sufficient 10-year return data could be gathered for the selected stocks. The choice of 20 stocks provides a healthy sample size for a robust study.
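The sampling procedure described in this section can be sketched roughly as below. The sketch assumes an equal number of tickers drawn per sector and a fixed random seed. The function name `stratified_sample`, the placeholder tickers, sectors and IPO years are all hypothetical, and the paper's actual draw and seed are not known.

```python
import random
from collections import defaultdict


def stratified_sample(universe, per_sector, max_ipo_year=2010, seed=0):
    """Draw up to `per_sector` tickers at random from each sector stratum.

    `universe` is a list of (ticker, sector, ipo_year) tuples; tickers with an
    IPO year after `max_ipo_year` are excluded before sampling.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for ticker, sector, ipo_year in universe:
        if ipo_year <= max_ipo_year:
            strata[sector].append(ticker)

    sample = {}
    for sector in sorted(strata):
        eligible = sorted(strata[sector])
        k = min(per_sector, len(eligible))  # small strata contribute what they have
        sample[sector] = rng.sample(eligible, k)
    return sample


# Hypothetical universe: placeholder tickers, sectors and IPO years.
universe = [
    ("AAA", "Information Technology", 1986),
    ("BBB", "Information Technology", 2004),
    ("CCC", "Information Technology", 2012),  # excluded by the IPO cut-off
    ("DDD", "Health Care", 1991),
    ("EEE", "Health Care", 2000),
    ("FFF", "Utilities", 1980),
]
picked = stratified_sample(universe, per_sector=2)
```

Sampling within each stratum, rather than from the pooled list, is what keeps a heavily populated sector from crowding out the smaller ones.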

Research Design - Analysis
Descriptive univariate analysis was carried out on daily returns at the individual stock level and at an aggregated sector level, where the sector return is the average return of the stocks belonging to that industry. The inter-relationship between stock and index returns was examined by computing Pearson's correlation coefficient across the different combinations of stock and index return values. Linear regression was carried out to identify how much of the variance in returns is explained between the index and the stocks. All analysis was carried out using Python and the statsmodels library (Seabold & Perktold, 2010).
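The sector-level aggregation and the univariate summary can be illustrated with the sketch below. It assumes an equal-weighted average of the member stocks' returns on each day and return series that are already aligned on the same trading days. The helper names `sector_average` and `describe`, the tickers and the values are hypothetical, and the set of summary statistics is an assumption rather than the paper's exact list.

```python
from statistics import mean, stdev


def sector_average(returns_by_ticker, sector_of):
    """Average the daily returns of the tickers in each sector, day by day."""
    grouped = {}
    for ticker, series in returns_by_ticker.items():
        grouped.setdefault(sector_of[ticker], []).append(series)
    return {
        sector: [mean(day) for day in zip(*series_list)]
        for sector, series_list in grouped.items()
    }


def describe(series):
    """Basic univariate summary of one return series."""
    return {
        "mean": mean(series),
        "std": stdev(series),  # sample standard deviation
        "min": min(series),
        "max": max(series),
    }


# Hypothetical aligned daily returns for three placeholder tickers.
returns_by_ticker = {
    "AAA": [0.01, -0.02, 0.03],
    "BBB": [0.03, 0.00, -0.01],
    "DDD": [0.00, 0.01, 0.02],
}
sector_of = {"AAA": "IT", "BBB": "IT", "DDD": "HC"}

sector_returns = sector_average(returns_by_ticker, sector_of)
summary = describe(sector_returns["HC"])
```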

Descriptive Analysis
T-Mobile US, Vertex Pharmaceuticals, and Illumina appear to have the largest daily movements over the period 2010 to 2020. There seems to be a signal of stronger positive than negative movement in stock returns.

Correlation between Stock Returns and the Index
The correlation matrix below shows Pearson's correlation coefficient for the different combinations of the ten-year stock and index return values. From the table, it is evident that there is a positive correlation between the returns of the NASDAQ-100 index (^NDX) and the returns of the 20 stocks. Stocks belonging to the information technology and industrial sectors have the strongest positive correlation with the index returns relative to the other industries represented in the sample. There are also signs of high correlation in returns among stocks in the same sector and industry. From the sector-level correlation matrices, it is evident that all the stocks selected from the pool of eligible stocks have a positive correlation with the NASDAQ-100 index. Daily returns of the stocks belonging to the information technology sector have a higher correlation with the daily returns of the index. This agrees with the fact that, like the NASDAQ Composite index, the Nasdaq-100 (^NDX) index is heavily weighted towards technology companies (Tretina, 2021).
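For reference, a pairwise Pearson correlation matrix of the kind discussed here can be computed as in the sketch below. It is written in plain Python to make the formula explicit. The helper names and the return values are hypothetical and are constructed so that one series moves exactly with the index and another exactly against it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (division by zero) for a constant series


def correlation_matrix(series):
    """Nested dict of pairwise correlations, keyed by series name."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical daily returns: AAA is 2x the index, BBB is -1x the index.
series = {
    "^NDX": [0.01, -0.02, 0.015, 0.00],
    "AAA": [0.02, -0.04, 0.03, 0.00],
    "BBB": [-0.01, 0.02, -0.015, 0.00],
}
corr = correlation_matrix(series)
```

The coefficient measures only the strength of the linear co-movement, so a stock that moves twice as much as the index still has a correlation of one with it.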

Regression with Returns of NASDAQ-100 Index (^NDX) as the Dependent Variable
For the regression model, daily returns of the selected stocks were used as independent variables to understand how much of the variance in the index returns can be explained by the returns of the randomly sampled stocks. This helps in understanding the performance of the index and the dependence of its returns on the individual returns of the stocks that it represents.
The regression model below, with an R-squared value of 0.85, implies that the independent variables together can explain ~85% of the variation in the returns of the index in which they are represented. Owing to the nature of the sampling, it is reasonable to say that the index represents the returns of the stocks across all the sectors that it covers. It is important to note that both stocks representing the utilities sector have insignificant p-values/confidence intervals, which is probably due to the under-representation of utilities stocks in the index.
Note: R² is computed without centering (uncentered) since the model does not contain a constant.
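The distinction between uncentered and centered R² can be written out as follows. The notation is an assumption introduced here for illustration: y_t is the index return on day t, x_{jt} is the return of stock j, k is the number of stocks, n is the number of days, and hats denote fitted quantities from the regression without a constant.

```latex
\begin{aligned}
y_t &= \sum_{j=1}^{k} \beta_j x_{jt} + \varepsilon_t,
\qquad
\hat{\varepsilon}_t = y_t - \sum_{j=1}^{k} \hat{\beta}_j x_{jt} \\
R^2_{\text{uncentered}} &= 1 - \frac{\sum_{t} \hat{\varepsilon}_t^{\,2}}{\sum_{t} y_t^{2}} \\
R^2_{\text{centered}} &= 1 - \frac{\sum_{t} \hat{\varepsilon}_t^{\,2}}{\sum_{t} (y_t - \bar{y})^{2}} \\
\sum_{t} y_t^{2} &= \sum_{t} (y_t - \bar{y})^{2} + n\,\bar{y}^{2}
\end{aligned}
```

By the last identity, the uncentered denominator is never smaller than the centered one, so for the same residuals the uncentered R² is at least as large as the centered value. When the mean daily return is close to zero the two nearly coincide.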

Regression with the Returns of the Stocks Averaged at the Sector Level as the Dependent Variable
For the regression model, returns of the index (^NDX) were used as the independent variable to understand how much of the variance in the daily returns of the stocks, averaged at the sector level, can be explained by the daily returns of the index. This is to understand the extent to which the daily returns of the index represent the daily returns of the sectors (for the stocks captured in the index). Below are the results of the stepwise linear regression carried out with the ten-year daily returns of the NASDAQ-100 index as the independent variable and the ten-year daily returns of the shortlisted stocks, averaged at the sector level, as the dependent variable. The returns of the index explain the average returns of the stocks in the industrial and information technology sectors more than those of stocks in other sectors.
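A single-regressor version of this fit, with no constant term, can be sketched as below. It is a plain least-squares fit through the origin, not the stepwise procedure used in the paper, and it reports the uncentered R², which is the quantity statsmodels reports when `OLS` is fitted without an added constant. The function name `ols_no_constant` and the return values are hypothetical.

```python
def ols_no_constant(x, y):
    """Fit y = b * x with no intercept; return (b, uncentered R-squared)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain a non-zero value")
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    ssr = sum((b - slope * a) ** 2 for a, b in zip(x, y))  # residual sum of squares
    return slope, 1.0 - ssr / syy  # uncentered: total sum of squares is sum(y^2)


# Hypothetical daily returns: index (regressor) and one sector average (response).
index_returns = [0.01, -0.02, 0.015, 0.00]
sector_returns = [0.012, -0.018, 0.02, 0.001]

slope, r2_uncentered = ols_no_constant(index_returns, sector_returns)
```

Running the same function once per sector average gives one slope and one R² per sector, which is the comparison the paragraph above draws between sectors.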

OLS Regression Results (five tables), each carrying the following notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Conclusion
The objective of the report was to understand the relationship between the daily returns of the index and the daily returns of the stocks that it represents. The results indicate that, owing to the way in which the index is constructed, it fulfills the role of encapsulating the performance of the cohort of stocks it represents.
In the case of an index like the NASDAQ-100, which is meant to represent the stocks of companies belonging to a broader set of sectors, the evidence above shows that the returns of the index tend towards the performance of the stocks belonging to the sector that is most strongly represented. A large portion of the index covers the technology sector, accounting for up to 56% of the index's weight, which is evident in the relatively higher R-squared values between the returns of the index (^NDX) and the aggregated returns of the information technology sector. The strong relationship of the index returns with the averaged sector returns of the industrial and information technology sectors is also backed by the relatively strong correlation between the daily returns of the index and the daily returns of the shortlisted stocks. When the daily returns of the stocks represented in the index (^NDX) are modelled using the daily returns of the index, the index returns explain relatively more of the variability of the averaged returns of stocks in the industrial and information technology sectors than of stocks in other sectors. The above analysis confirms the strong relation between the performance of the index and the performance of the individual stocks it represents. This characteristic of the index is what allows investors and economists to rely on it heavily for decision making.