
These are two group projects that I contributed to as part of my coursework.
For the wildfire analysis project, we used R and SQL to manage a dataset of more than 1.8 million records of fires in the US. A wide variety of factors contribute to the occurrence of wildfires. Using data visualizations built with both base R and the ggplot2 package, our analysis showed that wildfires have become increasingly common. We did not find a significant correlation between wildfire occurrence and most of the meteorological variables we examined. Many causes of wildfires, such as arson and campfires, cannot be predicted in advance, so they would not be expected to correlate with broad meteorological variables. Still, as global temperatures continue to rise, more research is needed to help prevent future damage.
For the used cars project, we built three linear regression models to predict used car prices and identify the factors that contribute to them, using a Kaggle used car sales dataset containing 9 numerical variables and 10 categorical variables. We first built a model manually, drawing on knowledge of used car sales and on diagnostic plots and graphs of the dataset, such as the correlation matrix. We then generated several additional models automatically using AIC- and BIC-based selection. All diagnostic plots (residuals vs. fitted, Q-Q, residuals vs. leverage, and scale-location) looked reasonable, suggesting that our model was valid and satisfied the multiple linear regression (MLR) assumptions.
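As a rough illustration of how AIC and BIC can be used to compare candidate regression models, here is a minimal Python sketch (the project itself was done in R). The toy data, the variable names `mileage` and `price`, and the helper functions are all invented for illustration; none of them come from the Kaggle dataset or from our actual analysis. The criteria are written in the common least-squares form, which omits additive constants that are identical for every model fit to the same data and so do not affect the comparison.

```python
import math


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(rss_value, n, k):
    """AIC up to an additive constant; k = number of estimated coefficients."""
    return n * math.log(rss_value / n) + 2 * k


def bic(rss_value, n, k):
    """BIC up to an additive constant; penalizes each coefficient by log(n)."""
    return n * math.log(rss_value / n) + k * math.log(n)


# Hypothetical toy data (NOT from the real dataset):
# mileage in thousands of miles, price in thousands of dollars.
mileage = [20, 35, 50, 65, 80, 95, 110, 125]
price = [21.5, 19.8, 17.9, 16.4, 14.1, 13.0, 11.2, 9.6]
n = len(price)

# Candidate 1: intercept-only model (predict the mean price every time).
mean_price = sum(price) / n
rss_null = rss(price, [mean_price] * n)

# Candidate 2: price ~ mileage.
b0, b1 = fit_simple_ols(mileage, price)
rss_mileage = rss(price, [b0 + b1 * m for m in mileage])

scores = {
    "intercept only": (aic(rss_null, n, 1), bic(rss_null, n, 1)),
    "price ~ mileage": (aic(rss_mileage, n, 2), bic(rss_mileage, n, 2)),
}

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.2f}, BIC={b:.2f}")
print("Best by AIC:", best_by_aic)
print("Best by BIC:", best_by_bic)
```

Because BIC's penalty per coefficient is log(n) rather than 2, it tends to favor smaller models than AIC once n exceeds about 7, which is one reason the two criteria can select different models on the same data.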
