Selected Code

Below are some examples of the work I’ve done for the Long Term Ecological Research (LTER) Network synthesis working groups. I get to work on a variety of tasks as a data analyst for the LTER Network Office, which allows me to broaden my flexible skill set.

Data Wrangling 🛠

The Plant Reproduction working group wanted plant trait data to research why some plants produce large amounts of seeds every few years. To assist this group, my team and I retrieved data from the TRY Plant Trait Database, which included traits such as plant seed mass, lifespan, flowering season, and seed persistence. Then we wrangled the TRY data so that we can combine it with a larger data set of plant traits that the working group already had. After much wrangling, we refined the integrated master data set to include 56 total variables for over 100 species. This data set was then used for further downstream analyses.

Links: GitHub repo

Exploratory Graphing 📈

The Plant Reproduction working group requested me to help them explore a potential analysis for one of their manuscripts. They were interested in a time series plot showing the total seed mass production in grams per year at specific plots at a site. I manipulated the columns in the aforementioned integrated master data set to calculate the grams of seed per species per year. Then I graphed the time series, with a separate panel for each plot.

Links: GitHub repo

Spatial Data 🌎

The Silica Export working group wanted to investigate drivers of riverine silicon exports and they requested for my team and I to identify and extract from various spatial data sets. In order to accomplish this, we first searched online for the data sets that best suited our needs. Then we gathered watershed shapefiles and extracted the spatial data within each watershed. Finally, we summarized the extracted values and exported them in a harmonized format that was easy to use for downstream analyses.

Links: GitHub repo

Text Mining 📑

The Ecosystem Transitions working group needed to review over 3000 papers in order to prepare a meta-analysis on ecosystem transitions. They split the reading assignments between their group members, but to speed the process along, they requested for me to find a way to quickly decide whether a paper is worth reading or not.

So I created a script that filters and ranks the abstracts and titles based on positive and negative keywords. The more positive keywords an abstract and its associated title have, the more likely it was for the group to include the full paper in their meta-analysis. On the other hand, if a paper’s abstract/title has more negative keywords, it means that the group will probably not be interested in this paper.

After discussing with the group, we decided that it would be helpful for the group members to see how many negative keywords were in each paper’s abstract and title. So I added a column to each person’s reading assignment list that shows the count of negative keywords for each abstract and title. That way, they could prioritize reading the abstracts that have the least negative keywords and save time by discarding the ones that have the most negative keywords.

In the end, the group was able to use my work to decide on the 700 or so papers that will be included in round 2 of the meta-analysis.

Links: GitHub repo