My Plans

Posted by: Chris, Sept. 21, 2020, 9:08 p.m.


I plan on making content with the eBird Basic Dataset (EBD) and the R package auk. The eBird dataset I have downloaded is around 210 GB uncompressed, which is too large to load into R directly, so I need to filter out what I need before I can analyze it. According to the auk page, filtering the dataset can take several hours.
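Here is a rough sketch of what that filtering step might look like with auk. The wrapper name filter_ebd, its arguments, and the file names in the example call are my own placeholders, not anything from the auk documentation. The filters themselves (species and country) are just one plausible choice, and auk needs AWK installed on the system to run the actual filter.

```r
library(auk)

# Hypothetical helper: filter the big EBD text file down to one species in
# one country, write the result to a smaller file, then load that into R.
filter_ebd <- function(input, output, species, country) {
  ebd <- auk_ebd(input)                       # reference the file; nothing is loaded yet
  ebd <- auk_species(ebd, species = species)  # keep only the requested species
  ebd <- auk_country(ebd, country = country)  # keep only the requested country
  # The slow part: AWK scans the whole file and writes matching rows to `output`.
  auk_filter(ebd, file = output, overwrite = TRUE)
  # The filtered file should now be small enough to read as a data frame.
  read_ebd(output)
}

# Example call (file names are hypothetical):
# kestrels <- filter_ebd("ebd_full.txt", "ebd_kestrel_us.txt",
#                        species = "American Kestrel", country = "US")
```

Since the filter only has to run once per question, saving the output file means the multi-hour step does not need repeating every time I reopen the analysis.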

Before I start writing a post, I need an idea for content that may be of interest. Once I know which locations the post will cover, I can start filtering the dataset. Then, depending on how many records there are for a specific location, I will start my preliminary draft with all the information about that location. I am looking to describe what biomes surround the location and what species of birds can be found there.


Figure 1: Observation graph of Canada Goose sightings

The eBird site already has some graphs on species in certain locations that look like Figure 1. Since I am working with R, I will be using packages like ggplot2 to create nice-looking plots. Although I am not a professional analyst, there are some guidelines I will try to follow for eBird data usage. For example, the best practices guide for using eBird data includes Figure 2 to show one kind of bias that can appear in the data.

Figure 2: Example of spatial bias. Checklists are mostly found in heavily populated areas, while Wood Thrush sightings are not near checklists. Taken from the best practices guide.


I will pose questions and try to answer them using the eBird dataset. If I can't answer them, or the analysis does not show anything interesting, then I may just publish the post as is and leave it open to interpretation. Some examples of questions I want to look into are how birding activity has changed during the COVID-19 months and what the American Kestrel population looks like in the United States.

In some cases, I may consult not just the eBird data but also data from other citizen science projects like iNaturalist. I imagine there are users who contribute to multiple projects. However, sites like iNaturalist do not seem to offer a downloadable dataset I can use. Instead, they have an API I can pull data from. I just hope the iNaturalist API is more flexible than the eBird API.
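As a minimal sketch of what pulling data from the iNaturalist API could look like in R: this assumes the public v1 observations endpoint and its taxon_name and per_page query parameters, and it assumes the jsonlite package is installed for the actual request. The helper names inat_url and fetch_inat are my own.

```r
# Hypothetical helper: build a query URL for the iNaturalist v1 observations
# endpoint (assumed), asking for `per_page` observations of one taxon.
inat_url <- function(taxon_name, per_page = 30) {
  paste0(
    "https://api.inaturalist.org/v1/observations",
    "?taxon_name=", utils::URLencode(taxon_name, reserved = TRUE),
    "&per_page=", per_page
  )
}

# Hypothetical helper: send the request and parse the JSON response.
# Needs network access and the jsonlite package.
fetch_inat <- function(taxon_name, per_page = 30) {
  jsonlite::fromJSON(inat_url(taxon_name, per_page))
}

# Example call (not run here, since it goes over the network):
# kestrels <- fetch_inat("American Kestrel", per_page = 5)
```

Keeping the URL builder separate from the request makes it easy to check the query by eye before sending anything.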


Aside from purely analytical posts, I want to write about my own interests. I mostly want somewhere I can write about my feelings on certain things, like in a journal. For instance, I really enjoy music, and I would like to write about why I like certain music and how it makes me feel.

Figure 3: Me with the Calabrese brothers. Calabrese is a horror punk band. A very blurry picture that is hiding my tears.

As with anyone, my taste in music shifts as life goes on. Depending on where I am in life, some music may hit me differently. There was a point in school where I was fed up with the whole student situation, and I really enjoyed super angry music during that time. 

There are also some experiences I would like to write about, like my schooling and job search. Some school quarters and semesters were very tough, and I believe it's healthy to talk about them. Searching for work has also been very frustrating, because it can be difficult to get interviews. Even if no one reads these posts, I can at least get the thoughts out of my head.


Since my blog is mostly focused on birds and bird observations, I am calling everything else I analyze 'Other'. There are video games, music, and movies I would love to do basic analysis on. I may also take social media posts and analyze those.


Music is probably the runner-up to birds when it comes to my interest in data analysis. I have done some analysis on music before, and I really enjoyed it. Since I am not the only one who loves music, I am hoping I can offer some interesting content to music lovers.

Figure 4: Truer words have never been written. Rights to Simon Noh.

I haven't figured it out yet, but I may focus my analysis on artists that have a decent-sized discography. When I analyze genres, I will probably narrow it down to one genre per post. I have quite a lot of ideas that could become reality, but I would have to look into what data is available.


I guess games would be the next topic I want to analyze. However, I am not sure how easy it is to get data on games. I assume game data is mostly private and something studios might not want released, and I'm sure they pay people to work with it. Despite that, I am planning on contacting studios to ask whether they would share any. Apex Legends is the main game I want to analyze, but if any game I enjoy is able to release some data, I would love that.

Movies, etcetera

I do enjoy movies, but not too much. There are some films I absolutely loved, and I may write about those. Other than movies, I am looking at social media posts and content online. I am probably not going to write about these topics too often, but I do think it would be interesting.


I like to write because I want to create content that others may read. I am also making content I would like to see, while improving my skills. Hopefully all of this is interesting!