I am a rising senior at Duke studying Computer Science and Statistics with a minor in Math. After graduating, I hope to work as a full-time data scientist or software engineer. I am especially passionate about the math behind machine learning algorithms and ethical AI. Aside from crunching numbers and programming, I am a huge basketball fan and love learning languages.
If you are interested in talking or have any questions, feel free to reach out to me through my:
- Email: [email protected]
- LinkedIn: Caleb Kornfein
- Cell: 707-329-5415
Project 1: 2021 IEEE GRSS Data Science Competition
The Task: Develop deep learning models that take advantage of various channels of satellite data to spot remote areas around the globe where humans live without electricity.
My Work: I acted as project leader for a team of four undergraduate students. I also built and trained various convolutional neural network architectures using PyTorch on a Duke compute cluster to classify 50m x 50m tiles as containing a human settlement without electricity or not.
The Result: Together, we placed in the top 25 of the international challenge, competing against research labs and PhD student teams from around the globe.
What I Learned: PyTorch, convolutional neural networks, bash scripting
Favorite Memory: A flurry of Zoom calls on the night of our submissions: we were simultaneously nervous about the results, proud of what we had accomplished, and connected as a team.
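For readers unfamiliar with this kind of model, here is a minimal PyTorch sketch of a binary tile classifier. It is not the competition code: the class name `TileClassifier`, the layer sizes, the 12 input channels, and the 16 x 16 pixel tile size are all assumptions chosen for illustration, and the batch is random data.

```python
import torch
from torch import nn


class TileClassifier(nn.Module):
    """Small CNN that maps a multi-channel tile to one logit (hypothetical architecture)."""

    def __init__(self, in_channels: int = 12) -> None:  # 12 channels is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (N, 32, 1, 1)
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)  # (N, 32)
        return self.head(x).squeeze(1)   # (N,) logits, one per tile


model = TileClassifier()
tiles = torch.randn(4, 12, 16, 16)            # random stand-in for a batch of tiles
labels = torch.tensor([1.0, 0.0, 0.0, 1.0])   # made-up labels: 1 = settlement without electricity
logits = model(tiles)
loss = nn.BCEWithLogitsLoss()(logits, labels)  # binary cross-entropy on raw logits
loss.backward()                                # gradients for one training step
```

The global average pool keeps the sketch independent of the tile's pixel dimensions, which the text above does not specify.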
<iframe width="560" height="315" src="https://www.youtube.com/embed/YtoEv-HFCBA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
The Project: I worked with three other Duke undergraduates to build our vision of an improved Dukehub system. Dukehub is the website all Duke students use for course enrollment and important academic information.
My Work: I designed the database schema, handled filtering and querying for courses using Django and PostgreSQL, and added frontend elements such as the heading bar.
What I Learned: Database schema design, Django, SQL, PostgreSQL
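As a rough illustration of course filtering, the sketch below builds a parameterized SQL query from optional filters. It is not the project's code: it uses Python's built-in `sqlite3` as a stand-in for Django and PostgreSQL, and the `course` table, its columns, the `search_courses` helper, and the sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course ("
    "id INTEGER PRIMARY KEY, subject TEXT, number INTEGER, title TEXT)"
)
# Invented sample rows, not taken from any real course catalog.
conn.executemany(
    "INSERT INTO course (subject, number, title) VALUES (?, ?, ?)",
    [
        ("CS", 201, "Data Structures"),
        ("CS", 316, "Databases"),
        ("STAT", 360, "Bayesian Statistics"),
    ],
)


def search_courses(conn, subject=None, min_number=None, title_contains=None):
    """Return (subject, number, title) rows matching every filter that is set."""
    clauses, params = [], []
    if subject is not None:
        clauses.append("subject = ?")
        params.append(subject)
    if min_number is not None:
        clauses.append("number >= ?")
        params.append(min_number)
    if title_contains is not None:
        clauses.append("title LIKE ?")
        params.append(f"%{title_contains}%")
    sql = "SELECT subject, number, title FROM course"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, number"
    # Placeholders keep user input out of the SQL string itself.
    return conn.execute(sql, params).fetchall()


cs_courses = search_courses(conn, subject="CS")
bayes_courses = search_courses(conn, title_contains="bayes")
```

In Django the same idea is usually expressed by chaining `.filter()` calls on a queryset, with the ORM generating the parameterized SQL.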
The Project: My home in Sonoma County, California, has been ravaged by wildfires in recent years. On this project, I worked with Sonoma State University Professor Matthew Clark as part of the Soundscapes to Landscapes project to see how sound ecosystems had changed as a result of the fires, using audio data from hundreds of recorders placed around Sonoma County.
My Work: First, we ran spectrograms (image representations of frequency patterns over time) through a pretrained CNN encoder to create embeddings. I then visualized the embeddings using dimensionality reduction techniques such as UMAP in order to discern different land cover types and changes in sound.
What I Learned: UMAP, Seaborn, CNN architectures, more about landscapes in my home county
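To make the idea of a spectrogram concrete, here is a small standard-library sketch that slices a signal into overlapping frames and takes the magnitude of a discrete Fourier transform of each one. It is not the project's pipeline: the `spectrogram` function, the frame size, the hop length, and the synthetic 128 Hz test tone are assumptions for illustration, and real audio work would normally use an FFT library.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window reduces leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


sample_rate = 1024  # samples per second (assumed)
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time slice and each column one frequency bin, which is the grid a CNN encoder would treat as an image.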
The Project: As part of a Bayesian Statistics final project, my partner and I decided to use NLP to analyze the transcripts of the 2020 Democratic Debates.
My Work: I collected the transcripts from each of the 2020 Democratic Debates and cleaned the text using NLP techniques such as stop word removal, tokenization, and stemming. Then, I trained a Latent Dirichlet Allocation model on the corpus to find the topics most frequently discussed at the debates.
What I Learned: Latent Dirichlet Allocation, NLP best practices
Interesting Takeaways: Of the 13 topics identified, only topic 7 covered US foreign policy. The major Democratic candidates (Elizabeth Warren, Joe Biden, and Bernie Sanders) each spent around 1-2% of their time on this topic. In contrast, candidates Tulsi Gabbard, Jay Inslee, and Beto O'Rourke spent the most time on the issue, with Beto leading the pack at 28% of his time.
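The cleaning steps mentioned above (tokenization, stop word removal, and stemming) can be sketched in a few lines of standard-library Python. This is not the project's code: the stop word list is a tiny invented sample, `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's, and the example sentence is made up rather than quoted from a transcript.

```python
import re

# Tiny illustrative stop word list; real lists contain a few hundred words.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "we", "our"}


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters (naive stand-in)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


# Made-up sentence for illustration only.
cleaned = preprocess("We are expanding healthcare and protecting the voting rights of workers.")
```

The resulting token lists, one per document, are the usual input to a bag-of-words count matrix, which is what a topic model such as LDA is trained on. Note that the crude stemmer produces non-words like `vot`, which is normal for stemming and harmless as long as the same rule is applied everywhere.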