Project Proposal


Lukas Holliger – G7 – February 10, 2023

Project Summary

My project has several phases and parts aimed at multiple end goals. I plan to design a series of programs that analyze users and projects on Scratch, MIT’s programming website. I have worked on a few earlier iterations of this project, but no serious work has been done on it in several years.

My goal is twofold. First, I plan to write a series of programs that collect the data I need quickly and reliably enough that the project can keep running well after the semester ends. Second, I plan to analyze the collected data for trends and to answer questions that come up during research, such as “Do more complex projects get more views?” or “Do people make more complex projects over time?”
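A question like “Do more complex projects get more views?” could be checked with a rank correlation. The sketch below is a minimal TypeScript illustration, assuming each project has already been reduced to a numeric complexity score and a view count (both hypothetical inputs). It uses Spearman’s coefficient rather than Pearson’s because view counts are likely to be heavily skewed; that is an illustrative choice, not a settled method. The function names are hypothetical.

```typescript
// Assign 1-based ranks; tied values share the average of the positions they span.
function averageRanks(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array<number>(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    // Extend the run while the next value ties with the current one.
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1; // average of 1-based positions start..end
    for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
    start = end + 1;
  }
  return ranks;
}

// Pearson correlation of two equal-length series.
// Returns NaN when either series is constant (0 / 0).
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}

// Spearman's rho: the Pearson correlation of the two rank vectors.
function spearman(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("need two equal-length series of length >= 2");
  }
  return pearson(averageRanks(x), averageRanks(y));
}
```

A value near 1 would mean more complex projects tend to have more views; a value near 0 would suggest little monotonic relationship.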

I feel this needs to be done because, simply put, no one else has been dumb enough to try it (and yet I’ve done it twice already), and because it will give good insight into how new programmers learn and grow within a specific community. This community is one I grew up in, and it eventually led me to where I am today with computer science (and Georgia Tech), so seeing how effective the website is as a teaching tool is important to me.

Statement of Positionality

In relation to the subject, I am simply a former user. I used the website for several years and have since moved on to more complex work, so I spend essentially no time writing or creating on Scratch. The website’s users are also mostly students of elementary to high school age, so as time goes on I have less and less to discuss with the user base, since I am one of its older members.

I value the preservation of media, as well as keeping things in check. My project began several years ago as a way to see what the site’s moderators deleted, and that is still one of its purposes, but I also believe that things people create shouldn’t ever go missing. It’s terrible to see online resources disappear over time, and this site, like many others, may go down one day. I hope my collected data can also serve as an archive in case things go south for Scratch. Since my “subject” is an entire website, there isn’t exactly a strong “identity” or “value system” it follows, beyond the fact that it is a programming website meant to teach kids to program for the first time through block-based programming.

Background

Most of the necessary background was covered in the previous sections, but it can be summed up as follows:

  • Scratch is a programming website created by MIT that teaches people how to program with block-based programming
  • The website has a large public community where people can share, like, love, and follow users and their projects
  • I have collected data from this site in the past, but have never run any analysis on the information collected
  • The site seems to have become a little less active over time, and one of my fears is that it could go offline at some point, in which case I would like to have an archive of what existed.
  • No one else has tried to collect or analyze data from this community, except for me in the past.

Target Audience Description

Since I have multiple deliverables, I have several different target audiences.

For the website side of the project, my target audience is the site’s users, specifically those who want to know how they are performing at any given moment as well as over time. These users tend to be younger, and I do not plan to be their direct point of contact; rather, I plan to make tools that developers can use to build websites on top of my data. (This has been done in the past and is happening now: two of my friends host the websites ScratchStats and Ocular, which both use my collected data.) This audience has a very wide range of expectations. Some know only what I have already built, which has slowly declined in quality and shown very few signs of revival, while others believe I can collect and show anything immediately, which is impossible with 50 million accounts and hundreds of millions of projects.
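A rough estimate shows the scale involved. Assuming, purely for illustration, one request per account and a sustained rate of 10 requests per second (an assumed figure, not a measured Scratch limit or a property of my indexer), a single pass over all accounts would take:

```latex
t = \frac{N}{r}
  = \frac{5 \times 10^{7}\ \text{requests}}{10\ \text{requests/s}}
  = 5 \times 10^{6}\ \text{s}
  = \frac{5 \times 10^{6}\ \text{s}}{86\,400\ \text{s/day}}
  \approx 58\ \text{days}
```

The hundreds of millions of projects would take longer still at the same rate.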

My second deliverable, the research, is meant for an older audience: people who have used the website in the past, or who want to know whether the website is a good fit for their school, family, or community. It will likely take the form of a paper, which some groups may not find easy to interpret. These readers should have some basic knowledge of computers and programming, a clear grasp of complex English, and some familiarity with data analysis.

Primary Research Experience

Nearly this entire project is primary research. Since I will be writing the collectors and analyzing the data myself, the actual deliverable will contain very little beyond what I gathered with my own data collection systems. The primary research therefore consists of studying the Scratch website and analyzing the data my system outputs.

Another area of primary research is writing the programs themselves. I will use the official Rust book, “The Rust Programming Language,” to learn Rust and write a program that ranks the data as fast as possible. Since this book is the authoritative source on the language, I count it as primary research, as opposed to external sources such as Stack Overflow or Medium, which offer other people’s takes on the language and its features.

Detailed Deliverables

  • Indexing program

This will be a system written in TypeScript that crawls the Scratch website and collects data. The program will not be very open to the public and will mostly be written privately. The source may be released at the end of the project (it may be helpful for internships), but its ability to collect massive amounts of data with very few limitations could make it dangerous in the wrong hands.
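A minimal sketch of one step an indexer performs: turning a user response from Scratch’s public API into a compact record. The endpoint https://api.scratch.mit.edu/users/<username> and the field names id, username, history.joined, and profile.country are assumptions based on the public Scratch API as described on the Scratch Wiki; UserRecord, userUrl, and parseUser are hypothetical names. The HTTP request itself, rate limiting, and storage are deliberately left out.

```typescript
// Hypothetical compact record the indexer might store for each user.
interface UserRecord {
  id: number;
  username: string;
  joined: Date;
  country: string | null;
}

// Assumed URL shape for a single user on the public Scratch API.
function userUrl(username: string): string {
  return `https://api.scratch.mit.edu/users/${encodeURIComponent(username)}`;
}

// Parse a raw JSON response body, rejecting payloads missing required fields.
function parseUser(body: string): UserRecord {
  const raw: unknown = JSON.parse(body);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected a JSON object");
  }
  const obj = raw as Record<string, any>;
  if (typeof obj.id !== "number" || typeof obj.username !== "string") {
    throw new Error("missing id or username");
  }
  // An absent or malformed join date produces an Invalid Date, which we reject.
  const joined = new Date(obj.history?.joined);
  if (Number.isNaN(joined.getTime())) {
    throw new Error("missing or invalid join date");
  }
  const country =
    typeof obj.profile?.country === "string" ? obj.profile.country : null;
  return { id: obj.id, username: obj.username, joined, country };
}
```

Validating each response before storing it keeps a single malformed or changed payload from silently corrupting the collected data.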

  • Ranking program

This will be a system that takes data directly from the indexing program and sorts it immediately, so that users get a rapid analysis of incoming data. It will be written in Rust and will be publicly available throughout its development. Because the program will be usable outside my own use case, it may help future programmers trying to design a system similar to mine.
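The ranking program is planned in Rust; to keep all examples in one language, the core idea is sketched here in TypeScript. It assigns standard competition ranks (1, 2, 2, 4, ...) to users by a single metric such as follower count. Entry, Ranked, and rankBy are hypothetical names, and the tie rule and the username tie-break are assumptions.

```typescript
interface Entry {
  username: string;
  value: number; // e.g. follower count
}

interface Ranked extends Entry {
  rank: number;
}

// Sort descending by value and assign competition ranks: tied values share
// a rank, and the next distinct value skips ahead (1, 2, 2, 4, ...).
// Ties are ordered by username so the output is deterministic.
function rankBy(entries: Entry[]): Ranked[] {
  const sorted = [...entries].sort(
    (a, b) => b.value - a.value || a.username.localeCompare(b.username),
  );
  const out: Ranked[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const tiedWithPrev = i > 0 && sorted[i].value === sorted[i - 1].value;
    const rank = tiedWithPrev ? out[i - 1].rank : i + 1;
    out.push({ ...sorted[i], rank });
  }
  return out;
}
```

Re-sorting the whole set like this is O(n log n) per run, which is part of why a dedicated, fast ranking program is worth building for tens of millions of users.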

  • API [Application Programming Interface] (ScratchDB)

My system has been called ScratchDB nearly since its start, and this will be the “v4” iteration of the project. The source code will likely not be public until release, and may never be made public. This system will remain accessible online to developers well after this course is over, as it will replace my current system, which has been running since mid-2019. It will likely be written in TypeScript and will use data from the ranking program as well as the databases that store information from the indexing program.
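A minimal sketch of how the API could combine its two data sources: stored user records from the indexer and precomputed ranks from the ranking program. The /users/<username> route, the in-memory maps standing in for databases, the lowercase keying of usernames, and the response shape are all hypothetical; a real deployment would sit behind an HTTP server and real storage.

```typescript
interface StoredUser {
  username: string;
  followers: number;
}

interface ApiResponse {
  status: number;
  body: unknown;
}

// Hypothetical handler: resolve a user from the path and attach their rank.
function handle(
  path: string,
  users: Map<string, StoredUser>,
  followerRanks: Map<string, number>,
): ApiResponse {
  const match = /^\/users\/([^/]+)$/.exec(path);
  if (!match) return { status: 404, body: { error: "unknown route" } };
  // Assumption: both stores are keyed by lowercased username.
  const key = decodeURIComponent(match[1]).toLowerCase();
  const user = users.get(key);
  if (!user) return { status: 404, body: { error: "user not found" } };
  return {
    status: 200,
    body: { ...user, ranks: { followers: followerRanks.get(key) ?? null } },
  };
}
```

Keeping the handler a pure function of its inputs makes it easy to test without a running server or database.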

  • Analysis Paper

This will be a document explaining what I find in my collected data over time. The exact question will be decided as I continue writing these programs, but it will likely be along the lines of “How effective is Scratch at teaching people to program?”, for which I would analyze project complexity over time as well as site performance. All of this is within the scope of the data I collect and will help answer questions I have had about Scratch since I first started writing programs for it.
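One possible complexity measure for the analysis is the number of blocks in a project. The sketch below assumes the Scratch 3 project.json layout, in which each entry in targets (the stage and each sprite) has a blocks object keyed by block ID, and real blocks carry an opcode field. Using block count as the measure, and the names countBlocks and monthlyAverage, are illustrative assumptions rather than a chosen methodology. The second function averages counts by the month each project was shared, which is one way to look at complexity over time.

```typescript
// Count blocks across all targets (stage and sprites) in a Scratch 3 project.json.
function countBlocks(projectJson: string): number {
  const project = JSON.parse(projectJson) as {
    targets?: { blocks?: Record<string, unknown> }[];
  };
  let total = 0;
  for (const target of project.targets ?? []) {
    const blocks = target.blocks ?? {};
    for (const id of Object.keys(blocks)) {
      const block = blocks[id];
      // Skip non-block entries, e.g. arrays used for top-level variable reporters.
      if (typeof block === "object" && block !== null && "opcode" in block) total++;
    }
  }
  return total;
}

interface Sample {
  shared: Date;
  blocks: number;
}

// Average block count per calendar month (UTC), keyed "YYYY-MM", oldest first.
function monthlyAverage(samples: Sample[]): [string, number][] {
  const sums = new Map<string, { total: number; n: number }>();
  for (const s of samples) {
    const m = s.shared.getUTCMonth() + 1;
    const key = `${s.shared.getUTCFullYear()}-${m < 10 ? "0" : ""}${m}`;
    const acc = sums.get(key) ?? { total: 0, n: 0 };
    acc.total += s.blocks;
    acc.n += 1;
    sums.set(key, acc);
  }
  return Array.from(sums.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([month, { total, n }]): [string, number] => [month, total / n]);
}
```

Block count is a crude proxy (a long script is not necessarily a sophisticated one), so other measures could be compared against it during the project.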

As for the programs, the source code can be made public upon request, but for most of what I have written in the past few years, only I and a few friends have ever seen the source. Keeping it private is easier, because I sometimes include private authentication details that allow access to my personal hardware.

Resources

All tools needed to conduct my research will be written and designed by me. Free, public information on how to design these programs is available online. Due to the complexity of my projects, some discussions with professors may be in order to determine the most efficient way to do the work. As for permission, it may be a good idea to get permission from Scratch, but my past projects have not needed it and have actually been used by the site’s moderators. Permission will be requested by email if applicable.

To compose my project, the tools needed will likely be local editors such as WebStorm, IntelliJ, or Apple Pages, all of which I own and can use to create my systems. Training is needed for writing my programs, but as previously discussed, that is covered by primary research or by talking to professors or friends in the field. Training will be received either through chat on platforms such as Discord or in person.

Detailed Timeline

  • Phase 1: Project Design (Weeks 6-7)

During this phase, the general structure of the project will come into place and testing will begin. This will include research into the different elements of the indexer and the ranking system to find the most efficient components and the best algorithms for rapidly analyzing and sorting data. Very little programming may happen during this time, but this phase may blend into Phase 2 depending on what I find.
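As an example of the kind of algorithm comparison this phase is meant to settle: if scores are already kept sorted in descending order, the competition rank of a new score can be found with a binary search in O(log n) time instead of re-sorting everything in O(n log n). This is a hypothetical sketch of one candidate, not a decided design; rankOf and insertScore are illustrative names.

```typescript
// Given scores sorted in descending order, return the competition rank a
// score would receive: 1 + the number of scores strictly greater than it.
function rankOf(sortedDesc: number[], score: number): number {
  let lo = 0;
  let hi = sortedDesc.length;
  // Invariant: indices < lo hold values > score; indices >= hi hold values <= score.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedDesc[mid] > score) lo = mid + 1;
    else hi = mid;
  }
  return lo + 1;
}

// Insert a score while keeping the array sorted in descending order.
// Finding the position is O(log n), but splice still shifts O(n) elements.
function insertScore(sortedDesc: number[], score: number): void {
  sortedDesc.splice(rankOf(sortedDesc, score) - 1, 0, score);
}
```

Because the splice remains linear, a structure with cheaper inserts, such as a balanced tree, may be worth comparing against this at the scale of tens of millions of users.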

  • Phase 2: Project Writing (Weeks 8-11)

During this phase, the indexer and the ranking system will be built. There may be some preliminary analysis of the collected data to see whether it can support a paper. This will be the bulk of the project and will produce the systems that will be used well into the future. Only some of the information will be made public, but every step of development will be documented along the way.

  • Phase 3: Analysis and API (End of Semester)

During this time, I will compile my findings into a paper and a presentation for the course. This may include a user-interactive website (possibly hosted by one of my friends, with me providing only the API their systems use), or something such as a web-based presentation of live data. I hope to have an interface that lets people see more of the system’s inner workings so that I can get feedback.

  • Phase 4: Post Course (Beyond Scope)

After the course, I hope to keep running the system and possibly bring more people on board with its development. The future of programming lies with the people who are just getting into it, and since I am currently the “elusive developer who has been called he/she/they since no one actually knows me,” spending more time helping people with their projects may help future people attempting projects like mine.

List of Proposed Resources

  • Scratch Team. (n.d.). Imagine, program, share. Scratch. Retrieved February 10, 2023, from https://scratch.mit.edu/
  • rust-lang. (2021, February 17). The Rust programming language. Retrieved February 10, 2023, from https://doc.rust-lang.org/book/
  • Holliger, L. (2020, April 30). Version (v4). ScratchDB. Retrieved February 10, 2023, from https://scratchdb.lefty.one
  • Holliger, L., & Sun, A. (n.d.). Scratch [Personal communication].
  • Microsoft. (n.d.). The starting point for learning TypeScript. TypeScript. Retrieved February 10, 2023, from https://www.typescriptlang.org/docs/
  • Holliger, L. (2020, April 20). ScratchDB source code. GitHub. Retrieved February 10, 2023, from https://github.com/lholliger/scratchdb
  • cheeriojs. (n.d.). Cheerio. Retrieved February 10, 2023, from https://cheerio.js.org/
  • Faulkner, F. (n.d.). Data Structures and Algorithms. Georgia Tech.
  • Landry, R. (n.d.). Intro to Object Oriented Programming. Georgia Tech.
  • Scratch Team. (n.d.). Scratch API. Scratch Wiki. Retrieved February 10, 2023, from https://en.scratch-wiki.info/wiki/Scratch_API

This is by no means an exhaustive list of sources, nor will all of these sources necessarily be used. Citations that include my own name are either interviews or previous systems I created.

