MSPi was a project created for the final capstone class of my major. It was a horizontally-scalable security camera system built on top of Raspberry Pis with computer vision and livestream features. I was the project lead and also the main developer of the livestream capabilities. Although it has its flaws, I’m overall really happy with how it turned out.
The initial plan, as outlined in our requirements documentation, was to create a camera that could recognize people, livestream, record video, and send alerts. The camera also had to connect to a client on a web or mobile application.
We first had to consider the limitations of our skills, along with the two-month development timeframe we were given. Although we had a team of six people, no one had any prior experience creating or interacting with embedded systems. We decided to base the camera on a Raspberry Pi since it can be programmed in either C or Python, and we picked Python because of its relevance in the machine learning world. We then decided to base the backend on Firebase because we determined that writing an entire NodeJS backend would take too much time. For the web client, we chose Vue because another team member and I had experience with it and could teach others if necessary. The livestreaming feature uses WebRTC since it is built into every modern browser. For the mobile client, we had one team member experienced in Android development, so we built a native Android app. Since iOS has stricter security rules than Android, we decided not to make an iOS app to save time.
After creating the design documentation and drawing the software architecture, we listed, by difficulty, the features necessary for a minimum viable product. The most difficult were livestreaming and facial recognition. Since one of our team members had experience in Python, he was assigned to develop the embedded systems work, including facial recognition. I had experience building web applications and owned a Raspberry Pi, so I took on the livestreaming feature. I then allocated two people to the web client and the other two to the mobile client.
We used Agile Scrum as our software development methodology, averaging about 80 total development hours per week. Each sprint lasted a week, and we had 7 sprints in total. After every 2 sprints, we demoed the completed work to the professor.
For the first demo, the camera was capable of recognizing a human body, taking a picture, and uploading it to Firebase to be displayed by the web/mobile clients.
The bulk of development went into the second demo, where the camera could record video when it saw a person and livestream on demand to the web/mobile clients.
For the third demo, the camera could recognize faces that were uploaded to its system through the web/mobile clients and support two-way communication during livestreams. We also implemented a system for linking multiple cameras to one account, with users able to switch between them. These features were not in our original feature requirements, but our professor told us to do them, so we couldn’t refuse.
Taking pictures or recording during a livestream sounded simple but turned out to be a massive pain. Because the camera can only be accessed by one program at a time, if the camera was recording when someone tried to access the livestream, one or the other would crash. If we had more time to develop this feature, I would have looked into creating a virtual camera splitter by reading the camera device file /dev/video0 and duplicating its output into two separate device files, one for each program to read. I’m unsure whether this would actually work, but unfortunately we ran out of time and had to default to turning off recording while livestreaming and vice-versa.
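The fallback we shipped boils down to a small arbiter that grants exclusive camera access and lets livestreaming preempt recording. Here’s a minimal sketch of that idea (class and method names are hypothetical, not our actual code):

```python
import threading

class CameraArbiter:
    """Hypothetical sketch of our fallback: only one mode may hold the
    camera at a time, since /dev/video0 allows a single reader, and a
    livestream request always wins over background recording."""

    def __init__(self):
        self._lock = threading.Lock()
        self.mode = "idle"  # one of "idle", "recording", "livestream"

    def start_recording(self):
        with self._lock:
            if self.mode == "livestream":
                return False  # livestream holds the camera; refuse
            self.mode = "recording"
            return True

    def start_livestream(self):
        with self._lock:
            # Livestream-on-demand preempts any recording in progress.
            self.mode = "livestream"
            return True

    def stop(self):
        with self._lock:
            self.mode = "idle"
```

The camera process would call `start_recording()` when motion is detected and `start_livestream()` when a client requests a stream, releasing the camera with `stop()` either way.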
Livestreaming was also a beast all on its own. Because we used Firebase instead of creating our own backend, we had no server-side WebSockets to facilitate the peer-to-peer connections WebRTC needs. What I ended up doing was creating a pseudo-signalling server using Firestore as the facilitator. Each camera and user had a unique id, and we linked those ids together in a document that both sides could listen to on the database. The camera and user could then initiate and execute the handshake through the database and connect successfully for livestreaming. The upside to all of this is that WebRTC works in every modern browser, so we did not have to develop a native mobile livestreaming solution; the mobile app can simply open a webview to our website.
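The handshake pattern is easier to see in code. In this sketch a plain in-memory class stands in for a Firestore document listener, and all names and the document schema are illustrative, not our production setup; the shape of the exchange (camera posts an offer, client answers, both sides listen for changes) is the part that matches what we built:

```python
class FakeSignalDoc:
    """Stand-in for a Firestore document (e.g. at sessions/{camera_id})
    that notifies registered listeners whenever its fields change."""

    def __init__(self):
        self.data = {}
        self._listeners = []

    def on_snapshot(self, callback):
        # Firestore-style listener: fires on every subsequent update.
        self._listeners.append(callback)

    def update(self, fields):
        self.data.update(fields)
        for cb in self._listeners:
            cb(dict(self.data))


def client_side(doc, make_answer):
    """Client listens for a WebRTC offer and responds with an answer."""
    def handle(d):
        if "offer" in d and "answer" not in d:
            doc.update({"answer": make_answer(d["offer"])})
    doc.on_snapshot(handle)


def camera_side(doc, offer_sdp):
    """Camera posts its WebRTC offer, then collects the first answer."""
    answers = []
    def handle(d):
        if "answer" in d and not answers:
            answers.append(d["answer"])
    doc.on_snapshot(handle)
    doc.update({"offer": offer_sdp})
    return answers
```

In the real system the SDP blobs produced by the WebRTC library replace the placeholder strings, and ICE candidates are exchanged through the same document before the peers connect directly.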
I’m very content with how it all turned out and proud of the team that I led. From the very beginning, everyone was able to have fun while still hitting all of the milestones, and being able to joke around while working is always great for morale. Even the logo started as a joke of mine, but everyone liked it so it stuck. Someone jokingly suggested we make merchandise, and now we have t-shirts with the logo printed on them. For my final capstone class, I’d consider myself pretty lucky to have had the team that I did.
Oh yeah, and we got an A on the project. It was one of the best projects out of all the capstone classes that semester.