How a Cryptocurrency Can Improve Data Collection

By fossil

It’s been a long while since my last post (or, well, the ML Quiz I wrote). I’ve been researching blockchain and cryptocurrencies for the past 2 months and came across an upcoming project that piqued my interest. While I won’t go into specifics about crypto itself, I want to talk about this project’s initiative and how it could (if they execute) aid the deep learning community.

As we all know, supervised machine learning problems require labeled data. Computer vision problems such as face recognition and image classification can require millions (in some cases hundreds of millions) of labeled images. Of course, this labeling can’t (and shouldn’t) be done purely by software, since the labels are the ground truth the model learns from. This means people have to manually label images of cats, dogs, trucks and street signs. Companies can outsource this work as micro-tasks on crowdsourcing platforms (e.g. Amazon Mechanical Turk). However, these platforms currently come with both minor and major roadblocks: approval processes for micro-task workers, manual verification of results, and high fees for requesters.
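To make the "manual verification of results" roadblock a bit more concrete, here is a minimal sketch of one common workaround: have several workers label the same image and only accept a label when enough of them agree. This is not taken from the Gems white paper or from any platform's API; the function name `aggregate_labels`, the `min_agreement` threshold and the sample data are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Majority-vote over one image's labels.

    Returns (label, agreement) when the most common label reaches the
    min_agreement fraction of votes, otherwise (None, agreement).
    The 0.6 threshold is an arbitrary assumption for this sketch.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement


# Hypothetical submissions: image id -> labels from different workers.
submissions = {
    "img_001": ["cat", "cat", "cat"],
    "img_002": ["dog", "truck", "dog"],
    "img_003": ["cat", "dog", "truck"],
}

accepted = {}       # images whose label the workers agreed on
needs_review = []   # images that still need a human to check them

for image_id, votes in submissions.items():
    label, _agreement = aggregate_labels(votes)
    if label is None:
        needs_review.append(image_id)
    else:
        accepted[image_id] = label
```

With a scheme like this, only the images in `needs_review` need a second look by hand, which is roughly the kind of verification cost that requesters otherwise pay in full.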

Gems Protocol (built on the Ethereum blockchain) allows one to build platforms around verifiable tasks, and yes, this includes labeling data! The white paper itself states: ‘The first module the Gems team will build focuses on labeling data for AI.’ This is of great practical use for machine learning engineers who need data to train their models (e.g. convolutional neural networks). According to the white paper, the protocol is designed to address several flaws in existing platforms.

I’ll simply highlight the 4 practical solutions the protocol offers over current crowdsourcing micro-task platforms (you can read more in-depth details in their white paper).

  1. No Fees – Besides the gas used on the Ethereum network, there are no centralized network fees for transactions. Fees on current platforms put pressure on both requesters and workers, since they cut into the earnings of each.
  2. No Required Identity Verification – Workers do not need to provide full identity information in order to complete tasks. Other platforms go as far as to link your Facebook account when you sign up. Don’t worry though, Gems has a way to deal with anonymous actors who pose a threat.
  3. Payments on the Blockchain, NO BANKS – Workers will be paid in the platform token, GEM. They do not need a bank account and do not have to wait for check deposits to receive payments.
  4. Modules – The team will implement open source UI/UX modules on the platform. This improves usability for workers and promotes reusability. Some current systems require requesters to spend time building their own UI/UX, and a poor interface can hurt worker performance. Most of these systems are also closed source.

So how does this ultimately aid deep learning? If the Gems team can execute their platform with excellence, they will change the game of crowdsourcing micro-tasks. This should improve both worker and requester satisfaction, easing the various roadblocks involved in producing high-quality labeled data. As a machine learning engineer, I’ve had to painstakingly label data myself. While I have never used the Amazon Mechanical Turk service, I see overwhelming potential in the Gems team and their platform. Stay tuned to the team’s progress by following their blog.

Aside from the machine learning aspect, this will be a complete game changer overall if the team succeeds. Later in my machine learning career, if I am leading a project that requires an enormous amount of labeled data, I can go this route and get back to my data analysis without worrying about unlabeled data.
