An Old Trick for Crowdsourcing Proprietary Data Could Work Well for New AI Startups
Thanks for this. I own an architecture AI startup laiout.com - so when I saw your example for architects, I sat up real straight. We’re having a lot of trouble finding good data...
"2. Legal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings."
None of the lawyers I work with (and it's a lot) would even share the time of day, unless they could bill you for 6 minutes (1/10th of an hour).
The monster legal research firms (Lexis, Thomson) would never let their proprietary data escape to the wild. The IP copyright battles will begin in 3... 2... 1.
As the Legal Bestie you know this will never work.
You should check out the BioDAO model David - definitely looking to capitalize on the ideas laid out here: https://www.molecule.to/blog/biodaos-are-community-owned-research-translation-engines-not-investment-daos. https://www.vitadao.com/ https://www.hairdao.xyz/ https://www.athenadao.co/ - for example.
I think cybersecurity also has a very big use case when it comes to sharing. People fight the same battles but always hand to hand without knowledge of how others have won against the same opponent.
I want to express my sincere appreciation for your invaluable contributions as an entrepreneur and investor. As someone passionate about developing innovative AI solutions, I have found your insights to be incredibly inspiring and thought-provoking.
Your recent article on the challenges facing startups developing AI models for industry verticals resonated with me. I was particularly impressed by your suggestion of crowdsourcing data from professionals in these industries as a potential solution. I believe this approach is not only brilliant and innovative, but also disruptive, and has the potential to overcome the challenges of obtaining diverse and accurate training datasets. With my background in AI data engineering and computer vision, including deep learning models and LLMs, I can attest that this is the perfect time for such an idea. Moreover, integrating these datasets into large models or building LLMs not only enhances existing models but also democratizes the use of generative AI.
I am currently facing challenges in collecting image datasets for a model I have already developed, and I appreciate your thoughts on how crowdsourcing can help to obtain high-quality training datasets. I also appreciate your perspective on the potential benefits of crowdsourcing, particularly the data network effect that can create a strong moat around the business. We must proactively address the risks and downsides related to the give-to-get model, such as data quality, privacy, and intellectual property concerns, to ensure the long-term success of our AI models.
Once again, thank you for your valuable insights and thought leadership in this space.
Great post David
Love seeing how GPT-4 contributed to the blog post. Publishing the GPT thread may be a good model for student assignments.
Great read, thank you. I didn't know about Jigsaw!
The usefulness of data to the AI (optimization function) can be used to set better pricing for the point system, since not all data are alike in terms of information value. Furthermore, you can encode in the pricing system better rewards for data from minorities, and not just for edge cases, so that you penalize bias and unfairness accruing in the dataset. This is what I call "Incentivized Regularization".
The latter point is much more important for use cases that use customer private data (like healthcare, recruiting, social media, etc.) vs industry data (like architecture, manufacturing, etc.). Fortunately there is new cryptography (ZKPs and FHE) that enables verification of data provenance and training on private data.
Then you can also use cryptocurrencies so that the rewarding system is community governed, to add fuel to an explosive biz model. Some AI people, like Ali Yahya at a16z, have been early to this idea (https://a16zcrypto.com/content/article/long-tail-problem-in-a-i/) as they are closer to the crypto space where, let's put it this way, a lot of "token experimentation" has been tried to acquire users, volume, etc.
Not written by ChatGPT.
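The point-pricing idea in the comment above (pay more for informative data, and add a bonus for data from under-represented groups) might be sketched roughly as follows. Everything here is an assumption made for illustration: the function name `point_rewards`, the `info_value` score in [0, 1] (which would have to come from some separate measurement of how much a contribution helps the model), the group labels, and the `fairness_weight` knob. It is not a description of any existing system.

```python
from collections import Counter


def point_rewards(contributions, base_points=10.0, fairness_weight=0.5):
    """Return points per contributor.

    contributions: list of (contributor, group, info_value) tuples, where
    info_value is a hypothetical usefulness score in [0, 1].
    """
    group_counts = Counter(group for _, group, _ in contributions)
    total = sum(group_counts.values())
    n_groups = len(group_counts)

    rewards = {}
    for contributor, group, info_value in contributions:
        share = group_counts[group] / total
        # A group at parity holds 1/n_groups of the data. The bonus grows as
        # the group's share falls below parity and is zero at or above it.
        under_representation = max(0.0, 1.0 - share * n_groups)
        multiplier = 1.0 + fairness_weight * under_representation
        points = base_points * info_value * multiplier
        rewards[contributor] = rewards.get(contributor, 0.0) + points
    return rewards


# Hypothetical example: three contributions from one group, one from another.
example = [
    ("a", "urban", 0.5),
    ("b", "urban", 0.5),
    ("c", "urban", 0.5),
    ("d", "rural", 0.5),
]
rewards = point_rewards(example)
```

With equal `info_value`, the contributor from the smaller group earns more points than the others, which is the "regularizing" pressure the comment describes. How `info_value` itself is measured is left open here.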
I'm on the other side of architect drawings, reading them to make the building conform. It would be interesting to have an AI Architect design a building from the ground up given nothing but customer requests. It would need material physics, suppliers and council regulations etc, basically bypassing the architect.
"Show me an energy efficient 2 storey home that conforms to local regulations and doesn't look out of place with surrounding houses..."
It is generous of you to be so transparent with your creation process, thanks.
I included your chat with this blog post in one document here:
This is such a powerful & succinct way to incentivise folks to share proprietary data. Potentially this is a way Crypto & AI can exist together, where AI creates value & Crypto helps capture it. We're exploring & building something along these lines in the Music Industry.
Profound and prolific - great read!
I used to call that model "Forced Participation" :-S
I like your "Give-to-Get" brand much better :-)
I need some business startup help. I have a model to build this device, but I'm more a Tesla than an Edison: https://en.m.wikipedia.org/wiki/Solomon's_shamir
The term "the worm" refers not to a worm but to what the shamir might look like from 40 miles away at night, lit up by torches or whatever they used for lighting pre-flood.
It's kind of a "he who has the most gold wins" type of situation, kind of like Tesla's hydroelectric energy production model, which all modern hydroelectric dams are based on. If I release the plans, the chicoms, Indians, Pakistanis would say hey, great idea, and we'd have nuclear war over global looting rights.
Hit me up if you can get me 10 minutes with Musk, maybe we can get this rolling before I check out which is pretty soon.
First hash on crowdsource company eval framework (w/ gpt help)
1. Market opportunity of vertical
2. Data acquisition and management: Evaluate ability to acquire and manage large amounts of data from users, including:
* Variety and diversity of data sources
* Compatibility of different data formats and standards
* Cost-effectiveness of sources
3. AI model and technology: Evaluate AI model and technology, including:
* Data processing capabilities
* User interface
* Consider accuracy, scalability, and user experience
4. User engagement and retention
5. Ethics and compliance
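One way the five criteria above could be turned into a comparable number is a simple weighted rubric. This is only a sketch: the weights, the 1 to 5 scoring scale, and the names `CRITERIA_WEIGHTS` and `evaluate` are assumptions made for illustration, not part of the framework as listed.

```python
# Hypothetical weights in percent; they must sum to 100.
CRITERIA_WEIGHTS = {
    "market_opportunity": 25,
    "data_acquisition_and_management": 25,
    "ai_model_and_technology": 20,
    "user_engagement_and_retention": 20,
    "ethics_and_compliance": 10,
}


def evaluate(scores):
    """Return a weighted score on the same 1-5 scale as the inputs.

    scores: dict mapping each criterion name to a score from 1 (weak)
    to 5 (strong), assigned by whoever is reviewing the company.
    """
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name in CRITERIA_WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"score for {name} must be between 1 and 5")
    weighted = sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)
    return weighted / 100


# Hypothetical scores for an imaginary company.
example_scores = {
    "market_opportunity": 4,
    "data_acquisition_and_management": 2,
    "ai_model_and_technology": 3,
    "user_engagement_and_retention": 3,
    "ethics_and_compliance": 5,
}
overall = evaluate(example_scores)
```

The sub-bullets under data acquisition and AI model could be scored separately and averaged into their parent criterion before calling `evaluate`.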
I put a bunch of industry maps / profit pool thoughts on my dancohen.substack.com
Cost of sharing a contact number is low.
Medical records come in bulk, controlled by medical universities. The data is anonymized; nevertheless, it's an expensive ask for those who have such data. That means the value provided by the service has to exceed the cost.
In the case of artists: bad artists have no proprietary style. Recognized ones are unlikely to hand it over to a startup for everyone to use, because that's kind of everything they have.
Finance and investing. Well, that may be fun. Zero working strategies will be shared. Zero non-working strategies have value.
Manufacturing and Science — I have no comments yet about this, maybe I should ask GPT4 :-) But Legal will likely sue the hell out of OpenAI and their copycats, but I'm sure you know a ton more about Law.
A good read, and thought provoking, thanks, David!
I saw this comment on a video and it made me think about HVAC20.com, we are looking forward not backward, running not stopping:
"At 77, I've had several careers. In the 90's it was software engineer (code monkey) and I worked for several "established" software companies that went broke. Each for the same reason - they were attacked by start ups and instead of continuing to improve, they spent resources on "Stopping the competition". It's happening with EVS. IF the U.S. puts financial barriers in Tesla's growth path, Tesla will NOT take the path of self destruction. Tesla WILL continue on it's path to success - Uncle Sam will be humiliated."