Thanks for this. I own an architecture AI startup laiout.com - so when I saw your example for architects, I sat up real straight. We’re having a lot of trouble finding good data...

If you try it, let me know how it works out.

I will, thanks for writing the article.

Good work here Sacks

I feel like it could be done at scale, albeit slowly and with a lot of hands-on coordination, by going state by state through architectural archives and building and permit department databases.

It would likely have to interact with many different entities, each with a different framework for exchanging data over an API, assuming these departments even store their design files in databases you could easily query.

A better alternative would be some sort of partnership with any of the major 3D design, architectural design, rendering design software companies for easy access to all sorts of shapes, materials, designs, architectural styles and possibly even saved data, building templates, etc.

The Give to Get model could definitely work well if you built compelling, easy-to-use, highly useful software and then executed a massive growth push.

We went down that path. Architectural drawings are copyrighted (I’m an architect too). The lawyers agreed we could use some of the data to “train” as long as we destroyed it immediately. (We weren’t re-selling their product)


I bet those guys have an incredible dataset.

Ian-- if you hit me up with an email (plg@slyk.io), I'll transfer ownership of https://laiout.slyk.io/ to you-- would love your feedback and see if it helps you incentivize data, quality control, and training. For context-- elaboration/productization of Sacks/Jigsaw give & earn to get & own model here: https://open.substack.com/pub/timparsa/p/sacks-and-slyk-ai-startup-launcher?r=4gw8s&utm_campaign=post&utm_medium=web

"2. Legal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings."

None of the lawyers I work with (and it's a lot) would even share the time of day, unless they could bill you for 6 minutes (1/10th of an hour).

The monster legal research firms (Lexis, Thomson) would never let their proprietary data escape into the wild. The IP copyright battles will begin in 3... 2... 1.

As the Legal Bestie you know this will never work.

I assumed that patent filings and court rulings are always public.

I think focusing on the reaction of incumbents to models that threaten their revenue model is a bad way to evaluate an idea.

OFC a law firm using billable hours doesn't want efficiency gains. But a startup that can get access to a certain type of contract and use AI to instantly deliver a standard contract sounds like a win for consumers.


I think cybersecurity also has a very big use case when it comes to sharing. People fight the same battles but always hand to hand without knowledge of how others have won against the same opponent.

Dear David,

I want to express my sincere appreciation for your invaluable contributions as an entrepreneur and investor. As someone passionate about developing innovative AI solutions, I have found your insights to be incredibly inspiring and thought-provoking.

Your recent article on the challenges facing startups developing AI models for industry verticals resonated with me. I was particularly impressed by your suggestion of crowdsourcing data from professionals in these industries as a potential solution. I believe this approach is not only brilliant and innovative, but also disruptive, and has the potential to overcome the challenges of obtaining diverse and accurate training datasets. With my background in AI data engineering and computer vision, including deep learning models and LLMs, I can attest that this is the perfect time for such an idea. Moreover, integrating these datasets into large models or building LLMs not only enhances existing models but also democratizes the use of generative AI.

I am currently facing challenges in collecting image datasets for a model I have already developed, and I appreciate your thoughts on how crowdsourcing can help to obtain high-quality training datasets. I also appreciate your perspective on the potential benefits of crowdsourcing, particularly the data network effect that can create a strong moat around the business. We must proactively address the risks and downsides related to the give-to-get model, such as data quality, privacy, and intellectual property concerns, to ensure the long-term success of our AI models.

Once again, thank you for your valuable insights and thought leadership in this space.

Sincere Regards,


Great post David

Love seeing how GPT-4 contributed to the blog post. Publishing the GPT thread may be a good model for student assignments.

Great read, thank you. I didn't know about Jigsaw!

The usefulness of data to the AI (the optimization function) can be used to set better pricing in the point system (not all data are alike in terms of information value). Furthermore, you can encode in the pricing system better rewards for data from minorities, and not just the edge cases, so that you penalize bias and unfairness accruing in the dataset. This is what I call "Incentivized Regularization".
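
To make the pricing idea concrete, here is a minimal sketch, assuming each contributed record carries a group label and that points scale with how underrepresented that group currently is. The function name `reward_points`, the `base_points` parameter, and the add-one smoothing are illustrative assumptions, not something Jigsaw or the article specifies.

```python
from collections import Counter


def reward_points(group, counts, base_points=10.0):
    """Points for one new record; rarer groups earn more (hypothetical scheme)."""
    groups = set(counts) | {group}       # include a group never seen before
    n_groups = len(groups)
    total = sum(counts.values())
    # Smoothed share of the dataset this group already holds (add-one smoothing).
    share = (counts.get(group, 0) + 1) / (total + n_groups)
    # Share the group would hold in a perfectly balanced dataset.
    uniform = 1 / n_groups
    # A balanced dataset pays exactly base_points; underrepresented groups pay more.
    return base_points * uniform / share


counts = Counter({"common": 90, "rare": 10})
print(reward_points("common", counts))
print(reward_points("rare", counts))
```

With these assumed numbers, a record from the rare group earns several times the points of one from the common group, which is the incentive the comment describes.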

The latter point is much more important for use cases that use customer private data (like healthcare, recruiting, social media, etc.) vs industry data (like architecture, manufacturing, etc.). Fortunately there is new cryptography (ZKPs and FHE) that enables verification of data provenance and training on private data.

Then you can also use cryptocurrencies so that the rewarding system is community governed, to add fuel to an explosive biz model. Some AI people, like Ali Yahya at a16z, have been early to this idea (https://a16zcrypto.com/content/article/long-tail-problem-in-a-i/) as they are closer to the crypto space where, let's put it this way, a lot of "token experimentation" has been tried to acquire users, volume, etc.

Not written by ChatGPT.

Apr 7, 2023·edited Apr 7, 2023

Came here to say something like this. ^ You can add two buzzy categories together: AI + blockchain!

In theory, this would allow users to not just sell their data once (get points when you add data) but rev share each time their data is purchased (aka used by an AI model). And this could be enforced by smart contracts on the blockchain. Then again, like many web3 projects, you can do this without the blockchain using "dumb contracts" and add blockchain if there are truly inefficiencies in the model.
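
A minimal sketch of the "dumb contract" version, assuming a plain in-memory ledger that splits a fixed share of every usage fee among contributors in proportion to the records they supplied. The class name `RevShareLedger`, the 50% `contributor_share`, and the pro rata rule are all hypothetical choices for illustration.

```python
from collections import defaultdict


class RevShareLedger:
    """Off-chain revenue-share ledger (illustrative, not a real product's API)."""

    def __init__(self, contributor_share=0.5):
        self.contributor_share = contributor_share  # fraction of each fee paid out
        self.records = defaultdict(int)             # contributor -> records supplied
        self.balances = defaultdict(float)          # contributor -> earnings so far

    def contribute(self, contributor, n_records=1):
        self.records[contributor] += n_records

    def record_usage(self, fee):
        """Split the contributor pool of one usage fee pro rata by record count."""
        total = sum(self.records.values())
        if total == 0:
            return  # nobody to pay yet
        pool = fee * self.contributor_share
        for contributor, n in self.records.items():
            self.balances[contributor] += pool * n / total


ledger = RevShareLedger()
ledger.contribute("alice", 3)
ledger.contribute("bob", 1)
ledger.record_usage(100.0)
print(dict(ledger.balances))
```

Every later call to `record_usage` pays contributors again, which is the difference from a one-time points grant; a smart contract would enforce the same split without a trusted operator.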

And that Ali Yahya post is an interesting read. Thanks for sharing. I'm not sold that DAOs are a better organizing system/management philosophy than an old-fashioned, revenue-generating, private business.

Nice comment. I tend to agree with you on DAOs specifically. Decentralizing governance is not necessary for this to work but the whole thing is an experiment worth running for sure. Hit me up on Twitter if interested to talk about this more. I am building it :)

I'm on the other side of architect drawings, reading them to make the building conform. It would be interesting to have an AI Architect design a building from the ground up given nothing but customer requests. It would need material physics, suppliers and council regulations etc, basically bypassing the architect.

"Show me an energy efficient 2 storey home that conforms to local regulations and doesn't look out of place with surrounding houses..."

It is generous of you to be so transparent with your creation process, thanks.

I included your chat with this blog post in one document here:


We’re working on exactly that. I’ve successfully created the AI model (patent pending) but we need more data...

Working on this too, we should connect.

This is such a powerful & succinct way to incentivise folks to share proprietary data. Potentially this is a way for Crypto & AI to exist together, where AI creates value & Crypto helps capture it. We're exploring & building something along these lines in the Music Industry.

Profound and prolific - great read!

I used to call that model "Forced Participation" :-S

I like your "Give-to-Get" brand much better :-)

At a former startup I used to tell customers they would be joining the "Data Consortium" because it sounded fancy and made it easier for the CMO to pitch the CEO before the CTO could block it. (I was convincing marketers to GIVE their advertising performance data to GET better ad benchmarks and recommendations.)

I need some business startup help. I have a model to build this device, but I'm more a Tesla than an Edison: https://en.m.wikipedia.org/wiki/Solomon's_shamir

The term "the worm" refers to not a worm but what the shamir might look like from 40 miles away at night lit up by torches or whatever they used for lighting pre flood.

It's kinda a "he who has the most gold wins" type of a situation kinda like Teslas hydro electric energy production model, which all modern hydro-electric dams are based on, if I release the plans the chicoms, Indians, Pakistanis would say hey great idea and we'd have nuclear war over global looting rights.

Hit me up if you can get me 10 minutes with Musk, maybe we can get this rolling before I check out which is pretty soon.

First hash on crowdsource company eval framework (w/ gpt help)

1. Market opportunity of vertical

2. Data acquisition and management: Evaluate ability to acquire and manage large amounts of data from users, including:

* Variety and diversity of data sources

* Compatibility of different data formats and standards

* Cost-effectiveness of sources

3. AI model and technology: Evaluate AI model and technology, including:

* Algorithms

* Data processing capabilities

* User interface

* Consider accuracy, scalability, and user experience

4. User engagement and retention

5. Ethics and compliance

I put a bunch of industry maps / profit pool thoughts on my dancohen.substack.com

Cost of sharing a contact number is low.

Medical records come in bulk, controlled by medical universities. The data is anonymized; nevertheless, it's an expensive ask for those who have such data, which means the value provided by the service has to exceed the cost.

In the case of artists: bad artists have no proprietary style, and recognized ones are unlikely to hand theirs over to a startup for everyone to use, because that's kind of everything they have.

Finance and investing. Well that may be fun. Zero working strategies will be shared. Zero non-working strategies have value.

Manufacturing and Science — I have no comments yet about this, maybe I should ask GPT4 :-) But Legal will likely sue the hell out of OpenAI and their copycats, but I'm sure you know a ton more about Law.

A good read, and thought provoking, thanks, David!

I saw this comment on a video and it made me think about HVAC20.com, we are looking forward not backward, running not stopping:

"At 77, I've had several careers. In the 90's it was software engineer (code monkey) and I worked for several "established" software companies that went broke. Each for the same reason - they were attacked by start ups and instead of continuing to improve, they spent resources on "Stopping the competition". It's happening with EVS. IF the U.S. puts financial barriers in Tesla's growth path, Tesla will NOT take the path of self destruction. Tesla WILL continue on it's path to success - Uncle Sam will be humiliated."
