June 23, 2023

#001: How to Use Data to Find a Startup Idea

There's never been a better time to start a business.

Arguably the biggest challenge faced by any aspiring entrepreneur is how to pick a good idea for your startup. The key to ensuring that you are tackling the right problem—one that meets an urgent customer need and can support a growing business—is data. 

A data-driven business idea will help you avoid many of the pitfalls that lead to startups going bust. When you follow the data, you’ll be less likely to rely on potentially misleading gut instinct and the temptation to chase the hot new thing. You’ll be better equipped to match a business idea to the right industry and ensure that the business idea is solving real customer problems. You’ll also have the information you need to understand the structure and size of a market, the customer buying process, and demand patterns for your product. 

This article is a guide to the various data sources that you can use to explore and validate business ideas. We’ll cover both primary and secondary data sources, and compare their strengths and weaknesses. The data sources discussed in this article will be useful for any type of business, but we’ll mostly be discussing this data in the context of B2B and B2C tech startups.

The best place to start looking for business ideas is in secondary data.

This is data that has been collected by third parties from primary sources (i.e., customers). Secondary data is an efficient way to quickly get a high-level understanding of a market, product, or customer segment. It is often provided in a pre-processed format such as an industry report, which saves you a lot of time that would otherwise be spent collecting, cleaning, and processing the data yourself. 

There is a massive market for secondary data—often called market intelligence—but in this article we’ll mostly focus on free and publicly available data sources. Although private secondary data can be immensely valuable, it is often very expensive and should only be used to further validate a well-defined business idea. Fortunately, there is an abundance of high-quality publicly available secondary data if you know where to look.

In this article, we’ll examine 5 secondary data sources: government data, industry data, investor data, marketplace data, and trend data.

1. Government Data


The North American Industry Classification System (NAICS) is the standard used by federal statistical agencies to group and compare businesses within an industry. When you search for a keyword in the NAICS database, it returns a list of related sub-sectors and industry groups, each with a unique code of 2 to 6 digits. These lists are useful for identifying business ideas within a large market, searching for other data by NAICS code, and identifying expansion opportunities in adjacent industry sub-sectors.
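Because each additional digit narrows the classification, a code’s prefixes tell you which broader groups it belongs to. As a rough illustration, the hypothetical Python helpers below (not part of any Census Bureau tool) split a code into its parent levels and check whether two codes share a parent, which is one quick way to spot adjacent sub-sectors.

```python
# Prefix length of a NAICS code -> its level in the classification hierarchy.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_hierarchy(code: str) -> dict[str, str]:
    """Return a NAICS code and its parent codes, keyed by hierarchy level.

    Simplification: a few sectors span a range of two-digit codes
    (e.g., Manufacturing is 31-33), which this sketch does not merge.
    """
    if not code.isdigit() or not 2 <= len(code) <= 6:
        raise ValueError(f"expected a 2- to 6-digit NAICS code, got {code!r}")
    # Each prefix of the code is the code of a broader parent group.
    return {NAICS_LEVELS[n]: code[:n] for n in range(2, len(code) + 1)}


def shares_parent(a: str, b: str, level: int) -> bool:
    """True if two codes fall under the same parent at the given prefix length."""
    return len(a) >= level and len(b) >= level and a[:level] == b[:level]


print(naics_hierarchy("541110"))
print(shares_parent("541110", "541199", 4))  # both under industry group 5411
```

For example, the law office code 541110 breaks down into sector 54, subsector 541, industry group 5411, and NAICS industry 54111, so any other code starting with 5411 is a neighboring legal-services niche worth a look.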

How to use it

The NAICS database is published by the US Census Bureau, and you can search it by keyword to find the code for a particular industry or industry sub-sector.

For example, if you search the keyword ‘software’ within NAICS, it will return a list of 49 national industry groups ranging from “mass reproducing CD-ROMs” to “Welding robot applications.” If you select the code for one of these subgroups, it will show a condensed list of similar industry groups and a definition for those groups.

One of the most powerful ways to use NAICS is to get a quantifiable high-level overview of a particular industry. Once you have found the NAICS code for an industry from the Census Bureau website, you can use that code to get data on the number of businesses, the distribution of businesses based on the number of employees, and the distribution of businesses based on revenue. This information is available through the NAICS Association, a for-profit entity that provides a limited amount of free market sizing data. This will help you identify broad industry trends and the size of the business opportunity within a particular industry. 


There are a variety of industries not included in the NAICS database because they are largely government-dominated. They include utilities, public transportation, libraries, and municipal waste removal. See this list for industries not included in NAICS.


County Business Patterns is an annual series that provides subnational economic data by industry. This series includes the number of establishments, employment during the week of March 12, first quarter payroll, and annual payroll. CBP data is particularly useful for B2B businesses that want to get a better understanding of their customers. You can use this data to identify geographies where businesses in various industries are concentrated, the average size of businesses, and estimate annual sales for those businesses.

How to use it

County Business Patterns is published through the US Census Bureau, which provides both queryable tables and raw data in CSV format. You can search tables using information such as NAICS industry codes, zip codes, number of employees, and business payroll.
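If you download the raw CSV files, a few lines of code can turn them into a per-county view of an industry. The sketch below rests on stated assumptions: a county-level file with columns named `fipstate`, `fipscty`, `naics`, `est` (establishments), `emp` (employees), and `ap` (annual payroll in thousands of dollars). The sample rows are made-up placeholders, not real CBP figures, so check the record layout published with the year of data you download before relying on these column names or units.

```python
import csv
import io
from collections import defaultdict

# ASSUMED CBP-style layout with made-up placeholder numbers (not real data).
# ap = annual payroll, assumed to be reported in thousands of dollars.
SAMPLE = """fipstate,fipscty,naics,est,emp,ap
06,037,541110,9000,60000,9000000
06,073,541110,3000,15000,1800000
36,061,541110,7000,70000,14000000
"""


def summarize_by_county(rows, naics_code):
    """Aggregate establishments, employees, and payroll per county for one NAICS code."""
    totals = defaultdict(lambda: {"est": 0, "emp": 0, "ap": 0})
    for row in rows:
        if row["naics"] != naics_code:
            continue
        county = row["fipstate"] + row["fipscty"]  # 5-digit county FIPS code
        for field in ("est", "emp", "ap"):
            totals[county][field] += int(row[field])

    summary = {}
    for county, t in totals.items():
        if t["est"] == 0 or t["emp"] == 0:
            continue  # suppressed or empty data would divide by zero
        summary[county] = {
            "establishments": t["est"],
            "avg_employees": t["emp"] / t["est"],
            # Payroll is assumed to be in $1,000s, so convert to dollars.
            "avg_payroll_per_employee": t["ap"] * 1000 / t["emp"],
        }
    return summary


summary = summarize_by_county(csv.DictReader(io.StringIO(SAMPLE)), "541110")
for county, stats in sorted(summary.items(), key=lambda kv: -kv[1]["establishments"]):
    print(county, stats)
```

Sorting by establishment count surfaces the counties where your prospective customers cluster, and the average headcount and payroll give a first hint at how large and how well-funded those customers are.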


CBP data excludes self-employed individuals, railroad employees, agricultural production employees, and most government employees. The data is also updated annually, which means it won’t be as up-to-date as similar data published by the US Bureau of Labor Statistics. 


The Bureau of Labor Statistics publishes data for over 100 industries. Pages for each industry display a current "snapshot" of national data obtained from different BLS surveys and programs, including workforce statistics, earnings and hours, prices, workplace trends, and more. The BLS industry dataset is useful for a high-level overview of the market for any industry or industry subsector. It will show you the number of businesses operating in that industry on a quarterly basis, which allows you to track industry growth over time. You can also see data on labor productivity within the industry, which can be useful for understanding how a given industry can benefit from tech-driven productivity enhancement.

How to use it

Search the database using an industry’s NAICS code, or browse industries alphabetically. Once you have located the data for your industry, you can combine it with NAICS data to estimate, from the top down, how businesses in that industry deploy their capital.

To do this, start with the NAICS data that reports the number of businesses by revenue and by employee count. For most industries, you’ll find that the businesses in a given revenue band fall mostly within a single employee-size band. Next, multiply the number of employees at a business of a given size by the average salary for those employees, which you can find in the BLS data. This gives you an approximate sense of how much businesses of that size spend on labor. Comparing that labor cost with their revenue band gives you an approximate range of margins (before non-labor costs) for businesses in that industry. This data will be useful in helping you determine the size of your addressable market.

For example, the NAICS data shows that there are 279,420 law offices (NAICS code 541110) with $500,000 to $999,999 in annual sales. The vast majority of these law offices (268,118) have 1-4 employees. The BLS data shows that lawyers make a median salary of $133,260. If each of those 1 to 4 employees earned the median lawyer salary, a typical small law office would have labor costs ranging from $133,260 to $533,040 on revenues ranging from $500,000 to $999,999. Although this doesn’t account for other costs borne by these firms, we can get a rough range of their margins.
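The arithmetic in this example is easy to reproduce and adapt to other industries. The sketch below uses the figures quoted above and, like the example, assumes every employee earns the median lawyer salary, which will overstate labor costs for offices that also employ paralegals or administrative staff. It reports labor cost as a share of revenue; one minus that share is a ceiling on margins before non-labor costs.

```python
def labor_share_range(min_staff, max_staff, salary, min_revenue, max_revenue):
    """Rough bounds on labor cost and on labor cost as a share of revenue.

    Assumes every employee earns `salary` (a simplification).
    """
    low_cost = min_staff * salary
    high_cost = max_staff * salary
    # Best case: fewest staff on the highest revenue; worst case: the reverse.
    best_share = low_cost / max_revenue
    worst_share = high_cost / min_revenue
    return low_cost, high_cost, best_share, worst_share


low, high, best, worst = labor_share_range(
    min_staff=1, max_staff=4, salary=133_260,
    min_revenue=500_000, max_revenue=999_999,
)
print(f"Labor cost: ${low:,} to ${high:,}")
print(f"Labor as share of revenue: {best:.1%} to {worst:.1%}")
```

A worst case above 100% simply means a four-lawyer office at the bottom of that revenue band could not pay median salaries, a reminder that these bounds are loose and best used to compare industries rather than to model any single firm.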


The BLS data is grouped by industry, rather than industry sub-sector. As such, it only provides an approximate overview of the size of a particular industry group within a broader market. 


The Current Industrial Reports (CIR) program provides monthly, quarterly, and annual measures of industrial activity. The CIR data is primarily useful for ideating B2B businesses that will serve customers in manufacturing and industrial production, which ranges from the aerospace industry to footwear. Although most of the data is years out of date, these tables can still be useful for identifying niches in various manufacturing industries that could serve as an entry point for a new business.

How to use it

The Census Bureau publishes its data in tables grouped by industrial sector.


The vast majority of the data in the CIR database is years out of date. This data is only useful for businesses that operate in industries with a substantial manufacturing component.

2. Industry Data


Industry associations provide a collective voice for individual businesses within an industry. Members of an association share information, discuss current issues facing their industry, develop industry-wide standards, create codes of conduct, and more. You can browse or find specific industry groups through a number of directories, most notably the Directory of Associations, Encyclopedia of Associations, and National Trade and Professional Associations Directory.

How to use it

Industry associations often publish official trade publications or independent reports. You can find individual reports on their website and through this searchable index of industry trade publications. It is also advisable to subscribe to the association’s email newsletters to stay up to date on the latest news. 

If you can’t find the right datasets for your research, search for a roster of association members and reach out to them on LinkedIn or by email. These people will likely help direct you to publicly available data and reports. They may also be willing to share paywalled reports for free.


Data provided by industry associations can be of questionable quality. Industry associations are first and foremost advocacy organizations for an industry, which gives them an incentive to spin data to support their industry. Be wary of figures published by industry associations if you don’t have access to the raw data that was used to derive those figures. Many industry associations charge a membership fee for access to their reports, data, and other materials. 

3. Investor Data

There are 3 primary sources of investor data: (1) venture capital and private equity firm portfolios; (2) public market data; and (3) aggregated investment data. 

How to use it


Venture capital and private equity firms often invest in specific industries and sub-sectors, and the companies they invest in are usually listed on their websites. Start by finding a handful of companies operating in your target industry and use tools like Pitchbook, Crunchbase, or Grata to identify the investors that have backed those companies. For venture-backed companies, look at who invested in their seed or Series A rounds, which will reveal emerging opportunities within that industry, as well as sub-sectors or product categories where you may face heightened competition. A benefit of analyzing VC portfolios is that it can help you validate your idea: if VCs are willing to invest in companies that have similar products or operate in similar niches, this is a good signal that your business idea is sound. For companies owned by private equity firms, examine their portfolios for legacy incumbents in your industry or sub-sector. Unlike VCs, PE firms typically invest in more mature companies that are generating free cash flow or have the potential to do so in the future. Companies in PE portfolios tend to be much older on average than VC-backed companies, which means you can use these portfolios to identify “sleepy incumbents” whose products you can improve upon with your own business.
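Once you’ve pulled the seed and Series A investors for a handful of companies, counting which investors recur can show which firms appear to be building a thesis in your niche. A minimal sketch, assuming you’ve exported the rounds into a simple mapping yourself; the company and fund names are placeholders, not output from Pitchbook, Crunchbase, or Grata.

```python
from collections import Counter

# Hypothetical export: each company in your target niche mapped to the
# investors in its seed or Series A round.
EARLY_ROUNDS = {
    "Company A": ["Fund X", "Fund Y"],
    "Company B": ["Fund X", "Fund Z"],
    "Company C": ["Fund X", "Fund Y"],
}


def repeat_investors(rounds, min_companies=2):
    """Investors that backed at least `min_companies` companies in the niche."""
    # set() so an investor listed twice for one company is only counted once.
    counts = Counter(inv for investors in rounds.values() for inv in set(investors))
    return [(inv, n) for inv, n in counts.most_common() if n >= min_companies]


print(repeat_investors(EARLY_ROUNDS))
```

An investor that keeps returning to the same niche is a validation signal, but it can also mean that niche is already crowded with well-funded competitors.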


When a company is publicly traded, it is legally required to produce reports that detail important information about the business. Before an initial public offering, for example, a company must file a Form S-1, which contains information about its business model, competition, products, and the risks the business faces. Once a company is publicly listed, it must file regular annual and quarterly reports (Forms 10-K and 10-Q, respectively) that provide updates on the state of the business. These reports can provide a wealth of information that is useful for identifying product ideas, underserved industry segments, and the risks of launching a business into a particular industry. The SEC provides a queryable database called EDGAR that you can use to find all the relevant filings for publicly traded companies in your industry.
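Beyond the search interface, the SEC also publishes each company’s filing index as JSON on data.sec.gov. The sketch below assumes the submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`, with a 10-digit, zero-padded CIK) and the field names under `filings.recent` as we understand them; verify both against the SEC’s developer documentation. The SEC asks automated clients to identify themselves with a `User-Agent` header. The function names here are hypothetical.

```python
import json
import urllib.request

# ASSUMED endpoint pattern for EDGAR's submissions API (verify before use).
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a company's filing index; user_agent like 'Your Name you@example.com'."""
    req = urllib.request.Request(
        SUBMISSIONS_URL.format(cik=cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def filings_of_type(submissions: dict, forms=("10-K", "10-Q", "S-1")) -> list[dict]:
    """Return recent filings whose form type is in `forms`, newest first."""
    recent = submissions["filings"]["recent"]  # assumed parallel arrays
    cik = int(submissions["cik"])
    results = []
    for form, date, accession, doc in zip(
        recent["form"], recent["filingDate"],
        recent["accessionNumber"], recent["primaryDocument"],
    ):
        if form in forms:
            # Archive paths use the unpadded CIK and the accession number without dashes.
            url = (
                f"https://www.sec.gov/Archives/edgar/data/{cik}/"
                f"{accession.replace('-', '')}/{doc}"
            )
            results.append({"form": form, "date": date, "url": url})
    return sorted(results, key=lambda f: f["date"], reverse=True)
```

To use it, look up a company’s CIK in EDGAR, call `fetch_submissions(cik, "Your Name you@example.com")`, and pass the result to `filings_of_type` to get links to its recent S-1, 10-K, and 10-Q filings. As we understand it, `filings.recent` covers only a company’s most recent filings, so older ones may require the additional files the response references.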


There are a variety of services such as Pitchbook, Crunchbase, or Grata that provide detailed investment information about companies in public and private markets. This includes information about the amount of investment in the company, the investors who have backed the company, investment multiples, and more. This information is particularly useful for understanding the growth prospects of VC and PE-backed businesses. 


Examining VC and PE portfolios only provides information about certain types of companies. Venture capitalists, for example, will typically only invest in businesses that have extremely high growth potential and make investments during the earliest phases of a company’s life. PE firms, by contrast, tend to invest in more mature businesses with significant cash flow potential. As such, these portfolios will only provide you with a limited view of the types of viable businesses in a given industry. Furthermore, platforms like Pitchbook or Crunchbase that provide public and private investment data are typically only accessible after you purchase an expensive license.


Consulting firms frequently produce reports on industry trends and issues that can be an invaluable resource for finding business ideas. These market research reports are based on extensive amounts of primary data collected by the firm, which is compiled into easily digestible reports that can reveal untapped opportunities within an industry.

How to use it

Use this list of market research firms to identify consultancies that cover the industries you’re interested in. Some of the more well-known market research firms include McKinsey, Accenture, Dun & Bradstreet, IBISWorld, Gartner, and Statista. The vast majority of these firms charge high prices for their reports, but some, notably McKinsey and Accenture, provide high-quality research reports for free on their sites that are a great starting point for identifying opportunities in tech-driven industries.


The market research reports provided by consultancies are unrivaled in the depth of their research, but unfortunately the vast majority of these reports are paywalled. The cost of accessing a single report or dataset can range from several hundred to several thousand dollars, which typically isn’t worth it when you’re still searching for business ideas. 

4. Marketplace Data


Amazon data is particularly useful for e-commerce, direct-to-consumer, and B2C businesses. The platform offers a variety of free and paid tools that you can use to explore product offerings, identify customer pain points, and find untapped opportunities. 

How to use it

If you want to understand what consumer products are most in demand, Amazon’s most-wished-for list is an invaluable resource that allows you to search for the most popular products by category. Once you’ve homed in on some promising products, spend time reading customer reviews (both negative and positive) to identify common pain points and must-haves. If you want more than anecdotal customer feedback, you can use tools like Kimola, Commerce.ai, or Amazon Comprehend to run sentiment analysis across a product’s reviews and better understand how customers view the product in aggregate. These tools typically allow you to do a limited amount of sentiment analysis for free and have affordable paid tiers for further analysis. Another useful tool for analyzing product trends on Amazon is AMZScout, a plugin that lets you search product history data to see how demand and prices have changed over time.
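To get a feel for what these tools automate, here is a deliberately crude stand-in: rather than true sentiment analysis, it simply counts the most common words in low-star reviews to surface recurring complaints. It assumes you have the reviews as (star rating, text) pairs; the reviews and the stopword list below are made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "it", "is", "was", "to", "of",
    "in", "on", "for", "this", "that", "after", "i", "my", "me", "too",
    "with", "not", "very", "just",
}


def pain_point_terms(reviews, max_stars=2, top_n=5):
    """Count the most common non-stopword terms in low-rated reviews."""
    counts = Counter()
    for stars, text in reviews:
        if stars > max_stars:
            continue  # only look at negative reviews
        words = re.findall(r"[a-z']+", text.lower())
        # Count each term once per review so one long rant doesn't dominate.
        counts.update({w for w in words if w not in STOPWORDS})
    return counts.most_common(top_n)


# Made-up reviews standing in for an exported review set.
reviews = [
    (1, "Battery died after two weeks."),
    (2, "Battery life is short and the charger is flimsy."),
    (5, "Love it, battery lasts all day."),
    (1, "Charger stopped working, battery drains fast."),
]
print(pain_point_terms(reviews))
```

Here “battery” and “charger” rise to the top of the negative reviews. Dedicated tools handle negation, phrases, and sarcasm far better than a word count, but even this crude pass points you at which reviews to read closely.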


Many of the tools required to analyze data on Amazon are only available on paid subscriptions. Furthermore, the heterogeneity of products on Amazon makes it difficult to make useful comparisons between products. Amazon also has a known problem with fake reviews and products, which can distort data about customer preference, product price, and product availability.


Many industries have a dominant software provider that runs a marketplace where businesses can sell applications that interface with the platform. These platform marketplaces can be a useful starting point for identifying business ideas for software companies in a given industry. You can quickly see the types of apps you’re competing against, customer pain points with those apps, and price points. 

How to use it

Find the dominant software platform in your industry using a web search. Visit the platform’s marketplace and browse its application library to see the types of apps that are available to customers. Some examples of platform marketplaces include: Shopify’s App Store, Procore’s Marketplace, and Epic’s App Orchard.   


These app marketplaces only show software that interfaces with the dominant platform, which means it excludes many of the competitors in a given industry. While these app marketplaces can be useful for business ideation, you should conduct further research to better understand the competitive landscape for a particular idea. 

5. Trend Data


Google Trends is a powerful free tool that can help you identify business opportunities by examining what people are searching for over time. The tool includes search data from 2004 to the present that can be segmented into timeframes as small as 30 minutes. Trends data is drawn from an unbiased sample of Google searches, and it is normalized rather than reported as raw counts: each data point is divided by the total searches for its time and location, and the results are scaled to an index from 0 to 100. A value of 100 marks the term’s peak popularity within the period and region you selected.
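The normalization step explains why a rising raw search count can still show up as a falling Trends line. The sketch below illustrates the idea with hypothetical numbers; it is a simplification, not Google’s actual pipeline, which also involves sampling. Each point is divided by total searches for that period, and the series is rescaled so its peak equals 100.

```python
def trends_style_index(term_searches, total_searches):
    """Scale a term's share of all searches to a 0-100 index, Google Trends style."""
    # Relative popularity: the term's searches as a share of all searches.
    shares = [t / total for t, total in zip(term_searches, total_searches)]
    peak = max(shares)  # assumes at least one nonzero share
    return [round(s / peak * 100) for s in shares]


# Hypothetical weekly counts: the term's raw searches double,
# but overall search volume quadruples.
term = [200, 300, 400]
total = [10_000, 20_000, 40_000]
print(trends_style_index(term, total))
```

The result is `[100, 75, 50]`: raw searches doubled, but the term’s share of all searches halved, so its Trends line falls.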

How to use it

Trends allows you to break down search terms by category. For example, if you’re trying to search for software-related queries in a particular industry, you can select “Computers and Electronics” or “Internet and Telecom” as a category to help differentiate them from other searches. You can also query by search type (e.g., web searches, image searches, news searches, or YouTube searches), which can be useful for refining the trends data to meet your needs. For example, a web search for “software” may indicate customer interest, whereas a YouTube search for “software” is more likely to represent someone searching for a tutorial.

Google Trends is offered for free, but it is limited to high-level overviews of search trends. If you want to explore a search trend in more depth or include searches on platforms other than Google, there are a variety of paid and free tools, such as Glimpse and TrendHunter, that offer more granular data about those search terms. One particularly powerful and unique search tool is Springwise, which maintains a 10,000+ strong “Innovation Library” of businesses operating in a range of sectors that you can browse for business ideas.


Google Trends can provide some insight into how interest in various products has changed over time, but it lacks much of the context required to understand whether a business idea is viable.


If you want to get a deeper understanding of a customer’s needs or the types of businesses that exist in an industry you’ll want to use a more refined approach to analyzing keywords. There are a variety of tools that allow you to see how customers search for products, which you can use to come up with business ideas that can solve highly specific and underserved challenges. These tools were mostly designed for existing businesses to help them refine SEO, but they can be a potent source of business ideation as well. 

How to use it

There are many low-cost tools you can use to see what keywords people are searching for, such as Keyword Discovery and Wordtracker. One of the most effective ways to use these tools for business ideation is to search for “action queries” that include a question, an action, and an industry or sub-sector, for example, “how to reduce costs house construction.” This is a starting point for uncovering more specific challenges that customers within an industry are facing, which can be the foundation of a great business. These tools will also reveal how frequently people search for these problems, which is a very rough proxy for customer demand.
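Because action queries follow a fixed pattern, you can generate a batch of candidates and then check each one’s search volume in your keyword tool. A minimal sketch; the word lists are hypothetical examples, not recommendations.

```python
from itertools import product

# Hypothetical building blocks for "action queries".
QUESTIONS = ["how to", "best way to"]
ACTIONS = ["reduce costs", "schedule crews", "track permits"]
INDUSTRIES = ["house construction", "kitchen remodeling"]


def action_queries(questions, actions, industries):
    """Combine a question, an action, and an industry into candidate queries."""
    return [f"{q} {a} {i}" for q, a, i in product(questions, actions, industries)]


queries = action_queries(QUESTIONS, ACTIONS, INDUSTRIES)
print(len(queries))  # 12 (2 x 3 x 2 combinations)
print(queries[0])    # how to reduce costs house construction
```

Keeping the lists short matters: the number of candidates grows multiplicatively, and each one still needs a human look at its search volume and related queries.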

Other keyword research tools, such as Moz or Ahrefs, work in reverse. To pull business ideas out of these tools, start by identifying competitors within an industry and use the tools to discover the keywords that drove customers to their websites. If you find that customers are searching for a solution that isn’t offered by that competitor, there is the potential to launch a business serving that need.
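The comparison step can be sketched as a simple filter. This assumes you’ve exported, for each competitor, the keywords that drive traffic to its site and a few terms describing what it actually sells; the substring match used here is a crude, hypothetical rule, and every flagged keyword still needs a manual look.

```python
def keyword_gaps(competitor_keywords, offered_topics):
    """Keywords that bring searchers to a competitor but match none of its offerings.

    competitor_keywords: {competitor: set of keywords driving traffic to them}
    offered_topics: {competitor: set of terms describing what they sell}
    """
    gaps = {}
    for name, keywords in competitor_keywords.items():
        offered = offered_topics.get(name, set())
        # Crude rule: a keyword is "covered" if it contains any offered term.
        gaps[name] = sorted(
            kw for kw in keywords if not any(topic in kw for topic in offered)
        )
    return gaps


# Hypothetical exports for one competitor.
keywords = {
    "Competitor A": {
        "contractor invoicing software",
        "contractor payroll app",
        "subcontractor scheduling tool",
    }
}
offerings = {"Competitor A": {"invoicing"}}
print(keyword_gaps(keywords, offerings))
```

In this made-up case, searchers looking for payroll and scheduling help land on an invoicing product, which hints at needs the competitor isn’t serving.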


Keyword research is a very labor intensive way to collect data for business ideation. You will have to constantly iterate on search terms to identify the keywords that customers are using before you have a meaningful signal that points to underserved pain points.


Social media platforms are a goldmine for business ideation. You can use these platforms to efficiently search for requests that customers direct at competitors and to identify customer pain points. The key is identifying the right social communities for your industry. Social platforms like Facebook, LinkedIn, and Twitter are useful for finding customers who are dissatisfied with existing products. Forums such as Reddit and Quora are useful for finding communities dedicated to particular products or industries, which allows you to understand the needs of customers within those groups.

How to use it

The most straightforward way to get business ideas from social media platforms is to search for keywords related to a particular industry or product on the platform. But if you want to dive deeper, you can use social influence tools like Kred or PeerIndex to identify key influencers or decision-makers in a given industry. By engaging with these influencers you can get a better understanding of customer pain points and problems with existing solutions. If you want a more granular understanding of an industry or product, you can use tools like Hootsuite or BrandMentions to discover what influencers and competitors in that space are discussing and analyze the language they use to discuss key issues. 


Social media is tailored for polarizing content, which means that data collected on these platforms about an industry, a product, or a competitor will tend to cluster around extremes. You should exercise caution when using social media data and understand that this data may not be representative of the average consumer for a particular product. Social media platforms have also recently started restricting access to their user data, which makes robust quantitative analysis increasingly difficult.

After you review secondary data, begin collecting primary data.

After you’ve conducted a thorough review of secondary data, you should have a rough idea for a business that meets underserved needs in a well-defined industry segment. But before you launch your startup, it is critical to validate that business idea by collecting your own primary data.


For solo entrepreneurs with a limited budget, there are 3 main ways to gather primary data:

  1. One-on-one interviews with customers
  2. Surveys 
  3. Observation


1. Interviews

Before conducting an interview, you need to determine who you’ll be speaking to. Is the customer the same as the end user of your product? If not, you will need to interview both the customer and user to validate your business idea. This will ensure you avoid building a product for the customer that doesn’t solve the user’s real problems. 

A common mistake in customer interviews is to ask the customer what they want. This seems counterintuitive, but customers often don’t know what they want or the solution they want is too idiosyncratic to offer as a product to a large customer base.  

Instead, use interviews to understand a customer’s problems. Ask them to describe their daily workflow and pain points. If you’re talking to a consumer, ask them what led them to make a previous purchase in your product category (e.g., price, functionality, etc.).

If you’re struggling to come up with interview questions, take the time to write out a set of hypotheses about your business. Some examples include:

  • A customer will pay $__________ for this product. 
  • A customer would prefer to pay for this product through a monthly subscription rather than based on usage. 
  • A customer will only use this product if it has ___________ as features. 
  • The most useful feature of this product is ________________.

Once you have some hypotheses, you can formulate questions that will test them. Some good questions you should ask during your interview include:

  • What made you decide to purchase your existing solution?
  • Who made the decision to purchase that solution?
  • How do you use the product in your daily workflow/life?

Remember to let the customer do most of the talking during an interview. You are there to listen and learn from them, not to try to sell them your product or business idea. 

You should aim to conduct at least a dozen customer interviews. This will help confirm that you’ve actually found the right customer profile for your business idea. Your goal is to create an “ideal customer profile” (ICP), which represents a hypothetical customer that would benefit the most from the product offered by your company. Your ICP will be immensely useful once you’re ready to launch your product because you will be able to target customers that share the same traits and are the most likely to purchase your product.

In a B2B context, some questions you should be able to answer about your ideal customer include:

  • Number of employees at the business
  • Age of the business
  • Revenue of the business
  • Industry or industry niche

For a consumer-facing business, your ideal customer profile should include basic demographic information (age, location, income, sex, etc.) and key behavioral traits (how often do they purchase/use similar products). 

2. Surveys

One-on-one interviews are a great way to understand a particular customer’s pain points, but if you only rely on individual interviews you run the risk of building a product that doesn’t really meet the needs of most of the customers in your market. Once you have a few individual customer interviews under your belt, you should consider creating a brief survey to distribute to dozens of other customers to validate the information you collected during your interview. There are plenty of free tools available that you can use to create, distribute, and analyze survey data. One of the most popular options is Google Forms.

These surveys should be based around customer responses from your interview and include a mix of open-ended and quantifiable questions. If a customer interview revealed that a certain product feature was the most important purchase factor for them, ask survey respondents to rank the importance of that feature on a numerical scale. Make sure to include a free form response option about features as well. As an example, your survey question flow may look something like this:

  • On a scale of 1-10, how important is ___________ feature to your business?
  • What is the most important feature for your business and why?

When designing your survey, be sure to collect basic information about the respondent so you can see how they stack up to the ICP you created during your interview.
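If you record a few profile fields alongside each answer, you can check whether respondents who match your ICP answer differently from everyone else. A minimal sketch, assuming a single 1-10 feature rating and an ICP defined by employee count and industry; the class names, fields, and responses are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Respondent:
    employees: int
    industry: str
    feature_rating: int  # answer to "On a scale of 1-10, how important is X?"


@dataclass
class IdealCustomerProfile:
    min_employees: int
    max_employees: int
    industries: set[str]

    def matches(self, r: Respondent) -> bool:
        return (
            self.min_employees <= r.employees <= self.max_employees
            and r.industry in self.industries
        )


def compare_ratings(respondents, icp):
    """Average feature rating for respondents inside vs. outside the ICP."""
    inside = [r.feature_rating for r in respondents if icp.matches(r)]
    outside = [r.feature_rating for r in respondents if not icp.matches(r)]
    # None signals an empty group rather than a misleading average.
    return (mean(inside) if inside else None, mean(outside) if outside else None)


# Hypothetical ICP and survey responses.
icp = IdealCustomerProfile(min_employees=1, max_employees=10, industries={"legal"})
responses = [
    Respondent(employees=3, industry="legal", feature_rating=9),
    Respondent(employees=8, industry="legal", feature_rating=7),
    Respondent(employees=50, industry="legal", feature_rating=4),
    Respondent(employees=5, industry="accounting", feature_rating=5),
]
print(compare_ratings(responses, icp))
```

In this made-up sample, respondents inside the ICP rate the feature 8 on average versus 4.5 outside it, which would support the interview finding for your target segment. With real data, make sure each group is large enough for the difference to mean something.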

The challenging part of launching a survey is finding participants. The best way to find survey participants is to use industry association rosters, social media groups, and forums. However, there are also paid services like SurveyMonkey that can find targeted survey audiences based on your industry sector.

3. Observation

If you’re a B2B business, visit the customer’s workplace and watch them work. If you’re a consumer-facing business, talk with people who have used your competitors’ products and watch them use those products. This will help you identify inefficiencies or pain points that you can alleviate, including some that customers may not even be aware of.

Some things to watch for as you observe your customer:

  • Manual tasks that could be automated (e.g., data entry)
  • Pen-and-paper processes that could be digitized (e.g., intake forms)
  • Repetitious processes that could be batched (e.g., processing invoices) 
  • Distributed tasks that could be centralized (e.g., using many different apps)
  • Are there things that the customer did that ran counter to your expectations?
  • How does the customer interact with their customers? What are their customers’ common complaints? 

Try to quantify processes wherever possible. If you spend a day watching a customer work, count how many times they do a task and use a stopwatch to record how long certain tasks take to complete.  This will help you identify tangible value propositions for your business such as time or cost savings.
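Those stopwatch counts convert directly into an annual figure you can put in front of a customer. A minimal sketch, assuming 250 workdays a year and an hourly labor cost that you would need to estimate for the customer; the observation values are hypothetical.

```python
def annual_time_cost(times_per_day, seconds_per_task, hourly_cost, workdays=250):
    """Annual hours and labor cost spent on one observed task.

    workdays=250 is an assumed default; adjust it for the customer.
    """
    hours = times_per_day * seconds_per_task * workdays / 3600
    return hours, hours * hourly_cost


# Hypothetical observation: invoices entered by hand 40 times a day,
# about 90 seconds each, by staff costing $30/hour.
hours, cost = annual_time_cost(times_per_day=40, seconds_per_task=90, hourly_cost=30)
print(f"{hours:.0f} hours/year, about ${cost:,.0f}")
```

Here 40 tasks of 90 seconds add up to one hour a day, or 250 hours and $7,500 a year, a concrete ceiling on what automating that one task could be worth to this customer.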

Primary and secondary data are the foundation of a solid business idea.

By using these tools, you can start to identify promising areas for deeper exploration and understand how your business fits into the broader industry landscape. This data will give you a general sense of who your customers and competitors are, how many customers and competitors there are in your target industry, and a basic understanding of how customers in your industry spend money. 

Over the next few weeks, we’ll use the data you collected from these sources to show you how to get a more robust understanding of your market, competition, and customers. Be sure to subscribe so that you won’t miss future editions and visit our website for more actionable information on launching your business.
