How We Identified All 9,716,205 Products Sold on Shopify

Shopify is the leading cloud-based ecommerce platform with an astonishing market cap of 10 billion dollars. Their 2016 annual report claims 430,000 merchants.

Since every store they host uses images, files, and assets served from the Shopify CDN, we can easily identify all domains using Shopify by searching page source for references to that CDN.


Our June web crawl contains over 3 billion pages, and we were able to uncover a total of 94,135 unique domains and 100,96,766 pages hosted by Shopify, with a total of 9,716,205 unique products.

That’s a lot of products! But we wanted to dig even deeper, so we extracted product attributes from each page.

The most expensive products for sale

Since shop owners can set any price, there are a number of listings priced at over $1,000,000 that are not true products and are either test listings or fake product listings. For this analysis we looked at products priced around $500,000.

The most expensive products appear to be real estate (who knew you could buy a house on Shopify?), jewelry, artwork, rare domain names, and cars. You can even find a 1.8 petabyte storage solution and a fusion machine for sale.

Toren 150 Myrtle Avenue - For Sale / Studio / Downtown Brooklyn $496,000
Audemars Piguet Jules Audemars Grand Complication 25866OR.OO.D002CR.02 $494,380
Louis Glick Starburst Diamond Ring $475,000
Emerald Columbian $475,000
Hearts On Fire Illa Constellation Diamond Bracelet $475,000
REDOUTE Pierre Joseph (1759-1840). An Original Watercolour of a Bouquet of Red Rose of Sharon. 1835. $455,000
Emerald Cut Diamond = 12.57 ct VS1 L and 2 Shields Platinum Ring GIA # 2155746063 DX0744 SMNTX0024 $437,254
Monrovia Media Cabinet $425,000
Early Paul Evans Studio Forged Front Cabinet 1964 $425,000
Patek Philippe 5029J Minute Repeating Limited Edition Watch $425,000
Nautilus-E24 $392,000
Ava Pendant White by TECH Lighting - FJ Freejack (male adapter only no ceiling canopy) / Satin Nickel / 12V Halogen $383,200
Finibus Bonorum et Malorum - Red / L $380,000
Overwatch Mei Climatologist Role Game Anime Cosplay Costumes RC-1021 - S / Full Set / Female $375,032
18th Century Boiserie from a French Chateau Complete Room $370,000
45W - "NL-MH" Post Top Lamp- 1000 Pack $359,950
36"W Marquee Chandel-Air $352,000
Burma Ruby and Diamond Bracelet - Gold $345,000
Nacre Noa Necklace - Black Pearl $340,000
MegaMc® 1648 HF 220-240V 50/60Hz 3 Phase Fusion Machine Package $331,901

What are the top currencies used on Shopify?

According to Shopify’s prospectus, their total addressable market in their key geographies is 10 million merchants.

No surprise that USD is the top currency (although Shopify is based in Canada), but there are over 3 million products for sale in other currencies, proving Shopify is seeing growth all over the world.

Currency    Products Available
USD 6,601,417
GBP 849,999
CAD 526,354
AUD 481,324
EUR 303,615
INR 158,615
NZD 92,605
JPY 80,765
DKK 67,658
SGD 66,005
MXN 49,764
ZAR 30,115
HKD 23,742
TWD 17,458
NOK 14,484
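
The "over 3 million products in other currencies" figure can be checked against the totals quoted in this post. This is a minimal sketch, assuming each product is listed in exactly one currency and that the 9,716,205 total covers every currency, including those not shown in the table:

```python
# Figures quoted in this post.
TOTAL_PRODUCTS = 9_716_205  # unique products found in the crawl
USD_PRODUCTS = 6_601_417    # products priced in USD

# Assumption: every product has exactly one currency, so anything
# that is not USD is priced in some other currency.
non_usd_products = TOTAL_PRODUCTS - USD_PRODUCTS
usd_share = USD_PRODUCTS / TOTAL_PRODUCTS

print(f"Non-USD products: {non_usd_products:,}")  # Non-USD products: 3,114,788
print(f"USD share: {usd_share:.1%}")
```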

Which domains sell the most products?

Some merchants don’t carry any inventory and are instead using Shopify for drop shipping, something Shopify actively encourages. Drop shipping allows anyone to take orders for products on Shopify, then turn around and place those orders on behalf of the customer on sites like AliExpress.

We identified a number of merchants with over 1,000 products for sale, showing Shopify sellers are more diverse than just small business owners.

Domain    Products Available
73,096
29,848
29,636
27,426
24,882
23,631
18,823
18,605
16,581
16,152
15,185
13,976
13,832
13,702
13,476
11,358
11,342
11,193
11,170
11,104
11,037
11,037
10,682
10,649
10,348

The most popular domains hosted on Shopify

We looked at the number of links each Shopify store received on the web and calculated PageRank using Apache Spark.

Domain    Rank
3,111 3,613 6,376 6,735 8,512 8,924 10,998 11,103 11,583 13,609 14,149 14,759 15,971 16,208 17,034 17,665 18,127 18,361 18,837 20,286 21,168 21,676 22,440 22,606 22,613 23,169 23,573 23,609 25,545 25,897 25,926 26,354 26,361 26,780 26,826 26,916 27,151 27,238 27,256 27,784 28,052 28,216 28,437 28,549 28,721 28,822 29,027 29,216 29,232 29,672 29,689 30,280 31,178 31,193 31,284 31,506 31,784 31,845 31,915 31,943 31,964 31,973 31,986 32,756 32,765 33,173 33,178 34,378 34,505 34,586 35,011 35,544 36,221 36,247 36,393 36,956 37,533 37,691 38,950 39,438 39,607 40,482 40,776 41,869 43,279 43,508 43,912 45,173 48,297 49,677 50,096 50,847 50,923 54,228 54,667 55,143 55,959 56,477 57,198 57,955 58,217 58,680 58,908 59,339 59,506 59,827 60,866 60,871 60,872 61,163 61,282 61,708 61,855 62,284 62,577 62,885 63,017 63,328 63,912 64,398 64,638 64,652 64,661 64,898 64,972 65,457 66,907 67,098 67,112 67,150 67,182 67,214 67,463 68,496 68,789 68,969 69,356 69,802 70,011 70,133 70,308 70,390 72,181 72,385 72,427 72,514 73,354 73,381 73,402 73,573 73,851 74,239 74,701 75,371 75,696 75,915 76,596 77,041 77,708 78,171 78,592 78,595 79,036 79,840 80,294 80,365 81,082 81,089 81,096 81,473 81,595 81,748 81,967 83,016 83,267 83,284 83,515 84,000 84,041 84,616 84,811 84,928 85,075 85,079 85,099 85,112 85,180 85,284 85,311 85,387 85,488 85,497 85,522 85,571 85,653 85,668 85,786 86,301 86,511
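
For readers unfamiliar with PageRank, the idea can be sketched in a few lines. This is not the Spark job used for the analysis; it is a small single-machine power iteration, and the domain names, damping factor, and iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) link pairs."""
    nodes = {node for pair in links for node in pair}
    out_links = {node: set() for node in nodes}
    for source, target in links:
        if source != target:  # ignore self-links
            out_links[source].add(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by domains with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: two outside sites linking to two stores.
links = [
    ("blog.example", "shop-a.example"),
    ("news.example", "shop-a.example"),
    ("news.example", "shop-b.example"),
]
ranks = pagerank(links)
for domain in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{domain}: {ranks[domain]:.3f}")
```

In this toy graph shop-a.example scores highest because it receives the most link weight. Sorting every domain by score gives an ordinal position, which is one way to arrive at a rank column like the one above.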


Try out the search we used for this analysis on our search engine or contact us about running queries against our crawl index.

About Us

NerdyData provides reports on which websites use a certain piece of source code.

If your competitors have a common piece of code, for example TrendyLibrary.js, all you have to do is search for that term, and we will show you all of the websites that use your competitor’s technology for your sales team to call!

December 2016

How We Found All Of Optimizely’s Clients

For those who aren’t familiar with Optimizely, they are a leader in the growing A/B testing industry. Amazingly, they’ve managed to get their installation code down to a single line of JavaScript as pictured below:


With one simple query we uncovered a total of 577,395 sites containing that Optimizely JavaScript library:


That’s a lot of clients! But we wanted to dig even deeper and find all distinct Optimizely CDN URLs, which contain Optimizely client numbers. Using a regular expression search, we were able to extract a list of over 12,000 URLs used on the top 1 million sites.
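
A regular expression extraction of this kind might look like the sketch below. The URL shape (a numeric ID in a path under cdn.optimizely.com/js/) is an assumption, since the snippet itself is not reproduced in the text, and the sample HTML and IDs are made up.

```python
import re

# Assumed URL shape: //cdn.optimizely.com/js/<numeric id>.js
OPTIMIZELY_URL = re.compile(r"(?:https?:)?//cdn\.optimizely\.com/js/(\d+)\.js")


def extract_client_ids(html):
    """Return the distinct numeric IDs found in Optimizely CDN URLs."""
    return sorted(set(OPTIMIZELY_URL.findall(html)), key=int)


# Made-up page source with one duplicated snippet.
sample_html = """
<script src="//cdn.optimizely.com/js/123456.js"></script>
<script src="https://cdn.optimizely.com/js/987.js"></script>
<script src="//cdn.optimizely.com/js/123456.js"></script>
"""
print(extract_client_ids(sample_html))  # ['987', '123456']
```

Collecting the matches into a set is what turns raw hits into a list of distinct URLs.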


Try out this and other awesome search tools within our search and regular expression interfaces.

About NerdyData

Our search engine is different from search engines you’ve used before. Traditional search engines are geared towards providing answers, whereas our goal is to give you the best list of results for a query.

Our crawler has visited over 140 million homepages and collected terabytes of HTML, JavaScript, and CSS code. We’ve also designed several search interfaces that allow anybody to query against the source code of webpages, or download a list of sites containing a specific term.


October 2016

How We Found Every Single Vulnerable Website

If you’re a security researcher and you’ve found an exploit in a commonly distributed web application, you may want to find sites that contain that vulnerable application so you can notify them.

The question is how do you find them?


Google Hacking Is Now Obsolete

Maybe you’ve heard of Google Hacking, a technique hackers use to find vulnerable websites by searching for a filename or block of text that a vulnerable piece of software is known to contain. An example of this would be a Google query like



Powered by XOOPS 2.2.3 Final

If you are familiar with this method of vulnerability hunting, or this sort of thing interests you, you’ll be excited to know we’ve taken Google Hacking to another level.

How Does This Method Differ?

Traditional search engines only let you query the text of a webpage, not the markup. You can now find all websites that have a common piece of HTML code or JavaScript, in addition to a block of text. Here are some examples of what can be done:

Websites running WordPress that are using version 3.5

Query: <meta name="generator" content="WordPress 3.5" />


Websites with an upload form on their homepages

Query: name="MAX_FILE_SIZE"


Websites using the Invision Power Board Forum

Query: ipsBadge
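
The same kind of check can be run locally against a page you have already fetched. The sketch below does a plain substring match using the three queries above; the labels and the sample page are assumptions for illustration, and it says nothing about how the search index itself works.

```python
# Labels are illustrative; the snippets are the three queries above.
FINGERPRINTS = {
    "WordPress 3.5": '<meta name="generator" content="WordPress 3.5" />',
    "Upload form": 'name="MAX_FILE_SIZE"',
    "Invision Power Board": "ipsBadge",
}


def match_fingerprints(page_source, fingerprints=FINGERPRINTS):
    """Return the labels whose snippet appears verbatim in the page source."""
    return [
        label for label, snippet in fingerprints.items() if snippet in page_source
    ]


# Made-up page source.
sample_source = """<html><head>
<meta name="generator" content="WordPress 3.5" />
</head><body>
<form enctype="multipart/form-data" method="post">
<input type="hidden" name="MAX_FILE_SIZE" value="30000" />
</form></body></html>"""
print(match_fingerprints(sample_source))  # ['WordPress 3.5', 'Upload form']
```

An exact substring match is sensitive to whitespace and attribute order, so a real scan would likely need looser patterns.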


New flaws in web application security measures are constantly being researched, both by hackers and by security professionals. Most of these flaws affect all dynamic web applications whilst others are dependent on specific application technologies.

In both cases, one may observe how the evolution and refinement of web technologies also brings about new exploits which compromise sensitive databases, provide access to theoretically secure networks, and pose a threat to the daily operation of online businesses.


March 2016

Mixpanel Vs. Goliath

In a vast sea of analytics platforms, how many users choose Mixpanel over the competition?


It takes just 5 minutes to set up, and once you start watching the real-time data flood in, it’s clear that Mixpanel is not only the most “modern” and sleek analytics platform to date, but also offers a unique take on customer-oriented statistics and insight.

This isn’t a blog post about why Mixpanel is better — instead we want to show you some interesting statistics that exemplify the uphill battle Mixpanel faces in competing with the analytics juggernauts.

There is no denying that Google Analytics and Omniture dominate the online analytics industry. But just how big are they?  We researched the topic and found:

After seeing these numbers we thought, “Well, Mixpanel has a low adoption rate among all webmasters, but maybe their target market is larger web companies.” So we narrowed our search to just the top 1 million sites on the internet (based on traffic). Mixpanel appears on just 540 websites out of the top one million.

How hard would it be for Mixpanel to convince Google Analytics users to make the switch?

Upon further inspection we found 87% of domains that have Mixpanel code also use Google Analytics.
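
An overlap figure like this comes down to set arithmetic over two lists of domains. The sketch below uses made-up domains, so its output does not reproduce the 87% result; the last line restates the 540-in-a-million figure from this post as a percentage.

```python
def overlap_share(primary, other):
    """Fraction of domains in `primary` that also appear in `other`."""
    primary, other = set(primary), set(other)
    if not primary:
        return 0.0
    return len(primary & other) / len(primary)


# Hypothetical domains, for illustration only.
mixpanel_sites = {"a.example", "b.example", "c.example", "d.example"}
google_analytics_sites = {"a.example", "b.example", "c.example", "z.example"}

print(f"{overlap_share(mixpanel_sites, google_analytics_sites):.0%}")  # 75%

# Mixpanel's share of the top one million sites, from the figure in this post.
print(f"{540 / 1_000_000:.3%}")  # 0.054%
```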

It’s tough to get out of Google’s shadow, so how will Mixpanel convince webmasters to pick them as their primary analytics platform?



June 2014

How Facebook Tricks Webmasters To Collect Users’ Web Surfing History


With the recent announcement that Facebook will begin selling your web browsing history to advertisers, we thought we’d take a look at how they actually get your web browsing history in the first place.

Most people assume that Facebook tracks them when they are on its site, but you don’t have “Facebook” installed on your computer and you don’t “open up Facebook” to surf the web. Where do they get the data from?

Even without visiting such sites directly, you’re likely to encounter elements from them almost seven times a day. The trackers come in the shape of cookies, JavaScript, 1-pixel beacons, iframes, and cute-looking widgets.

These elements have the ability to ping Facebook’s servers with:

  • The URL of the page you’re viewing
  • The site that referred you to that page
  • The browser you’re using
  • The OS you’re using
  • Your approximate geographic location
  • The size of your screen
  • If you’re logged into Facebook they can associate you with your Facebook profile.
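
Much of this travels in the ordinary HTTP request a browser makes when a page embeds a third-party element. The sketch below is a simplified model, not Facebook’s actual implementation: it builds the standard Referer, User-Agent, and Cookie headers for a hypothetical widget request. The function name, example URL, user agent string, and cookie value are all made up.

```python
def widget_request(page_url, user_agent, session_cookie=None):
    """Headers a browser would typically attach when loading a third-party widget."""
    headers = {
        "Referer": page_url,       # the page you are viewing
        "User-Agent": user_agent,  # identifies the browser and the OS
    }
    if session_cookie:
        # A cookie previously set by the widget's own domain lets the
        # request be tied to a logged-in profile.
        headers["Cookie"] = session_cookie
    return headers


# Hypothetical values, for illustration only.
print(widget_request(
    "https://news.example/story",
    "ExampleBrowser/1.0 (ExampleOS 10)",
    session_cookie="session=abc123",
))
```

The remaining items need more than headers. The referring site and screen size are usually read by script running in the page, and approximate location is usually inferred from the IP address of the request.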

The Facebook Like Button

One very popular widget on the internet is the Facebook Like button. Facebook’s Like button has made it easy for hundreds of millions of Web users to share content with their friends on the social networking site. The button appears on more than one-third of the top thousand websites and has been integrated into everything from Bing search results to countless blogs around the ‘net. What users may not realize is that the soft blue thumbs-up is tracking their surfing habits, even if it doesn’t get clicked.


Any time the Like button is displayed, information is zapped back to Facebook’s servers.

Facebook Connect and Your Privacy

Facebook Connect is the next iteration of the Facebook Platform that allows users to “connect” their Facebook identity, friends and privacy to any site. Even if you never log in to a site using Facebook Connect, the fact that they have the Facebook Connect JavaScript snippet present on their site means Facebook can see that you are present on that site.

Over 50,000 sites use Facebook Connect, and if you’ve visited one of them, you’ve been tracked.


Like Boxes Are Creeping On You Too


The Like Box is a special version of the Like Button designed only for Facebook Pages. It allows admins to promote their Pages and embed a simple feed of content from a Page into other sites. As this is a JavaScript widget, every time it is loaded it pings information about you back to Facebook servers.

We found over 1 million websites that have this box, which also shows pictures of the followers’ faces.


What Can You Do About It?

Twitter and Pinterest, which track people with their Tweet and PinIt buttons, offer users the ability to opt out. And Google has pledged it will not combine data from its ad-tracking network DoubleClick with personally identifiable data without users’ opt-in consent. Facebook does not offer an opt-out in its privacy settings.

Instead Facebook asks members to visit an ad industry page, where they can opt out from targeted advertising from Facebook and other companies. The company also says it will let people view and adjust the types of ads they see.

September 2013

How To Find New Clients For Your SEO Agency

NerdyData is a search engine for source code.  This post outlines some ways an SEO agency can use our tool to discover potential new clients, en masse.

It’s a gold rush out there for SEO agencies. As businesses come online in droves, they quickly discover that simply paying someone to develop a website will not get them the traffic they need to be profitable. Everyone wants to be at the top of a hot Google search. A criminal attorney in San Francisco who ranks for “criminal attorney in san francisco” will likely receive many contacts from people interested in legal representation.

Only a small percentage of websites show up in a top placement in organic search results for popular queries. Millions of websites exist but are not optimized in a way that will make them appear for these frequently searched keywords, and so they are displaced by those that are.


An SEO agency exists to bridge the gap between Google’s search algorithm and technologically unsavvy business owners.

We have come up with some ways an SEO agency can surface these poorly optimized sites using our search engine. Here are some examples: 

Search for sites that have “niche” and “location” in their <title> tag or on-page text, but DO NOT have a meta description tag

  • If you’re an SEO agency you could use this type of search to narrow down sites owned by “criminal attorneys” in “san francisco” that most likely don’t have an SEO agency because they lack a meta description tag on their web pages.
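
To make the rule concrete, here is a minimal sketch that applies it to one page you have already downloaded, using Python’s standard html.parser module. It only looks at the <title> tag, not on-page text, and the class name, function name, and sample pages are hypothetical.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collects the <title> text and notes any non-empty meta description."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.has_meta_description = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            if (attrs.get("content") or "").strip():
                self.has_meta_description = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_prospect(html, niche, location):
    """True if the title mentions niche and location but no meta description exists."""
    audit = HeadAudit()
    audit.feed(html)
    title = audit.title.lower()
    return (
        niche.lower() in title
        and location.lower() in title
        and not audit.has_meta_description
    )


# Made-up pages for illustration.
page_a = "<html><head><title>Criminal Attorney in San Francisco | Smith Law</title></head></html>"
page_b = (
    "<html><head><title>Criminal Attorney in San Francisco</title>"
    '<meta name="description" content="Defense lawyers serving the Bay Area." />'
    "</head></html>"
)
print(is_prospect(page_a, "criminal attorney", "san francisco"))  # True
print(is_prospect(page_b, "criminal attorney", "san francisco"))  # False
```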

Additionally, we’ve made a number of tools that let you search within the <title> and Meta Descriptions of websites.

Search for sites that don’t have Facebook or Twitter badges, buttons, or social links on their pages.

  • There’s a good chance these sites do not have an online social presence.  Why don’t they?  These businesses could find new customers by creating a social media presence, but may not know how to create one.

Search for sites that use outdated or poorly optimized software

  • Many small business websites are using a version of a CMS, forum, or blog software that is not optimized for high volume queries in Google.  These sites are likely to already contain content, but are not designed in a way that allows them to capture search traffic for terms relevant to their business.

If you want to perform searches like these, try out NerdyData, a search engine that indexes the full source code of webpages and lets you query using code snippets, as well as keywords.

Additionally, you can submit a request through this form and we can get in touch with you to help you uncover new business leads for your agency.

Or follow us on Twitter!