Analyze Chrome’s Browsing History with Python

by Karthikeyan KC — updated on May 25, 2016 · in Computing

Analyze Chrome’s browsing history with Python

In this edition, I’ll deal with some Python codes and help you find the websites that you would visit more often. Assuming that you don’t clear your history often, I advise you to prepare yourself a little for any sizeable impact or perhaps, regret, upon seeing your browsing activities.

I am focusing on the more common audiences here. So, I’m going to jump straightaway into the layout of my little python (2.7) script that I used to analyze my Chrome’s history on Windows, which eventually led to a little epiphany moment.

Chrome’s history database

Chrome stores its data locally in an SQLite database. So all we need to do here is write a consistent Python code that would make a connection to the database, query the necessary fields and extract the required data, which is the URLs visited and the corresponding total visit counts, and churn it out like a puppy.

On a Windows machine, this database usually can be found under the following path.

C:\Users\<YOURUSERNAME>\AppData\Local\Google\Chrome\User Data\Default

As my code is focused on Chrome on Windows, I will leave the code for yourself to tweak it for Linux and Mac. The following code gives us the path on Windows.

data_path = os.path.expanduser('~')+"\AppData\Local\Google\Chrome\User Data\Default"
files = os.listdir(data_path)
history_db = os.path.join(data_path, 'history')

Now that we have the path and the database name, we shall do some SQL operations in it. To do that, you’d need to import the sqlite3 library that comes with Python. The variables are self-explanatory. If necessary, refer the sqlite3 documentation.

c = sqlite3.connect(history_db)
cursor = c.cursor()
select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
cursor.execute(select_statement)

results = cursor.fetchall()

The important thing here is the select_statement. How did I know which fields to fetch from the table? Well, you could either download the SQLite browser and get a glimpse of the schema or get the table and run a master fetch of the table from the console.

So, when the program runs, the fetchall function returns a list of tuples. Go ahead and try printing it. You’d end up with a console full of tuples with the website and the number of times the URL was visited in a particular session or interval, as values. Though it helps us to get a gist of our history, it’s not helpful in this format. The URL is repetitive and I have no idea why the visit count are scattered along the list. However, we can loop every occurrence of the URL, make our own count, and associate these two as a key-value pair in a dictionary.

Parsing URLs

If you print the tuple in the console, we can understand that the counts are much lesser than expected. This is due to the different counts for both HTTP, HTTPS URLs, including the subdomains. While the following parsing part is not the pythonic way to do it, I decided to go with a nifty function that would split the URL into components and return the plain naked domain. For consistency, I’d recommend you to use the urlparse library and use the netloc function to do it in a better way.

def parse(url):
	try:
		parsed_url_components = url.split('//')
		sublevel_split = parsed_url_components[1].split('/', 1)
		domain = sublevel_split[0].replace("www.", "")
		return domain
	except IndexError:
		print "URL format error!"

Creating and sorting the URL, Count dictionary

As seen above, now all we have to do is to loop through the list of tuples, fetch the URL, parse it and then add it to a dictionary as a key, and for every occurrence for the URL, you simply add an incremental counter forwarded as the value.

sites_count = {} #dict makes iterations easier :D

for url, count in results:
	url = parse(url)
	if url in sites_count:
		sites_count[url] += 1
	else:
		sites_count[url] = 1

As Python’s dictionary is intrinsically not ordered, it should be sorted by the value for a clean result. I’ve used the OrderedDict method to do that.

sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True))

Analyzing and plotting the results

Now that we have all we need in a good shape, all you need to do is to put up a strong face and pile up all the energy in you to feel whatever it is that you would feel when finding your most visited sites. I hardly have any experience in plotting with matplotlib, so the following is the simplest graph I could come up with.

plt.bar(range(len(sites_count_sorted)), sites_count_sorted.values(), align='edge')
plt.xticks(rotation=45)
plt.xticks(range(len(sites_count_sorted)), sites_count_sorted.keys())
plt.show()

While you might choose to plot only the first fifty or hundred sites, I chose to plot every damn URL I visited. This is my graph as on April 20, 2016, showing a zoomed-in snapshot of the primary sites I spend my time on.

The graph showing my most visited sites.

Time well spent!

Complete code – Python history analyzer for Chrome

Feel free to fork it and tweak it.

import os
import sqlite3
import operator
from collections import OrderedDict
import matplotlib.pyplot as plt

def parse(url):
	try:
		parsed_url_components = url.split('//')
		sublevel_split = parsed_url_components[1].split('/', 1)
		domain = sublevel_split[0].replace("www.", "")
		return domain
	except IndexError:
		print "URL format error!"

def analyze(results):

	prompt = raw_input("[.] Type <c> to print or <p> to plot\n[>] ")

	if prompt == "c":
		for site, count in sites_count_sorted.items():
			print site, count
	elif prompt == "p":
		plt.bar(range(len(results)), results.values(), align='edge')
		plt.xticks(rotation=45)
		plt.xticks(range(len(results)), results.keys())
		plt.show()
	else:
		print "[.] Uh?"
		quit()

#path to user's history database (Chrome)
data_path = os.path.expanduser('~')+"\AppData\Local\Google\Chrome\User Data\Default"
files = os.listdir(data_path)

history_db = os.path.join(data_path, 'history')

#querying the db
c = sqlite3.connect(history_db)
cursor = c.cursor()
select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
cursor.execute(select_statement)

results = cursor.fetchall() #tuple

sites_count = {} #dict makes iterations easier :D

for url, count in results:
	url = parse(url)
	if url in sites_count:
		sites_count[url] += 1
	else:
		sites_count[url] = 1

sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True))

analyze (sites_count_sorted)

If you have any issues, please do let me know in the comments or at GitHub. Have fun.

This post was first published on April 20, 2016.

Python

Karthikeyan KC

Aeronautical engineer, dev, science fiction author, gamer, and an explorer. I am the creator of Geekswipe. I love writing about physics, aerospace, astronomy, and python. I created Swyde. Currently working on Arclind Mindspace.

37 Responses

ARUN PRAKASH
July 13, 2022
i got an error like this. please say what is the eroor and how to clear it.
files = os.listdir(data_path)
FileNotFoundError: [WinError 3] The system cannot find the path specified: ‘C:\\Users\\Gayathri\\AppData\\Local\\Google\\Chrome\\User Data\\Default’
Reply
AB
October 10, 2020
Is there any way I can make this work in Jupyter notebook?
Reply
Martinez
October 31, 2018
Hello. The code is working but getting an error database locked..how can i Solve this. Thanks alot
Reply
- Karthikeyan KC
  October 31, 2018
  You need to close the browser before you run the script.
  Reply
Balasubramanian
April 20, 2018
(Unicode error) 'unicodeescape' codec can't decode bytes in position 28-29: truncated \UXXXXXXXX escape
Reply
- Balasubramanian
  April 20, 2018
  data_path=os.path.expanduser('~')+"\AppData\Local\Google\Chrome\User Data\Default" taking "(Unicode error) ‘unicodeescape’ codec can’t decode bytes in position 28-29: truncated \UXXXXXXXX escape" Error!!! how can it be solved ?
  Reply
  - balasubramanian
    April 20, 2018
    I have modified Data_path like data_path=os.path.expanduser('~')+ r"\AppData\Local\Google\Chrome\User Data\Default" and applied "r+"\AppData\Local\Google\Chrome\User Data\Default Now getting sqlite3.OperationalError: database is locked
    How to unlock the database? Pls explain.
    Reply
    - Karthikeyan KC
      April 20, 2018
      The database will be locked when the browser is running in the background. You need to close your browser before you run the script.
      Reply
      - Balasubramanian
        April 20, 2018
        now closed and running fine
        Reply
- Karthikeyan KC
  April 20, 2018
  If your ‘\’ is not escaped properly, this might happen. In the following response, I see that you have converted the path into raw string. That should work.
  Reply
  - Balasubramanian
    April 20, 2018
    Now I am getting the error for url, credentials in credential.iteritems():" below AttributeError: 'dict' object has no attribute 'iteritems'.
    How can this be resolved?
    Reply
    - Karthikeyan KC
      April 20, 2018
      Please read the article again. Use Python 2.7.x instead of Python 3. Or, if you are using Python 3, use credential.items().
      Reply
      - Balasubramanian
        April 20, 2018
        Getting below error
        f.write("\n"+url+"\n"+credentials[0].encode('utf-8')+ " | "+credentials[1]+"\n") TypeError: must be str, not bytes
        where we using string? y mentioned not bytes?
        Reply
        
        Balasubramanian
        April 20, 2018
        i modified credential.items() in for url, credentials in credential.items(): statement
        
        Karthikeyan KC
        April 20, 2018
        Please use Python 2.7.x. Or make appropriate changes to the code if you want to use Python 3. Refer this – https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
        
        Balasubramanian
        April 20, 2018
        i am using python 3.6… if it is good or not?
        
        Karthikeyan KC
        April 20, 2018
        Python 3.6 is good. But the code above is written in Python 2.7. So if you want the code to run, use Python 2.7. I can’t really make this clear any more. :)
        
        Balasubramanian
        April 23, 2018
        Karthikeyan KC, how to view Browsing Internet Explorer History with Python?
        I need “data_path” and “select_statement ” like Chrome.
        
        Karthikeyan KC
        April 23, 2018
        This script is for Chrome only. For browsers like Firefox it’s only a matter of modifying the SQL query. I can’t help you with internet explorer.
bhavik
February 22, 2018
Hey! nice article , i wanted to know how can i scrape the browsing history for a remote computer or any device and what information do i need and is that possible with python.
Reply
- Karthikeyan KC
  February 22, 2018
  What are you actually trying to accomplish by this? The above code is to get one’s own browser history and it is supposed to be a fun script for learning and analysis purposes. You are asking to remotely access a browser and record its history. I can’t help you here.
  Reply
rutva
January 24, 2018
how to get chrome search history using vb.net. give ans
Reply
- Karthikeyan KC
  January 24, 2018
  Please stop spamming the comment section. You need not ask me more than once. The above script is written in Python. The logic is self explanatory. Go figure out a way to write one in VB .net. Also, note that I’ll have to place a ban on you if you keep spamming the comments.
  Reply
lori
November 11, 2017
I love this! Now can you turn it into something button pushing simple for a Python beginner so I could just run it and see my outputs? Just kidding….. I was looking for something I can use to help me determine some client time for billing and ran across you …… I know this isn’t it, but I still loved it when I ran across it…… you are great! thanks for sharing!! I’ve got to come back to this when I know more about how to work in Python on my machine…. I’m a programmer from the days of Cobol and C (before the +) and I’m self learning web development and doing ok ….. it’s just hard to know where to start sometimes …. some days I think I need to go back to school and start over, but who would pay my bills? lol
Reply
Spunk
September 17, 2017
You write some of the best python tutorials online for specific tasks and include easy to understand introductions. Thank You. I hope in the future you will also keep in mind those using Python 3x, some things like the windows path, and input() have changed and the above code wont work unless the user knows about those changes.
Reply
- Karthikeyan KC
  September 17, 2017
  Thank you Spunk! I will keep that in mind for my upcoming articles. :)
  Reply
phoenixfp
August 30, 2017
Hi
Great tutorial – i was able to extract the history no worries.
I cannot plot the graph though.
I’m wondering if it is because of:
1 – your analysis code looks different to the analyze method in the complete code – is there a reason for this?
2 – perhaps i have not fed (put) the right parameter/arg into my copy of your analze method?
I am realtively new – so any help would be greatly appreciated
regards,
m :)
Reply
- Karthikeyan KC
  August 30, 2017
  Hello! :) This is the part that plots the graph.
```
plt.bar(range(len(results)), results.values(), align='edge')
plt.xticks(rotation=45)
plt.xticks(range(len(results)), results.keys())
plt.show()
```
  Would help if you could give me any info about the errors.
  Reply
Pranjal
July 24, 2017
Hye, I tried following you ! But once I make a connection object aka cursor and try printing my results using fetchall function I am getting an error saying DB is locked. What should I do Now? I am stucked here..
Reply
- Karthikeyan KC
  July 24, 2017
  You need to close the browser before you run the script.
  Reply
  - Pranjal
    August 30, 2017
    I did that too..!!!
    Reply
    - Karthikeyan KC
      August 30, 2017
      You could try copying the database to a new location and then try accessing it from there.
      Reply
RandomeJi
July 7, 2017
is there a way to get the search keyword, page title and URL altogether? By using one SQL statement?
Reply
- Karthikeyan KC
  July 8, 2017
  You can use UNION ALL to do that!
```
SELECT row1 FROM table1
UNION ALL
SELECT row2 FROM table2
UNION ALL
SELECT row5 FROM table3
```
  Reply
tza
January 2, 2017
Hi,
I just tried your code and get the following error:
sqlite3.OperationalError: no such table: urls
Has the name of the table changed?
Reply
- Karthikeyan KC
  January 2, 2017
  No, the table names haven’t changed as of V 55.0.2883.87. You can list all the tables by using the following snippet.
```
#list all tables
list_tables = "SELECT name FROM sqlite_master WHERE type='table';"
cursor.execute(list_tables)
print cursor.fetchall()
```
  The urls table exists.
  Reply

Swoop and Steal Passwords from Chrome and Decrypt with Python - Geekswipe
August 16, 2016
[…] take our previous Python code that we used to analyze our browsing history and tinker it a bit to steal our own passwords from […]