An automated web crawl methodology to analyze the online privacy landscape

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/5138jh08q

Descriptions

Attribute Name | Values
Creator
Abstract or Summary
  • Protecting end users' privacy and building trust are the two most important factors needed to support the growth of e-commerce. Increased dependence on the Internet for a wide variety of daily transactions causes a corresponding loss of privacy for many users, as virtually all websites collect data from users, directly or indirectly, while doing business with them. In this thesis I use a web crawler named “iWatch” as an instrument to collect basic statistics on the state of privacy, security, and data-collection practices on the web. I look at several interesting practices and at ways of examining the data. This thesis is also meant to serve as a point for reflection and discussion about which practices to observe, and about how the raw data from such a system can and should be evolved and made available to a wider audience. The purpose of this thesis is to show that web crawling is a valid approach to mass data collection over the Internet, with the aim of predicting privacy practices and analyzing how they have evolved over the last three years in terms of geography, legislation, risks, biases, and flows. Finally, I demonstrate methods for controlling bias while collecting data, and I propose a probabilistic mathematical model that limits the depth of search in order to achieve wider breadth in future web crawls.
Resource Type
Date Available
Date Copyright
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Committee Member
Academic Affiliation
Non-Academic Affiliation
Keyword
Subject
Rights Statement
Language
File Format
File Extent
  • 709283 bytes
Replaces
Additional Information
  • description.provenance : Made available in DSpace on 2007-11-29T18:07:26Z (GMT). No. of bitstreams: 1 sarkar_dissertation.pdf: 709283 bytes, checksum: 78ba97cb6415eb75b3bccf0ab148e1f2 (MD5)
  • description.provenance : Approved for entry into archive by Linda Kathman (linda.kathman@oregonstate.edu) on 2007-11-29T18:07:26Z (GMT) No. of bitstreams: 1 sarkar_dissertation.pdf: 709283 bytes, checksum: 78ba97cb6415eb75b3bccf0ab148e1f2 (MD5)
  • description.provenance : Approved for entry into archive by Julie Kurtz (julie.kurtz@oregonstate.edu) on 2007-11-28T19:01:21Z (GMT) No. of bitstreams: 1 sarkar_dissertation.pdf: 709283 bytes, checksum: 78ba97cb6415eb75b3bccf0ab148e1f2 (MD5)
  • description.provenance : Submitted by Chandan Sarkar (sarkarc@onid.orst.edu) on 2007-11-27T22:49:22Z No. of bitstreams: 1 sarkar_dissertation.pdf: 709283 bytes, checksum: 78ba97cb6415eb75b3bccf0ab148e1f2 (MD5)

Relationships

In Administrative Set:
Last modified: 08/01/2017
