Topical Discovery of Web Content
Description
Identifier
Thesis 2263
Author
Crocetti, Giancarlo, 1969-
Title
Topical Discovery of Web Content
Publisher
Central Connecticut State University
Date of Publication
2012
Resource Type
Master's Thesis
Abstract
The use of analytics in the enterprise in general, and in business intelligence (BI) in particular, is going through a dramatic change in which the use of unstructured data and search technologies is playing a pivotal role. Researchers, business analysts, and competitive intelligence experts are all leveraging search technologies to analyze and generate insight for specific and relevant questions. Unstructured data sources related to research topics are identified and indexed by search engines, which augment the textual content with annotations and transform it into a form suited for information retrieval. Even though this process has proven successful, it contains a weak link: the manual selection of new unstructured data sources to be added to the system over time. These data sources represent a corpus of relevant documents related to a specific research area and are in the form of web pages. Usually, the task of identifying relevant web pages is outsourced to external companies that provide this manual, labor-intensive process for a subscription fee that can easily total tens of thousands of dollars per year. Moreover, due to the manual nature of the process, this task is prone to many errors and often results in content of poor quality, adding more noise than knowledge to the search index. To this end, this thesis describes the theory and the implementation of the author's new software tool, the "Web Topical Discovery System" (WTDS), which provides an approach to the automatic discovery and selection of new web pages relevant to specific analytical needs. We will see how it is possible to specify the research context with search keywords related to the area of interest, and we will consider the important problem of removing extraneous data from a web page containing an article in order to minimize false positives, such as a match on a keyword that appears only in the latest-news box of the same page. The removal of duplicates, the analysis of the richness of information contained in the article, and lexical diversity are all taken into consideration in order to provide the optimum set of recommendations to the end user or system.
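As an illustration only, the selection criteria named in the abstract (keyword match, duplicate removal, lexical diversity) could be sketched as below. This is not the WTDS implementation, which this record does not describe. The function names, the single-word keyword matching, the SHA-1 fingerprint used for duplicate detection, and the type-token ratio used as the diversity measure are all assumptions chosen for the sketch.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_diversity(text):
    """Type-token ratio: distinct tokens divided by total tokens (assumed measure)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def drop_duplicates(articles):
    """Keep the first copy of each article, comparing normalized token streams."""
    seen = set()
    unique = []
    for article in articles:
        normalized = " ".join(tokenize(article))
        fingerprint = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(article)
    return unique


def rank_candidates(articles, keywords):
    """Hypothetical ranking: keep articles that contain at least one
    single-word keyword, then order them by lexical diversity, highest first."""
    wanted = {keyword.lower() for keyword in keywords}
    relevant = [a for a in drop_duplicates(articles) if wanted & set(tokenize(a))]
    return sorted(relevant, key=lexical_diversity, reverse=True)
```

The sketch assumes the article text has already been separated from the rest of the page (navigation, latest-news boxes), since a keyword match on surrounding page content is exactly the false positive the abstract warns about.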
Notes
"Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Data Mining."; Thesis advisor: Roger Bilisoly; M.S., Central Connecticut State University, 2012; Includes bibliographical references (pages 79-83).
Subject
Business intelligence.
Data mining.
Information retrieval.
Department
Department of Mathematical Sciences
Advisor
Bilisoly, Roger, 1963-
Type
Text
Digital Format
application/pdf
Software
System requirements: PC and World Wide Web browser.
Language
eng
OCLC number
819339218