Jul 20, 2012

Issues with writing to the event log from an ASP.net application

During a recent project I faced with a problem :

I had a form whose content needed to be saved to an Oracle DB.
I also had a fall back code which in case of the failure of the former would write the same data to a SQL server DB as a temporary repository of the form data.

For some reasons my code to write to a Oracle DB were not going through and neither was my code to write to a SQL server temporary table.
Automatically I was checking the Event Log and tried to figure out what was wrong.
I had elaborate Event Logging code all over the project to help me understand what is going on under the hood, which will help me debug. But alas, there was no new log created for the web app at all!!!
There were no relevant entries in the Application Log as well.

I tried concentrating on the fix to the original problem, and tried many random SQL Server  and Oracle fixes but I was unable to reach a solution. This went on throughout the day..
After a lot of misses I finally decided to fix the Event Log issue first… (And how prophetic was it!!!)
So I was already aware of the cause, ASP.Net runs on a lower privilege, which prevents it to write to the logs.. I had faced it earlier in a SharePoint project and had added a code to raise the privilege of the particular method which writes to the log.. Should I apply that again? I thought of digging  deeper… After a lot of scouring through the net in MSDN, StackOverflow and multiple other blogs.. I found out that everybody was suggesting a different approach.. Then I came across a MSDN kb article which gave 2 possible fixes, Firstly, to apply some registry hacks which I am sure most IT admins would not allow me to do on a prod server.. and Secondly to add the Event Logs and Event Source Names manually.. There was a third way too, that to raise the privilege of the ASP.net account to a higher one.. or to use a admin account  as the App pool account.. But then  the reason ASP.net was designed to run on a lower privilege was for some reasons.. right? Namely from stopping access to directories outside the current one or to access other resources etc.. That would be defeated.

I liked the second approach and developed a small tool to create the Event Logs and Event Source Names from a Config file.. And lo the event logging was working for my website… So what was happening? It appears that writing a new event is a common activity which a less privileged thread can perform, but Creating Event Logs and Event Source Names is considered to be Admin tasks, which the ASP.net account is not capable of doing.

So after the Event logging issue was sorted out I now concentrated solving the original problem.. The logs clearly showed that the Oracle connection could not be opened due to some tnsnames.ora issue, and the fall back insertion was failing because the SP name which was hardcoded in the app had an extra “_” (underscore) character..
Moral of the story: Event Logging should be set up as early as possible.

Jul 17, 2012

TFS VS 2010 deployment issue: some files were just not getting deployed..

When the daily build ran today morning and after the deployment was successfully completed, I tried testing some code that I had checked in. I got a file not found error for one of the cshtml file that wwas recently added.

We use a Web Deployment Project, so after the daily build the web content gets pushed into the Web server.. Use (Team Build + Web deployment + Web deploy + VS 2010 framework for this.. I used Vishal Joshi’s steps to set it up.. http://vishaljoshi.blogspot.sg/2010/11/team-build-web-deployment-web-deploy-vs.html)

Meanwhile, coming back to the problem, I checked everything.. found that it was present in the TFS, local folder, created another mapped folder and executed “Get Latest version” Command, and found that it was part of the list of files. All other files were getting properly deployed. The problem was with one cshtml file and couple of scripts. All of which were recently introduced.

Then why wasn’t it getting pushed??? After a lot of brain racking and investigation I found the solution:

For some reason the Build Action property of these files were getting set as “None”. After I changed them back to “Content”, they started flowing smoothly into the web server after the build.

How to change?

In Visual Studio, right-click on the file and go to Properties.

Under the file's properties, make sure that Build Action is set to Content. Otherwise it won't be published via web deploy.

image001

Jul 3, 2012

My tryst with the SQL Server 2008 R2 Full Text Search

In a recent project which had a requirement of a search, I had proposed Full Text Search. Even though I had a brief knowledge of how it works through a online tutorial when I was at MS, I never had dirtied my hands in that technology. So here’s how I went about it.
1. Install the Full text Capability.
  • - The Standard installation will not have this feature and you will have to install it over it.  Funnily, it will still show the Full text nodes in the Object explorer, giving a false sense of assurance that you have it installed already. It may even let you create a “Full Text Catalog”.
    • To make sure if you have it installed, run the following query on a new Query Window:
      SELECT FullTextServiceProperty('IsFullTextInstalled')
       
       
  • The URL to the MSDN page for the detailed capability of Full-TextSearch http://msdn.microsoft.com/en-us/library/ms142571.aspx#like
INSTALLATION:
CONFIGURATION:
  • There are two ways of working with Full text search:
    • Search on indexed Columns. Data types supported:
      • char
      • nchar
      • varchar
      • nvarchar
      • text
      • ntext
    • Search based on indexed documents. Extensions Supported:
      • The search uses a concept called iFilter to parse and index the text of the documents.
        By default all MS text type extensions are automatically supported like,
      • .doc
      • .txt
      • docx
      • extensions like pdf etc can be supported after installing the appropriate filters.
Configuring Full text for Database columns:  (excerpted from Pinal Dave’s blog)
Full Text Index helps to perform complex queries against character data. These queries can include word or phrase searching. We can create a full-text index on a table or indexed view in a database. Only one full-text index is allowed per table or indexed view. The index can contain up to 1024 columns. Software developer Monica who helped with screenshots also informed that this feature works with RTM (Ready to Manufacture) version of SQL Server 2008 and does not work on CTP (Community Technology Preview) versions.
    To create an Index, follow the steps:
    1. Create a Full-Text Catalog
    2. Create a Full-Text Index
    3. Populate the Index
    1) Create a Full-Text Catalog



    Full – Text can also be created while creating a Full-Text Index in its Wizard.
    2) Create a Full-Text Index









    3) Populate the Index


    FYI, All the above mentioned steps can also be performed using scripts instead of the wizards.
  • Querying the Full text indexed columns:
  • As the Index Is created and populated, you can write the query and use in searching records on that table which provides better performance.
    For Example,
    We will find the Employee Records who has “Marking “in their Job Title.
    FREETEXT( ) Is predicate used to search columns containing character-based data types. It will not match the exact word, but the meaning of the words in the search condition. When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches.
    • Separates the string into individual words based on word boundaries (word-breaking).
    • Generates inflectional forms of the words (stemming).
    • Identifies a list of expansions or replacements for the terms based on matches in the thesaurus.
    CONTAINS( ) is similar to the Freetext but with the difference that it takes one keyword to match with the records, and if we want to combine other words as well in the search then we need to provide the “and” or “or” in search else it will throw an error.
    USE AdventureWorks2008GO
    SELECT BusinessEntityID, JobTitle
    FROM HumanResources.Employee
    WHERE FREETEXT(*, 'Marketing Assistant');

    SELECT BusinessEntityID,JobTitle
    FROM HumanResources.Employee
    WHERE CONTAINS(JobTitle, 'Marketing OR Assistant');

    SELECT BusinessEntityID,JobTitle
    FROM HumanResources.Employee
    WHERE CONTAINS(JobTitle, 'Marketing AND Assistant');
    GO


    Besides the CONTAINS and FREETEXT Keywords, there are 2 more keywords which are available. They are CONTAINSTABLE and FREETEXTTABLE.
    The major difference  between these 2 and the previous 2 are that these 2 would dump the content of the search into a table, which can be queried further.
    This is mostly used to sort the fetched results by Ranking. The rank is based on a Relevancy factor associated with each row. SQL Server Free text engine decides on the relevance of the results fetched based on its internal algorithm.
    Definitions:
    CONTAINSTABLE
    http://msdn.microsoft.com/en-us/library/ms189760(v=SQL.90).aspx Returns a table of zero, one, or more rows for those columns containing character-based data types for precise or fuzzy (less precise) matches to single words and phrases, the proximity of words within a certain distance of one another, or weighted matches. CONTAINSTABLE can be referenced in the FROM clause of a SELECT statement as if it were a regular table name.
    Queries using CONTAINSTABLE specify contains-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row. The CONTAINSTABLE function uses the same search conditions as the CONTAINS predicate.
    FREETEXTTABLE
    http://msdn.microsoft.com/en-us/library/ms177652(v=SQL.90).aspxReturns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified freetext_string. FREETEXTTABLE can be referenced in the FROM clause of a SELECT statement like a regular table name.
    Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row.
     
    eg.,
    CONTAINSTABLE:
    SELECT FT_TBL.Description, FT_TBL.CategoryName, KEY_TBL.RANK
    FROM Categories AS FT_TBL
    INNER JOIN
    CONTAINSTABLE (Categories, Description, '("sweet and savory" NEAR sauces)
    OR
    ("sweet and savory" NEAR candies)' )
    AS
    KEY_TBL ON FT_TBL.CategoryID = KEY_TBL.[KEY]
    ORDER BY KEY_TBL.RANK DESC

    FREETEXTTABLE:
    SELECT KEY_TBL.RANK, FT_TBL.Description
    FROM Production.ProductDescription AS FT_TBL
    INNER JOIN
    FREETEXTTABLE(Production.ProductDescription, Description, 'perfect all-around bike')
    AS
    KEY_TBL ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
    ORDER BY KEY_TBL.RANK DESC

  • Other important keywords are
    • Near, which does a proximity based search, for the mentioned set of words.
    • FORMSOF : you can choose to have your CONTAINS search expanded to capture different generations of a word by using the FORMSOF term. This term accepts two arguments – INFLECTIONAL or THESAURUS. The INFLECTIONAL argument will expand the search phrase to search on all conjugations and declensions or each word in the search phrase, and the THESAURUS argument will enable a thesaurus expansion on the search phrase.
      • e.g., Select * from TableName where CONTAINS(*,'FORMSOF(INFLECTIONAL,run)')
    • Thesaurus :
      • While querying a Full text indexed column we may also choose to return back synonyms of the searched term by fetching from the thesaurus!
        • FREETEXT and FREETEXTTABLE queries use the thesaurus by default. CONTAINS and CONTAINSTABLE support an optional THESAURUS argument.
        • e.g., Select * from TableName where CONTAINS(*,'FORMSOF(THESAURUS,run)')

    • Types of Search possible through FTS:
    • Improved querying tools
      • Keywords. Document creators (or trained indexers) are asked to supply a list of words that describe the subject of the text, including synonyms of words that describe this subject. Keywords improve recall, particularly if the keyword list includes a search word that is not in the document text.
      • Field-restricted search. Some search engines enable users to limit free text searches to a particular field within a stored data record, such as "Title" or "Author."
      • Boolean queries. Searches that use Boolean operators (for example, "encyclopedia" AND "online" NOT "Encarta") can dramatically increase the precision of a free text search. The AND operator says, in effect, "Do not retrieve any document unless it contains both of these terms." The NOT operator says, in effect, "Do not retrieve any document that contains this word." If the retrieval list retrieves too few documents, the OR operator can be used to increase recall; consider, for example, "encyclopedia" AND "online" OR "Internet" NOT "Encarta". This search will retrieve documents about online encyclopedias that use the term "Internet" instead of "online." This increase in precision is very commonly counter-productive since it usually comes with a dramatic loss of recall.[5]
      • Phrase search. A phrase search matches only those documents that contain a specified phrase, such as "Wikipedia, the free encyclopedia."
      • Concept search. A search that is based on multi-word concepts, for example Compound term processing. This type of search is becoming popular in many e-Discovery solutions.
      • Concordance search. A concordance search produces an alphabetical list of all principal words that occur in a text with their immediate context.
      • Proximity search. A phrase search matches only those documents that contain two or more words that are separated by a specified number of words; a search for "Wikipedia" WITHIN2 "free" would retrieve only those documents in which the words "Wikipedia" and "free" occur within two words of each other.
      • Regular expression. A regular expression employs a complex but powerful querying syntax that can be used to specify retrieval conditions with precision.
      • Fuzzy search will search for document that match the given terms and some variation around them (using for instance edit distance to threshold the multiple variation)
      • Wildcard search. A search that substitutes one or more characters in a search query for a wildcard character such as an asterisk. For example using the asterisk in a search query "s*n" will find "sin", "son", "sun", etc. in a text.

    • The concept of Stopwords/ noisewords:
      • words (SQL Server 2005 calls them noise words, 2008 stopwords). SQL Server has a list of around 150 common words, which are ignored by the Full Text Search engine. e.g. about, after, all, also, an, and, another etc. You can view the English stopwords in SQL Server 2008 like so:
        select * from sys.fulltext_system_stopwords where language_id = 1033
        In my table there is a Question titled “What are building consents?”. If we search for “what are building consents” like so:
        SELECT [Key], [Rank], 1
        FROM CONTAINSTABLE(dbo.Question, QuestionText, '"what*" AND "are*" AND "building*" AND "consents*"', 100)

        because “are” is a stopword (and so is “what”) we won’t get any results!
        There is a workaround – to wipe the list of stop words! The problem with doing this is that the list of stop words is setup server-wide on the SQL Server itself.



    •  Why Full text Search? : Difference between a regular LIKE search and FULL-TEXT search:
      Compiled from other posts and MSDN -
      •  In general, there is a tradeoff between "precision" and and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.



        Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.



        The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.



        Other features typical of full-text search are



        •lexical analysis or tokenization—breaking a block of unstructured text into individual words, phrases, and special tokens

        •morphological analysis, or stemming—collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word

        •ranking—measuring the similarity of a matching record to the query string

        -----------------------------------------------------------------------------------
        FTS involves indexing the individual words within a text field in order to make searching through many records quick. Using LIKE still requires you to do a string search (linear or the like) within the field.


        -------------------------------------------------------------------------------------
        Full Text Searching (using the CONTAINS) will be faster/more efficient than using LIKE with wildcarding. Full Text Searching (FTS) includes the ability to define Full Text Indexes, which FTS can use. Dunno why you wouldn't define a FTS index if you intended to use the functionality...




        LIKE with wildcarding on the left side (IE: LIKE '%Search') can not use an index (assuming one exists for the column), guaranteeing a table scan. I haven't tested & compared, but regex has the same pitfall. To clarify, LIKE

        -------------------------------------------------------------------------------------
        FTS is more efficient, powerful (especially for Word Breakers and stemming functionalities) ... but check your requirements because sometimes DBs don't support all languages for example MSSQL doesn't support Greek (check on this page http://msdn.microsoft.com/en-us/library/ms176076(v=sql.110).aspx )





    Continued in Part 2: File based Full text Search